Commit message | Author | Age | Files | Lines
* Disable -dynamic-too if -dynamic is also passed [wip/T20436] | Matthew Pickering | 2021-10-04 | 6 | -0/+36
  Before, if you passed both options, you would generate two identical hi/dyn_hi and o/dyn_o files, both in the dynamic way. It's better to warn that this is happening rather than duplicating the work and causing potential confusion.
  -dynamic-too should only be used with -static.
  Fixes #20436
* configure: Fix redundant-argument warning from -no-pie check | Ben Gamari | 2021-10-03 | 1 | -2/+2
  Modern clang versions are quite picky when it comes to reporting redundant arguments. In particular, they will warn when -no-pie is passed when no linking is necessary. Previously the configure script used a `$CC -Werror -no-pie -E` invocation to test whether `-no-pie` is necessary. Unfortunately, this meant that clang would throw a redundant argument warning, causing configure to conclude that `-no-pie` was not supported. We now rather use `$CC -Werror -no-pie`, ensuring that linking is necessary and avoiding this failure mode.
  Fixes #20463.
* base: Update Unicode database to 14.0 | Ben Gamari | 2021-10-03 | 5 | -112/+208
  Closes #20404.
* ci/test-metrics: Clean up various bash quoting issues | Ben Gamari | 2021-10-03 | 1 | -6/+6
* ci: Use https:// transport and access token to push perf notes | Ben Gamari | 2021-10-03 | 2 | -36/+10
  Previously we would push perf notes using a standard user and SSH key-based authentication. However, configuring SSH is unnecessarily fiddly. We now rather use HTTPS and a project access token.
* Don't use FastString for UTF-8 encoding only | Sylvain Henry | 2021-10-02 | 3 | -4/+14
* Add (++)/literal rule | Sylvain Henry | 2021-10-02 | 1 | -0/+5
  When we derive the Show instance of the big record in #16577, I get the following compilation times (with -O):
  Before: 0.91s
  After: 0.77s
  Metric Decrease: T19695
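A string literal is compiled to a call to `unpackCString#`, so `"lit" ++ rest` normally unpacks the literal into a list and then copies that list in front of `rest`. The commit message does not show the rule body; the sketch below assumes the rewrite targets `unpackAppendCString#` from `GHC.CString`, and only checks that the two forms produce the same string. The function names `viaAppend` and `viaFused` are illustrative.

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

import GHC.CString (unpackAppendCString#, unpackCString#)

-- What the programmer writes: unpack the literal, then append.
viaAppend :: String -> String
viaAppend rest = unpackCString# "Rec {"# ++ rest

-- What a (++)/literal rewrite could produce (assumed target function):
-- unpack the literal directly in front of `rest`, with no intermediate list.
viaFused :: String -> String
viaFused rest = unpackAppendCString# "Rec {"# rest

main :: IO ()
main = do
  putStrLn (viaAppend "field = 1}")
  -- Both forms should agree on every input.
  print (viaAppend "field = 1}" == viaFused "field = 1}")
```

Derived Show instances for records are full of such literal prefixes, which is plausibly why the record from #16577 benefits.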
* rts: Unify stack dirtiness check | Ben Gamari | 2021-10-02 | 2 | -6/+6
  This fixes an inconsistency where one dirtiness check would not mask out the STACK_DIRTY flag, meaning it may also be affected by the STACK_SANE flag.
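The inconsistency is the usual flag-field pitfall: testing the whole field instead of the one bit of interest. A minimal sketch, modelled in Haskell rather than the RTS's C; the flag values and function names below are assumptions for illustration, not taken from the commit.

```haskell
module Main (main) where

import Data.Bits ((.&.), (.|.))
import Data.Word (Word8)

-- Flag values are assumptions for this sketch.
stackDirty, stackSane :: Word8
stackDirty = 1
stackSane  = 64

-- Inconsistent check: looks at the whole field, so unrelated flags leak in.
isDirtyUnmasked :: Word8 -> Bool
isDirtyUnmasked flags = flags /= 0

-- Unified check: keep only the STACK_DIRTY bit before testing.
isDirtyMasked :: Word8 -> Bool
isDirtyMasked flags = (flags .&. stackDirty) /= 0

main :: IO ()
main = do
  let cleanButSane = stackSane                -- sane flag set, dirty flag clear
      dirtyAndSane = stackDirty .|. stackSane
  print (isDirtyUnmasked cleanButSane, isDirtyMasked cleanButSane)
  print (isDirtyUnmasked dirtyAndSane, isDirtyMasked dirtyAndSane)
```

For a clean stack that carries the sane flag, the unmasked check reports dirty while the masked one does not; the two agree once the dirty bit is actually set.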
* rts: Add missing write barriers in MVar wake-up paths | Ben Gamari | 2021-10-02 | 2 | -12/+37
  Previously PerformPut failed to respect the non-moving collector's snapshot invariant, hiding references to an MVar and its new value by overwriting a stack frame without dirtying the stack. Fix this.
  PerformTake exhibited a similar bug, failing to dirty (and therefore mark) the blocked stack before mutating it.
  Closes #20399.
* Use eqType, not tcEqType, in metavar kind check | Richard Eisenberg | 2021-10-02 | 8 | -41/+61
  Close #20356.
  See addendum to Note [coreView vs tcView] in GHC.Core.Type for the details.
  Also killed old Note about metaTyVarUpdateOK, which has been gone for some time.
  test case: typecheck/should_fail/T20356
* CmmToLlvm: Sign/Zero extend parameters for foreign calls on RISC-V | Andreas Schwab | 2021-10-02 | 1 | -0/+1
  Like S390 and PPC64, RISC-V requires parameters for foreign calls to be extended to full words.
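For readers unfamiliar with the two kinds of extension, here is a small sketch of what widening a narrow argument to a full word means. It assumes a 64-bit word and uses 8-bit inputs purely for illustration; `signExtend8` and `zeroExtend8` are hypothetical helper names, not part of the code generator.

```haskell
module Main (main) where

import Data.Int (Int64, Int8)
import Data.Word (Word64, Word8)
import Numeric (showHex)

-- Sign extension: the top bit of the narrow signed value fills the upper bits.
signExtend8 :: Int8 -> Word64
signExtend8 x = fromIntegral (fromIntegral x :: Int64)

-- Zero extension: the upper bits of the full word are filled with zeros.
zeroExtend8 :: Word8 -> Word64
zeroExtend8 = fromIntegral

main :: IO ()
main = do
  -- The same bit pattern 0xff widens differently depending on signedness.
  putStrLn (showHex (signExtend8 (-1)) "")
  putStrLn (showHex (zeroExtend8 0xff) "")
```

If the caller skipped this step, the callee could read stale upper bits of the register, which is the kind of failure the commit guards against.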
* Bump terminfo submodule to 0.4.1.5 | Ben Gamari | 2021-10-02 | 1 | -0/+0
  Closes #20307.
* gitlab-ci: Bump docker images | Ben Gamari | 2021-10-02 | 1 | -1/+1
  To install libncurses-dev on Debian targets.
* ci: Unset CI_* variables before run_hadrian and test_make | Matthew Pickering | 2021-10-01 | 1 | -12/+16
  The goal here is to somewhat sanitize the environment so that performance tests don't fluctuate as much as they have been doing. In particular, the length of the commit message was causing benchmarks to increase because GitLab stored the whole commit message twice in environment variables. Therefore, when we used `getEnvironment`, it would cause more allocation because more strings would be created.
  See #20431
  -------------------------
  Metric Decrease: T10421 T13035 T18140 T18923 T9198 T12234 T12425
  -------------------------
* code gen: Improve efficiency of findPrefRealReg | Matthew Pickering | 2021-10-01 | 4 | -19/+54
  Old strategy: For each variable, linearly scan through all the blocks and check whether the variable is in any of the block register mappings. This is very slow when you have a lot of blocks.
  New strategy: Maintain a map from virtual registers to the first real register the virtual register was assigned to. Consult this map in findPrefRealReg. The map is updated when the register mapping is updated and is hidden behind the BlockAssignment abstraction.
  On the mmark package this reduces compilation time from about 44s to 32s.
  Ticket: #19471
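The two strategies can be contrasted with a small model. This is a sketch only: the types are hypothetical `Int` stand-ins for GHC's register and block types, `Assignment`, `updateAssignment`, `findPrefSlow` and `findPrefFast` are illustrative names, and it assumes the `containers` package that ships with GHC.

```haskell
module Main (main) where

import qualified Data.Map.Strict as Map

-- Hypothetical stand-ins for GHC's register and block types.
type VirtualReg = Int
type RealReg    = Int
type BlockId    = Int

type RegMap = Map.Map VirtualReg RealReg

-- Old strategy: scan every block's mapping until the register turns up.
findPrefSlow :: Map.Map BlockId RegMap -> VirtualReg -> Maybe RealReg
findPrefSlow blocks vreg =
  case [r | m <- Map.elems blocks, Just r <- [Map.lookup vreg m]] of
    (r : _) -> Just r
    []      -> Nothing

-- New strategy: the per-block mappings plus a side map recording the first
-- real register each virtual register was ever assigned to.
data Assignment = Assignment
  { blockMaps :: Map.Map BlockId RegMap
  , firstUsed :: RegMap
  }

emptyAssignment :: Assignment
emptyAssignment = Assignment Map.empty Map.empty

-- Updating a block's mapping also updates the side map. Map.union is
-- left-biased, so entries already recorded in firstUsed are kept.
updateAssignment :: BlockId -> RegMap -> Assignment -> Assignment
updateAssignment bid regs (Assignment bm fu) =
  Assignment (Map.insert bid regs bm) (Map.union fu regs)

-- A single map lookup instead of a scan over all blocks.
findPrefFast :: Assignment -> VirtualReg -> Maybe RealReg
findPrefFast asg vreg = Map.lookup vreg (firstUsed asg)

main :: IO ()
main = do
  let asg = updateAssignment 2 (Map.fromList [(10, 5)])
          $ updateAssignment 1 (Map.fromList [(10, 3), (11, 4)]) emptyAssignment
  print (findPrefFast asg 10, findPrefFast asg 11)
  print (findPrefSlow (blockMaps asg) 10)
```

Keeping the side map behind the same update function is what lets callers stay unaware of it, which matches the commit's remark that the map is hidden behind an abstraction.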
* Convert Diagnostics GHC.Tc.Gen.* (Part 3) | Aaron Allen | 2021-10-01 | 3 | -107/+266
  Converts all diagnostics in the `GHC.Tc.Gen.Expr` module. (#20116)
* NCG: Linear-reg-alloc: A few small implementation tweaks. | Andreas Klebinger | 2021-09-30 | 3 | -25/+40
  Removed an intermediate list via a fold.
  realRegsAlias: Manually inlined the list functions to get better code.
  Linear.hs added a bang somewhere.
* Recompilation: Handle -plugin-package correctly | Matthew Pickering | 2021-09-30 | 4 | -5/+25
  If a plugin was specified using the -plugin-package-(id) flag then the module it applied to was always recompiled.
  The recompilation checker was previously using `findImportedModule`, which looked for packages in the HPT and then in the package database but only for modules specified using `-package`.
  The correct lookup function for plugins is `findPluginModule`, therefore we check normal imports with `findImportedModule` and plugins with `findPluginModule`.
  Fixes #20417
* driver: Fix -E -XCPP, copy output from CPP output rather than .hs output | Matthew Pickering | 2021-09-30 | 1 | -2/+2
  Fixes #20416
  I thought about adding a test for this case but I struggled to think of something robust. Grepping -v3 will include different paths on different systems and the structure of the result file depends on which preprocessor you are using.
* Rules for sized conversion primops (#19769) | Sylvain Henry | 2021-09-30 | 11 | -28/+337
  Metric Decrease: T12545
* ghc-boot: Eliminate unnecessary use of getEnvironment | Ben Gamari | 2021-09-30 | 1 | -2/+2
  Previously we were using `System.Environment.getEnvironment`, which decodes all environment variables into Haskell `String`s, where a simple environment lookup would do. This made the compiler's allocations unnecessarily dependent on the environment.
  Fixes #20431.
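The difference between the two approaches, sketched with functions from `System.Environment` in base. The variable name `GHC_SKETCH_VAR` and the helper names are hypothetical; the commit does not say which variable ghc-boot looks up or which function replaced `getEnvironment`.

```haskell
module Main (main) where

import System.Environment (getEnvironment, lookupEnv, setEnv)

-- Decodes every variable into a (String, String) pair, then searches the
-- list. Allocation grows with the size of the whole environment.
viaGetEnvironment :: String -> IO (Maybe String)
viaGetEnvironment name = lookup name <$> getEnvironment

-- Asks for a single variable, so only the requested value is decoded.
viaLookupEnv :: String -> IO (Maybe String)
viaLookupEnv = lookupEnv

main :: IO ()
main = do
  setEnv "GHC_SKETCH_VAR" "hello"   -- hypothetical variable, set for the demo
  a <- viaGetEnvironment "GHC_SKETCH_VAR"
  b <- viaLookupEnv "GHC_SKETCH_VAR"
  print (a == b, b)
```

Both return the same answer; only the first one makes allocation figures depend on unrelated variables, such as a long commit message exported by CI.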
* Trees That Grow refactor for HsTick and HsBinTick | Andrea Condoluci | 2021-09-30 | 12 | -86/+74
  Move HsTick and HsBinTick to XExpr, the extension tree of HsExpr.
  Part of #16830.
* Nested CPR light unleashed (#18174) | Sebastian Graf | 2021-09-30 | 35 | -443/+1631
  This patch enables worker/wrapper for nested constructed products, as described in `Note [Nested CPR]`. The machinery for expressing Nested CPR was already there, since !5054. Worker/wrapper is equipped to exploit Nested CPR annotations since !5338. CPR analysis already handles applications in batches since !5753. This patch just needs to flip a few more switches:
  1. In `cprTransformDataConWork`, we need to look at the field expressions and their `CprType`s to see whether the evaluation of the expressions terminates quickly (= is in HNF) or if they are put in strict fields. If that is the case, then we retain their CPR info and may unbox nestedly later on. More details in `Note [Nested CPR]`.
  2. Enable nested `ConCPR` signatures in `GHC.Types.Cpr`.
  3. In the `asConCpr` call in `GHC.Core.Opt.WorkWrap.Utils`, pass CPR info of fields to the `Unbox`.
  4. Instead of giving CPR signatures to DataCon workers and wrappers, we now have `cprTransformDataConWork` for workers and treat wrappers by analysing their unfolding. As a result, the code from GHC.Types.Id.Make went away completely.
  5. I deactivated worker/wrappering for recursive DataCons and wrote a function `isRecDataCon` to detect them. We really don't want to give `repeat` or `replicate` the Nested CPR property. See Note [CPR for recursive data structures] for which kind of recursive DataCons we target.
  6. Fix a couple of tests and their outputs.
  I also documented that CPR can destroy sharing and lead to asymptotic increase in allocations (which is tracked by #13331/#19326) in `Note [CPR for data structures can destroy sharing]`.
  Nofib results:
  ```
  --------------------------------------------------------------------------------
  Program Allocs Instrs
  --------------------------------------------------------------------------------
  ben-raytrace -3.1% -0.4%
  binary-trees +0.8% -2.9%
  digits-of-e2 +5.8% +1.2%
  event +0.8% -2.1%
  fannkuch-redux +0.0% -1.4%
  fish 0.0% -1.5%
  gamteb -1.4% -0.3%
  mkhprog +1.4% +0.8%
  multiplier +0.0% -1.9%
  pic -0.6% -0.1%
  reptile -20.9% -17.8%
  wave4main +4.8% +0.4%
  x2n1 -100.0% -7.6%
  --------------------------------------------------------------------------------
  Min -95.0% -17.8%
  Max +5.8% +1.2%
  Geometric Mean -2.9% -0.4%
  ```
  The huge wins in x2n1 (loopy list) and reptile (see #19970) are due to refraining from unboxing (:). Other benchmarks like digits-of-e2 or wave4main regress because of that. Ultimately there are no great improvements due to Nested CPR alone, but at least it's a win. Binary sizes decrease by 0.6%.
  There are a significant number of metric decreases. The most notable ones (>1%):
  ```
  ManyAlternatives(normal) ghc/alloc 771656002.7 762187472.0 -1.2%
  ManyConstructors(normal) ghc/alloc 4191073418.7 4114369216.0 -1.8%
  MultiLayerModules(normal) ghc/alloc 3095678333.3 3128720704.0 +1.1%
  PmSeriesG(normal) ghc/alloc 50096429.3 51495664.0 +2.8%
  PmSeriesS(normal) ghc/alloc 63512989.3 64681600.0 +1.8%
  PmSeriesV(normal) ghc/alloc 62575424.0 63767208.0 +1.9%
  T10547(normal) ghc/alloc 29347469.3 29944240.0 +2.0%
  T11303b(normal) ghc/alloc 46018752.0 47367576.0 +2.9%
  T12150(optasm) ghc/alloc 81660890.7 82547696.0 +1.1%
  T12234(optasm) ghc/alloc 59451253.3 60357952.0 +1.5%
  T12545(normal) ghc/alloc 1705216250.7 1751278952.0 +2.7%
  T12707(normal) ghc/alloc 981000472.0 968489800.0 -1.3% GOOD
  T13056(optasm) ghc/alloc 389322664.0 372495160.0 -4.3% GOOD
  T13253(normal) ghc/alloc 337174229.3 341954576.0 +1.4%
  T13701(normal) ghc/alloc 2381455173.3 2439790328.0 +2.4% BAD
  T14052(ghci) ghc/alloc 2162530642.7 2139108784.0 -1.1%
  T14683(normal) ghc/alloc 3049744728.0 2977535064.0 -2.4% GOOD
  T14697(normal) ghc/alloc 362980213.3 369304512.0 +1.7%
  T15164(normal) ghc/alloc 1323102752.0 1307480600.0 -1.2%
  T15304(normal) ghc/alloc 1304607429.3 1291024568.0 -1.0%
  T16190(normal) ghc/alloc 281450410.7 284878048.0 +1.2%
  T16577(normal) ghc/alloc 7984960789.3 7811668768.0 -2.2% GOOD
  T17516(normal) ghc/alloc 1171051192.0 1153649664.0 -1.5%
  T17836(normal) ghc/alloc 1115569746.7 1098197592.0 -1.6%
  T17836b(normal) ghc/alloc 54322597.3 55518216.0 +2.2%
  T17977(normal) ghc/alloc 47071754.7 48403408.0 +2.8%
  T17977b(normal) ghc/alloc 42579133.3 43977392.0 +3.3%
  T18923(normal) ghc/alloc 71764237.3 72566240.0 +1.1%
  T1969(normal) ghc/alloc 784821002.7 773971776.0 -1.4% GOOD
  T3294(normal) ghc/alloc 1634913973.3 1614323584.0 -1.3% GOOD
  T4801(normal) ghc/alloc 295619648.0 292776440.0 -1.0%
  T5321FD(normal) ghc/alloc 278827858.7 276067280.0 -1.0%
  T5631(normal) ghc/alloc 586618202.7 577579960.0 -1.5%
  T5642(normal) ghc/alloc 494923048.0 487927208.0 -1.4%
  T5837(normal) ghc/alloc 37758061.3 39261608.0 +4.0%
  T9020(optasm) ghc/alloc 257362077.3 254672416.0 -1.0%
  T9198(normal) ghc/alloc 49313365.3 50603936.0 +2.6% BAD
  T9233(normal) ghc/alloc 704944258.7 685692712.0 -2.7% GOOD
  T9630(normal) ghc/alloc 1476621560.0 1455192784.0 -1.5%
  T9675(optasm) ghc/alloc 443183173.3 433859696.0 -2.1% GOOD
  T9872a(normal) ghc/alloc 1720926653.3 1693190072.0 -1.6% GOOD
  T9872b(normal) ghc/alloc 2185618061.3 2162277568.0 -1.1% GOOD
  T9872c(normal) ghc/alloc 1765842405.3 1733618088.0 -1.8% GOOD
  TcPlugin_RewritePerf(normal) ghc/alloc 2388882730.7 2365504696.0 -1.0%
  WWRec(normal) ghc/alloc 607073186.7 597512216.0 -1.6%
  T9203(normal) run/alloc 107284064.0 102881832.0 -4.1%
  haddock.Cabal(normal) run/alloc 24025329589.3 23768382560.0 -1.1%
  haddock.base(normal) run/alloc 25660521653.3 25370321824.0 -1.1%
  haddock.compiler(normal) run/alloc 74064171706.7 73358712280.0 -1.0%
  ```
  The biggest exception to the rule is T13701 which seems to fluctuate as usual (not unlike T12545). T14697 has a similar quality, being a generated multi-module test. T5837 is small enough that it similarly doesn't measure anything significant besides module loading overhead. T13253 simply does one additional round of Simplification due to Nested CPR.
  There are also some apparent regressions in T9198, T12234 and PmSeriesG that we (@mpickering and I) were simply unable to reproduce locally. @mpickering tried to run the CI script in a local Docker container and actually found that T9198 and PmSeriesG *improved*. In MRs that were rebased on top of this one, like !4229, I did not experience such increases. Let's not get hung up on these regression tests; they were meant to test for asymptotic regressions.
  The build-cabal test improves by 1.2% in -O0.
  Metric Increase: T10421 T12234 T12545 T13035 T13056 T13701 T14697 T18923 T5837 T9198
  Metric Decrease: ManyConstructors T12545 T12707 T13056 T14683 T16577 T18223 T1969 T3294 T9203 T9233 T9675 T9872a T9872b T9872c T9961 TcPlugin_RewritePerf
* testsuite: Make cabal01 more robust to large environments | Matthew Pickering | 2021-09-30 | 1 | -2/+8
  Sebastian unfortunately wrote a very long commit message in !5667 which caused `xargs` to fail on Windows because the environment was too big. Fortunately `xargs` and `rm` don't need anything from the environment, so just run those commands in an empty environment (which is what `env -i` achieves).
* compiler: occEnvElts -> nonDetOccEnvElts | Ben Gamari | 2021-09-29 | 6 | -10/+10
* compiler: Use seqEltsNameEnv rather than nameEnvElts | Ben Gamari | 2021-09-29 | 2 | -1/+4
* compiler: Rename nameEnvElts -> nonDetNameEnvElts | Ben Gamari | 2021-09-29 | 10 | -13/+13
* compiler: Make nubAvails deterministic | Ben Gamari | 2021-09-29 | 2 | -5/+12
  Surprisingly, this previously didn't appear to introduce any visible non-determinism, but it seems worth avoiding non-determinism here.
* compiler: Fix name of GHC.Core.TyCon.Env.nameEnvElts | Ben Gamari | 2021-09-29 | 1 | -3/+3
  Rename to nonDetTyConEnvElts.
* compiler: Rewrite all eltsUFM occurrences to nonDetEltsUFM | Ben Gamari | 2021-09-29 | 8 | -11/+8
  And remove the former.
* compiler: Reimplement seqEltsUFM in terms of fold | Ben Gamari | 2021-09-29 | 2 | -3/+3
  Rather than nonDetEltsUFM; this should eliminate some unnecessary list allocations.
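The idea can be sketched over a plain `IntMap`, used here as a stand-in for GHC's UniqFM (an assumption of this example; it relies on the `containers` package that ships with GHC). The function names are illustrative, not the ones in the compiler.

```haskell
module Main (main) where

import qualified Data.IntMap as IntMap

-- List-based version: builds the list of elements, then forces each one.
seqEltsViaList :: (a -> ()) -> IntMap.IntMap a -> ()
seqEltsViaList seqElt m = go (IntMap.elems m)
  where
    go []       = ()
    go (x : xs) = seqElt x `seq` go xs

-- Fold-based version: forces each element while walking the map directly,
-- so no intermediate list is needed.
seqEltsViaFold :: (a -> ()) -> IntMap.IntMap a -> ()
seqEltsViaFold seqElt = IntMap.foldr (\x rest -> seqElt x `seq` rest) ()

-- Example element forcer: evaluates the full spine of a String.
forceString :: String -> ()
forceString s = length s `seq` ()

main :: IO ()
main = do
  let m = IntMap.fromList [(1, "a"), (2, "bc")]
  print (seqEltsViaList forceString m, seqEltsViaFold forceString m)
```

Because the result is `()` either way, the order in which elements are visited cannot be observed, which is why a fold is a safe replacement for the non-deterministic element list here.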
* GHC: Drop dead packageDbModules | Ben Gamari | 2021-09-29 | 1 | -24/+0
  It was already commented out and contained a reference to the non-deterministic nameEnvElts, so let's just drop it.
* Add tests for T17820 | Andrea Condoluci | 2021-09-29 | 11 | -0/+69
* TH stage restriction check for constructors, selectors, and class methods | Andrea Condoluci | 2021-09-29 | 8 | -31/+68
  Closes ticket #17820.
* Document that `eqType`/`coreView` do not look through type families | Ziyang Liu | 2021-09-29 | 1 | -2/+4
  This isn't clear from the existing doc.
* Rectifying COMMENT and `mkComment` across platforms to work with SDoc | Benjamin Maurer | 2021-09-29 | 12 | -25/+27
  and exhibit similar behaviors.
  Issue 20400
* Add a regression test for #17912 | Kamil Dworakowski | 2021-09-29 | 3 | -0/+36
* Document interaction between unsafe FFI and GC | Alexander Kjeldaas | 2021-09-29 | 1 | -5/+23
  In the multi-threaded RTS this can lead to hard-to-debug performance issues.
* Fix comment typos | Kirill Zaborsky | 2021-09-29 | 1 | -2/+2
* Remove special case for large objects in allocateForCompact | Fabian Thorand | 2021-09-29 | 5 | -11/+47
  allocateForCompact() is called when the current allocation for the compact region does not fit in the nursery. It previously had a special case for objects exceeding the large object threshold. In that case, it would allocate a new compact region block just for that object. That led to a lot of small blocks being allocated in compact regions with a larger default block size (`autoBlockW`).
  This commit removes this special case because having a lot of small compact region blocks contributes significantly to memory fragmentation.
  The removal should be valid because
  - a more generic case for allocating a new compact region block follows at the end of allocateForCompact(), and that one takes `autoBlockW` into account
  - the reason for allocating separate blocks for large objects in the main heap seems to be to avoid copying during GCs, but once inside the compact region, the object will never be copied anyway.
  Fixes #18757.
  A regression test T18757 was added.
* Compare FunTys as if they were TyConApps. | Richard Eisenberg | 2021-09-29 | 12 | -72/+215
  See Note [Equality on FunTys] in TyCoRep.
  Close #17675.
  Close #17655, about documentation improvements included in this patch.
  Close #19677, about a further mistake around FunTy.
  test cases: typecheck/should_compile/T19677
* Documented yet undocumented dump flags #18641 | Benjamin Maurer | 2021-09-28 | 2 | -6/+38
* Remove outdated note about pragma layout | taylorfausak | 2021-09-28 | 1 | -3/+1
* hadrian: Update comments on verbosity handling | Matthew Pickering | 2021-09-28 | 2 | -2/+14
* hadrian: Update documentation for new verbosity options | Matthew Pickering | 2021-09-28 | 2 | -37/+5
* ci: Increase default verbosity level to `-V` (Verbose) | Matthew Pickering | 2021-09-28 | 1 | -0/+1
  Given the previous commit, `-V` allows us to see some useful information in CI (such as the call stack on failure) which normally people don't want to see. As a result the $VERBOSE variable now tweaks the diagnostic level one level higher (to Diagnostic), which produces a lot of output.
* hadrian: Rework the verbosity levels | Matthew Pickering | 2021-09-28 | 6 | -26/+27
  Before, we really only had two verbosity levels, normal and verbose. There are now three levels:
  Normal: Commands show stderr (no stdout) and minimal build failure messages.
  Verbose (-V): Commands also show stdout; the build failure message contains the call stack and additional information.
  Diagnostic (-VV): Very verbose output showing all command lines and passing -v3 to cabal commands.
  -V is similar to the default verbosity from before (but a little more verbose).
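The three levels form a simple ordering in which each level shows everything the previous one does. A hypothetical model of that ordering follows; the type and function names are invented for illustration and are not Hadrian's or Shake's actual definitions.

```haskell
module Main (main) where

-- Hypothetical model of the three levels described above.
data Level = Normal | Verbose | Diagnostic
  deriving (Eq, Ord, Show)

-- Map the number of V's passed (none, -V, -VV) to a level.
-- Treating extra V's as Diagnostic is an assumption of this sketch.
levelFromFlags :: Int -> Level
levelFromFlags n
  | n <= 0    = Normal
  | n == 1    = Verbose
  | otherwise = Diagnostic

-- What each level reveals, following the commit message.
showsStdout, showsCallStack, showsCommandLines :: Level -> Bool
showsStdout       = (>= Verbose)
showsCallStack    = (>= Verbose)
showsCommandLines = (>= Diagnostic)

main :: IO ()
main = mapM_ report [0, 1, 2]
  where
    report n =
      let l = levelFromFlags n
      in print (l, showsStdout l, showsCallStack l, showsCommandLines l)
```

Deriving `Ord` on the constructors in increasing order of verbosity lets every "is this shown?" question be a single comparison.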
* hadrian: Remove deprecated tracing functions | Matthew Pickering | 2021-09-28 | 14 | -27/+27
* hadrian: Reduce default verbosity | Matthew Pickering | 2021-09-28 | 3 | -3/+28
  This change reduces the default verbosity of error messages to omit the stack trace information from the printed output.
  For example, before all errors would have a long call trace:
  ```
  Error when running Shake build system:
    at action, called at src/Rules.hs:39:19 in main:Rules
    at need, called at src/Rules.hs:61:5 in main:Rules
  * Depends on: _build/stage1/lib/package.conf.d/ghc-9.3.conf
  * Depends on: _build/stage1/compiler/build/libHSghc-9.3.a
  * Depends on: _build/stage1/compiler/build/GHC/Tc/Solver/Rewrite.o
  * Depends on: _build/stage1/compiler/build/GHC/Tc/Solver/Rewrite.o _build/stage1/compiler/build/GHC/Tc/Solver/Rewrite.hi
    at cmd', called at src/Builder.hs:330:23 in main:Builder
    at cmd, called at src/Builder.hs:432:8 in main:Builder
  * Raised the exception:
  ```
  This can be useful, but it is confusing for GHC developers, as opposed to Hadrian developers.
  Ticket #20386
* Add `-dsuppress-core-sizes` flag (#20342) | Sylvain Henry | 2021-09-28 | 25 | -8/+201
  This flag is used to remove the output of core stats per binding in Core dumps.