summaryrefslogtreecommitdiff
path: root/testsuite/tests/perf/compiler
Commit message (Collapse)AuthorAgeFilesLines
* Add performance test for ghci, -fno-code and reloading (#20509)Matthew Pickering2021-10-193-0/+48
| | | | | | This test triggers the bad code path identified by #20509 where an entry into the EPS caused by importing Control.Applicative will retain a stale HomePackageTable.
* Add GHCi recompilation performance testMatthew Pickering2021-10-123-0/+44
|
* Nested CPR light unleashed (#18174)Sebastian Graf2021-09-301-0/+160
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch enables worker/wrapper for nested constructed products, as described in `Note [Nested CPR]`. The machinery for expressing Nested CPR was already there, since !5054. Worker/wrapper is equipped to exploit Nested CPR annotations since !5338. CPR analysis already handles applications in batches since !5753. This patch just needs to flip a few more switches: 1. In `cprTransformDataConWork`, we need to look at the field expressions and their `CprType`s to see whether the evaluation of the expressions terminates quickly (= is in HNF) or if they are put in strict fields. If that is the case, then we retain their CPR info and may unbox nestedly later on. More details in `Note [Nested CPR]`. 2. Enable nested `ConCPR` signatures in `GHC.Types.Cpr`. 3. In the `asConCpr` call in `GHC.Core.Opt.WorkWrap.Utils`, pass CPR info of fields to the `Unbox`. 4. Instead of giving CPR signatures to DataCon workers and wrappers, we now have `cprTransformDataConWork` for workers and treat wrappers by analysing their unfolding. As a result, the code from GHC.Types.Id.Make went away completely. 5. I deactivated worker/wrappering for recursive DataCons and wrote a function `isRecDataCon` to detect them. We really don't want to give `repeat` or `replicate` the Nested CPR property. See Note [CPR for recursive data structures] for which kind of recursive DataCons we target. 6. Fix a couple of tests and their outputs. I also documented that CPR can destroy sharing and lead to asymptotic increase in allocations (which is tracked by #13331/#19326) in `Note [CPR for data structures can destroy sharing]`. Nofib results: ``` -------------------------------------------------------------------------------- Program Allocs Instrs -------------------------------------------------------------------------------- ben-raytrace -3.1% -0.4% binary-trees +0.8% -2.9% digits-of-e2 +5.8% +1.2% event +0.8% -2.1% fannkuch-redux +0.0% -1.4% fish 0.0% -1.5% gamteb -1.4% -0.3% mkhprog +1.4% +0.8% multiplier +0.0% -1.9% pic -0.6% -0.1% reptile -20.9% -17.8% wave4main +4.8% +0.4% x2n1 -100.0% -7.6% -------------------------------------------------------------------------------- Min -95.0% -17.8% Max +5.8% +1.2% Geometric Mean -2.9% -0.4% ``` The huge wins in x2n1 (loopy list) and reptile (see #19970) are due to refraining from unboxing (:). Other benchmarks like digits-of-e2 or wave4main regress because of that. Ultimately there are no great improvements due to Nested CPR alone, but at least it's a win. Binary sizes decrease by 0.6%. There are a significant number of metric decreases. The most notable ones (>1%): ``` ManyAlternatives(normal) ghc/alloc 771656002.7 762187472.0 -1.2% ManyConstructors(normal) ghc/alloc 4191073418.7 4114369216.0 -1.8% MultiLayerModules(normal) ghc/alloc 3095678333.3 3128720704.0 +1.1% PmSeriesG(normal) ghc/alloc 50096429.3 51495664.0 +2.8% PmSeriesS(normal) ghc/alloc 63512989.3 64681600.0 +1.8% PmSeriesV(normal) ghc/alloc 62575424.0 63767208.0 +1.9% T10547(normal) ghc/alloc 29347469.3 29944240.0 +2.0% T11303b(normal) ghc/alloc 46018752.0 47367576.0 +2.9% T12150(optasm) ghc/alloc 81660890.7 82547696.0 +1.1% T12234(optasm) ghc/alloc 59451253.3 60357952.0 +1.5% T12545(normal) ghc/alloc 1705216250.7 1751278952.0 +2.7% T12707(normal) ghc/alloc 981000472.0 968489800.0 -1.3% GOOD T13056(optasm) ghc/alloc 389322664.0 372495160.0 -4.3% GOOD T13253(normal) ghc/alloc 337174229.3 341954576.0 +1.4% T13701(normal) ghc/alloc 2381455173.3 2439790328.0 +2.4% BAD T14052(ghci) ghc/alloc 2162530642.7 2139108784.0 -1.1% T14683(normal) ghc/alloc 3049744728.0 2977535064.0 -2.4% GOOD T14697(normal) ghc/alloc 362980213.3 369304512.0 +1.7% T15164(normal) ghc/alloc 1323102752.0 1307480600.0 -1.2% T15304(normal) ghc/alloc 1304607429.3 1291024568.0 -1.0% T16190(normal) ghc/alloc 281450410.7 284878048.0 +1.2% T16577(normal) ghc/alloc 7984960789.3 7811668768.0 -2.2% GOOD T17516(normal) ghc/alloc 1171051192.0 1153649664.0 -1.5% T17836(normal) ghc/alloc 1115569746.7 1098197592.0 -1.6% T17836b(normal) ghc/alloc 54322597.3 55518216.0 +2.2% T17977(normal) ghc/alloc 47071754.7 48403408.0 +2.8% T17977b(normal) ghc/alloc 42579133.3 43977392.0 +3.3% T18923(normal) ghc/alloc 71764237.3 72566240.0 +1.1% T1969(normal) ghc/alloc 784821002.7 773971776.0 -1.4% GOOD T3294(normal) ghc/alloc 1634913973.3 1614323584.0 -1.3% GOOD T4801(normal) ghc/alloc 295619648.0 292776440.0 -1.0% T5321FD(normal) ghc/alloc 278827858.7 276067280.0 -1.0% T5631(normal) ghc/alloc 586618202.7 577579960.0 -1.5% T5642(normal) ghc/alloc 494923048.0 487927208.0 -1.4% T5837(normal) ghc/alloc 37758061.3 39261608.0 +4.0% T9020(optasm) ghc/alloc 257362077.3 254672416.0 -1.0% T9198(normal) ghc/alloc 49313365.3 50603936.0 +2.6% BAD T9233(normal) ghc/alloc 704944258.7 685692712.0 -2.7% GOOD T9630(normal) ghc/alloc 1476621560.0 1455192784.0 -1.5% T9675(optasm) ghc/alloc 443183173.3 433859696.0 -2.1% GOOD T9872a(normal) ghc/alloc 1720926653.3 1693190072.0 -1.6% GOOD T9872b(normal) ghc/alloc 2185618061.3 2162277568.0 -1.1% GOOD T9872c(normal) ghc/alloc 1765842405.3 1733618088.0 -1.8% GOOD TcPlugin_RewritePerf(normal) ghc/alloc 2388882730.7 2365504696.0 -1.0% WWRec(normal) ghc/alloc 607073186.7 597512216.0 -1.6% T9203(normal) run/alloc 107284064.0 102881832.0 -4.1% haddock.Cabal(normal) run/alloc 24025329589.3 23768382560.0 -1.1% haddock.base(normal) run/alloc 25660521653.3 25370321824.0 -1.1% haddock.compiler(normal) run/alloc 74064171706.7 73358712280.0 -1.0% ``` The biggest exception to the rule is T13701 which seems to fluctuate as usual (not unlike T12545). T14697 has a similar quality, being a generated multi-module test. T5837 is small enough that it similarly doesn't measure anything significant besides module loading overhead. T13253 simply does one additional round of Simplification due to Nested CPR. There are also some apparent regressions in T9198, T12234 and PmSeriesG that we (@mpickering and I) were simply unable to reproduce locally. @mpickering tried to run the CI script in a local Docker container and actually found that T9198 and PmSeriesG *improved*. In MRs that were rebased on top this one, like !4229, I did not experience such increases. Let's not get hung up on these regression tests, they were meant to test for asymptotic regressions. The build-cabal test improves by 1.2% in -O0. Metric Increase: T10421 T12234 T12545 T13035 T13056 T13701 T14697 T18923 T5837 T9198 Metric Decrease: ManyConstructors T12545 T12707 T13056 T14683 T16577 T18223 T1969 T3294 T9203 T9233 T9675 T9872a T9872b T9872c T9961 TcPlugin_RewritePerf
* Add performance test for #19695nineonine2021-09-112-0/+141
|
* Pmc: Better SCC annotations and trace outputSebastian Graf2021-08-231-1/+1
| | | | | | | | | | | | | While investigating #20106, I made a few refactorings to the pattern-match checker that I don't want to lose. Here are the changes: * Some key functions of the checker now have SCC annotations * Better `-ddump-ec-trace` diagnostics for easier debugging. I added 'traceWhenFailPm' to see *why* a particular `MaybeT` computation fails and made use of it in `instCon`. I also increased the acceptance threshold of T11545, which seems to fail randomly lately due to ghc/max flukes.
* Better sharing of join points (#19996)wip/T19557Simon Peyton Jones2021-07-192-0/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch, provoked by regressions in the text package (#19557), improves sharing of join points. This also fixes the terrible behaviour in #20049. See Note [Duplicating join points] in GHC.Core.Opt.Simplify. * In the StrictArg case of mkDupableContWithDmds, don't use Plan A for data constructors * In postInlineUnconditionally, don't inline JoinIds Avoids inlining join $j x = Just x in case blah of A -> $j x1 B -> $j x2 C -> $j x3 * In mkDupableStrictBind and mkDupableStrictAlt, create join points (much) more often: exprIsTrivial rather than exprIsDupable. This may be much, but we'll see. Metric Decrease: T12545 T13253-spj T13719 T18140 T18282 T18304 T18698a T18698b Metric Increase: T16577 T18923 T9961
* testsuite: Widen acceptance window of T12545 (#19414)Sebastian Graf2021-06-272-4/+44
| | | | | | | | | | | | | | | | | | In a sequel of #19414, I wrote a script that measures min and max allocation bounds of T12545 based on randomly modifying -dunique-increment. I got a spread of as much as 4.8%. But instead of widening the acceptance window further (to 5%), I committed the script as part of this commit, so that false positive increases can easily be diagnosed by comparing min and max bounds to HEAD. Indeed, for !5814 we have seen T12545 go from -0.3% to 3.3% after a rebase. I made sure that the min and max bounds actually stayed the same. In the future, this kind of check can very easily be done in a matter of a minute. Maybe we should increase the acceptance threshold if we need to check often (leave a comment on #19414 if you had to check), but I've not been bitten by it for half a year, which seems OK. Metric Increase: T12545
* Driver Rework PatchMatthew Pickering2021-06-032-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch comprises of four different but closely related ideas. The net result is fixing a large number of open issues with the driver whilst making it simpler to understand. 1. Use the hash of the source file to determine whether the source file has changed or not. This makes the recompilation checking more robust to modern build systems which are liable to copy files around changing their modification times. 2. Remove the concept of a "stable module", a stable module was one where the object file was older than the source file, and all transitive dependencies were also stable. Now we don't rely on the modification time of the source file, the notion of stability is moot. 3. Fix TH/plugin recompilation after the removal of stable modules. The TH recompilation check used to rely on stable modules. Now there is a uniform and simple way, we directly track the linkables which were loaded into the interpreter whilst compiling a module. This is an over-approximation but more robust wrt package dependencies changing. 4. Fix recompilation checking for dynamic object files. Now we actually check if the dynamic object file exists when compiling with -dynamic-too Fixes #19774 #19771 #19758 #17434 #11556 #9121 #8211 #16495 #7277 #16093
* Turn T11545 into a normal performance testMatthew Pickering2021-04-121-2/+2
| | | | | This makes it more robust to people running it with `quick` flavour and so on.
* Add perf test for #15304Matthew Pickering2021-04-093-0/+778
| | | | | The test max memory usage improves dramatically with the fixes to memory usage in demand analyser from #15455
* T11545 now also passes due to modifications in demand analysisMatthew Pickering2021-04-082-2/+2
| | | | Fixes #11545
* Add (expect_broken) test for #11545Matthew Pickering2021-04-052-0/+16
|
* Add compiler perf regression test for #9198Simon Jakobi2021-03-202-0/+16
|
* Use GHC2021 as default languageJoachim Breitner2021-03-107-2/+7
|
* Widen acceptance window of `MultiLayerModules` (#19293) [skip ci]Sebastian Graf2021-03-011-1/+4
| | | | | | | As #19293 realises, this one keeps on flip flopping by 2.5% depending on how many modules there are within the GHC package. We should revert this once we figured out how to fix what's going on.
* Widen acceptance window of T12545 (#19414)Sebastian Graf2021-02-281-1/+4
| | | | | | | | | This test flip-flops by +-1% in arbitrary changes in CI. While playing around with `-dunique-increment`, I could reproduce variations of 3% in compiler allocations, so I set the acceptance window accordingly. Fixes #19414.
* Revert "testsuite: Mark tests affected by #19025"Ben Gamari2021-02-241-14/+2
| | | | This reverts commit 4a9d856d21c67b3328e26aa68a071ec9a824a7bb.
* testsuite: Mark tests affected byBen Gamari2021-02-231-2/+14
|
* Remove some redundant validity checks.Richard Eisenberg2021-01-271-0/+1
| | | | | | | | | | | | This commit also consolidates documentation in the user manual around UndecidableSuperClasses, UndecidableInstances, and FlexibleContexts. Close #19186. Close #19187. Test case: typecheck/should_compile/T19186, typecheck/should_fail/T19187{,a}
* When deriving Eq always use tag based comparisons for nullary constructorsAndreas Klebinger2021-01-221-1/+1
| | | | | | | Instead of producing auxiliary con2tag bindings we now rely on dataToTag#, eliminating a fair bit of generated code. Co-Authored-By: Ben Gamari <ben@well-typed.com>
* Add regression test for #16577Sylvain Henry2021-01-172-0/+2088
|
* testsuite: Add a test for #18923Ben Gamari2020-12-052-0/+20
|
* Demand: Interleave usage and strictness demands (#18903)Sebastian Graf2020-11-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As outlined in #18903, interleaving usage and strictness demands not only means a more compact demand representation, but also allows us to express demands that we weren't easily able to express before. Call demands are *relative* in the sense that a call demand `Cn(cd)` on `g` says "`g` is called `n` times. *Whenever `g` is called*, the result is used according to `cd`". Example from #18903: ```hs h :: Int -> Int h m = let g :: Int -> (Int,Int) g 1 = (m, 0) g n = (2 * n, 2 `div` n) {-# NOINLINE g #-} in case m of 1 -> 0 2 -> snd (g m) _ -> uncurry (+) (g m) ``` Without the interleaved representation, we would just get `L` for the strictness demand on `g`. Now we are able to express that whenever `g` is called, its second component is used strictly in denoting `g` by `1C1(P(1P(U),SP(U)))`. This would allow Nested CPR to unbox the division, for example. Fixes #18903. While fixing regressions, I also discovered and fixed #18957. Metric Decrease: T13253-spj
* Merge remote-tracking branch 'origin/wip/tsan/all'Ben Gamari2020-11-081-3/+6
|\
| * testsuite: Mark T9872[abc] as high_memory_usageBen Gamari2020-10-241-3/+6
| | | | | | | | These all have a maximum residency of over 2 GB.
* | testsuite: Add performance test for #18698Ben Gamari2020-11-012-0/+100
| |
* | Widen acceptance threshold for T10421aJohn Ericson2020-10-291-1/+1
|/ | | | | | | Progress towards #18842. As @sgraf812 points out, widening the window is dangerous until the exponential described in #17658 is fixed. But this test has caused enough misery and is low stakes enough that we and @bgamari think it's worth it in this one case for the time being.
* Add flags for annotating Generic{,1} methods INLINE[1] (#11068)Andrzej Rybczak2020-10-155-0/+705
| | | | | | | | Makes it possible for GHC to optimize away intermediate Generic representation for more types. Metric Increase: T12227
* Add -Wnoncanonical-{monad,monoid}-instances to standardWarningsFumiaki Kinoshita2020-10-142-2/+0
| | | | | | | | | ------------------------- Metric Decrease: T12425 Metric Increase: T17516 -------------------------
* Testsuite: increase timeout for T18223 (#18795)Sylvain Henry2020-10-091-0/+2
|
* Better eta-expansion (again) and don't specilise DFunsSimon Peyton Jones2020-09-224-102/+91
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes #18223, which made GHC generate an exponential amount of code. There are three quite separate changes in here 1. Re-engineer eta-expansion (again). The eta-expander was generating lots of intermediate stuff, which could be optimised away, but which choked the simplifier meanwhile. Relatively easy to kill it off at source. See Note [The EtaInfo mechanism] in GHC.Core.Opt.Arity. The main new thing is the use of pushCoArg in getArg_maybe. 2. Stop Specialise specalising DFuns. This is the cause of a huge (and utterly unnecessary) blowup in program size in #18223. See Note [Do not specialise DFuns] in GHC.Core.Opt.Specialise. I also refactored the Specialise monad a bit... it was silly, because it passed on unchanging values as if they were mutable state. 3. Do an extra Simplifer run, after SpecConstra and before late-Specialise. I found (investigating perf/compiler/T16473) that failing to do this was crippling *both* SpecConstr *and* Specialise. See Note [Simplify after SpecConstr] in GHC.Core.Opt.Pipeline. This change does mean an extra run of the Simplifier, but only with -O2, and I think that's acceptable. T16473 allocates *three* times less with this change. (I changed it to check runtime rather than compile time.) Some smaller consequences * I moved pushCoercion, pushCoArg and friends from SimpleOpt to Arity, because it was needed by the new etaInfoApp. And pushCoValArg now returns a MCoercion rather than Coercion for the argument Coercion. * A minor, incidental improvement to Core pretty-printing This does fix #18223, (which was otherwise uncompilable. Hooray. But there is still a big intermediate because there are some very deeply nested types in that program. Modest reductions in compile-time allocation on a couple of benchmarks T12425 -2.0% T13253 -10.3% Metric increase with -O2, due to extra simplifier run T9233 +5.8% T12227 +1.8% T15630 +5.0% There is a spurious apparent increase on heap residency on T9630, on some architectures at least. I tried it with -G1 and the residency is essentially unchanged. Metric Increase T9233 T12227 T9630 Metric Decrease T12425 T13253
* testsuite: Specify metrics collected by T17516Ben Gamari2020-08-111-1/+1
| | | | | | | Previously it collected everything, including "max bytes used". This is problematic since the test makes no attempt to control for deviations in GC timing, resulting in high variability. Fix this by only collecting "bytes allocated".
* This patch addresses the exponential blow-up in the simplifier.Simon Peyton Jones2020-07-289-0/+386
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Specifically: #13253 exponential inlining #10421 ditto #18140 strict constructors #18282 another nested-function call case This patch makes one really significant changes: change the way that mkDupableCont handles StrictArg. The details are explained in GHC.Core.Opt.Simplify Note [Duplicating StrictArg]. Specific changes * In mkDupableCont, when making auxiliary bindings for the other arguments of a call, add extra plumbing so that we don't forget the demand on them. Otherwise we haev to wait for another round of strictness analysis. But actually all the info is to hand. This change affects: - Make the strictness list in ArgInfo be [Demand] instead of [Bool], and rename it to ai_dmds. - Add as_dmd to ValArg - Simplify.makeTrivial takes a Demand - mkDupableContWithDmds takes a [Demand] There are a number of other small changes 1. For Ids that are used at most once in each branch of a case, make the occurrence analyser record the total number of syntactic occurrences. Previously we recorded just OneBranch or MultipleBranches. I thought this was going to be useful, but I ended up barely using it; see Note [Note [Suppress exponential blowup] in GHC.Core.Opt.Simplify.Utils Actual changes: * See the occ_n_br field of OneOcc. * postInlineUnconditionally 2. I found a small perf buglet in SetLevels; see the new function GHC.Core.Opt.SetLevels.hasFreeJoin 3. Remove the sc_cci field of StrictArg. I found I could get its information from the sc_fun field instead. Less to get wrong! 4. In ArgInfo, arrange that ai_dmds and ai_discs have a simpler invariant: they line up with the value arguments beyond ai_args This allowed a bit of nice refactoring; see isStrictArgInfo, lazyArgcontext, strictArgContext There is virtually no difference in nofib. (The runtime numbers are bogus -- I tried a few manually.) Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- fft +0.0% -2.0% -48.3% -49.4% 0.0% multiplier +0.0% -2.2% -50.3% -50.9% 0.0% -------------------------------------------------------------------------------- Min -0.4% -2.2% -59.2% -60.4% 0.0% Max +0.0% +0.1% +3.3% +4.9% 0.0% Geometric Mean +0.0% -0.0% -33.2% -34.3% -0.0% Test T18282 is an existing example of these deeply-nested strict calls. We get a big decrease in compile time (-85%) because so much less inlining takes place. Metric Decrease: T18282
* testsuite: Widen acceptance threshold on T5837wip/T18282Ben Gamari2020-07-131-1/+1
| | | | | | This test is positively tiny and consequently the bytes allocated measurement will be relatively noisy. Consequently I have seen this fail spuriously quite often.
* Reduce result discount in conSizeSimon Peyton Jones2020-07-132-0/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ticket #18282 showed that the result discount given by conSize was massively too large. This patch reduces that discount to a constant 10, which just balances the cost of the constructor application itself. Note [Constructor size and result discount] elaborates, as does the ticket #18282. Reducing result discount reduces inlining, which affects perf. I found that I could increase the unfoldingUseThrehold from 80 to 90 in compensation; in combination with the result discount change I get these overall nofib numbers: Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- boyer -0.2% +5.4% -3.2% -3.4% 0.0% cichelli -0.1% +5.9% -11.2% -11.7% 0.0% compress2 -0.2% +9.6% -6.0% -6.8% 0.0% cryptarithm2 -0.1% -3.9% -6.0% -5.7% 0.0% gamteb -0.2% +2.6% -13.8% -14.4% 0.0% genfft -0.1% -1.6% -29.5% -29.9% 0.0% gg -0.0% -2.2% -17.2% -17.8% -20.0% life -0.1% -2.2% -62.3% -63.4% 0.0% mate +0.0% +1.4% -5.1% -5.1% -14.3% parser -0.2% -2.1% +7.4% +6.7% 0.0% primetest -0.2% -12.8% -14.3% -14.2% 0.0% puzzle -0.2% +2.1% -10.0% -10.4% 0.0% rsa -0.2% -11.7% -3.7% -3.8% 0.0% simple -0.2% +2.8% -36.7% -38.3% -2.2% wheel-sieve2 -0.1% -19.2% -48.8% -49.2% -42.9% -------------------------------------------------------------------------------- Min -0.4% -19.2% -62.3% -63.4% -42.9% Max +0.3% +9.6% +7.4% +11.0% +16.7% Geometric Mean -0.1% -0.3% -17.6% -18.0% -0.7% I'm ok with these numbers, remembering that this change removes an *exponential* increase in code size in some in-the-wild cases. I investigated compress2. The difference is entirely caused by this function no longer inlining WriteRoutines.$woutputCodes = \ (w :: [CodeEvent]) -> let result_s1Sr = case WriteRoutines.outputCodes_$s$woutput w 0# 0# 8# 9# of (# ww1, ww2 #) -> (ww1, ww2) in (# case result_s1Sr of (x, _) -> map @Int @Char WriteRoutines.outputCodes1 x , case result_s1Sr of { (_, y) -> y } #) It was right on the cusp before, driven by the excessive result discount. Too bad! Happily, the compiler/perf tests show a number of improvements: T12227 compiler bytes-alloc -6.6% T12545 compiler bytes-alloc -4.7% T13056 compiler bytes-alloc -3.3% T15263 runtime bytes-alloc -13.1% T17499 runtime bytes-alloc -14.3% T3294 compiler bytes-alloc -1.1% T5030 compiler bytes-alloc -11.7% T9872a compiler bytes-alloc -2.0% T9872b compiler bytes-alloc -1.2% T9872c compiler bytes-alloc -1.5% Metric Decrease: T12227 T12545 T13056 T15263 T17499 T3294 T5030 T9872a T9872b T9872c
* add reproducer for #15630Artem Pelenitsyn2020-07-122-0/+61
|
* testsuite: Widen T12234 acceptance window to 2%Ben Gamari2020-07-071-1/+1
| | | | | Previously it wasn't uncommon to see +/-1% fluctuations in compiler allocations on this test.
* test suite: add reproducer for #17516Artem Pelenitsyn2020-06-253-0/+478
|
* Relax allocation threshold for T12150.Andreas Klebinger2020-06-181-1/+6
| | | | | | | This test performs little work, so the most minor allocation changes often cause the test to fail. Increasing the threshold to 2% should help with this.
* Update testsuiteSylvain Henry2020-06-171-1/+1
| | | | | | | | | | | | | | * support detection of slow ghc-bignum backend (to replace the detection of integer-simple use). There are still some test cases that the native backend doesn't handle efficiently enough. * remove tests for GMP only functions that have been removed from ghc-bignum * fix test results showing dependent packages (e.g. integer-gmp) or showing suggested instances * fix test using Integer/Natural API or showing internal names
* T16190: only measure bytes_allocatedSylvain Henry2020-06-171-1/+1
| | | | | Just adding `{-# LANGUAGE BangPatterns #-}` makes the two other metrics fluctuate by 13%.
* testsuite: Increase size of T12150Ben Gamari2020-06-131-0/+410
| | | | | | | | | As noted in #18319, this test was previously very fragile. Increase its size to make it more likely that its fails with its newly-increased acceptance threshold. Metric Increase: T12150
* Trim the demand for recursive product typesSimon Peyton Jones2020-06-132-0/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ticket #18304 showed that we need to be very careful when exploring the demand (esp usage demand) on recursive product types. This patch solves the problem by trimming the demand on such types -- in effect, a form of "widening". See the Note [Trimming a demand to a type] in DmdAnal, which explains how I did this by piggy-backing on an existing mechansim for trimming demands becuase of GADTs. The significant payload of this patch is very small indeed: * Make GHC.Core.Opt.WorkWrap.Utils.typeShape use RecTcChecker to avoid looking through recursive types. But on the way * I found that ae_rec_tc was entirely inoperative and did nothing. So I removed it altogether from DmdAnal. * I moved some code around in DmdAnal and Demand. (There are no actual changes in dmdFix.) * I changed the API of DmsAnal.dmdAnalRhsLetDown to return a StrictSig rather than a decorated Id * I removed the dead function peelTsFuns from Demand Performance effects: Nofib: 0.0% changes. Not surprising, because they don't use recursive products Perf tests T12227: 1% increase in compiler allocation, becuase $cto gets w/w'd. It did not w/w before because it takes a deeply nested argument, so the worker gets too many args, so we abandon w/w altogether (see GHC.Core.Opt.WorkWrap.Utils.isWorkerSmallEnough) With this patch we trim the demands. That is not strictly necessary (since these Generic type constructors are like tuples -- they can't cause a loop) but the net result is that we now w/w $cto which is fine. UniqLoop: 16% decrease in /runtime/ allocation. The UniqSupply is a recursive product, so currently we abandon all strictness on 'churn'. With this patch 'churn' gets useful strictness, and we w/w it. Hooray Metric Decrease: UniqLoop Metric Increase: T12227
* Only test T16190 with the NCGSylvain Henry2020-06-131-1/+4
| | | | | T16190 is meant to test a NCG feature. It has already caused spurious failures in other MRs (e.g. !2165) when LLVM is used.
* Implement cast worker/wrapper properlySimon Peyton Jones2020-06-101-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The cast worker/wrapper transformation transforms x = e |> co into y = e x = y |> co This is done by the simplifier, but we were being careless about transferring IdInfo from x to y, and about what to do if x is a NOINLNE function. This resulted in a series of bugs: #17673, #18093, #18078. This patch fixes all that: * Main change is in GHC.Core.Opt.Simplify, and the new prepareBinding function, which does this cast worker/wrapper transform. See Note [Cast worker/wrappers]. * There is quite a bit of refactoring around prepareRhs, makeTrivial etc. It's nicer now. * Some wrappers from strictness and cast w/w, notably those for a function with a NOINLINE, should inline very late. There wasn't really a mechanism for that, which was an existing bug really; so I invented a new finalPhase = Phase (-1). It's used for all simplifier runs after the user-visible phase 2,1,0 have run. (No new runs of the simplifier are introduced thereby.) See new Note [Compiler phases] in GHC.Types.Basic; the main changes are in GHC.Core.Opt.Driver * Doing this made me trip over two places where the AnonArgFlag on a FunTy was being lost so we could end up with (Num a -> ty) rather than (Num a => ty) - In coercionLKind/coercionRKind - In contHoleType in the Simplifier I fixed the former by defining mkFunctionType and using it in coercionLKind/RKind. I could have done the same for the latter, but the information is almost to hand. So I fixed the latter by - adding sc_hole_ty to ApplyToVal (like ApplyToTy), - adding as_hole_ty to ValArg (like TyArg) - adding sc_fun_ty to StrictArg Turned out I could then remove ai_type from ArgInfo. This is just moving the deck chairs around, but it worked out nicely. See the new Note [AnonArgFlag] in GHC.Types.Var * When looking at the 'arity decrease' thing (#18093) I discovered that stable unfoldings had a much lower arity than the actual optimised function. That's what led to the arity-decrease message. Simple solution: eta-expand. It's described in Note [Eta-expand stable unfoldings] in GHC.Core.Opt.Simplify * I also discovered that unsafeCoerce wasn't being inlined if the context was boring. So (\x. f (unsafeCoerce x)) would create a thunk -- yikes! I fixed that by making inlineBoringOK a bit cleverer: see Note [Inline unsafeCoerce] in GHC.Core.Unfold. I also found that unsafeCoerceName was unused, so I removed it. I made a test case for #18078, and a very similar one for #17673. The net effect of all this on nofib is very modest, but positive: -------------------------------------------------------------------------------- Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- anna -0.4% -0.1% -3.1% -3.1% 0.0% fannkuch-redux -0.4% -0.3% -0.1% -0.1% 0.0% maillist -0.4% -0.1% -7.8% -1.0% -14.3% primetest -0.4% -15.6% -7.1% -6.6% 0.0% -------------------------------------------------------------------------------- Min -0.9% -15.6% -13.3% -14.2% -14.3% Max -0.3% 0.0% +12.1% +12.4% 0.0% Geometric Mean -0.4% -0.2% -2.3% -2.2% -0.1% All following metric decreases are compile-time allocation decreases between -1% and -3%: Metric Decrease: T5631 T13701 T14697 T15164
* Simple subsumptionwip/T17775Simon Peyton Jones2020-06-052-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch simplifies GHC to use simple subsumption. Ticket #17775 Implements GHC proposal #287 https://github.com/ghc-proposals/ghc-proposals/blob/master/ proposals/0287-simplify-subsumption.rst All the motivation is described there; I will not repeat it here. The implementation payload: * tcSubType and friends become noticably simpler, because it no longer uses eta-expansion when checking subsumption. * No deeplyInstantiate or deeplySkolemise That in turn means that some tests fail, by design; they can all be fixed by eta expansion. There is a list of such changes below. Implementing the patch led me into a variety of sticky corners, so the patch includes several othe changes, some quite significant: * I made String wired-in, so that "foo" :: String rather than "foo" :: [Char] This improves error messages, and fixes #15679 * The pattern match checker relies on knowing about in-scope equality constraints, andd adds them to the desugarer's environment using addTyCsDs. But the co_fn in a FunBind was missed, and for some reason simple-subsumption ends up with dictionaries there. So I added a call to addTyCsDs. This is really part of #18049. * I moved the ic_telescope field out of Implication and into ForAllSkol instead. This is a nice win; just expresses the code much better. * There was a bug in GHC.Tc.TyCl.Instance.tcDataFamInstHeader. We called checkDataKindSig inside tc_kind_sig, /before/ solveEqualities and zonking. Obviously wrong, easily fixed. * solveLocalEqualitiesX: there was a whole mess in here, around failing fast enough. I discovered a bad latent bug where we could successfully kind-check a type signature, and use it, but have unsolved constraints that could fill in coercion holes in that signature -- aargh. It's all explained in Note [Failure in local type signatures] in GHC.Tc.Solver. Much better now. * I fixed a serious bug in anonymous type holes. IN f :: Int -> (forall a. a -> _) -> Int that "_" should be a unification variable at the /outer/ level; it cannot be instantiated to 'a'. This was plain wrong. New fields mode_lvl and mode_holes in TcTyMode, and auxiliary data type GHC.Tc.Gen.HsType.HoleMode. This fixes #16292, but makes no progress towards the more ambitious #16082 * I got sucked into an enormous refactoring of the reporting of equality errors in GHC.Tc.Errors, especially in mkEqErr1 mkTyVarEqErr misMatchMsg misMatchMsgOrCND In particular, the very tricky mkExpectedActualMsg function is gone. It took me a full day. But the result is far easier to understand. (Still not easy!) This led to various minor improvements in error output, and an enormous number of test-case error wibbles. One particular point: for occurs-check errors I now just say Can't match 'a' against '[a]' rather than using the intimidating language of "occurs check". * Pretty-printing AbsBinds Tests review * Eta expansions T11305: one eta expansion T12082: one eta expansion (undefined) T13585a: one eta expansion T3102: one eta expansion T3692: two eta expansions (tricky) T2239: two eta expansions T16473: one eta determ004: two eta expansions (undefined) annfail06: two eta (undefined) T17923: four eta expansions (a strange program indeed!) tcrun035: one eta expansion * Ambiguity check at higher rank. Now that we have simple subsumption, a type like f :: (forall a. Eq a => Int) -> Int is no longer ambiguous, because we could write g :: (forall a. Eq a => Int) -> Int g = f and it'd typecheck just fine. But f's type is a bit suspicious, and we might want to consider making the ambiguity check do a check on each sub-term. Meanwhile, these tests are accepted, whereas they were previously rejected as ambiguous: T7220a T15438 T10503 T9222 * Some more interesting error message wibbles T13381: Fine: one error (Int ~ Exp Int) rather than two (Int ~ Exp Int, Exp Int ~ Int) T9834: Small change in error (improvement) T10619: Improved T2414: Small change, due to order of unification, fine T2534: A very simple case in which a change of unification order means we get tow unsolved constraints instead of one tc211: bizarre impredicative tests; just accept this for now Updates Cabal and haddock submodules. Metric Increase: T12150 T12234 T5837 haddock.base Metric Decrease: haddock.compiler haddock.Cabal haddock.base Merge note: This appears to break the `UnliftedNewtypesDifficultUnification` test. It has been marked as broken in the interest of merging. (cherry picked from commit 66b7b195cb3dce93ed5078b80bf568efae904cc5)
* testsuite: Move no_lint to the top level, tweak hie002Ömer Sinan Ağacan2020-04-101-5/+0
| | | | | | | | | | | | | | | | | | | | | | | - We don't want to benchmark linting so disable lints in hie002 perf test - Move no_lint to the top-level to be able to use it in tests other than those in `testsuite/tests/perf/compiler`. - Filter out -dstg-lint in no_lint. - hie002 allocation numbers on 32-bit are unstable, so skip it on 32-bit Metric Decrease: hie002 ManyConstructors T12150 T12234 T13035 T1969 T4801 T9233 T9961
* simplifier: Kill off ufKeenessFactorBen Gamari2020-04-071-55/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We used to have another factor, ufKeenessFactor, which would scale the discounts before they were subtracted from the size. This was justified with the following comment: -- We multiple the raw discounts (args_discount and result_discount) -- ty opt_UnfoldingKeenessFactor because the former have to do with -- *size* whereas the discounts imply that there's some extra -- *efficiency* to be gained (e.g. beta reductions, case reductions) -- by inlining. However, this is highly suspect since it means that we subtract a *scaled* size from an absolute size, resulting in crazy (e.g. negative) scores in some cases (#15304). We consequently killed off ufKeenessFactor and bumped up the ufUseThreshold to compensate. Adjustment of unfolding use threshold ===================================== Since this removes a discount from our inlining heuristic, I revisited our default choice of -funfolding-use-threshold to minimize the change in overall inlining behavior. Specifically, I measured runtime allocations and executable size of nofib and the testsuite performance tests built using compilers (and core libraries) built with several values of -funfolding-use-threshold. This comes as a result of a quantitative comparison of testsuite performance and code size as a function of ufUseThreshold, comparing GHC trees using values of 50, 60, 70, 80, 90, and 100. The test set consisted of nofib and the testsuite performance tests. A full summary of these measurements are found in the description of !2608 Comparing executable sizes (relative to the base commit) across all nofib tests, we see that sizes are similar to the baseline: gmean min max median thresh 50 -6.36% -7.04% -4.82% -6.46% 60 -5.04% -5.97% -3.83% -5.11% 70 -2.90% -3.84% -2.31% -2.92% 80 -0.75% -2.16% -0.42% -0.73% 90 +0.24% -0.41% +0.55% +0.26% 100 +1.36% +0.80% +1.64% +1.37% baseline +0.00% +0.00% +0.00% +0.00% Likewise, looking at runtime allocations we see that 80 gives slightly better optimisation than the baseline: gmean min max median thresh 50 +0.16% -0.16% +4.43% +0.00% 60 +0.09% -0.00% +3.10% +0.00% 70 +0.04% -0.09% +2.29% +0.00% 80 +0.02% -1.17% +2.29% +0.00% 90 -0.02% -2.59% +1.86% +0.00% 100 +0.00% -2.59% +7.51% -0.00% baseline +0.00% +0.00% +0.00% +0.00% Finally, I had to add a NOINLINE in T4306 to ensure that `upd` is worker-wrappered as the test expects. This makes me wonder whether the inlining heuristic is now too liberal as `upd` is quite a large function. The same measure was taken in T12600. Wall clock time compiling Cabal with -O0 thresh 50 60 70 80 90 100 baseline build-Cabal 93.88 89.58 92.59 90.09 100.26 94.81 89.13 Also, this change happens to avoid the spurious test output in `plugin-recomp-change` and `plugin-recomp-change-prof` (see #17308). Metric Decrease: hie002 T12234 T13035 T13719 T14683 T4801 T5631 T5642 T9020 T9872d T9961 Metric Increase: T12150 T12425 T13701 T14697 T15426 T1969 T3064 T5837 T6048 T9203 T9872a T9872b T9872c T9872d haddock.Cabal haddock.base haddock.compiler
* Major improvements to the specialiserSimon Peyton Jones2020-04-031-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is joint work of Alexis King and Simon PJ. It does some significant refactoring of the type-class specialiser. Main highlights: * We can specialise functions with types like f :: Eq a => a -> Ord b => b => blah where the classes aren't all at the front (#16473). Here we can correctly specialise 'f' based on a call like f @Int @Bool dEqInt x dOrdBool This change really happened in an earlier patch commit 2d0cf6252957b8980d89481ecd0b79891da4b14b Author: Sandy Maguire <sandy@sandymaguire.me> Date: Thu May 16 12:12:10 2019 -0400 work that this new patch builds directly on that work, and refactors it a bit. * We can specialise functions with implicit parameters (#17930) g :: (?foo :: Bool, Show a) => a -> String Previously we could not, but now they behave just like a non-class argument as in 'f' above. * We can specialise under-saturated calls, where some (but not all of the dictionary arguments are provided (#17966). For example, we can specialise the above 'f' based on a call map (f @Int dEqInt) xs even though we don't (and can't) give Ord dictionary. This may sound exotic, but #17966 is a program from the wild, and showed significant perf loss for functions like f, if you need saturation of all dictionaries. * We fix a buglet in which a floated dictionary had a bogus demand (#17810), by using zapIdDemandInfo in the NonRec case of specBind. * A tiny side benefit: we can drop dead arguments to specialised functions; see Note [Drop dead args from specialisations] * Fixed a bug in deciding what dictionaries are "interesting"; see Note [Keep the old dictionaries interesting] This is all achieved by by building on Sandy Macguire's work in defining SpecArg, which mkCallUDs uses to describe the arguments of the call. Main changes: * Main work is in specHeader, which marched down the [InBndr] from the function definition and the [SpecArg] from the call site, together. * specCalls no longer has an arity check; the entire mechanism now handles unders-saturated calls fine. * mkCallUDs decides on an argument-by-argument basis whether to specialise a particular dictionary argument; this is new. See mk_spec_arg in mkCallUDs. It looks as if there are many more lines of code, but I think that all the extra lines are comments!
* Testsuite: measure compiler stats for T16190Sylvain Henry2020-04-031-1/+1
| | | | We were mistakenly measuring program stats