This test triggers the bad code path identified by #20509 where an entry
into the EPS caused by importing Control.Applicative will retain a stale
HomePackageTable.
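For illustration only (not the actual test, which is more involved): the import side can be as small as the module below; the bug is about what the EPS entry created for that import goes on to retain, not about the code itself.
```hs
-- Hypothetical reduced module: anything that pulls Control.Applicative
-- into the EPS will do; the leak is the stale HomePackageTable kept
-- alive by the resulting EPS entry.
module Repro where

import Control.Applicative (liftA2)

three :: Maybe Int
three = liftA2 (+) (Just 1) (Just 2)
```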
|
Previous attempts at fixing #11547 and #20455 were reverted because they
showed some quadratic behaviour, and the test case T15052 was added to
catch that.
I believe that similar quadratic behaviour can be triggered with current
master by using type definitions rather than value definitions, so this
adds a test case similar to T14052. I have hopes that my attempts at
fixing #11547 will lead to code that avoids the quadratic increase here.
Or not, we will see. In any case, having this in `master` and included
in future comparisons will be useful.
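For a rough idea of the shape of such a test (the real test case is generated by the test driver; these names are made up): a long run of type definitions processed one after another, so that any per-declaration re-traversal of everything defined so far turns quadratic.
```hs
-- Sketch only: the real test generates thousands of these declarations.
data T1 = MkT1
data T2 = MkT2 T1
data T3 = MkT3 T2
data T4 = MkT4 T3
-- ... and so on, produced mechanically rather than written by hand.
```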
|
This patch enables worker/wrapper for nested constructed products, as described
in `Note [Nested CPR]`. The machinery for expressing Nested CPR was already
there, since !5054. Worker/wrapper is equipped to exploit Nested CPR annotations
since !5338. CPR analysis already handles applications in batches since !5753.
This patch just needs to flip a few more switches:
1. In `cprTransformDataConWork`, we need to look at the field expressions
and their `CprType`s to see whether the evaluation of the expressions
terminates quickly (= is in HNF) or if they are put in strict fields.
If that is the case, then we retain their CPR info and may unbox nestedly
later on. More details in `Note [Nested CPR]`.
2. Enable nested `ConCPR` signatures in `GHC.Types.Cpr`.
3. In the `asConCpr` call in `GHC.Core.Opt.WorkWrap.Utils`, pass CPR info of
fields to the `Unbox`.
4. Instead of giving CPR signatures to DataCon workers and wrappers, we now have
`cprTransformDataConWork` for workers and treat wrappers by analysing their
unfolding. As a result, the code from GHC.Types.Id.Make went away completely.
5. I deactivated worker/wrappering for recursive DataCons and wrote a function
`isRecDataCon` to detect them. We really don't want to give `repeat` or
`replicate` the Nested CPR property.
See Note [CPR for recursive data structures] for which kind of recursive
DataCons we target.
6. Fix a couple of tests and their outputs.
I also documented that CPR can destroy sharing and lead to an asymptotic
increase in allocations (which is tracked by #13331/#19326) in
`Note [CPR for data structures can destroy sharing]`.
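As a small illustration of the condition in point 1 above (my own example, not taken from the patch): in a function like the one below, both components of the result are already-evaluated `Int`s when the pair is built, so Nested CPR may let the worker return unboxed components instead of an allocated `(Int, Int)`. Whether unboxing actually happens still depends on the analysis and the usage sites.
```hs
{-# LANGUAGE BangPatterns #-}
module NestedCprDemo where

-- Both fields of the pair are evaluated (in HNF) at construction time,
-- which is exactly what lets their CPR info be retained nestedly.
addMul :: Int -> Int -> (Int, Int)
addMul x y =
  let !s = x + y
      !p = x * y
  in (s, p)
```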
Nofib results:
```
--------------------------------------------------------------------------------
Program Allocs Instrs
--------------------------------------------------------------------------------
ben-raytrace -3.1% -0.4%
binary-trees +0.8% -2.9%
digits-of-e2 +5.8% +1.2%
event +0.8% -2.1%
fannkuch-redux +0.0% -1.4%
fish 0.0% -1.5%
gamteb -1.4% -0.3%
mkhprog +1.4% +0.8%
multiplier +0.0% -1.9%
pic -0.6% -0.1%
reptile -20.9% -17.8%
wave4main +4.8% +0.4%
x2n1 -100.0% -7.6%
--------------------------------------------------------------------------------
Min -95.0% -17.8%
Max +5.8% +1.2%
Geometric Mean -2.9% -0.4%
```
The huge wins in x2n1 (loopy list) and reptile (see #19970) are due to
refraining from unboxing (:). Other benchmarks like digits-of-e2 or wave4main
regress because of that. Ultimately there are no great improvements due to
Nested CPR alone, but at least it's a win.
Binary sizes decrease by 0.6%.
There are a significant number of metric decreases. The most notable ones (>1%):
```
ManyAlternatives(normal) ghc/alloc 771656002.7 762187472.0 -1.2%
ManyConstructors(normal) ghc/alloc 4191073418.7 4114369216.0 -1.8%
MultiLayerModules(normal) ghc/alloc 3095678333.3 3128720704.0 +1.1%
PmSeriesG(normal) ghc/alloc 50096429.3 51495664.0 +2.8%
PmSeriesS(normal) ghc/alloc 63512989.3 64681600.0 +1.8%
PmSeriesV(normal) ghc/alloc 62575424.0 63767208.0 +1.9%
T10547(normal) ghc/alloc 29347469.3 29944240.0 +2.0%
T11303b(normal) ghc/alloc 46018752.0 47367576.0 +2.9%
T12150(optasm) ghc/alloc 81660890.7 82547696.0 +1.1%
T12234(optasm) ghc/alloc 59451253.3 60357952.0 +1.5%
T12545(normal) ghc/alloc 1705216250.7 1751278952.0 +2.7%
T12707(normal) ghc/alloc 981000472.0 968489800.0 -1.3% GOOD
T13056(optasm) ghc/alloc 389322664.0 372495160.0 -4.3% GOOD
T13253(normal) ghc/alloc 337174229.3 341954576.0 +1.4%
T13701(normal) ghc/alloc 2381455173.3 2439790328.0 +2.4% BAD
T14052(ghci) ghc/alloc 2162530642.7 2139108784.0 -1.1%
T14683(normal) ghc/alloc 3049744728.0 2977535064.0 -2.4% GOOD
T14697(normal) ghc/alloc 362980213.3 369304512.0 +1.7%
T15164(normal) ghc/alloc 1323102752.0 1307480600.0 -1.2%
T15304(normal) ghc/alloc 1304607429.3 1291024568.0 -1.0%
T16190(normal) ghc/alloc 281450410.7 284878048.0 +1.2%
T16577(normal) ghc/alloc 7984960789.3 7811668768.0 -2.2% GOOD
T17516(normal) ghc/alloc 1171051192.0 1153649664.0 -1.5%
T17836(normal) ghc/alloc 1115569746.7 1098197592.0 -1.6%
T17836b(normal) ghc/alloc 54322597.3 55518216.0 +2.2%
T17977(normal) ghc/alloc 47071754.7 48403408.0 +2.8%
T17977b(normal) ghc/alloc 42579133.3 43977392.0 +3.3%
T18923(normal) ghc/alloc 71764237.3 72566240.0 +1.1%
T1969(normal) ghc/alloc 784821002.7 773971776.0 -1.4% GOOD
T3294(normal) ghc/alloc 1634913973.3 1614323584.0 -1.3% GOOD
T4801(normal) ghc/alloc 295619648.0 292776440.0 -1.0%
T5321FD(normal) ghc/alloc 278827858.7 276067280.0 -1.0%
T5631(normal) ghc/alloc 586618202.7 577579960.0 -1.5%
T5642(normal) ghc/alloc 494923048.0 487927208.0 -1.4%
T5837(normal) ghc/alloc 37758061.3 39261608.0 +4.0%
T9020(optasm) ghc/alloc 257362077.3 254672416.0 -1.0%
T9198(normal) ghc/alloc 49313365.3 50603936.0 +2.6% BAD
T9233(normal) ghc/alloc 704944258.7 685692712.0 -2.7% GOOD
T9630(normal) ghc/alloc 1476621560.0 1455192784.0 -1.5%
T9675(optasm) ghc/alloc 443183173.3 433859696.0 -2.1% GOOD
T9872a(normal) ghc/alloc 1720926653.3 1693190072.0 -1.6% GOOD
T9872b(normal) ghc/alloc 2185618061.3 2162277568.0 -1.1% GOOD
T9872c(normal) ghc/alloc 1765842405.3 1733618088.0 -1.8% GOOD
TcPlugin_RewritePerf(normal) ghc/alloc 2388882730.7 2365504696.0 -1.0%
WWRec(normal) ghc/alloc 607073186.7 597512216.0 -1.6%
T9203(normal) run/alloc 107284064.0 102881832.0 -4.1%
haddock.Cabal(normal) run/alloc 24025329589.3 23768382560.0 -1.1%
haddock.base(normal) run/alloc 25660521653.3 25370321824.0 -1.1%
haddock.compiler(normal) run/alloc 74064171706.7 73358712280.0 -1.0%
```
The biggest exception to the rule is T13701 which seems to fluctuate as usual
(not unlike T12545). T14697 has a similar quality, being a generated
multi-module test. T5837 is small enough that it similarly doesn't measure
anything significant besides module loading overhead.
T13253 simply does one additional round of Simplification due to Nested CPR.
There are also some apparent regressions in T9198, T12234 and PmSeriesG that we
(@mpickering and I) were simply unable to reproduce locally. @mpickering tried
to run the CI script in a local Docker container and actually found that T9198
and PmSeriesG *improved*. In MRs that were rebased on top of this one, like !4229,
I did not experience such increases. Let's not get hung up on these regression
tests; they were meant to catch asymptotic regressions.
The build-cabal test improves by 1.2% in -O0.
Metric Increase:
T10421
T12234
T12545
T13035
T13056
T13701
T14697
T18923
T5837
T9198
Metric Decrease:
ManyConstructors
T12545
T12707
T13056
T14683
T16577
T18223
T1969
T3294
T9203
T9233
T9675
T9872a
T9872b
T9872c
T9961
TcPlugin_RewritePerf
|
While investigating #20106, I made a few refactorings to the pattern-match
checker that I don't want to lose. Here are the changes:
* Some key functions of the checker now have SCC annotations
* Better `-ddump-ec-trace` diagnostics for easier debugging. I added
'traceWhenFailPm' to see *why* a particular `MaybeT` computation fails and
made use of it in `instCon`.
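The shape of that combinator, sketched generically here with Debug.Trace rather than GHC's own tracing infrastructure (the name 'traceWhenFailPm' is the real one; everything else below is made up for illustration):
```hs
module TraceDemo where

import Control.Monad.Trans.Maybe (MaybeT (..))
import Debug.Trace (traceM)

-- Log a message only when the wrapped computation fails, so a trace
-- shows *where* e.g. an instCon-style MaybeT computation bailed out.
traceWhenFail :: Monad m => String -> MaybeT m a -> MaybeT m a
traceWhenFail herald (MaybeT act) = MaybeT $ do
  res <- act
  case res of
    Nothing -> traceM ("failed: " ++ herald) >> pure Nothing
    Just x  -> pure (Just x)
```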
I also increased the acceptance threshold of T11545, which seems to fail
randomly lately due to ghc/max flukes.
|
This patch, provoked by regressions in the text package
(#19557), improves sharing of join points. This also fixes
the terrible behaviour in #20049.
See Note [Duplicating join points] in GHC.Core.Opt.Simplify.
* In the StrictArg case of mkDupableContWithDmds, don't
use Plan A for data constructors
* In postInlineUnconditionally, don't inline JoinIds
Avoids inlining join $j x = Just x
in case blah of
A -> $j x1
B -> $j x2
C -> $j x3
* In mkDupableStrictBind and mkDupableStrictAlt, create
join points (much) more often: exprIsTrivial rather than
exprIsDupable. This may be too much, but we'll see.
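For reference, a source-level example of the shape involved (not taken from the tickets): every case alternative ends in the same call, which the Simplifier can keep as a single shared join point rather than inlining its body into each branch — exactly the sharing the postInlineUnconditionally change above preserves.
```hs
module JoinDemo where

data ABC = A Int | B Int | C Int

-- After simplification, `wrap` becomes a join point $j; duplicating its
-- body into the three alternatives would only grow code.
classify :: ABC -> Maybe Int
classify abc = case abc of
  A x -> wrap x
  B x -> wrap (x + 1)
  C x -> wrap (x + 2)
  where
    wrap v = Just (v * v)
```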
Metric Decrease:
T12545
T13253-spj
T13719
T18140
T18282
T18304
T18698a
T18698b
Metric Increase:
T16577
T18923
T9961
|
In a sequel to #19414, I wrote a script that measures min and max allocation
bounds of T12545 based on randomly modifying -dunique-increment. I got a spread
of as much as 4.8%. But instead of widening the acceptance window further (to
5%), I committed the script as part of this commit, so that false positive
increases can easily be diagnosed by comparing min and max bounds to HEAD.
Indeed, for !5814 we have seen T12545 go from -0.3% to 3.3% after a rebase.
I made sure that the min and max bounds actually stayed the same.
In the future, this kind of check can very easily be done in a matter of a
minute. Maybe we should increase the acceptance threshold if we need to check
often (leave a comment on #19414 if you had to check), but I've not been bitten
by it for half a year, which seems OK.
Metric Increase:
T12545
|
This patch comprises four different but closely related ideas. The
net result is fixing a large number of open issues with the driver
whilst making it simpler to understand.
1. Use the hash of the source file to determine whether the source file
has changed or not. This makes the recompilation checking more robust to
modern build systems, which are liable to copy files around, changing
their modification times.
2. Remove the concept of a "stable module". A stable module was one
where the object file was newer than the source file, and all transitive
dependencies were also stable. Now that we don't rely on the modification
time of the source file, the notion of stability is moot.
3. Fix TH/plugin recompilation after the removal of stable modules. The
TH recompilation check used to rely on stable modules. Now there is a
uniform and simple way: we directly track the linkables which were
loaded into the interpreter whilst compiling a module. This is an
over-approximation but more robust wrt package dependencies changing.
4. Fix recompilation checking for dynamic object files. Now we actually
check whether the dynamic object file exists when compiling with -dynamic-too.
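Conceptually, point 1 replaces timestamp comparisons with content fingerprints, along these lines (a sketch of the idea, not the driver code):
```hs
module HashCheck where

import GHC.Fingerprint (Fingerprint, getFileHash)

-- Recompile only if the stored hash of the source differs from the file
-- currently on disk; copying a file around (which changes its mtime but
-- not its contents) no longer forces recompilation.
sourceUnchanged :: Fingerprint -> FilePath -> IO Bool
sourceUnchanged oldHash src = do
  newHash <- getFileHash src
  pure (newHash == oldHash)
```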
Fixes #19774 #19771 #19758 #17434 #11556 #9121 #8211 #16495 #7277 #16093
|
Fixes #19731
-------------------------
Metric Decrease:
T11545
Metric Increase:
T12545
T15304
-------------------------
|
This makes it more robust to people running it with `quick` flavour and
so on.
|
The test's max memory usage improves dramatically with the fixes to
memory usage in the demand analyser from #15455.
|
Fixes #11545
|
In version 0.12.2.0 of vector, when used with GHC 9.0, we
rebox values from storable mutable vectors.
This test should catch such a change in the future.
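The kind of inner loop such a test guards looks roughly like this (my sketch, not the test source); with a healthy vector/GHC pairing the `Int`s read here stay unboxed, and reboxing shows up directly in the allocation figure:
```hs
{-# LANGUAGE BangPatterns #-}
module Main where

import qualified Data.Vector.Storable.Mutable as MV

sumVec :: MV.IOVector Int -> IO Int
sumVec v = go 0 0
  where
    n = MV.length v
    go !acc !i
      | i >= n    = pure acc
      | otherwise = do
          x <- MV.unsafeRead v i   -- this read is where reboxing would bite
          go (acc + x) (i + 1)

main :: IO ()
main = do
  v <- MV.replicate 1000000 (1 :: Int)
  print =<< sumVec v
```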
|
- Remove GHC.OldList
- Remove Data.OldList
- compat-unqualified-imports is a no-op
- Update haddock submodule
|
As #19293 realises, this one keeps on flip-flopping by 2.5%
depending on how many modules there are within the GHC package.
We should revert this once we have figured out how to fix what's going on.
|
This test flip-flops by +-1% in arbitrary changes in CI.
While playing around with `-dunique-increment`, I could reproduce
variations of 3% in compiler allocations, so I set the acceptance window
accordingly.
Fixes #19414.
|
This reverts commit 4a9d856d21c67b3328e26aa68a071ec9a824a7bb.
|
In GHC.Core.SimpleOpt, I found that its inlining could duplicate
an arbitrary redex inside a lambda! Consider (\xyz. x+y). The
occurrence analyser treats the lambdas as a group, and says that
both x and y occur once, even though they occur under the \z.
See Note [Occurrence analysis for lambda binders] in OccurAnal.
When the lambda is under-applied in a call, the Simplifier is
careful to zap the occ-info on x and y, because they appear under the \z.
(See the call to zapLamBndrs in simplExprF1.) But SimpleOpt
missed this test, resulting in #19347.
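A reduction of that situation in source terms (not the #19347 reproducer itself): the lambda below is applied to only two of its three arguments, so substituting the `expensive` redex for x under the \z would redo that work on every call of the partial application.
```hs
module UnderApplied where

expensive :: Int -> Int
expensive n = sum [1 .. n]
{-# NOINLINE expensive #-}

-- x "occurs once" syntactically, but it occurs under the \z of an
-- under-applied lambda, so its occ-info must be zapped before inlining.
partial :: Int -> Int
partial = (\x y z -> x + y + z) (expensive 1000000) 1
```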
So this patch
* commons up the binder-zapping in GHC.Core.Utils.zapLamBndrs.
* Calls this new function from GHC.Core.Opt.Simplify
* Adds a call to zapLamBndrs to GHC.Core.SimpleOpt.simple_app
This change makes test T12990 regress somewhat, but it was always
very delicate, so I'm going to put up with that.
In this voyage I also discovered a small, rather unrelated infelicity
in the Simplifier:
* In GHC.Core.Opt.Simplify.simplNonRecX we should apply isStrictId
to the OutId not the InId. See Note [Dark corner with levity polymorphism]
It may never "bite", because SimpleOpt should have inlined all
the levity-polymorphic compulsory inlinings already, but somehow
it bit me at one point and it's generally a more solid thing
to do.
Fixing the main bug increases runtime allocation in test
perf/should_run/T12990, for (acceptable) reasons explained in a
comment on
Metric Increase:
T12990
|
This commit also consolidates documentation in the user
manual around UndecidableSuperClasses, UndecidableInstances,
and FlexibleContexts.
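For context (a textbook example, not taken from the tickets): UndecidableSuperClasses is what licenses superclass cycles such as the one below, at the cost of GHC no longer guaranteeing that superclass expansion terminates.
```hs
{-# LANGUAGE UndecidableSuperClasses #-}
module SuperCycle where

-- Each class has the other as a superclass; without the extension GHC
-- rejects this with a "superclass cycle" error.
class D a => C a where
  c :: a -> Int

class C a => D a where
  d :: a -> Bool
```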
Close #19186.
Close #19187.
Test case: typecheck/should_compile/T19186,
typecheck/should_fail/T19187{,a}
|
Instead of producing auxiliary con2tag bindings we now rely on
dataToTag#, eliminating a fair bit of generated code.
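Roughly what the derived code can now do directly (an illustration, not the actual generated output), instead of GHC emitting a separate $con2tag binding for every type:
```hs
{-# LANGUAGE MagicHash #-}
module TagDemo where

import GHC.Exts (Int (I#), dataToTag#)

data Colour = Red | Green | Blue

-- The constructor index comes straight from the primop.
tagOf :: Colour -> Int
tagOf x = I# (dataToTag# x)
```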
Co-Authored-By: Ben Gamari <ben@well-typed.com>
|
As outlined in #18903, interleaving usage and strictness demands not
only means a more compact demand representation, but also allows us to
express demands that we weren't easily able to express before.
Call demands are *relative* in the sense that a call demand `Cn(cd)`
on `g` says "`g` is called `n` times. *Whenever `g` is called*, the
result is used according to `cd`". Example from #18903:
```hs
h :: Int -> Int
h m =
let g :: Int -> (Int,Int)
g 1 = (m, 0)
g n = (2 * n, 2 `div` n)
{-# NOINLINE g #-}
in case m of
1 -> 0
2 -> snd (g m)
_ -> uncurry (+) (g m)
```
Without the interleaved representation, we would just get `L` for the
strictness demand on `g`. Now we are able to express that, whenever
`g` is called, the second component of its result is used strictly,
by giving `g` the demand `1C1(P(1P(U),SP(U)))`. This would allow
Nested CPR to unbox the division, for example.
Fixes #18903.
While fixing regressions, I also discovered and fixed #18957.
Metric Decrease:
T13253-spj
|
These all have a maximum residency of over 2 GB.
|
Progress towards #18842. As @sgraf812 points out, widening the window is
dangerous until the exponential described in #17658 is fixed. But this
test has caused enough misery and is low stakes enough that we and
@bgamari think it's worth it in this one case for the time being.
|
Makes it possible for GHC to optimize away the intermediate Generic
representation for more types.
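Concretely, "optimizing away the intermediate representation" means that Generic round-trips like the one below (a stand-in example) can compile down to (near) nothing instead of materialising the `Rep` value at runtime:
```hs
{-# LANGUAGE DeriveGeneric #-}
module GenericDemo where

import GHC.Generics (Generic, from, to)

data Pair = Pair Int Bool
  deriving (Show, Generic)

-- All generics-based code bottoms out in from/to; the better the inliner
-- does here, the less of the Rep structure survives to runtime.
roundTrip :: Pair -> Pair
roundTrip = to . from
```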
Metric Increase:
T12227
|
-------------------------
Metric Decrease:
T12425
Metric Increase:
T17516
-------------------------
|
Replace options like collect_stats(['peak_megabytes_allocated'],4) with
collect_runtime_residency(4) and so forth. The reason is that the latter
also supplies some default RTS arguments which make sure residency does
not fluctuate too much.
The new flags mean we get new (hopefully more accurate) baselines so
accept the stat changes.
-------------------------
Metric Decrease:
T4029
T4334
T7850
Metric Increase:
T13218
T7436
-------------------------
|
This patch fixes #18223, which made GHC generate an exponential
amount of code. There are three quite separate changes in here
1. Re-engineer eta-expansion (again). The eta-expander was
generating lots of intermediate stuff, which could be optimised
away, but which choked the simplifier meanwhile. Relatively
easy to kill it off at source.
See Note [The EtaInfo mechanism] in GHC.Core.Opt.Arity.
The main new thing is the use of pushCoArg in getArg_maybe.
2. Stop Specialise specialising DFuns. This is the cause of a huge
(and utterly unnecessary) blowup in program size in #18223.
See Note [Do not specialise DFuns] in GHC.Core.Opt.Specialise.
I also refactored the Specialise monad a bit... it was silly,
because it passed on unchanging values as if they were mutable
state.
3. Do an extra Simplifier run, after SpecConstr and before
late-Specialise. I found (investigating perf/compiler/T16473)
that failing to do this was crippling *both* SpecConstr *and*
Specialise. See Note [Simplify after SpecConstr] in
GHC.Core.Opt.Pipeline.
This change does mean an extra run of the Simplifier, but only
with -O2, and I think that's acceptable.
T16473 allocates *three* times less with this change. (I changed
it to check runtime rather than compile time.)
Some smaller consequences
* I moved pushCoercion, pushCoArg and friends from SimpleOpt
to Arity, because it was needed by the new etaInfoApp.
And pushCoValArg now returns a MCoercion rather than Coercion for
the argument Coercion.
* A minor, incidental improvement to Core pretty-printing
This does fix #18223 (which was otherwise uncompilable). Hooray. But
there is still a big intermediate program, because there are some very
deeply nested types in that program.
Modest reductions in compile-time allocation on a couple of benchmarks
T12425 -2.0%
T13253 -10.3%
Metric increase with -O2, due to extra simplifier run
T9233 +5.8%
T12227 +1.8%
T15630 +5.0%
There is a spurious apparent increase on heap residency on T9630,
on some architectures at least. I tried it with -G1 and the residency
is essentially unchanged.
Metric Increase:
T9233
T12227
T9630
Metric Decrease:
T12425
T13253
|
Previously it collected everything, including "max bytes used". This is
problematic since the test makes no attempt to control for deviations in
GC timing, resulting in high variability. Fix this by only collecting
"bytes allocated".
|
Specifically:
#13253 exponential inlining
#10421 ditto
#18140 strict constructors
#18282 another nested-function call case
This patch makes one really significant changes: change the way that
mkDupableCont handles StrictArg. The details are explained in
GHC.Core.Opt.Simplify Note [Duplicating StrictArg].
Specific changes
* In mkDupableCont, when making auxiliary bindings for the other arguments
of a call, add extra plumbing so that we don't forget the demand on them.
Otherwise we have to wait for another round of strictness analysis. But
actually all the info is to hand. This change affects:
- Make the strictness list in ArgInfo be [Demand] instead of [Bool],
and rename it to ai_dmds.
- Add as_dmd to ValArg
- Simplify.makeTrivial takes a Demand
- mkDupableContWithDmds takes a [Demand]
There are a number of other small changes
1. For Ids that are used at most once in each branch of a case, make
the occurrence analyser record the total number of syntactic
occurrences. Previously we recorded just OneBranch or
MultipleBranches.
I thought this was going to be useful, but I ended up barely
using it; see Note [Suppress exponential blowup] in
GHC.Core.Opt.Simplify.Utils
Actual changes:
* See the occ_n_br field of OneOcc.
* postInlineUnconditionally
2. I found a small perf buglet in SetLevels; see the new
function GHC.Core.Opt.SetLevels.hasFreeJoin
3. Remove the sc_cci field of StrictArg. I found I could get
its information from the sc_fun field instead. Less to get
wrong!
4. In ArgInfo, arrange that ai_dmds and ai_discs have a simpler
invariant: they line up with the value arguments beyond ai_args
This allowed a bit of nice refactoring; see isStrictArgInfo,
lazyArgContext, strictArgContext
There is virtually no difference in nofib. (The runtime numbers
are bogus -- I tried a few manually.)
Program Size Allocs Runtime Elapsed TotalMem
--------------------------------------------------------------------------------
fft +0.0% -2.0% -48.3% -49.4% 0.0%
multiplier +0.0% -2.2% -50.3% -50.9% 0.0%
--------------------------------------------------------------------------------
Min -0.4% -2.2% -59.2% -60.4% 0.0%
Max +0.0% +0.1% +3.3% +4.9% 0.0%
Geometric Mean +0.0% -0.0% -33.2% -34.3% -0.0%
Test T18282 is an existing example of these deeply-nested strict calls.
We get a big decrease in compile time (-85%) because so much less
inlining takes place.
Metric Decrease:
T18282
|
This test is positively tiny, so the bytes-allocated
measurement will be relatively noisy. Consequently I have seen it
fail spuriously quite often.
|
Ticket #18282 showed that the result discount given by conSize
was massively too large. This patch reduces that discount to
a constant 10, which just balances the cost of the constructor
application itself.
Note [Constructor size and result discount] elaborates, as
does the ticket #18282.
Reducing result discount reduces inlining, which affects perf. I
found that I could increase the unfoldingUseThreshold from 80 to 90 in
compensation; in combination with the result discount change I get
these overall nofib numbers:
Program Size Allocs Runtime Elapsed TotalMem
--------------------------------------------------------------------------------
boyer -0.2% +5.4% -3.2% -3.4% 0.0%
cichelli -0.1% +5.9% -11.2% -11.7% 0.0%
compress2 -0.2% +9.6% -6.0% -6.8% 0.0%
cryptarithm2 -0.1% -3.9% -6.0% -5.7% 0.0%
gamteb -0.2% +2.6% -13.8% -14.4% 0.0%
genfft -0.1% -1.6% -29.5% -29.9% 0.0%
gg -0.0% -2.2% -17.2% -17.8% -20.0%
life -0.1% -2.2% -62.3% -63.4% 0.0%
mate +0.0% +1.4% -5.1% -5.1% -14.3%
parser -0.2% -2.1% +7.4% +6.7% 0.0%
primetest -0.2% -12.8% -14.3% -14.2% 0.0%
puzzle -0.2% +2.1% -10.0% -10.4% 0.0%
rsa -0.2% -11.7% -3.7% -3.8% 0.0%
simple -0.2% +2.8% -36.7% -38.3% -2.2%
wheel-sieve2 -0.1% -19.2% -48.8% -49.2% -42.9%
--------------------------------------------------------------------------------
Min -0.4% -19.2% -62.3% -63.4% -42.9%
Max +0.3% +9.6% +7.4% +11.0% +16.7%
Geometric Mean -0.1% -0.3% -17.6% -18.0% -0.7%
I'm ok with these numbers, remembering that this change removes
an *exponential* increase in code size in some in-the-wild cases.
I investigated compress2. The difference is entirely caused by this
function no longer inlining
WriteRoutines.$woutputCodes
= \ (w :: [CodeEvent]) ->
let result_s1Sr
= case WriteRoutines.outputCodes_$s$woutput w 0# 0# 8# 9# of
(# ww1, ww2 #) -> (ww1, ww2)
in (# case result_s1Sr of (x, _) ->
map @Int @Char WriteRoutines.outputCodes1 x
, case result_s1Sr of { (_, y) -> y } #)
It was right on the cusp before, driven by the excessive result
discount. Too bad!
Happily, the compiler/perf tests show a number of improvements:
T12227 compiler bytes-alloc -6.6%
T12545 compiler bytes-alloc -4.7%
T13056 compiler bytes-alloc -3.3%
T15263 runtime bytes-alloc -13.1%
T17499 runtime bytes-alloc -14.3%
T3294 compiler bytes-alloc -1.1%
T5030 compiler bytes-alloc -11.7%
T9872a compiler bytes-alloc -2.0%
T9872b compiler bytes-alloc -1.2%
T9872c compiler bytes-alloc -1.5%
Metric Decrease:
T12227
T12545
T13056
T15263
T17499
T3294
T5030
T9872a
T9872b
T9872c
|
Previously it wasn't uncommon to see +/-1% fluctuations in compiler
allocations on this test.
|
This test performs little work, so the most minor allocation
changes often cause the test to fail.
Increasing the threshold to 2% should help with this.
|
* support detection of slow ghc-bignum backend (to replace the detection
of integer-simple use). There are still some test cases that the
native backend doesn't handle efficiently enough.
* remove tests for GMP only functions that have been removed from
ghc-bignum
* fix test results showing dependent packages (e.g. integer-gmp) or
showing suggested instances
* fix test using Integer/Natural API or showing internal names
|
Just adding `{-# LANGUAGE BangPatterns #-}` makes the two other metrics
fluctuate by 13%.
|