This test triggers the bad code path identified in #20509, where an entry
in the EPS, caused by importing Control.Applicative, retains a stale
HomePackageTable.
This patch enables worker/wrapper for nested constructed products, as described
in `Note [Nested CPR]`. The machinery for expressing Nested CPR was already
there, since !5054. Worker/wrapper is equipped to exploit Nested CPR annotations
since !5338. CPR analysis already handles applications in batches since !5753.
This patch just needs to flip a few more switches:
1. In `cprTransformDataConWork`, we need to look at the field expressions
and their `CprType`s to see whether the evaluation of the expressions
terminates quickly (= is in HNF) or if they are put in strict fields.
If that is the case, then we retain their CPR info and may unbox nestedly
later on. More details in `Note [Nested CPR]`.
2. Enable nested `ConCPR` signatures in `GHC.Types.Cpr`.
3. In the `asConCpr` call in `GHC.Core.Opt.WorkWrap.Utils`, pass CPR info of
fields to the `Unbox`.
4. Instead of giving CPR signatures to DataCon workers and wrappers, we now have
`cprTransformDataConWork` for workers and treat wrappers by analysing their
unfolding. As a result, the code from GHC.Types.Id.Make went away completely.
5. I deactivated worker/wrapper for recursive DataCons and wrote a function
   `isRecDataCon` to detect them. We really don't want to give `repeat` or
   `replicate` the Nested CPR property.
See Note [CPR for recursive data structures] for which kind of recursive
DataCons we target.
6. Fix a couple of tests and their outputs.
I also documented that CPR can destroy sharing and lead to an asymptotic
increase in allocations (which is tracked by #13331/#19326) in
`Note [CPR for data structures can destroy sharing]`.
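To make the idea concrete, here is a minimal sketch (an invented example,
not a test from this patch) of a nested constructed product that the new
analysis can exploit:
```hs
-- Plain CPR lets the worker return the pair unboxed. With Nested CPR,
-- the Int fields are arithmetic results whose evaluation terminates
-- quickly, so their CPR info is retained and the worker can return
-- both components unboxed as well, avoiding the I# boxes entirely.
addSub :: Int -> Int -> (Int, Int)
addSub x y = (x + y, x - y)
```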
Nofib results:
```
--------------------------------------------------------------------------------
        Program           Allocs    Instrs
--------------------------------------------------------------------------------
   ben-raytrace            -3.1%     -0.4%
   binary-trees            +0.8%     -2.9%
   digits-of-e2            +5.8%     +1.2%
          event            +0.8%     -2.1%
 fannkuch-redux            +0.0%     -1.4%
           fish             0.0%     -1.5%
         gamteb            -1.4%     -0.3%
        mkhprog            +1.4%     +0.8%
     multiplier            +0.0%     -1.9%
            pic            -0.6%     -0.1%
        reptile           -20.9%    -17.8%
      wave4main            +4.8%     +0.4%
           x2n1          -100.0%     -7.6%
--------------------------------------------------------------------------------
            Min           -95.0%    -17.8%
            Max            +5.8%     +1.2%
 Geometric Mean            -2.9%     -0.4%
```
The huge wins in x2n1 (loopy list) and reptile (see #19970) are due to
refraining from unboxing (:). Other benchmarks like digits-of-e2 or wave4main
regress because of that. Ultimately there are no great improvements due to
Nested CPR alone, but at least it's a win.
Binary sizes decrease by 0.6%.
There are a significant number of metric changes. The most notable ones (>1%):
```
ManyAlternatives(normal) ghc/alloc 771656002.7 762187472.0 -1.2%
ManyConstructors(normal) ghc/alloc 4191073418.7 4114369216.0 -1.8%
MultiLayerModules(normal) ghc/alloc 3095678333.3 3128720704.0 +1.1%
PmSeriesG(normal) ghc/alloc 50096429.3 51495664.0 +2.8%
PmSeriesS(normal) ghc/alloc 63512989.3 64681600.0 +1.8%
PmSeriesV(normal) ghc/alloc 62575424.0 63767208.0 +1.9%
T10547(normal) ghc/alloc 29347469.3 29944240.0 +2.0%
T11303b(normal) ghc/alloc 46018752.0 47367576.0 +2.9%
T12150(optasm) ghc/alloc 81660890.7 82547696.0 +1.1%
T12234(optasm) ghc/alloc 59451253.3 60357952.0 +1.5%
T12545(normal) ghc/alloc 1705216250.7 1751278952.0 +2.7%
T12707(normal) ghc/alloc 981000472.0 968489800.0 -1.3% GOOD
T13056(optasm) ghc/alloc 389322664.0 372495160.0 -4.3% GOOD
T13253(normal) ghc/alloc 337174229.3 341954576.0 +1.4%
T13701(normal) ghc/alloc 2381455173.3 2439790328.0 +2.4% BAD
T14052(ghci) ghc/alloc 2162530642.7 2139108784.0 -1.1%
T14683(normal) ghc/alloc 3049744728.0 2977535064.0 -2.4% GOOD
T14697(normal) ghc/alloc 362980213.3 369304512.0 +1.7%
T15164(normal) ghc/alloc 1323102752.0 1307480600.0 -1.2%
T15304(normal) ghc/alloc 1304607429.3 1291024568.0 -1.0%
T16190(normal) ghc/alloc 281450410.7 284878048.0 +1.2%
T16577(normal) ghc/alloc 7984960789.3 7811668768.0 -2.2% GOOD
T17516(normal) ghc/alloc 1171051192.0 1153649664.0 -1.5%
T17836(normal) ghc/alloc 1115569746.7 1098197592.0 -1.6%
T17836b(normal) ghc/alloc 54322597.3 55518216.0 +2.2%
T17977(normal) ghc/alloc 47071754.7 48403408.0 +2.8%
T17977b(normal) ghc/alloc 42579133.3 43977392.0 +3.3%
T18923(normal) ghc/alloc 71764237.3 72566240.0 +1.1%
T1969(normal) ghc/alloc 784821002.7 773971776.0 -1.4% GOOD
T3294(normal) ghc/alloc 1634913973.3 1614323584.0 -1.3% GOOD
T4801(normal) ghc/alloc 295619648.0 292776440.0 -1.0%
T5321FD(normal) ghc/alloc 278827858.7 276067280.0 -1.0%
T5631(normal) ghc/alloc 586618202.7 577579960.0 -1.5%
T5642(normal) ghc/alloc 494923048.0 487927208.0 -1.4%
T5837(normal) ghc/alloc 37758061.3 39261608.0 +4.0%
T9020(optasm) ghc/alloc 257362077.3 254672416.0 -1.0%
T9198(normal) ghc/alloc 49313365.3 50603936.0 +2.6% BAD
T9233(normal) ghc/alloc 704944258.7 685692712.0 -2.7% GOOD
T9630(normal) ghc/alloc 1476621560.0 1455192784.0 -1.5%
T9675(optasm) ghc/alloc 443183173.3 433859696.0 -2.1% GOOD
T9872a(normal) ghc/alloc 1720926653.3 1693190072.0 -1.6% GOOD
T9872b(normal) ghc/alloc 2185618061.3 2162277568.0 -1.1% GOOD
T9872c(normal) ghc/alloc 1765842405.3 1733618088.0 -1.8% GOOD
TcPlugin_RewritePerf(normal) ghc/alloc 2388882730.7 2365504696.0 -1.0%
WWRec(normal) ghc/alloc 607073186.7 597512216.0 -1.6%
T9203(normal) run/alloc 107284064.0 102881832.0 -4.1%
haddock.Cabal(normal) run/alloc 24025329589.3 23768382560.0 -1.1%
haddock.base(normal) run/alloc 25660521653.3 25370321824.0 -1.1%
haddock.compiler(normal) run/alloc 74064171706.7 73358712280.0 -1.0%
```
The biggest exception to the rule is T13701 which seems to fluctuate as usual
(not unlike T12545). T14697 has a similar quality, being a generated
multi-module test. T5837 is small enough that it similarly doesn't measure
anything significant besides module loading overhead.
T13253 simply does one additional round of Simplification due to Nested CPR.
There are also some apparent regressions in T9198, T12234 and PmSeriesG that we
(@mpickering and I) were simply unable to reproduce locally. @mpickering tried
to run the CI script in a local Docker container and actually found that T9198
and PmSeriesG *improved*. In MRs that were rebased on top of this one, like
!4229, I did not experience such increases. Let's not get hung up on these
regression tests; they were meant to test for asymptotic regressions.
The build-cabal test improves by 1.2% in -O0.
Metric Increase:
T10421
T12234
T12545
T13035
T13056
T13701
T14697
T18923
T5837
T9198
Metric Decrease:
ManyConstructors
T12545
T12707
T13056
T14683
T16577
T18223
T1969
T3294
T9203
T9233
T9675
T9872a
T9872b
T9872c
T9961
TcPlugin_RewritePerf
While investigating #20106, I made a few refactorings to the pattern-match
checker that I don't want to lose. Here are the changes:
* Some key functions of the checker now have SCC annotations
* Better `-ddump-ec-trace` diagnostics for easier debugging. I added
'traceWhenFailPm' to see *why* a particular `MaybeT` computation fails and
made use of it in `instCon`.
I also increased the acceptance threshold of T11545, which seems to fail
randomly lately due to ghc/max flukes.
This patch, provoked by regressions in the text package
(#19557), improves sharing of join points. This also fixes
the terrible behaviour in #20049.
See Note [Duplicating join points] in GHC.Core.Opt.Simplify.
* In the StrictArg case of mkDupableContWithDmds, don't
use Plan A for data constructors
* In postInlineUnconditionally, don't inline JoinIds.
  Avoids inlining
      join $j x = Just x
      in case blah of
           A -> $j x1
           B -> $j x2
           C -> $j x3
* In mkDupableStrictBind and mkDupableStrictAlt, create
  join points (much) more often: exprIsTrivial rather than
  exprIsDupable. This may be too much, but we'll see.
Metric Decrease:
T12545
T13253-spj
T13719
T18140
T18282
T18304
T18698a
T18698b
Metric Increase:
T16577
T18923
T9961
In a sequel to #19414, I wrote a script that measures min and max allocation
bounds of T12545 based on randomly modifying -dunique-increment. I got a spread
of as much as 4.8%. But instead of widening the acceptance window further (to
5%), I committed the script as part of this commit, so that false positive
increases can easily be diagnosed by comparing min and max bounds to HEAD.
Indeed, for !5814 we have seen T12545 go from -0.3% to 3.3% after a rebase.
I made sure that the min and max bounds actually stayed the same.
In the future, this kind of check can very easily be done in a matter of a
minute. Maybe we should increase the acceptance threshold if we need to check
often (leave a comment on #19414 if you had to check), but I've not been bitten
by it for half a year, which seems OK.
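The idea behind the script, sketched under assumptions (the committed
script may differ; the file name here is hypothetical):
```hs
import Control.Monad (forM_)
import System.Process (callProcess)

-- Compile the same module repeatedly, varying only -dunique-increment;
-- the spread of GHC's own allocation figures (+RTS -t prints them)
-- bounds how much of a perf change is mere unique-supply noise.
main :: IO ()
main = forM_ [1, 7, 127, 8191] $ \inc ->
  callProcess "ghc"
    [ "-fforce-recomp", "-O"
    , "-dunique-increment=" ++ show inc
    , "+RTS", "-t", "-RTS"
    , "T12545.hs"  -- hypothetical path to the test's source
    ]
```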
Metric Increase:
T12545
This patch comprises four different but closely related ideas. The
net result is fixing a large number of open issues with the driver
whilst making it simpler to understand.
1. Use the hash of the source file to determine whether the source file
has changed or not. This makes the recompilation checking more robust to
modern build systems, which are liable to copy files around, changing
their modification times. (See the sketch after this list.)
2. Remove the concept of a "stable module". A stable module was one
where the object file was newer than the source file, and all transitive
dependencies were also stable. Now that we don't rely on the
modification time of the source file, the notion of stability is moot.
3. Fix TH/plugin recompilation after the removal of stable modules. The
TH recompilation check used to rely on stable modules. Now there is a
uniform and simple way: we directly track the linkables which were
loaded into the interpreter whilst compiling a module. This is an
over-approximation but more robust wrt package dependencies changing.
4. Fix recompilation checking for dynamic object files. Now we actually
check that the dynamic object file exists when compiling with
-dynamic-too.
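For item 1, a minimal sketch (not GHC's actual code) of hash-based change
detection, using `getFileHash` from base's GHC.Fingerprint:
```hs
import GHC.Fingerprint (Fingerprint, getFileHash)

-- The previously recorded fingerprint would come from the interface
-- file of the last successful compile; 'Nothing' forces recompilation.
sourceUnchanged :: Maybe Fingerprint -> FilePath -> IO Bool
sourceUnchanged mbOldHash src = do
  newHash <- getFileHash src  -- hash of the file contents, not its mtime
  pure (mbOldHash == Just newHash)
```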
Fixes #19774 #19771 #19758 #17434 #11556 #9121 #8211 #16495 #7277 #16093
This makes it more robust to people running it with `quick` flavour and
so on.
The test's maximum memory usage improves dramatically with the fixes to
memory usage in the demand analyser from #15455.
Fixes #11545
As #19293 realises, this one keeps flip-flopping by 2.5%
depending on how many modules there are within the GHC package.
We should revert this once we've figured out what's going on.
This test flip-flops by ±1% with arbitrary changes in CI.
While playing around with `-dunique-increment`, I could reproduce
variations of 3% in compiler allocations, so I set the acceptance window
accordingly.
Fixes #19414.
This reverts commit 4a9d856d21c67b3328e26aa68a071ec9a824a7bb.
This commit also consolidates documentation in the user
manual around UndecidableSuperClasses, UndecidableInstances,
and FlexibleContexts.
Close #19186.
Close #19187.
Test case: typecheck/should_compile/T19186,
typecheck/should_fail/T19187{,a}
Instead of producing auxiliary con2tag bindings, we now rely on
dataToTag#, eliminating a fair bit of generated code.
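A rough sketch of the idea (hypothetical code, not the actual derived
output):
```hs
{-# LANGUAGE BangPatterns, MagicHash #-}
import GHC.Exts (Int (I#), dataToTag#)

data Colour = Red | Green | Blue

-- What a generated con2tag binding used to compute: dataToTag# reads
-- the constructor index directly (the bang ensures the argument is
-- evaluated first, as the primop requires).
colourTag :: Colour -> Int
colourTag !c = I# (dataToTag# c)
```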
Co-Authored-By: Ben Gamari <ben@well-typed.com>
As outlined in #18903, interleaving usage and strictness demands not
only means a more compact demand representation, but also allows us to
express demands that we weren't easily able to express before.
Call demands are *relative* in the sense that a call demand `Cn(cd)`
on `g` says "`g` is called `n` times. *Whenever `g` is called*, the
result is used according to `cd`". Example from #18903:
```hs
h :: Int -> Int
h m =
  let g :: Int -> (Int,Int)
      g 1 = (m, 0)
      g n = (2 * n, 2 `div` n)
      {-# NOINLINE g #-}
  in case m of
       1 -> 0
       2 -> snd (g m)
       _ -> uncurry (+) (g m)
```
Without the interleaved representation, we would just get `L` for the
strictness demand on `g`. Now we are able to express that whenever
`g` is called, the second component of its result is used strictly,
by giving `g` the demand `1C1(P(1P(U),SP(U)))`. This would allow
Nested CPR to unbox the division, for example.
Fixes #18903.
While fixing regressions, I also discovered and fixed #18957.
Metric Decrease:
T13253-spj
These all have a maximum residency of over 2 GB.
Progress towards #18842. As @sgraf812 points out, widening the window is
dangerous until the exponential behaviour described in #17658 is fixed.
But this test has caused enough misery, and is low-stakes enough, that we
and @bgamari think it's worth it in this one case for the time being.
Makes it possible for GHC to optimize away the intermediate Generic
representation for more types.
Metric Increase:
T12227
-------------------------
Metric Decrease:
T12425
Metric Increase:
T17516
-------------------------
This patch fixes #18223, which made GHC generate an exponential
amount of code. There are three quite separate changes in here:
1. Re-engineer eta-expansion (again). The eta-expander was
generating lots of intermediate stuff, which could be optimised
away, but which choked the simplifier meanwhile. Relatively
easy to kill it off at source.
See Note [The EtaInfo mechanism] in GHC.Core.Opt.Arity.
The main new thing is the use of pushCoArg in getArg_maybe.
2. Stop Specialise specialising DFuns. This is the cause of a huge
   (and utterly unnecessary) blowup in program size in #18223.
   See Note [Do not specialise DFuns] in GHC.Core.Opt.Specialise.
I also refactored the Specialise monad a bit... it was silly,
because it passed on unchanging values as if they were mutable
state.
3. Do an extra Simplifier run, after SpecConstr and before
   late-Specialise. I found (investigating perf/compiler/T16473)
   that failing to do this was crippling *both* SpecConstr *and*
   Specialise. See Note [Simplify after SpecConstr] in
   GHC.Core.Opt.Pipeline.
This change does mean an extra run of the Simplifier, but only
with -O2, and I think that's acceptable.
T16473 allocates *three* times less with this change. (I changed
it to check runtime rather than compile time.)
Some smaller consequences
* I moved pushCoercion, pushCoArg and friends from SimpleOpt
to Arity, because it was needed by the new etaInfoApp.
  And pushCoValArg now returns an MCoercion rather than a Coercion for
  the argument Coercion.
* A minor, incidental improvement to Core pretty-printing
This does fix #18223 (which was otherwise uncompilable). Hooray! But
there is still a big intermediate program, because there are some very
deeply nested types in that program.
Modest reductions in compile-time allocation on a couple of benchmarks:
    T12425    -2.0%
    T13253   -10.3%
Metric increase with -O2, due to the extra simplifier run:
    T9233     +5.8%
    T12227    +1.8%
    T15630    +5.0%
There is a spurious apparent increase on heap residency on T9630,
on some architectures at least. I tried it with -G1 and the residency
is essentially unchanged.
Metric Increase:
T9233
T12227
T9630
Metric Decrease:
T12425
T13253
Previously it collected everything, including "max bytes used". This is
problematic since the test makes no attempt to control for deviations in
GC timing, resulting in high variability. Fix this by only collecting
"bytes allocated".
Specifically:
#13253 exponential inlining
#10421 ditto
#18140 strict constructors
#18282 another nested-function call case
This patch makes one really significant change: it changes the way that
mkDupableCont handles StrictArg. The details are explained in
GHC.Core.Opt.Simplify Note [Duplicating StrictArg].
Specific changes
* In mkDupableCont, when making auxiliary bindings for the other arguments
  of a call, add extra plumbing so that we don't forget the demand on them.
  Otherwise we have to wait for another round of strictness analysis. But
  actually all the info is to hand. This change affects:
- Make the strictness list in ArgInfo be [Demand] instead of [Bool],
and rename it to ai_dmds.
- Add as_dmd to ValArg
- Simplify.makeTrivial takes a Demand
- mkDupableContWithDmds takes a [Demand]
There are a number of other small changes
1. For Ids that are used at most once in each branch of a case, make
the occurrence analyser record the total number of syntactic
occurrences. Previously we recorded just OneBranch or
MultipleBranches.
   I thought this was going to be useful, but I ended up barely
   using it; see Note [Suppress exponential blowup] in
   GHC.Core.Opt.Simplify.Utils
Actual changes:
* See the occ_n_br field of OneOcc.
* postInlineUnconditionally
2. I found a small perf buglet in SetLevels; see the new
function GHC.Core.Opt.SetLevels.hasFreeJoin
3. Remove the sc_cci field of StrictArg. I found I could get
its information from the sc_fun field instead. Less to get
wrong!
4. In ArgInfo, arrange that ai_dmds and ai_discs have a simpler
invariant: they line up with the value arguments beyond ai_args
This allowed a bit of nice refactoring; see isStrictArgInfo,
lazyArgcontext, strictArgContext
There is virtually no difference in nofib. (The runtime numbers
are bogus -- I tried a few manually.)
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
            fft          +0.0%     -2.0%    -48.3%    -49.4%     0.0%
     multiplier          +0.0%     -2.2%    -50.3%    -50.9%     0.0%
--------------------------------------------------------------------------------
            Min          -0.4%     -2.2%    -59.2%    -60.4%     0.0%
            Max          +0.0%     +0.1%     +3.3%     +4.9%     0.0%
 Geometric Mean          +0.0%     -0.0%    -33.2%    -34.3%    -0.0%
Test T18282 is an existing example of these deeply-nested strict calls.
We get a big decrease in compile time (-85%) because so much less
inlining takes place.
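For intuition, a minimal sketch (invented, not the actual T18282) of the
shape involved, namely deeply nested calls in strict argument position:
```hs
-- 'f' is strict in both arguments, so each inner call sits in a strict
-- argument position of the next; the simplifier then builds nested
-- StrictArg continuations, which this patch avoids duplicating wholesale.
f :: Int -> Int -> Int
f x y = if x > 0 then x + y else x - y

nested :: Int -> Int
nested x = f (f (f (f x 1) 2) 3) 4
```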
Metric Decrease:
T18282
This test is positively tiny, so the bytes-allocated measurement will be
relatively noisy. Consequently I have seen it fail spuriously quite
often.
Ticket #18282 showed that the result discount given by conSize
was massively too large. This patch reduces that discount to
a constant 10, which just balances the cost of the constructor
application itself.
Note [Constructor size and result discount] elaborates, as
does the ticket #18282.
Reducing the result discount reduces inlining, which affects perf. I
found that I could increase the unfoldingUseThreshold from 80 to 90 in
compensation; in combination with the result discount change I get
these overall nofib numbers:
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
          boyer          -0.2%     +5.4%     -3.2%     -3.4%     0.0%
       cichelli          -0.1%     +5.9%    -11.2%    -11.7%     0.0%
      compress2          -0.2%     +9.6%     -6.0%     -6.8%     0.0%
   cryptarithm2          -0.1%     -3.9%     -6.0%     -5.7%     0.0%
         gamteb          -0.2%     +2.6%    -13.8%    -14.4%     0.0%
         genfft          -0.1%     -1.6%    -29.5%    -29.9%     0.0%
             gg          -0.0%     -2.2%    -17.2%    -17.8%    -20.0%
           life          -0.1%     -2.2%    -62.3%    -63.4%     0.0%
           mate          +0.0%     +1.4%     -5.1%     -5.1%    -14.3%
         parser          -0.2%     -2.1%     +7.4%     +6.7%     0.0%
      primetest          -0.2%    -12.8%    -14.3%    -14.2%     0.0%
         puzzle          -0.2%     +2.1%    -10.0%    -10.4%     0.0%
            rsa          -0.2%    -11.7%     -3.7%     -3.8%     0.0%
         simple          -0.2%     +2.8%    -36.7%    -38.3%     -2.2%
   wheel-sieve2          -0.1%    -19.2%    -48.8%    -49.2%    -42.9%
--------------------------------------------------------------------------------
            Min          -0.4%    -19.2%    -62.3%    -63.4%    -42.9%
            Max          +0.3%     +9.6%     +7.4%    +11.0%    +16.7%
 Geometric Mean          -0.1%     -0.3%    -17.6%    -18.0%     -0.7%
I'm ok with these numbers, remembering that this change removes
an *exponential* increase in code size in some in-the-wild cases.
I investigated compress2. The difference is entirely caused by this
function no longer inlining:
    WriteRoutines.$woutputCodes
      = \ (w :: [CodeEvent]) ->
          let result_s1Sr
                = case WriteRoutines.outputCodes_$s$woutput w 0# 0# 8# 9# of
                    (# ww1, ww2 #) -> (ww1, ww2)
          in (# case result_s1Sr of (x, _) ->
                  map @Int @Char WriteRoutines.outputCodes1 x
              , case result_s1Sr of { (_, y) -> y } #)
It was right on the cusp before, driven by the excessive result
discount. Too bad!
Happily, the compiler/perf tests show a number of improvements:
T12227 compiler bytes-alloc -6.6%
T12545 compiler bytes-alloc -4.7%
T13056 compiler bytes-alloc -3.3%
T15263 runtime bytes-alloc -13.1%
T17499 runtime bytes-alloc -14.3%
T3294 compiler bytes-alloc -1.1%
T5030 compiler bytes-alloc -11.7%
T9872a compiler bytes-alloc -2.0%
T9872b compiler bytes-alloc -1.2%
T9872c compiler bytes-alloc -1.5%
Metric Decrease:
T12227
T12545
T13056
T15263
T17499
T3294
T5030
T9872a
T9872b
T9872c
Previously it wasn't uncommon to see ±1% fluctuations in compiler
allocations on this test.
This test performs little work, so the most minor allocation
changes often cause the test to fail.
Increasing the threshold to 2% should help with this.
* support detection of slow ghc-bignum backend (to replace the detection
of integer-simple use). There are still some test cases that the
native backend doesn't handle efficiently enough.
* remove tests for GMP only functions that have been removed from
ghc-bignum
* fix test results showing dependent packages (e.g. integer-gmp) or
  showing suggested instances
* fix tests using the Integer/Natural API or showing internal names
Just adding `{-# LANGUAGE BangPatterns #-}` makes the two other metrics
fluctuate by 13%.
As noted in #18319, this test was previously very fragile. Increase its
size to make it more likely that it fails with its newly-increased
acceptance threshold.
Metric Increase:
T12150
Ticket #18304 showed that we need to be very careful
when exploring the demand (esp usage demand) on recursive
product types.
This patch solves the problem by trimming the demand on such types --
in effect, a form of "widening".
See the Note [Trimming a demand to a type] in DmdAnal, which explains
how I did this by piggy-backing on an existing mechanism for trimming
demands because of GADTs. The significant payload of this patch is
very small indeed:
* Make GHC.Core.Opt.WorkWrap.Utils.typeShape use RecTcChecker to
avoid looking through recursive types.
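A minimal sketch (hypothetical type, modelled on UniqSupply) of the
recursive products in question:
```hs
-- 'Supply' mentions itself in its own fields, so computing an exact
-- nested demand on it would unroll forever; typeShape now uses
-- RecTcChecker to stop at the recursion and trim (widen) the demand.
data Supply = Supply Int Supply Supply

takeUniq :: Supply -> (Int, Supply)
takeUniq (Supply u s1 _) = (u, s1)
```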
But on the way
* I found that ae_rec_tc was entirely inoperative and did nothing.
So I removed it altogether from DmdAnal.
* I moved some code around in DmdAnal and Demand.
(There are no actual changes in dmdFix.)
* I changed the API of DmdAnal.dmdAnalRhsLetDown to return
a StrictSig rather than a decorated Id
* I removed the dead function peelTsFuns from Demand
Performance effects:
Nofib: 0.0% changes. Not surprising, because they don't
use recursive products
Perf tests
T12227:
1% increase in compiler allocation, because $cto gets w/w'd.
It did not w/w before because it takes a deeply nested
argument, so the worker gets too many args, so we abandon w/w
altogether (see GHC.Core.Opt.WorkWrap.Utils.isWorkerSmallEnough)
With this patch we trim the demands. That is not strictly
necessary (since these Generic type constructors are like
tuples -- they can't cause a loop) but the net result is that
we now w/w $cto which is fine.
UniqLoop:
16% decrease in /runtime/ allocation. The UniqSupply is a
recursive product, so currently we abandon all strictness on
'churn'. With this patch 'churn' gets useful strictness, and
we w/w it. Hooray
Metric Decrease:
UniqLoop
Metric Increase:
T12227
T16190 is meant to test an NCG feature. It has already caused spurious
failures in other MRs (e.g. !2165) when LLVM is used.
The cast worker/wrapper transformation transforms
    x = e |> co
into
    y = e
    x = y |> co
This is done by the simplifier, but we were being
careless about transferring IdInfo from x to y,
and about what to do if x is a NOINLINE function.
This resulted in a series of bugs:
#17673, #18093, #18078.
This patch fixes all that:
* Main change is in GHC.Core.Opt.Simplify, and
the new prepareBinding function, which does this
cast worker/wrapper transform.
See Note [Cast worker/wrappers].
* There is quite a bit of refactoring around
prepareRhs, makeTrivial etc. It's nicer now.
* Some wrappers from strictness and cast w/w, notably those for
a function with a NOINLINE, should inline very late. There
wasn't really a mechanism for that, which was an existing bug
really; so I invented a new finalPhase = Phase (-1). It's used
for all simplifier runs after the user-visible phase 2,1,0 have
run. (No new runs of the simplifier are introduced thereby.)
See new Note [Compiler phases] in GHC.Types.Basic;
the main changes are in GHC.Core.Opt.Driver
* Doing this made me trip over two places where the AnonArgFlag on a
FunTy was being lost so we could end up with (Num a -> ty)
rather than (Num a => ty)
- In coercionLKind/coercionRKind
- In contHoleType in the Simplifier
I fixed the former by defining mkFunctionType and using it in
coercionLKind/RKind.
I could have done the same for the latter, but the information
is almost to hand. So I fixed the latter by
- adding sc_hole_ty to ApplyToVal (like ApplyToTy),
- adding as_hole_ty to ValArg (like TyArg)
- adding sc_fun_ty to StrictArg
Turned out I could then remove ai_type from ArgInfo. This is
just moving the deck chairs around, but it worked out nicely.
See the new Note [AnonArgFlag] in GHC.Types.Var
* When looking at the 'arity decrease' thing (#18093) I discovered
that stable unfoldings had a much lower arity than the actual
optimised function. That's what led to the arity-decrease
message. Simple solution: eta-expand.
It's described in Note [Eta-expand stable unfoldings]
in GHC.Core.Opt.Simplify
* I also discovered that unsafeCoerce wasn't being inlined if
the context was boring. So (\x. f (unsafeCoerce x)) would
create a thunk -- yikes! I fixed that by making inlineBoringOK
a bit cleverer: see Note [Inline unsafeCoerce] in GHC.Core.Unfold.
I also found that unsafeCoerceName was unused, so I removed it.
I made a test case for #18078, and a very similar one for #17673.
The net effect of all this on nofib is very modest, but positive:
--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
           anna          -0.4%     -0.1%     -3.1%     -3.1%     0.0%
 fannkuch-redux          -0.4%     -0.3%     -0.1%     -0.1%     0.0%
       maillist          -0.4%     -0.1%     -7.8%     -1.0%    -14.3%
      primetest          -0.4%    -15.6%     -7.1%     -6.6%     0.0%
--------------------------------------------------------------------------------
            Min          -0.9%    -15.6%    -13.3%    -14.2%    -14.3%
            Max          -0.3%      0.0%    +12.1%    +12.4%     0.0%
 Geometric Mean          -0.4%     -0.2%     -2.3%     -2.2%     -0.1%
All following metric decreases are compile-time allocation decreases
between -1% and -3%:
Metric Decrease:
T5631
T13701
T14697
T15164
This patch simplifies GHC to use simple subsumption.
Ticket #17775
Implements GHC proposal #287
https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0287-simplify-subsumption.rst
All the motivation is described there; I will not repeat it here.
The implementation payload:
* tcSubType and friends become noticeably simpler, because they no
  longer use eta-expansion when checking subsumption.
* No deeplyInstantiate or deeplySkolemise
That in turn means that some tests fail, by design; they can all
be fixed by eta expansion. There is a list of such changes below.
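As a hedged illustration (an invented example, not one of the test cases
below), the typical failure and its eta-expansion fix look like this:
```hs
{-# LANGUAGE RankNTypes #-}

f :: forall a b. a -> b -> b
f _ y = y

g :: (forall p. p -> forall q. q -> q) -> Int
g i = i 'x' (3 :: Int)

-- 'g f' was accepted under deep subsumption; with simple subsumption
-- the placement of the foralls must match, so we eta-expand instead:
h :: Int
h = g (\x -> f x)
```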
Implementing the patch led me into a variety of sticky corners, so
the patch includes several other changes, some quite significant:
* I made String wired-in, so that
"foo" :: String rather than
"foo" :: [Char]
This improves error messages, and fixes #15679
* The pattern match checker relies on knowing about in-scope equality
constraints, and adds them to the desugarer's environment using
addTyCsDs. But the co_fn in a FunBind was missed, and for some reason
simple-subsumption ends up with dictionaries there. So I added a
call to addTyCsDs. This is really part of #18049.
* I moved the ic_telescope field out of Implication and into
ForAllSkol instead. This is a nice win; just expresses the code
much better.
* There was a bug in GHC.Tc.TyCl.Instance.tcDataFamInstHeader.
We called checkDataKindSig inside tc_kind_sig, /before/
solveEqualities and zonking. Obviously wrong, easily fixed.
* solveLocalEqualitiesX: there was a whole mess in here, around
failing fast enough. I discovered a bad latent bug where we
could successfully kind-check a type signature, and use it,
but have unsolved constraints that could fill in coercion
holes in that signature -- aargh.
It's all explained in Note [Failure in local type signatures]
in GHC.Tc.Solver. Much better now.
* I fixed a serious bug in anonymous type holes. In
f :: Int -> (forall a. a -> _) -> Int
that "_" should be a unification variable at the /outer/
level; it cannot be instantiated to 'a'. This was plain
wrong. New fields mode_lvl and mode_holes in TcTyMode,
and auxiliary data type GHC.Tc.Gen.HsType.HoleMode.
This fixes #16292, but makes no progress towards the more
ambitious #16082
* I got sucked into an enormous refactoring of the reporting of
equality errors in GHC.Tc.Errors, especially in
mkEqErr1
mkTyVarEqErr
misMatchMsg
misMatchMsgOrCND
In particular, the very tricky mkExpectedActualMsg function
is gone.
It took me a full day. But the result is far easier to understand.
(Still not easy!) This led to various minor improvements in error
output, and an enormous number of test-case error wibbles.
One particular point: for occurs-check errors I now just say
Can't match 'a' against '[a]'
rather than using the intimidating language of "occurs check".
* Pretty-printing AbsBinds
Tests review
* Eta expansions
T11305: one eta expansion
T12082: one eta expansion (undefined)
T13585a: one eta expansion
T3102: one eta expansion
T3692: two eta expansions (tricky)
T2239: two eta expansions
T16473: one eta
determ004: two eta expansions (undefined)
annfail06: two eta (undefined)
T17923: four eta expansions (a strange program indeed!)
tcrun035: one eta expansion
* Ambiguity check at higher rank. Now that we have simple
subsumption, a type like
f :: (forall a. Eq a => Int) -> Int
is no longer ambiguous, because we could write
g :: (forall a. Eq a => Int) -> Int
g = f
and it'd typecheck just fine. But f's type is a bit
suspicious, and we might want to consider making the
ambiguity check do a check on each sub-term. Meanwhile,
these tests are accepted, whereas they were previously
rejected as ambiguous:
T7220a
T15438
T10503
T9222
* Some more interesting error message wibbles
T13381: Fine: one error (Int ~ Exp Int)
rather than two (Int ~ Exp Int, Exp Int ~ Int)
T9834: Small change in error (improvement)
T10619: Improved
T2414: Small change, due to order of unification, fine
T2534: A very simple case in which a change of unification order
means we get two unsolved constraints instead of one
tc211: bizarre impredicative tests; just accept this for now
Updates Cabal and haddock submodules.
Metric Increase:
T12150
T12234
T5837
haddock.base
Metric Decrease:
haddock.compiler
haddock.Cabal
haddock.base
Merge note: This appears to break the
`UnliftedNewtypesDifficultUnification` test. It has been marked as
broken in the interest of merging.
(cherry picked from commit 66b7b195cb3dce93ed5078b80bf568efae904cc5)
- We don't want to benchmark linting, so disable lints in the hie002
  perf test
- Move no_lint to the top-level to be able to use it in tests other than
those in `testsuite/tests/perf/compiler`.
- Filter out -dstg-lint in no_lint.
- hie002 allocation numbers on 32-bit are unstable, so skip it on 32-bit
Metric Decrease:
hie002
ManyConstructors
T12150
T12234
T13035
T1969
T4801
T9233
T9961
We used to have another factor, ufKeenessFactor, which would scale the
discounts before they were subtracted from the size. This was justified
with the following comment:
-- We multiply the raw discounts (args_discount and result_discount)
-- by opt_UnfoldingKeenessFactor because the former have to do with
-- *size* whereas the discounts imply that there's some extra
-- *efficiency* to be gained (e.g. beta reductions, case reductions)
-- by inlining.
However, this is highly suspect since it means that we subtract a
*scaled* size from an absolute size, resulting in crazy (e.g. negative)
scores in some cases (#15304). We consequently killed off
ufKeenessFactor and bumped up the ufUseThreshold to compensate.
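A small worked example (invented numbers) of the arithmetic problem:
```hs
-- Old scheme: an unscaled size minus a *scaled* discount, which can
-- dip below zero; the new scheme keeps both sides in the same units.
size, discount :: Int
size     = 40
discount = 30

oldScore :: Double -> Int
oldScore k = size - round (k * fromIntegral discount)
-- oldScore 1.5 == -5, a nonsensical negative inlining cost

newScore :: Int
newScore = size - discount  -- 10
```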
Adjustment of unfolding use threshold
=====================================
Since this removes a discount from our inlining heuristic, I revisited our
default choice of -funfolding-use-threshold to minimize the change in
overall inlining behavior. Specifically, I measured runtime allocations
and executable size of nofib and the testsuite performance tests built
using compilers (and core libraries) built with several values of
-funfolding-use-threshold.
This comes as a result of a quantitative comparison of testsuite
performance and code size as a function of ufUseThreshold, comparing
GHC trees using values of 50, 60, 70, 80, 90, and 100. The test set
consisted of nofib and the testsuite performance tests.
A full summary of these measurements are found in the description of
!2608
Comparing executable sizes (relative to the base commit) across all
nofib tests, we see that sizes are similar to the baseline:
    thresh      gmean      min      max   median
    50         -6.36%   -7.04%   -4.82%   -6.46%
    60         -5.04%   -5.97%   -3.83%   -5.11%
    70         -2.90%   -3.84%   -2.31%   -2.92%
    80         -0.75%   -2.16%   -0.42%   -0.73%
    90         +0.24%   -0.41%   +0.55%   +0.26%
    100        +1.36%   +0.80%   +1.64%   +1.37%
    baseline   +0.00%   +0.00%   +0.00%   +0.00%
Likewise, looking at runtime allocations we see that 80 gives slightly
better optimisation than the baseline:
    thresh      gmean      min      max   median
    50         +0.16%   -0.16%   +4.43%   +0.00%
    60         +0.09%   -0.00%   +3.10%   +0.00%
    70         +0.04%   -0.09%   +2.29%   +0.00%
    80         +0.02%   -1.17%   +2.29%   +0.00%
    90         -0.02%   -2.59%   +1.86%   +0.00%
    100        +0.00%   -2.59%   +7.51%   -0.00%
    baseline   +0.00%   +0.00%   +0.00%   +0.00%
Finally, I had to add a NOINLINE in T4306 to ensure that `upd` is
worker-wrappered as the test expects. This makes me wonder whether the
inlining heuristic is now too liberal as `upd` is quite a large
function. The same measure was taken in T12600.
Wall clock time compiling Cabal with -O0:
    thresh        50     60     70     80     90     100    baseline
    build-Cabal   93.88  89.58  92.59  90.09  100.26 94.81  89.13
Also, this change happens to avoid the spurious test output in
`plugin-recomp-change` and `plugin-recomp-change-prof` (see #17308).
Metric Decrease:
hie002
T12234
T13035
T13719
T14683
T4801
T5631
T5642
T9020
T9872d
T9961
Metric Increase:
T12150
T12425
T13701
T14697
T15426
T1969
T3064
T5837
T6048
T9203
T9872a
T9872b
T9872c
T9872d
haddock.Cabal
haddock.base
haddock.compiler
This patch is joint work of Alexis King and Simon PJ. It does some
significant refactoring of the type-class specialiser. Main highlights:
* We can specialise functions with types like
f :: Eq a => a -> Ord b => b => blah
where the classes aren't all at the front (#16473). Here we can
correctly specialise 'f' based on a call like
f @Int @Bool dEqInt x dOrdBool
  This change really happened in an earlier patch:
      commit 2d0cf6252957b8980d89481ecd0b79891da4b14b
      Author: Sandy Maguire <sandy@sandymaguire.me>
      Date:   Thu May 16 12:12:10 2019 -0400
  This new patch builds directly on that work and refactors it a bit.
* We can specialise functions with implicit parameters (#17930)
g :: (?foo :: Bool, Show a) => a -> String
Previously we could not, but now they behave just like a non-class
argument as in 'f' above.
* We can specialise under-saturated calls, where some (but not all) of
  the dictionary arguments are provided (#17966). For example, we can
specialise the above 'f' based on a call
map (f @Int dEqInt) xs
  even though we don't (and can't) give the Ord dictionary.
This may sound exotic, but #17966 is a program from the wild, and
showed significant perf loss for functions like f, if you need
saturation of all dictionaries.
* We fix a buglet in which a floated dictionary had a bogus demand
(#17810), by using zapIdDemandInfo in the NonRec case of specBind.
* A tiny side benefit: we can drop dead arguments to specialised
functions; see Note [Drop dead args from specialisations]
* Fixed a bug in deciding what dictionaries are "interesting"; see
Note [Keep the old dictionaries interesting]
This is all achieved by building on Sandy Maguire's work in
defining SpecArg, which mkCallUDs uses to describe the arguments of
the call. Main changes:
* Main work is in specHeader, which marches down the [InBndr] from the
  function definition and the [SpecArg] from the call site, together.
* specCalls no longer has an arity check; the entire mechanism now
  handles under-saturated calls fine.
* mkCallUDs decides on an argument-by-argument basis whether to
specialise a particular dictionary argument; this is new.
See mk_spec_arg in mkCallUDs.
It looks as if there are many more lines of code, but I think that
all the extra lines are comments!
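To recap the user-visible shapes in source form (a sketch with invented
bodies; only the signatures mirror the commit text):
```hs
{-# LANGUAGE ImplicitParams, RankNTypes #-}

-- Dictionaries not all at the front:
f :: Eq a => a -> Ord b => b -> Bool
f x = \y -> x == x && y <= y

-- Implicit parameters now specialise like ordinary dictionaries:
g :: (?foo :: Bool, Show a) => a -> String
g x = if ?foo then show x else ""

-- A fully saturated use; the specialiser can now also work from
-- under-saturated calls that pin down only some of the dictionaries.
useF :: Int -> Bool -> Bool
useF x = f x
```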
We were mistakenly measuring program stats.