diff options
author | Simon Peyton Jones <simonpj@microsoft.com> | 2020-08-24 14:36:57 +0100 |
---|---|---|
committer | Simon Peyton Jones <simonpj@microsoft.com> | 2020-09-21 16:15:14 +0100 |
commit | ff5a843e2003abed15f99d10eb1195cf9d572e06 (patch) | |
tree | 74e015a51530a05adb25daab808f97dd5b050a54 /compiler/GHC/Core/Opt/Pipeline.hs | |
parent | 9df77fed8918bb335874a584a829ee32325cefb5 (diff) | |
download | haskell-wip/T18223.tar.gz |
Better eta-expansion (again) and don't specilise DFunswip/T18223
This patch fixes #18223, which made GHC generate an exponential
amount of code. There are three quite separate changes in here
1. Re-engineer eta-expansion (again). The eta-expander was
generating lots of intermediate stuff, which could be optimised
away, but which choked the simplifier meanwhile. Relatively
easy to kill it off at source.
See Note [The EtaInfo mechanism] in GHC.Core.Opt.Arity.
The main new thing is the use of pushCoArg in getArg_maybe.
2. Stop Specialise specalising DFuns. This is the cause of a huge
(and utterly unnecessary) blowup in program size in #18223.
See Note [Do not specialise DFuns] in GHC.Core.Opt.Specialise.
I also refactored the Specialise monad a bit... it was silly,
because it passed on unchanging values as if they were mutable
state.
3. Do an extra Simplifer run, after SpecConstra and before
late-Specialise. I found (investigating perf/compiler/T16473)
that failing to do this was crippling *both* SpecConstr *and*
Specialise. See Note [Simplify after SpecConstr] in
GHC.Core.Opt.Pipeline.
This change does mean an extra run of the Simplifier, but only
with -O2, and I think that's acceptable.
T16473 allocates *three* times less with this change. (I changed
it to check runtime rather than compile time.)
Some smaller consequences
* I moved pushCoercion, pushCoArg and friends from SimpleOpt
to Arity, because it was needed by the new etaInfoApp.
And pushCoValArg now returns a MCoercion rather than Coercion for
the argument Coercion.
* A minor, incidental improvement to Core pretty-printing
This does fix #18223, (which was otherwise uncompilable. Hooray. But
there is still a big intermediate because there are some very deeply
nested types in that program.
Modest reductions in compile-time allocation on a couple of benchmarks
T12425 -2.0%
T13253 -10.3%
Metric increase with -O2, due to extra simplifier run
T9233 +5.8%
T12227 +1.8%
T15630 +5.0%
There is a spurious apparent increase on heap residency on T9630,
on some architectures at least. I tried it with -G1 and the residency
is essentially unchanged.
Metric Increase
T9233
T12227
T9630
Metric Decrease
T12425
T13253
Diffstat (limited to 'compiler/GHC/Core/Opt/Pipeline.hs')
-rw-r--r-- | compiler/GHC/Core/Opt/Pipeline.hs | 52 |
1 files changed, 39 insertions, 13 deletions
diff --git a/compiler/GHC/Core/Opt/Pipeline.hs b/compiler/GHC/Core/Opt/Pipeline.hs index c3f2fc9f85..6d0712e634 100644 --- a/compiler/GHC/Core/Opt/Pipeline.hs +++ b/compiler/GHC/Core/Opt/Pipeline.hs @@ -315,33 +315,38 @@ getCoreToDo dflags runWhen do_float_in CoreDoFloatInwards, + simplify "final", -- Final tidy-up + maybe_rule_check FinalPhase, + -------- After this we have -O2 passes ----------------- + -- None of them run with -O + -- Case-liberation for -O2. This should be after -- strictness analysis and the simplification which follows it. - runWhen liberate_case (CoreDoPasses [ - CoreLiberateCase, - simplify "post-liberate-case" - ]), -- Run the simplifier after LiberateCase to vastly - -- reduce the possibility of shadowing - -- Reason: see Note [Shadowing] in GHC.Core.Opt.SpecConstr + runWhen liberate_case $ CoreDoPasses + [ CoreLiberateCase, simplify "post-liberate-case" ], + -- Run the simplifier after LiberateCase to vastly + -- reduce the possibility of shadowing + -- Reason: see Note [Shadowing] in GHC.Core.Opt.SpecConstr - runWhen spec_constr CoreDoSpecConstr, + runWhen spec_constr $ CoreDoPasses + [ CoreDoSpecConstr, simplify "post-spec-constr"], + -- See Note [Simplify after SpecConstr] maybe_rule_check FinalPhase, - runWhen late_specialise - (CoreDoPasses [ CoreDoSpecialising - , simplify "post-late-spec"]), + runWhen late_specialise $ CoreDoPasses + [ CoreDoSpecialising, simplify "post-late-spec"], -- LiberateCase can yield new CSE opportunities because it peels -- off one layer of a recursive function (concretely, I saw this -- in wheel-sieve1), and I'm guessing that SpecConstr can too -- And CSE is a very cheap pass. So it seems worth doing here. - runWhen ((liberate_case || spec_constr) && cse) CoreCSE, + runWhen ((liberate_case || spec_constr) && cse) $ CoreDoPasses + [ CoreCSE, simplify "post-final-cse" ], - -- Final clean-up simplification: - simplify "final", + --------- End of -O2 passes -------------- runWhen late_dmd_anal $ CoreDoPasses ( dmd_cpr_ww ++ [simplify "post-late-ww"] @@ -410,6 +415,27 @@ or with -O0. Two reasons: But watch out: list fusion can prevent floating. So use phase control to switch off those rules until after floating. +Note [Simplify after SpecConstr] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We want to run the simplifier after SpecConstr, and before late-Specialise, +for two reasons, both shown up in test perf/compiler/T16473, +with -O2 -flate-specialise + +1. I found that running late-Specialise after SpecConstr, with no + simplification in between meant that the carefullly constructed + SpecConstr rule never got to fire. (It was something like + lvl = f a -- Arity 1 + ....g lvl.... + SpecConstr specialised g for argument lvl; but Specialise then + specialised lvl = f a to lvl = $sf, and inlined. Or something like + that.) + +2. Specialise relies on unfoldings being available for top-level dictionary + bindings; but SpecConstr kills them all! The Simplifer restores them. + +This extra run of the simplifier has a cost, but this is only with -O2. + + ************************************************************************ * * The CoreToDo interpreter |