summaryrefslogtreecommitdiff
path: root/compiler/GHC/Core/Opt/Pipeline.hs
diff options
context:
space:
mode:
authorSimon Peyton Jones <simonpj@microsoft.com>2020-08-24 14:36:57 +0100
committerSimon Peyton Jones <simonpj@microsoft.com>2020-09-21 16:15:14 +0100
commitff5a843e2003abed15f99d10eb1195cf9d572e06 (patch)
tree74e015a51530a05adb25daab808f97dd5b050a54 /compiler/GHC/Core/Opt/Pipeline.hs
parent9df77fed8918bb335874a584a829ee32325cefb5 (diff)
downloadhaskell-wip/T18223.tar.gz
Better eta-expansion (again) and don't specilise DFunswip/T18223
This patch fixes #18223, which made GHC generate an exponential amount of code. There are three quite separate changes in here 1. Re-engineer eta-expansion (again). The eta-expander was generating lots of intermediate stuff, which could be optimised away, but which choked the simplifier meanwhile. Relatively easy to kill it off at source. See Note [The EtaInfo mechanism] in GHC.Core.Opt.Arity. The main new thing is the use of pushCoArg in getArg_maybe. 2. Stop Specialise specalising DFuns. This is the cause of a huge (and utterly unnecessary) blowup in program size in #18223. See Note [Do not specialise DFuns] in GHC.Core.Opt.Specialise. I also refactored the Specialise monad a bit... it was silly, because it passed on unchanging values as if they were mutable state. 3. Do an extra Simplifer run, after SpecConstra and before late-Specialise. I found (investigating perf/compiler/T16473) that failing to do this was crippling *both* SpecConstr *and* Specialise. See Note [Simplify after SpecConstr] in GHC.Core.Opt.Pipeline. This change does mean an extra run of the Simplifier, but only with -O2, and I think that's acceptable. T16473 allocates *three* times less with this change. (I changed it to check runtime rather than compile time.) Some smaller consequences * I moved pushCoercion, pushCoArg and friends from SimpleOpt to Arity, because it was needed by the new etaInfoApp. And pushCoValArg now returns a MCoercion rather than Coercion for the argument Coercion. * A minor, incidental improvement to Core pretty-printing This does fix #18223, (which was otherwise uncompilable. Hooray. But there is still a big intermediate because there are some very deeply nested types in that program. Modest reductions in compile-time allocation on a couple of benchmarks T12425 -2.0% T13253 -10.3% Metric increase with -O2, due to extra simplifier run T9233 +5.8% T12227 +1.8% T15630 +5.0% There is a spurious apparent increase on heap residency on T9630, on some architectures at least. I tried it with -G1 and the residency is essentially unchanged. Metric Increase T9233 T12227 T9630 Metric Decrease T12425 T13253
Diffstat (limited to 'compiler/GHC/Core/Opt/Pipeline.hs')
-rw-r--r--compiler/GHC/Core/Opt/Pipeline.hs52
1 files changed, 39 insertions, 13 deletions
diff --git a/compiler/GHC/Core/Opt/Pipeline.hs b/compiler/GHC/Core/Opt/Pipeline.hs
index c3f2fc9f85..6d0712e634 100644
--- a/compiler/GHC/Core/Opt/Pipeline.hs
+++ b/compiler/GHC/Core/Opt/Pipeline.hs
@@ -315,33 +315,38 @@ getCoreToDo dflags
runWhen do_float_in CoreDoFloatInwards,
+ simplify "final", -- Final tidy-up
+
maybe_rule_check FinalPhase,
+ -------- After this we have -O2 passes -----------------
+ -- None of them run with -O
+
-- Case-liberation for -O2. This should be after
-- strictness analysis and the simplification which follows it.
- runWhen liberate_case (CoreDoPasses [
- CoreLiberateCase,
- simplify "post-liberate-case"
- ]), -- Run the simplifier after LiberateCase to vastly
- -- reduce the possibility of shadowing
- -- Reason: see Note [Shadowing] in GHC.Core.Opt.SpecConstr
+ runWhen liberate_case $ CoreDoPasses
+ [ CoreLiberateCase, simplify "post-liberate-case" ],
+ -- Run the simplifier after LiberateCase to vastly
+ -- reduce the possibility of shadowing
+ -- Reason: see Note [Shadowing] in GHC.Core.Opt.SpecConstr
- runWhen spec_constr CoreDoSpecConstr,
+ runWhen spec_constr $ CoreDoPasses
+ [ CoreDoSpecConstr, simplify "post-spec-constr"],
+ -- See Note [Simplify after SpecConstr]
maybe_rule_check FinalPhase,
- runWhen late_specialise
- (CoreDoPasses [ CoreDoSpecialising
- , simplify "post-late-spec"]),
+ runWhen late_specialise $ CoreDoPasses
+ [ CoreDoSpecialising, simplify "post-late-spec"],
-- LiberateCase can yield new CSE opportunities because it peels
-- off one layer of a recursive function (concretely, I saw this
-- in wheel-sieve1), and I'm guessing that SpecConstr can too
-- And CSE is a very cheap pass. So it seems worth doing here.
- runWhen ((liberate_case || spec_constr) && cse) CoreCSE,
+ runWhen ((liberate_case || spec_constr) && cse) $ CoreDoPasses
+ [ CoreCSE, simplify "post-final-cse" ],
- -- Final clean-up simplification:
- simplify "final",
+ --------- End of -O2 passes --------------
runWhen late_dmd_anal $ CoreDoPasses (
dmd_cpr_ww ++ [simplify "post-late-ww"]
@@ -410,6 +415,27 @@ or with -O0. Two reasons:
But watch out: list fusion can prevent floating. So use phase control
to switch off those rules until after floating.
+Note [Simplify after SpecConstr]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We want to run the simplifier after SpecConstr, and before late-Specialise,
+for two reasons, both shown up in test perf/compiler/T16473,
+with -O2 -flate-specialise
+
+1. I found that running late-Specialise after SpecConstr, with no
+ simplification in between meant that the carefullly constructed
+ SpecConstr rule never got to fire. (It was something like
+ lvl = f a -- Arity 1
+ ....g lvl....
+ SpecConstr specialised g for argument lvl; but Specialise then
+ specialised lvl = f a to lvl = $sf, and inlined. Or something like
+ that.)
+
+2. Specialise relies on unfoldings being available for top-level dictionary
+ bindings; but SpecConstr kills them all! The Simplifer restores them.
+
+This extra run of the simplifier has a cost, but this is only with -O2.
+
+
************************************************************************
* *
The CoreToDo interpreter