author     Matthew Pickering <matthewtpickering@gmail.com>   2022-01-05 17:21:28 +0000
committer  Marge Bot <ben+marge-bot@smart-cactus.org>        2022-01-07 18:25:06 -0500
commit     7b783c9da649899bdce34b6e746fa8704b667f28 (patch)
tree       ddfd706deaa75b307b4a1ad1e220fc50284d8ef1
parent     978ea35e37b49ffde28b0536e44362b66f3187b4 (diff)
Thoughtful forcing in CoreUnfolding
We noticed that the structure of CoreUnfolding could lead to double the number of
CoreExprs being retained, in the situation where the template but not all of the
predicates were forced. This observation was then confirmed using ghc-debug:

```
(["ghc:GHC.Core:App","ghc-prim:GHC.Types:True","THUNK_1_0","THUNK_1_0","THUNK_1_0"],Count 237)
(["ghc:GHC.Core:App","ghc-prim:GHC.Types:True","THUNK_1_0","THUNK_1_0","ghc-prim:GHC.Types:True"],Count 1)
(["ghc:GHC.Core:Case","ghc-prim:GHC.Types:True","THUNK_1_0","THUNK_1_0","THUNK_1_0"],Count 12)
(["ghc:GHC.Core:Cast","ghc-prim:GHC.Types:True","THUNK_1_0","THUNK_1_0","BLACKHOLE"],Count 1)
(["ghc:GHC.Core:Cast","ghc-prim:GHC.Types:True","THUNK_1_0","THUNK_1_0","THUNK_1_0"],Count 78)
(["ghc:GHC.Core:Cast","ghc-prim:GHC.Types:True","THUNK_1_0","ghc-prim:GHC.Types:False","THUNK_1_0"],Count 1)
(["ghc:GHC.Core:Cast","ghc-prim:GHC.Types:True","ghc-prim:GHC.Types:False","THUNK_1_0","THUNK_1_0"],Count 3)
(["ghc:GHC.Core:Cast","ghc-prim:GHC.Types:True","ghc-prim:GHC.Types:True","THUNK_1_0","THUNK_1_0"],Count 1)
(["ghc:GHC.Core:Lam","ghc-prim:GHC.Types:True","THUNK_1_0","THUNK_1_0","BLACKHOLE"],Count 31)
(["ghc:GHC.Core:Lam","ghc-prim:GHC.Types:True","THUNK_1_0","THUNK_1_0","THUNK_1_0"],Count 4307)
(["ghc:GHC.Core:Lam","ghc-prim:GHC.Types:True","THUNK_1_0","THUNK_1_0","ghc-prim:GHC.Types:True"],Count 6)
(["ghc:GHC.Core:Let","ghc-prim:GHC.Types:True","THUNK_1_0","THUNK_1_0","THUNK_1_0"],Count 29)
(["ghc:GHC.Core:Lit","ghc-prim:GHC.Types:True","THUNK_1_0","THUNK_1_0","ghc-prim:GHC.Types:True"],Count 1)
(["ghc:GHC.Core:Tick","ghc-prim:GHC.Types:True","THUNK_1_0","THUNK_1_0","THUNK_1_0"],Count 36)
(["ghc:GHC.Core:Var","ghc-prim:GHC.Types:True","THUNK_1_0","THUNK_1_0","THUNK_1_0"],Count 1)
(["ghc:GHC.Core:Var","ghc-prim:GHC.Types:True","ghc-prim:GHC.Types:False","THUNK_1_0","THUNK_1_0"],Count 6)
(["ghc:GHC.Core:Var","ghc-prim:GHC.Types:True","ghc-prim:GHC.Types:False","ghc-prim:GHC.Types:True","THUNK_1_0"],Count 2)
```

Here we can see that the first argument is forced but there are still thunks remaining
which retain the old expr.

For my test case (a very big module, peak of 3,000,000 core terms) this reduced peak
memory usage by 1G (12G -> 11G).

Fixes #20905
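To make the retention pattern concrete, here is a minimal standalone sketch of the
problem. The Expr, Unfolding, analyse and isValue names below are hypothetical
stand-ins for CoreExpr, CoreUnfolding, occurAnalyseExpr and the exprIs* predicates,
not GHC's real definitions:

```
-- Toy model of the leak: one field holds an analysed copy of the expression,
-- the other field holds a lazy predicate over the *original* expression.
data Expr = Lit Int | App Expr Expr

data Unfolding = Unfolding
  { uf_tmpl     :: Expr   -- analysed copy of the expression
  , uf_is_value :: Bool   -- predicate computed from the original expression
  }

analyse :: Expr -> Expr                          -- stand-in for occurAnalyseExpr:
analyse (App f x) = App (analyse f) (analyse x)  -- rebuilds the spine, so the result
analyse e         = e                            -- does not share it with the input

isValue :: Expr -> Bool                          -- stand-in for exprIsHNF
isValue (Lit _) = True
isValue _       = False

-- Problematic construction: if uf_tmpl is later forced but uf_is_value is not,
-- the unforced predicate thunk still points at 'expr', so both the old and the
-- analysed expression stay alive.
mkUnfoldingLeaky :: Expr -> Unfolding
mkUnfoldingLeaky expr =
  Unfolding { uf_tmpl     = analyse expr
            , uf_is_value = isValue expr }
```

In this shape, forcing uf_tmpl materialises the analysed copy while the unforced
uf_is_value thunk still retains the original expr, which is exactly the two-copies-alive
situation the census above shows; the patch below avoids it by computing the predicates
first and forcing them inside the template thunk.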
-rw-r--r--  compiler/GHC/Core/Unfold/Make.hs  65
1 file changed, 60 insertions, 5 deletions
diff --git a/compiler/GHC/Core/Unfold/Make.hs b/compiler/GHC/Core/Unfold/Make.hs
index dd0a0b968a..44aa6ba1db 100644
--- a/compiler/GHC/Core/Unfold/Make.hs
+++ b/compiler/GHC/Core/Unfold/Make.hs
@@ -301,14 +301,28 @@ mkCoreUnfolding :: UnfoldingSource -> Bool -> CoreExpr
-> UnfoldingGuidance -> Unfolding
-- Occurrence-analyses the expression before capturing it
mkCoreUnfolding src top_lvl expr guidance
- = CoreUnfolding { uf_tmpl = occurAnalyseExpr expr,
+ =
+
+ let is_value = exprIsHNF expr
+ is_conlike = exprIsConLike expr
+ is_work_free = exprIsWorkFree expr
+ is_expandable = exprIsExpandable expr
+ in
+ -- See #20905 for what is going on here. We are careful to make sure we only
+ -- have one copy of an unfolding around at once.
+ -- Note [Thoughtful forcing in mkCoreUnfolding]
+ CoreUnfolding { uf_tmpl = is_value `seq`
+ is_conlike `seq`
+ is_work_free `seq`
+ is_expandable `seq`
+ occurAnalyseExpr expr,
-- See Note [Occurrence analysis of unfoldings]
uf_src = src,
uf_is_top = top_lvl,
- uf_is_value = exprIsHNF expr,
- uf_is_conlike = exprIsConLike expr,
- uf_is_work_free = exprIsWorkFree expr,
- uf_expandable = exprIsExpandable expr,
+ uf_is_value = is_value,
+ uf_is_conlike = is_conlike,
+ uf_is_work_free = is_work_free,
+ uf_expandable = is_expandable,
uf_guidance = guidance }
----------------
@@ -399,4 +413,45 @@ even though we have a stable inlining, so that strictness w/w takes
place. It makes a big difference to efficiency, and the w/w pass knows
how to transfer the INLINABLE info to the worker; see WorkWrap
Note [Worker/wrapper for INLINABLE functions]
+
+Note [Thoughtful forcing in mkCoreUnfolding]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Core expressions retained in unfoldings are one of the biggest uses of memory when compiling
+a program. Therefore we have to be careful about retaining copies of old or redundant
+templates (see !6202 for a particularly bad case).
+
+With that in mind we want to maintain the invariant that each unfolding only references
+a single CoreExpr. One place where we have to be careful is in mkCoreUnfolding.
+
+* The template of the unfolding is the result of performing occurrence analysis
+ (Note [Occurrence analysis of unfoldings])
+* Predicates are applied to the unanalysed expression
+
+Therefore, if we are not thoughtful about forcing, we can end up in a situation where the
+template is forced but not all the predicates are, so the unfolding will retain
+both the old and the analysed expression.
+
+I investigated this using ghc-debug and it was clear that this situation did often arise:
+
+```
+(["ghc:GHC.Core:Lam","ghc-prim:GHC.Types:True","THUNK_1_0","THUNK_1_0","THUNK_1_0"],Count 4307)
+```
+
+Here the predicates are unforced but the template is forced.
+
+Therefore we basically had two options to fix this:
+
+1. Compute the predicates on the analysed expression.
+2. Force the predicates when we force the template, removing the last retainers of the old expression.
+
+Option 1 is bad because occurrence analysis is expensive and destroys any sharing of the unfolding
+with the actual program. (Testing this approach showed a peak of 25G memory usage.)
+
+Therefore we went for Option 2, which performs a little more work but compensates by
+reducing memory pressure.
+
+Fixing this led to a 1G reduction in peak memory usage (12G -> 11G) when
+compiling a very large module (peak of 3 million terms). For more discussion see #20905.
-}
+
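As a companion to the Note, here is the toy construction from the sketch near the top of
this page rewritten in the style the patch adopts (Option 2): the predicates are computed
once and forced, via seq, inside the template thunk, so that demanding uf_tmpl also drops
the last references to the original expression. The names are again the hypothetical
stand-ins from that sketch, not GHC's real code:

```
-- Sketch of Option 2, reusing the toy Expr/Unfolding/analyse/isValue
-- definitions from the earlier sketch.
mkUnfoldingForced :: Expr -> Unfolding
mkUnfoldingForced expr =
  let is_value = isValue expr           -- single shared thunk over the original expr
  in Unfolding
       { uf_tmpl     = is_value `seq`   -- forcing the template first forces the
                       analyse expr     -- predicate, dropping its hold on 'expr'
       , uf_is_value = is_value         -- shares that thunk; it is a plain Bool
       }                                -- once uf_tmpl has been demanded
```

This mirrors the patched mkCoreUnfolding above: the predicates cost a little extra work
to evaluate, but once the template is demanded the unfolding no longer retains two copies
of the expression.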