summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorSimon Peyton Jones <simon.peytonjones@gmail.com>2023-02-21 10:51:34 +0000
committerBen Gamari <ben@smart-cactus.org>2023-03-02 10:31:49 -0500
commitb73b70bfa976eb57f1c4ec0c0db97a5fab2bf9ff (patch)
treeb92f61bbab8d9dbcd63f545d99dc5db3b6118e61
parent5451b48c5affeb443e2ba1eef60a9c10e0a5842b (diff)
downloadhaskell-b73b70bfa976eb57f1c4ec0c0db97a5fab2bf9ff.tar.gz
Take more care with unlifted bindings in the specialiser
As #22998 showed, we were floating an unlifted binding to top level, which breaks a Core invariant. The fix is easy, albeit a little bit conservative. See Note [Care with unlifted bindings] in GHC.Core.Opt.Specialise (cherry picked from commit 7192ef91c855e1fae6997f75cfde76aafd0b4bcf)
-rw-r--r--compiler/GHC/Core.hs94
-rw-r--r--compiler/GHC/Core/Opt/Specialise.hs29
-rw-r--r--testsuite/tests/simplCore/should_run/T22998.hs10
-rw-r--r--testsuite/tests/simplCore/should_run/T22998.stdout1
-rw-r--r--testsuite/tests/simplCore/should_run/all.T1
5 files changed, 101 insertions, 34 deletions
diff --git a/compiler/GHC/Core.hs b/compiler/GHC/Core.hs
index f504a7cbd5..6ecd144988 100644
--- a/compiler/GHC/Core.hs
+++ b/compiler/GHC/Core.hs
@@ -366,39 +366,20 @@ a Coercion, (sym c).
Note [Core letrec invariant]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The right hand sides of all top-level and recursive @let@s
-/must/ be of lifted type (see "Type#type_classification" for
-the meaning of /lifted/ vs. /unlifted/).
+The Core letrec invariant:
-There is one exception to this rule, top-level @let@s are
-allowed to bind primitive string literals: see
-Note [Core top-level string literals].
+ The right hand sides of all
+ /top-level/ or /recursive/
+ bindings must be of lifted type
-Note [Core top-level string literals]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-As an exception to the usual rule that top-level binders must be lifted,
-we allow binding primitive string literals (of type Addr#) of type Addr# at the
-top level. This allows us to share string literals earlier in the pipeline and
-crucially allows other optimizations in the Core2Core pipeline to fire.
-Consider,
-
- f n = let a::Addr# = "foo"#
- in \x -> blah
-
-In order to be able to inline `f`, we would like to float `a` to the top.
-Another option would be to inline `a`, but that would lead to duplicating string
-literals, which we want to avoid. See #8472.
-
-The solution is simply to allow top-level unlifted binders. We can't allow
-arbitrary unlifted expression at the top-level though, unlifted binders cannot
-be thunks, so we just allow string literals.
+ There is one exception to this rule, top-level @let@s are
+ allowed to bind primitive string literals: see
+ Note [Core top-level string literals].
-We allow the top-level primitive string literals to be wrapped in Ticks
-in the same way they can be wrapped when nested in an expression.
-CoreToSTG currently discards Ticks around top-level primitive string literals.
-See #14779.
+See "Type#type_classification" in GHC.Core.Type
+for the meaning of "lifted" vs. "unlifted").
-Also see Note [Compilation plan for top-level string literals].
+For the non-top-level, non-recursive case see Note [Core let-can-float invariant].
Note [Compilation plan for top-level string literals]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -420,14 +401,16 @@ parts of the compilation pipeline.
the _bytes suffix.
Note [Core let-can-float invariant]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The let-can-float invariant:
- The right hand side of a non-recursive 'Let'
- /may/ be of unlifted type, but only if
+ The right hand side of a /non-top-level/, /non-recursive/ binding
+ may be of unlifted type, but only if
the expression is ok-for-speculation
or the 'Let' is for a join point.
+ (For top-level or recursive lets see Note [Core letrec invariant].)
+
This means that the let can be floated around
without difficulty. For example, this is OK:
@@ -466,6 +449,53 @@ we need to allow lots of things in the arguments of a call.
TL;DR: we relaxed the let/app invariant to become the let-can-float invariant.
+Note [Core top-level string literals]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+As an exception to the usual rule that top-level binders must be lifted,
+we allow binding primitive string literals (of type Addr#) of type Addr# at the
+top level. This allows us to share string literals earlier in the pipeline and
+crucially allows other optimizations in the Core2Core pipeline to fire.
+Consider,
+
+ f n = let a::Addr# = "foo"#
+ in \x -> blah
+
+In order to be able to inline `f`, we would like to float `a` to the top.
+Another option would be to inline `a`, but that would lead to duplicating string
+literals, which we want to avoid. See #8472.
+
+The solution is simply to allow top-level unlifted binders. We can't allow
+arbitrary unlifted expression at the top-level though, unlifted binders cannot
+be thunks, so we just allow string literals.
+
+We allow the top-level primitive string literals to be wrapped in Ticks
+in the same way they can be wrapped when nested in an expression.
+CoreToSTG currently discards Ticks around top-level primitive string literals.
+See #14779.
+
+Also see Note [Compilation plan for top-level string literals].
+
+Note [Compilation plan for top-level string literals]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Here is a summary on how top-level string literals are handled by various
+parts of the compilation pipeline.
+
+* In the source language, there is no way to bind a primitive string literal
+ at the top level.
+
+* In Core, we have a special rule that permits top-level Addr# bindings. See
+ Note [Core top-level string literals]. Core-to-core passes may introduce
+ new top-level string literals.
+
+ See GHC.Core.Utils.exprIsTopLevelBindable, and exprIsTickedString
+
+* In STG, top-level string literals are explicitly represented in the syntax
+ tree.
+
+* A top-level string literal may end up exported from a module. In this case,
+ in the object file, the content of the exported literal is given a label with
+ the _bytes suffix.
+
Note [Case expression invariants]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Case expressions are one of the more complicated elements of the Core
diff --git a/compiler/GHC/Core/Opt/Specialise.hs b/compiler/GHC/Core/Opt/Specialise.hs
index f028bd428e..f48aeb50d7 100644
--- a/compiler/GHC/Core/Opt/Specialise.hs
+++ b/compiler/GHC/Core/Opt/Specialise.hs
@@ -27,7 +27,7 @@ import GHC.Core
import GHC.Core.Make ( mkLitRubbish )
import GHC.Core.Unify ( tcMatchTy )
import GHC.Core.Rules
-import GHC.Core.Utils ( exprIsTrivial
+import GHC.Core.Utils ( exprIsTrivial, exprIsTopLevelBindable
, mkCast, exprType
, stripTicksTop, mkInScopeSetBndrs )
import GHC.Core.FVs
@@ -1515,7 +1515,10 @@ specBind top_lvl env (NonRec fn rhs) do_body
= [mkDB $ NonRec b r | (b,r) <- pairs]
++ fromOL dump_dbs
- ; if float_all then
+ can_float_this_one = exprIsTopLevelBindable rhs (idType fn)
+ -- exprIsTopLevelBindable: see Note [Care with unlifted bindings]
+
+ ; if float_all && can_float_this_one then
-- Rather than discard the calls mentioning the bound variables
-- we float this (dictionary) binding along with the others
return ([], body', all_free_uds `snocDictBinds` final_binds)
@@ -1876,6 +1879,28 @@ even in the case that @x = False@! Instead, we add a dummy 'Void#' argument to
the specialisation '$sfInt' ($sfInt :: Void# -> Array# Int) in order to
preserve laziness.
+Note [Care with unlifted bindings]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider (#22998)
+ f x = let x::ByteArray# = <some literal>
+ n::Natural = NB x
+ in wombat @192827 (n |> co)
+where
+ co :: Natural ~ KnownNat 192827
+ wombat :: forall (n:Nat). KnownNat n => blah
+
+Left to itself, the specialiser would float the bindings for `x` and `n` to top
+level, so we can specialise `wombat`. But we can't have a top-level ByteArray#
+(see Note [Core letrec invariant] in GHC.Core). Boo.
+
+This is pretty exotic, so we take a simple way out: in specBind (the NonRec
+case) do not float the binding itself unless it satisfies exprIsTopLevelBindable.
+This is conservative: maybe the RHS of `x` has a free var that would stop it
+floating to top level anyway; but that is hard to spot (since we don't know what
+the non-top-level in-scope binders are) and rare (since the binding must satisfy
+Note [Core let-can-float invariant] in GHC.Core).
+
+
Note [Specialising Calls]
~~~~~~~~~~~~~~~~~~~~~~~~~
Suppose we have a function with a complicated type:
diff --git a/testsuite/tests/simplCore/should_run/T22998.hs b/testsuite/tests/simplCore/should_run/T22998.hs
new file mode 100644
index 0000000000..459f576d82
--- /dev/null
+++ b/testsuite/tests/simplCore/should_run/T22998.hs
@@ -0,0 +1,10 @@
+{-# LANGUAGE DataKinds #-}
+module Main where
+
+import Data.Proxy (Proxy(Proxy))
+import GHC.TypeLits (natVal)
+
+main :: IO ()
+main = print x
+ where
+ x = natVal @18446744073709551616 Proxy + natVal @18446744073709551616 Proxy
diff --git a/testsuite/tests/simplCore/should_run/T22998.stdout b/testsuite/tests/simplCore/should_run/T22998.stdout
new file mode 100644
index 0000000000..1ce484120a
--- /dev/null
+++ b/testsuite/tests/simplCore/should_run/T22998.stdout
@@ -0,0 +1 @@
+36893488147419103232
diff --git a/testsuite/tests/simplCore/should_run/all.T b/testsuite/tests/simplCore/should_run/all.T
index a887d5cedb..2de90b6541 100644
--- a/testsuite/tests/simplCore/should_run/all.T
+++ b/testsuite/tests/simplCore/should_run/all.T
@@ -107,3 +107,4 @@ test('T21229', normal, compile_and_run, ['-O'])
test('T21575', normal, compile_and_run, ['-O'])
test('T21575b', [], multimod_compile_and_run, ['T21575b', '-O'])
test('T20836', normal, compile_and_run, ['-O0']) # Should not time out; See #20836
+test('T22998', normal, compile_and_run, ['-O0 -fspecialise -dcore-lint'])