author     Simon Peyton Jones <simonpj@microsoft.com>        2021-07-23 23:57:01 +0100
committer  Matthew Pickering <matthewtpickering@gmail.com>   2022-05-27 15:16:30 +0100
commit     30d4cf24dc8098b140893992863c31e3a1937d26 (patch)
tree       ca979794290acbe5b298e313d574fb807c2a8fbb
parent     6cfe39b9d325b0656c74de0b5f5ff039ced98059 (diff)
download   haskell-wip/T18993b.tar.gz

A bunch of changes related to eta reduction (wip/T18993b)
This is a large collection of changes all relating to eta reduction,
originally triggered by #18993, but there followed a long saga.

Specifics:

* Move state-hack stuff from GHC.Types.Id (where it never belonged) to
  GHC.Core.Opt.Arity (which seems much more appropriate).

* Add a crucial mkCast in the Cast case of GHC.Core.Opt.Arity.eta_expand;
  helps with T18223

* Add clarifying notes about eta-reducing to PAPs.
  See Note [Do not eta reduce PAPs]

* I moved tryEtaReduce from GHC.Core.Utils to GHC.Core.Opt.Arity,
  where it properly belongs.  See Note [Eta reduce PAPs]

* In GHC.Core.Opt.Simplify.Utils.tryEtaExpandRhs, pull out the code for
  when eta-expansion is wanted, to make wantEtaExpansion, and call that
  same function in GHC.Core.Opt.Simplify.simplStableUnfolding.  It was
  previously inconsistent, but it's doing the same thing.

* I did a substantial refactor of ArityType; see Note [ArityType].
  This allowed me to do away with the somewhat mysterious takeOneShots;
  more generally it allows arityType to describe the function, leaving
  its clients to decide how to use that information.

  I made ArityType abstract, so that clients have to use functions to
  access it.

* Make GHC.Core.Opt.Simplify.Utils.rebuildLam (was stupidly called
  mkLam before) aware of the floats that the simplifier builds up, so
  that it can still do eta-reduction even if there are some floats.
  (Previously that would not happen.)  That means passing the floats
  to rebuildLam, and an extra check when eta-reducing (etaFloatOk).

* In GHC.Core.Opt.Simplify.Utils.tryEtaExpandRhs, make use of call-info
  in the idDemandInfo of the binder, as well as the CallArity info.
  The occurrence analyser did this but we were failing to take
  advantage here.  In the end I moved the heavy lifting to
  GHC.Core.Opt.Arity.findRhsArity; see
  Note [Combining arityType with demand info], and functions
  idDemandOneShots and combineWithDemandOneShots.
  (These changes partly drove my refactoring of ArityType.)

* In GHC.Core.Opt.Arity.findRhsArity

  * I'm now taking account of the demand on the binder to give extra
    one-shot info.  E.g. if the fn is always called with two args, we
    can give better one-shot info on the binders than if we just look
    at the RHS.

  * Don't do any fixpointing in the non-recursive case -- simple
    short cut.

  * Trim arity inside the loop.  See Note [Trim arity inside the loop]

* Make SimpleOpt respect the eta-reduction flag
  (Some associated refactoring here.)

* I made the CallCtxt which the Simplifier uses distinguish between
  recursive and non-recursive right-hand sides.
      data CallCtxt = ... | RhsCtxt RecFlag | ...
  It affects only one thing:
    - We call an RHS context interesting only if it is non-recursive;
      see Note [RHS of lets] in GHC.Core.Unfold

* Remove eta-reduction in GHC.CoreToStg.Prep, a welcome simplification.
  See Note [No eta reduction needed in rhsToBody] in GHC.CoreToStg.Prep.

Other incidental changes

* Fix a fairly long-standing outright bug in the ApplyToVal case of
  GHC.Core.Opt.Simplify.mkDupableContWithDmds.  I was failing to take
  the tail of 'dmds' in the recursive call, which meant the demands
  were All Wrong.  I have no idea why this has not caused problems
  before now.

* Delete dead function GHC.Core.Opt.Simplify.Utils.contIsRhsOrArg

Metrics: compile_time/bytes allocated

Test                                   Metric        Baseline      New value  Change
---------------------------------------------------------------------------------------
MultiLayerModulesTH_OneShot(normal)    ghc/alloc  2,743,297,692  2,619,762,992   -4.5% GOOD
T18223(normal)                         ghc/alloc  1,103,161,360    972,415,992  -11.9% GOOD
T3064(normal)                          ghc/alloc    201,222,500    184,085,360   -8.5% GOOD
T8095(normal)                          ghc/alloc  3,216,292,528  3,254,416,960   +1.2%
T9630(normal)                          ghc/alloc  1,514,131,032  1,557,719,312   +2.9% BAD
parsing001(normal)                     ghc/alloc    530,409,812    525,077,696   -1.0%

geo. mean                                                                        -0.1%

Nofib:

Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
banner           +0.0%     +0.4%     -8.9%     -8.7%      0.0%
exact-reals      +0.0%     -7.4%    -36.3%    -37.4%      0.0%
fannkuch-redux   +0.0%     -0.1%     -1.0%     -1.0%      0.0%
fft2             -0.1%     -0.2%    -17.8%    -19.2%      0.0%
fluid            +0.0%     -1.3%     -2.1%     -2.1%      0.0%
gg               -0.0%     +2.2%     -0.2%     -0.1%      0.0%
spectral-norm    +0.1%     -0.2%      0.0%      0.0%      0.0%
tak              +0.0%     -0.3%     -9.8%     -9.8%      0.0%
x2n1             +0.0%     -0.2%     -3.2%     -3.2%      0.0%
--------------------------------------------------------------------------------
Min              -3.5%     -7.4%    -58.7%    -59.9%      0.0%
Max              +0.1%     +2.2%    +32.9%    +32.9%      0.0%
Geometric Mean   -0.0%     -0.1%    -14.2%    -14.8%     -0.0%

Metric Decrease:
    MultiLayerModulesTH_OneShot
    T18223
    T3064
    T15185
    T14766
Metric Increase:
    T9630
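
To make the two transformations discussed above concrete, here is a minimal
Haskell sketch (not GHC code; the names 'etaExpanded' and 'etaReduced' are
made up for illustration) of eta expansion and eta reduction at the source
level:

    -- Eta expansion: make the arity manifest by wrapping the head in
    -- explicit lambdas for the arguments it can absorb.
    etaExpanded :: (Int -> Int -> Int) -> Int -> Int -> Int
    etaExpanded f = \x y -> f x y     -- two manifest lambdas

    -- Eta reduction: the inverse step, dropping trailing lambdas whose
    -- binders are passed straight through to the head.
    etaReduced :: (Int -> Int -> Int) -> Int -> Int -> Int
    etaReduced f = f                  -- \x y -> f x y  ===>  f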
-rw-r--r--  compiler/GHC/Builtin/PrimOps/Ids.hs | 5
-rw-r--r--  compiler/GHC/Core/Lint.hs | 11
-rw-r--r--  compiler/GHC/Core/Opt/Arity.hs | 1273
-rw-r--r--  compiler/GHC/Core/Opt/CallArity.hs | 9
-rw-r--r--  compiler/GHC/Core/Opt/FloatIn.hs | 3
-rw-r--r--  compiler/GHC/Core/Opt/OccurAnal.hs | 4
-rw-r--r--  compiler/GHC/Core/Opt/SetLevels.hs | 10
-rw-r--r--  compiler/GHC/Core/Opt/Simplify.hs | 120
-rw-r--r--  compiler/GHC/Core/Opt/Simplify/Env.hs | 14
-rw-r--r--  compiler/GHC/Core/Opt/Simplify/Utils.hs | 306
-rw-r--r--  compiler/GHC/Core/Opt/WorkWrap.hs | 9
-rw-r--r--  compiler/GHC/Core/Ppr.hs | 4
-rw-r--r--  compiler/GHC/Core/SimpleOpt.hs | 32
-rw-r--r--  compiler/GHC/Core/Unfold.hs | 28
-rw-r--r--  compiler/GHC/Core/Utils.hs | 376
-rw-r--r--  compiler/GHC/CoreToIface.hs | 4
-rw-r--r--  compiler/GHC/CoreToStg/Prep.hs | 82
-rw-r--r--  compiler/GHC/Driver/Config.hs | 1
-rw-r--r--  compiler/GHC/Iface/Tidy.hs | 21
-rw-r--r--  compiler/GHC/StgToCmm/Bind.hs | 1
-rw-r--r--  compiler/GHC/Types/Demand.hs | 18
-rw-r--r--  compiler/GHC/Types/Id.hs | 64
-rw-r--r--  compiler/GHC/Types/Id/Info.hs | 1
-rw-r--r--  compiler/GHC/Types/Id/Make.hs | 9
-rw-r--r--  testsuite/tests/arityanal/should_compile/Arity03.stderr | 12
-rw-r--r--  testsuite/tests/arityanal/should_compile/Arity11.stderr | 3
-rw-r--r--  testsuite/tests/codeGen/should_compile/debug.stdout | 1
-rw-r--r--  testsuite/tests/driver/inline-check.stderr | 4
-rw-r--r--  testsuite/tests/numeric/should_compile/T19641.stderr | 8
-rw-r--r--  testsuite/tests/profiling/should_run/T2552.prof.sample | 50
-rw-r--r--  testsuite/tests/profiling/should_run/all.T | 4
-rw-r--r--  testsuite/tests/profiling/should_run/ioprof.prof.sample | 80
-rw-r--r--  testsuite/tests/simplCore/should_compile/Makefile | 4
-rw-r--r--  testsuite/tests/simplCore/should_compile/T16254.hs | 11
-rw-r--r--  testsuite/tests/simplCore/should_compile/T5327.hs | 11
-rw-r--r--  testsuite/tests/simplCore/should_compile/T7785.stderr | 603
-rw-r--r--  testsuite/tests/simplCore/should_compile/all.T | 11
-rw-r--r--  testsuite/tests/simplCore/should_run/T18012.hs | 6
-rw-r--r--  testsuite/tests/simplCore/should_run/T19569a.hs | 5
-rw-r--r--  testsuite/tests/simplCore/should_run/all.T | 2
-rw-r--r--  testsuite/tests/stranal/should_compile/EtaExpansion.hs | 13
-rw-r--r--  testsuite/tests/stranal/should_compile/T18894b.hs | 12
-rw-r--r--  testsuite/tests/stranal/should_compile/all.T | 1
-rw-r--r--  testsuite/tests/stranal/sigs/BottomFromInnerLambda.stderr | 6
-rw-r--r--  testsuite/tests/stranal/sigs/T20746.stderr | 2
-rw-r--r--  utils/genprimopcode/Main.hs | 37
46 files changed, 1820 insertions, 1471 deletions
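
The findRhsArity changes below implement a small fixpoint loop for recursive
bindings, with a short cut for non-recursive ones and arity trimming inside
the loop. As a rough sketch of that loop's shape (not GHC code; 'analyse' is
a stand-in for re-analysing the RHS under the current arity assumption):

    type Arity = Int

    -- Sketch only: for a recursive binding, re-analyse the right-hand side
    -- under the current assumption until the result is stable.  The common
    -- case is that the first step already agrees with the old arity.
    fixpointArity :: (Arity -> Arity) -> Arity -> Arity
    fixpointArity analyse start = go start
      where
        go cur
          | next == cur = cur
          | otherwise   = go next
          where
            next = analyse cur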
diff --git a/compiler/GHC/Builtin/PrimOps/Ids.hs b/compiler/GHC/Builtin/PrimOps/Ids.hs
index 9c6984a018..6d50911ad0 100644
--- a/compiler/GHC/Builtin/PrimOps/Ids.hs
+++ b/compiler/GHC/Builtin/PrimOps/Ids.hs
@@ -9,7 +9,7 @@ import GHC.Prelude
-- primop rules are attached to primop ids
import {-# SOURCE #-} GHC.Core.Opt.ConstantFold (primOpRules)
-import GHC.Core.Type (mkForAllTys, mkVisFunTysMany)
+import GHC.Core.Type (mkForAllTys, mkVisFunTysMany, argsHaveFixedRuntimeRep )
import GHC.Core.FVs (mkRuleInfo)
import GHC.Builtin.PrimOps
@@ -38,7 +38,8 @@ mkPrimOpId prim_op
name = mkWiredInName gHC_PRIM (primOpOcc prim_op)
(mkPrimOpIdUnique (primOpTag prim_op))
(AnId id) UserSyntax
- id = mkGlobalId (PrimOpId prim_op) name ty info
+ id = mkGlobalId (PrimOpId prim_op lev_poly) name ty info
+ lev_poly = not (argsHaveFixedRuntimeRep ty)
-- PrimOps don't ever construct a product, but we want to preserve bottoms
cpr
diff --git a/compiler/GHC/Core/Lint.hs b/compiler/GHC/Core/Lint.hs
index 6fcd8aca96..0511a4004d 100644
--- a/compiler/GHC/Core/Lint.hs
+++ b/compiler/GHC/Core/Lint.hs
@@ -667,8 +667,8 @@ lintLetBind top_lvl rec_flag binder rhs rhs_ty
-- exceeds idArity, but that is an unnecessary complication, see
-- Note [idArity varies independently of dmdTypeDepth] in GHC.Core.Opt.DmdAnal
- -- Check that the binder's arity is within the bounds imposed by
- -- the type and the strictness signature. See Note [exprArity invariant]
+ -- Check that the binder's arity is within the bounds imposed by the type
+ -- and the strictness signature. See Note [Arity invariants for bindings]
-- and Note [Trimming arity]
; checkL (typeArity (idType binder) >= idArity binder)
@@ -677,6 +677,7 @@ lintLetBind top_lvl rec_flag binder rhs rhs_ty
ppr (typeArity (idType binder)) <> colon <+>
ppr binder)
+ -- See Note [Check arity on bottoming functions]
; case splitDmdSig (idDmdSig binder) of
(demands, result_info) | isDeadEndDiv result_info ->
checkL (demands `lengthAtLeast` idArity binder)
@@ -761,6 +762,12 @@ lintIdUnfolding _ _ _
-- to exponential behaviour; c.f. GHC.Core.FVs.idUnfoldingVars
{-
+Note [Check arity on bottoming functions]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If a function has a strictness signature like [S]b, it claims to
+return bottom when applied to one argument. So its arity should not
+be greater than 1! We check this claim in Lint.
+
Note [Checking for INLINE loop breakers]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It's very suspicious if a strong loop breaker is marked INLINE.
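
The new Note [Check arity on bottoming functions] above is easiest to see with
a tiny example (a sketch, not part of the patch): a function whose strictness
signature says it hits bottom after one argument must not be given arity
greater than 1.

    -- 'boom' diverges as soon as it receives one argument, so its arity
    -- should be at most 1.  Eta-expanding it to \x y -> boom x y would
    -- wrongly suggest that (boom x) is a function value that can be
    -- applied later without diverging.
    boom :: Int -> a
    boom x = error (show x)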
diff --git a/compiler/GHC/Core/Opt/Arity.hs b/compiler/GHC/Core/Opt/Arity.hs
index 811beb6c0a..5858ff91e0 100644
--- a/compiler/GHC/Core/Opt/Arity.hs
+++ b/compiler/GHC/Core/Opt/Arity.hs
@@ -11,16 +11,29 @@
-- | Arity and eta expansion
module GHC.Core.Opt.Arity
- ( manifestArity, joinRhsArity, exprArity
- , typeArity, typeOneShots
- , exprEtaExpandArity, findRhsArity
- , etaExpand, etaExpandAT
- , exprBotStrictness_maybe
+ ( -- Finding arity
+ manifestArity, joinRhsArity, exprArity
+ , findRhsArity, exprBotStrictness_maybe
, ArityOpts(..)
+ -- ** Eta expansion
+ , exprEtaExpandArity, etaExpand, etaExpandAT
+
+ -- ** Eta reduction
+ , tryEtaReduce
+
-- ** ArityType
- , ArityType(..), mkBotArityType, mkTopArityType, expandableArityType
- , arityTypeArity, maxWithArity, minWithArity, idArityType
+ , ArityType, mkBotArityType, mkManifestArityType
+ , arityTypeArity, idArityType, getBotArity
+
+ -- ** typeArity and the state hack
+ , typeArity, typeOneShots, typeOneShot
+ , isOneShotBndr
+ , isStateHackType
+
+ -- * Lambdas
+ , zapLamBndrs
+
-- ** Join points
, etaExpandToJoinPoint, etaExpandToJoinPointRule
@@ -39,7 +52,7 @@ import GHC.Core.Utils
import GHC.Core.DataCon
import GHC.Core.TyCon ( tyConArity )
import GHC.Core.TyCon.RecWalk ( initRecTc, checkRecTc )
-import GHC.Core.Predicate ( isDictTy, isCallStackPredTy )
+import GHC.Core.Predicate ( isDictTy, isEvVar, isCallStackPredTy )
import GHC.Core.Multiplicity
-- We have two sorts of substitution:
@@ -50,17 +63,19 @@ import GHC.Core.Type as Type
import GHC.Core.Coercion as Type
import GHC.Types.Demand
-import GHC.Types.Var
-import GHC.Types.Var.Env
import GHC.Types.Id
+import GHC.Types.Var.Env
import GHC.Types.Var.Set
import GHC.Types.Basic
import GHC.Types.Tickish
+import GHC.Builtin.Types.Prim
import GHC.Builtin.Uniques
+
import GHC.Data.FastString
import GHC.Data.Pair
+import GHC.Utils.GlobalVars( unsafeHasNoStateHack )
import GHC.Utils.Constants (debugIsOn)
import GHC.Utils.Outputable
import GHC.Utils.Panic
@@ -123,7 +138,8 @@ exprArity :: CoreExpr -> Arity
-- We do /not/ guarantee that exprArity e <= typeArity e
-- You may need to do arity trimming after calling exprArity
-- See Note [Arity trimming]
--- (If we do arity trimming here we have to do it at every cast.
+-- Reason: if we do arity trimming here we have to take exprType
+-- and that can be expensive if there is a large cast
exprArity e = go e
where
go (Var v) = idArity v
@@ -139,13 +155,50 @@ exprArity e = go e
go _ = 0
---------------
+exprBotStrictness_maybe :: CoreExpr -> Maybe (Arity, DmdSig)
+-- A cheap and cheerful function that identifies bottoming functions
+-- and gives them a suitable strictness signature. It's used during
+-- float-out
+exprBotStrictness_maybe e
+ = case getBotArity (arityType botStrictnessArityEnv e) of
+ Nothing -> Nothing
+ Just ar -> Just (ar, mkVanillaDmdSig ar botDiv)
+
+{- Note [exprArity for applications]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+When we come to an application we check that the arg is trivial.
+ eg f (fac x) does not have arity 2,
+ even if f has arity 3!
+
+* We require that it is trivial rather than merely cheap. Suppose f has arity 2.
+ Then f (Just y)
+ has arity 0, because if we gave it arity 1 and then inlined f we'd get
+ let v = Just y in \w. <f-body>
+ which has arity 0. And we try to maintain the invariant that we don't
+ have arity decreases.
+
+* The `max 0` is important! (\x y -> f x) has arity 2, even if f is
+ unknown, hence arity 0
+
+
+************************************************************************
+* *
+ typeArity and the "state hack"
+* *
+********************************************************************* -}
+
+
typeArity :: Type -> Arity
+-- ^ (typeArity ty) says how many arrows GHC can expose in 'ty', after
+-- looking through newtypes. More generally, (typeOneShots ty) returns
+-- ty's [OneShotInfo], based only on the type itself, using typeOneShot
+-- on the argument type to access the "state hack".
typeArity = length . typeOneShots
typeOneShots :: Type -> [OneShotInfo]
-- How many value arrows are visible in the type?
-- We look through foralls, and newtypes
--- See Note [typeArity invariants]
+-- See Note [Arity invariants for bindings]
typeOneShots ty
= go initRecTc ty
where
@@ -174,64 +227,121 @@ typeOneShots ty
| otherwise
= []
----------------
-exprBotStrictness_maybe :: CoreExpr -> Maybe (Arity, DmdSig)
--- A cheap and cheerful function that identifies bottoming functions
--- and gives them a suitable strictness signatures. It's used during
--- float-out
-exprBotStrictness_maybe e
- = case getBotArity (arityType botStrictnessArityEnv e) of
- Nothing -> Nothing
- Just ar -> Just (ar, sig ar)
- where
- sig ar = mkClosedDmdSig (replicate ar topDmd) botDiv
+typeOneShot :: Type -> OneShotInfo
+typeOneShot ty
+ | isStateHackType ty = OneShotLam
+ | otherwise = NoOneShotInfo
+
+-- | Like 'idOneShotInfo', but taking the Horrible State Hack into account
+-- See Note [The state-transformer hack] in "GHC.Core.Opt.Arity"
+idStateHackOneShotInfo :: Id -> OneShotInfo
+idStateHackOneShotInfo id
+ | isStateHackType (idType id) = OneShotLam
+ | otherwise = idOneShotInfo id
+
+-- | Returns whether the lambda associated with the 'Id' is
+-- certainly applied at most once
+-- This one is the "business end", called externally.
+-- It works on type variables as well as Ids, returning True
+-- Its main purpose is to encapsulate the Horrible State Hack
+-- See Note [The state-transformer hack] in "GHC.Core.Opt.Arity"
+isOneShotBndr :: Var -> Bool
+isOneShotBndr var
+ | isTyVar var = True
+ | OneShotLam <- idStateHackOneShotInfo var = True
+ | otherwise = False
+
+isStateHackType :: Type -> Bool
+isStateHackType ty
+ | unsafeHasNoStateHack -- Switch off with -fno-state-hack
+ = False
+ | otherwise
+ = case tyConAppTyCon_maybe ty of
+ Just tycon -> tycon == statePrimTyCon
+ _ -> False
+ -- This is a gross hack. It claims that
+ -- every function over realWorldStatePrimTy is a one-shot
+ -- function. This is pretty true in practice, and makes a big
+ -- difference. For example, consider
+ -- a `thenST` \ r -> ...E...
+ -- The early full laziness pass, if it doesn't know that r is one-shot
+ -- will pull out E (let's say it doesn't mention r) to give
+ -- let lvl = E in a `thenST` \ r -> ...lvl...
+ -- When `thenST` gets inlined, we end up with
+ -- let lvl = E in \s -> case a s of (r, s') -> ...lvl...
+ -- and we don't re-inline E.
+ --
+ -- It would be better to spot that r was one-shot to start with, but
+ -- I don't want to rely on that.
+ --
+ -- Another good example is in fill_in in PrelPack.hs. We should be able to
+ -- spot that fill_in has arity 2 (and when Keith is done, we will) but we can't yet.
-{-
-Note [typeArity invariants]
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-We have the following invariants around typeArity
- (1) In any binding x = e,
- idArity f <= typeArity (idType f)
+{- Note [Arity invariants for bindings]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We have the following invariants for let-bindings
+
+ (1) In any binding f = e,
+ idArity f <= typeArity (idType f)
+ We enforce this with trimArityType, called in findRhsArity;
+ see Note [Arity trimming].
+
+ Note that we enforce this only for /bindings/. We do /not/ insist that
+ arityTypeArity (arityType e) <= typeArity (exprType e)
+ because that is quite a bit more expensive to guarantee; it would
+ mean checking at every Cast in the recursive arityType, for example.
(2) If typeArity (exprType e) = n,
then manifestArity (etaExpand e n) = n
That is, etaExpand can always expand as much as typeArity says
- So the case analysis in etaExpand and in typeArity must match
+ (or less, of course). So the case analysis in etaExpand and in
+ typeArity must match.
-Why is this important? Because
+ Consequence: because of (1), if we eta-expand to (idArity f), we will
+ end up with n manifest lambdas.
- - In GHC.Iface.Tidy we use exprArity to fix the *final arity* of
- each top-level Id, and in
+ (3) In any binding f = e,
+ idArity f <= arityTypeArity (safeArityType (arityType e))
+ That is, we call safeArityType before attributing e's arityType to f.
+ See Note [SafeArityType].
- - In CorePrep we use etaExpand on each rhs, so that the visible lambdas
- actually match that arity, which in turn means
- that the StgRhs has the right number of lambdas
+ So we call safeArityType in findRhsArity.
Suppose we have
f :: Int -> Int -> Int
f x y = x+y -- Arity 2
g :: F Int
- g = case x of { True -> f |> co1
- ; False -> g |> co2 }
+ g = case <cond> of { True -> f |> co1
+ ; False -> g |> co2 }
-Now, we can't eta-expand g to have arity 2, because etaExpand, which works
-off the /type/ of the expression, doesn't know how to make an eta-expanded
-binding
+where F is a type family. Now, we can't eta-expand g to have arity 2,
+because etaExpand, which works off the /type/ of the expression
+(albeit looking through newtypes), doesn't know how to make an
+eta-expanded binding
g = (\a b. case x of ...) |> co
-because can't make up `co` or the types of `a` and `b`.
+because it can't make up `co` or the types of `a` and `b`.
So invariant (1) ensures that every binding has an arity that is no greater
than the typeArity of the RHS; and invariant (2) ensures that etaExpand
can handle what typeArity says.
+Why is this important? Because
+
+ - In GHC.Iface.Tidy we use exprArity/manifestArity to fix the *final
+ arity* of each top-level Id, and in
+
+ - In CorePrep we use etaExpand on each rhs, so that the visible
+ lambdas actually match that arity, which in turn means that the
+ StgRhs has a number of lambdas that precisely matches the arity.
+
Note [Arity trimming]
~~~~~~~~~~~~~~~~~~~~~
-Arity trimming, implemented by minWithArity, directly implements
-invariant (1) of Note [typeArity invariants]. Failing to do so, and
-hence breaking invariant (1) led to #5441.
+Invariant (1) of Note [Arity invariants for bindings] is upheld by findRhsArity,
+which calls trimArityType to trim the ArityType to match the Arity of the
+binding. Failing to do so, and hence breaking invariant (1) led to #5441.
How to trim? If we end in topDiv, it's easy. But we must take great care with
dead ends (i.e. botDiv). Suppose the expression was (\x y. error "urk"),
@@ -293,26 +403,34 @@ trying to *make* it hold, but it's tricky and I gave up.
The test simplCore/should_compile/T3722 is an excellent example.
-------- End of old out of date comments, just for interest -----------
+-}
+{- ********************************************************************
+* *
+ Zapping lambda binders
+* *
+********************************************************************* -}
-Note [exprArity for applications]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-When we come to an application we check that the arg is trivial.
- eg f (fac x) does not have arity 2,
- even if f has arity 3!
-
-* We require that is trivial rather merely cheap. Suppose f has arity 2.
- Then f (Just y)
- has arity 0, because if we gave it arity 1 and then inlined f we'd get
- let v = Just y in \w. <f-body>
- which has arity 0. And we try to maintain the invariant that we don't
- have arity decreases.
+zapLamBndrs :: FullArgCount -> [Var] -> [Var]
+-- If (\xyz. t) appears under-applied to only two arguments,
+-- we must zap the occ-info on x,y, because they appear (in 't') under the \z.
+-- See Note [Occurrence analysis for lambda binders] in GHC.Core.Opt.OccurAnal
+--
+-- NB: both `arg_count` and `bndrs` include both type and value args/bndrs
+zapLamBndrs arg_count bndrs
+ | no_need_to_zap = bndrs
+ | otherwise = zap_em arg_count bndrs
+ where
+ no_need_to_zap = all isOneShotBndr (drop arg_count bndrs)
-* The `max 0` is important! (\x y -> f x) has arity 2, even if f is
- unknown, hence arity 0
+ zap_em :: FullArgCount -> [Var] -> [Var]
+ zap_em 0 bs = bs
+ zap_em _ [] = []
+ zap_em n (b:bs) | isTyVar b = b : zap_em (n-1) bs
+ | otherwise = zapLamIdInfo b : zap_em (n-1) bs
-************************************************************************
+{- *********************************************************************
* *
Computing the "arity" of an expression
* *
@@ -490,34 +608,72 @@ but not to introduce a new lambda.
Note [ArityType]
~~~~~~~~~~~~~~~~
+ArityType can be thought of as an abstraction of an expression.
+The ArityType
+ AT [ (IsCheap, NoOneShotInfo)
+ , (IsExpensive, OneShotLam)
+ , (IsCheap, OneShotLam) ] Dunno
+
+abstracts an expression like
+ \x. let <expensive> in
+ \y{os}.
+ \z{os}. blah
+
+In general we have (AT lams div). Then
+* In lams :: [(Cost,OneShotInfo)]
+ * The Cost flag describes the part of the expression down
+ to the first (value) lambda.
+ * The OneShotInfo flag gives the one-shot info on that lambda.
+
+* If 'div' is dead-ending ('isDeadEndDiv'), then application to
+ 'length lams' arguments will surely diverge, similar to the situation
+ with 'DmdType'.
+
ArityType is the result of a compositional analysis on expressions,
from which we can decide the real arity of the expression (extracted
with function exprEtaExpandArity).
We use the following notation:
- at ::= \o1..on.div
+ at ::= \p1..pn.div
div ::= T | x | ⊥
- o ::= ? | 1
-And omit the \. if n = 0. Examples:
- \?11.T stands for @AT [NoOneShotInfo,OneShotLam,OneShotLam] topDiv@
- ⊥ stands for @AT [] botDiv@
+ p ::= (c o)
+ c ::= X | C -- Expensive or Cheap
+ o ::= ? | 1 -- NotOneShot or OneShotLam
+We may omit the \. if n = 0.
+And ⊥ stands for `AT [] botDiv`
+
+Here is an example demonstrating the notation:
+ \(C?)(X1)(C1).T
+stands for
+ AT [ (IsCheap,NoOneShotInfo)
+ , (IsExpensive,OneShotLam)
+ , (IsCheap,OneShotLam) ]
+ topDiv
+
See the 'Outputable' instance for more information. It's pretty simple.
+How can we use ArityType? Example:
+ f = \x\y. let v = <expensive> in
+ \s(one-shot) \t(one-shot). blah
+ 'f' has arity type \(C?)(C?)(X1)(C1).T
+ The one-shot-ness means we can, in effect, push that
+ 'let' inside the \st, and expand to arity 4
+
+Suppose f = \xy. x+y
+Then f :: \(C?)(C?).T
+ f v :: \(C?).T
+ f <expensive> :: \(X?).T
+
Here is what the fields mean. If an arbitrary expression 'f' has
ArityType 'at', then
* If @at = AT [o1,..,on] botDiv@ (notation: \o1..on.⊥), then @f x1..xn@
definitely diverges. Partial applications to fewer than n args may *or
- may not* diverge.
+ may not* diverge. Ditto exnDiv.
- We allow ourselves to eta-expand bottoming functions, even
- if doing so may lose some `seq` sharing,
- let x = <expensive> in \y. error (g x y)
- ==> \y. let x = <expensive> in error (g x y)
-
- * If @at = AT [o1,..,on] topDiv@ (notation: \o1..on.T), then expanding 'f'
- to @\x1..xn. f x1..xn@ loses no sharing, assuming the calls of f respect
- the one-shot-ness o1..on of its definition.
+ * If `f` has ArityType `at` we can eta-expand `f` to have (arityTypeOneShots at)
+ arguments without losing sharing. This function checks that either
+ there are no expensive expressions, or the lambdas are one-shots.
NB 'f' is an arbitrary expression, eg @f = g e1 e2@. This 'f' can have
arity type @AT oss _@, with @length oss > 0@, only if e1 e2 are themselves
@@ -530,20 +686,45 @@ ArityType 'at', then
So eta expansion is dynamically ok; see Note [State hack and
bottoming functions], the part about catch#
-Example:
- f = \x\y. let v = <expensive> in
- \s(one-shot) \t(one-shot). blah
- 'f' has arity type \??11.T
- The one-shot-ness means we can, in effect, push that
- 'let' inside the \st.
+Wrinkles
+* Wrinkle [Bottoming functions]: see function 'arityLam'.
+ We treat bottoming functions as one-shot, because there is no point
+ in floating work outside the lambda, and it's fine to float it inside.
-Suppose f = \xy. x+y
-Then f :: \??.T
- f v :: \?.T
- f <expensive> :: T
--}
+ For example, this is fine (see test stranal/sigs/BottomFromInnerLambda)
+ let x = <expensive> in \y. error (g x y)
+ ==> \y. let x = <expensive> in error (g x y)
+ Idea: perhaps we could enforce this invariant with
+ data ArityType = TopAT [(Cost, OneShotInfo)] | DivAT [Cost]
+
+
+Note [SafeArityType]
+~~~~~~~~~~~~~~~~~~~~
+The function safeArityType trims an ArityType to return a "safe" ArityType,
+for which we use a type synonym SafeArityType. It is "safe" in the sense
+that (arityTypeArity at) really reflects the arity of the expression, whereas
+a regular ArityType might have more lambdas in its [ATLamInfo] than the
+(cost-free) arity of the expression.
+
+For example
+ \x.\y.let v = expensive in \z. blah
+has
+ arityType = AT [C?, C?, X?, C?] Top
+But the expression actually has arity 2, not 4, because of the X.
+So safeArityType will trim it to (AT [C?, C?] Top), whose [ATLamInfo]
+now reflects the (cost-free) arity of the expression
+
+Why do we ever need an "unsafe" ArityType, such as the example above?
+Because its (cost-free) arity may be increased by combineWithDemandOneShots
+in findRhsArity. See Note [Combining arity type with demand info].
+
+Thus the function `arityType` returns a regular "unsafe" ArityType, that
+goes deeply into the lambdas (including under IsExpensive). But that is
+very local; most ArityTypes are indeed "safe". We use the type synonym
+SafeArityType to indicate where we believe the ArityType is safe.
+-}
-- | The analysis lattice of arity analysis. It is isomorphic to
--
@@ -574,22 +755,33 @@ Then f :: \??.T
--
-- We rely on this lattice structure for fixed-point iteration in
-- 'findRhsArity'. For the semantics of 'ArityType', see Note [ArityType].
-data ArityType
- = AT ![OneShotInfo] !Divergence
- -- ^ @AT oss div@ means this value can safely be eta-expanded @length oss@
- -- times, provided use sites respect the 'OneShotInfo's in @oss@.
- -- A 'OneShotLam' annotation can come from two sources:
- -- * The user annotated a lambda as one-shot with 'GHC.Exts.oneShot'
- -- * It's from a lambda binder of a type affected by `-fstate-hack`.
- -- See 'idStateHackOneShotInfo'.
- -- In both cases, 'OneShotLam' should win over 'NoOneShotInfo', see
- -- Note [Combining case branches].
- --
- -- If @div@ is dead-ending ('isDeadEndDiv'), then application to
- -- @length os@ arguments will surely diverge, similar to the situation
- -- with 'DmdType'.
+data ArityType -- See Note [ArityType]
+ = AT ![ATLamInfo] !Divergence
+ -- ^ `AT oss div` is an abstraction of the expression, which describes
+ -- its lambdas, and how much work appears where.
+ -- See Note [ArityType] for more information
+ --
+ -- If `div` is dead-ending ('isDeadEndDiv'), then application to
+ -- `length os` arguments will surely diverge, similar to the situation
+ -- with 'DmdType'.
deriving Eq
+type ATLamInfo = (Cost,OneShotInfo)
+ -- ^ Info about one lambda in an ArityType
+ -- See Note [ArityType]
+
+type SafeArityType = ArityType -- See Note [SafeArityType]
+
+data Cost = IsCheap | IsExpensive
+ deriving( Eq )
+
+allCosts :: (a -> Cost) -> [a] -> Cost
+allCosts f xs = foldr (addCost . f) IsCheap xs
+
+addCost :: Cost -> Cost -> Cost
+addCost IsCheap IsCheap = IsCheap
+addCost _ _ = IsExpensive
+
-- | This is the BNF of the generated output:
--
-- @
@@ -608,57 +800,56 @@ instance Outputable ArityType where
pp_div Diverges = char '⊥'
pp_div ExnOrDiv = char 'x'
pp_div Dunno = char 'T'
- pp_os OneShotLam = char '1'
- pp_os NoOneShotInfo = char '?'
+ pp_os (IsCheap, OneShotLam) = text "(C1)"
+ pp_os (IsExpensive, OneShotLam) = text "(X1)"
+ pp_os (IsCheap, NoOneShotInfo) = text "(C?)"
+ pp_os (IsExpensive, NoOneShotInfo) = text "(X?)"
mkBotArityType :: [OneShotInfo] -> ArityType
-mkBotArityType oss = AT oss botDiv
+mkBotArityType oss = AT [(IsCheap,os) | os <- oss] botDiv
botArityType :: ArityType
botArityType = mkBotArityType []
-mkTopArityType :: [OneShotInfo] -> ArityType
-mkTopArityType oss = AT oss topDiv
+mkManifestArityType :: [OneShotInfo] -> ArityType
+mkManifestArityType oss = AT [(IsCheap,os) | os <- oss] topDiv
topArityType :: ArityType
-topArityType = mkTopArityType []
+topArityType = AT [] topDiv
-- | The number of value args for the arity type
-arityTypeArity :: ArityType -> Arity
-arityTypeArity (AT oss _) = length oss
-
--- | True <=> eta-expansion will add at least one lambda
-expandableArityType :: ArityType -> Bool
-expandableArityType at = arityTypeArity at > 0
-
--- | See Note [Dead ends] in "GHC.Types.Demand".
--- Bottom implies a dead end.
-isDeadEndArityType :: ArityType -> Bool
-isDeadEndArityType (AT _ div) = isDeadEndDiv div
-
------------------------
-infixl 2 `maxWithArity`, `minWithArity`
-
--- | Expand a non-bottoming arity type so that it has at least the given arity.
-maxWithArity :: ArityType -> Arity -> ArityType
-maxWithArity at@(AT oss div) !ar
- | isDeadEndArityType at = at
- | oss `lengthAtLeast` ar = at
- | otherwise = AT (take ar $ oss ++ repeat NoOneShotInfo) div
-
--- | Trim an arity type so that it has at most the given arity.
--- Any excess 'OneShotInfo's are truncated to 'topDiv', even if they end in
--- 'ABot'. See Note [Arity trimming]
-minWithArity :: ArityType -> Arity -> ArityType
-minWithArity at@(AT oss _) ar
- | oss `lengthAtMost` ar = at
- | otherwise = AT (take ar oss) topDiv
-
-----------------------
-takeWhileOneShot :: ArityType -> ArityType
-takeWhileOneShot (AT oss div)
- | isDeadEndDiv div = AT (takeWhile isOneShotInfo oss) topDiv
- | otherwise = AT (takeWhile isOneShotInfo oss) div
+arityTypeArity :: SafeArityType -> Arity
+arityTypeArity (AT lams _) = length lams
+
+arityTypeOneShots :: SafeArityType -> [OneShotInfo]
+-- Returns a list only as long as the arity should be
+arityTypeOneShots (AT lams _) = map snd lams
+
+safeArityType :: ArityType -> SafeArityType
+-- ^ Assuming this ArityType is all we know, find the arity of
+-- the function, and trim the argument info (and Divergence)
+-- to match that arity. See Note [SafeArityType]
+safeArityType at@(AT lams _)
+ = case go 0 IsCheap lams of
+ Nothing -> at -- No trimming needed
+ Just ar -> AT (take ar lams) topDiv
+ where
+ go :: Arity -> Cost -> [(Cost,OneShotInfo)] -> Maybe Arity
+ go _ _ [] = Nothing
+ go ar ch1 ((ch2,os):lams)
+ = case (ch1 `addCost` ch2, os) of
+ (IsExpensive, NoOneShotInfo) -> Just ar
+ (ch, _) -> go (ar+1) ch lams
+
+infixl 2 `trimArityType`
+
+trimArityType :: Arity -> ArityType -> ArityType
+-- ^ Trim an arity type so that it has at most the given arity.
+-- Any excess 'OneShotInfo's are truncated to 'topDiv', even if
+-- they end in 'ABot'. See Note [Arity trimming]
+trimArityType max_arity at@(AT lams _)
+ | lams `lengthAtMost` max_arity = at
+ | otherwise = AT (take max_arity lams) topDiv
data ArityOpts = ArityOpts
{ ao_ped_bot :: !Bool -- See Note [Dealing with bottom]
@@ -667,10 +858,17 @@ data ArityOpts = ArityOpts
-- | The Arity returned is the number of value args the
-- expression can be applied to without doing much work
-exprEtaExpandArity :: ArityOpts -> CoreExpr -> ArityType
+exprEtaExpandArity :: ArityOpts -> CoreExpr -> Maybe SafeArityType
-- exprEtaExpandArity is used when eta expanding
-- e ==> \xy -> e x y
-exprEtaExpandArity opts e = arityType (etaExpandArityEnv opts) e
+-- Nothing if the expression has arity 0
+exprEtaExpandArity opts e
+ | AT [] _ <- arity_type
+ = Nothing
+ | otherwise
+ = Just arity_type
+ where
+ arity_type = safeArityType (arityType (etaExpandArityEnv opts) e)
getBotArity :: ArityType -> Maybe Arity
-- Arity of a divergent function
@@ -678,29 +876,54 @@ getBotArity (AT oss div)
| isDeadEndDiv div = Just $ length oss
| otherwise = Nothing
-----------------------
-findRhsArity :: ArityOpts -> RecFlag -> Id -> CoreExpr -> Arity -> ArityType
+
+{- *********************************************************************
+* *
+ findRhsArity
+* *
+********************************************************************* -}
+
+findRhsArity :: ArityOpts -> RecFlag -> Id -> CoreExpr -> Arity -> SafeArityType
-- This implements the fixpoint loop for arity analysis
-- See Note [Arity analysis]
-- If findRhsArity e = (n, is_bot) then
-- (a) any application of e to <n arguments will not do much work,
-- so it is safe to expand e ==> (\x1..xn. e x1 .. xn)
-- (b) if is_bot=True, then e applied to n args is guaranteed bottom
-
-findRhsArity opts NonRecursive _ rhs _
- = arityType (findRhsArityEnv opts) rhs
-
-findRhsArity opts Recursive bndr rhs old_arity
- = go 0 botArityType
- -- We always do one step, but usually that produces a result equal to
- -- old_arity, and then we stop right away, because old_arity is assumed
- -- to be sound. In other words, arities should never decrease.
- -- Result: the common case is that there is just one iteration
+--
+-- Returns an ArityType that is guaranteed trimmed to typeArity of 'bndr'
+-- See Note [Arity trimming]
+findRhsArity opts is_rec bndr rhs old_arity
+ = case is_rec of
+ Recursive -> go 0 botArityType
+ NonRecursive -> step init_env
where
- go :: Int -> ArityType -> ArityType
- go !n cur_at@(AT oss div)
+ init_env :: ArityEnv
+ init_env = findRhsArityEnv opts
+
+ ty_arity = typeArity (idType bndr)
+ id_one_shots = idDemandOneShots bndr
+
+ step :: ArityEnv -> SafeArityType
+ step env = trimArityType ty_arity $
+ safeArityType $ -- See Note [Arity invariants for bindings], item (3)
+ arityType env rhs `combineWithDemandOneShots` id_one_shots
+ -- trimArityType: see Note [Trim arity inside the loop]
+ -- combineWithDemandOneShots: take account of the demand on the
+ -- binder. Perhaps it is always called with 2 args
+ -- let f = \x. blah in (f 3 4, f 1 9)
+ -- f's demand-info says how many args it is called with
+
+ -- The fixpoint iteration (go), done for recursive bindings. We
+ -- always do one step, but usually that produces a result equal
+ -- to old_arity, and then we stop right away, because old_arity
+ -- is assumed to be sound. In other words, arities should never
+ -- decrease. Result: the common case is that there is just one
+ -- iteration
+ go :: Int -> SafeArityType -> SafeArityType
+ go !n cur_at@(AT lams div)
| not (isDeadEndDiv div) -- the "stop right away" case
- , length oss <= old_arity = cur_at -- from above
+ , length lams <= old_arity = cur_at -- from above
| next_at == cur_at = cur_at
| otherwise =
-- Warn if more than 2 iterations. Why 2? See Note [Exciting arity]
@@ -709,20 +932,49 @@ findRhsArity opts Recursive bndr rhs old_arity
(nest 2 (ppr bndr <+> ppr cur_at <+> ppr next_at $$ ppr rhs)) $
go (n+1) next_at
where
- next_at = step cur_at
-
- step :: ArityType -> ArityType
- step at = -- pprTrace "step" (vcat [ ppr bndr <+> ppr at <+> ppr (arityType env rhs)
- -- , ppr (idType bndr)
- -- , ppr (typeArity (idType bndr)) ]) $
- arityType env rhs
- where
- env = extendSigEnv (findRhsArityEnv opts) bndr at
+ next_at = step (extendSigEnv init_env bndr cur_at)
+infixl 2 `combineWithDemandOneShots`
-{-
-Note [Arity analysis]
-~~~~~~~~~~~~~~~~~~~~~
+combineWithDemandOneShots :: ArityType -> [OneShotInfo] -> ArityType
+-- See Note [Combining arity type with demand info]
+combineWithDemandOneShots at@(AT lams div) oss
+ | null lams = at
+ | otherwise = AT (zip_lams lams oss) div
+ where
+ zip_lams :: [ATLamInfo] -> [OneShotInfo] -> [ATLamInfo]
+ zip_lams lams [] = lams
+ zip_lams [] oss = [ (IsExpensive,OneShotLam)
+ | _ <- takeWhile isOneShotInfo oss]
+ zip_lams ((ch,os1):lams) (os2:oss)
+ = (ch, os1 `bestOneShot` os2) : zip_lams lams oss
+
+idDemandOneShots :: Id -> [OneShotInfo]
+idDemandOneShots bndr
+ = call_arity_one_shots `zip_lams` dmd_one_shots
+ where
+ call_arity_one_shots :: [OneShotInfo]
+ call_arity_one_shots
+ | call_arity == 0 = []
+ | otherwise = NoOneShotInfo : replicate (call_arity-1) OneShotLam
+ -- Call Arity analysis says the function is always called
+ -- applied to this many arguments. The first NoOneShotInfo is because
+ -- if Call Arity says "always applied to 3 args" then the one-shot info
+ -- we get is [NoOneShotInfo, OneShotLam, OneShotLam]
+ call_arity = idCallArity bndr
+
+ dmd_one_shots :: [OneShotInfo]
+ -- If the demand info is Cx(C1(C1(.))) then we know that an
+ -- application to one arg is also an application to three
+ dmd_one_shots = argOneShots (idDemandInfo bndr)
+
+ -- Take the *longer* list
+ zip_lams (lam1:lams1) (lam2:lams2) = (lam1 `bestOneShot` lam2) : zip_lams lams1 lams2
+ zip_lams [] lams2 = lams2
+ zip_lams lams1 [] = lams1
+
+{- Note [Arity analysis]
+~~~~~~~~~~~~~~~~~~~~~~~~
The motivating example for arity analysis is this:
f = \x. let g = f (x+1)
@@ -784,57 +1036,118 @@ to floatIn the non-cheap let-binding. Which is all perfectly benign, but
means we do two iterations (well, actually 3 'step's to detect we are stable)
and don't want to emit the warning.
-Note [Eta expanding through dictionaries]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-If the experimental -fdicts-cheap flag is on, we eta-expand through
-dictionary bindings. This improves arities. Thereby, it also
-means that full laziness is less prone to floating out the
-application of a function to its dictionary arguments, which
-can thereby lose opportunities for fusion. Example:
- foo :: Ord a => a -> ...
- foo = /\a \(d:Ord a). let d' = ...d... in \(x:a). ....
- -- So foo has arity 1
+Note [Trim arity inside the loop]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Here's an example (from gadt/nbe.hs) which caused trouble.
+ data Exp g t where
+ Lam :: Ty a -> Exp (g,a) b -> Exp g (a->b)
- f = \x. foo dInt $ bar x
+ eval :: Exp g t -> g -> t
+ eval (Lam _ e) g = \a -> eval e (g,a)
-The (foo DInt) is floated out, and makes ineffective a RULE
- foo (bar x) = ...
+The danger is that we get arity 3 from analysing this; and the
+next time arity 4, and so on for ever. Solution: use trimArityType
+on each iteration.
-One could go further and make exprIsCheap reply True to any
-dictionary-typed expression, but that's more work.
+Note [Combining arity type with demand info]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider
+ let f = \x. let y = <expensive> in \p \q{os}. blah
+ in ...(f a b)...(f c d)...
+
+* From the RHS we get an ArityType like
+ AT [ (IsCheap,?), (IsExpensive,?), (IsCheap,OneShotLam) ] Dunno
+ where "?" means NoOneShotInfo
+
+* From the body, the demand analyser (or Call Arity) will tell us
+ that the function is always applied to at least two arguments.
+
+Combining these two pieces of info, we can get the final ArityType
+ AT [ (IsCheap,?), (IsExpensive,OneShotLam), (IsCheap,OneShotLam) ] Dunno
+result: arity=3, which is better than we could do from either
+source alone.
+
+The "combining" part is done by combineWithDemandOneShots. It
+uses info from both Call Arity and demand analysis.
+
+We may have /more/ call demands from the calls than we have lambdas
+in the binding. E.g.
+ let f1 = \x. g x x in ...(f1 p q r)...
+ -- Demand on f1 is Cx(C1(C1(L)))
+
+ let f2 = \y. error y in ...(f2 p q r)...
+ -- Demand on f2 is Cx(C1(C1(L)))
+
+In both these cases we can eta expand f1 and f2 to arity 3.
+But /only/ for called-once demands. Suppose we had
+ let f1 = \y. g x x in ...let h = f1 p q in ...(h r1)...(h r2)...
+
+Now we don't want to eta-expand f1 to have 3 args; only two.
+Nor, in the case of f2, do we want to push that error call under
+a lambda. Hence the takeWhile in combineWithDemandOneShots.
-}
+
+{- *********************************************************************
+* *
+ arityType
+* *
+********************************************************************* -}
+
arityLam :: Id -> ArityType -> ArityType
-arityLam id (AT oss div) = AT (idStateHackOneShotInfo id : oss) div
+arityLam id (AT oss div)
+ = AT ((IsCheap, one_shot) : oss) div
+ where
+ one_shot | isDeadEndDiv div = OneShotLam
+ | otherwise = idStateHackOneShotInfo id
+ -- If the body diverges, treat it as one-shot: no point
+ -- in floating out, and no penalty for floating in
+ -- See Wrinkle [Bottoming functions] in Note [ArityType]
-floatIn :: Bool -> ArityType -> ArityType
+floatIn :: Cost -> ArityType -> ArityType
-- We have something like (let x = E in b),
-- where b has the given arity type.
-floatIn cheap at
- | isDeadEndArityType at || cheap = at
- -- If E is not cheap, keep arity only for one-shots
- | otherwise = takeWhileOneShot at
+floatIn IsCheap at = at
+floatIn IsExpensive at = addWork at
+
+addWork :: ArityType -> ArityType
+addWork at@(AT lams div)
+ = case lams of
+ [] -> at
+ lam:lams' -> AT (add_work lam : lams') div
+ where
+ add_work :: ATLamInfo -> ATLamInfo
+ add_work (_,os) = (IsExpensive,os)
-arityApp :: ArityType -> Bool -> ArityType
+arityApp :: ArityType -> Cost -> ArityType
-- Processing (fun arg) where at is the ArityType of fun,
-- Knock off an argument and behave like 'let'
-arityApp (AT (_:oss) div) cheap = floatIn cheap (AT oss div)
-arityApp at _ = at
+arityApp (AT ((ch1,_):oss) div) ch2 = floatIn (ch1 `addCost` ch2) (AT oss div)
+arityApp at _ = at
-- | Least upper bound in the 'ArityType' lattice.
-- See the haddocks on 'ArityType' for the lattice.
--
-- Used for branches of a @case@.
andArityType :: ArityType -> ArityType -> ArityType
-andArityType (AT (os1:oss1) div1) (AT (os2:oss2) div2)
- | AT oss' div' <- andArityType (AT oss1 div1) (AT oss2 div2)
- = AT ((os1 `bestOneShot` os2) : oss') div' -- See Note [Combining case branches]
-andArityType (AT [] div1) at2
- | isDeadEndDiv div1 = at2 -- Note [ABot branches: max arity wins]
- | otherwise = takeWhileOneShot at2 -- See Note [Combining case branches]
-andArityType at1 (AT [] div2)
- | isDeadEndDiv div2 = at1 -- Note [ABot branches: max arity wins]
- | otherwise = takeWhileOneShot at1 -- See Note [Combining case branches]
+andArityType (AT (lam1:lams1) div1) (AT (lam2:lams2) div2)
+ | AT lams' div' <- andArityType (AT lams1 div1) (AT lams2 div2)
+ = AT ((lam1 `and_lam` lam2) : lams') div' -- See Note [Combining case branches]
+ where
+ (ch1,os1) `and_lam` (ch2,os2)
+ = ( ch1 `addCost` ch2, os1 `bestOneShot` os2)
+
+andArityType (AT [] div1) at2 = andWithTail div1 at2
+andArityType at1 (AT [] div2) = andWithTail div2 at1
+
+andWithTail :: Divergence -> ArityType -> ArityType
+andWithTail div1 at2@(AT oss2 _)
+ | isDeadEndDiv div1 -- case x of { T -> error; F -> \y.e }
+ = at2
+ | otherwise -- case x of { T -> plusInt <expensive>; F -> \y.e }
+ = addWork (AT oss2 topDiv) -- We know div1 = topDiv
+ -- Note [ABot branches: max arity wins]
+ -- See Note [Combining case branches]
{- Note [ABot branches: max arity wins]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -866,29 +1179,6 @@ basis that if we know one branch is one-shot, then they all must be.
Surprisingly, this means that the one-shot arity type is effectively the top
element of the lattice.
-Note [Arity trimming]
-~~~~~~~~~~~~~~~~~~~~~
-Consider ((\x y. blah) |> co), where co :: (Int->Int->Int) ~ (Int -> F a) , and
-F is some type family.
-
-Because of Note [exprArity invariant], item (2), we must return with arity at
-most 1, because typeArity (Int -> F a) = 1. So we have to trim the result of
-calling arityType on (\x y. blah). Failing to do so, and hence breaking the
-exprArity invariant, led to #5441.
-
-How to trim? If we end in topDiv, it's easy. But we must take great care with
-dead ends (i.e. botDiv). Suppose the expression was (\x y. error "urk"),
-we'll get \??.⊥. We absolutely must not trim that to \?.⊥, because that
-claims that ((\x y. error "urk") |> co) diverges when given one argument,
-which it absolutely does not. And Bad Things happen if we think something
-returns bottom when it doesn't (#16066).
-
-So, if we need to trim a dead-ending arity type, switch (conservatively) to
-topDiv.
-
-Historical note: long ago, we unconditionally switched to topDiv when we
-encountered a cast, but that is far too conservative: see #5475
-
Note [Eta expanding through CallStacks]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Just as it's good to eta-expand through dictionaries, so it is good to
@@ -899,6 +1189,25 @@ do so through CallStacks. #20103 is a case in point, where we got
We really want to eta-expand this! #20103 is quite convincing!
We do this regardless of -fdicts-cheap; it's not really a dictionary.
+
+Note [Eta expanding through dictionaries]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If the experimental -fdicts-cheap flag is on, we eta-expand through
+dictionary bindings. This improves arities. Thereby, it also
+means that full laziness is less prone to floating out the
+application of a function to its dictionary arguments, which
+can thereby lose opportunities for fusion. Example:
+ foo :: Ord a => a -> ...
+ foo = /\a \(d:Ord a). let d' = ...d... in \(x:a). ....
+ -- So foo has arity 1
+
+ f = \x. foo dInt $ bar x
+
+The (foo DInt) is floated out, and makes ineffective a RULE
+ foo (bar x) = ...
+
+One could go further and make exprIsCheap reply True to any
+dictionary-typed expression, but that's more work.
-}
---------------------------
@@ -921,14 +1230,18 @@ We do this regardless of -fdicts-cheap; it's not really a dictionary.
data AnalysisMode
= BotStrictness
-- ^ Used during 'exprBotStrictness_maybe'.
+
| EtaExpandArity { am_opts :: !ArityOpts }
- -- ^ Used for finding an expression's eta-expanding arity quickly, without
- -- fixed-point iteration ('exprEtaExpandArity').
- | FindRhsArity { am_opts :: !ArityOpts
- , am_sigs :: !(IdEnv ArityType) }
+ -- ^ Used for finding an expression's eta-expanding arity quickly,
+ -- without fixed-point iteration ('exprEtaExpandArity').
+
+ | FindRhsArity { am_opts :: !ArityOpts
+ , am_sigs :: !(IdEnv SafeArityType) }
-- ^ Used for regular, fixed-point arity analysis ('findRhsArity').
-- See Note [Arity analysis] for details about fixed-point iteration.
- -- INVARIANT: Disjoint with 'ae_joins'.
+ -- am_dicts_cheap: see Note [Eta expanding through dictionaries]
+ -- am_sigs: note `SafeArityType` so we can use this in myIsCheapApp
+ -- INVARIANT: am_sigs is disjoint with 'ae_joins'.
data ArityEnv
= AE
@@ -991,9 +1304,11 @@ extendJoinEnv env@(AE { ae_joins = joins }) join_ids
= del_sig_env_list join_ids
$ env { ae_joins = joins `extendVarSetList` join_ids }
-extendSigEnv :: ArityEnv -> Id -> ArityType -> ArityEnv
+extendSigEnv :: ArityEnv -> Id -> SafeArityType -> ArityEnv
extendSigEnv env id ar_ty
- = del_join_env id (modifySigEnv (\sigs -> extendVarEnv sigs id ar_ty) env)
+ = del_join_env id $
+ modifySigEnv (\sigs -> extendVarEnv sigs id ar_ty) $
+ env
delInScope :: ArityEnv -> Id -> ArityEnv
delInScope env id = del_join_env id $ del_sig_env id env
@@ -1001,7 +1316,7 @@ delInScope env id = del_join_env id $ del_sig_env id env
delInScopeList :: ArityEnv -> [Id] -> ArityEnv
delInScopeList env ids = del_join_env_list ids $ del_sig_env_list ids env
-lookupSigEnv :: ArityEnv -> Id -> Maybe ArityType
+lookupSigEnv :: ArityEnv -> Id -> Maybe SafeArityType
lookupSigEnv AE{ ae_mode = mode } id = case mode of
BotStrictness -> Nothing
EtaExpandArity{} -> Nothing
@@ -1015,6 +1330,11 @@ pedanticBottoms AE{ ae_mode = mode } = case mode of
EtaExpandArity{ am_opts = ArityOpts{ ao_ped_bot = ped_bot } } -> ped_bot
FindRhsArity{ am_opts = ArityOpts{ ao_ped_bot = ped_bot } } -> ped_bot
+exprCost :: ArityEnv -> CoreExpr -> Maybe Type -> Cost
+exprCost env e mb_ty
+ | myExprIsCheap env e mb_ty = IsCheap
+ | otherwise = IsExpensive
+
-- | A version of 'exprIsCheap' that considers results from arity analysis
-- and optionally the expression's type.
-- Under 'exprBotStrictness_maybe', no expressions are cheap.
@@ -1040,17 +1360,20 @@ myExprIsCheap AE{ae_mode = mode} e mb_ty = case mode of
-- | A version of 'isCheapApp' that considers results from arity analysis.
-- See Note [Arity analysis] for what's in the signature environment and why
-- it's important.
-myIsCheapApp :: IdEnv ArityType -> CheapAppFun
+myIsCheapApp :: IdEnv SafeArityType -> CheapAppFun
myIsCheapApp sigs fn n_val_args = case lookupVarEnv sigs fn of
+
-- Nothing means not a local function, fall back to regular
-- 'GHC.Core.Utils.isCheapApp'
- Nothing -> isCheapApp fn n_val_args
- -- @Just at@ means local function with @at@ as current ArityType.
+ Nothing -> isCheapApp fn n_val_args
+
+ -- `Just at` means local function with `at` as current SafeArityType.
-- Roughly approximate what 'isCheapApp' is doing.
- Just (AT oss div)
+ Just (AT lams div)
| isDeadEndDiv div -> True -- See Note [isCheapApp: bottoming functions] in GHC.Core.Utils
- | n_val_args < length oss -> True -- Essentially isWorkFreeApp
- | otherwise -> False
+ | n_val_args == 0 -> True -- Essentially
+ | n_val_args < length lams -> True -- isWorkFreeApp
+ | otherwise -> False
----------------
arityType :: ArityEnv -> CoreExpr -> ArityType
@@ -1077,7 +1400,10 @@ arityType env (Lam x e)
arityType env (App fun (Type _))
= arityType env fun
arityType env (App fun arg )
- = arityApp (arityType env fun) (myExprIsCheap env arg Nothing)
+ = arityApp fun_at arg_cost
+ where
+ fun_at = arityType env fun
+ arg_cost = exprCost env arg Nothing
-- Case/Let; keep arity if either the expression is cheap
-- or it's a 1-shot lambda
@@ -1098,9 +1424,8 @@ arityType env (Case scrut bndr _ alts)
| exprOkForSpeculation scrut
= alts_type
- | otherwise -- In the remaining cases we may not push
- = takeWhileOneShot alts_type -- evaluation of the scrutinee in
-
+ | otherwise -- In the remaining cases we may not push
+ = addWork alts_type -- evaluation of the scrutinee in
where
env' = delInScope env bndr
arity_type_alt (Alt _con bndrs rhs) = arityType (delInScopeList env' bndrs) rhs
@@ -1128,17 +1453,17 @@ arityType env (Let (Rec pairs) body)
| otherwise
= pprPanic "arityType:joinrec" (ppr pairs)
-arityType env (Let (NonRec b r) e)
- = floatIn cheap_rhs (arityType env' e)
+arityType env (Let (NonRec b rhs) e)
+ = floatIn rhs_cost (arityType env' e)
where
- cheap_rhs = myExprIsCheap env r (Just (idType b))
- env' = extendSigEnv env b (arityType env r)
+ rhs_cost = exprCost env rhs (Just (idType b))
+ env' = extendSigEnv env b (safeArityType (arityType env rhs))
arityType env (Let (Rec prs) e)
- = floatIn (all is_cheap prs) (arityType env' e)
+ = floatIn (allCosts bind_cost prs) (arityType env' e)
where
- env' = delInScopeList env (map fst prs)
- is_cheap (b,e) = myExprIsCheap env' e (Just (idType b))
+ env' = delInScopeList env (map fst prs)
+ bind_cost (b,e) = exprCost env' e (Just (idType b))
arityType env (Tick t e)
| not (tickishIsCode t) = arityType env e
@@ -1201,7 +1526,7 @@ environment mapping let-bound Ids to their ArityType.
idArityType :: Id -> ArityType
idArityType v
| strict_sig <- idDmdSig v
- , not $ isTopSig strict_sig
+ , not $ isNopSig strict_sig
, (ds, div) <- splitDmdSig strict_sig
, let arity = length ds
-- Every strictness signature admits an arity signature!
@@ -1209,8 +1534,8 @@ idArityType v
| otherwise
= AT (take (idArity v) one_shots) topDiv
where
- one_shots :: [OneShotInfo] -- One-shot-ness derived from the type
- one_shots = typeOneShots (idType v)
+ one_shots :: [(Cost,OneShotInfo)] -- One-shot-ness derived from the type
+ one_shots = repeat IsCheap `zip` typeOneShots (idType v)
{-
%************************************************************************
@@ -1319,7 +1644,7 @@ Consider
We'll get an ArityType for foo of \?1.T.
Then we want to eta-expand to
- foo = (\x. \eta{os}. (case x of ...as before...) eta)) |> some_co
+ foo = (\x. \eta{os}. (case x of ...as before...) eta) |> some_co
That 'eta' binder is fresh, and we really want it to have the
one-shot flag from the inner \s{os}. By expanding with the
@@ -1347,14 +1672,14 @@ etaExpand n orig_expr
in_scope = {-#SCC "eta_expand:in-scopeX" #-}
mkInScopeSet (exprFreeVars orig_expr)
-etaExpandAT :: InScopeSet -> ArityType -> CoreExpr -> CoreExpr
+etaExpandAT :: InScopeSet -> SafeArityType -> CoreExpr -> CoreExpr
-- See Note [Eta expansion with ArityType]
--
-- We pass in the InScopeSet from the simplifier to avoid recomputing
-- it here, which can be jolly expensive if the casts are big
-- In #18223 it took 10% of compile time just to do the exprFreeVars!
-etaExpandAT in_scope (AT oss _) orig_expr
- = eta_expand in_scope oss orig_expr
+etaExpandAT in_scope at orig_expr
+ = eta_expand in_scope (arityTypeOneShots at) orig_expr
-- etaExpand arity e = res
-- Then 'res' has at least 'arity' lambdas at the top
@@ -1369,7 +1694,11 @@ etaExpandAT in_scope (AT oss _) orig_expr
eta_expand :: InScopeSet -> [OneShotInfo] -> CoreExpr -> CoreExpr
eta_expand in_scope one_shots (Cast expr co)
- = Cast (eta_expand in_scope one_shots expr) co
+ = mkCast (eta_expand in_scope one_shots expr) co
+ -- This mkCast is important, because eta_expand might return an
+ -- expression with a cast at the outside; and tryCastWorkerWrapper
+ -- assumes that we don't have nested casts. Makes a difference
+ -- in compile-time for T18223
eta_expand in_scope one_shots orig_expr
= go in_scope one_shots [] orig_expr
@@ -1440,7 +1769,7 @@ casts complicate the question. If we have
and
e :: N (N Int)
then the eta-expansion should look like
- (\(x::S) (y::S) -> e |> co x y) |> sym co
+ (\(x::S) (y::S) -> (e |> co) x y) |> sym co
where
co :: N (N Int) ~ S -> S -> Int
co = axN @(N Int) ; (S -> axN @Int)
@@ -1619,11 +1948,11 @@ mkEtaWW orig_oss ppr_orig_expr in_scope orig_ty
-- (go [o1,..,on] subst ty) = (in_scope, EI [b1,..,bn] co)
-- co :: subst(ty) ~ b1_ty -> ... -> bn_ty -> tr
- go _ [] subst _ -- See Note [exprArity invariant]
+ go _ [] subst _
----------- Done! No more expansion needed
= (getTCvInScope subst, EI [] MRefl)
- go n oss@(one_shot:oss1) subst ty -- See Note [exprArity invariant]
+ go n oss@(one_shot:oss1) subst ty
----------- Forall types (forall a. ty)
| Just (tcv,ty') <- splitForAllTyCoVar_maybe ty
, (subst', tcv') <- Type.substVarBndr subst tcv
@@ -1676,6 +2005,428 @@ mkEtaWW orig_oss ppr_orig_expr in_scope orig_ty
-- with an explicit lambda having a non-function type
+{-
+************************************************************************
+* *
+ Eta reduction
+* *
+************************************************************************
+
+Note [Eta reduction makes sense]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+GHC's eta reduction transforms
+ \x y. <fun> x y ---> <fun>
+We discuss when this is /sound/ in Note [Eta reduction soundness].
+But even assuming it is sound, when is it /desirable/? That
+is what we discuss here.
+
+This test is made by `ok_fun` in tryEtaReduce.
+
+1. We want to eta-reduce only if we get all the way to a trivial
+ expression; we don't want to remove extra lambdas unless we are
+ going to avoid allocating this thing altogether.
+
+ Trivial means *including* casts and type lambdas:
+ * `\x. f x |> co --> f |> (ty(x) -> co)` (provided `co` doesn't mention `x`)
+ * `/\a. \x. f @(Maybe a) x --> /\a. f @(Maybe a)`
+ See Note [Do not eta reduce PAPs] for why we insist on a trivial head.
+
+2. Type and dictionary abstraction. Regardless of whether 'f' is a value, it
+ is always sound to reduce /type lambdas/, thus:
+ (/\a -> f a) --> f
+ Moreover, we always want to, because it makes RULEs apply more often:
+ This RULE: `forall g. foldr (build (/\a -> g a))`
+ should match `foldr (build (/\b -> ...something complex...))`
+ and the simplest way to do so is eta-reduce `/\a -> g a` in the RULE to `g`.
+
+ The type checker can insert these eta-expanded versions,
+ with both type and dictionary lambdas; hence the slightly
+ ad-hoc (all ok_lam bndrs) in ok_fun.
+
+3. (See fun_arity in tryEtaReduce.) We have to hide `f`'s `idArity` in
+ its own RHS, lest we suffer from the last point of Note [Arity
+ robustness] in GHC.Core.Opt.Simplify.Env. There we have `f = \x. f x`
+ and we should not eta-reduce to `f=f`, which might change a
+ terminating program (think @f `seq` e@) into a non-terminating one.
+ So we check for being a loop breaker first. However for GlobalIds
+ we can look at the arity; and for primops we must, since they have
+ no unfolding. [SG: Perhaps this is rather a soundness subtlety?]
+
+Of course, eta reduction is not always sound. See Note [Eta reduction soundness]
+for when it is.
+
+When there are multiple arguments, we might get multiple eta-redexes. Example:
+ \x y. e x y
+ ==> { reduce \y. (e x) y in context \x._ }
+ \x. e x
+ ==> { reduce \x. e x in context _ }
+ e
+And (1) implies that we never want to stop with `\x. e x`, because that is not a
+trivial expression. So in practice, the implementation works by considering a
+whole group of leading lambdas to reduce.
+
+These delicacies are why we don't simply use 'exprIsTrivial' and 'exprIsHNF'
+in 'tryEtaReduce'. Alas.
+
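+The "trivial head" test of point (1) can be sketched like this
+(illustrative only; the real test is ok_fun in tryEtaReduce, below):
+
+    ok_head (App f (Type {})) = ok_head f   -- look through type applications
+    ok_head (Cast f _)        = ok_head f   -- ... and casts
+    ok_head (Var _)           = True        -- a bare variable: trivial
+    ok_head _                 = False       -- anything else: not trivial
+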
+Note [Eta reduction soundness]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+GHC's eta reduction transforms
+ \x y. <fun> x y ---> <fun>
+For soundness, we obviously require that `x` and `y`
+do not occur free in <fun>. But what /other/ restrictions are there for
+eta reduction to be sound?
+
+We discuss separately what it means for eta reduction to be
+/desirable/, in Note [Eta reduction makes sense].
+
+Eta reduction is *not* a sound transformation in general, because it
+may change termination behavior if *value* lambdas are involved:
+ `bot` /= `\x. bot x` (as can be observed by a simple `seq`)
+The past has shown that oversight of this fact can not only lead to endless
+loops or exceptions, but also to outright *segfaults*.
+
+Nevertheless, we can give the following criteria for when it is sound to
+perform eta reduction on an expression with n leading lambdas `\xs. e xs`
+(checked in 'is_eta_reduction_sound' in 'tryEtaReduce', which focuses on the
+case where `e` is trivial):
+
+ A. It is sound to eta-reduce n arguments as long as n does not exceed the
+ `exprArity` of `e`. (Needs Arity analysis.)
+ This criterion exploits information about how `e` is *defined*.
+
+ Example: If `e = \x. bot` then we know it won't diverge until it is called
+ with one argument. Hence it is safe to eta-reduce `\x. e x` to `e`.
+ By contrast, it would be *unsound* to eta-reduce 2 args, `\x y. e x y` to `e`:
+ `e 42` diverges when `(\x y. e x y) 42` does not.
+
+ S. It is sound to eta-reduce n arguments in an evaluation context in which all
+ calls happen with at least n arguments. (Needs Strictness analysis.)
+ NB: This treats evaluations like a call with 0 args.
+ NB: This criterion exploits information about how `e` is *used*.
+
+ Example: Given a function `g` like
+ `g c = Just (c 1 2 + c 2 3)`
+ it is safe to eta-reduce the arg in `g (\x y. e x y)` to `g e` without
+ knowing *anything* about `e` (perhaps it's a parameter occ itself), simply
+ because `g` always calls its parameter with 2 arguments.
+ It is also safe to eta-reduce just one arg, e.g., `g (\x. e x)` to `g e`.
+ By contrast, it would be *unsound* to eta-reduce 3 args in a call site
+ like `g (\x y z. e x y z)` to `g e`, because that diverges when
+ `e = \x y. bot`.
+
+ Could we relax to "*At least one call in the same trace* is with n args"?
+ (NB: Strictness analysis can only answer this relaxed question, not the
+ original formulation.)
+ Consider what happens for
+ ``g2 c = c True `seq` c False 42``
+ Here, `g2` will call `c` with 2 arguments (if there is a call at all).
+ But it is unsound to eta-reduce the arg in `g2 (\x y. e x y)` to `g2 e`
+ when `e = \x. if x then bot else id`, because the latter will diverge when
+ the former would not.
+
+ On the other hand, with `-fno-pedantic-bottoms`, we will have eta-expanded
+ the definition of `e` and then eta-reduction is sound
+ (see Note [Dealing with bottom]).
+ Consequence: We have to check that `-fpedantic-bottoms` is off; otherwise
+ eta-reduction based on demands is in fact unsound.
+
+ See Note [Eta reduction based on evaluation context] for the implementation
+ details. This criterion is tested extensively in T21261.
+
+ E. (See fun_arity in tryEtaReduce.) As a perhaps special case on the
+ boundary of (A) and (S), when we know that a fun binder `f` is in
+ WHNF, we simply assume it has arity 1 and apply (A). Example:
+ g f = f `seq` \x. f x
+ Here it's sound to eta-reduce `\x. f x` to `f`, because `f` can't be bottom
+ after the `seq`. This turned up in #7542.
+
+And here are a few more technical criteria for when it is *not* sound to
+eta-reduce that are specific to Core and GHC:
+
+ L. With linear types, eta-reduction can break type-checking:
+ f :: A ⊸ B
+ g :: A -> B
+ g = \x. f x
+ The above is correct, but eta-reducing g would yield g=f, and the linter
+ would complain that g and f don't have the same type. NB: Not unsound in the
+ dynamic semantics, but unsound according to the static semantics of Core.
+
+ J. We may not undersaturate join points.
+ See Note [Invariants on join points] in GHC.Core, and #20599.
+
+ B. We may not undersaturate functions with no binding.
+ See Note [Eta expanding primops].
+
+ W. We may not undersaturate StrictWorkerIds.
+ See Note [Strict Worker Ids] in GHC.CoreToStg.Prep.
+
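+Putting these criteria together, the check is roughly the following
+(a sketch only; the real code is is_eta_reduction_sound in tryEtaReduce,
+below):
+
+    sound_to_reduce n fun eval_sd
+      =  (  fun_arity fun >= n                     -- (A) and (E)
+         || isStrict (peelManyCalls n eval_sd))    -- (S)
+      && canEtaReduceToArity fun 0 0               -- (L), (J), (B), (W)
+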
+Here is a list of historic accidents surrounding unsound eta-reduction:
+
+* Consider
+ f = \x.f x
+ h y = case (case y of { True -> f `seq` True; False -> False }) of
+ True -> ...; False -> ...
+ If we (unsoundly) eta-reduce f to get f=f, the strictness analyser
+ says f=bottom, and replaces the (f `seq` True) with just
+ (f `cast` unsafe-co).
+ [SG in 2022: I don't think worker/wrapper would do this today.]
+ BUT, as things stand, 'f' got arity 1, and it *keeps* arity 1 (perhaps also
+ wrongly). So CorePrep eta-expands the definition again, so that it does not
+ terminate after all.
+ Result: seg-fault because the boolean case actually gets a function value.
+ See #1947.
+
+* Never *reduce* arity. For example
+ f = \xy. g x y
+ Then if h has arity 1 we don't want to eta-reduce because then
+ f's arity would decrease, and that is bad
+ [SG in 2022: I don't understand this point. There is no `h`, perhaps that
+ should have been `g`. Even then, this proposed eta-reduction is invalid by
+ criterion (A), which might actually be the point this anecdote is trying to
+ make. Perhaps the "no arity decrease" idea is also related to
+ Note [Arity robustness]?]
+
+Note [Do not eta reduce PAPs]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+I considered eta-reducing if the result is a PAP:
+ \x. f e1 e2 x ==> f e1 e2
+
+This reduces clutter, sometimes a lot. See Note [Do not eta-expand PAPs]
+in GHC.Core.Opt.Simplify.Utils, where we are careful not to eta-expand
+a PAP. If eta-expanding is bad, then eta-reducing is good!
+
+Also the code generator likes eta-reduced PAPs; see GHC.CoreToStg.Prep
+Note [No eta reduction needed in rhsToBody].
+
+But note that we don't want to eta-reduce
+ \x y. f <expensive> x y
+to
+ f <expensive>
+The former has arity 2, and repeats <expensive> for every call of the
+function; the latter has arity 0, and shares <expensive>. We don't want
+to change behaviour. Hence the call to exprIsCheap in ok_fun.
+
+I noticed this when examining #18993 and, although it is delicate,
+eta-reducing to a PAP happens to fix the regression in #18993.
+
+HOWEVER, if we transform
+ \x. f y x ==> f y
+that might mean that f isn't saturated any more, and does not inline.
+This led to some other regressions.
+
+TL;DR: currently we do /not/ eta reduce if the result is a PAP.
+
+Note [Eta reduction with casted arguments]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider
+ (\(x:t3). f (x |> g)) :: t3 -> t2
+ where
+ f :: t1 -> t2
+ g :: t3 ~ t1
+This should be eta-reduced to
+
+ f |> (sym g -> t2)
+
+So we need to accumulate a coercion, pushing it inward (past
+variable arguments only) thus:
+ f (x |> co_arg) |> co --> (f |> (sym co_arg -> co)) x
+ f (x:t) |> co --> (f |> (t -> co)) x
+ f @ a |> co --> (f |> (forall a.co)) @ a
+ f @ (g:t1~t2) |> co --> (f |> (t1~t2 => co)) @ (g:t1~t2)
+These are the equations for ok_arg.
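+
+Worked through the example above (a sketch): starting from the body with
+a reflexive result coercion <t2>, the first equation pushes the cast past
+the argument:
+    f (x |> g) |> <t2>   -->   (f |> (sym g -> <t2>)) x
+and now the head  f |> (sym g -> <t2>)  is trivial, so the \x binder can
+be dropped, giving the eta-reduced result above.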
+
+Note [Eta reduction with casted function]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Since we are pushing a coercion inwards, it is easy to accommodate
+ (\xy. (f x |> g) y)
+ (\xy. (f x y) |> g)
+
+See the `(Cast e co)` equation for `go` in `tryEtaReduce`. The
+eta-expander pushes those casts outwards, so you might think we won't
+ever see a cast here, but if we have
+ \xy. (f x y |> g)
+we will call tryEtaReduce [x,y] (f x y |> g), and we'd like that to
+work. This happens in GHC.Core.Opt.Simplify.Utils.mkLam, where
+eta-expansion may be turned off (by sm_eta_expand).
+
+Note [Eta reduction based on evaluation context]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Note [Eta reduction soundness], criterion (S) allows us to eta-reduce
+`g (\x y. e x y)` to `g e` when we know that `g` always calls its parameter with
+at least 2 arguments. So how do we read that off `g`'s demand signature?
+
+Let's take the simple example of #21261, where `g` (actually, `f`) is defined as
+ g c = c 1 2 + c 3 4
+Then this is how the pieces are put together:
+
+ * Demand analysis infers `<SCS(C1(L))>` for `g`'s demand signature
+
+ * When the Simplifier next simplifies the argument in `g (\x y. e x y)`, it
+ looks up the *evaluation context* of the argument in the form of the
+ sub-demand `CS(C1(L))` and stores it in the 'SimplCont'.
+ (Why does it drop the outer evaluation cardinality of the demand, `S`?
+ Because it's irrelevant! When we simplify an expression, we do so under the
+ assumption that it is currently under evaluation.)
+ This sub-demand literally says "Whenever this expression is evaluated, it
+ is also called with two arguments, potentially multiple times".
+
+ * Then the simplifier takes apart the lambda and simplifies the lambda group
+ and then calls 'tryEtaReduce' when rebuilding the lambda, passing the
+ evaluation context `CS(C1(L))` along. Then we simply peel off 2 call
+ sub-demands `Cn` and see whether all of the n's (here: `S=C_1N` and
+ `1=C_11`) were strict. And strict they are! Thus, it will eta-reduce
+ `\x y. e x y` to `e`.
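+
+ A small source-level sketch of the same situation (function names
+ invented for illustration):
+
+    g :: (Int -> Int -> Int) -> Int
+    g c = c 1 2 + c 3 4
+
+    h :: (Int -> Int -> Int) -> Int
+    h e = g (\x y -> e x y)
+
+ Here the argument `\x y -> e x y` can be eta-reduced to plain `e`,
+ because `g`'s demand signature says that whenever it evaluates its
+ argument it also calls it with two arguments.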
+-}
+
+-- | `tryEtaReduce [x,y,z] e sd` returns `Just e'` if `\x y z -> e` is evaluated
+-- according to `sd` and can soundly and gainfully be eta-reduced to `e'`.
+-- See Note [Eta reduction soundness]
+-- and Note [Eta reduction makes sense] when that is the case.
+tryEtaReduce :: [Var] -> CoreExpr -> SubDemand -> Maybe CoreExpr
+-- Return an expression equal to (\bndrs. body)
+tryEtaReduce bndrs body eval_sd
+ = go (reverse bndrs) body (mkRepReflCo (exprType body))
+ where
+ incoming_arity = count isId bndrs -- See Note [Eta reduction makes sense], point (2)
+
+ go :: [Var] -- Binders, innermost first, types [a3,a2,a1]
+ -> CoreExpr -- Of type tr
+ -> Coercion -- Of type tr ~ ts
+ -> Maybe CoreExpr -- Of type a1 -> a2 -> a3 -> ts
+ -- See Note [Eta reduction with casted arguments]
+ -- for why we have an accumulating coercion
+ --
+ -- Invariant: (go bs body co) returns an expression
+ -- equivalent to (\(reverse bs). body |> co)
+
+ -- See Note [Eta reduction with casted function]
+ go bs (Cast e co1) co2
+ = go bs e (co1 `mkTransCo` co2)
+
+ go bs (Tick t e) co
+ | tickishFloatable t
+ = fmap (Tick t) $ go bs e co
+ -- Float app ticks: \x -> Tick t (e x) ==> Tick t e
+
+ go (b : bs) (App fun arg) co
+ | Just (co', ticks) <- ok_arg b arg co (exprType fun)
+ = fmap (flip (foldr mkTick) ticks) $ go bs fun co'
+ -- Float arg ticks: \x -> e (Tick t x) ==> Tick t e
+
+ go remaining_bndrs fun co
+ | all isTyVar remaining_bndrs
+ -- If all the remaining_bndrs are tyvars, then the etad_expr
+ -- will be trivial, which is what we want.
+ -- e.g. We might have /\a \b. f [a] b, and we want to
+ -- eta-reduce to /\a. f [a]
+ -- We don't want to give up on this one: see #20040
+ -- See Note [Eta reduction makes sense], point (1)
+ , remaining_bndrs `ltLength` bndrs
+ -- Only reply Just if /something/ has happened
+ , ok_fun fun
+ , let etad_expr = mkLams (reverse remaining_bndrs) (mkCast fun co)
+ used_vars = exprFreeVars etad_expr
+ reduced_bndrs = mkVarSet (dropList remaining_bndrs bndrs)
+ , used_vars `disjointVarSet` reduced_bndrs
+ -- Check for any of the binders free in the result,
+ -- including the accumulated coercion
+ -- See Note [Eta reduction makes sense], intro and point (1)
+ = Just etad_expr
+
+ go _remaining_bndrs _fun _ = -- pprTrace "tER fail" (ppr _fun $$ ppr _remaining_bndrs) $
+ Nothing
+
+ ---------------
+ -- See Note [Eta reduction makes sense], point (1)
+ ok_fun (App fun (Type {})) = ok_fun fun
+ ok_fun (Cast fun _) = ok_fun fun
+ ok_fun (Tick _ expr) = ok_fun expr
+ ok_fun (Var fun_id) = is_eta_reduction_sound fun_id || all ok_lam bndrs
+ ok_fun _fun = False
+
+ ---------------
+ -- See Note [Eta reduction soundness], this is THE place to check soundness!
+ is_eta_reduction_sound fun =
+ -- Check that eta-reduction won't make the program stricter...
+ (fun_arity fun >= incoming_arity -- criterion (A) and (E)
+ || all_calls_with_arity incoming_arity) -- criterion (S)
+ -- ... and that the function can be eta reduced to arity 0
+ -- without violating invariants of Core and GHC
+ && canEtaReduceToArity fun 0 0 -- criteria (L), (J), (W), (B)
+ all_calls_with_arity n = isStrict (peelManyCalls n eval_sd)
+ -- See Note [Eta reduction based on evaluation context]
+
+ ---------------
+ fun_arity fun
+ | isLocalId fun
+ , isStrongLoopBreaker (idOccInfo fun) = 0
+ -- See Note [Eta reduction makes sense], point (3)
+ | arity > 0 = arity
+ | isEvaldUnfolding (idUnfolding fun) = 1
+ -- See Note [Eta reduction soundness], criterion (E)
+ | otherwise = 0
+ where
+ arity = idArity fun
+
+ ---------------
+ ok_lam v = isTyVar v || isEvVar v
+ -- See Note [Eta reduction makes sense], point (2)
+
+ ---------------
+ ok_arg :: Var -- Of type bndr_t
+ -> CoreExpr -- Of type arg_t
+ -> Coercion -- Of kind (t1~t2)
+ -> Type -- Type (arg_t -> t1) of the function
+ -- to which the argument is supplied
+ -> Maybe (Coercion -- Of type (arg_t -> t1 ~ bndr_t -> t2)
+ -- (and similarly for tyvars, coercion args)
+ , [CoreTickish])
+ -- See Note [Eta reduction with casted arguments]
+ ok_arg bndr (Type ty) co _
+ | Just tv <- getTyVar_maybe ty
+ , bndr == tv = Just (mkHomoForAllCos [tv] co, [])
+ ok_arg bndr (Var v) co fun_ty
+ | bndr == v
+ , let mult = idMult bndr
+ , Just (fun_mult, _, _) <- splitFunTy_maybe fun_ty
+ , mult `eqType` fun_mult -- There is no change in multiplicity, otherwise we must abort
+ = Just (mkFunResCo Representational (idScaledType bndr) co, [])
+ ok_arg bndr (Cast e co_arg) co fun_ty
+ | (ticks, Var v) <- stripTicksTop tickishFloatable e
+ , Just (fun_mult, _, _) <- splitFunTy_maybe fun_ty
+ , bndr == v
+ , fun_mult `eqType` idMult bndr
+ = Just (mkFunCo Representational (multToCo fun_mult) (mkSymCo co_arg) co, ticks)
+ -- The simplifier combines multiple casts into one,
+ -- so we can have a simple-minded pattern match here
+ ok_arg bndr (Tick t arg) co fun_ty
+ | tickishFloatable t, Just (co', ticks) <- ok_arg bndr arg co fun_ty
+ = Just (co', t:ticks)
+
+ ok_arg _ _ _ _ = Nothing
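+
+-- A small usage sketch (hypothetical, for illustration only): given
+-- binders [x, y] and body (f x y), where f has idArity >= 2 and is an
+-- ordinary bound function,
+--   tryEtaReduce [x, y] (mkApps (Var f) [Var x, Var y]) topSubDmd
+-- is expected to succeed, returning f (possibly under a reflexive cast).
+-- If f had idArity 1 (and the evaluation context were just topSubDmd)
+-- it would return Nothing, by criterion (A).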
+
+-- | Can we eta-reduce the given function to the specified arity?
+-- See Note [Eta reduction soundness], criteria (B), (J), (W) and (L).
+canEtaReduceToArity :: Id -> JoinArity -> Arity -> Bool
+canEtaReduceToArity fun dest_join_arity dest_arity =
+ not $
+ hasNoBinding fun -- (B)
+ -- Don't undersaturate functions with no binding.
+
+ || ( isJoinId fun && dest_join_arity < idJoinArity fun ) -- (J)
+ -- Don't undersaturate join points.
+ -- See Note [Invariants on join points] in GHC.Core, and #20599
+
+ || ( dest_arity < idCbvMarkArity fun ) -- (W)
+ -- Don't undersaturate StrictWorkerIds.
+ -- See Note [Strict Worker Ids] in GHC.CoreToStg.Prep.
+
+ || isLinearType (idType fun) -- (L)
+ -- Don't perform eta reduction on linear types.
+ -- If `f :: A %1-> B` and `g :: A -> B`,
+ -- then `g x = f x` is OK but `g = f` is not.
+
+
{- *********************************************************************
* *
The "push rules"
diff --git a/compiler/GHC/Core/Opt/CallArity.hs b/compiler/GHC/Core/Opt/CallArity.hs
index 67b9a88875..306b3bd446 100644
--- a/compiler/GHC/Core/Opt/CallArity.hs
+++ b/compiler/GHC/Core/Opt/CallArity.hs
@@ -17,7 +17,7 @@ import GHC.Types.Var.Env
import GHC.Types.Basic
import GHC.Core
import GHC.Types.Id
-import GHC.Core.Opt.Arity ( typeArity, typeOneShots )
+import GHC.Core.Opt.Arity ( typeArity )
import GHC.Core.Utils ( exprIsCheap, exprIsTrivial )
import GHC.Data.Graph.UnVar
import GHC.Types.Demand
@@ -377,15 +377,14 @@ a body representing “all external calls”, which returns a pessimistic
CallArityRes (the co-call graph is the complete graph, all arityies 0).
Note [Trimming arity]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
+~~~~~~~~~~~~~~~~~~~~~
In the Call Arity papers, we are working on an untyped lambda calculus with no
other id annotations, where eta-expansion is always possible. But this is not
the case for Core!
1. We need to ensure the invariant
callArity e <= typeArity (exprType e)
for the same reasons that exprArity needs this invariant (see Note
- [exprArity invariant] in GHC.Core.Opt.Arity).
+ [typeArity invariants] in GHC.Core.Opt.Arity).
If we are not doing that, a too-high arity annotation will be stored with
the id, confusing the simplifier later on.
@@ -544,7 +543,7 @@ callArityAnal arity int (Let bind e)
-- Which bindings should we look at?
-- See Note [Which variables are interesting]
isInteresting :: Var -> Bool
-isInteresting v = not $ null $ typeOneShots $ idType v
+isInteresting v = typeArity (idType v) > 0
interestingBinds :: CoreBind -> [Var]
interestingBinds = filter isInteresting . bindersOf
diff --git a/compiler/GHC/Core/Opt/FloatIn.hs b/compiler/GHC/Core/Opt/FloatIn.hs
index 88411a7add..cf3ca726e4 100644
--- a/compiler/GHC/Core/Opt/FloatIn.hs
+++ b/compiler/GHC/Core/Opt/FloatIn.hs
@@ -22,13 +22,14 @@ import GHC.Prelude
import GHC.Platform
import GHC.Core
+import GHC.Core.Opt.Arity( isOneShotBndr )
import GHC.Core.Make hiding ( wrapFloats )
import GHC.Core.Utils
import GHC.Core.FVs
import GHC.Core.Type
import GHC.Types.Basic ( RecFlag(..), isRec, Levity(Unlifted) )
-import GHC.Types.Id ( isOneShotBndr, idType, isJoinId, isJoinId_maybe )
+import GHC.Types.Id ( idType, isJoinId, isJoinId_maybe )
import GHC.Types.Tickish
import GHC.Types.Var
import GHC.Types.Var.Set
diff --git a/compiler/GHC/Core/Opt/OccurAnal.hs b/compiler/GHC/Core/Opt/OccurAnal.hs
index 304ed12c2d..6e0fa12543 100644
--- a/compiler/GHC/Core/Opt/OccurAnal.hs
+++ b/compiler/GHC/Core/Opt/OccurAnal.hs
@@ -28,7 +28,7 @@ import GHC.Core
import GHC.Core.FVs
import GHC.Core.Utils ( exprIsTrivial, isDefaultAlt, isExpandableApp,
stripTicksTopE, mkTicks )
-import GHC.Core.Opt.Arity ( joinRhsArity )
+import GHC.Core.Opt.Arity ( joinRhsArity, isOneShotBndr )
import GHC.Core.Coercion
import GHC.Core.Type
import GHC.Core.TyCo.FVs( tyCoVarsOfMCo )
@@ -1755,7 +1755,7 @@ lambda and casts, e.g.
* Why do we take care to account for intervening casts? Answer:
currently we don't do eta-expansion and cast-swizzling in a stable
- unfolding (see Note [Eta-expansion in stable unfoldings]).
+ unfolding (see Historical-note [Eta-expansion in stable unfoldings]).
So we can get
f = \x. ((\y. ...x...y...) |> co)
Now, since the lambdas aren't together, the occurrence analyser will
diff --git a/compiler/GHC/Core/Opt/SetLevels.hs b/compiler/GHC/Core/Opt/SetLevels.hs
index 9e2376da45..a8a99ba42f 100644
--- a/compiler/GHC/Core/Opt/SetLevels.hs
+++ b/compiler/GHC/Core/Opt/SetLevels.hs
@@ -85,7 +85,7 @@ import GHC.Core.Utils ( exprType, exprIsHNF
, collectMakeStaticArgs
, mkLamTypes
)
-import GHC.Core.Opt.Arity ( exprBotStrictness_maybe )
+import GHC.Core.Opt.Arity ( exprBotStrictness_maybe, isOneShotBndr )
import GHC.Core.FVs -- all of it
import GHC.Core.Subst
import GHC.Core.Make ( sortQuantVars )
@@ -1384,9 +1384,11 @@ lvlLamBndrs env lvl bndrs
new_lvl | any is_major bndrs = incMajorLvl lvl
| otherwise = incMinorLvl lvl
- is_major bndr = isId bndr && not (isProbablyOneShotLambda bndr)
- -- The "probably" part says "don't float things out of a
- -- probable one-shot lambda"
+ is_major bndr = not (isOneShotBndr bndr)
+ -- Only non-one-shot lambdas bump a major level, which in
+ -- turn triggers floating. NB: isOneShotBndr is always
+ -- true of a type variable -- there is no point in floating
+ -- out of a big lambda.
-- See Note [Computing one-shot info] in GHC.Types.Demand
lvlJoinBndrs :: LevelEnv -> Level -> RecFlag -> [OutVar]
diff --git a/compiler/GHC/Core/Opt/Simplify.hs b/compiler/GHC/Core/Opt/Simplify.hs
index 445fabe682..f87a28f440 100644
--- a/compiler/GHC/Core/Opt/Simplify.hs
+++ b/compiler/GHC/Core/Opt/Simplify.hs
@@ -38,9 +38,9 @@ import GHC.Core.Ppr ( pprCoreExpr )
import GHC.Core.Unfold
import GHC.Core.Unfold.Make
import GHC.Core.Utils
-import GHC.Core.Opt.Arity ( ArityType(..), typeArity
+import GHC.Core.Opt.Arity ( ArityType, exprArity, getBotArity
, pushCoTyArg, pushCoValArg
- , etaExpandAT )
+ , typeArity, arityTypeArity, etaExpandAT )
import GHC.Core.SimpleOpt ( exprIsConApp_maybe, joinPointBinding_maybe, joinPointBindings_maybe )
import GHC.Core.FVs ( mkRuleInfo )
import GHC.Core.Rules ( lookupRule, getRules, initRuleOpts )
@@ -352,7 +352,8 @@ simplLazyBind env top_lvl is_rec bndr bndr1 rhs rhs_se
-- See Note [Floating and type abstraction] in GHC.Core.Opt.Simplify.Utils
-- Simplify the RHS
- ; let rhs_cont = mkRhsStop (substTy body_env (exprType body)) (idDemandInfo bndr)
+ ; let rhs_cont = mkRhsStop (substTy body_env (exprType body))
+ is_rec (idDemandInfo bndr)
; (body_floats0, body0) <- {-#SCC "simplExprF" #-} simplExprF body_env body rhs_cont
-- ANF-ise a constructor or PAP rhs
@@ -375,11 +376,11 @@ simplLazyBind env top_lvl is_rec bndr bndr1 rhs rhs_se
{-#SCC "simplLazyBind-type-abstraction-first" #-}
do { (poly_binds, body3) <- abstractFloats (seUnfoldingOpts env) top_lvl
tvs' body_floats2 body2
- ; let floats = foldl' extendFloats (emptyFloats env) poly_binds
- ; return (floats, body3) }
+ ; let poly_floats = foldl' extendFloats (emptyFloats env) poly_binds
+ ; return (poly_floats, body3) }
; let env' = env `setInScopeFromF` rhs_floats
- ; rhs' <- mkLam env' tvs' body3 rhs_cont
+ ; rhs' <- rebuildLam env' tvs' body3 rhs_cont
; (bind_float, env2) <- completeBind env' (BC_Let top_lvl is_rec) bndr bndr1 rhs'
; return (rhs_floats `addFloats` bind_float, env2) }
@@ -598,7 +599,7 @@ tryCastWorkerWrapper env bind_cxt old_bndr occ_info bndr (Cast rhs co)
-- a DFunUnfolding in mk_worker_unfolding
, not (exprIsTrivial rhs) -- Not x = y |> co; Wrinkle 1
, not (hasInlineUnfolding info) -- Not INLINE things: Wrinkle 4
- , isConcrete (typeKind rhs_ty) -- Don't peel off a cast if doing so would
+ , isConcrete (typeKind work_ty) -- Don't peel off a cast if doing so would
-- lose the underlying runtime representation.
-- See Note [Preserve RuntimeRep info in cast w/w]
, not (isOpaquePragma (idInlinePragma old_bndr)) -- Not for OPAQUE bindings
@@ -661,7 +662,9 @@ tryCastWorkerWrapper env bind_cxt old_bndr occ_info bndr (Cast rhs co)
_ -> mkLetUnfolding (sm_uf_opts mode) top_lvl InlineRhs work_id work_rhs
tryCastWorkerWrapper env _ _ _ bndr rhs -- All other bindings
- = return (mkFloatBind env (NonRec bndr rhs))
+ = do { traceSmpl "tcww:no" (vcat [ text "bndr:" <+> ppr bndr
+ , text "rhs:" <+> ppr rhs ])
+ ; return (mkFloatBind env (NonRec bndr rhs)) }
mkCastWrapperInlinePrag :: InlinePragma -> InlinePragma
-- See Note [Cast worker/wrapper]
@@ -699,6 +702,7 @@ prepareBinding :: SimplEnv -> TopLevelFlag -> RecFlag -> Bool
-- bndr = K a a tmp
-- That's what prepareBinding does
-- Precondition: binder is not a JoinId
+-- Postcondition: the returned SimplFloats contains only let-floats
prepareBinding env top_lvl is_rec strict_bind bndr rhs_floats rhs
= do { -- Never float join-floats out of a non-join let-binding (which this is)
-- So wrap the body in the join-floats right now
@@ -822,30 +826,15 @@ makeTrivial env top_lvl dmd occ_fs expr
= do { (floats, triv_expr) <- makeTrivial env top_lvl dmd occ_fs expr'
; return (floats, Cast triv_expr co) }
- | otherwise
- = do { (floats, new_id) <- makeTrivialBinding env top_lvl occ_fs
- id_info expr expr_ty
- ; return (floats, Var new_id) }
- where
- id_info = vanillaIdInfo `setDemandInfo` dmd
- expr_ty = exprType expr
-
-makeTrivialBinding :: HasDebugCallStack
- => SimplEnv -> TopLevelFlag
- -> FastString -- ^ a "friendly name" to build the new binder from
- -> IdInfo
- -> OutExpr
- -> OutType -- Type of the expression
- -> SimplM (LetFloats, OutId)
-makeTrivialBinding env top_lvl occ_fs info expr expr_ty
+ | otherwise -- 'expr' is not of form (Cast e co)
= do { (floats, expr1) <- prepareRhs env top_lvl occ_fs expr
; uniq <- getUniqueM
; let name = mkSystemVarName uniq occ_fs
- var = mkLocalIdWithInfo name Many expr_ty info
+ var = mkLocalIdWithInfo name Many expr_ty id_info
-- Now something very like completeBind,
-- but without the postInlineUnconditionally part
- ; (arity_type, expr2) <- tryEtaExpandRhs env NonRecursive var expr1
+ ; (arity_type, expr2) <- tryEtaExpandRhs env (BC_Let top_lvl NonRecursive) var expr1
-- Technically we should extend the in-scope set in 'env' with
-- the 'floats' from prepareRHS; but they are all fresh, so there is
-- no danger of introducing name shadowig in eta expansion
@@ -855,9 +844,12 @@ makeTrivialBinding env top_lvl occ_fs info expr expr_ty
; let final_id = addLetBndrInfo var arity_type unf
bind = NonRec final_id expr2
- ; return ( floats `addLetFlts` unitLetFloat bind, final_id ) }
+ ; traceSmpl "makeTrivial" (vcat [text "final_id" <+> ppr final_id, text "rhs" <+> ppr expr2 ])
+ ; return ( floats `addLetFlts` unitLetFloat bind, Var final_id ) }
where
- mode = getMode env
+ id_info = vanillaIdInfo `setDemandInfo` dmd
+ expr_ty = exprType expr
+ mode = getMode env
bindingOk :: TopLevelFlag -> CoreExpr -> Type -> Bool
-- True iff we can have a binding of this expression at this level
@@ -945,7 +937,7 @@ completeBind env bind_cxt old_bndr new_bndr new_rhs
-- Do eta-expansion on the RHS of the binding
-- See Note [Eta-expanding at let bindings] in GHC.Core.Opt.Simplify.Utils
- ; (new_arity, eta_rhs) <- tryEtaExpandRhs env is_rec new_bndr new_rhs
+ ; (new_arity, eta_rhs) <- tryEtaExpandRhs env bind_cxt new_bndr new_rhs
-- Simplify the unfolding
; new_unfolding <- simplLetUnfolding env bind_cxt old_bndr
@@ -975,9 +967,7 @@ addLetBndrInfo :: OutId -> ArityType -> Unfolding -> OutId
addLetBndrInfo new_bndr new_arity_type new_unf
= new_bndr `setIdInfo` info5
where
- AT oss div = new_arity_type
- new_arity = length oss
-
+ new_arity = arityTypeArity new_arity_type
info1 = idInfo new_bndr `setArityInfo` new_arity
-- Unfolding info: Note [Setting the new unfolding]
@@ -990,12 +980,11 @@ addLetBndrInfo new_bndr new_arity_type new_unf
= info2
-- Bottoming bindings: see Note [Bottoming bindings]
- info4 | isDeadEndDiv div = info3 `setDmdSigInfo` bot_sig
- `setCprSigInfo` bot_cpr
- | otherwise = info3
-
- bot_sig = mkClosedDmdSig (replicate new_arity topDmd) div
- bot_cpr = mkCprSig new_arity botCpr
+ info4 = case getBotArity new_arity_type of
+ Nothing -> info3
+ Just ar -> assert (ar == new_arity) $
+ info3 `setDmdSigInfo` mkVanillaDmdSig new_arity botDiv
+ `setCprSigInfo` mkCprSig new_arity botCpr
-- Zap call arity info. We have used it by now (via
-- `tryEtaExpandRhs`), and the simplifier can invalidate this
@@ -1009,12 +998,12 @@ Suppose we have
let x = error "urk"
in ...(case x of <alts>)...
or
- let f = \x. error (x ++ "urk")
+ let f = \y. error (y ++ "urk")
in ...(case f "foo" of <alts>)...
Then we'd like to drop the dead <alts> immediately. So it's good to
-propagate the info that x's RHS is bottom to x's IdInfo as rapidly as
-possible.
+propagate the info that x's (or f's) RHS is bottom to x's (or f's)
+IdInfo as rapidly as possible.
We use tryEtaExpandRhs on every binding, and it turns out that the
arity computation it performs (via GHC.Core.Opt.Arity.findRhsArity) already
@@ -1023,6 +1012,21 @@ is propagate that info to the binder's IdInfo.
This showed up in #12150; see comment:16.
+There is a second reason for setting the strictness signature. Consider
+ let -- f :: <[S]b>
+ f = \x. error "urk"
+ in ...(f a b c)...
+Then, in GHC.Core.Opt.Arity.findRhsArity we'll use the demand-info on `f`
+to eta-expand to
+ let f = \x y z. error "urk"
+ in ...(f a b c)...
+
+But now f's strictness signature has too short an arity; see
+GHC.Core.Lint Note [Check arity on bottoming functions].
+Fortuitously, the same strictness-signature-fixup code gives the
+function a new strictness signature with the right number of
+arguments. Example in stranal/should_compile/EtaExpansion.
+
Note [Setting the demand info]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If the unfolding is a value, the demand info may
@@ -1689,7 +1693,7 @@ simpl_lam env bndr body cont
= do { let (inner_bndrs, inner_body) = collectBinders body
; (env', bndrs') <- simplLamBndrs env (bndr:inner_bndrs)
; body' <- simplExpr env' inner_body
- ; new_lam <- mkLam env' bndrs' body' cont
+ ; new_lam <- rebuildLam env' bndrs' body' cont
; rebuild env' new_lam cont }
-------------
@@ -3552,8 +3556,8 @@ mkDupableContWithDmds env dmds
-- let a = ...arg...
-- in [...hole...] a
-- NB: sc_dup /= OkToDup; that is caught earlier by contIsDupable
- do { let (dmd:_) = dmds -- Never fails
- ; (floats1, cont') <- mkDupableContWithDmds env dmds cont
+ do { let (dmd:cont_dmds) = dmds -- Never fails
+ ; (floats1, cont') <- mkDupableContWithDmds env cont_dmds cont
; let env' = env `setInScopeFromF` floats1
; (_, se', arg') <- simplArg env' dup se arg
; (let_floats2, arg'') <- makeTrivial env NotTopLevel dmd (fsLit "karg") arg'
@@ -4086,12 +4090,14 @@ simplStableUnfolding env bind_cxt id rhs_ty id_arity unf
CoreUnfolding { uf_tmpl = expr, uf_src = src, uf_guidance = guide }
| isStableSource src
-> do { expr' <- case bind_cxt of
- BC_Join cont -> -- Binder is a join point
- -- See Note [Rules and unfolding for join points]
- simplJoinRhs unf_env id expr cont
- BC_Let {} -> -- Binder is not a join point
- do { expr' <- simplExprC unf_env expr (mkBoringStop rhs_ty)
- ; return (eta_expand expr') }
+ BC_Join cont -> -- Binder is a join point
+ -- See Note [Rules and unfolding for join points]
+ simplJoinRhs unf_env id expr cont
+ BC_Let _ is_rec -> -- Binder is not a join point
+ do { let cont = mkRhsStop rhs_ty is_rec topDmd
+ -- mkRhsStop: switch off eta-expansion at the top level
+ ; expr' <- simplExprC unf_env expr cont
+ ; return (eta_expand expr') }
; case guide of
UnfWhen { ug_arity = arity
, ug_unsat_ok = sat_ok
@@ -4138,11 +4144,13 @@ simplStableUnfolding env bind_cxt id rhs_ty id_arity unf
-- See Note [Simplifying inside stable unfoldings] in GHC.Core.Opt.Simplify.Utils
-- See Note [Eta-expand stable unfoldings]
- eta_expand expr
- | not eta_on = expr
- | exprIsTrivial expr = expr
- | otherwise = etaExpandAT (getInScope env) id_arity expr
- eta_on = sm_eta_expand (getMode env)
+ -- Use the arity from the main Id (in id_arity), rather than computing it from rhs
+ eta_expand expr | sm_eta_expand (getMode env)
+ , exprArity expr < arityTypeArity id_arity
+ , wantEtaExpansion expr
+ = etaExpandAT (getInScope env) id_arity expr
+ | otherwise
+ = expr
{- Note [Eta-expand stable unfoldings]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -4166,7 +4174,7 @@ eta-expand the stable unfolding to arity N too. Simple and consistent.
Wrinkles
-* See Note [Eta-expansion in stable unfoldings] in
+* See Historical-note [Eta-expansion in stable unfoldings] in
GHC.Core.Opt.Simplify.Utils
* Don't eta-expand a trivial expr, else each pass will eta-reduce it,
diff --git a/compiler/GHC/Core/Opt/Simplify/Env.hs b/compiler/GHC/Core/Opt/Simplify/Env.hs
index fa6599b6bc..5defa782e0 100644
--- a/compiler/GHC/Core/Opt/Simplify/Env.hs
+++ b/compiler/GHC/Core/Opt/Simplify/Env.hs
@@ -20,7 +20,7 @@ module GHC.Core.Opt.Simplify.Env (
getSimplRules,
-- * Substitution results
- SimplSR(..), mkContEx, substId, lookupRecBndr, refineFromInScope,
+ SimplSR(..), mkContEx, substId, lookupRecBndr,
-- * Simplifying 'Id' binders
simplNonRecBndr, simplNonRecJoinBndr, simplRecBndrs, simplRecJoinBndrs,
@@ -32,6 +32,7 @@ module GHC.Core.Opt.Simplify.Env (
SimplFloats(..), emptyFloats, isEmptyFloats, mkRecFloats,
mkFloatBind, addLetFloats, addJoinFloats, addFloats,
extendFloats, wrapFloats,
+ isEmptyJoinFloats, isEmptyLetFloats,
doFloatFromRhs, getTopFloatBinds,
-- * LetFloats
@@ -519,10 +520,16 @@ so we must take the 'or' of the two.
emptyLetFloats :: LetFloats
emptyLetFloats = LetFloats nilOL FltLifted
+isEmptyLetFloats :: LetFloats -> Bool
+isEmptyLetFloats (LetFloats fs _) = isNilOL fs
+
emptyJoinFloats :: JoinFloats
emptyJoinFloats = nilOL
-unitLetFloat :: HasDebugCallStack => OutBind -> LetFloats
+isEmptyJoinFloats :: JoinFloats -> Bool
+isEmptyJoinFloats = isNilOL
+
+unitLetFloat :: OutBind -> LetFloats
-- This key function constructs a singleton float with the right form
unitLetFloat bind = assert (all (not . isJoinId) (bindersOf bind)) $
LetFloats (unitOL bind) (flag bind)
@@ -801,7 +808,6 @@ simplRecBndrs env@(SimplEnv {}) ids
do { let (!env1, ids1) = mapAccumL substIdBndr env ids
; seqIds ids1 `seq` return env1 }
-
---------------
substIdBndr :: SimplEnv -> InBndr -> (SimplEnv, OutBndr)
-- Might be a coercion variable
@@ -1028,7 +1034,7 @@ getTCvSubst (SimplEnv { seInScope = in_scope, seTvSubst = tv_env
, seCvSubst = cv_env })
= mkTCvSubst in_scope (tv_env, cv_env)
-substTy :: SimplEnv -> Type -> Type
+substTy :: HasDebugCallStack => SimplEnv -> Type -> Type
substTy env ty = Type.substTy (getTCvSubst env) ty
substTyVar :: SimplEnv -> TyVar -> Type
diff --git a/compiler/GHC/Core/Opt/Simplify/Utils.hs b/compiler/GHC/Core/Opt/Simplify/Utils.hs
index 8afaef82ce..d0a7abb84f 100644
--- a/compiler/GHC/Core/Opt/Simplify/Utils.hs
+++ b/compiler/GHC/Core/Opt/Simplify/Utils.hs
@@ -8,7 +8,8 @@ The simplifier utilities
module GHC.Core.Opt.Simplify.Utils (
-- Rebuilding
- mkLam, mkCase, prepareAlts, tryEtaExpandRhs,
+ rebuildLam, mkCase, prepareAlts,
+ tryEtaExpandRhs, wantEtaExpansion,
-- Inlining,
preInlineUnconditionally, postInlineUnconditionally,
@@ -23,9 +24,9 @@ module GHC.Core.Opt.Simplify.Utils (
SimplCont(..), DupFlag(..), StaticEnv,
isSimplified, contIsStop,
contIsDupable, contResultType, contHoleType, contHoleScaling,
- contIsTrivial, contArgs,
+ contIsTrivial, contArgs, contIsRhs,
countArgs,
- mkBoringStop, mkRhsStop, mkLazyArgStop, contIsRhsOrArg,
+ mkBoringStop, mkRhsStop, mkLazyArgStop,
interestingCallContext,
-- ArgInfo
@@ -335,7 +336,7 @@ instance Outputable ArgInfo where
ppr (ArgInfo { ai_fun = fun, ai_args = args, ai_dmds = dmds })
= text "ArgInfo" <+> braces
(sep [ text "fun =" <+> ppr fun
- , text "dmds =" <+> ppr dmds
+ , text "dmds(first 10) =" <+> ppr (take 10 dmds)
, text "args =" <+> ppr args ])
instance Outputable ArgSpec where
@@ -428,8 +429,9 @@ mkFunRules rs = Just (n_required, rs)
mkBoringStop :: OutType -> SimplCont
mkBoringStop ty = Stop ty BoringCtxt topSubDmd
-mkRhsStop :: OutType -> Demand -> SimplCont -- See Note [RHS of lets] in GHC.Core.Unfold
-mkRhsStop ty bndr_dmd = Stop ty RhsCtxt (subDemandIfEvaluated bndr_dmd)
+mkRhsStop :: OutType -> RecFlag -> Demand -> SimplCont
+-- See Note [RHS of lets] in GHC.Core.Unfold
+mkRhsStop ty is_rec bndr_dmd = Stop ty (RhsCtxt is_rec) (subDemandIfEvaluated bndr_dmd)
mkLazyArgStop :: OutType -> ArgInfo -> SimplCont
mkLazyArgStop ty fun_info = Stop ty (lazyArgContext fun_info) arg_sd
@@ -437,16 +439,10 @@ mkLazyArgStop ty fun_info = Stop ty (lazyArgContext fun_info) arg_sd
arg_sd = subDemandIfEvaluated (head (ai_dmds fun_info))
-------------------
-contIsRhsOrArg :: SimplCont -> Bool
-contIsRhsOrArg (Stop {}) = True
-contIsRhsOrArg (StrictBind {}) = True
-contIsRhsOrArg (StrictArg {}) = True
-contIsRhsOrArg _ = False
-
-contIsRhs :: SimplCont -> Bool
-contIsRhs (Stop _ RhsCtxt _) = True
-contIsRhs (CastIt _ k) = contIsRhs k -- For f = e |> co, treat e as Rhs context
-contIsRhs _ = False
+contIsRhs :: SimplCont -> Maybe RecFlag
+contIsRhs (Stop _ (RhsCtxt is_rec) _) = Just is_rec
+contIsRhs (CastIt _ k) = contIsRhs k -- For f = e |> co, treat e as Rhs context
+contIsRhs _ = Nothing
-------------------
contIsStop :: SimplCont -> Bool
@@ -767,13 +763,16 @@ strictArgContext (ArgInfo { ai_encl = encl_rules, ai_discs = discs })
-- Use this for strict arguments
| encl_rules = RuleArgCtxt
| disc:_ <- discs, disc > 0 = DiscArgCtxt -- Be keener here
- | otherwise = RhsCtxt
- -- Why RhsCtxt? if we see f (g x) (h x), and f is strict, we
+ | otherwise = RhsCtxt NonRecursive
+ -- Why RhsCtxt? If we see f (g x), and f is strict, we
-- want to be a bit more eager to inline g, because it may
-- expose an eval (on x perhaps) that can be eliminated or
-- shared. I saw this in nofib 'boyer2', RewriteFuns.onewayunify1
-- It's worth an 18% improvement in allocation for this
-- particular benchmark; 5% on 'mate' and 1.3% on 'multiplier'
+ --
+ -- Why NonRecursive? Because it's a bit like
+ -- let a = g x in f a
interestingCallContext :: SimplEnv -> SimplCont -> CallCtxt
-- See Note [Interesting call context]
@@ -962,12 +961,10 @@ simplEnvForGHCi logger dflags
updModeForStableUnfoldings :: Activation -> SimplMode -> SimplMode
updModeForStableUnfoldings unf_act current_mode
= current_mode { sm_phase = phaseFromActivation unf_act
- , sm_eta_expand = False
, sm_inline = True }
- -- sm_phase: see Note [Simplifying inside stable unfoldings]
- -- sm_eta_expand: see Note [Eta-expansion in stable unfoldings]
- -- sm_rules: just inherit; sm_rules might be "off"
- -- because of -fno-enable-rewrite-rules
+ -- sm_eta_expand: see Historical-note [No eta expansion in stable unfoldings]
+ -- sm_rules: just inherit; sm_rules might be "off"
+ -- because of -fno-enable-rewrite-rules
where
phaseFromActivation (ActiveAfter _ n) = Phase n
phaseFromActivation _ = InitialPhase
@@ -986,15 +983,23 @@ updModeForRules current_mode
{- Note [Simplifying rules]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When simplifying a rule LHS, refrain from /any/ inlining or applying
-of other RULES.
+of other RULES. Doing anything to the LHS is plain confusing, because
+it means that what the rule matches is not what the user
+wrote; cf. #10595 and #10528.
+
+* sm_inline, sm_rules: inlining (or applying rules) on rule LHSs risks
+ introducing Ticks into the LHS, which makes matching
+ trickier. #10665, #10745.
+
+ Doing this to either side confounds tools like HERMIT, which seek to reason
+ about and apply the RULES as originally written. See #10829.
-Doing anything to the LHS is plain confusing, because it means that what the
-rule matches is not what the user wrote. c.f. #10595, and #10528.
-Moreover, inlining (or applying rules) on rule LHSs risks introducing
-Ticks into the LHS, which makes matching trickier. #10665, #10745.
+ See also Note [Do not expose strictness if sm_inline=False]
-Doing this to either side confounds tools like HERMIT, which seek to reason
-about and apply the RULES as originally written. See #10829.
+* sm_eta_expand: the template (LHS) of a rule must only mention coercion
+ /variables/ not arbitrary coercions. See Note [Casts in the template] in
+ GHC.Core.Rules. Eta expansion can create new coercions; so we switch
+ it off.
There is, however, one case where we are pretty much /forced/ to transform the
LHS of a rule: postInlineUnconditionally. For instance, in the case of
@@ -1021,29 +1026,25 @@ we don't want to swizzle this to
(\x. blah) |> (Refl xty `FunCo` CoVar cv)
So we switch off cast swizzling in updModeForRules.
-Note [Eta-expansion in stable unfoldings]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-We don't do eta-expansion inside stable unfoldings. It's extra work,
-and can be expensive (the bizarre T18223 is a case in point).
-
-See Note [Occurrence analysis for lambda binders] in GHC.Core.Opt.OccurAnal.
-
-Historical note. There was /previously/ another reason not to do eta
-expansion in stable unfoldings. If we have a stable unfolding
-
- f :: Ord a => a -> IO ()
- -- Unfolding template
- -- = /\a \(d:Ord a) (x:a). bla
-
-we previously did not want to eta-expand to
-
- f :: Ord a => a -> IO ()
- -- Unfolding template
- -- = (/\a \(d:Ord a) (x:a) (eta:State#). bla eta) |> co
-
-because not specialisation of the overloading didn't work properly (#9509).
-But now it does: see Note [Account for casts in binding] in GHC.Core.Opt.Specialise
-
+Historical-note [No eta expansion in stable unfoldings]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+This Note is no longer relevant because the specialiser has improved.
+See Note [Account for casts in binding] in GHC.Core.Opt.Specialise.
+So we do not override sm_eta_expand in updModeForStableUnfoldings.
+
+ Old note: If we have a stable unfolding
+ f :: Ord a => a -> IO ()
+ -- Unfolding template
+ -- = /\a \(d:Ord a) (x:a). bla
+ we do not want to eta-expand to
+ f :: Ord a => a -> IO ()
+ -- Unfolding template
+ -- = (/\a \(d:Ord a) (x:a) (eta:State#). bla eta) |> co
+ because then specialisation of the overloading doesn't work properly
+ (see Note [Specialisation shape] in GHC.Core.Opt.Specialise), #9509.
+ So we disable eta-expansion in stable unfoldings.
+
+ End of Historical Note
Note [Inlining in gentle mode]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1639,73 +1640,88 @@ won't inline because 'e' is too big.
************************************************************************
-}
-mkLam :: SimplEnv -> [OutBndr] -> OutExpr -> SimplCont -> SimplM OutExpr
--- mkLam tries three things
+rebuildLam :: SimplEnv
+ -> [OutBndr] -> OutExpr
+ -> SimplCont
+ -> SimplM OutExpr
+-- (rebuildLam env bndrs body cont)
+-- returns expr which means the same as \bndrs. body
+--
+-- But it tries
-- a) eta reduction, if that gives a trivial expression
-- b) eta expansion [only if there are some value lambdas]
--
-- NB: the SimplEnv already includes the [OutBndr] in its in-scope set
-mkLam _env [] body _cont
+
+rebuildLam _env [] body _cont
= return body
-mkLam env bndrs body cont
- = {-#SCC "mkLam" #-}
--- pprTrace "mkLam" (ppr bndrs $$ ppr body $$ ppr cont) $
+
+rebuildLam env bndrs body cont
+ = {-# SCC "rebuildLam" #-}
do { dflags <- getDynFlags
- ; mkLam' dflags bndrs body }
+ ; try_eta dflags bndrs body }
where
- mode = getMode env
+ mode = getMode env
+ in_scope = getInScope env -- Includes 'bndrs'
+ mb_rhs = contIsRhs cont
-- See Note [Eta reduction based on evaluation context]
- -- NB: cont is never ApplyToVal, otherwise contEvalContext panics
- eval_sd dflags | gopt Opt_PedanticBottoms dflags = topSubDmd
- -- See Note [Eta reduction soundness], criterion (S)
- -- the bit about -fpedantic-bottoms
- | otherwise = contEvalContext cont
-
- mkLam' :: DynFlags -> [OutBndr] -> OutExpr -> SimplM OutExpr
- mkLam' dflags bndrs body@(Lam {})
- = mkLam' dflags (bndrs ++ bndrs1) body1
+ eval_sd dflags
+ | gopt Opt_PedanticBottoms dflags = topSubDmd
+ -- See Note [Eta reduction soundness], criterion (S)
+ -- the bit about -fpedantic-bottoms
+ | otherwise = contEvalContext cont
+ -- NB: cont is never ApplyToVal, because beta-reduction would
+ -- have happened. So contEvalContext can panic on ApplyToVal.
+
+ try_eta :: DynFlags -> [OutBndr] -> OutExpr -> SimplM OutExpr
+ try_eta dflags bndrs body
+ | -- Try eta reduction
+ gopt Opt_DoEtaReduction dflags
+ , Just etad_lam <- tryEtaReduce bndrs body (eval_sd dflags)
+ = do { tick (EtaReduction (head bndrs))
+ ; return etad_lam }
+
+ | -- Try eta expansion
+ Nothing <- mb_rhs -- See Note [Eta expanding lambdas]
+ , sm_eta_expand mode
+ , any isRuntimeVar bndrs -- Only when there is at least one value lambda already
+ , Just body_arity <- exprEtaExpandArity (initArityOpts dflags) body
+ = do { tick (EtaExpansion (head bndrs))
+ ; let body' = etaExpandAT in_scope body_arity body
+ ; traceSmpl "eta expand" (vcat [text "before" <+> ppr body
+ , text "after" <+> ppr body'])
+ -- NB: body' might have an outer Cast, but if so
+ -- mk_lams will pull it further out, past 'bndrs' to the top
+ ; mk_lams dflags bndrs body' }
+
+ | otherwise
+ = mk_lams dflags bndrs body
+
+ mk_lams :: DynFlags -> [OutBndr] -> OutExpr -> SimplM OutExpr
+ -- mk_lams pulls casts and ticks to the top
+ mk_lams dflags bndrs body@(Lam {})
+ = mk_lams dflags (bndrs ++ bndrs1) body1
where
(bndrs1, body1) = collectBinders body
- mkLam' dflags bndrs (Tick t expr)
+ mk_lams dflags bndrs (Tick t expr)
| tickishFloatable t
- = mkTick t <$> mkLam' dflags bndrs expr
+ = do { expr' <- mk_lams dflags bndrs expr
+ ; return (mkTick t expr') }
- mkLam' dflags bndrs (Cast body co)
+ mk_lams dflags bndrs (Cast body co)
| -- Note [Casts and lambdas]
sm_cast_swizzle mode
, not (any bad bndrs)
- = do { lam <- mkLam' dflags bndrs body
+ = do { lam <- mk_lams dflags bndrs body
; return (mkCast lam (mkPiCos Representational bndrs co)) }
where
co_vars = tyCoVarsOfCo co
bad bndr = isCoVar bndr && bndr `elemVarSet` co_vars
- mkLam' dflags bndrs body
- | gopt Opt_DoEtaReduction dflags
- -- , pprTrace "try eta" (ppr bndrs $$ ppr body $$ ppr cont $$ ppr (eval_sd dflags)) True
- , Just etad_lam <- {-# SCC "tryee" #-} tryEtaReduce bndrs body (eval_sd dflags)
- = do { tick (EtaReduction (head bndrs))
- ; return etad_lam }
-
- | not (contIsRhs cont) -- See Note [Eta expanding lambdas]
- , sm_eta_expand mode
- , any isRuntimeVar bndrs
- , let body_arity = {-# SCC "eta" #-} exprEtaExpandArity (initArityOpts dflags) body
- , expandableArityType body_arity
- = do { tick (EtaExpansion (head bndrs))
- ; let res = {-# SCC "eta3" #-}
- mkLams bndrs $
- etaExpandAT in_scope body_arity body
- ; traceSmpl "eta expand" (vcat [text "before" <+> ppr (mkLams bndrs body)
- , text "after" <+> ppr res])
- ; return res }
-
- | otherwise
+ mk_lams _ bndrs body
= return (mkLams bndrs body)
- where
- in_scope = getInScope env -- Includes 'bndrs'
{-
Note [Eta expanding lambdas]
@@ -1727,21 +1743,40 @@ bother to try expansion in mkLam in that case; hence the contIsRhs
guard.
NB: We check the SimplEnv (sm_eta_expand), not DynFlags.
- See Note [Eta-expansion in stable unfoldings]
+ See Historical-note [Eta-expansion in stable unfoldings]
Note [Casts and lambdas]
~~~~~~~~~~~~~~~~~~~~~~~~
Consider
- (\x. (\y. e) `cast` g1) `cast` g2
-There is a danger here that the two lambdas look separated, and the
-full laziness pass might float an expression to between the two.
+ (\(x:tx). (\(y:ty). e) `cast` co)
-So this equation in mkLam' floats the g1 out, thus:
- (\x. e `cast` g1) --> (\x.e) `cast` (tx -> g1)
-where x:tx.
+We float the cast out, thus
+ (\(x:tx) (y:ty). e) `cast` (tx -> co)
-In general, this floats casts outside lambdas, where (I hope) they
-might meet and cancel with some other cast:
+We do this for at least three reasons:
+
+1. There is a danger here that the two lambdas look separated, and the
+ full laziness pass might float an expression to between the two.
+
+2. The occurrence analyser will mark x as InsideLam if the Lam nodes
+ are separated (see the Lam case of occAnal). By floating the cast
+ out we put the two Lams together, so x can get a vanilla Once
+ annotation. If this lambda is the RHS of a let, which we inline,
+ we can do preInlineUnconditionally on that x=arg binding. With the
+ InsideLam OccInfo, we can't do that, which results in an extra
+ iteration of the Simplifier.
+
+3. It may cancel with another cast. E.g
+ (\x. e |> co1) |> co2
+ If we float out co1 it might cancel with co2. Similarly
+ let f = (\x. e |> co1) in ...
+ If we float out co1, and then do cast worker/wrapper, we get
+ let f1 = \x.e; f = f1 |> co1 in ...
+ and now we can inline f, hoping that co1 may cancel at a call site.
+
+TL;DR: put the lambdas together if at all possible.
+
+In general, here's the transformation:
\x. e `cast` co ===> (\x. e) `cast` (tx -> co)
/\a. e `cast` co ===> (/\a. e) `cast` (/\a. co)
/\g. e `cast` co ===> (/\g. e) `cast` (/\g. co)
@@ -1774,62 +1809,55 @@ Wrinkles
************************************************************************
-}
-tryEtaExpandRhs :: SimplEnv -> RecFlag -> OutId -> OutExpr
+tryEtaExpandRhs :: SimplEnv -> BindContext -> OutId -> OutExpr
-> SimplM (ArityType, OutExpr)
-- See Note [Eta-expanding at let bindings]
-- If tryEtaExpandRhs rhs = (n, is_bot, rhs') then
-- (a) rhs' has manifest arity n
-- (b) if is_bot is True then rhs' applied to n args is guaranteed bottom
-tryEtaExpandRhs env is_rec bndr rhs
+tryEtaExpandRhs _env (BC_Join {}) bndr rhs
| Just join_arity <- isJoinId_maybe bndr
= do { let (join_bndrs, join_body) = collectNBinders join_arity rhs
oss = [idOneShotInfo id | id <- join_bndrs, isId id]
arity_type | exprIsDeadEnd join_body = mkBotArityType oss
- | otherwise = mkTopArityType oss
+ | otherwise = mkManifestArityType oss
; return (arity_type, rhs) }
-- Note [Do not eta-expand join points]
-- But do return the correct arity and bottom-ness, because
-- these are used to set the bndr's IdInfo (#15517)
-- Note [Invariants on join points] invariant 2b, in GHC.Core
+ | otherwise
+ = pprPanic "tryEtaExpandRhs" (ppr bndr)
+
+tryEtaExpandRhs env (BC_Let _ is_rec) bndr rhs
| sm_eta_expand mode -- Provided eta-expansion is on
, new_arity > old_arity -- And the current manifest arity isn't enough
- , want_eta rhs
+ , wantEtaExpansion rhs
= do { tick (EtaExpansion bndr)
; return (arity_type, etaExpandAT in_scope arity_type rhs) }
| otherwise
= return (arity_type, rhs)
-
where
- mode = getMode env
- in_scope = getInScope env
- dflags = sm_dflags mode
- arityOpts = initArityOpts dflags
- old_arity = exprArity rhs
- ty_arity = typeArity (idType bndr)
-
- arity_type = findRhsArity arityOpts is_rec bndr rhs old_arity
- `maxWithArity` idCallArity bndr
- `minWithArity` ty_arity
- -- minWithArity: see Note [Arity trimming] in GHC.Core.Opt.Arity
-
- new_arity = arityTypeArity arity_type
-
- -- See Note [Which RHSs do we eta-expand?]
- want_eta (Cast e _) = want_eta e
- want_eta (Tick _ e) = want_eta e
- want_eta (Lam b e) | isTyVar b = want_eta e
- want_eta (App e a) | exprIsTrivial a = want_eta e
- want_eta (Var {}) = False
- want_eta (Lit {}) = False
- want_eta _ = True
-{-
- want_eta _ = case arity_type of
- ATop (os:_) -> isOneShotInfo os
- ATop [] -> False
- ABot {} -> True
--}
+ mode = getMode env
+ in_scope = getInScope env
+ dflags = sm_dflags mode
+ arity_opts = initArityOpts dflags
+ old_arity = exprArity rhs
+ arity_type = findRhsArity arity_opts is_rec bndr rhs old_arity
+ new_arity = arityTypeArity arity_type
+
+wantEtaExpansion :: CoreExpr -> Bool
+-- Mostly True; but False of PAPs which will immediately eta-reduce again
+-- See Note [Which RHSs do we eta-expand?]
+wantEtaExpansion (Cast e _) = wantEtaExpansion e
+wantEtaExpansion (Tick _ e) = wantEtaExpansion e
+wantEtaExpansion (Lam b e) | isTyVar b = wantEtaExpansion e
+wantEtaExpansion (App e _) = wantEtaExpansion e
+wantEtaExpansion (Var {}) = False
+wantEtaExpansion (Lit {}) = False
+wantEtaExpansion _ = True
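+
+-- For example (illustrative): wantEtaExpansion of (App (Var f) (Var x))
+-- is False (a PAP-like application that would immediately eta-reduce
+-- again), whereas wantEtaExpansion of a case expression, or of a lambda
+-- with a value binder, is True.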
{-
Note [Eta-expanding at let bindings]
diff --git a/compiler/GHC/Core/Opt/WorkWrap.hs b/compiler/GHC/Core/Opt/WorkWrap.hs
index 35d818d814..1c7a728d12 100644
--- a/compiler/GHC/Core/Opt/WorkWrap.hs
+++ b/compiler/GHC/Core/Opt/WorkWrap.hs
@@ -679,10 +679,11 @@ is there only to generate used-once info for single-entry thunks.
Note [Don't eta expand in w/w]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-A binding where the manifestArity of the RHS is less than idArity of the binder
-means GHC.Core.Opt.Arity didn't eta expand that binding. When this happens, it does so
-for a reason (see Note [exprArity invariant] in GHC.Core.Opt.Arity) and we probably have
-a PAP, cast or trivial expression as RHS.
+A binding where the manifestArity of the RHS is less than idArity of
+the binder means GHC.Core.Opt.Arity didn't eta expand that binding
+When this happens, it does so for a reason (see Note [Arity invariants for bindings]
+in GHC.Core.Opt.Arity) and we probably have a PAP, cast or trivial expression
+as RHS.
Below is a historical account of what happened when w/w still did eta expansion.
Nowadays, it doesn't do that, but will simply w/w for the wrong arity, unleashing
diff --git a/compiler/GHC/Core/Ppr.hs b/compiler/GHC/Core/Ppr.hs
index 3f22e17bbe..c24e223553 100644
--- a/compiler/GHC/Core/Ppr.hs
+++ b/compiler/GHC/Core/Ppr.hs
@@ -534,7 +534,7 @@ instance Outputable IdInfo where
has_caf_info = not (mayHaveCafRefs caf_info)
str_info = dmdSigInfo info
- has_str_info = not (isTopSig str_info)
+ has_str_info = not (isNopSig str_info)
unf_info = realUnfoldingInfo info
has_unf = hasSomeUnfolding unf_info
@@ -578,7 +578,7 @@ ppIdInfo id info
has_caf_info = not (mayHaveCafRefs caf_info)
str_info = dmdSigInfo info
- has_str_info = not (isTopSig str_info)
+ has_str_info = not (isNopSig str_info)
cpr_info = cprSigInfo info
has_cpr_info = cpr_info /= topCprSig
diff --git a/compiler/GHC/Core/SimpleOpt.hs b/compiler/GHC/Core/SimpleOpt.hs
index cf46c3a937..2db7ee3373 100644
--- a/compiler/GHC/Core/SimpleOpt.hs
+++ b/compiler/GHC/Core/SimpleOpt.hs
@@ -90,14 +90,15 @@ little dance in action; the full Simplifier is a lot more complicated.
data SimpleOpts = SimpleOpts
{ so_uf_opts :: !UnfoldingOpts -- ^ Unfolding options
, so_co_opts :: !OptCoercionOpts -- ^ Coercion optimiser options
+ , so_eta_red :: !Bool -- ^ Eta reduction on?
}
-- | Default options for the Simple optimiser.
defaultSimpleOpts :: SimpleOpts
defaultSimpleOpts = SimpleOpts
{ so_uf_opts = defaultUnfoldingOpts
- , so_co_opts = OptCoercionOpts
- { optCoercionEnabled = False }
+ , so_co_opts = OptCoercionOpts { optCoercionEnabled = False }
+ , so_eta_red = False
}
simpleOptExpr :: HasDebugCallStack => SimpleOpts -> CoreExpr -> CoreExpr
@@ -180,13 +181,10 @@ simpleOptPgm opts this_mod binds rules =
type SimpleClo = (SimpleOptEnv, InExpr)
data SimpleOptEnv
- = SOE { soe_co_opt_opts :: !OptCoercionOpts
- -- ^ Options for the coercion optimiser
+ = SOE { soe_opts :: {-# UNPACK #-} !SimpleOpts
+ -- ^ Simplifier options
- , soe_uf_opts :: !UnfoldingOpts
- -- ^ Unfolding options
-
- , soe_inl :: IdEnv SimpleClo
+ , soe_inl :: IdEnv SimpleClo
-- ^ Deals with preInlineUnconditionally; things
-- that occur exactly once and are inlined
-- without having first been simplified
@@ -202,12 +200,9 @@ instance Outputable SimpleOptEnv where
<+> text "}"
emptyEnv :: SimpleOpts -> SimpleOptEnv
-emptyEnv opts = SOE
- { soe_inl = emptyVarEnv
- , soe_subst = emptySubst
- , soe_co_opt_opts = so_co_opts opts
- , soe_uf_opts = so_uf_opts opts
- }
+emptyEnv opts = SOE { soe_inl = emptyVarEnv
+ , soe_subst = emptySubst
+ , soe_opts = opts }
soeZapSubst :: SimpleOptEnv -> SimpleOptEnv
soeZapSubst env@(SOE { soe_subst = subst })
@@ -283,7 +278,7 @@ simple_opt_expr env expr
(env', b') = subst_opt_bndr env b
----------------------
- go_co co = optCoercion (soe_co_opt_opts env) (getTCvSubst subst) co
+ go_co co = optCoercion (so_co_opts (soe_opts env)) (getTCvSubst subst) co
----------------------
go_alt env (Alt con bndrs rhs)
@@ -298,7 +293,8 @@ simple_opt_expr env expr
where
(env', b') = subst_opt_bndr env b
go_lam env bs' e
- | Just etad_e <- tryEtaReduce bs e' topSubDmd = etad_e
+ | so_eta_red (soe_opts env)
+ , Just etad_e <- tryEtaReduce bs e' topSubDmd = etad_e
| otherwise = mkLams bs e'
where
bs = reverse bs'
@@ -443,7 +439,7 @@ simple_bind_pair env@(SOE { soe_inl = inl_env, soe_subst = subst })
(env { soe_subst = extendTvSubst subst in_bndr out_ty }, Nothing)
| Coercion co <- in_rhs
- , let out_co = optCoercion (soe_co_opt_opts env) (getTCvSubst (soe_subst rhs_env)) co
+ , let out_co = optCoercion (so_co_opts (soe_opts env)) (getTCvSubst (soe_subst rhs_env)) co
= assert (isCoVar in_bndr)
(env { soe_subst = extendCvSubst subst in_bndr out_co }, Nothing)
@@ -723,7 +719,7 @@ add_info env old_bndr top_level new_rhs new_bndr
| otherwise = lazySetIdInfo new_bndr new_info
where
subst = soe_subst env
- uf_opts = soe_uf_opts env
+ uf_opts = so_uf_opts (soe_opts env)
old_info = idInfo old_bndr
-- Add back in the rules and unfolding which were
diff --git a/compiler/GHC/Core/Unfold.hs b/compiler/GHC/Core/Unfold.hs
index 6316e321d4..7786f2e3a2 100644
--- a/compiler/GHC/Core/Unfold.hs
+++ b/compiler/GHC/Core/Unfold.hs
@@ -47,8 +47,8 @@ import GHC.Core.DataCon
import GHC.Types.Literal
import GHC.Builtin.PrimOps
import GHC.Types.Id.Info
-import GHC.Types.Basic ( Arity )
import GHC.Types.RepType ( isZeroBitTy )
+import GHC.Types.Basic ( Arity, RecFlag(..) )
import GHC.Core.Type
import GHC.Builtin.Names
import GHC.Data.Bag
@@ -1003,7 +1003,7 @@ nonTriv _ = True
data CallCtxt
= BoringCtxt
- | RhsCtxt -- Rhs of a let-binding; see Note [RHS of lets]
+ | RhsCtxt RecFlag -- Rhs of a let-binding; see Note [RHS of lets]
| DiscArgCtxt -- Argument of a function with non-zero arg discount
| RuleArgCtxt -- We are somewhere in the argument of a function with rules
@@ -1018,7 +1018,7 @@ instance Outputable CallCtxt where
ppr CaseCtxt = text "CaseCtxt"
ppr ValAppCtxt = text "ValAppCtxt"
ppr BoringCtxt = text "BoringCtxt"
- ppr RhsCtxt = text "RhsCtxt"
+ ppr (RhsCtxt ir)= text "RhsCtxt" <> parens (ppr ir)
ppr DiscArgCtxt = text "DiscArgCtxt"
ppr RuleArgCtxt = text "RuleArgCtxt"
@@ -1244,21 +1244,17 @@ tryUnfolding logger opts !case_depth id lone_variable
= case cont_info of
CaseCtxt -> not (lone_variable && is_exp) -- Note [Lone variables]
ValAppCtxt -> True -- Note [Cast then apply]
- RuleArgCtxt -> uf_arity > 0 -- See Note [Unfold into lazy contexts]
+ RuleArgCtxt -> uf_arity > 0 -- See Note [RHS of lets]
DiscArgCtxt -> uf_arity > 0 -- Note [Inlining in ArgCtxt]
- RhsCtxt -> uf_arity > 0 --
+ RhsCtxt NonRecursive
+ -> uf_arity > 0 -- See Note [RHS of lets]
_other -> False -- See Note [Nested functions]
-{-
-Note [Unfold into lazy contexts]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Merged into Note [RHS of lets].
-
-Note [RHS of lets]
-~~~~~~~~~~~~~~~~~~
+{- Note [RHS of lets]
+~~~~~~~~~~~~~~~~~~~~~
When the call is the argument of a function with a RULE, or the RHS of a let,
-we are a little bit keener to inline. For example
+we are a little bit keener to inline (in tryUnfolding). For example
f y = (y,y,y)
g y = let x = f y in ...(case x of (a,b,c) -> ...) ...
We'd inline 'f' if the call was in a case context, and it kind-of-is,
@@ -1267,7 +1263,11 @@ only we can't see it. Also
could be expensive whereas
x = case v of (a,b) -> a
is patently cheap and may allow more eta expansion.
-So we treat the RHS of a let as not-totally-boring.
+
+So, in `interesting_call` in `tryUnfolding`, we treat the RHS of a
+/non-recursive/ let as not-totally-boring. A /recursive/ let isn't
+going to be inlined, so there is much less point. Hence the (only
+reason for the) RecFlag in RhsCtxt.
Note [Unsaturated applications]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
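A self-contained sketch (toy re-definitions, not imports of the real Unfold.hs types) of the `interesting_call` behaviour described in Note [RHS of lets]: with the new RecFlag, only a non-recursive RHS context makes a call with positive arity look worth inlining.

    data RecFlag  = Recursive | NonRecursive deriving Show
    data CallCtxt = BoringCtxt | RhsCtxt RecFlag | CaseCtxt deriving Show

    -- Sketch of the case analysis in tryUnfolding's interesting_call.
    interestingCall :: Int -> CallCtxt -> Bool
    interestingCall uf_arity ctxt = case ctxt of
      CaseCtxt             -> True
      RhsCtxt NonRecursive -> uf_arity > 0   -- see Note [RHS of lets]
      _other               -> False          -- recursive RHS: not interesting

    main :: IO ()
    main = mapM_ (print . interestingCall 1)
                 [RhsCtxt NonRecursive, RhsCtxt Recursive, BoringCtxt]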
diff --git a/compiler/GHC/Core/Utils.hs b/compiler/GHC/Core/Utils.hs
index 2d287a1b3d..87dc238d62 100644
--- a/compiler/GHC/Core/Utils.hs
+++ b/compiler/GHC/Core/Utils.hs
@@ -37,9 +37,6 @@ module GHC.Core.Utils (
cheapEqExpr, cheapEqExpr', eqExpr,
diffBinds,
- -- * Lambdas and eta reduction
- tryEtaReduce, canEtaReduceToArity,
-
-- * Manipulating data constructors and types
exprToType,
applyTypeToArgs,
@@ -71,11 +68,9 @@ import GHC.Platform
import GHC.Core
import GHC.Core.Ppr
-import GHC.Core.FVs( exprFreeVars )
import GHC.Core.DataCon
import GHC.Core.Type as Type
import GHC.Core.FamInstEnv
-import GHC.Core.Predicate
import GHC.Core.TyCo.Rep( TyCoBinder(..), TyBinder )
import GHC.Core.Coercion
import GHC.Core.Reduction
@@ -95,10 +90,11 @@ import GHC.Types.Literal
import GHC.Types.Tickish
import GHC.Types.Id
import GHC.Types.Id.Info
+import GHC.Types.Basic( Arity, Levity(..)
+ , CbvMark(..), isMarkedCbv )
import GHC.Types.Unique
-import GHC.Types.Basic
-import GHC.Types.Demand
import GHC.Types.Unique.Set
+import GHC.Types.Demand
import GHC.Data.FastString
import GHC.Data.Maybe
@@ -2326,372 +2322,6 @@ locBind loc b1 b2 diffs = map addLoc diffs
bindLoc | b1 == b2 = ppr b1
| otherwise = ppr b1 <> char '/' <> ppr b2
-{-
-************************************************************************
-* *
- Eta reduction
-* *
-************************************************************************
-
-Note [Eta reduction makes sense]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Traditionally, eta reduction transforms `\x. e x` to `e`, where `e` is an
-arbitrary expression in which `x` doesn't occur free.
-It is the inverse of eta expansion, which generally transforms the program into
-a form that executes faster. So why and when will GHC attempt to eta *reduce*?
-
-1. We want to eta-reduce only if we get all the way to a
- trivial expression; we don't want to remove extra lambdas unless we are going
- to avoid allocating this thing altogether.
- Trivial means *including* casts and type lambdas:
- * `\x. f x |> co --> f |> (ty(x) -> co)` (provided `co` doesn't mention `x`)
- * `/\a. \x. f @(Maybe a) x --> /\a. f @(Maybe a)`
-
-2. It's always sound to eta-reduce *type* lambdas and we always want to, because
- it makes RULEs apply more often:
- This RULE: `forall g. foldr (build (/\a -> g a))`
- should match `foldr (build (/\b -> ...something complex...))`
- and the simplest way to do so is eta-reduce `/\a -> g a` in the RULE to `g`.
- The type checker can insert these eta-expanded versions of the RULE.
- [SG: This is implied by (1), isn't it? Perhaps we want to eta-reduce type
- lambdas even if the resulting expression is non-trivial?]
-
-3. We have to hide `f`'s `idArity` in its own RHS, lest we suffer from the last
- point of Note [Arity robustness]. There we have `f = \x. f x` and we should
- not eta-reduce to `f=f`. Which might change a terminating program
- (think @f `seq` e@) to a non-terminating one.
- So we check for being a loop breaker first. However for GlobalIds we can look
- at the arity; and for primops we must, since they have no unfolding.
- [SG: Perhaps this is rather a soundness subtlety?]
-
-Of course, eta reduction is not always sound. See Note [Eta reduction soundness]
-for when it is.
-
-When there are multiple arguments, we might get multiple eta-redexes. Example:
- \x y. e x y
- ==> { reduce \y. (e x) y in context \x._ }
- \x. e x
- ==> { reduce \x. e x in context _ }
- e
-And (1) implies that we never want to stop with `\x. e x`, because that is not a
-trivial expression. So in practice, the implementation works by considering a
-whole group of leading lambdas to reduce.
-
-These delicacies are why we don't simply use 'exprIsTrivial' and 'exprIsHNF'
-in 'tryEtaReduce'. Alas.
-
-Note [Eta reduction soundness]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-As Note [Eta reduction makes sense] explains, GHC's eta reduction transforms
-`\x y. e x y` to `e`, where `e` is an arbitrary expression in which `x` and `y`
-don't occur free.
-
-Eta reduction is *not* a sound transformation in general, because it
-may change termination behavior if *value* lambdas are involved:
- `bot` /= `\x. bot x` (as can be observed by a simple `seq`)
-The past has shown that oversight of this fact can not only lead to endless
-loops or exceptions, but also straight out *segfaults*.
-
-Nevertheless, we can give the following criteria for when it is sound to
-perform eta reduction on an expression with n leading lambdas `\xs. e xs`
-(checked in 'is_eta_reduction_sound' in 'tryEtaReduce', which focuses on the
-case where `e` is trivial):
-
- A. It is sound to eta-reduce n arguments as long as n does not exceed the
- `exprArity` of `e`. (Needs Arity analysis.)
- This criterion exploits information about how `e` is *defined*.
-
- Example: If `e = \x. bot` then we know it won't diverge until it is called
- with one argument. Hence it is safe to eta-reduce `\x. e x` to `e`.
- By contrast, it would be *unsound* to eta-reduce 2 args, `\x y. e x y` to `e`:
- `e 42` diverges when `(\x y. e x y) 42` does not.
-
- S. It is sound to eta-reduce n arguments in an evaluation context in which all
- calls happen with at least n arguments. (Needs Strictness analysis.)
- NB: This treats evaluations like a call with 0 args.
- NB: This criterion exploits information about how `e` is *used*.
-
- Example: Given a function `g` like
- `g c = Just (c 1 2 + c 2 3)`
- it is safe to eta-reduce the arg in `g (\x y. e x y)` to `g e` without
- knowing *anything* about `e` (perhaps it's a parameter occ itself), simply
- because `g` always calls its parameter with 2 arguments.
- It is also safe to eta-reduce just one arg, e.g., `g (\x. e x)` to `g e`.
- By contrast, it would *unsound* to eta-reduce 3 args in a call site
- like `g (\x y z. e x y z)` to `g e`, because that diverges when
- `e = \x y. bot`.
-
- Could we relax to "*At least one call in the same trace* is with n args"?
- (NB: Strictness analysis can only answer this relaxed question, not the
- original formulation.)
- Consider what happens for
- ``g2 c = c True `seq` c False 42``
- Here, `g2` will call `c` with 2 arguments (if there is a call at all).
- But it is unsound to eta-reduce the arg in `g2 (\x y. e x y)` to `g2 e`
- when `e = \x. if x then bot else id`, because the latter will diverge when
- the former would not.
- On the other hand, with `-fno-pendantic-bottoms` , we will have eta-expanded
- the definition of `e` and then eta-reduction is sound
- (see Note [Dealing with bottom]).
- Consequence: We have to check that `-fpedantic-bottoms` is off; otherwise
- eta-reduction based on demands is in fact unsound.
-
- See Note [Eta reduction based on evaluation context] for the implementation
- details. This criterion is tested extensively in T21261.
-
- E. As a perhaps special case on the boundary of (A) and (S), when we know that
- a fun binder `f` is in WHNF, we simply assume it has arity 1 and apply (A).
- Example:
- ``g f = f `seq` \x. f x``
- Here it's sound eta-reduce `\x. f x` to `f`, because `f` can't be bottom
- after the `seq`. This turned up in #7542.
-
-And here are a few more technical criteria for when it is *not* sound to
-eta-reduce that are specific to Core and GHC:
-
- L. With linear types, eta-reduction can break type-checking:
- f :: A ⊸ B
- g :: A -> B
- g = \x. f x
- The above is correct, but eta-reducing g would yield g=f, the linter will
- complain that g and f don't have the same type. NB: Not unsound in the
- dynamic semantics, but unsound according to the static semantics of Core.
-
- J. We may not undersaturate join points.
- See Note [Invariants on join points] in GHC.Core, and #20599.
-
- B. We may not undersaturate functions with no binding.
- See Note [Eta expanding primops].
-
- W. We may not undersaturate StrictWorkerIds.
- See Note [Strict Worker Ids] in GHC.CoreToStg.Prep.
-
-Here is a list of historic accidents surrounding unsound eta-reduction:
-
-* Consider
- f = \x.f x
- h y = case (case y of { True -> f `seq` True; False -> False }) of
- True -> ...; False -> ...
- If we (unsoundly) eta-reduce f to get f=f, the strictness analyser
- says f=bottom, and replaces the (f `seq` True) with just
- (f `cast` unsafe-co).
- [SG in 2022: I don't think worker/wrapper would do this today.]
- BUT, as things stand, 'f' got arity 1, and it *keeps* arity 1 (perhaps also
- wrongly). So CorePrep eta-expands the definition again, so that it does not
- terminate after all.
- Result: seg-fault because the boolean case actually gets a function value.
- See #1947.
-
-* Never *reduce* arity. For example
- f = \xy. g x y
- Then if h has arity 1 we don't want to eta-reduce because then
- f's arity would decrease, and that is bad
- [SG in 2022: I don't understand this point. There is no `h`, perhaps that
- should have been `g`. Even then, this proposed eta-reduction is invalid by
- criterion (A), which might actually be the point this anecdote is trying to
- make. Perhaps the "no arity decrease" idea is also related to
- Note [Arity robustness]?]
-
-Note [Eta reduction with casted arguments]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Consider
- (\(x:t3). f (x |> g)) :: t3 -> t2
- where
- f :: t1 -> t2
- g :: t3 ~ t1
-This should be eta-reduced to
-
- f |> (sym g -> t2)
-
-So we need to accumulate a coercion, pushing it inward (past
-variable arguments only) thus:
- f (x |> co_arg) |> co --> (f |> (sym co_arg -> co)) x
- f (x:t) |> co --> (f |> (t -> co)) x
- f @ a |> co --> (f |> (forall a.co)) @ a
- f @ (g:t1~t2) |> co --> (f |> (t1~t2 => co)) @ (g:t1~t2)
-These are the equations for ok_arg.
-
-Note [Eta reduction with casted function]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Since we are pushing a coercion inwards, it is easy to accommodate
- (\xy. (f x |> g) y)
- (\xy. (f x y) |> g)
-
-See the `(Cast e co)` equation for `go` in `tryEtaReduce`. The
-eta-expander pushes those casts outwards, so you might think we won't
-ever see a cast here, but if we have
- \xy. (f x y |> g)
-we will call tryEtaReduce [x,y] (f x y |> g), and we'd like that to
-work. This happens in GHC.Core.Opt.Simplify.Utils.mkLam, where
-eta-expansion may be turned off (by sm_eta_expand).
-
-Note [Eta reduction based on evaluation context]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Note [Eta reduction soundness], criterion (S) allows us to eta-reduce
-`g (\x y. e x y)` to `g e` when we know that `g` always calls its parameter with
-at least 2 arguments. So how do we read that off `g`'s demand signature?
-
-Let's take the simple example of #21261, where `g` (actually, `f`) is defined as
- g c = c 1 2 + c 3 4
-Then this is how the pieces are put together:
-
- * Demand analysis infers `<SCS(C1(L))>` for `g`'s demand signature
-
- * When the Simplifier next simplifies the argument in `g (\x y. e x y)`, it
- looks up the *evaluation context* of the argument in the form of the
- sub-demand `CS(C1(L))` and stores it in the 'SimplCont'.
- (Why does it drop the outer evaluation cardinality of the demand, `S`?
- Because it's irrelevant! When we simplify an expression, we do so under the
- assumption that it is currently under evaluation.)
- This sub-demand literally says "Whenever this expression is evaluated, it
- is also called with two arguments, potentially multiple times".
-
- * Then the simplifier takes apart the lambda and simplifies the lambda group
- and then calls 'tryEtaReduce' when rebuilding the lambda, passing the
- evaluation context `CS(C1(L))` along. Then we simply peel off 2 call
- sub-demands `Cn` and see whether all of the n's (here: `S=C_1N` and
- `1=C_11`) were strict. And strict they are! Thus, it will eta-reduce
- `\x y. e x y` to `e`.
--}
-
--- | `tryEtaReduce [x,y,z] e sd` returns `Just e'` if `\x y z -> e` is evaluated
--- according to `sd` and can soundly and gainfully be eta-reduced to `e'`.
--- See Note [Eta reduction soundness]
--- and Note [Eta reduction makes sense] when that is the case.
-tryEtaReduce :: [Var] -> CoreExpr -> SubDemand -> Maybe CoreExpr
--- Return an expression equal to (\bndrs. body)
-tryEtaReduce bndrs body eval_sd
- = go (reverse bndrs) body (mkRepReflCo (exprType body))
- where
- incoming_arity = count isId bndrs -- See Note [Eta reduction makes sense], point (2)
-
- go :: [Var] -- Binders, innermost first, types [a3,a2,a1]
- -> CoreExpr -- Of type tr
- -> Coercion -- Of type tr ~ ts
- -> Maybe CoreExpr -- Of type a1 -> a2 -> a3 -> ts
- -- See Note [Eta reduction with casted arguments]
- -- for why we have an accumulating coercion
- --
- -- Invariant: (go bs body co) returns an expression
- -- equivalent to (\(reverse bs). body |> co)
-
- -- See Note [Eta reduction with casted function]
- go bs (Cast e co1) co2
- = go bs e (co1 `mkTransCo` co2)
-
- go bs (Tick t e) co
- | tickishFloatable t
- = fmap (Tick t) $ go bs e co
- -- Float app ticks: \x -> Tick t (e x) ==> Tick t e
-
- go (b : bs) (App fun arg) co
- | Just (co', ticks) <- ok_arg b arg co (exprType fun)
- = fmap (flip (foldr mkTick) ticks) $ go bs fun co'
- -- Float arg ticks: \x -> e (Tick t x) ==> Tick t e
-
- go remaining_bndrs fun co
- | all isTyVar remaining_bndrs -- See Note [Eta reduction makes sense], point (1)
- , remaining_bndrs `ltLength` bndrs
- , ok_fun fun
- , let used_vars = exprFreeVars fun `unionVarSet` tyCoVarsOfCo co
- reduced_bndrs = mkVarSet (dropList remaining_bndrs bndrs)
- , used_vars `disjointVarSet` reduced_bndrs
- -- Check for any of the binders free in the result including the
- -- accumulated coercion
- -- See Note [Eta reduction makes sense], intro and point (1)
- = Just $ mkLams (reverse remaining_bndrs) (mkCast fun co)
-
- go _remaining_bndrs _fun _ = -- pprTrace "tER fail" (ppr _fun $$ ppr _remaining_bndrs) $
- Nothing
-
-
- ---------------
- -- See Note [Eta reduction makes sense], point (1)
- ok_fun (App fun (Type {})) = ok_fun fun
- ok_fun (Cast fun _) = ok_fun fun
- ok_fun (Tick _ expr) = ok_fun expr
- ok_fun (Var fun_id) = is_eta_reduction_sound fun_id || all ok_lam bndrs
- ok_fun _fun = False
-
- ---------------
- -- See Note [Eta reduction soundness], this is THE place to check soundness!
- is_eta_reduction_sound fun =
- -- Check that eta-reduction won't make the program stricter...
- (fun_arity fun >= incoming_arity -- criterion (A) and (E)
- || all_calls_with_arity incoming_arity) -- criterion (S)
- -- ... and that the function can be eta reduced to arity 0
- -- without violating invariants of Core and GHC
- && canEtaReduceToArity fun 0 0 -- criteria (L), (J), (W), (B)
- all_calls_with_arity n = isStrict (peelManyCalls n eval_sd)
- -- See Note [Eta reduction based on evaluation context]
-
- ---------------
- fun_arity fun -- See Note [Eta reduction makes sense], point (3)
- | isLocalId fun
- , isStrongLoopBreaker (idOccInfo fun) = 0
- | arity > 0 = arity
- | isEvaldUnfolding (idUnfolding fun) = 1
- -- See Note [Eta reduction soundness], criterion (E)
- | otherwise = 0
- where
- arity = idArity fun
-
- ---------------
- ok_lam v = isTyVar v || isEvVar v
-
- ---------------
- ok_arg :: Var -- Of type bndr_t
- -> CoreExpr -- Of type arg_t
- -> Coercion -- Of kind (t1~t2)
- -> Type -- Type (arg_t -> t1) of the function
- -- to which the argument is supplied
- -> Maybe (Coercion -- Of type (arg_t -> t1 ~ bndr_t -> t2)
- -- (and similarly for tyvars, coercion args)
- , [CoreTickish])
- -- See Note [Eta reduction with casted arguments]
- ok_arg bndr (Type ty) co _
- | Just tv <- getTyVar_maybe ty
- , bndr == tv = Just (mkHomoForAllCos [tv] co, [])
- ok_arg bndr (Var v) co fun_ty
- | bndr == v
- , let mult = idMult bndr
- , Just (fun_mult, _, _) <- splitFunTy_maybe fun_ty
- , mult `eqType` fun_mult -- There is no change in multiplicity, otherwise we must abort
- = Just (mkFunResCo Representational (idScaledType bndr) co, [])
- ok_arg bndr (Cast e co_arg) co fun_ty
- | (ticks, Var v) <- stripTicksTop tickishFloatable e
- , Just (fun_mult, _, _) <- splitFunTy_maybe fun_ty
- , bndr == v
- , fun_mult `eqType` idMult bndr
- = Just (mkFunCo Representational (multToCo fun_mult) (mkSymCo co_arg) co, ticks)
- -- The simplifier combines multiple casts into one,
- -- so we can have a simple-minded pattern match here
- ok_arg bndr (Tick t arg) co fun_ty
- | tickishFloatable t, Just (co', ticks) <- ok_arg bndr arg co fun_ty
- = Just (co', t:ticks)
-
- ok_arg _ _ _ _ = Nothing
-
--- | Can we eta-reduce the given function to the specified arity?
--- See Note [Eta reduction soundness], criteria (B), (J), (W) and (L).
-canEtaReduceToArity :: Id -> JoinArity -> Arity -> Bool
-canEtaReduceToArity fun dest_join_arity dest_arity =
- not $
- hasNoBinding fun -- (B)
- -- Don't undersaturate functions with no binding.
-
- || ( isJoinId fun && dest_join_arity < idJoinArity fun ) -- (J)
- -- Don't undersaturate join points.
- -- See Note [Invariants on join points] in GHC.Core, and #20599
-
- || ( dest_arity < idCbvMarkArity fun ) -- (W)
- -- Don't undersaturate StrictWorkerIds.
- -- See Note [Strict Worker Ids] in GHC.CoreToStg.Prep.
-
- || isLinearType (idType fun) -- (L)
- -- Don't perform eta reduction on linear types.
- -- If `f :: A %1-> B` and `g :: A -> B`,
- -- then `g x = f x` is OK but `g = f` is not.
{- *********************************************************************
* *
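The eta-reduction notes and tryEtaReduce deleted above (relocated elsewhere in this patch) hinge on the observation that `bot` and `\x. bot x` are observably different. A tiny self-contained demonstration, independent of GHC internals:

    import Control.Exception (SomeException, evaluate, try)

    bot :: Int -> Int
    bot = error "bot"

    main :: IO ()
    main = do
      -- The eta-expanded form is a lambda, hence already a value:
      -- forcing it to WHNF succeeds.
      _ <- evaluate (\x -> bot x)
      putStrLn "(\\x -> bot x) is in WHNF"
      -- Forcing `bot` itself evaluates the RHS and throws.
      r <- try (evaluate bot) :: IO (Either SomeException (Int -> Int))
      putStrLn (either (const "forcing bot throws") (const "no exception") r)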
diff --git a/compiler/GHC/CoreToIface.hs b/compiler/GHC/CoreToIface.hs
index de9d13d7aa..248e517f61 100644
--- a/compiler/GHC/CoreToIface.hs
+++ b/compiler/GHC/CoreToIface.hs
@@ -75,7 +75,7 @@ import GHC.Types.Var
import GHC.Types.Var.Env
import GHC.Types.Var.Set
import GHC.Types.Tickish
-import GHC.Types.Demand ( isTopSig )
+import GHC.Types.Demand ( isNopSig )
import GHC.Types.Cpr ( topCprSig )
import GHC.Utils.Outputable
@@ -476,7 +476,7 @@ toIfaceIdInfo id_info
------------ Strictness --------------
-- No point in explicitly exporting TopSig
sig_info = dmdSigInfo id_info
- strict_hsinfo | not (isTopSig sig_info) = Just (HsDmdSig sig_info)
+ strict_hsinfo | not (isNopSig sig_info) = Just (HsDmdSig sig_info)
| otherwise = Nothing
------------ CPR --------------
diff --git a/compiler/GHC/CoreToStg/Prep.hs b/compiler/GHC/CoreToStg/Prep.hs
index 63aeba48ca..045d580a2a 100644
--- a/compiler/GHC/CoreToStg/Prep.hs
+++ b/compiler/GHC/CoreToStg/Prep.hs
@@ -36,7 +36,6 @@ import GHC.Builtin.Types.Prim ( realWorldStatePrimTy )
import GHC.Core.Utils
import GHC.Core.Opt.Arity
-import GHC.Core.FVs
import GHC.Core.Opt.Monad ( CoreToDo(..) )
import GHC.Core.Lint ( endPassIO )
import GHC.Core
@@ -64,7 +63,6 @@ import GHC.Utils.Trace
import GHC.Types.Demand
import GHC.Types.Var
-import GHC.Types.Var.Set
import GHC.Types.Var.Env
import GHC.Types.Id
import GHC.Types.Id.Info
@@ -781,7 +779,7 @@ cpeRhsE env expr@(Lit (LitNumber nt i))
Just e -> cpeRhsE env e
cpeRhsE _env expr@(Lit {}) = return (emptyFloats, expr)
cpeRhsE env expr@(Var {}) = cpeApp env expr
-cpeRhsE env expr@(App {}) = cpeApp env expr
+cpeRhsE env expr@(App {}) = cpeApp env expr
cpeRhsE env (Let bind body)
= do { (env', bind_floats, maybe_bind') <- cpeBind NotTopLevel env bind
@@ -916,9 +914,7 @@ rhsToBody (Cast e co)
= do { (floats, e') <- rhsToBody e
; return (floats, Cast e' co) }
-rhsToBody expr@(Lam {})
- | Just no_lam_result <- tryEtaReducePrep bndrs body
- = return (emptyFloats, no_lam_result)
+rhsToBody expr@(Lam {}) -- See Note [No eta reduction needed in rhsToBody]
| all isTyVar bndrs -- Type lambdas are ok
= return (emptyFloats, expr)
| otherwise -- Some value lambdas
@@ -927,11 +923,29 @@ rhsToBody expr@(Lam {})
; let float = FloatLet (NonRec fn rhs)
; return (unitFloat float, Var fn) }
where
- (bndrs,body) = collectBinders expr
+ (bndrs,_) = collectBinders expr
rhsToBody expr = return (emptyFloats, expr)
+{- Note [No eta reduction needed in rhsToBody]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Historical note. In the olden days we used to have a Prep-specific
+eta-reduction step in rhsToBody:
+ rhsToBody expr@(Lam {})
+ | Just no_lam_result <- tryEtaReducePrep bndrs body
+ = return (emptyFloats, no_lam_result)
+
+The goal was to reduce
+ case x of { p -> \xs. map f xs }
+ ==> case x of { p -> map f }
+
+to avoid allocating a lambda. Of course, we'd allocate a PAP
+instead, which is hardly better, but that's the way it was.
+
+Now we simply don't bother with this. It doesn't seem to be a win,
+and it's extra work.
+-}
-- ---------------------------------------------------------------------------
-- CpeApp: produces a result satisfying CpeApp
@@ -1581,7 +1595,7 @@ the simplifier only when there at least one lambda already.
NB1:we could refrain when the RHS is trivial (which can happen
for exported things). This would reduce the amount of code
- generated (a little) and make things a little words for
+ generated (a little) and make things a little worse for
code compiled without -O. The case in point is data constructor
wrappers.
@@ -1615,58 +1629,6 @@ cpeEtaExpand arity expr
| otherwise = etaExpand arity expr
{-
--- -----------------------------------------------------------------------------
--- Eta reduction
--- -----------------------------------------------------------------------------
-
-Why try eta reduction? Hasn't the simplifier already done eta?
-But the simplifier only eta reduces if that leaves something
-trivial (like f, or f Int). But for deLam it would be enough to
-get to a partial application:
- case x of { p -> \xs. map f xs }
- ==> case x of { p -> map f }
--}
-
--- When updating this function, make sure it lines up with
--- GHC.Core.Utils.tryEtaReduce!
-tryEtaReducePrep :: [CoreBndr] -> CoreExpr -> Maybe CoreExpr
-tryEtaReducePrep bndrs expr@(App _ _)
- | ok_to_eta_reduce f
- , n_remaining >= 0
- , and (zipWith ok bndrs last_args)
- , not (any (`elemVarSet` fvs_remaining) bndrs)
- , exprIsHNF remaining_expr -- Don't turn value into a non-value
- -- else the behaviour with 'seq' changes
- =
- -- pprTrace "prep-reduce" (vcat
- -- [ text "reduced:" <+> ppr expr
- -- , text "from" <+> ppr (length args) <+> text "to" <+> ppr n_remaining
- -- , (case f of Var v -> text "has strict worker:" <+> ppr (idCbvMarkArity v) <+> ppr n_remaining_vals; _ -> empty)
- -- , ppr remaining_args
- -- ]) $
- Just remaining_expr
- where
- (f, args) = collectArgs expr
- remaining_expr = mkApps f remaining_args
- fvs_remaining = exprFreeVars remaining_expr
- (remaining_args, last_args) = splitAt n_remaining args
- n_remaining = length args - length bndrs
- n_remaining_vals = length $ filter isRuntimeArg remaining_args
-
- ok bndr (Var arg) = bndr == arg
- ok _ _ = False
-
- ok_to_eta_reduce (Var f) = canEtaReduceToArity f n_remaining n_remaining_vals
- ok_to_eta_reduce _ = False -- Safe. ToDo: generalise
-
-
-tryEtaReducePrep bndrs (Tick tickish e)
- | tickishFloatable tickish
- = fmap (mkTick tickish) $ tryEtaReducePrep bndrs e
-
-tryEtaReducePrep _ _ = Nothing
-
-{-
************************************************************************
* *
Floats
diff --git a/compiler/GHC/Driver/Config.hs b/compiler/GHC/Driver/Config.hs
index 2d4135a847..bd9790312b 100644
--- a/compiler/GHC/Driver/Config.hs
+++ b/compiler/GHC/Driver/Config.hs
@@ -29,6 +29,7 @@ initSimpleOpts :: DynFlags -> SimpleOpts
initSimpleOpts dflags = SimpleOpts
{ so_uf_opts = unfoldingOpts dflags
, so_co_opts = initOptCoercionOpts dflags
+ , so_eta_red = gopt Opt_DoEtaReduction dflags
}
-- | Extract BCO options from DynFlags
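A hedged usage sketch (imports omitted; it assumes the SimpleOpts record and defaultSimpleOpts from the simple-optimiser hunk above are in scope): the new flag is an ordinary field, so a client can enable eta reduction with a record update, which is what initSimpleOpts does from Opt_DoEtaReduction.

    -- Sketch: turn on eta reduction in the simple optimiser's options.
    optsWithEtaRed :: SimpleOpts
    optsWithEtaRed = defaultSimpleOpts { so_eta_red = True }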
diff --git a/compiler/GHC/Iface/Tidy.hs b/compiler/GHC/Iface/Tidy.hs
index f7282faa83..c966d0946e 100644
--- a/compiler/GHC/Iface/Tidy.hs
+++ b/compiler/GHC/Iface/Tidy.hs
@@ -28,7 +28,7 @@ import GHC.Core.Unfold.Make
import GHC.Core.FVs
import GHC.Core.Tidy
import GHC.Core.Seq (seqBinds)
-import GHC.Core.Opt.Arity ( exprArity, typeArity,, exprBotStrictness_maybe )
+import GHC.Core.Opt.Arity ( exprArity, typeArity, exprBotStrictness_maybe )
import GHC.Core.InstEnv
import GHC.Core.Type ( Type, tidyTopType )
import GHC.Core.DataCon
@@ -52,7 +52,7 @@ import GHC.Types.Var
import GHC.Types.Id
import GHC.Types.Id.Make ( mkDictSelRhs )
import GHC.Types.Id.Info
-import GHC.Types.Demand ( isDeadEndAppSig, isTopSig, isDeadEndSig )
+import GHC.Types.Demand ( isDeadEndAppSig, isNopSig, nopSig, isDeadEndSig )
import GHC.Types.Cpr ( mkCprSig, botCpr )
import GHC.Types.Basic
import GHC.Types.Name hiding (varName)
@@ -1263,11 +1263,16 @@ tidyTopIdInfo uf_opts rhs_tidy_env name rhs_ty orig_rhs tidy_rhs idinfo show_unf
mb_bot_str = exprBotStrictness_maybe orig_rhs
sig = dmdSigInfo idinfo
- final_sig | not $ isTopSig sig
+ final_sig | not (isNopSig sig)
= warnPprTrace (_bottom_hidden sig) "tidyTopIdInfo" (ppr name) sig
- -- try a cheap-and-cheerful bottom analyser
- | Just (_, nsig) <- mb_bot_str = nsig
- | otherwise = sig
+
+ -- No demand signature, so try a
+ -- cheap-and-cheerful bottom analyser
+ | Just (_, nsig) <- mb_bot_str
+ = nsig
+
+ -- No strictness info
+ | otherwise = nopSig
cpr = cprSigInfo idinfo
final_cpr | Just _ <- mb_bot_str
@@ -1314,7 +1319,7 @@ tidyTopIdInfo uf_opts rhs_tidy_env name rhs_ty orig_rhs tidy_rhs idinfo show_unf
arity = exprArity orig_rhs `min` typeArity rhs_ty
-- orig_rhs: using tidy_rhs would make a black hole, since
-- exprArity uses the arities of Ids inside the rhs
- -- typeArity: see Note [typeArity invariants]
+ -- typeArity: see Note [Arity invariants for bindings]
-- in GHC.Core.Opt.Arity
{-
@@ -1419,4 +1424,4 @@ mustExposeTyCon no_trim_types exports tc
exported_con con = any (`elemNameSet` exports)
(dataConName con : dataConFieldLabels con)
-}
->>>>>>> Do arity trimming at bindings, rather than in exprArity
+
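A toy sketch (hypothetical names, nothing GHC-specific) of the final_sig fall-through above: keep a real demand signature if the analyser produced one, otherwise fall back to the cheap bottoming analysis, otherwise record no strictness info at all.

    -- Toy stand-ins; the real code works with DmdSig.
    data Sig = RealSig | BotSig | NopSig deriving Show

    finalSig :: Maybe Sig -> Maybe Sig -> Sig
    finalSig analysed cheap_bot
      | Just s <- analysed  = s        -- demand analyser produced something: keep it
      | Just s <- cheap_bot = s        -- else the cheap-and-cheerful bottom analyser
      | otherwise           = NopSig   -- else: no strictness info at all

    main :: IO ()
    main = print (finalSig Nothing (Just BotSig))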
diff --git a/compiler/GHC/StgToCmm/Bind.hs b/compiler/GHC/StgToCmm/Bind.hs
index e8539d80f5..cf54ef4be0 100644
--- a/compiler/GHC/StgToCmm/Bind.hs
+++ b/compiler/GHC/StgToCmm/Bind.hs
@@ -16,6 +16,7 @@ module GHC.StgToCmm.Bind (
import GHC.Prelude hiding ((<*>))
import GHC.Core ( AltCon(..) )
+import GHC.Core.Opt.Arity( isOneShotBndr )
import GHC.Runtime.Heap.Layout
import GHC.Unit.Module
diff --git a/compiler/GHC/Types/Demand.hs b/compiler/GHC/Types/Demand.hs
index 4c651cc9c2..8ea0ddc84e 100644
--- a/compiler/GHC/Types/Demand.hs
+++ b/compiler/GHC/Types/Demand.hs
@@ -62,9 +62,10 @@ module GHC.Types.Demand (
keepAliveDmdType,
-- * Demand signatures
- DmdSig(..), mkDmdSigForArity, mkClosedDmdSig,
+ DmdSig(..), mkDmdSigForArity, mkClosedDmdSig, mkVanillaDmdSig,
splitDmdSig, dmdSigDmdEnv, hasDemandEnvSig,
- nopSig, botSig, isTopSig, isDeadEndSig, isDeadEndAppSig, trimBoxityDmdSig,
+ nopSig, botSig, isNopSig, isDeadEndSig, isDeadEndAppSig, trimBoxityDmdSig,
+
-- ** Handling arity adjustments
prependArgsDmdSig, etaConvertDmdSig,
@@ -1831,8 +1832,8 @@ botDmdType = DmdType emptyDmdEnv [] botDiv
nopDmdType :: DmdType
nopDmdType = DmdType emptyDmdEnv [] topDiv
-isTopDmdType :: DmdType -> Bool
-isTopDmdType (DmdType env args div)
+isNopDmdType :: DmdType -> Bool
+isNopDmdType (DmdType env args div)
= div == topDiv && null args && isEmptyVarEnv env
-- | The demand type of an unspecified expression that is guaranteed to
@@ -2158,6 +2159,9 @@ mkDmdSigForArity arity dmd_ty@(DmdType fvs args div)
mkClosedDmdSig :: [Demand] -> Divergence -> DmdSig
mkClosedDmdSig ds res = mkDmdSigForArity (length ds) (DmdType emptyDmdEnv ds res)
+mkVanillaDmdSig :: Arity -> Divergence -> DmdSig
+mkVanillaDmdSig ar div = mkClosedDmdSig (replicate ar topDmd) div
+
splitDmdSig :: DmdSig -> ([Demand], Divergence)
splitDmdSig (DmdSig (DmdType _ dmds res)) = (dmds, res)
@@ -2173,8 +2177,8 @@ botSig = DmdSig botDmdType
nopSig :: DmdSig
nopSig = DmdSig nopDmdType
-isTopSig :: DmdSig -> Bool
-isTopSig (DmdSig ty) = isTopDmdType ty
+isNopSig :: DmdSig -> Bool
+isNopSig (DmdSig ty) = isNopDmdType ty
-- | True if the signature diverges or throws an exception in a saturated call.
-- See Note [Dead ends].
@@ -2219,7 +2223,7 @@ prependArgsDmdSig :: Int -> DmdSig -> DmdSig
-- demands. This is used by FloatOut.
prependArgsDmdSig new_args sig@(DmdSig dmd_ty@(DmdType env dmds res))
| new_args == 0 = sig
- | isTopDmdType dmd_ty = sig
+ | isNopDmdType dmd_ty = sig
| new_args < 0 = pprPanic "prependArgsDmdSig: negative new_args"
(ppr new_args $$ ppr sig)
| otherwise = DmdSig (DmdType env dmds' res)
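A small sketch (only meaningful inside the GHC source tree, since it imports a compiler module; assumes the exports shown in the hunk above) of what the new helper buys: a vanilla signature of arity n is just the closed signature built from n copies of topDmd.

    import GHC.Types.Demand (DmdSig, mkClosedDmdSig, mkVanillaDmdSig, topDmd, topDiv)

    vanilla2, closed2 :: DmdSig
    vanilla2 = mkVanillaDmdSig 2 topDiv                -- shorthand added by this patch
    closed2  = mkClosedDmdSig [topDmd, topDmd] topDiv  -- what it unfolds to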
diff --git a/compiler/GHC/Types/Id.hs b/compiler/GHC/Types/Id.hs
index 01ad94172a..6135d02f9c 100644
--- a/compiler/GHC/Types/Id.hs
+++ b/compiler/GHC/Types/Id.hs
@@ -86,10 +86,8 @@ module GHC.Types.Id (
idInlineActivation, setInlineActivation, idRuleMatchInfo,
-- ** One-shot lambdas
- isOneShotBndr, isProbablyOneShotLambda,
setOneShotLambda, clearOneShotLambda,
updOneShotInfo, setIdOneShotInfo,
- isStateHackType, stateHackOneShot, typeOneShot,
-- ** Reading 'IdInfo' fields
idArity,
@@ -97,7 +95,7 @@ module GHC.Types.Id (
idUnfolding, realIdUnfolding,
idSpecialisation, idCoreRules, idHasRules,
idCafInfo, idLFInfo_maybe,
- idOneShotInfo, idStateHackOneShotInfo,
+ idOneShotInfo,
idOccInfo,
-- ** Writing 'IdInfo' fields
@@ -144,7 +142,6 @@ import qualified GHC.Types.Var as Var
import GHC.Core.Type
import GHC.Types.RepType
-import GHC.Builtin.Types.Prim
import GHC.Core.DataCon
import GHC.Types.Demand
import GHC.Types.Cpr
@@ -165,7 +162,6 @@ import GHC.Utils.Misc
import GHC.Utils.Outputable
import GHC.Utils.Panic
import GHC.Utils.Panic.Plain
-import GHC.Utils.GlobalVars
import GHC.Utils.Trace
import GHC.Stg.InferTags.TagSig
@@ -880,64 +876,6 @@ isConLikeId id = isConLike (idRuleMatchInfo id)
idOneShotInfo :: Id -> OneShotInfo
idOneShotInfo id = oneShotInfo (idInfo id)
--- | Like 'idOneShotInfo', but taking the Horrible State Hack in to account
--- See Note [The state-transformer hack] in "GHC.Core.Opt.Arity"
-idStateHackOneShotInfo :: Id -> OneShotInfo
-idStateHackOneShotInfo id
- | isStateHackType (idType id) = stateHackOneShot
- | otherwise = idOneShotInfo id
-
--- | Returns whether the lambda associated with the 'Id' is certainly applied at most once
--- This one is the "business end", called externally.
--- It works on type variables as well as Ids, returning True
--- Its main purpose is to encapsulate the Horrible State Hack
--- See Note [The state-transformer hack] in "GHC.Core.Opt.Arity"
-isOneShotBndr :: Var -> Bool
-isOneShotBndr var
- | isTyVar var = True
- | OneShotLam <- idStateHackOneShotInfo var = True
- | otherwise = False
-
--- | Should we apply the state hack to values of this 'Type'?
-stateHackOneShot :: OneShotInfo
-stateHackOneShot = OneShotLam
-
-typeOneShot :: Type -> OneShotInfo
-typeOneShot ty
- | isStateHackType ty = stateHackOneShot
- | otherwise = NoOneShotInfo
-
-isStateHackType :: Type -> Bool
-isStateHackType ty
- | unsafeHasNoStateHack
- = False
- | otherwise
- = case tyConAppTyCon_maybe ty of
- Just tycon -> tycon == statePrimTyCon
- _ -> False
- -- This is a gross hack. It claims that
- -- every function over realWorldStatePrimTy is a one-shot
- -- function. This is pretty true in practice, and makes a big
- -- difference. For example, consider
- -- a `thenST` \ r -> ...E...
- -- The early full laziness pass, if it doesn't know that r is one-shot
- -- will pull out E (let's say it doesn't mention r) to give
- -- let lvl = E in a `thenST` \ r -> ...lvl...
- -- When `thenST` gets inlined, we end up with
- -- let lvl = E in \s -> case a s of (r, s') -> ...lvl...
- -- and we don't re-inline E.
- --
- -- It would be better to spot that r was one-shot to start with, but
- -- I don't want to rely on that.
- --
- -- Another good example is in fill_in in PrelPack.hs. We should be able to
- -- spot that fill_in has arity 2 (and when Keith is done, we will) but we can't yet.
-
-isProbablyOneShotLambda :: Id -> Bool
-isProbablyOneShotLambda id = case idStateHackOneShotInfo id of
- OneShotLam -> True
- NoOneShotInfo -> False
-
setOneShotLambda :: Id -> Id
setOneShotLambda id = modifyIdInfo (`setOneShotInfo` OneShotLam) id
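A self-contained analog (toy types, not the relocated functions themselves, which now live in GHC.Core.Opt.Arity per the import changes elsewhere in this patch) of the decision these helpers implement: a binder counts as one-shot if it is a type variable, if the state hack applies to its type (State# RealWorld), or if its recorded OneShotInfo is OneShotLam.

    data OneShotInfo = NoOneShotInfo | OneShotLam deriving Eq

    data Bndr
      = TyVarBndr
      | IdBndr { isStateToken :: Bool        -- would its type be State# RealWorld?
               , oneShotInfo  :: OneShotInfo }

    -- Sketch of isOneShotBndr / idStateHackOneShotInfo combined.
    isOneShot :: Bndr -> Bool
    isOneShot TyVarBndr = True
    isOneShot (IdBndr state_hack os)
      | state_hack = True                    -- the state hack: assume one-shot
      | otherwise  = os == OneShotLam

    main :: IO ()
    main = mapM_ (print . isOneShot)
      [ TyVarBndr
      , IdBndr True  NoOneShotInfo           -- one-shot by the state hack alone
      , IdBndr False NoOneShotInfo ]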
diff --git a/compiler/GHC/Types/Id/Info.hs b/compiler/GHC/Types/Id/Info.hs
index ee7708baa8..77eb06f206 100644
--- a/compiler/GHC/Types/Id/Info.hs
+++ b/compiler/GHC/Types/Id/Info.hs
@@ -156,6 +156,7 @@ data IdDetails
| PrimOpId PrimOp Bool -- ^ The 'Id' is for a primitive operator
-- True <=> is representation-polymorphic,
-- and hence has no binding
+ -- This lev-poly flag is used only in GHC.Types.Id.hasNoBinding
| FCallId ForeignCall -- ^ The 'Id' is for a foreign call.
-- Type will be simple: no type families, newtypes, etc
diff --git a/compiler/GHC/Types/Id/Make.hs b/compiler/GHC/Types/Id/Make.hs
index e46a3279fa..6b7f1053b9 100644
--- a/compiler/GHC/Types/Id/Make.hs
+++ b/compiler/GHC/Types/Id/Make.hs
@@ -43,6 +43,7 @@ import GHC.Builtin.Types
import GHC.Builtin.Names
import GHC.Core
+import GHC.Core.Opt.Arity( typeOneShot )
import GHC.Core.Type
import GHC.Core.Multiplicity
import GHC.Core.TyCo.Rep
@@ -1318,7 +1319,7 @@ mkFCallId uniq fcall ty
(bndrs, _) = tcSplitPiTys ty
arity = count isAnonTyCoBinder bndrs
- strict_sig = mkClosedDmdSig (replicate arity topDmd) topDiv
+ strict_sig = mkVanillaDmdSig arity topDiv
-- the call does not claim to be strict in its arguments, since they
-- may be lifted (foreign import prim) and the called code doesn't
-- necessarily force them. See #11076.
@@ -1771,9 +1772,11 @@ inlined.
-}
realWorldPrimId :: Id -- :: State# RealWorld
-realWorldPrimId = pcMiscPrelId realWorldName realWorldStatePrimTy
+realWorldPrimId = pcMiscPrelId realWorldName id_ty
(noCafIdInfo `setUnfoldingInfo` evaldUnfolding -- Note [evaldUnfoldings]
- `setOneShotInfo` stateHackOneShot)
+ `setOneShotInfo` typeOneShot id_ty)
+ where
+ id_ty = realWorldStatePrimTy
voidPrimId :: Id -- Global constant :: Void#
-- The type Void# is now the same as (# #) (ticket #18441),
diff --git a/testsuite/tests/arityanal/should_compile/Arity03.stderr b/testsuite/tests/arityanal/should_compile/Arity03.stderr
index e5e3e754dd..f41fc1552c 100644
--- a/testsuite/tests/arityanal/should_compile/Arity03.stderr
+++ b/testsuite/tests/arityanal/should_compile/Arity03.stderr
@@ -18,18 +18,18 @@ end Rec }
fac [InlPrag=[2]] :: Int -> Int
[GblId,
Arity=1,
- Str=<1P(1L)>,
- Cpr=m1,
+ Str=<1!P(1L)>,
+ Cpr=1,
Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=False)
- Tmpl= \ (w [Occ=Once1!] :: Int) -> case w of { GHC.Types.I# ww1 [Occ=Once1] -> case F3.$wfac ww1 of ww2 [Occ=Once1] { __DEFAULT -> GHC.Types.I# ww2 } }}]
-fac = \ (w :: Int) -> case w of { GHC.Types.I# ww1 -> case F3.$wfac ww1 of ww2 { __DEFAULT -> GHC.Types.I# ww2 } }
+ Tmpl= \ (x [Occ=Once1!] :: Int) -> case x of { GHC.Types.I# ww [Occ=Once1] -> case F3.$wfac ww of ww1 [Occ=Once1] { __DEFAULT -> GHC.Types.I# ww1 } }}]
+fac = \ (x :: Int) -> case x of { GHC.Types.I# ww -> case F3.$wfac ww of ww1 { __DEFAULT -> GHC.Types.I# ww1 } }
-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
f3 :: Int -> Int
[GblId,
Arity=1,
- Str=<1P(1L)>,
- Cpr=m1,
+ Str=<1!P(1L)>,
+ Cpr=1,
Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=True)
Tmpl= fac}]
f3 = fac
diff --git a/testsuite/tests/arityanal/should_compile/Arity11.stderr b/testsuite/tests/arityanal/should_compile/Arity11.stderr
index 7c7451a6d7..a4f2e38b53 100644
--- a/testsuite/tests/arityanal/should_compile/Arity11.stderr
+++ b/testsuite/tests/arityanal/should_compile/Arity11.stderr
@@ -142,4 +142,7 @@ f11 :: (Integer, Integer)
f11 = (F11.f4, F11.f1)
+------ Local rules for imported ids --------
+"SPEC fib @Integer @Integer" forall ($dEq :: Eq Integer) ($dNum :: Num Integer) ($dNum1 :: Num Integer). fib @Integer @Integer $dEq $dNum $dNum1 = F11.f11_fib
+
diff --git a/testsuite/tests/codeGen/should_compile/debug.stdout b/testsuite/tests/codeGen/should_compile/debug.stdout
index 3dca62a419..25df0c258f 100644
--- a/testsuite/tests/codeGen/should_compile/debug.stdout
+++ b/testsuite/tests/codeGen/should_compile/debug.stdout
@@ -18,6 +18,7 @@ src<debug.hs:4:9>
src<debug.hs:5:21-29>
src<debug.hs:5:9-29>
src<debug.hs:6:1-21>
+src<debug.hs:6:16-21>
== CBE ==
src<debug.hs:4:9>
89
diff --git a/testsuite/tests/driver/inline-check.stderr b/testsuite/tests/driver/inline-check.stderr
index 40b5b59d19..a65d39ea6f 100644
--- a/testsuite/tests/driver/inline-check.stderr
+++ b/testsuite/tests/driver/inline-check.stderr
@@ -1,6 +1,6 @@
Considering inlining: foo
arg infos [ValueArg]
- interesting continuation RhsCtxt
+ interesting continuation RhsCtxt(NonRecursive)
some_benefit True
is exp: True
is work-free: True
@@ -19,7 +19,7 @@ Inactive unfolding: foo1
Inactive unfolding: foo1
Considering inlining: foo
arg infos []
- interesting continuation RhsCtxt
+ interesting continuation RhsCtxt(NonRecursive)
some_benefit False
is exp: True
is work-free: True
diff --git a/testsuite/tests/numeric/should_compile/T19641.stderr b/testsuite/tests/numeric/should_compile/T19641.stderr
index 8f6e3696be..7c1cf57b06 100644
--- a/testsuite/tests/numeric/should_compile/T19641.stderr
+++ b/testsuite/tests/numeric/should_compile/T19641.stderr
@@ -4,16 +4,16 @@ Result size of Tidy Core
= {terms: 22, types: 20, coercions: 0, joins: 0/0}
integer_to_int
- = \ x ->
- case x of {
+ = \ eta ->
+ case eta of {
IS ipv -> Just (I# ipv);
IP x1 -> Nothing;
IN ds -> Nothing
}
natural_to_word
- = \ x ->
- case x of {
+ = \ eta ->
+ case eta of {
NS x1 -> Just (W# x1);
NB ds -> Nothing
}
diff --git a/testsuite/tests/profiling/should_run/T2552.prof.sample b/testsuite/tests/profiling/should_run/T2552.prof.sample
index 7ed927f6db..c8bfad1ecf 100644
--- a/testsuite/tests/profiling/should_run/T2552.prof.sample
+++ b/testsuite/tests/profiling/should_run/T2552.prof.sample
@@ -1,36 +1,36 @@
- Sat Jun 4 11:59 2016 Time and Allocation Profiling Report (Final)
+ Mon Apr 25 16:27 2022 Time and Allocation Profiling Report (Final)
T2552 +RTS -hc -p -RTS
- total time = 0.09 secs (90 ticks @ 1000 us, 1 processor)
- total alloc = 123,465,848 bytes (excludes profiling overheads)
+ total time = 0.05 secs (49 ticks @ 1000 us, 1 processor)
+ total alloc = 74,099,440 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc
-fib1.fib1'.nfib Main T2552.hs:5:9-61 37.8 33.3
-fib2'.nfib Main T2552.hs:10:5-57 31.1 33.3
-fib3'.nfib Main T2552.hs:15:5-57 31.1 33.3
+fib1.fib1'.nfib Main T2552.hs:5:9-61 34.7 33.3
+fib3'.nfib Main T2552.hs:15:5-57 32.7 33.3
+fib2'.nfib Main T2552.hs:10:5-57 32.7 33.3
individual inherited
COST CENTRE MODULE SRC no. entries %time %alloc %time %alloc
-MAIN MAIN <built-in> 45 0 0.0 0.0 100.0 100.0
- CAF Main <entire-module> 89 0 0.0 0.0 100.0 100.0
- main Main T2552.hs:(17,1)-(20,17) 90 1 0.0 0.0 100.0 100.0
- fib1 Main T2552.hs:(1,1)-(5,61) 92 1 0.0 0.0 37.8 33.3
- fib1.fib1' Main T2552.hs:(3,5)-(5,61) 93 1 0.0 0.0 37.8 33.3
- nfib' Main T2552.hs:3:35-40 94 1 0.0 0.0 37.8 33.3
- fib1.fib1'.nfib Main T2552.hs:5:9-61 95 1028457 37.8 33.3 37.8 33.3
- fib2 Main T2552.hs:7:1-16 96 1 0.0 0.0 31.1 33.3
- fib2' Main T2552.hs:(8,1)-(10,57) 97 1 0.0 0.0 31.1 33.3
- fib2'.nfib Main T2552.hs:10:5-57 98 1028457 31.1 33.3 31.1 33.3
- fib3 Main T2552.hs:12:1-12 99 1 0.0 0.0 0.0 0.0
- fib3' Main T2552.hs:(13,1)-(15,57) 100 1 0.0 0.0 31.1 33.3
- fib3'.nfib Main T2552.hs:15:5-57 101 1028457 31.1 33.3 31.1 33.3
- CAF GHC.IO.Handle.FD <entire-module> 84 0 0.0 0.0 0.0 0.0
- CAF GHC.IO.Handle.Text <entire-module> 83 0 0.0 0.0 0.0 0.0
- CAF GHC.Conc.Signal <entire-module> 81 0 0.0 0.0 0.0 0.0
- CAF GHC.IO.Encoding <entire-module> 78 0 0.0 0.0 0.0 0.0
- CAF GHC.IO.Encoding.Iconv <entire-module> 64 0 0.0 0.0 0.0 0.0
- main Main T2552.hs:(17,1)-(20,17) 91 0 0.0 0.0 0.0 0.0
+MAIN MAIN <built-in> 128 0 0.0 0.0 100.0 100.0
+ CAF Main <entire-module> 255 0 0.0 0.0 100.0 99.9
+ fib3 Main T2552.hs:12:1-12 265 1 0.0 0.0 0.0 0.0
+ main Main T2552.hs:(17,1)-(20,17) 256 1 0.0 0.0 100.0 99.9
+ fib1 Main T2552.hs:(1,1)-(5,61) 258 1 0.0 0.0 34.7 33.3
+ fib1.fib1' Main T2552.hs:(3,5)-(5,61) 259 1 0.0 0.0 34.7 33.3
+ nfib' Main T2552.hs:3:35-40 260 1 0.0 0.0 34.7 33.3
+ fib1.fib1'.nfib Main T2552.hs:5:9-61 261 1028457 34.7 33.3 34.7 33.3
+ fib2 Main T2552.hs:7:1-16 262 1 0.0 0.0 32.7 33.3
+ fib2' Main T2552.hs:(8,1)-(10,57) 263 1 0.0 0.0 32.7 33.3
+ fib2'.nfib Main T2552.hs:10:5-57 264 1028457 32.7 33.3 32.7 33.3
+ fib3 Main T2552.hs:12:1-12 266 0 0.0 0.0 32.7 33.3
+ fib3' Main T2552.hs:(13,1)-(15,57) 267 1 0.0 0.0 32.7 33.3
+ fib3'.nfib Main T2552.hs:15:5-57 268 1028457 32.7 33.3 32.7 33.3
+ CAF GHC.Conc.Signal <entire-module> 250 0 0.0 0.0 0.0 0.0
+ CAF GHC.IO.Encoding <entire-module> 241 0 0.0 0.0 0.0 0.0
+ CAF GHC.IO.Encoding.Iconv <entire-module> 239 0 0.0 0.0 0.0 0.0
+ CAF GHC.IO.Handle.FD <entire-module> 231 0 0.0 0.0 0.0 0.0
+ main Main T2552.hs:(17,1)-(20,17) 257 0 0.0 0.0 0.0 0.0
diff --git a/testsuite/tests/profiling/should_run/all.T b/testsuite/tests/profiling/should_run/all.T
index 0455d06f17..96a0d30bc6 100644
--- a/testsuite/tests/profiling/should_run/all.T
+++ b/testsuite/tests/profiling/should_run/all.T
@@ -93,7 +93,7 @@ test('T5314', [extra_ways(extra_prof_ways)], compile_and_run, [''])
test('T680', [], compile_and_run,
['-fno-full-laziness']) # Note [consistent stacks]
-test('T2552', [expect_broken_for_10037], compile_and_run, [''])
+test('T2552', [], compile_and_run, [''])
test('T949', [extra_ways(extra_prof_ways)], compile_and_run, [''])
@@ -101,7 +101,7 @@ test('T949', [extra_ways(extra_prof_ways)], compile_and_run, [''])
# We care more about getting the optimised results right, so ignoring
# this for now.
test('ioprof',
- [expect_broken_for_10037,
+ [normal,
exit_code(1),
omit_ways(['ghci-ext-prof']), # doesn't work with exit_code(1)
ignore_stderr
diff --git a/testsuite/tests/profiling/should_run/ioprof.prof.sample b/testsuite/tests/profiling/should_run/ioprof.prof.sample
index 52ab8ba4d2..103207d8ca 100644
--- a/testsuite/tests/profiling/should_run/ioprof.prof.sample
+++ b/testsuite/tests/profiling/should_run/ioprof.prof.sample
@@ -1,46 +1,54 @@
- Sat Jun 4 11:59 2016 Time and Allocation Profiling Report (Final)
+ Mon May 23 13:50 2022 Time and Allocation Profiling Report (Final)
ioprof +RTS -hc -p -RTS
total time = 0.00 secs (0 ticks @ 1000 us, 1 processor)
- total alloc = 180,024 bytes (excludes profiling overheads)
+ total alloc = 129,248 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc
-CAF GHC.IO.Encoding <entire-module> 0.0 1.8
-CAF GHC.IO.Handle.FD <entire-module> 0.0 19.2
-CAF GHC.Exception <entire-module> 0.0 2.5
-main Main ioprof.hs:28:1-43 0.0 4.8
-errorM.\ Main ioprof.hs:23:22-28 0.0 68.7
+CAF Main <entire-module> 0.0 1.1
+main Main ioprof.hs:28:1-43 0.0 6.8
+errorM.\ Main ioprof.hs:23:22-28 0.0 56.8
+CAF GHC.IO.Handle.FD <entire-module> 0.0 26.9
+CAF GHC.IO.Exception <entire-module> 0.0 1.0
+CAF GHC.IO.Encoding <entire-module> 0.0 2.3
+CAF GHC.Exception <entire-module> 0.0 3.0
- individual inherited
-COST CENTRE MODULE SRC no. entries %time %alloc %time %alloc
+ individual inherited
+COST CENTRE MODULE SRC no. entries %time %alloc %time %alloc
-MAIN MAIN <built-in> 46 0 0.0 0.4 0.0 100.0
- CAF Main <entire-module> 91 0 0.0 0.9 0.0 69.8
- <*> Main ioprof.hs:20:5-14 96 1 0.0 0.0 0.0 0.0
- fmap Main ioprof.hs:16:5-16 100 1 0.0 0.0 0.0 0.0
- main Main ioprof.hs:28:1-43 92 1 0.0 0.0 0.0 68.9
- runM Main ioprof.hs:26:1-37 94 1 0.0 0.1 0.0 68.9
- bar Main ioprof.hs:31:1-20 95 1 0.0 0.1 0.0 68.8
- foo Main ioprof.hs:34:1-16 104 1 0.0 0.0 0.0 0.0
- errorM Main ioprof.hs:23:1-28 105 1 0.0 0.0 0.0 0.0
- <*> Main ioprof.hs:20:5-14 97 0 0.0 0.0 0.0 68.7
- >>= Main ioprof.hs:(11,3)-(12,50) 98 1 0.0 0.0 0.0 68.7
- >>=.\ Main ioprof.hs:(11,27)-(12,50) 99 2 0.0 0.0 0.0 68.7
- fmap Main ioprof.hs:16:5-16 103 0 0.0 0.0 0.0 0.0
- foo Main ioprof.hs:34:1-16 106 0 0.0 0.0 0.0 68.7
- errorM Main ioprof.hs:23:1-28 107 0 0.0 0.0 0.0 68.7
- errorM.\ Main ioprof.hs:23:22-28 108 1 0.0 68.7 0.0 68.7
- fmap Main ioprof.hs:16:5-16 101 0 0.0 0.0 0.0 0.0
- >>= Main ioprof.hs:(11,3)-(12,50) 102 1 0.0 0.0 0.0 0.0
- CAF GHC.IO.Exception <entire-module> 89 0 0.0 0.7 0.0 0.7
- CAF GHC.Exception <entire-module> 86 0 0.0 2.5 0.0 2.5
- CAF GHC.IO.Handle.FD <entire-module> 85 0 0.0 19.2 0.0 19.2
- CAF GHC.Conc.Signal <entire-module> 82 0 0.0 0.4 0.0 0.4
- CAF GHC.IO.Encoding <entire-module> 80 0 0.0 1.8 0.0 1.8
- CAF GHC.Conc.Sync <entire-module> 75 0 0.0 0.1 0.0 0.1
- CAF GHC.Stack.CCS <entire-module> 71 0 0.0 0.2 0.0 0.2
- CAF GHC.IO.Encoding.Iconv <entire-module> 64 0 0.0 0.1 0.0 0.1
- main Main ioprof.hs:28:1-43 93 0 0.0 4.8 0.0 4.8
+MAIN MAIN <built-in> 129 0 0.0 0.5 0.0 100.0
+ CAF GHC.Conc.Signal <entire-module> 233 0 0.0 0.5 0.0 0.5
+ CAF GHC.Conc.Sync <entire-module> 232 0 0.0 0.5 0.0 0.5
+ CAF GHC.Exception <entire-module> 215 0 0.0 3.0 0.0 3.0
+ CAF GHC.IO.Encoding <entire-module> 199 0 0.0 2.3 0.0 2.3
+ CAF GHC.IO.Encoding.Iconv <entire-module> 197 0 0.0 0.2 0.0 0.2
+ CAF GHC.IO.Exception <entire-module> 191 0 0.0 1.0 0.0 1.0
+ CAF GHC.IO.Handle.FD <entire-module> 188 0 0.0 26.9 0.0 26.9
+ CAF GHC.Stack.CCS <entire-module> 167 0 0.0 0.2 0.0 0.2
+ CAF GHC.Weak.Finalize <entire-module> 158 0 0.0 0.0 0.0 0.0
+ CAF Main <entire-module> 136 0 0.0 1.1 0.0 1.1
+ <*> Main ioprof.hs:20:5-14 261 1 0.0 0.0 0.0 0.0
+ fmap Main ioprof.hs:16:5-16 269 1 0.0 0.0 0.0 0.0
+ main Main ioprof.hs:28:1-43 258 1 0.0 0.0 0.0 0.0
+ main Main ioprof.hs:28:1-43 259 0 0.0 6.8 0.0 63.7
+ bar Main ioprof.hs:31:1-20 260 1 0.0 0.1 0.0 0.2
+ foo Main ioprof.hs:34:1-16 275 1 0.0 0.0 0.0 0.0
+ errorM Main ioprof.hs:23:1-28 276 1 0.0 0.0 0.0 0.0
+ <*> Main ioprof.hs:20:5-14 262 0 0.0 0.0 0.0 0.0
+ >>= Main ioprof.hs:(11,3)-(12,50) 263 1 0.0 0.0 0.0 0.0
+ fmap Main ioprof.hs:16:5-16 270 0 0.0 0.0 0.0 0.0
+ >>= Main ioprof.hs:(11,3)-(12,50) 271 1 0.0 0.0 0.0 0.0
+ runM Main ioprof.hs:26:1-37 264 1 0.0 0.0 0.0 56.8
+ bar Main ioprof.hs:31:1-20 265 0 0.0 0.0 0.0 56.8
+ <*> Main ioprof.hs:20:5-14 266 0 0.0 0.0 0.0 0.0
+ >>= Main ioprof.hs:(11,3)-(12,50) 267 0 0.0 0.0 0.0 0.0
+ >>=.\ Main ioprof.hs:(11,27)-(12,50) 268 1 0.0 0.0 0.0 0.0
+ fmap Main ioprof.hs:16:5-16 272 0 0.0 0.0 0.0 0.0
+ >>= Main ioprof.hs:(11,3)-(12,50) 273 0 0.0 0.0 0.0 0.0
+ >>=.\ Main ioprof.hs:(11,27)-(12,50) 274 1 0.0 0.0 0.0 0.0
+ foo Main ioprof.hs:34:1-16 277 0 0.0 0.0 0.0 56.8
+ errorM Main ioprof.hs:23:1-28 278 0 0.0 0.0 0.0 56.8
+ errorM.\ Main ioprof.hs:23:22-28 279 1 0.0 56.8 0.0 56.8
diff --git a/testsuite/tests/simplCore/should_compile/Makefile b/testsuite/tests/simplCore/should_compile/Makefile
index 02358e1746..b1ed06bf71 100644
--- a/testsuite/tests/simplCore/should_compile/Makefile
+++ b/testsuite/tests/simplCore/should_compile/Makefile
@@ -160,12 +160,12 @@ T5298:
.PHONY: T5327
T5327:
$(RM) -f T5327.hi T5327.o
- '$(TEST_HC)' $(TEST_HC_OPTS) -c T5327.hs -O -ddump-simpl | grep -c '># 34# '
+ '$(TEST_HC)' $(TEST_HC_OPTS) -c T5327.hs -O -ddump-simpl | grep -c '34#'
.PHONY: T16254
T16254:
$(RM) -f T16254.hi T16254.o
- '$(TEST_HC)' $(TEST_HC_OPTS) -c T16254.hs -O -ddump-simpl | grep -c '># 34# '
+ '$(TEST_HC)' $(TEST_HC_OPTS) -c T16254.hs -O -ddump-simpl | grep -c '34#'
.PHONY: T5623
T5623:
diff --git a/testsuite/tests/simplCore/should_compile/T16254.hs b/testsuite/tests/simplCore/should_compile/T16254.hs
index 3c1490c17c..a877eee6ab 100644
--- a/testsuite/tests/simplCore/should_compile/T16254.hs
+++ b/testsuite/tests/simplCore/should_compile/T16254.hs
@@ -8,7 +8,12 @@ newtype Size a b where
{-# INLINABLE val2 #-}
val2 = Size 17
--- In the core, we should see a comparison against 34#, i.e. constant
--- folding should have happened. We actually see it twice: Once in f's
--- definition, and once in its unfolding.
+-- In the core, we should see 34#, i.e. constant folding
+-- should have happened.
+--
+-- We actually get eta-reduction thus:
+-- tmp = I# 34#
+-- f = gtInt tmp
+-- because gtInt is marked INLINE with two parameters.
+-- But that's ok
f n = case val2 of Size s -> s + s > n
diff --git a/testsuite/tests/simplCore/should_compile/T5327.hs b/testsuite/tests/simplCore/should_compile/T5327.hs
index a2d9c018ae..a533a2fe32 100644
--- a/testsuite/tests/simplCore/should_compile/T5327.hs
+++ b/testsuite/tests/simplCore/should_compile/T5327.hs
@@ -5,8 +5,13 @@ newtype Size = Size Int
{-# INLINABLE val2 #-}
val2 = Size 17
--- In the core, we should see a comparison against 34#, i.e. constant
--- folding should have happened. We actually see it twice: Once in f's
--- definition, and once in its unfolding.
+-- In the core, we should see 34#, i.e. constant folding
+-- should have happened.
+--
+-- We actually get eta-reduction thus:
+-- tmp = I# 34#
+-- f = gtInt tmp
+-- because gtInt is marked INLINE with two parameters.
+-- But that's ok
f n = case val2 of Size s -> s + s > n
diff --git a/testsuite/tests/simplCore/should_compile/T7785.stderr b/testsuite/tests/simplCore/should_compile/T7785.stderr
index f2f819f89a..504fdc1677 100644
--- a/testsuite/tests/simplCore/should_compile/T7785.stderr
+++ b/testsuite/tests/simplCore/should_compile/T7785.stderr
@@ -1,408 +1,331 @@
-
-==================== Specialise ====================
-Result size of Specialise
- = {terms: 293, types: 99, coercions: 11, joins: 0/2}
+==================== Common sub-expression ====================
+Result size of Common sub-expression
+ = {terms: 181, types: 89, coercions: 5, joins: 0/1}
-- RHS size: {terms: 5, types: 10, coercions: 0, joins: 0/0}
-$cmyfmap_aG0
+$cmyfmap_aG7
:: forall a b. (Domain [] a, Domain [] b) => (a -> b) -> [a] -> [b]
[LclId,
Arity=4,
- Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
+ Str=<A><A><U><SU>,
+ Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
- Guidance=ALWAYS_IF(arity=2,unsat_ok=True,boring_ok=True)}]
-$cmyfmap_aG0
- = \ (@a_aG3) (@b_aG4) _ [Occ=Dead] _ [Occ=Dead] ->
- map @a_aG3 @b_aG4
+ Guidance=ALWAYS_IF(arity=2,unsat_ok=True,boring_ok=True)
+ Tmpl= \ (@a_aGa)
+ (@b_aGb)
+ _ [Occ=Dead]
+ _ [Occ=Dead]
+ (eta_B0 [Occ=Once1, OS=OneShot] :: a_aGa -> b_aGb)
+ (eta_B1 [Occ=Once1, OS=OneShot] :: [a_aGa]) ->
+ GHC.Base.build
+ @b_aGb
+ (\ (@b1_aHe)
+ (c_aHf [Occ=Once1, OS=OneShot] :: b_aGb -> b1_aHe -> b1_aHe)
+ (n_aHg [Occ=Once1, OS=OneShot] :: b1_aHe) ->
+ GHC.Base.foldr
+ @a_aGa
+ @b1_aHe
+ (GHC.Base.mapFB @b_aGb @b1_aHe @a_aGa c_aHf eta_B0)
+ n_aHg
+ eta_B1)}]
+$cmyfmap_aG7
+ = \ (@a_aGa)
+ (@b_aGb)
+ _ [Occ=Dead, Dmd=A]
+ _ [Occ=Dead, Dmd=A, OS=OneShot] ->
+ map @a_aGa @b_aGb
-- RHS size: {terms: 1, types: 0, coercions: 3, joins: 0/0}
-Foo.$fMyFunctor[] [InlPrag=CONLIKE] :: MyFunctor []
+Foo.$fMyFunctor[] [InlPrag=INLINE (sat-args=0)] :: MyFunctor []
[LclIdX[DFunId(nt)],
Arity=4,
- Unf=DFun: \ -> Foo.C:MyFunctor TYPE: [] $cmyfmap_aG0]
+ Str=<A><A><U><SU>,
+ Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
+ WorkFree=True, Expandable=True,
+ Guidance=ALWAYS_IF(arity=0,unsat_ok=False,boring_ok=True)
+ Tmpl= $cmyfmap_aG7
+ `cast` (Sym (Foo.N:MyFunctor[0] <[]>_N)
+ :: (forall a b.
+ (Domain [] a, Domain [] b) =>
+ (a -> b) -> [a] -> [b])
+ ~R# MyFunctor [])}]
Foo.$fMyFunctor[]
- = $cmyfmap_aG0
+ = $cmyfmap_aG7
`cast` (Sym (Foo.N:MyFunctor[0] <[]>_N)
:: (forall a b.
(Domain [] a, Domain [] b) =>
(a -> b) -> [a] -> [b])
~R# MyFunctor [])
--- RHS size: {terms: 114, types: 12, coercions: 0, joins: 0/1}
-$sshared_sHu :: Domain [] Int => [Int] -> [Int]
-[LclId, Arity=1]
-$sshared_sHu
- = \ (irred_azD :: Domain [] Int) ->
- let {
- f_sHt :: [Int] -> [Int]
- [LclId]
- f_sHt
- = myfmap
- @[]
- Foo.$fMyFunctor[]
- @Int
- @Int
- irred_azD
- irred_azD
- GHC.Num.$fNumInt_$cnegate } in
- \ (x_X4N :: [Int]) ->
- f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- (f_sHt
- x_X4N))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
+-- RHS size: {terms: 2, types: 2, coercions: 0, joins: 0/0}
+$sshared_sHD :: [Int] -> [Int]
+[LclId,
+ Arity=1,
+ Str=<SU>,
+ Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
+ WorkFree=True, Expandable=True,
+ Guidance=ALWAYS_IF(arity=0,unsat_ok=True,boring_ok=False)
+ Tmpl= map @Int @Int GHC.Num.$fNumInt_$cnegate}]
+$sshared_sHD = map @Int @Int GHC.Num.$fNumInt_$cnegate
--- RHS size: {terms: 116, types: 16, coercions: 0, joins: 0/1}
+-- RHS size: {terms: 115, types: 15, coercions: 2, joins: 0/1}
shared
:: forall (f :: * -> *).
(MyFunctor f, Domain f Int) =>
f Int -> f Int
[LclIdX,
Arity=2,
- Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
- WorkFree=True, Expandable=True, Guidance=NEVER},
+ Str=<UC1(CS(CS(U)))><U>,
RULES: "SPEC shared @[]"
- forall ($dMyFunctor_sHr :: MyFunctor []).
- shared @[] $dMyFunctor_sHr
- = $sshared_sHu]
+ forall ($dMyFunctor_sHz :: MyFunctor [])
+ (irred_sHA :: Domain [] Int).
+ shared @[] $dMyFunctor_sHz irred_sHA
+ = $sshared_sHD]
shared
- = \ (@(f_azB :: * -> *))
- ($dMyFunctor_azC :: MyFunctor f_azB)
- (irred_azD :: Domain f_azB Int) ->
+ = \ (@(f_ayh :: * -> *))
+ ($dMyFunctor_ayi [Dmd=UC1(CS(CS(U)))] :: MyFunctor f_ayh)
+ (irred_ayj :: Domain f_ayh Int) ->
let {
- f_sHq :: f_azB Int -> f_azB Int
+ f_sHy :: f_ayh Int -> f_ayh Int
[LclId]
- f_sHq
- = myfmap
- @f_azB
- $dMyFunctor_azC
- @Int
- @Int
- irred_azD
- irred_azD
- GHC.Num.$fNumInt_$cnegate } in
- \ (x_X4N :: f_azB Int) ->
- f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
- (f_sHq
+ f_sHy
+ = ($dMyFunctor_ayi
+ `cast` (Foo.N:MyFunctor[0] <f_ayh>_N
+ :: MyFunctor f_ayh
+ ~R# (forall a b.
+ (Domain f_ayh a, Domain f_ayh b) =>
+ (a -> b) -> f_ayh a -> f_ayh b)))
+ @Int @Int irred_ayj irred_ayj GHC.Num.$fNumInt_$cnegate } in
+ \ (x_X4N :: f_ayh Int) ->
+ f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
+ (f_sHy
x_X4N))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
--- RHS size: {terms: 8, types: 4, coercions: 4, joins: 0/0}
+-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
+lvl_sHI :: Int
+[LclId]
+lvl_sHI = GHC.Types.I# 0#
+
+-- RHS size: {terms: 6, types: 5, coercions: 0, joins: 0/0}
foo :: [Int] -> [Int]
[LclIdX,
Arity=1,
- Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
- WorkFree=True, Expandable=True, Guidance=IF_ARGS [0] 60 0}]
+ Str=<U>,
+ Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
+ WorkFree=True, Expandable=True,
+ Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=False)
+ Tmpl= \ (xs_awV [Occ=Once1] :: [Int]) ->
+ GHC.Base.build
+ @Int
+ (\ (@b1_aHe)
+ (c_aHf [Occ=Once1, OS=OneShot] :: Int -> b1_aHe -> b1_aHe)
+ (n_aHg [Occ=Once1, OS=OneShot] :: b1_aHe) ->
+ GHC.Base.foldr
+ @Int
+ @b1_aHe
+ (GHC.Base.mapFB @Int @b1_aHe @Int c_aHf GHC.Num.$fNumInt_$cnegate)
+ n_aHg
+ (GHC.Types.: @Int lvl_sHI xs_awV))}]
foo
- = \ (xs_axd :: [Int]) ->
- shared
- @[]
- Foo.$fMyFunctor[]
- (GHC.Classes.(%%)
- `cast` (Sub (Sym (Foo.D:R:Domain[]a[0] <Int>_N))
- :: (() :: Constraint) ~R# Domain [] Int))
- (GHC.Types.: @Int (GHC.Types.I# 0#) xs_axd)
+ = \ (xs_awV :: [Int]) ->
+ map
+ @Int
+ @Int
+ GHC.Num.$fNumInt_$cnegate
+ (GHC.Types.: @Int lvl_sHI xs_awV)
+
+-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
+lvl_sHJ :: Int
+[LclId]
+lvl_sHJ = lvl_sHI
--- RHS size: {terms: 8, types: 4, coercions: 4, joins: 0/0}
+-- RHS size: {terms: 6, types: 5, coercions: 0, joins: 0/0}
bar :: [Int] -> [Int]
[LclIdX,
Arity=1,
- Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
- WorkFree=True, Expandable=True, Guidance=IF_ARGS [0] 60 10}]
+ Str=<1U>,
+ Cpr=m2,
+ Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
+ WorkFree=True, Expandable=True,
+ Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=False)
+ Tmpl= \ (xs_awW [Occ=Once1] :: [Int]) ->
+ GHC.Types.:
+ @Int
+ lvl_sHI
+ (GHC.Base.build
+ @Int
+ (\ (@b1_aHe)
+ (c_aHf [Occ=Once1, OS=OneShot] :: Int -> b1_aHe -> b1_aHe)
+ (n_aHg [Occ=Once1, OS=OneShot] :: b1_aHe) ->
+ GHC.Base.foldr
+ @Int
+ @b1_aHe
+ (GHC.Base.mapFB @Int @b1_aHe @Int c_aHf GHC.Num.$fNumInt_$cnegate)
+ n_aHg
+ xs_awW))}]
bar
- = \ (xs_axe :: [Int]) ->
+ = \ (xs_awW :: [Int]) ->
GHC.Types.:
- @Int
- (GHC.Types.I# 0#)
- (shared
- @[]
- Foo.$fMyFunctor[]
- (GHC.Classes.(%%)
- `cast` (Sub (Sym (Foo.D:R:Domain[]a[0] <Int>_N))
- :: (() :: Constraint) ~R# Domain [] Int))
- xs_axe)
+ @Int lvl_sHI (map @Int @Int GHC.Num.$fNumInt_$cnegate xs_awW)
-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
-$trModule_sHj :: GHC.Prim.Addr#
-[LclId,
- Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
- WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 20 0}]
-$trModule_sHj = "main"#
+$trModule_sHr :: GHC.Prim.Addr#
+[LclId]
+$trModule_sHr = "main"#
-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
-$trModule_sHk :: GHC.Types.TrName
-[LclId,
- Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
- WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
-$trModule_sHk = GHC.Types.TrNameS $trModule_sHj
+$trModule_sHs :: GHC.Types.TrName
+[LclId]
+$trModule_sHs = GHC.Types.TrNameS $trModule_sHr
-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
-$trModule_sHl :: GHC.Prim.Addr#
-[LclId,
- Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
- WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 20 0}]
-$trModule_sHl = "Foo"#
+$trModule_sHt :: GHC.Prim.Addr#
+[LclId]
+$trModule_sHt = "Foo"#
-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
-$trModule_sHm :: GHC.Types.TrName
-[LclId,
- Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
- WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
-$trModule_sHm = GHC.Types.TrNameS $trModule_sHl
+$trModule_sHu :: GHC.Types.TrName
+[LclId]
+$trModule_sHu = GHC.Types.TrNameS $trModule_sHt
-- RHS size: {terms: 3, types: 0, coercions: 0, joins: 0/0}
Foo.$trModule :: GHC.Types.Module
-[LclIdX,
- Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
- WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
-Foo.$trModule = GHC.Types.Module $trModule_sHk $trModule_sHm
+[LclIdX]
+Foo.$trModule = GHC.Types.Module $trModule_sHs $trModule_sHu
-- RHS size: {terms: 3, types: 1, coercions: 0, joins: 0/0}
-$krep_aGA [InlPrag=[~]] :: GHC.Types.KindRep
-[LclId,
- Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
- WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
-$krep_aGA
+$krep_aGF [InlPrag=[~]] :: GHC.Types.KindRep
+[LclId]
+$krep_aGF
= GHC.Types.KindRepTyConApp
GHC.Types.$tcConstraint (GHC.Types.[] @GHC.Types.KindRep)
-- RHS size: {terms: 3, types: 0, coercions: 0, joins: 0/0}
-$krep_aGz [InlPrag=[~]] :: GHC.Types.KindRep
-[LclId,
- Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
- WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
-$krep_aGz = GHC.Types.KindRepFun GHC.Types.krep$*Arr* $krep_aGA
+$krep_aGE [InlPrag=[~]] :: GHC.Types.KindRep
+[LclId]
+$krep_aGE = GHC.Types.KindRepFun GHC.Types.krep$*Arr* $krep_aGF
-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
-$tcMyFunctor_sHn :: GHC.Prim.Addr#
-[LclId,
- Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
- WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 40 0}]
-$tcMyFunctor_sHn = "MyFunctor"#
+$tcMyFunctor_sHv :: GHC.Prim.Addr#
+[LclId]
+$tcMyFunctor_sHv = "MyFunctor"#
-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
-$tcMyFunctor_sHo :: GHC.Types.TrName
-[LclId,
- Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
- WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
-$tcMyFunctor_sHo = GHC.Types.TrNameS $tcMyFunctor_sHn
+$tcMyFunctor_sHw :: GHC.Types.TrName
+[LclId]
+$tcMyFunctor_sHw = GHC.Types.TrNameS $tcMyFunctor_sHv
-- RHS size: {terms: 7, types: 0, coercions: 0, joins: 0/0}
Foo.$tcMyFunctor :: GHC.Types.TyCon
-[LclIdX,
- Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
- WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
+[LclIdX]
Foo.$tcMyFunctor
= GHC.Types.TyCon
- 12837160846121910345##64
- 787075802864859973##64
+ 12837160846121910345##
+ 787075802864859973##
Foo.$trModule
- $tcMyFunctor_sHo
+ $tcMyFunctor_sHw
0#
- $krep_aGz
-
-
-
+ $krep_aGE
diff --git a/testsuite/tests/simplCore/should_compile/all.T b/testsuite/tests/simplCore/should_compile/all.T
index b92f24cd5b..5a018cdb2d 100644
--- a/testsuite/tests/simplCore/should_compile/all.T
+++ b/testsuite/tests/simplCore/should_compile/all.T
@@ -136,9 +136,14 @@ test('T5366',
test('T7796', [], makefile_test, ['T7796'])
test('T5550', omit_ways(prof_ways), compile, [''])
test('T7865', normal, makefile_test, ['T7865'])
-# T7785: Check that we generate the specialising RULE. Might not be listed in
-# -ddump-rules because of Note [Trimming auto-rules], hence grep
-test('T7785', [ only_ways(['optasm']), grep_errmsg(r'RULE') ], compile, ['-ddump-spec'])
+
+# T7785: we want to check that we specialise 'shared'. But Tidy discards the
+# rule (see Note [Trimming auto-rules] in GHC.Iface.Tidy).
+# So, rather arbitrarily, we dump the output of CSE and grep for SPEC.
+test('T7785', [ only_ways(['optasm']),
+ grep_errmsg(r'SPEC') ],
+ compile, ['-ddump-cse'])
+
test('T7702',
[extra_files(['T7702plugin']),
pre_cmd('$MAKE -s --no-print-directory -C T7702plugin package.T7702 TOP={top}'),
diff --git a/testsuite/tests/simplCore/should_run/T18012.hs b/testsuite/tests/simplCore/should_run/T18012.hs
index 9118b75ff4..9ce1f1fb9d 100644
--- a/testsuite/tests/simplCore/should_run/T18012.hs
+++ b/testsuite/tests/simplCore/should_run/T18012.hs
@@ -32,10 +32,10 @@ notRule x = x
{-# INLINE [0] notRule #-}
{-# RULES "notRule/False" [~0] notRule False = True #-}
-f :: T -> () -> Bool
-f (D a) () = notRule a
+f :: () -> T -> Bool
+f () (D a) = notRule a
{-# INLINE [100] f #-} -- so it isn’t inlined before FloatOut
g :: () -> Bool
-g x = f (D False) x
+g x = f x (D False)
{-# NOINLINE g #-}
diff --git a/testsuite/tests/simplCore/should_run/T19569a.hs b/testsuite/tests/simplCore/should_run/T19569a.hs
index bffef2c6df..a732e1f81f 100644
--- a/testsuite/tests/simplCore/should_run/T19569a.hs
+++ b/testsuite/tests/simplCore/should_run/T19569a.hs
@@ -3,6 +3,11 @@
-- so I added it to testsuite to catch such regressions in the future.
-- It might be acceptable for this test to fail if you make changes to the simplifier. But generally such a failure shouldn't be accepted without good reason.
+--
+-- For example, one of the numerical instabilities was/is caused by a rewrite rule
+-- in GHC.Real which rewrites powers with small exponents. See !8082: changes in the
+-- simplifier caused this rewrite rule to trigger (or not), which then produced different
+-- results.
-- The excessive whitespace is the result of running the original benchmark which was a .lhs file through unlit.
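
A hedged sketch of the mechanism that comment describes (the module and the function powi are illustrative stand-ins; the real rules live in GHC.Real and target (^) directly):

    module PowRuleSketch where

    -- Illustrative stand-in for the library power function.
    powi :: Double -> Int -> Double
    powi x n = x ^ n
    {-# NOINLINE powi #-}

    -- A rewrite rule that replaces a small literal exponent by explicit
    -- multiplication.  As the comment above notes, simplifier changes can
    -- make such a rule trigger (or not), and in a larger floating-point
    -- computation that is enough to change the exact results.
    {-# RULES
    "powi/square" forall x. powi x 2 = x * x
      #-}
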
diff --git a/testsuite/tests/simplCore/should_run/all.T b/testsuite/tests/simplCore/should_run/all.T
index 53bcde5169..509ae1ff57 100644
--- a/testsuite/tests/simplCore/should_run/all.T
+++ b/testsuite/tests/simplCore/should_run/all.T
@@ -97,7 +97,9 @@ test('NumConstantFolding16', normal, compile_and_run, [''])
test('NumConstantFolding32', normal, compile_and_run, [''])
test('NumConstantFolding', normal, compile_and_run, [''])
test('T19413', normal, compile_and_run, [''])
+
test('T19569a', [only_ways(['optasm']),extra_run_opts('True 1000000')], compile_and_run, ['-O2'])
+
test('T20203', normal, compile, ['-O -dsuppress-all -dsuppress-uniques -dno-typeable-binds -ddump-simpl'])
test('T19313', normal, compile_and_run, [''])
test('UnliftedArgRule', normal, compile_and_run, [''])
diff --git a/testsuite/tests/stranal/should_compile/EtaExpansion.hs b/testsuite/tests/stranal/should_compile/EtaExpansion.hs
new file mode 100644
index 0000000000..0558adac0b
--- /dev/null
+++ b/testsuite/tests/stranal/should_compile/EtaExpansion.hs
@@ -0,0 +1,13 @@
+module Foo( wombat ) where
+
+-- We expect to eta-expand f to arity 2, but not to arity 3
+-- See Note [Bottoming bindings] in GHC.Core.Opt.Simplify
+f :: String -> Int -> Int -> Int
+{-# NOINLINE f #-}
+f s = error s
+
+g :: (Int -> Int -> Int) -> Maybe Int
+{-# NOINLINE g #-}
+g h = let h1 = h 2 in Just (h1 2 + h1 3)
+
+wombat s = g (f s)
diff --git a/testsuite/tests/stranal/should_compile/T18894b.hs b/testsuite/tests/stranal/should_compile/T18894b.hs
index e90f34e3fd..99a4bf954d 100644
--- a/testsuite/tests/stranal/should_compile/T18894b.hs
+++ b/testsuite/tests/stranal/should_compile/T18894b.hs
@@ -17,4 +17,14 @@ f :: Int -> Int
f 1 = 0
f m
| odd m = eta m 2
- | otherwise = eta 2 m
+ | otherwise = eta m m
+
+{-
+An earlier version of this test had (eta 2 m) in the otherwise case.
+But then (eta 2) could be floated out; and indeed if 'f' is applied
+many times, then sharing (eta 2) might be good. And if we inlined
+eta, we certainly would share (expensive 2).
+
+So I made the test a more robust check of what we actually want here,
+by changing it to (eta m m).
+-}
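
A hedged sketch of the float-out concern described in that comment (eta and expensive only mirror the test's definitions in spirit, and the lvl binding shows what floating (eta 2) would roughly give, not actual compiler output):

    module FloatOutSketch where

    expensive :: Int -> Int
    expensive n = n * n            -- stands in for some costly work
    {-# NOINLINE expensive #-}

    eta :: Int -> Int -> Int
    eta x y = expensive x + y

    -- With a constant first argument the partial application can be
    -- floated to the top level:
    lvl :: Int -> Int
    lvl = eta 2

    f :: Int -> Int
    f m = lvl m + lvl (m + 1)      -- and if eta inlines into lvl, the work
                                   -- in (expensive 2) is shared across calls

With (eta m m) there is no constant partial application to float, so the test exercises the eta-expansion behaviour it is meant to check.
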
diff --git a/testsuite/tests/stranal/should_compile/all.T b/testsuite/tests/stranal/should_compile/all.T
index e9ae6e11ba..02428987fc 100644
--- a/testsuite/tests/stranal/should_compile/all.T
+++ b/testsuite/tests/stranal/should_compile/all.T
@@ -84,3 +84,4 @@ test('T21150', [ grep_errmsg(r'( t\d? :: Int)') ], compile, ['-dsuppress-uniques
# T21128: Check that y is not reboxed in $wtheresCrud
test('T21128', [ grep_errmsg(r'let { y = I\#') ], multimod_compile, ['T21128', '-v0 -dsuppress-uniques -dsuppress-all -ddump-simpl'])
test('T21265', normal, compile, [''])
+test('EtaExpansion', normal, compile, [''])
diff --git a/testsuite/tests/stranal/sigs/BottomFromInnerLambda.stderr b/testsuite/tests/stranal/sigs/BottomFromInnerLambda.stderr
index 1c944f8520..8784af67b7 100644
--- a/testsuite/tests/stranal/sigs/BottomFromInnerLambda.stderr
+++ b/testsuite/tests/stranal/sigs/BottomFromInnerLambda.stderr
@@ -1,18 +1,18 @@
==================== Strictness signatures ====================
BottomFromInnerLambda.expensive: <1!P(SL)>
-BottomFromInnerLambda.f: <1!P(SL)>
+BottomFromInnerLambda.f: <1!S><1!S>b
==================== Cpr signatures ====================
BottomFromInnerLambda.expensive: 1
-BottomFromInnerLambda.f:
+BottomFromInnerLambda.f: b
==================== Strictness signatures ====================
BottomFromInnerLambda.expensive: <1!P(1L)>
-BottomFromInnerLambda.f: <1!P(1L)>
+BottomFromInnerLambda.f: <1!P(1!S)><1!S>b
diff --git a/testsuite/tests/stranal/sigs/T20746.stderr b/testsuite/tests/stranal/sigs/T20746.stderr
index 5be614867a..2b54d3b8ff 100644
--- a/testsuite/tests/stranal/sigs/T20746.stderr
+++ b/testsuite/tests/stranal/sigs/T20746.stderr
@@ -1,6 +1,6 @@
==================== Strictness signatures ====================
-Foo.f: <MP(A,1C1(L),A)><L>
+Foo.f: <LP(A,SCS(L),A)><L>
Foo.foogle: <L><L>
diff --git a/utils/genprimopcode/Main.hs b/utils/genprimopcode/Main.hs
index bf1ed76f9e..6984f4a296 100644
--- a/utils/genprimopcode/Main.hs
+++ b/utils/genprimopcode/Main.hs
@@ -417,18 +417,41 @@ asInfix :: String -> String
asInfix nm | isAlpha (head nm) = "`" ++ nm ++ "`"
| otherwise = nm
+
+{- Note [OPTIONS_GHC in GHC.PrimopWrappers]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+In PrimopWrappers we set some crucial GHC options
+
+* Eta reduction: -fno-do-eta-reduction
+ In PrimopWrappers we build a wrapper for each primop, thus
+ plusInt# = \a b. plusInt# a b
+ That's a pretty odd definition, because it looks recursive. What
+ actually happens is that it makes a curried, top-level binding for
+ `plusInt#`. When we compile PrimopWrappers, the code generator spots
+ (plusInt# a b) and generates an add instruction.
+
+ It's very important that we don't eta-reduce this to
+ plusInt# = plusInt#
+ because then the special rule in the code generator doesn't fire.
+
+* Worker-wrapper: performing WW on this module is even harmful, for two reasons:
+ 1. Inferred strictness signatures are all bottom (because of the apparent
+ recursion), which is a lie
+ 2. Doing the worker/wrapper split based on that information will
+ introduce references to absentError, which isn't available at
+ this point.
+
+ We prevent strictness analysis and w/w by simply doing -O0. It's
+ a very simple module and there is no optimisation to be done.
+-}
+
gen_wrappers :: Info -> String
gen_wrappers (Info _ entries)
= "{-# LANGUAGE MagicHash, NoImplicitPrelude, UnboxedTuples #-}\n"
-- Dependencies on Prelude must be explicit in libraries/base, but we
-- don't need the Prelude here so we add NoImplicitPrelude.
- ++ "{-# OPTIONS_GHC -Wno-deprecations -O0 #-}\n"
- -- No point in optimising this at all.
- -- Performing WW on this module is harmful even, two reasons:
- -- 1. Inferred strictness signatures are all bottom, which is a lie
- -- 2. Doing the worker/wrapper split based on that information will
- -- introduce references to absentError,
- -- which isn't available at this point.
+ ++ "{-# OPTIONS_GHC -Wno-deprecations -O0 -fno-do-eta-reduction #-}\n"
+ -- Very important OPTIONS_GHC! See Note [OPTIONS_GHC in GHC.PrimopWrappers]
++ "module GHC.PrimopWrappers where\n"
++ "import qualified GHC.Prim\n"
++ "import GHC.Tuple ()\n"