diff options
Diffstat (limited to 'compiler/stranal')
-rw-r--r-- | compiler/stranal/DmdAnal.hs | 8 | ||||
-rw-r--r-- | compiler/stranal/WorkWrap.hs | 153 | ||||
-rw-r--r-- | compiler/stranal/WwLib.hs | 115 |
3 files changed, 177 insertions, 99 deletions
diff --git a/compiler/stranal/DmdAnal.hs b/compiler/stranal/DmdAnal.hs index 49912413e4..b606804079 100644 --- a/compiler/stranal/DmdAnal.hs +++ b/compiler/stranal/DmdAnal.hs @@ -13,6 +13,8 @@ module DmdAnal ( dmdAnalProgram ) where #include "HsVersions.h" +import GhcPrelude + import DynFlags import WwLib ( findTypeShape, deepSplitProductType_maybe ) import Demand -- All of it @@ -399,7 +401,7 @@ situation actually arises in GHC.IO.Handle.Internals.wantReadableHandle So if the scrutinee is a primop call, we *don't* apply the state hack: - - If is a simple, terminating one like getMaskingState, + - If it is a simple, terminating one like getMaskingState, applying the hack is over-conservative. - If the primop is raise# then it returns bottom, so the case alternatives are already discarded. @@ -642,7 +644,7 @@ dmdAnalRhsLetDown top_lvl rec_flag env let_dmd id rhs Nothing | (bndrs, body) <- collectBinders rhs -> (bndrs, body, mkBodyDmd env body) - env_body = foldl extendSigsWithLam env bndrs + env_body = foldl' extendSigsWithLam env bndrs (body_ty, body') = dmdAnal env_body body_dmd body body_ty' = removeDmdTyArgs body_ty -- zap possible deep CPR info (DmdType rhs_fv rhs_dmds rhs_res, bndrs') @@ -1191,7 +1193,7 @@ extendSigsWithLam env id extendEnvForProdAlt :: AnalEnv -> CoreExpr -> Id -> DataCon -> [Var] -> AnalEnv -- See Note [CPR in a product case alternative] extendEnvForProdAlt env scrut case_bndr dc bndrs - = foldl do_con_arg env1 ids_w_strs + = foldl' do_con_arg env1 ids_w_strs where env1 = extendAnalEnv NotTopLevel env case_bndr case_bndr_sig diff --git a/compiler/stranal/WorkWrap.hs b/compiler/stranal/WorkWrap.hs index 9d741f5f4c..34cfd64ecd 100644 --- a/compiler/stranal/WorkWrap.hs +++ b/compiler/stranal/WorkWrap.hs @@ -7,6 +7,8 @@ {-# LANGUAGE CPP #-} module WorkWrap ( wwTopBinds ) where +import GhcPrelude + import CoreSyn import CoreUnfold ( certainlyWillInline, mkWwInlineRule, mkWorkerUnfolding ) import CoreUtils ( exprType, exprIsHNF ) @@ -180,7 +182,7 @@ If we have where f is strict in y, we might get a more efficient loop by w/w'ing f. But that would make a new unfolding which would overwrite the old -one! So the function would no longer be ININABLE, and in particular +one! So the function would no longer be INLNABLE, and in particular will not be specialised at call sites in other modules. This comes in practice (Trac #6056). @@ -230,7 +232,7 @@ has no wrapper, the worker for g will rebox p. So we get g x y p = case p of (I# p#) -> $wg x y p# -Now, in this case the reboxing will float into the True branch, an so +Now, in this case the reboxing will float into the True branch, and so the allocation will only happen on the error path. But it won't float inwards if there are multiple branches that call (f p), so the reboxing will happen on every call of g. Disaster. @@ -240,8 +242,8 @@ NOINLINE pragma to the worker. (See Trac #13143 for a real-world example.) -Note [Activation for workers] -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Note [Worker activation] +~~~~~~~~~~~~~~~~~~~~~~~~ Follows on from Note [Worker-wrapper for INLINABLE functions] It is *vital* that if the worker gets an INLINABLE pragma (from the @@ -258,7 +260,9 @@ original activation. Consider f y = let z = expensive y in ... -If expensive's worker inherits the wrapper's activation, we'll get +If expensive's worker inherits the wrapper's activation, +we'll get this (because of the compromise in point (2) of +Note [Wrapper activation]) {-# NOINLINE[0] $wexpensive #-} $wexpensive x = x + 1 @@ -344,36 +348,63 @@ call: Note [Wrapper activation] ~~~~~~~~~~~~~~~~~~~~~~~~~ -When should the wrapper inlining be active? It must not be active -earlier than the current Activation of the Id (eg it might have a -NOINLINE pragma). But in fact strictness analysis happens fairly -late in the pipeline, and we want to prioritise specialisations over -strictness. Eg if we have - module Foo where - f :: Num a => a -> Int -> a - f n 0 = n -- Strict in the Int, hence wrapper - f n x = f (n+n) (x-1) - - g :: Int -> Int - g x = f x x -- Provokes a specialisation for f - - module Bar where - import Foo - - h :: Int -> Int - h x = f 3 x - -Then we want the specialisation for 'f' to kick in before the wrapper does. - -Now in fact the 'gentle' simplification pass encourages this, by -having rules on, but inlinings off. But that's kind of lucky. It seems -more robust to give the wrapper an Activation of (ActiveAfter 0), -so that it becomes active in an importing module at the same time that -it appears in the first place in the defining module. - -At one stage I tried making the wrapper inlining always-active, and -that had a very bad effect on nofib/imaginary/x2n1; a wrapper was -inlined before the specialisation fired. +When should the wrapper inlining be active? + +1. It must not be active earlier than the current Activation of the + Id + +2. It should be active at some point, despite (1) because of + Note [Worker-wrapper for NOINLINE functions] + +3. For ordinary functions with no pragmas we want to inline the + wrapper as early as possible (Trac #15056). Suppose another module + defines f x = g x x + and suppose there is some RULE for (g True True). Then if we have + a call (f True), we'd expect to inline 'f' and the RULE will fire. + But if f is w/w'd (which it might be), we want the inlining to + occur just as if it hadn't been. + + (This only matters if f's RHS is big enough to w/w, but small + enough to inline given the call site, but that can happen.) + +4. We do not want to inline the wrapper before specialisation. + module Foo where + f :: Num a => a -> Int -> a + f n 0 = n -- Strict in the Int, hence wrapper + f n x = f (n+n) (x-1) + + g :: Int -> Int + g x = f x x -- Provokes a specialisation for f + + module Bar where + import Foo + + h :: Int -> Int + h x = f 3 x + + In module Bar we want to give specialisations a chance to fire + before inlining f's wrapper. + +Reminder: Note [Don't w/w INLINE things], so we don't need to worry + about INLINE things here. + +Conclusion: + - If the user said NOINLINE[n], respect that + - If the user said NOINLINE, inline the wrapper as late as + poss (phase 0). This is a compromise driven by (2) above + - Otherwise inline wrapper in phase 2. That allows the + 'gentle' simplification pass to apply specialisation rules + +Historical note: At one stage I tried making the wrapper inlining +always-active, and that had a very bad effect on nofib/imaginary/x2n1; +a wrapper was inlined before the specialisation fired. + +Note [Wrapper NoUserInline] +~~~~~~~~~~~~~~~~~~~~~~~~~~~ +The use an inl_inline of NoUserInline on the wrapper distinguishes +this pragma from one that was given by the user. In particular, CSE +will not happen if there is a user-specified pragma, but should happen +for w/w’ed things (#14186). -} tryWW :: DynFlags @@ -463,29 +494,29 @@ splitFun :: DynFlags -> FamInstEnvs -> Id -> IdInfo -> [Demand] -> DmdResult -> splitFun dflags fam_envs fn_id fn_info wrap_dmds res_info rhs = WARN( not (wrap_dmds `lengthIs` arity), ppr fn_id <+> (ppr arity $$ ppr wrap_dmds $$ ppr res_info) ) do -- The arity should match the signature - stuff <- mkWwBodies dflags fam_envs rhs_fvs mb_join_arity fun_ty - wrap_dmds use_res_info + stuff <- mkWwBodies dflags fam_envs rhs_fvs fn_id wrap_dmds use_res_info case stuff of Just (work_demands, join_arity, wrap_fn, work_fn) -> do work_uniq <- getUniqueM let work_rhs = work_fn rhs - work_inline = inl_inline inl_prag - work_act = case work_inline of - -- See Note [Activation for workers] - NoInline -> inl_act inl_prag - _ -> wrap_act + work_act = case fn_inline_spec of -- See Note [Worker activation] + NoInline -> fn_act + _ -> wrap_act + work_prag = InlinePragma { inl_src = SourceText "{-# INLINE" - , inl_inline = work_inline + , inl_inline = fn_inline_spec , inl_sat = Nothing , inl_act = work_act , inl_rule = FunLike } - -- idl_inline: copy from fn_id; see Note [Worker-wrapper for INLINABLE functions] - -- idl_act: see Note [Activation for workers] - -- inl_rule: it does not make sense for workers to be constructorlike. + -- inl_inline: copy from fn_id; see Note [Worker-wrapper for INLINABLE functions] + -- inl_act: see Note [Worker activation] + -- inl_rule: it does not make sense for workers to be constructorlike. + work_join_arity | isJoinId fn_id = Just join_arity | otherwise = Nothing -- worker is join point iff wrapper is join point -- (see Note [Don't CPR join points]) + work_id = mkWorkerId work_uniq fn_id (exprType work_rhs) `setIdOccInfo` occInfo fn_info -- Copy over occurrence info from parent @@ -495,7 +526,7 @@ splitFun dflags fam_envs fn_id fn_info wrap_dmds res_info rhs `setInlinePragma` work_prag - `setIdUnfolding` mkWorkerUnfolding dflags work_fn (unfoldingInfo fn_info) + `setIdUnfolding` mkWorkerUnfolding dflags work_fn fn_unfolding -- See Note [Worker-wrapper for INLINABLE functions] `setIdStrictness` mkClosedStrictSig work_demands work_res_info @@ -517,18 +548,21 @@ splitFun dflags fam_envs fn_id fn_info wrap_dmds res_info rhs worker_demand | single_call = mkWorkerDemand work_arity | otherwise = topDmd - - wrap_act = ActiveAfter NoSourceText 0 wrap_rhs = wrap_fn work_id - wrap_prag = InlinePragma { inl_src = SourceText "{-# INLINE" - , inl_inline = Inline + wrap_act = case fn_act of -- See Note [Wrapper activation] + ActiveAfter {} -> fn_act + NeverActive -> activeDuringFinal + _ -> activeAfterInitial + wrap_prag = InlinePragma { inl_src = SourceText "{-# INLINE" + , inl_inline = NoUserInline , inl_sat = Nothing , inl_act = wrap_act , inl_rule = rule_match_info } - -- See Note [Wrapper activation] - -- The RuleMatchInfo is (and must be) unaffected + -- inl_act: see Note [Wrapper activation] + -- inl_inline: see Note [Wrapper NoUserInline] + -- inl_rule: RuleMatchInfo is (and must be) unaffected - wrap_id = fn_id `setIdUnfolding` mkWwInlineRule wrap_rhs arity + wrap_id = fn_id `setIdUnfolding` mkWwInlineRule dflags wrap_rhs arity `setInlinePragma` wrap_prag `setIdOccInfo` noOccInfo -- Zap any loop-breaker-ness, to avoid bleating from Lint @@ -541,11 +575,12 @@ splitFun dflags fam_envs fn_id fn_info wrap_dmds res_info rhs Nothing -> return [(fn_id, rhs)] where - mb_join_arity = isJoinId_maybe fn_id rhs_fvs = exprFreeVars rhs - fun_ty = idType fn_id - inl_prag = inlinePragInfo fn_info - rule_match_info = inlinePragmaRuleMatchInfo inl_prag + fn_inl_prag = inlinePragInfo fn_info + fn_inline_spec = inl_inline fn_inl_prag + fn_act = inl_act fn_inl_prag + rule_match_info = inlinePragmaRuleMatchInfo fn_inl_prag + fn_unfolding = unfoldingInfo fn_info arity = arityInfo fn_info -- The arity is set by the simplifier using exprEtaExpandArity -- So it may be more than the number of top-level-visible lambdas @@ -654,7 +689,7 @@ then the splitting will go deeper too. splitThunk :: DynFlags -> FamInstEnvs -> RecFlag -> Var -> Expr Var -> UniqSM [(Var, Expr Var)] splitThunk dflags fam_envs is_rec fn_id rhs = ASSERT(not (isJoinId fn_id)) - do { (useful,_, wrap_fn, work_fn) <- mkWWstr dflags fam_envs [fn_id] + do { (useful,_, wrap_fn, work_fn) <- mkWWstr dflags fam_envs False [fn_id] ; let res = [ (fn_id, Let (NonRec fn_id rhs) (wrap_fn (work_fn (Var fn_id)))) ] ; if useful then ASSERT2( isNonRec is_rec, ppr fn_id ) -- The thunk must be non-recursive return res diff --git a/compiler/stranal/WwLib.hs b/compiler/stranal/WwLib.hs index f83aafe7b0..040a6d7da9 100644 --- a/compiler/stranal/WwLib.hs +++ b/compiler/stranal/WwLib.hs @@ -13,17 +13,20 @@ module WwLib ( mkWwBodies, mkWWstr, mkWorkerArgs #include "HsVersions.h" +import GhcPrelude + import CoreSyn import CoreUtils ( exprType, mkCast ) import Id import IdInfo ( JoinArity, vanillaIdInfo ) import DataCon import Demand -import MkCore ( mkRuntimeErrorApp, aBSENT_ERROR_ID, mkCoreUbxTup +import MkCore ( mkAbsentErrorApp, mkCoreUbxTup , mkCoreApp, mkCoreLet ) import MkId ( voidArgId, voidPrimId ) -import TysPrim ( voidPrimTy ) import TysWiredIn ( tupleDataCon ) +import TysPrim ( voidPrimTy ) +import Literal ( absentLiteralOf ) import VarEnv ( mkInScopeSet ) import VarSet ( VarSet ) import Type @@ -31,7 +34,6 @@ import RepType ( isVoidTy ) import Coercion import FamInstEnv import BasicTypes ( Boxity(..) ) -import Literal ( absentLiteralOf ) import TyCon import UniqSupply import Unique @@ -121,8 +123,7 @@ mkWwBodies :: DynFlags -> FamInstEnvs -> VarSet -- Free vars of RHS -- See Note [Freshen WW arguments] - -> Maybe JoinArity -- Just ar <=> is join point with join arity ar - -> Type -- Type of original function + -> Id -- The original function -> [Demand] -- Strictness of original function -> DmdResult -- Info about function result -> UniqSM (Maybe WwResult) @@ -138,12 +139,14 @@ mkWwBodies :: DynFlags -- let x = (a,b) in -- E -mkWwBodies dflags fam_envs rhs_fvs mb_join_arity fun_ty demands res_info +mkWwBodies dflags fam_envs rhs_fvs fun_id demands res_info = do { let empty_subst = mkEmptyTCvSubst (mkInScopeSet rhs_fvs) -- See Note [Freshen WW arguments] - ; (wrap_args, wrap_fn_args, work_fn_args, res_ty) <- mkWWargs empty_subst fun_ty demands - ; (useful1, work_args, wrap_fn_str, work_fn_str) <- mkWWstr dflags fam_envs wrap_args + ; (wrap_args, wrap_fn_args, work_fn_args, res_ty) + <- mkWWargs empty_subst fun_ty demands + ; (useful1, work_args, wrap_fn_str, work_fn_str) + <- mkWWstr dflags fam_envs has_inlineable_prag wrap_args -- Do CPR w/w. See Note [Always do CPR w/w] ; (useful2, wrap_fn_cpr, work_fn_cpr, cpr_res_ty) @@ -156,7 +159,7 @@ mkWwBodies dflags fam_envs rhs_fvs mb_join_arity fun_ty demands res_info ; if isWorkerSmallEnough dflags work_args && not (too_many_args_for_join_point wrap_args) - && (useful1 && not only_one_void_argument || useful2) + && ((useful1 && not only_one_void_argument) || useful2) then return (Just (worker_args_dmds, length work_call_args, wrapper_body, worker_body)) else return Nothing @@ -169,6 +172,11 @@ mkWwBodies dflags fam_envs rhs_fvs mb_join_arity fun_ty demands res_info -- f's RHS is now trivial (size 1) we still want the __inline__ to prevent -- fw from being inlined into f's RHS where + fun_ty = idType fun_id + mb_join_arity = isJoinId_maybe fun_id + has_inlineable_prag = isStableUnfolding (realIdUnfolding fun_id) + -- See Note [Do not unpack class dictionaries] + -- Note [Do not split void functions] only_one_void_argument | [d] <- demands @@ -261,11 +269,21 @@ mkWorkerArgs dflags args res_ty | otherwise = (args ++ [voidArgId], args ++ [voidPrimId]) where + -- See "Making wrapper args" section above needsAValueLambda = - isUnliftedType res_ty + lifted + -- We may encounter a levity-polymorphic result, in which case we + -- conservatively assume that we have laziness that needs preservation. + -- See #15186. || not (gopt Opt_FunToThunk dflags) -- see Note [Protecting the last value argument] + -- Might the result be lifted? + lifted = + case isLiftedType_maybe res_ty of + Just lifted -> lifted + Nothing -> True + {- Note [Protecting the last value argument] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -488,6 +506,8 @@ To avoid this: mkWWstr :: DynFlags -> FamInstEnvs + -> Bool -- True <=> INLINEABLE pragama on this function defn + -- See Note [Do not unpack class dictionaries] -> [Var] -- Wrapper args; have their demand info on them -- *Includes type variables* -> UniqSM (Bool, -- Is this useful @@ -499,13 +519,18 @@ mkWWstr :: DynFlags CoreExpr -> CoreExpr) -- Worker body, lacking the original body of the function, -- and lacking its lambdas. -- This fn does the reboxing -mkWWstr _ _ [] - = return (False, [], nop_fn, nop_fn) +mkWWstr dflags fam_envs has_inlineable_prag args + = go args + where + go_one arg = mkWWstr_one dflags fam_envs has_inlineable_prag arg -mkWWstr dflags fam_envs (arg : args) = do - (useful1, args1, wrap_fn1, work_fn1) <- mkWWstr_one dflags fam_envs arg - (useful2, args2, wrap_fn2, work_fn2) <- mkWWstr dflags fam_envs args - return (useful1 || useful2, args1 ++ args2, wrap_fn1 . wrap_fn2, work_fn1 . work_fn2) + go [] = return (False, [], nop_fn, nop_fn) + go (arg : args) = do { (useful1, args1, wrap_fn1, work_fn1) <- go_one arg + ; (useful2, args2, wrap_fn2, work_fn2) <- go args + ; return ( useful1 || useful2 + , args1 ++ args2 + , wrap_fn1 . wrap_fn2 + , work_fn1 . work_fn2) } {- Note [Unpacking arguments with product and polymorphic demands] @@ -542,9 +567,12 @@ as-yet-un-filled-in pkgState files. -- brings into scope work_args (via cases) -- * work_fn assumes work_args are in scope, a -- brings into scope wrap_arg (via lets) -mkWWstr_one :: DynFlags -> FamInstEnvs -> Var - -> UniqSM (Bool, [Var], CoreExpr -> CoreExpr, CoreExpr -> CoreExpr) -mkWWstr_one dflags fam_envs arg +mkWWstr_one :: DynFlags -> FamInstEnvs + -> Bool -- True <=> INLINEABLE pragama on this function defn + -- See Note [Do not unpack class dictionaries] + -> Var + -> UniqSM (Bool, [Var], CoreExpr -> CoreExpr, CoreExpr -> CoreExpr) +mkWWstr_one dflags fam_envs has_inlineable_prag arg | isTyVar arg = return (False, [arg], nop_fn, nop_fn) @@ -579,8 +607,10 @@ mkWWstr_one dflags fam_envs arg | isStrictDmd dmd , Just cs <- splitProdDmd_maybe dmd -- See Note [Unpacking arguments with product and polymorphic demands] + , not (has_inlineable_prag && isClassPred arg_ty) + -- See Note [Do not unpack class dictionaries] , Just (data_con, inst_tys, inst_con_arg_tys, co) - <- deepSplitProductType_maybe fam_envs (idType arg) + <- deepSplitProductType_maybe fam_envs arg_ty , cs `equalLength` inst_con_arg_tys -- See Note [mkWWstr and unsafeCoerce] = do { (uniq1:uniqs) <- getUniquesM @@ -592,7 +622,7 @@ mkWWstr_one dflags fam_envs arg -- in Simplify.hs; and see Trac #13890 rebox_fn = Let (NonRec arg_no_unf con_app) con_app = mkConApp2 data_con inst_tys unpk_args `mkCast` mkSymCo co - ; (_, worker_args, wrap_fn, work_fn) <- mkWWstr dflags fam_envs unpk_args + ; (_, worker_args, wrap_fn, work_fn) <- mkWWstr dflags fam_envs False unpk_args ; return (True, worker_args, unbox_fn . wrap_fn, work_fn . rebox_fn) } -- Don't pass the arg, rebox instead @@ -600,7 +630,8 @@ mkWWstr_one dflags fam_envs arg = return (False, [arg], nop_fn, nop_fn) where - dmd = idDemandInfo arg + arg_ty = idType arg + dmd = idDemandInfo arg mk_ww_arg uniq ty sub_dmd = setIdDemandInfo (mk_ww_local uniq ty) sub_dmd ---------------------- @@ -678,10 +709,12 @@ BUT if f is strict in the Ord dictionary, we might unpack it, to get and the type-class specialiser can't specialise that. An example is Trac #6056. -Moreover, dictionaries can have a lot of fields, so unpacking them can -increase closure sizes. +But in any other situation a dictionary is just an ordinary value, +and can be unpacked. So we track the INLINABLE pragma, and switch +off the unpacking in mkWWstr_one (see the isClassPred test). -Conclusion: don't unpack dictionaries. +Historical note: Trac #14955 describes how I got this fix wrong +the first time. -} deepSplitProductType_maybe @@ -697,7 +730,6 @@ deepSplitProductType_maybe fam_envs ty `orElse` (mkRepReflCo ty, ty) , Just (tc, tc_args) <- splitTyConApp_maybe ty1 , Just con <- isDataProductTyCon_maybe tc - , not (isClassTyCon tc) -- See Note [Do not unpack class dictionaries] , let arg_tys = dataConInstArgTys con tc_args strict_marks = dataConRepStrictness con = Just (con, tc_args, zipEqual "dspt" arg_tys strict_marks, co) @@ -893,15 +925,24 @@ example, Trac #4306. For these we find a suitable literal, using Literal.absentLiteralOf. We don't have literals for every primitive type, so the function is partial. - [I did try the experiment of using an error thunk for unlifted - things too, relying on the simplifier to drop it as dead code, - by making absentError - (a) *not* be a bottoming Id, - (b) be "ok for speculation" - But that relies on the simplifier finding that it really - is dead code, which is fragile, and indeed failed when - profiling is on, which disables various optimisations. So - using a literal will do.] +Note: I did try the experiment of using an error thunk for unlifted +things too, relying on the simplifier to drop it as dead code. +But this is fragile + + - It fails when profiling is on, which disables various optimisations + + - It fails when reboxing happens. E.g. + data T = MkT Int Int# + f p@(MkT a _) = ...g p.... + where g is /lazy/ in 'p', but only uses the first component. Then + 'f' is /strict/ in 'p', and only uses the first component. So we only + pass that component to the worker for 'f', which reconstructs 'p' to + pass it to 'g'. Alas we can't say + ...f (MkT a (absentError Int# "blah"))... + bacause `MkT` is strict in its Int# argument, so we get an absentError + exception when we shouldn't. Very annoying! + +So absentError is only used for lifted types. -} mk_absent_let :: DynFlags -> Id -> Maybe (CoreExpr -> CoreExpr) @@ -917,12 +958,12 @@ mk_absent_let dflags arg = WARN( True, text "No absent value for" <+> ppr arg_ty ) Nothing where - arg_ty = idType arg - abs_rhs = mkRuntimeErrorApp aBSENT_ERROR_ID arg_ty msg lifted_arg = arg `setIdStrictness` exnSig -- Note in strictness signature that this is bottoming -- (for the sake of the "empty case scrutinee not known to -- diverge for sure lint" warning) + arg_ty = idType arg + abs_rhs = mkAbsentErrorApp arg_ty msg msg = showSDoc (gopt_set dflags Opt_SuppressUniques) (ppr arg <+> ppr (idType arg)) -- We need to suppress uniques here because otherwise they'd |