summaryrefslogtreecommitdiff
path: root/compiler/simplCore/SetLevels.lhs
Commit message (Collapse)AuthorAgeFilesLines
* compiler: de-lhs simplCore/Austin Seipp2014-12-031-1121/+0
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* Add comments explaining ProbOneShotSimon Peyton Jones2014-11-041-0/+1
|
* Fix comment typosJan Stolarek2014-10-311-1/+1
|
* Comments onlySimon Peyton Jones2014-08-281-0/+3
|
* Don't float out (classop dict e1 e2)Simon Peyton Jones2014-08-281-9/+20
| | | | | | | | | | | A class op applied to a dictionary doesn't do much work, so it's not a great idea to float it out (except possibly to the top level. See Note [Floating over-saturated applications] in SetLevels I also renamed "floatOutPartialApplications" to "floatOutOverSatApps"; the former is deeply confusing, since there is no partial application involved -- quite the reverse, it is *over* saturated.
* simplCore: detabify/dewhitespace SetLevelsAustin Seipp2014-08-201-272/+264
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* Add LANGUAGE pragmas to compiler/ source filesHerbert Valerio Riedel2014-05-151-1/+2
| | | | | | | | | | | | | | | | | | In some cases, the layout of the LANGUAGE/OPTIONS_GHC lines has been reorganized, while following the convention, to - place `{-# LANGUAGE #-}` pragmas at the top of the source file, before any `{-# OPTIONS_GHC #-}`-lines. - Moreover, if the list of language extensions fit into a single `{-# LANGUAGE ... -#}`-line (shorter than 80 characters), keep it on one line. Otherwise split into `{-# LANGUAGE ... -#}`-lines for each individual language extension. In both cases, try to keep the enumeration alphabetically ordered. (The latter layout is preferable as it's more diff-friendly) While at it, this also replaces obsolete `{-# OPTIONS ... #-}` pragma occurences by `{-# OPTIONS_GHC ... #-}` pragmas.
* Fix last-minute typo in SetLevels commit ef44a4Simon Peyton Jones2014-03-111-1/+2
| | | | Sorry about that...
* Make SetLevels do substitution properly (fixes Trac #8714)Simon Peyton Jones2014-03-111-332/+292
| | | | | | | | | | | | | | | | | | | | | | Nowadays SetLevels floats case expressions as well as let-bindings, and case expressions bind type variables. We need to clone all such floated binders, to avoid accidental name capture. But I'd forgotten to substitute for the cloned type variables, causing #8714. (In the olden days only Ids were cloned, from let-bindings.) This patch fixes the bug and does quite a bit of clean-up refactoring as well, by putting the context level in the LvlEnv. There is no effect on performance, except that nofib 'rewrite' improves allocations by 3%. On investigation I think it was a fluke to do with loop-cutting in big letrec nests. But at least it's a fluke in the right direction. Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- Min -0.4% -3.0% -19.4% -19.4% -26.7% Max -0.0% +0.0% +17.9% +17.9% 0.0% Geometric Mean -0.1% -0.0% -0.7% -0.7% -0.4%
* Improve the handling of used-once stuffSimon Peyton Jones2013-12-121-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Joachim and I are committing this onto a branch so that we can share it, but we expect to do a bit more work before merging it onto head. Nofib staus: - Most programs, no change - A few improve - A couple get worse (cacheprof, tak, rfib) Investigating the "get worse" set is what's holding up putting this on head. The major issue is this. Consider map (f g) ys where f's demand signature looks like f :: <L,C1(C1(U))> -> <L,U> -> . So 'f' is not saturated. What demand do we place on g? Answer C(C1(U)) That is, the inner C1 should stay, even though f is not saturated. I found that this made a significant difference in the demand signatures inferred in GHC.IO, which uses lots of higher-order exception handlers. I also had to add used-once demand signatures for some of the 'catch' primops, so that we know their handlers are only called once.
* Globally replace "hackage.haskell.org" with "ghc.haskell.org"Simon Marlow2013-10-011-1/+1
|
* Comments onlySimon Peyton Jones2013-01-221-3/+4
|
* Major patch to implement the new Demand AnalyserSimon Peyton Jones2013-01-171-9/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is the result of Ilya Sergey's internship at MSR. It constitutes a thorough overhaul and simplification of the demand analyser. It makes a solid foundation on which we can now build. Main changes are * Instead of having one combined type for Demand, a Demand is now a pair (JointDmd) of - a StrDmd and - an AbsDmd. This allows strictness and absence to be though about quite orthogonally, and greatly reduces brain melt-down. * Similarly in the DmdResult type, it's a pair of - a PureResult (indicating only divergence/non-divergence) - a CPRResult (which deals only with the CPR property * In IdInfo, the strictnessInfo field contains a StrictSig, not a Maybe StrictSig demandInfo field contains a Demand, not a Maybe Demand We don't need Nothing (to indicate no strictness/demand info) any more; topSig/topDmd will do. * Remove "boxity" analysis entirely. This was an attempt to avoid "reboxing", but it added complexity, is extremely ad-hoc, and makes very little difference in practice. * Remove the "unboxing strategy" computation. This was an an attempt to ensure that a worker didn't get zillions of arguments by unboxing big tuples. But in fact removing it DRAMATICALLY reduces allocation in an inner loop of the I/O library (where the threshold argument-count had been set just too low). It's exceptional to have a zillion arguments and I don't think it's worth the complexity, especially since it turned out to have a serious performance hit. * Remove quite a bit of ad-hoc cruft * Move worthSplittingFun, worthSplittingThunk from WorkWrap to Demand. This allows JointDmd to be fully abstract, examined only inside Demand. Everything else really follows from these changes. All of this is really just refactoring, so we don't expect big performance changes, but acutally the numbers look quite good. Here is a full nofib run with some highlights identified: Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- expert -2.6% -15.5% 0.00 0.00 +0.0% fluid -2.4% -7.1% 0.01 0.01 +0.0% gg -2.5% -28.9% 0.02 0.02 -33.3% integrate -2.6% +3.2% +2.6% +2.6% +0.0% mandel2 -2.6% +4.2% 0.01 0.01 +0.0% nucleic2 -2.0% -16.3% 0.11 0.11 +0.0% para -2.6% -20.0% -11.8% -11.7% +0.0% parser -2.5% -17.9% 0.05 0.05 +0.0% prolog -2.6% -13.0% 0.00 0.00 +0.0% puzzle -2.6% +2.2% +0.8% +0.8% +0.0% sorting -2.6% -35.9% 0.00 0.00 +0.0% treejoin -2.6% -52.2% -9.8% -9.9% +0.0% -------------------------------------------------------------------------------- Min -2.7% -52.2% -11.8% -11.7% -33.3% Max -1.8% +4.2% +10.5% +10.5% +7.7% Geometric Mean -2.5% -2.8% -0.4% -0.5% -0.4% Things to note * Binary sizes are smaller. I don't know why, but it's good. * Allocation is sometiemes a *lot* smaller. I believe that all the big numbers (I checked treejoin, gg, sorting) arise from one place, namely a function GHC.IO.Encoding.UTF8.utf8_decode, which is strict in two Buffers both of which have several arugments. Not w/w'ing both arguments (which is what we did before) has a big effect. So the big win in actually somewhat accidental, gained by removing the "unboxing strategy" code. * A couple of benchmarks allocate slightly more. This turns out to be due to reboxing (integrate). But the biggest increase is mandel2, and *that* turned out also to be a somewhat accidental loss of CSE, and pointed the way to doing better CSE: see Trac #7596. * Runtimes are never very reliable, but seem to improve very slightly. All in all, a good piece of work. Thank you Ilya!
* Comments onlySimon Peyton Jones2012-03-201-1/+2
|
* Move sortQuantVars to MkCoreSimon Peyton Jones2012-02-171-18/+15
|
* Comments onlySimon Peyton Jones2012-02-161-1/+4
|
* Comments onlySimon Peyton Jones2012-01-121-2/+2
|
* Move mkPiTypes back to Type, rename mkForAllArrowKinds to mkPiKindsJose Pedro Magalhaes2011-11-161-2/+2
|
* New kind-polymorphic coreJose Pedro Magalhaes2011-11-111-13/+9
| | | | | | | | | This big patch implements a kind-polymorphic core for GHC. The current implementation focuses on making sure that all kind-monomorphic programs still work in the new core; it is not yet guaranteed that kind-polymorphic programs (using the new -XPolyKinds flag) will work. For more information, see http://haskell.org/haskellwiki/GHC/Kinds
* Use -fwarn-tabs when validatingIan Lynagh2011-11-041-0/+7
| | | | | We only use it for "compiler" sources, i.e. not for libraries. Many modules have a -fno-warn-tabs kludge for now.
* Overhaul of infrastructure for profiling, coverage (HPC) and breakpointsSimon Marlow2011-11-021-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | User visible changes ==================== Profilng -------- Flags renamed (the old ones are still accepted for now): OLD NEW --------- ------------ -auto-all -fprof-auto -auto -fprof-exported -caf-all -fprof-cafs New flags: -fprof-auto Annotates all bindings (not just top-level ones) with SCCs -fprof-top Annotates just top-level bindings with SCCs -fprof-exported Annotates just exported bindings with SCCs -fprof-no-count-entries Do not maintain entry counts when profiling (can make profiled code go faster; useful with heap profiling where entry counts are not used) Cost-centre stacks have a new semantics, which should in most cases result in more useful and intuitive profiles. If you find this not to be the case, please let me know. This is the area where I have been experimenting most, and the current solution is probably not the final version, however it does address all the outstanding bugs and seems to be better than GHC 7.2. Stack traces ------------ +RTS -xc now gives more information. If the exception originates from a CAF (as is common, because GHC tends to lift exceptions out to the top-level), then the RTS walks up the stack and reports the stack in the enclosing update frame(s). Result: +RTS -xc is much more useful now - but you still have to compile for profiling to get it. I've played around a little with adding 'head []' to GHC itself, and +RTS -xc does pinpoint the problem quite accurately. I plan to add more facilities for stack tracing (e.g. in GHCi) in the future. Coverage (HPC) -------------- * derived instances are now coloured yellow if they weren't used * likewise record field names * entry counts are more accurate (hpc --fun-entry-count) * tab width is now correct (markup was previously off in source with tabs) Internal changes ================ In Core, the Note constructor has been replaced by Tick (Tickish b) (Expr b) which is used to represent all the kinds of source annotation we support: profiling SCCs, HPC ticks, and GHCi breakpoints. Depending on the properties of the Tickish, different transformations apply to Tick. See CoreUtils.mkTick for details. Tickets ======= This commit closes the following tickets, test cases to follow: - Close #2552: not a bug, but the behaviour is now more intuitive (test is T2552) - Close #680 (test is T680) - Close #1531 (test is result001) - Close #949 (test is T949) - Close #2466: test case has bitrotted (doesn't compile against current version of vector-space package)
* Recover proper sharing for Integer literalsSimon Peyton Jones2011-10-211-1/+13
| | | | | | | | | | | Trac #5549 showed a loss of performance for GHC 7.4. What was happening was that an integer literal was being allocated each time around a loop, rather than being floated to top level and shared. Two fixes * Make the float-out pass float literals that are non-trivial * Make the inliner *not* treat Integer literals as size-zero
* fix Note cross-refSimon Marlow2011-10-051-1/+1
|
* Make a new type synonym CoreProgram = [CoreBind]Simon Peyton Jones2011-09-231-1/+1
| | | | | | | | | | | and comment its invariants in Note [CoreProgram] in CoreSyn I'm not totally convinced that CoreProgram is the right name (perhaps CoreTopBinds might better), but it is useful to have a clue that you are looking at the top-level bindings. This is only a matter of a type synonym change; no deep refactoring here.
* Fix handing of CoVars in SetLevels: it wasn't renaming occurrences of ↵Max Bolingbroke2011-09-061-31/+30
| | | | | | | | case-bound coercion variabes Conflicts: compiler/simplCore/SetLevels.lhs
* Fix two bugs in caes-floating (fixes Trac #5453)Simon Peyton Jones2011-09-051-52/+105
| | | | | | | | | | The problem is documented in the ticket. The patch does two things 1. Make exprOkForSpeculation return False for a non-exhaustive case 2. In SetLevels.lvlExpr, look at the *result* scrutinee, not the *input* scrutinee, when testing for evaluated-ness
* Comments onlySimon Peyton Jones2011-08-231-2/+3
|
* A second bite at the case-floating patchSimon Peyton Jones2011-06-301-38/+56
| | | | | | | When floating a case outwards we must be careful to clone the binders, since their scope is widening. Plus lots of tidying up.
* Add case-floating to the float-out passSimon Peyton Jones2011-06-271-125/+177
| | | | | | | | | | | | | | | | | | There are two things in this patch. First, a new feature. Given (case x of I# y -> ...) where 'x' is known to be evaluated, the float-out pass will float the case outwards towards x's binding. Of course this doesn't happen if 'x' is evaluated because of an enclosing case (becuase then the inner case would be eliminated) but it *does* happen when x is bound by a constructor with a strict field. This happens in DPH. Trac #4081. The second change is a significant refactoring of the way the let-floater works. Now SetLevels makes a decision about whether the let (or case) will move, and records that decision in the FloatSpec flag. This change makes the whole caboodle much easier to think about.
* The final batch of changes for the new coercion representationSimon Peyton Jones2011-05-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | * Fix bugs in the packing and unpacking of data constructors with equality predicates in their types * Remove PredCo altogether; instead, coercions between predicated types (like (Eq a, [a]~b) => blah) are treated as if they were precisely their underlying representation type Eq a -> ((~) [a] b) -> blah in this case * Similarly, Type.coreView no longer treats equality predciates specially. * Implement the cast-of-coercion optimisation in Simplify.simplCoercionF Numerous other small bug-fixes and refactorings. Annoyingly, OptCoercion had Windows line endings, and this patch switches to Unix, so it looks as if every line has changed.
* This BIG PATCH contains most of the work for the New Coercion RepresentationSimon Peyton Jones2011-04-191-6/+9
| | | | | | | | | | | | | | See the paper "Practical aspects of evidence based compilation in System FC" * Coercion becomes a data type, distinct from Type * Coercions become value-level things, rather than type-level things, (although the value is zero bits wide, like the State token) A consequence is that a coerion abstraction increases the arity by 1 (just like a dictionary abstraction) * There is a new constructor in CoreExpr, namely Coercion, to inject coercions into terms
* Some infrastruture for lambda-liftingsimonpj@microsoft.com2010-11-161-54/+72
| | | | | | | | | | | This stuff should have no effect but it sets things up so that we can try floating out lambdas of n value arguments. The new (secret) flag is -ffloatt-lam-args=n. This is *not* working yet, but it's got tangled up with other stuff I want to commit, and it does no harm.
* Fix a long-standing bug the float-out pass simonpj@microsoft.com2010-10-261-18/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We were failing to float out a binding that could be floated, because of a confusion in the Lam case of floatExpr. In investigating this I also discoverd that there is really no point at all in giving a different level to variables in a binding group, so I've now given them all the same (in SetLevels.lvlLamBndrs The overall difference is quite minor in a nofib run: Program Size Allocs Runtime Elapsed ------------------------------------------------------------- Min +0.0% -8.5% -28.4% -28.7% Max +0.0% +0.7% -0.7% -1.1% Geometric Mean +0.0% -0.0% -11.6% -11.8% I don't trust those runtimes, but smaller is good! The 8.5% improvement in allocation in fulsom, and seems real. The 0.7% allocation increase only happens in programs with very small allocation. I tracked one down to a call of this form GHC.IO.Handle.Internals.mkDuplexHandle5 = \ args -> GHC.IO.Handle.Internals.openTextEncoding1 mb_codec ha_type (\mb_encoder mb_decoder -> blah) With the new floater the argument of openTextEncoding1 becomes (let lvl = .. in \mb_encoder mb_decoder -> blah) And rightly so. However in fact this argument is a continuation and hence is called once, so the floating is fruitless. Roll on one-shot-function analysis (which I know how to do but fail to get to!).
* Float out partial applicationsSimon Marlow2010-10-081-9/+40
| | | | | | | | | | This fixes at least one case of performance regression in 7.0, and is nice win on nofib: Program Size Allocs Runtime Elapsed Min +0.3% -63.0% -38.5% -38.7% Max +1.2% +0.2% +0.9% +0.9% Geometric Mean +0.6% -3.0% -6.4% -6.6%
* Implement INLINABLE pragma simonpj@microsoft.com2010-09-151-1/+1
| | | | Implements Trac #4299. Documentation to come.
* Super-monster patch implementing the new typechecker -- at lastsimonpj@microsoft.com2010-09-131-2/+2
| | | | | | | | | This major patch implements the new OutsideIn constraint solving algorithm in the typecheker, following our JFP paper "Modular type inference with local assumptions". Done with major help from Dimitrios Vytiniotis and Brent Yorgey.
* Fix egregious bug in SetLevels.notWorthFloatingsimonpj@microsoft.com2010-08-131-4/+4
| | | | | | This bug just led to stupid code, which would later be optimised away, but better not to generate stupid code in the first place.
* Move all the CoreToDo stuff into CoreMonadsimonpj@microsoft.com2009-12-181-2/+1
| | | | | | | | | | | | | | | | This patch moves a lot of code around, but has zero functionality change. The idea is that the types CoreToDo SimplifierSwitch SimplifierMode FloatOutSwitches and the main core-to-core pipeline construction belong in simplCore/, and *not* in DynFlags.
* Bottom extraction: float out bottoming expressions to top levelsimonpj@microsoft.com2009-12-111-32/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The idea is to float out bottoming expressions to top level, abstracting them over any variables they mention, if necessary. This is good because it makes functions smaller (and more likely to inline), by keeping error code out of line. See Note [Bottoming floats] in SetLevels. On the way, this fixes the HPC failures for cg059 and friends. I've been meaning to do this for some time. See Maessen's paper 1999 "Bottom extraction: factoring error handling out of functional programs" (unpublished I think). Here are the nofib results: Program Size Allocs Runtime Elapsed -------------------------------------------------------------------------------- Min +0.1% -7.8% -14.4% -32.5% Max +0.5% +0.2% +1.6% +13.8% Geometric Mean +0.4% -0.2% -4.9% -6.7% Module sizes -1 s.d. ----- -2.6% +1 s.d. ----- +2.3% Average ----- -0.2% Compile times: -1 s.d. ----- -11.4% +1 s.d. ----- +4.3% Average ----- -3.8% I'm think program sizes have crept up because the base library is bigger -- module sizes in nofib decrease very slightly. In turn I think that may be because the floating generates a call where there was no call before. Anyway I think it's acceptable. The main changes are: * SetLevels floats out things that exprBotStrictness_maybe identifies as bottom. Make sure to pin on the right strictness info to the newly created Ids, so that the info ends up in interface files. Since FloatOut is run twice, we have to be careful that we don't treat the function created by the first float-out as a candidate for the second; this is what worthFloating does. See SetLevels Note [Bottoming floats] Note [Bottoming floats: eta expansion] * Be careful not to inline top-level bottoming functions; this would just undo what the floating transformation achieves. See CoreUnfold Note [Do not inline top-level bottoming functions Ensuring this requires a bit of extra plumbing, but nothing drastic.. * Similarly pre/postInlineUnconditionally should be careful not to re-inline top-level bottoming things! See SimplUtils Note [Top-level botomming Ids] Note [Top level and postInlineUnconditionally]
* Minor refactoring to remove redundant codesimonpj@microsoft.com2009-12-071-7/+1
|
* Remove the (very) old strictness analysersimonpj@microsoft.com2009-11-191-2/+2
| | | | | | | I finally got tired of the #ifdef OLD_STRICTNESS stuff. I had been keeping it around in the hope of doing old-to-new comparisions, but have failed to do so for many years, so I don't think it's going to happen. This patch deletes the clutter.
* The Big INLINE Patch: totally reorganise way that INLINE pragmas worksimonpj@microsoft.com2009-10-291-58/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch has been a long time in gestation and has, as a result, accumulated some extra bits and bobs that are only loosely related. I separated the bits that are easy to split off, but the rest comes as one big patch, I'm afraid. Note that: * It comes together with a patch to the 'base' library * Interface file formats change slightly, so you need to recompile all libraries The patch is mainly giant tidy-up, driven in part by the particular stresses of the Data Parallel Haskell project. I don't expect a big performance win for random programs. Still, here are the nofib results, relative to the state of affairs without the patch Program Size Allocs Runtime Elapsed -------------------------------------------------------------------------------- Min -12.7% -14.5% -17.5% -17.8% Max +4.7% +10.9% +9.1% +8.4% Geometric Mean +0.9% -0.1% -5.6% -7.3% The +10.9% allocation outlier is rewrite, which happens to have a very delicate optimisation opportunity involving an interaction of CSE and inlining (see nofib/Simon-nofib-notes). The fact that the 'before' case found the optimisation is somewhat accidental. Runtimes seem to go down, but I never kno wwhether to really trust this number. Binary sizes wobble a bit, but nothing drastic. The Main Ideas are as follows. InlineRules ~~~~~~~~~~~ When you say {-# INLINE f #-} f x = <rhs> you intend that calls (f e) are replaced by <rhs>[e/x] So we should capture (\x.<rhs>) in the Unfolding of 'f', and never meddle with it. Meanwhile, we can optimise <rhs> to our heart's content, leaving the original unfolding intact in Unfolding of 'f'. So the representation of an Unfolding has changed quite a bit (see CoreSyn). An INLINE pragma gives rise to an InlineRule unfolding. Moreover, it's only used when 'f' is applied to the specified number of arguments; that is, the number of argument on the LHS of the '=' sign in the original source definition. For example, (.) is now defined in the libraries like this {-# INLINE (.) #-} (.) f g = \x -> f (g x) so that it'll inline when applied to two arguments. If 'x' appeared on the left, thus (.) f g x = f (g x) it'd only inline when applied to three arguments. This slightly-experimental change was requested by Roman, but it seems to make sense. Other associated changes * Moving the deck chairs in DsBinds, which processes the INLINE pragmas * In the old system an INLINE pragma made the RHS look like (Note InlineMe <rhs>) The Note switched off optimisation in <rhs>. But it was quite fragile in corner cases. The new system is more robust, I believe. In any case, the InlineMe note has disappeared * The workerInfo of an Id has also been combined into its Unfolding, so it's no longer a separate field of the IdInfo. * Many changes in CoreUnfold, esp in callSiteInline, which is the critical function that decides which function to inline. Lots of comments added! * exprIsConApp_maybe has moved to CoreUnfold, since it's so strongly associated with "does this expression unfold to a constructor application". It can now do some limited beta reduction too, which Roman found was an important. Instance declarations ~~~~~~~~~~~~~~~~~~~~~ It's always been tricky to get the dfuns generated from instance declarations to work out well. This is particularly important in the Data Parallel Haskell project, and I'm now on my fourth attempt, more or less. There is a detailed description in TcInstDcls, particularly in Note [How instance declarations are translated]. Roughly speaking we now generate a top-level helper function for every method definition in an instance declaration, so that the dfun takes a particularly stylised form: dfun a d1 d2 = MkD (op1 a d1 d2) (op2 a d1 d2) ...etc... In fact, it's *so* stylised that we never need to unfold a dfun. Instead ClassOps have a special rewrite rule that allows us to short-cut dictionary selection. Suppose dfun :: Ord a -> Ord [a] d :: Ord a Then compare (dfun a d) --> compare_list a d in one rewrite, without first inlining the 'compare' selector and the body of the dfun. To support this a) ClassOps have a BuiltInRule (see MkId.dictSelRule) b) DFuns have a special form of unfolding (CoreSyn.DFunUnfolding) which is exploited in CoreUnfold.exprIsConApp_maybe Implmenting all this required a root-and-branch rework of TcInstDcls and bits of TcClassDcl. Default methods ~~~~~~~~~~~~~~~ If you give an INLINE pragma to a default method, it should be just as if you'd written out that code in each instance declaration, including the INLINE pragma. I think that it now *is* so. As a result, library code can be simpler; less duplication. The CONLIKE pragma ~~~~~~~~~~~~~~~~~~ In the DPH project, Roman found cases where he had p n k = let x = replicate n k in ...(f x)...(g x).... {-# RULE f (replicate x) = f_rep x #-} Normally the RULE would not fire, because doing so involves (in effect) duplicating the redex (replicate n k). A new experimental modifier to the INLINE pragma, {-# INLINE CONLIKE replicate #-}, allows you to tell GHC to be prepared to duplicate a call of this function if it allows a RULE to fire. See Note [CONLIKE pragma] in BasicTypes Join points ~~~~~~~~~~~ See Note [Case binders and join points] in Simplify Other refactoring ~~~~~~~~~~~~~~~~~ * I moved endPass from CoreLint to CoreMonad, with associated jigglings * Better pretty-printing of Core * The top-level RULES (ones that are not rules for locally-defined things) are now substituted on every simplifier iteration. I'm not sure how we got away without doing this before. This entails a bit more plumbing in SimplCore. * The necessary stuff to serialise and deserialise the new info across interface files. * Something about bottoming floats in SetLevels Note [Bottoming floats] * substUnfolding has moved from SimplEnv to CoreSubs, where it belongs -------------------------------------------------------------------------------- Program Size Allocs Runtime Elapsed -------------------------------------------------------------------------------- anna +2.4% -0.5% 0.16 0.17 ansi +2.6% -0.1% 0.00 0.00 atom -3.8% -0.0% -1.0% -2.5% awards +3.0% +0.7% 0.00 0.00 banner +3.3% -0.0% 0.00 0.00 bernouilli +2.7% +0.0% -4.6% -6.9% boyer +2.6% +0.0% 0.06 0.07 boyer2 +4.4% +0.2% 0.01 0.01 bspt +3.2% +9.6% 0.02 0.02 cacheprof +1.4% -1.0% -12.2% -13.6% calendar +2.7% -1.7% 0.00 0.00 cichelli +3.7% -0.0% 0.13 0.14 circsim +3.3% +0.0% -2.3% -9.9% clausify +2.7% +0.0% 0.05 0.06 comp_lab_zift +2.6% -0.3% -7.2% -7.9% compress +3.3% +0.0% -8.5% -9.6% compress2 +3.6% +0.0% -15.1% -17.8% constraints +2.7% -0.6% -10.0% -10.7% cryptarithm1 +4.5% +0.0% -4.7% -5.7% cryptarithm2 +4.3% -14.5% 0.02 0.02 cse +4.4% -0.0% 0.00 0.00 eliza +2.8% -0.1% 0.00 0.00 event +2.6% -0.0% -4.9% -4.4% exp3_8 +2.8% +0.0% -4.5% -9.5% expert +2.7% +0.3% 0.00 0.00 fem -2.0% +0.6% 0.04 0.04 fft -6.0% +1.8% 0.05 0.06 fft2 -4.8% +2.7% 0.13 0.14 fibheaps +2.6% -0.6% 0.05 0.05 fish +4.1% +0.0% 0.03 0.04 fluid -2.1% -0.2% 0.01 0.01 fulsom -4.8% +9.2% +9.1% +8.4% gamteb -7.1% -1.3% 0.10 0.11 gcd +2.7% +0.0% 0.05 0.05 gen_regexps +3.9% -0.0% 0.00 0.00 genfft +2.7% -0.1% 0.05 0.06 gg -2.7% -0.1% 0.02 0.02 grep +3.2% -0.0% 0.00 0.00 hidden -0.5% +0.0% -11.9% -13.3% hpg -3.0% -1.8% +0.0% -2.4% ida +2.6% -1.2% 0.17 -9.0% infer +1.7% -0.8% 0.08 0.09 integer +2.5% -0.0% -2.6% -2.2% integrate -5.0% +0.0% -1.3% -2.9% knights +4.3% -1.5% 0.01 0.01 lcss +2.5% -0.1% -7.5% -9.4% life +4.2% +0.0% -3.1% -3.3% lift +2.4% -3.2% 0.00 0.00 listcompr +4.0% -1.6% 0.16 0.17 listcopy +4.0% -1.4% 0.17 0.18 maillist +4.1% +0.1% 0.09 0.14 mandel +2.9% +0.0% 0.11 0.12 mandel2 +4.7% +0.0% 0.01 0.01 minimax +3.8% -0.0% 0.00 0.00 mkhprog +3.2% -4.2% 0.00 0.00 multiplier +2.5% -0.4% +0.7% -1.3% nucleic2 -9.3% +0.0% 0.10 0.10 para +2.9% +0.1% -0.7% -1.2% paraffins -10.4% +0.0% 0.20 -1.9% parser +3.1% -0.0% 0.05 0.05 parstof +1.9% -0.0% 0.00 0.01 pic -2.8% -0.8% 0.01 0.02 power +2.1% +0.1% -8.5% -9.0% pretty -12.7% +0.1% 0.00 0.00 primes +2.8% +0.0% 0.11 0.11 primetest +2.5% -0.0% -2.1% -3.1% prolog +3.2% -7.2% 0.00 0.00 puzzle +4.1% +0.0% -3.5% -8.0% queens +2.8% +0.0% 0.03 0.03 reptile +2.2% -2.2% 0.02 0.02 rewrite +3.1% +10.9% 0.03 0.03 rfib -5.2% +0.2% 0.03 0.03 rsa +2.6% +0.0% 0.05 0.06 scc +4.6% +0.4% 0.00 0.00 sched +2.7% +0.1% 0.03 0.03 scs -2.6% -0.9% -9.6% -11.6% simple -4.0% +0.4% -14.6% -14.9% solid -5.6% -0.6% -9.3% -14.3% sorting +3.8% +0.0% 0.00 0.00 sphere -3.6% +8.5% 0.15 0.16 symalg -1.3% +0.2% 0.03 0.03 tak +2.7% +0.0% 0.02 0.02 transform +2.0% -2.9% -8.0% -8.8% treejoin +3.1% +0.0% -17.5% -17.8% typecheck +2.9% -0.3% -4.6% -6.6% veritas +3.9% -0.3% 0.00 0.00 wang -6.2% +0.0% 0.18 -9.8% wave4main -10.3% +2.6% -2.1% -2.3% wheel-sieve1 +2.7% -0.0% +0.3% -0.6% wheel-sieve2 +2.7% +0.0% -3.7% -7.5% x2n1 -4.1% +0.1% 0.03 0.04 -------------------------------------------------------------------------------- Min -12.7% -14.5% -17.5% -17.8% Max +4.7% +10.9% +9.1% +8.4% Geometric Mean +0.9% -0.1% -5.6% -7.3%
* Remove unused importsIan Lynagh2009-07-071-1/+1
|
* Don't float case expressions in full lazinesssimonpj@microsoft.com2009-04-021-0/+12
| | | | | | See Note [Case MFEs]; don't float case expressions from a strict context.
* Improve transferPolyIdInfo for value-arg abstractionsimonpj@microsoft.com2009-02-041-1/+1
| | | | | | | | | | | If we float a binding out of a *value* lambda, the fixing-up of IdInfo is a bit more complicated than before. Since in principle FloatOut can do this (and thus can do full lambda lifting), it's imporrtant that transferPolyIdInfo does the Right Thing. This doensn't matter unless you use FloatOut's abilty to lambda-lift, which GHC mostly doesn't, yet. But Max used it and tripped over this bug.
* Make record selectors into ordinary functionssimonpj@microsoft.com2009-01-021-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This biggish patch addresses Trac #2670. The main effect is to make record selectors into ordinary functions, whose unfoldings appear in interface files, in contrast to their previous existence as magic "implicit Ids". This means that the usual machinery of optimisation, analysis, and inlining applies to them, which was failing before when the selector was somewhat complicated. (Which it can be when strictness annotations, unboxing annotations, and GADTs are involved.) The change involves the following points * Changes in Var.lhs to the representation of Var. Now a LocalId can have an IdDetails as well as a GlobalId. In particular, the information that an Id is a record selector is kept in the IdDetails. While compiling the current module, the record selector *must* be a LocalId, so that it participates properly in compilation (free variables etc). This led me to change the (hidden) representation of Var, so that there is now only one constructor for Id, not two. * The IdDetails is persisted into interface files, so that an importing module can see which Ids are records selectors. * In TcTyClDecls, we generate the record-selector bindings in renamed, but not typechecked form. In this way, we can get the typechecker to add all the types and so on, which is jolly helpful especially when GADTs or type families are involved. Just like derived instance declarations. This is the big new chunk of 180 lines of code (much of which is commentary). A call to the same function, mkAuxBinds, is needed in TcInstDcls for associated types. * The typechecker therefore has to pin the correct IdDetails on to the record selector, when it typechecks it. There was a neat way to do this, by adding a new sort of signature to HsBinds.Sig, namely IdSig. This contains an Id (with the correct Name, Type, and IdDetails); the type checker uses it as the binder for the final binding. This worked out rather easily. * Record selectors are no longer "implicit ids", which entails changes to IfaceSyn.ifaceDeclSubBndrs HscTypes.implicitTyThings TidyPgm.getImplicitBinds (These three functions must agree.) * MkId.mkRecordSelectorId is deleted entirely, some 300+ lines (incl comments) of very error prone code. Happy days. * A TyCon no longer contains the list of record selectors: algTcSelIds is gone The renamer is unaffected, including the way that import and export of record selectors is handled. Other small things * IfaceSyn.ifaceDeclSubBndrs had a fragile test for whether a data constructor had a wrapper. I've replaced that with an explicit flag in the interface file. More robust I hope. * I renamed isIdVar to isId, which touched a few otherwise-unrelated files.
* Rollback INLINE patchesSimon Marlow2008-12-161-54/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rolling back: Fri Dec 5 16:54:00 GMT 2008 simonpj@microsoft.com * Completely new treatment of INLINE pragmas (big patch) This is a major patch, which changes the way INLINE pragmas work. Although lots of files are touched, the net is only +21 lines of code -- and I bet that most of those are comments! HEADS UP: interface file format has changed, so you'll need to recompile everything. There is not much effect on overall performance for nofib, probably because those programs don't make heavy use of INLINE pragmas. Program Size Allocs Runtime Elapsed Min -11.3% -6.9% -9.2% -8.2% Max -0.1% +4.6% +7.5% +8.9% Geometric Mean -2.2% -0.2% -1.0% -0.8% (The +4.6% for on allocs is cichelli; see other patch relating to -fpass-case-bndr-to-join-points.) The old INLINE system ~~~~~~~~~~~~~~~~~~~~~ The old system worked like this. A function with an INLINE pragam got a right-hand side which looked like f = __inline_me__ (\xy. e) The __inline_me__ part was an InlineNote, and was treated specially in various ways. Notably, the simplifier didn't inline inside an __inline_me__ note. As a result, the code for f itself was pretty crappy. That matters if you say (map f xs), because then you execute the code for f, rather than inlining a copy at the call site. The new story: InlineRules ~~~~~~~~~~~~~~~~~~~~~~~~~~ The new system removes the InlineMe Note altogether. Instead there is a new constructor InlineRule in CoreSyn.Unfolding. This is a bit like a RULE, in that it remembers the template to be inlined inside the InlineRule. No simplification or inlining is done on an InlineRule, just like RULEs. An Id can have an InlineRule *or* a CoreUnfolding (since these are two constructors from Unfolding). The simplifier treats them differently: - An InlineRule is has the substitution applied (like RULES) but is otherwise left undisturbed. - A CoreUnfolding is updated with the new RHS of the definition, on each iteration of the simplifier. An InlineRule fires regardless of size, but *only* when the function is applied to enough arguments. The "arity" of the rule is specified (by the programmer) as the number of args on the LHS of the "=". So it makes a difference whether you say {-# INLINE f #-} f x = \y -> e or f x y = e This is one of the big new features that InlineRule gives us, and it is one that Roman really wanted. In contrast, a CoreUnfolding can fire when it is applied to fewer args than than the function has lambdas, provided the result is small enough. Consequential stuff ~~~~~~~~~~~~~~~~~~~ * A 'wrapper' no longer has a WrapperInfo in the IdInfo. Instead, the InlineRule has a field identifying wrappers. * Of course, IfaceSyn and interface serialisation changes appropriately. * Making implication constraints inline nicely was a bit fiddly. In the end I added a var_inline field to HsBInd.VarBind, which is why this patch affects the type checker slightly * I made some changes to the way in which eta expansion happens in CorePrep, mainly to ensure that *arguments* that become let-bound are also eta-expanded. I'm still not too happy with the clarity and robustness fo the result. * We now complain if the programmer gives an INLINE pragma for a recursive function (prevsiously we just ignored it). Reason for change: we don't want an InlineRule on a LoopBreaker, because then we'd have to check for loop-breaker-hood at occurrence sites (which isn't currenlty done). Some tests need changing as a result. This patch has been in my tree for quite a while, so there are probably some other minor changes. M ./compiler/basicTypes/Id.lhs -11 M ./compiler/basicTypes/IdInfo.lhs -82 M ./compiler/basicTypes/MkId.lhs -2 +2 M ./compiler/coreSyn/CoreFVs.lhs -2 +25 M ./compiler/coreSyn/CoreLint.lhs -5 +1 M ./compiler/coreSyn/CorePrep.lhs -59 +53 M ./compiler/coreSyn/CoreSubst.lhs -22 +31 M ./compiler/coreSyn/CoreSyn.lhs -66 +92 M ./compiler/coreSyn/CoreUnfold.lhs -112 +112 M ./compiler/coreSyn/CoreUtils.lhs -185 +184 M ./compiler/coreSyn/MkExternalCore.lhs -1 M ./compiler/coreSyn/PprCore.lhs -4 +40 M ./compiler/deSugar/DsBinds.lhs -70 +118 M ./compiler/deSugar/DsForeign.lhs -2 +4 M ./compiler/deSugar/DsMeta.hs -4 +3 M ./compiler/hsSyn/HsBinds.lhs -3 +3 M ./compiler/hsSyn/HsUtils.lhs -2 +7 M ./compiler/iface/BinIface.hs -11 +25 M ./compiler/iface/IfaceSyn.lhs -13 +21 M ./compiler/iface/MkIface.lhs -24 +19 M ./compiler/iface/TcIface.lhs -29 +23 M ./compiler/main/TidyPgm.lhs -55 +49 M ./compiler/parser/ParserCore.y -5 +6 M ./compiler/simplCore/CSE.lhs -2 +1 M ./compiler/simplCore/FloatIn.lhs -6 +1 M ./compiler/simplCore/FloatOut.lhs -23 M ./compiler/simplCore/OccurAnal.lhs -36 +5 M ./compiler/simplCore/SetLevels.lhs -59 +54 M ./compiler/simplCore/SimplCore.lhs -48 +52 M ./compiler/simplCore/SimplEnv.lhs -26 +22 M ./compiler/simplCore/SimplUtils.lhs -28 +4 M ./compiler/simplCore/Simplify.lhs -91 +109 M ./compiler/specialise/Specialise.lhs -15 +18 M ./compiler/stranal/WorkWrap.lhs -14 +11 M ./compiler/stranal/WwLib.lhs -2 +2 M ./compiler/typecheck/Inst.lhs -1 +3 M ./compiler/typecheck/TcBinds.lhs -17 +27 M ./compiler/typecheck/TcClassDcl.lhs -1 +2 M ./compiler/typecheck/TcExpr.lhs -4 +6 M ./compiler/typecheck/TcForeign.lhs -1 +1 M ./compiler/typecheck/TcGenDeriv.lhs -14 +13 M ./compiler/typecheck/TcHsSyn.lhs -3 +2 M ./compiler/typecheck/TcInstDcls.lhs -5 +4 M ./compiler/typecheck/TcRnDriver.lhs -2 +11 M ./compiler/typecheck/TcSimplify.lhs -10 +17 M ./compiler/vectorise/VectType.hs +7 Mon Dec 8 12:43:10 GMT 2008 simonpj@microsoft.com * White space only M ./compiler/simplCore/Simplify.lhs -2 Mon Dec 8 12:48:40 GMT 2008 simonpj@microsoft.com * Move simpleOptExpr from CoreUnfold to CoreSubst M ./compiler/coreSyn/CoreSubst.lhs -1 +87 M ./compiler/coreSyn/CoreUnfold.lhs -72 +1 Mon Dec 8 17:30:18 GMT 2008 simonpj@microsoft.com * Use CoreSubst.simpleOptExpr in place of the ad-hoc simpleSubst (reduces code too) M ./compiler/deSugar/DsBinds.lhs -50 +16 Tue Dec 9 17:03:02 GMT 2008 simonpj@microsoft.com * Fix Trac #2861: bogus eta expansion Urghlhl! I "tided up" the treatment of the "state hack" in CoreUtils, but missed an unexpected interaction with the way that a bottoming function simply swallows excess arguments. There's a long Note [State hack and bottoming functions] to explain (which accounts for most of the new lines of code). M ./compiler/coreSyn/CoreUtils.lhs -16 +53 Mon Dec 15 10:02:21 GMT 2008 Simon Marlow <marlowsd@gmail.com> * Revert CorePrep part of "Completely new treatment of INLINE pragmas..." The original patch said: * I made some changes to the way in which eta expansion happens in CorePrep, mainly to ensure that *arguments* that become let-bound are also eta-expanded. I'm still not too happy with the clarity and robustness fo the result. Unfortunately this change apparently broke some invariants that were relied on elsewhere, and in particular lead to panics when compiling with profiling on. Will re-investigate in the new year. M ./compiler/coreSyn/CorePrep.lhs -53 +58 M ./configure.ac -1 +1 Mon Dec 15 12:28:51 GMT 2008 Simon Marlow <marlowsd@gmail.com> * revert accidental change to configure.ac M ./configure.ac -1 +1
* Completely new treatment of INLINE pragmas (big patch)simonpj@microsoft.com2008-12-051-59/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a major patch, which changes the way INLINE pragmas work. Although lots of files are touched, the net is only +21 lines of code -- and I bet that most of those are comments! HEADS UP: interface file format has changed, so you'll need to recompile everything. There is not much effect on overall performance for nofib, probably because those programs don't make heavy use of INLINE pragmas. Program Size Allocs Runtime Elapsed Min -11.3% -6.9% -9.2% -8.2% Max -0.1% +4.6% +7.5% +8.9% Geometric Mean -2.2% -0.2% -1.0% -0.8% (The +4.6% for on allocs is cichelli; see other patch relating to -fpass-case-bndr-to-join-points.) The old INLINE system ~~~~~~~~~~~~~~~~~~~~~ The old system worked like this. A function with an INLINE pragam got a right-hand side which looked like f = __inline_me__ (\xy. e) The __inline_me__ part was an InlineNote, and was treated specially in various ways. Notably, the simplifier didn't inline inside an __inline_me__ note. As a result, the code for f itself was pretty crappy. That matters if you say (map f xs), because then you execute the code for f, rather than inlining a copy at the call site. The new story: InlineRules ~~~~~~~~~~~~~~~~~~~~~~~~~~ The new system removes the InlineMe Note altogether. Instead there is a new constructor InlineRule in CoreSyn.Unfolding. This is a bit like a RULE, in that it remembers the template to be inlined inside the InlineRule. No simplification or inlining is done on an InlineRule, just like RULEs. An Id can have an InlineRule *or* a CoreUnfolding (since these are two constructors from Unfolding). The simplifier treats them differently: - An InlineRule is has the substitution applied (like RULES) but is otherwise left undisturbed. - A CoreUnfolding is updated with the new RHS of the definition, on each iteration of the simplifier. An InlineRule fires regardless of size, but *only* when the function is applied to enough arguments. The "arity" of the rule is specified (by the programmer) as the number of args on the LHS of the "=". So it makes a difference whether you say {-# INLINE f #-} f x = \y -> e or f x y = e This is one of the big new features that InlineRule gives us, and it is one that Roman really wanted. In contrast, a CoreUnfolding can fire when it is applied to fewer args than than the function has lambdas, provided the result is small enough. Consequential stuff ~~~~~~~~~~~~~~~~~~~ * A 'wrapper' no longer has a WrapperInfo in the IdInfo. Instead, the InlineRule has a field identifying wrappers. * Of course, IfaceSyn and interface serialisation changes appropriately. * Making implication constraints inline nicely was a bit fiddly. In the end I added a var_inline field to HsBInd.VarBind, which is why this patch affects the type checker slightly * I made some changes to the way in which eta expansion happens in CorePrep, mainly to ensure that *arguments* that become let-bound are also eta-expanded. I'm still not too happy with the clarity and robustness fo the result. * We now complain if the programmer gives an INLINE pragma for a recursive function (prevsiously we just ignored it). Reason for change: we don't want an InlineRule on a LoopBreaker, because then we'd have to check for loop-breaker-hood at occurrence sites (which isn't currenlty done). Some tests need changing as a result. This patch has been in my tree for quite a while, so there are probably some other minor changes.
* Add (a) CoreM monad, (b) new Annotations featuresimonpj@microsoft.com2008-10-301-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch, written by Max Bolingbroke, does two things 1. It adds a new CoreM monad (defined in simplCore/CoreMonad), which is used as the top-level monad for all the Core-to-Core transformations (starting at SimplCore). It supports * I/O (for debug printing) * Unique supply * Statistics gathering * Access to the HscEnv, RuleBase, Annotations, Module The patch therefore refactors the top "skin" of every Core-to-Core pass, but does not change their functionality. 2. It adds a completely new facility to GHC: Core "annotations". The idea is that you can say {#- ANN foo (Just "Hello") #-} which adds the annotation (Just "Hello") to the top level function foo. These annotations can be looked up in any Core-to-Core pass, and are persisted into interface files. (Hence a Core-to-Core pass can also query the annotations of imported things.) Furthermore, a Core-to-Core pass can add new annotations (eg strictness info) of its own, which can be queried by importing modules. The design of the annotation system is somewhat in flux. It's designed to work with the (upcoming) dynamic plug-ins mechanism, but is meanwhile independently useful. Do not merge to 6.10!
* Don't float an expression wrapped in a castsimonpj@microsoft.com2008-10-211-0/+7
| | | | | | | | | | There is no point in floating out an expression wrapped in a coercion; If we do we'll transform lvl = e |> co [_$_] to lvl' = e; lvl = lvl' |> co and then inline lvl. Better just to float out the payload (e).