| Commit message (Collapse) | Author | Age | Files | Lines |
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
| |
In some cases, the layout of the LANGUAGE/OPTIONS_GHC lines has been
reorganized, while following the convention, to
- place `{-# LANGUAGE #-}` pragmas at the top of the source file, before
any `{-# OPTIONS_GHC #-}`-lines.
- Moreover, if the list of language extensions fits into a single
`{-# LANGUAGE ... #-}`-line (shorter than 80 characters), keep it on one
line. Otherwise split into `{-# LANGUAGE ... #-}`-lines for each
individual language extension. In both cases, try to keep the
enumeration alphabetically ordered.
(The latter layout is preferable as it's more diff-friendly)
While at it, this also replaces obsolete `{-# OPTIONS ... #-}` pragma
occurrences by `{-# OPTIONS_GHC ... #-}` pragmas.
|
| |
This patch is the result of Ilya Sergey's internship at MSR. It
constitutes a thorough overhaul and simplification of the demand
analyser. It makes a solid foundation on which we can now build.
Main changes are
* Instead of having one combined type for Demand, a Demand is
now a pair (JointDmd) of
- a StrDmd and
- an AbsDmd.
This allows strictness and absence to be thought about quite
orthogonally, and greatly reduces brain melt-down.
* Similarly in the DmdResult type, it's a pair of
- a PureResult (indicating only divergence/non-divergence)
- a CPRResult (which deals only with the CPR property)
* In IdInfo, the
strictnessInfo field contains a StrictSig, not a Maybe StrictSig
demandInfo field contains a Demand, not a Maybe Demand
We don't need Nothing (to indicate no strictness/demand info)
any more; topSig/topDmd will do.
* Remove "boxity" analysis entirely. This was an attempt to
avoid "reboxing", but it added complexity, is extremely
ad-hoc, and makes very little difference in practice.
* Remove the "unboxing strategy" computation. This was an
attempt to ensure that a worker didn't get zillions of
arguments by unboxing big tuples. But in fact removing it
DRAMATICALLY reduces allocation in an inner loop of the
I/O library (where the threshold argument-count had been
set just too low). It's exceptional to have a zillion arguments
and I don't think it's worth the complexity, especially since
it turned out to have a serious performance hit.
* Remove quite a bit of ad-hoc cruft
* Move worthSplittingFun, worthSplittingThunk from WorkWrap to
Demand. This allows JointDmd to be fully abstract, examined
only inside Demand.
Everything else really follows from these changes.
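The split into two orthogonal components can be pictured with a small self-contained sketch. The two-point lattices and every name below except JointDmd, StrDmd, AbsDmd and topDmd are simplifying assumptions made for illustration; they are not the definitions in the Demand module.

```haskell
-- Hypothetical, much-simplified model of a demand as a pair.
data StrDmd = Lazy | Strict deriving (Eq, Show)   -- strictness component
data AbsDmd = Absent | Used deriving (Eq, Show)   -- absence (usage) component

-- A joint demand pairs the two components; each can be analysed on its own.
data JointDmd = JD { strd :: StrDmd, absd :: AbsDmd } deriving (Eq, Show)

-- "No information": lazy and used. This plays the role that Nothing
-- used to play in the Maybe-wrapped IdInfo fields.
topDmd :: JointDmd
topDmd = JD Lazy Used

-- Least upper bounds are computed componentwise.
lubStr :: StrDmd -> StrDmd -> StrDmd
lubStr Strict Strict = Strict
lubStr _      _      = Lazy

lubAbs :: AbsDmd -> AbsDmd -> AbsDmd
lubAbs Absent Absent = Absent
lubAbs _      _      = Used

lubDmd :: JointDmd -> JointDmd -> JointDmd
lubDmd (JD s1 a1) (JD s2 a2) = JD (lubStr s1 s2) (lubAbs a1 a2)
```

Because lubDmd never mixes the two components, a change to the strictness lattice cannot affect absence reasoning, which is the point of the pairing.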
All of this is really just refactoring, so we don't expect
big performance changes, but actually the numbers look quite
good. Here is a full nofib run with some highlights identified:
Program            Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
expert            -2.6%    -15.5%      0.00      0.00     +0.0%
fluid             -2.4%     -7.1%      0.01      0.01     +0.0%
gg                -2.5%    -28.9%      0.02      0.02    -33.3%
integrate         -2.6%     +3.2%     +2.6%     +2.6%     +0.0%
mandel2           -2.6%     +4.2%      0.01      0.01     +0.0%
nucleic2          -2.0%    -16.3%      0.11      0.11     +0.0%
para              -2.6%    -20.0%    -11.8%    -11.7%     +0.0%
parser            -2.5%    -17.9%      0.05      0.05     +0.0%
prolog            -2.6%    -13.0%      0.00      0.00     +0.0%
puzzle            -2.6%     +2.2%     +0.8%     +0.8%     +0.0%
sorting           -2.6%    -35.9%      0.00      0.00     +0.0%
treejoin          -2.6%    -52.2%     -9.8%     -9.9%     +0.0%
--------------------------------------------------------------------------------
Min               -2.7%    -52.2%    -11.8%    -11.7%    -33.3%
Max               -1.8%     +4.2%    +10.5%    +10.5%     +7.7%
Geometric Mean    -2.5%     -2.8%     -0.4%     -0.5%     -0.4%
Things to note
* Binary sizes are smaller. I don't know why, but it's good.
* Allocation is sometimes a *lot* smaller. I believe that all the big numbers
(I checked treejoin, gg, sorting) arise from one place, namely a function
GHC.IO.Encoding.UTF8.utf8_decode, which is strict in two Buffers both of
which have several arguments. Not w/w'ing both arguments (which is what
we did before) has a big effect. So the big win is actually somewhat
accidental, gained by removing the "unboxing strategy" code.
* A couple of benchmarks allocate slightly more. This turns out
to be due to reboxing (integrate). But the biggest increase is
mandel2, and *that* turned out also to be a somewhat accidental
loss of CSE, and pointed the way to doing better CSE: see Trac
#7596.
* Runtimes are never very reliable, but seem to improve very slightly.
All in all, a good piece of work. Thank you Ilya!
|
| |
We were being inconsistent about how we tested whether dump flags
were enabled; in particular, sometimes we also checked the verbosity,
and sometimes we didn't.
This led to oddities such as "ghc -v4" printing an "Asm code" section
which didn't contain any code, and "-v4" enabled some parts of
"-ddump-deriv" but not others.
Now all the tests use dopt, which also takes the verbosity into account
as appropriate.
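A minimal sketch of a single test that folds the verbosity check into the flag lookup. The record fields, the flag names and the choice of which flags a verbosity level implies are all assumptions made for illustration; only the name dopt comes from the message above.

```haskell
-- Hypothetical dump flags and flag record, for illustration only.
data DumpFlag = DumpAsm | DumpDeriv | DumpSimpl deriving (Eq, Show)

data Flags = Flags
  { enabledDumps :: [DumpFlag]  -- flags switched on explicitly
  , verbosity    :: Int         -- the -v level
  }

-- Assumed policy: some flags are implied once verbosity reaches a level,
-- others are never implied by verbosity alone.
impliedAt :: DumpFlag -> Maybe Int
impliedAt DumpAsm   = Nothing
impliedAt DumpDeriv = Just 4
impliedAt DumpSimpl = Just 4

-- The one place that decides whether a dump is enabled.
dopt :: DumpFlag -> Flags -> Bool
dopt f fl =
  f `elem` enabledDumps fl
    || maybe False (verbosity fl >=) (impliedAt f)
```

With every call site going through one function like this, a given verbosity level either enables all of a dump or none of it.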
|
| |
This avoids confusion due to [DynFlag] and DynFlags being completely
different types.
|
| |
Thanks to Peter Wortmann for pointing out this bug.
|
| |
Two changes here
* The main change here is to enhance the FloatIn pass so that it can
float case-bindings inwards. In particular the case bindings for
array indexing.
* Also change the code in Simplify, to allow a case on array
indexing (ie can_fail is true) to be discarded altogether if its
results are unused.
Lots of new comments in PrimOp about can_fail and has_side_effects
Some refactoring to share the FloatBind data structure between
FloatIn and FloatOut
|
| |
We only use it for "compiler" sources, i.e. not for libraries.
Many modules have a -fno-warn-tabs kludge for now.
|
| |
User visible changes
====================
Profiling
---------
Flags renamed (the old ones are still accepted for now):
  OLD          NEW
  ---------    ------------
  -auto-all    -fprof-auto
  -auto        -fprof-exported
  -caf-all     -fprof-cafs
New flags:
  -fprof-auto              Annotates all bindings (not just top-level
                           ones) with SCCs
  -fprof-top               Annotates just top-level bindings with SCCs
  -fprof-exported          Annotates just exported bindings with SCCs
  -fprof-no-count-entries  Do not maintain entry counts when profiling
                           (can make profiled code go faster; useful with
                           heap profiling where entry counts are not used)
Cost-centre stacks have a new semantics, which should in most cases
result in more useful and intuitive profiles. If you find this not to
be the case, please let me know. This is the area where I have been
experimenting most, and the current solution is probably not the
final version, however it does address all the outstanding bugs and
seems to be better than GHC 7.2.
Stack traces
------------
+RTS -xc now gives more information. If the exception originates from
a CAF (as is common, because GHC tends to lift exceptions out to the
top-level), then the RTS walks up the stack and reports the stack in
the enclosing update frame(s).
Result: +RTS -xc is much more useful now - but you still have to
compile for profiling to get it. I've played around a little with
adding 'head []' to GHC itself, and +RTS -xc does pinpoint the problem
quite accurately.
I plan to add more facilities for stack tracing (e.g. in GHCi) in the
future.
Coverage (HPC)
--------------
* derived instances are now coloured yellow if they weren't used
* likewise record field names
* entry counts are more accurate (hpc --fun-entry-count)
* tab width is now correct (markup was previously off in source with
tabs)
Internal changes
================
In Core, the Note constructor has been replaced by
Tick (Tickish b) (Expr b)
which is used to represent all the kinds of source annotation we
support: profiling SCCs, HPC ticks, and GHCi breakpoints.
Depending on the properties of the Tickish, different transformations
apply to Tick. See CoreUtils.mkTick for details.
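As an illustration of a single annotation constructor covering all three kinds of source annotation, here is a simplified sketch. The field layouts are assumptions (the real Tickish carries more information), and stripTicks is a hypothetical helper, not a function from CoreUtils.

```haskell
-- Simplified model: one Tick constructor, three kinds of Tickish.
data Tickish b
  = ProfNote String      -- profiling SCC, here just a cost-centre name
  | HpcTick Int          -- HPC coverage tick, here just a tick number
  | Breakpoint Int [b]   -- GHCi breakpoint with the variables in scope
  deriving (Eq, Show)

data Expr b
  = Var b
  | App (Expr b) (Expr b)
  | Lam b (Expr b)
  | Tick (Tickish b) (Expr b)
  deriving (Eq, Show)

-- Hypothetical helper: drop every annotation, keeping the expression.
stripTicks :: Expr b -> Expr b
stripTicks (Tick _ e) = stripTicks e
stripTicks (App f a)  = App (stripTicks f) (stripTicks a)
stripTicks (Lam x e)  = Lam x (stripTicks e)
stripTicks e@(Var _)  = e
```

A traversal such as stripTicks needs only one extra case for annotations, whatever their kind.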
Tickets
=======
This commit closes the following tickets, test cases to follow:
- Close #2552: not a bug, but the behaviour is now more intuitive
(test is T2552)
- Close #680 (test is T680)
- Close #1531 (test is result001)
- Close #949 (test is T949)
- Close #2466: test case has bitrotted (doesn't compile against current
version of vector-space package)
|
| |
and comment its invariants in Note [CoreProgram] in CoreSyn
I'm not totally convinced that CoreProgram is the right name
(perhaps CoreTopBinds might be better), but it is useful to have
a clue that you are looking at the top-level bindings.
This is only a matter of a type synonym change; no deep
refactoring here.
|
| |
The problem is documented in the ticket. The patch
does two things
1. Make exprOkForSpeculation return False for a non-exhaustive case
2. In SetLevels.lvlExpr, look at the *result* scrutinee, not the
*input* scrutinee, when testing for evaluated-ness
|
| |
This fixes Trac #5341 and #5342. The question is about
what to do when floating out of the RHS of a Rec-bound
function, when there's a FloatCase involved. For FloatLets
they can join the Rec block, but FloatCases can't. But
we don't want to mess with the arity (that was the bug).
So in this (rather exotic case) we push the FloatCase
back inside any lambdas.
See Note [Floating out of Rec rhss]. It's a slightly ugly fix, but I
can't think of anything better, and I don't think it has any practical
impact.
|
| |
There are two things in this patch. First, a new feature.
Given (case x of I# y -> ...)
where 'x' is known to be evaluated, the float-out pass
will float the case outwards towards x's binding. Of
course this doesn't happen if 'x' is evaluated because
of an enclosing case (because then the inner case would
be eliminated) but it *does* happen when x is bound by
a constructor with a strict field. This happens in DPH.
Trac #4081.
The second change is a significant refactoring of the
way the let-floater works. Now SetLevels makes a decision
about whether the let (or case) will move, and records
that decision in the FloatSpec flag. This change makes
the whole caboodle much easier to think about.
|
| |
See the paper "Practical aspects of evidence based compilation in System FC"
* Coercion becomes a data type, distinct from Type
* Coercions become value-level things, rather than type-level things,
(although the value is zero bits wide, like the State token)
A consequence is that a coercion abstraction increases the arity by 1
(just like a dictionary abstraction)
* There is a new constructor in CoreExpr, namely Coercion, to inject
coercions into terms
|
| |
We were failing to float out a binding that could be floated,
because of a confusion in the Lam case of floatExpr.
In investigating this I also discovered that there is really
no point at all in giving a different level to variables in
a binding group, so I've now given them all the same (in
SetLevels.lvlLamBndrs).
The overall difference is quite minor in a nofib run:
Program            Size    Allocs   Runtime   Elapsed
-------------------------------------------------------------
Min               +0.0%     -8.5%    -28.4%    -28.7%
Max               +0.0%     +0.7%     -0.7%     -1.1%
Geometric Mean    +0.0%     -0.0%    -11.6%    -11.8%
I don't trust those runtimes, but smaller is good! The 8.5%
improvement in allocation is in fulsom, and seems real. The
0.7% allocation increase only happens in programs with
very small allocation. I tracked one down to a call of this form
GHC.IO.Handle.Internals.mkDuplexHandle5
= \ args -> GHC.IO.Handle.Internals.openTextEncoding1
mb_codec ha_type
(\mb_encoder mb_decoder -> blah)
With the new floater the argument of openTextEncoding1 becomes
(let lvl = .. in \mb_encoder mb_decoder -> blah)
And rightly so. However in fact this argument is a continuation
and hence is called once, so the floating is fruitless.
Roll on one-shot-function analysis (which I know how to do
but fail to get to!).
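The transformation discussed here can be written out at source level. This is only a sketch: expensive is a hypothetical stand-in for the floated computation, and whether the work is actually shared depends on how the code is compiled.

```haskell
-- Hypothetical stand-in for a computation worth sharing.
expensive :: Int -> Int
expensive n = sum [1 .. n]

-- Before floating: the binding sits inside the lambda, so it is
-- (notionally) re-evaluated on every call of the returned function.
before :: Int -> Int -> Int
before n = \x -> let lvl = expensive n in x + lvl

-- After floating: the binding sits outside the lambda, so all calls of
-- the returned function share one evaluation, at the cost of allocating
-- a thunk for lvl even when the function is called only once.
after :: Int -> Int -> Int
after n = let lvl = expensive n in \x -> x + lvl
```

Both versions compute the same results; the difference is only in sharing and allocation, which is why floating is fruitless when the lambda is a continuation that is called once.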
|
| |
This major patch implements the new OutsideIn constraint solving
algorithm in the typechecker, following our JFP paper "Modular type
inference with local assumptions".
Done with major help from Dimitrios Vytiniotis and Brent Yorgey.
|
| |
The problem was that a strict binding was getting floated
out into a letrec. This only happened when profiling was
on. It exposed a fragility in the floating strategy. This
patch makes it more robust. See
Note [Avoiding unnecessary floating]
|
| |
This patch moves a lot of code around, but has zero functionality change.
The idea is that the types
CoreToDo
SimplifierSwitch
SimplifierMode
FloatOutSwitches
and
the main core-to-core pipeline construction
belong in simplCore/, and *not* in DynFlags.
|
| |
The idea is to float out bottoming expressions to top level,
abstracting them over any variables they mention, if necessary. This
is good because it makes functions smaller (and more likely to
inline), by keeping error code out of line.
See Note [Bottoming floats] in SetLevels.
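A source-level picture of the idea, as a sketch only: the function names are hypothetical, and the real transformation happens on Core inside SetLevels and FloatOut rather than on source code.

```haskell
-- Before: the error-reporting code sits inline in the function body.
indexBefore :: Int -> [a] -> a
indexBefore n xs = case drop n xs of
  (y : _) -> y
  []      -> error ("index too large: " ++ show n)

-- After: the bottoming expression is a top-level function, abstracted
-- over the variable it mentions (n). NOINLINE mirrors the requirement
-- that such functions are not inlined back.
indexFail :: Int -> a
indexFail n = error ("index too large: " ++ show n)
{-# NOINLINE indexFail #-}

indexAfter :: Int -> [a] -> a
indexAfter n xs = case drop n xs of
  (y : _) -> y
  []      -> indexFail n
```

The body of indexAfter is smaller than that of indexBefore, since the string construction and the call to error are now out of line.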
On the way, this fixes the HPC failures for cg059 and friends.
I've been meaning to do this for some time. See Maessen's paper 1999
"Bottom extraction: factoring error handling out of functional
programs" (unpublished I think).
Here are the nofib results:
Program            Size    Allocs   Runtime   Elapsed
--------------------------------------------------------------------------------
Min               +0.1%     -7.8%    -14.4%    -32.5%
Max               +0.5%     +0.2%     +1.6%    +13.8%
Geometric Mean    +0.4%     -0.2%     -4.9%     -6.7%
Module sizes
    -1 s.d.    -----     -2.6%
    +1 s.d.    -----     +2.3%
    Average    -----     -0.2%
Compile times:
    -1 s.d.    -----    -11.4%
    +1 s.d.    -----     +4.3%
    Average    -----     -3.8%
I think program sizes have crept up because the base library
is bigger -- module sizes in nofib decrease very slightly. In turn
I think that may be because the floating generates a call where
there was no call before. Anyway I think it's acceptable.
The main changes are:
* SetLevels floats out things that exprBotStrictness_maybe
identifies as bottom. Make sure to pin on the right
strictness info to the newly created Ids, so that the
info ends up in interface files.
Since FloatOut is run twice, we have to be careful that we
don't treat the function created by the first float-out as
a candidate for the second; this is what worthFloating does.
See SetLevels Note [Bottoming floats]
Note [Bottoming floats: eta expansion]
* Be careful not to inline top-level bottoming functions; this
would just undo what the floating transformation achieves.
See CoreUnfold Note [Do not inline top-level bottoming functions]
Ensuring this requires a bit of extra plumbing, but nothing drastic.
* Similarly pre/postInlineUnconditionally should be
careful not to re-inline top-level bottoming things!
See SimplUtils Note [Top-level bottoming Ids]
Note [Top level and postInlineUnconditionally]
|
| |
The effect was that, in deeply-nested applications, FloatOut would
take quadratic time. A good example was compiling
programs/barton-mangler-bug/Expected.hs
in which FloatOut had a visible pause of a couple of seconds!
Profiling showed that 40% of the entire compile time was being
consumed by the single function partitionByMajorLevel.
The bug was that the floating bindings (type FloatBinds) were kept
as a list, which was partitioned at each binding site. In programs
with deeply nested lists, such as
e1 : e2 : e3 : .... : e5000 : []
this led to quadratic behaviour.
The solution is to use a proper finite-map representation;
see the new definition of FloatBinds near the bottom of FloatOut.
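The difference between the two representations can be sketched as follows. The types and function names are hypothetical simplifications (the real FloatBinds is richer than a map from a single level to strings), and Data.Map from the containers package is assumed to be available.

```haskell
import Data.List (partition)
import qualified Data.Map as M

type Level   = Int
type Binding = String

-- List representation: every binding site scans all pending floats,
-- so n nested binding sites cost O(n^2) overall.
type FloatList = [(Level, Binding)]

takeLevelList :: Level -> FloatList -> ([Binding], FloatList)
takeLevelList lvl fs =
  let (here, rest) = partition ((== lvl) . fst) fs
  in  (map snd here, rest)

-- Finite-map representation: floats are indexed by level, so a binding
-- site extracts its own floats with logarithmic-time map operations.
type FloatMap = M.Map Level [Binding]

addFloat :: Level -> Binding -> FloatMap -> FloatMap
addFloat lvl b = M.insertWith (++) lvl [b]

takeLevelMap :: Level -> FloatMap -> ([Binding], FloatMap)
takeLevelMap lvl fm = (M.findWithDefault [] lvl fm, M.delete lvl fm)
```

In the list version the floats destined for outer levels are re-examined at every enclosing binding site; in the map version they are never touched until their own level is reached.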
|
| |
This patch has been a long time in gestation and has, as a
result, accumulated some extra bits and bobs that are only
loosely related. I separated the bits that are easy to split
off, but the rest comes as one big patch, I'm afraid.
Note that:
* It comes together with a patch to the 'base' library
* Interface file formats change slightly, so you need to
recompile all libraries
The patch is mainly a giant tidy-up, driven in part by the
particular stresses of the Data Parallel Haskell project. I don't
expect a big performance win for random programs. Still, here are the
nofib results, relative to the state of affairs without the patch
Program            Size    Allocs   Runtime   Elapsed
--------------------------------------------------------------------------------
Min              -12.7%    -14.5%    -17.5%    -17.8%
Max               +4.7%    +10.9%     +9.1%     +8.4%
Geometric Mean    +0.9%     -0.1%     -5.6%     -7.3%
The +10.9% allocation outlier is rewrite, which happens to have a
very delicate optimisation opportunity involving an interaction
of CSE and inlining (see nofib/Simon-nofib-notes). The fact that
the 'before' case found the optimisation is somewhat accidental.
Runtimes seem to go down, but I never know whether to really trust
this number. Binary sizes wobble a bit, but nothing drastic.
The Main Ideas are as follows.
InlineRules
~~~~~~~~~~~
When you say
{-# INLINE f #-}
f x = <rhs>
you intend that calls (f e) are replaced by <rhs>[e/x]. So we
should capture (\x.<rhs>) in the Unfolding of 'f', and never meddle
with it. Meanwhile, we can optimise <rhs> to our heart's content,
leaving the original unfolding intact in Unfolding of 'f'.
So the representation of an Unfolding has changed quite a bit
(see CoreSyn). An INLINE pragma gives rise to an InlineRule
unfolding.
Moreover, it's only used when 'f' is applied to the
specified number of arguments; that is, the number of arguments on
the LHS of the '=' sign in the original source definition.
For example, (.) is now defined in the libraries like this
{-# INLINE (.) #-}
(.) f g = \x -> f (g x)
so that it'll inline when applied to two arguments. If 'x' appeared
on the left, thus
(.) f g x = f (g x)
it'd only inline when applied to three arguments. This slightly-experimental
change was requested by Roman, but it seems to make sense.
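The two styles of definition can be placed side by side. This is a sketch with hypothetical names (compose2, compose3) so as not to clash with the Prelude's (.); the two functions are semantically identical and differ only in the number of arguments on the left of the '=' sign, which is what the INLINE pragma keys on.

```haskell
-- Two arguments on the LHS: eligible for inlining once applied to f and g.
compose2 :: (b -> c) -> (a -> b) -> a -> c
{-# INLINE compose2 #-}
compose2 f g = \x -> f (g x)

-- Three arguments on the LHS: eligible only when applied to f, g and x.
compose3 :: (b -> c) -> (a -> b) -> a -> c
{-# INLINE compose3 #-}
compose3 f g x = f (g x)

-- A call site with only two arguments supplied, as in a typical 'map'.
example :: [Int]
example = map (compose2 (+ 1) (* 2)) [1, 2, 3]
```

In example the composition is applied to two arguments, so under the rule described above compose2 qualifies for inlining there while compose3 would not.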
Other associated changes
* Moving the deck chairs in DsBinds, which processes the INLINE pragmas
* In the old system an INLINE pragma made the RHS look like
(Note InlineMe <rhs>)
The Note switched off optimisation in <rhs>. But it was quite
fragile in corner cases. The new system is more robust, I believe.
In any case, the InlineMe note has disappeared
* The workerInfo of an Id has also been combined into its Unfolding,
so it's no longer a separate field of the IdInfo.
* Many changes in CoreUnfold, esp in callSiteInline, which is the critical
function that decides which function to inline. Lots of comments added!
* exprIsConApp_maybe has moved to CoreUnfold, since it's so strongly
associated with "does this expression unfold to a constructor application".
It can now do some limited beta reduction too, which Roman found
was important.
Instance declarations
~~~~~~~~~~~~~~~~~~~~~
It's always been tricky to get the dfuns generated from instance
declarations to work out well. This is particularly important in
the Data Parallel Haskell project, and I'm now on my fourth attempt,
more or less.
There is a detailed description in TcInstDcls, particularly in
Note [How instance declarations are translated]. Roughly speaking
we now generate a top-level helper function for every method definition
in an instance declaration, so that the dfun takes a particularly
stylised form:
dfun a d1 d2 = MkD (op1 a d1 d2) (op2 a d1 d2) ...etc...
In fact, it's *so* stylised that we never need to unfold a dfun.
Instead ClassOps have a special rewrite rule that allows us to
short-cut dictionary selection. Suppose dfun :: Ord a -> Ord [a]
d :: Ord a
Then
compare (dfun a d) --> compare_list a d
in one rewrite, without first inlining the 'compare' selector
and the body of the dfun.
To support this
a) ClassOps have a BuiltInRule (see MkId.dictSelRule)
b) DFuns have a special form of unfolding (CoreSyn.DFunUnfolding)
which is exploited in CoreUnfold.exprIsConApp_maybe
Implementing all this required a root-and-branch rework of TcInstDcls
and bits of TcClassDcl.
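The stylised dfun form can be imitated with explicit dictionaries. This is a sketch under assumptions: OrdD, cmpList, lteList and dfunOrdList are hypothetical names, and type arguments are left implicit, whereas the Core form passes them explicitly.

```haskell
-- A hand-written dictionary for a two-method Ord-like class.
data OrdD a = MkOrdD
  { cmp :: a -> a -> Ordering
  , lte :: a -> a -> Bool
  }

ordInt :: OrdD Int
ordInt = MkOrdD compare (<=)

-- One top-level helper per method of the list instance.
cmpList :: OrdD a -> [a] -> [a] -> Ordering
cmpList _ []       []       = EQ
cmpList _ []       _        = LT
cmpList _ _        []       = GT
cmpList d (x : xs) (y : ys) =
  case cmp d x y of
    EQ    -> cmpList d xs ys
    other -> other

lteList :: OrdD a -> [a] -> [a] -> Bool
lteList d xs ys = cmpList d xs ys /= GT

-- The dfun only applies the constructor to the helpers, so selecting
-- 'cmp' from (dfunOrdList d) can be rewritten directly to (cmpList d).
dfunOrdList :: OrdD a -> OrdD [a]
dfunOrdList d = MkOrdD (cmpList d) (lteList d)
```

Here cmp (dfunOrdList ordInt) and cmpList ordInt are the same function, which is the equation the special rewrite rule exploits without unfolding the dfun.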
Default methods
~~~~~~~~~~~~~~~
If you give an INLINE pragma to a default method, it should be just
as if you'd written out that code in each instance declaration, including
the INLINE pragma. I think that it now *is* so. As a result, library
code can be simpler; less duplication.
The CONLIKE pragma
~~~~~~~~~~~~~~~~~~
In the DPH project, Roman found cases where he had
p n k = let x = replicate n k
in ...(f x)...(g x)....
{-# RULE f (replicate x) = f_rep x #-}
Normally the RULE would not fire, because doing so involves
(in effect) duplicating the redex (replicate n k). A new
experimental modifier to the INLINE pragma, {-# INLINE CONLIKE
replicate #-}, allows you to tell GHC to be prepared to duplicate
a call of this function if it allows a RULE to fire.
See Note [CONLIKE pragma] in BasicTypes
Join points
~~~~~~~~~~~
See Note [Case binders and join points] in Simplify
Other refactoring
~~~~~~~~~~~~~~~~~
* I moved endPass from CoreLint to CoreMonad, with associated jigglings
* Better pretty-printing of Core
* The top-level RULES (ones that are not rules for locally-defined things)
are now substituted on every simplifier iteration. I'm not sure how
we got away without doing this before. This entails a bit more plumbing
in SimplCore.
* The necessary stuff to serialise and deserialise the new
info across interface files.
* Something about bottoming floats in SetLevels
Note [Bottoming floats]
* substUnfolding has moved from SimplEnv to CoreSubst, where it belongs
--------------------------------------------------------------------------------
Program Size Allocs Runtime Elapsed
--------------------------------------------------------------------------------
anna +2.4% -0.5% 0.16 0.17
ansi +2.6% -0.1% 0.00 0.00
atom -3.8% -0.0% -1.0% -2.5%
awards +3.0% +0.7% 0.00 0.00
banner +3.3% -0.0% 0.00 0.00
bernouilli +2.7% +0.0% -4.6% -6.9%
boyer +2.6% +0.0% 0.06 0.07
boyer2 +4.4% +0.2% 0.01 0.01
bspt +3.2% +9.6% 0.02 0.02
cacheprof +1.4% -1.0% -12.2% -13.6%
calendar +2.7% -1.7% 0.00 0.00
cichelli +3.7% -0.0% 0.13 0.14
circsim +3.3% +0.0% -2.3% -9.9%
clausify +2.7% +0.0% 0.05 0.06
comp_lab_zift +2.6% -0.3% -7.2% -7.9%
compress +3.3% +0.0% -8.5% -9.6%
compress2 +3.6% +0.0% -15.1% -17.8%
constraints +2.7% -0.6% -10.0% -10.7%
cryptarithm1 +4.5% +0.0% -4.7% -5.7%
cryptarithm2 +4.3% -14.5% 0.02 0.02
cse +4.4% -0.0% 0.00 0.00
eliza +2.8% -0.1% 0.00 0.00
event +2.6% -0.0% -4.9% -4.4%
exp3_8 +2.8% +0.0% -4.5% -9.5%
expert +2.7% +0.3% 0.00 0.00
fem -2.0% +0.6% 0.04 0.04
fft -6.0% +1.8% 0.05 0.06
fft2 -4.8% +2.7% 0.13 0.14
fibheaps +2.6% -0.6% 0.05 0.05
fish +4.1% +0.0% 0.03 0.04
fluid -2.1% -0.2% 0.01 0.01
fulsom -4.8% +9.2% +9.1% +8.4%
gamteb -7.1% -1.3% 0.10 0.11
gcd +2.7% +0.0% 0.05 0.05
gen_regexps +3.9% -0.0% 0.00 0.00
genfft +2.7% -0.1% 0.05 0.06
gg -2.7% -0.1% 0.02 0.02
grep +3.2% -0.0% 0.00 0.00
hidden -0.5% +0.0% -11.9% -13.3%
hpg -3.0% -1.8% +0.0% -2.4%
ida +2.6% -1.2% 0.17 -9.0%
infer +1.7% -0.8% 0.08 0.09
integer +2.5% -0.0% -2.6% -2.2%
integrate -5.0% +0.0% -1.3% -2.9%
knights +4.3% -1.5% 0.01 0.01
lcss +2.5% -0.1% -7.5% -9.4%
life +4.2% +0.0% -3.1% -3.3%
lift +2.4% -3.2% 0.00 0.00
listcompr +4.0% -1.6% 0.16 0.17
listcopy +4.0% -1.4% 0.17 0.18
maillist +4.1% +0.1% 0.09 0.14
mandel +2.9% +0.0% 0.11 0.12
mandel2 +4.7% +0.0% 0.01 0.01
minimax +3.8% -0.0% 0.00 0.00
mkhprog +3.2% -4.2% 0.00 0.00
multiplier +2.5% -0.4% +0.7% -1.3%
nucleic2 -9.3% +0.0% 0.10 0.10
para +2.9% +0.1% -0.7% -1.2%
paraffins -10.4% +0.0% 0.20 -1.9%
parser +3.1% -0.0% 0.05 0.05
parstof +1.9% -0.0% 0.00 0.01
pic -2.8% -0.8% 0.01 0.02
power +2.1% +0.1% -8.5% -9.0%
pretty -12.7% +0.1% 0.00 0.00
primes +2.8% +0.0% 0.11 0.11
primetest +2.5% -0.0% -2.1% -3.1%
prolog +3.2% -7.2% 0.00 0.00
puzzle +4.1% +0.0% -3.5% -8.0%
queens +2.8% +0.0% 0.03 0.03
reptile +2.2% -2.2% 0.02 0.02
rewrite +3.1% +10.9% 0.03 0.03
rfib -5.2% +0.2% 0.03 0.03
rsa +2.6% +0.0% 0.05 0.06
scc +4.6% +0.4% 0.00 0.00
sched +2.7% +0.1% 0.03 0.03
scs -2.6% -0.9% -9.6% -11.6%
simple -4.0% +0.4% -14.6% -14.9%
solid -5.6% -0.6% -9.3% -14.3%
sorting +3.8% +0.0% 0.00 0.00
sphere -3.6% +8.5% 0.15 0.16
symalg -1.3% +0.2% 0.03 0.03
tak +2.7% +0.0% 0.02 0.02
transform +2.0% -2.9% -8.0% -8.8%
treejoin +3.1% +0.0% -17.5% -17.8%
typecheck +2.9% -0.3% -4.6% -6.6%
veritas +3.9% -0.3% 0.00 0.00
wang -6.2% +0.0% 0.18 -9.8%
wave4main -10.3% +2.6% -2.1% -2.3%
wheel-sieve1 +2.7% -0.0% +0.3% -0.6%
wheel-sieve2 +2.7% +0.0% -3.7% -7.5%
x2n1 -4.1% +0.1% 0.03 0.04
--------------------------------------------------------------------------------
Min -12.7% -14.5% -17.5% -17.8%
Max +4.7% +10.9% +9.1% +8.4%
Geometric Mean +0.9% -0.1% -5.6% -7.3%
|
| |
rolling back:
Fri Dec 5 16:54:00 GMT 2008 simonpj@microsoft.com
* Completely new treatment of INLINE pragmas (big patch)
This is a major patch, which changes the way INLINE pragmas work.
Although lots of files are touched, the net is only +21 lines of
code -- and I bet that most of those are comments!
HEADS UP: interface file format has changed, so you'll need to
recompile everything.
There is not much effect on overall performance for nofib,
probably because those programs don't make heavy use of INLINE pragmas.
Program            Size    Allocs   Runtime   Elapsed
Min              -11.3%     -6.9%     -9.2%     -8.2%
Max               -0.1%     +4.6%     +7.5%     +8.9%
Geometric Mean    -2.2%     -0.2%     -1.0%     -0.8%
(The +4.6% on allocs is cichelli; see other patch relating to
-fpass-case-bndr-to-join-points.)
The old INLINE system
~~~~~~~~~~~~~~~~~~~~~
The old system worked like this. A function with an INLINE pragma
got a right-hand side which looked like
f = __inline_me__ (\xy. e)
The __inline_me__ part was an InlineNote, and was treated specially
in various ways. Notably, the simplifier didn't inline inside an
__inline_me__ note.
As a result, the code for f itself was pretty crappy. That matters
if you say (map f xs), because then you execute the code for f,
rather than inlining a copy at the call site.
The new story: InlineRules
~~~~~~~~~~~~~~~~~~~~~~~~~~
The new system removes the InlineMe Note altogether. Instead there
is a new constructor InlineRule in CoreSyn.Unfolding. This is a
bit like a RULE, in that it remembers the template to be inlined inside
the InlineRule. No simplification or inlining is done on an InlineRule,
just like RULEs.
An Id can have an InlineRule *or* a CoreUnfolding (since these are two
constructors from Unfolding). The simplifier treats them differently:
- An InlineRule has the substitution applied (like RULES) but
is otherwise left undisturbed.
- A CoreUnfolding is updated with the new RHS of the definition,
on each iteration of the simplifier.
An InlineRule fires regardless of size, but *only* when the function
is applied to enough arguments. The "arity" of the rule is specified
(by the programmer) as the number of args on the LHS of the "=". So
it makes a difference whether you say
{-# INLINE f #-}
f x = \y -> e or f x y = e
This is one of the big new features that InlineRule gives us, and it
is one that Roman really wanted.
In contrast, a CoreUnfolding can fire when it is applied to fewer
args than the function has lambdas, provided the result is small
enough.
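The contrast between the two kinds of unfolding can be summarised in a toy decision function. Everything here is a simplifying assumption: the real Unfolding constructors carry templates and guidance rather than bare numbers, and inlineAt is a hypothetical name, not the function in CoreUnfold.

```haskell
-- Hypothetical, much-simplified model of the two kinds of unfolding.
data Unfolding
  = InlineRule Int     -- arity: number of args on the LHS of the "="
  | CoreUnfolding Int  -- size of the current, optimised right-hand side
  deriving (Eq, Show)

-- Should a call site supplying nArgs arguments be inlined, given a
-- size threshold for ordinary unfoldings?
inlineAt :: Int -> Int -> Unfolding -> Bool
inlineAt _         nArgs (InlineRule arity)   = nArgs >= arity     -- size is ignored
inlineAt threshold _     (CoreUnfolding size) = size <= threshold  -- arity is ignored
```

The first equation ignores size and the second ignores the argument count, which mirrors the two paragraphs above in an idealised form.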
Consequential stuff
~~~~~~~~~~~~~~~~~~~
* A 'wrapper' no longer has a WrapperInfo in the IdInfo. Instead,
the InlineRule has a field identifying wrappers.
* Of course, IfaceSyn and interface serialisation changes appropriately.
* Making implication constraints inline nicely was a bit fiddly. In
the end I added a var_inline field to HsBinds.VarBind, which is why
this patch affects the type checker slightly
* I made some changes to the way in which eta expansion happens in
CorePrep, mainly to ensure that *arguments* that become let-bound
are also eta-expanded. I'm still not too happy with the clarity
and robustness of the result.
* We now complain if the programmer gives an INLINE pragma for
a recursive function (previously we just ignored it). Reason for
change: we don't want an InlineRule on a LoopBreaker, because then
we'd have to check for loop-breaker-hood at occurrence sites (which
isn't currently done). Some tests need changing as a result.
This patch has been in my tree for quite a while, so there are
probably some other minor changes.
M ./compiler/basicTypes/Id.lhs -11
M ./compiler/basicTypes/IdInfo.lhs -82
M ./compiler/basicTypes/MkId.lhs -2 +2
M ./compiler/coreSyn/CoreFVs.lhs -2 +25
M ./compiler/coreSyn/CoreLint.lhs -5 +1
M ./compiler/coreSyn/CorePrep.lhs -59 +53
M ./compiler/coreSyn/CoreSubst.lhs -22 +31
M ./compiler/coreSyn/CoreSyn.lhs -66 +92
M ./compiler/coreSyn/CoreUnfold.lhs -112 +112
M ./compiler/coreSyn/CoreUtils.lhs -185 +184
M ./compiler/coreSyn/MkExternalCore.lhs -1
M ./compiler/coreSyn/PprCore.lhs -4 +40
M ./compiler/deSugar/DsBinds.lhs -70 +118
M ./compiler/deSugar/DsForeign.lhs -2 +4
M ./compiler/deSugar/DsMeta.hs -4 +3
M ./compiler/hsSyn/HsBinds.lhs -3 +3
M ./compiler/hsSyn/HsUtils.lhs -2 +7
M ./compiler/iface/BinIface.hs -11 +25
M ./compiler/iface/IfaceSyn.lhs -13 +21
M ./compiler/iface/MkIface.lhs -24 +19
M ./compiler/iface/TcIface.lhs -29 +23
M ./compiler/main/TidyPgm.lhs -55 +49
M ./compiler/parser/ParserCore.y -5 +6
M ./compiler/simplCore/CSE.lhs -2 +1
M ./compiler/simplCore/FloatIn.lhs -6 +1
M ./compiler/simplCore/FloatOut.lhs -23
M ./compiler/simplCore/OccurAnal.lhs -36 +5
M ./compiler/simplCore/SetLevels.lhs -59 +54
M ./compiler/simplCore/SimplCore.lhs -48 +52
M ./compiler/simplCore/SimplEnv.lhs -26 +22
M ./compiler/simplCore/SimplUtils.lhs -28 +4
M ./compiler/simplCore/Simplify.lhs -91 +109
M ./compiler/specialise/Specialise.lhs -15 +18
M ./compiler/stranal/WorkWrap.lhs -14 +11
M ./compiler/stranal/WwLib.lhs -2 +2
M ./compiler/typecheck/Inst.lhs -1 +3
M ./compiler/typecheck/TcBinds.lhs -17 +27
M ./compiler/typecheck/TcClassDcl.lhs -1 +2
M ./compiler/typecheck/TcExpr.lhs -4 +6
M ./compiler/typecheck/TcForeign.lhs -1 +1
M ./compiler/typecheck/TcGenDeriv.lhs -14 +13
M ./compiler/typecheck/TcHsSyn.lhs -3 +2
M ./compiler/typecheck/TcInstDcls.lhs -5 +4
M ./compiler/typecheck/TcRnDriver.lhs -2 +11
M ./compiler/typecheck/TcSimplify.lhs -10 +17
M ./compiler/vectorise/VectType.hs +7
Mon Dec 8 12:43:10 GMT 2008 simonpj@microsoft.com
* White space only
M ./compiler/simplCore/Simplify.lhs -2
Mon Dec 8 12:48:40 GMT 2008 simonpj@microsoft.com
* Move simpleOptExpr from CoreUnfold to CoreSubst
M ./compiler/coreSyn/CoreSubst.lhs -1 +87
M ./compiler/coreSyn/CoreUnfold.lhs -72 +1
Mon Dec 8 17:30:18 GMT 2008 simonpj@microsoft.com
* Use CoreSubst.simpleOptExpr in place of the ad-hoc simpleSubst (reduces code too)
M ./compiler/deSugar/DsBinds.lhs -50 +16
Tue Dec 9 17:03:02 GMT 2008 simonpj@microsoft.com
* Fix Trac #2861: bogus eta expansion
Urghlhl! I "tidied up" the treatment of the "state hack" in CoreUtils, but
missed an unexpected interaction with the way that a bottoming function
simply swallows excess arguments. There's a long
Note [State hack and bottoming functions]
to explain (which accounts for most of the new lines of code).
M ./compiler/coreSyn/CoreUtils.lhs -16 +53
Mon Dec 15 10:02:21 GMT 2008 Simon Marlow <marlowsd@gmail.com>
* Revert CorePrep part of "Completely new treatment of INLINE pragmas..."
The original patch said:
* I made some changes to the way in which eta expansion happens in
CorePrep, mainly to ensure that *arguments* that become let-bound
are also eta-expanded. I'm still not too happy with the clarity
and robustness of the result.
Unfortunately this change apparently broke some invariants that were
relied on elsewhere, and in particular led to panics when compiling
with profiling on.
Will re-investigate in the new year.
M ./compiler/coreSyn/CorePrep.lhs -53 +58
M ./configure.ac -1 +1
Mon Dec 15 12:28:51 GMT 2008 Simon Marlow <marlowsd@gmail.com>
* revert accidental change to configure.ac
M ./configure.ac -1 +1
| |
This is a major patch, which changes the way INLINE pragmas work.
Although lots of files are touched, the net is only +21 lines of
code -- and I bet that most of those are comments!
HEADS UP: interface file format has changed, so you'll need to
recompile everything.
There is not much effect on overall performance for nofib,
probably because those programs don't make heavy use of INLINE pragmas.
        Program           Size    Allocs   Runtime   Elapsed
            Min         -11.3%     -6.9%     -9.2%     -8.2%
            Max          -0.1%     +4.6%     +7.5%     +8.9%
 Geometric Mean          -2.2%     -0.2%     -1.0%     -0.8%
(The +4.6% on allocs is cichelli; see the other patch relating to
-fpass-case-bndr-to-join-points.)
The old INLINE system
~~~~~~~~~~~~~~~~~~~~~
The old system worked like this. A function with an INLINE pragma
got a right-hand side which looked like
f = __inline_me__ (\xy. e)
The __inline_me__ part was an InlineNote, and was treated specially
in various ways. Notably, the simplifier didn't inline inside an
__inline_me__ note.
As a result, the code for f itself was pretty crappy. That matters
if you say (map f xs), because then you execute the code for f,
rather than inlining a copy at the call site.
The new story: InlineRules
~~~~~~~~~~~~~~~~~~~~~~~~~~
The new system removes the InlineMe Note altogether. Instead there
is a new constructor InlineRule in CoreSyn.Unfolding. This is a
bit like a RULE, in that it remembers the template to be inlined inside
the InlineRule. No simplification or inlining is done on an InlineRule,
just like RULEs.
An Id can have an InlineRule *or* a CoreUnfolding (since these are two
constructors from Unfolding). The simplifier treats them differently:
- An InlineRule has the substitution applied (like RULES) but
is otherwise left undisturbed.
- A CoreUnfolding is updated with the new RHS of the definition,
on each iteration of the simplifier.
An InlineRule fires regardless of size, but *only* when the function
is applied to enough arguments. The "arity" of the rule is specified
(by the programmer) as the number of args on the LHS of the "=". So
it makes a difference whether you say
{-# INLINE f #-}
f x = \y -> e or f x y = e
This is one of the big new features that InlineRule gives us, and it
is one that Roman really wanted.
In contrast, a CoreUnfolding can fire when it is applied to fewer
args than the function has lambdas, provided the result is small
enough.
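As a minimal sketch of the arity distinction described above, the two definitions below compute the same function but put a different number of arguments to the left of the "=". The names addOne, addTwo, partialUse and saturatedUse are hypothetical and exist only for illustration. Whether GHC actually inlines at a given call site depends on the compiler version and optimisation flags, so the comments describe the rule as stated in this message rather than a guaranteed outcome.

```haskell
-- One argument to the left of "=": under the rule described above, the
-- INLINE unfolding becomes eligible once addOne is applied to one argument.
{-# INLINE addOne #-}
addOne :: Int -> Int -> Int
addOne x = \y -> x + y

-- Two arguments to the left of "=": the unfolding is eligible only when
-- addTwo is applied to both arguments.
{-# INLINE addTwo #-}
addTwo :: Int -> Int -> Int
addTwo x y = x + y

-- A call site that supplies one argument: enough for addOne's declared arity.
partialUse :: [Int] -> [Int]
partialUse xs = map (addOne 1) xs

-- A call site that supplies both arguments, as addTwo's declared arity requires.
saturatedUse :: [Int] -> [Int]
saturatedUse xs = [addTwo 1 y | y <- xs]
```

Both call sites return the same list for the same input; the difference lies only in when the INLINE unfolding is allowed to fire.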
Consequential stuff
~~~~~~~~~~~~~~~~~~~
* A 'wrapper' no longer has a WrapperInfo in the IdInfo. Instead,
the InlineRule has a field identifying wrappers.
* Of course, IfaceSyn and interface serialisation changes appropriately.
* Making implication constraints inline nicely was a bit fiddly. In
the end I added a var_inline field to HsBinds.VarBind, which is why
this patch affects the type checker slightly.
* I made some changes to the way in which eta expansion happens in
CorePrep, mainly to ensure that *arguments* that become let-bound
are also eta-expanded. I'm still not too happy with the clarity
and robustness of the result.
* We now complain if the programmer gives an INLINE pragma for
a recursive function (previously we just ignored it). Reason for
change: we don't want an InlineRule on a LoopBreaker, because then
we'd have to check for loop-breaker-hood at occurrence sites (which
isn't currently done). Some tests need changing as a result.
This patch has been in my tree for quite a while, so there are
probably some other minor changes.
| |
This patch, written by Max Bolingbroke, does two things
1. It adds a new CoreM monad (defined in simplCore/CoreMonad),
which is used as the top-level monad for all the Core-to-Core
transformations (starting at SimplCore). It supports
* I/O (for debug printing)
* Unique supply
* Statistics gathering
* Access to the HscEnv, RuleBase, Annotations, Module
The patch therefore refactors the top "skin" of every Core-to-Core
pass, but does not change their functionality.
2. It adds a completely new facility to GHC: Core "annotations".
The idea is that you can say
{-# ANN foo (Just "Hello") #-}
which adds the annotation (Just "Hello") to the top level function
foo. These annotations can be looked up in any Core-to-Core pass,
and are persisted into interface files. (Hence a Core-to-Core pass
can also query the annotations of imported things.) Furthermore,
a Core-to-Core pass can add new annotations (eg strictness info)
of its own, which can be queried by importing modules.
The design of the annotation system is somewhat in flux. It's
designed to work with the (upcoming) dynamic plug-ins mechanism,
but is meanwhile independently useful.
Do not merge to 6.10!
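To make the capabilities listed under item 1 concrete, here is a toy model of a monad that layers a unique supply and a statistics counter over IO. It is only a sketch: ToyCoreM, PassState, freshUnique, tick, debugMsg and renamePass are all hypothetical names, and GHC's real CoreM (in simplCore/CoreMonad) is defined differently and also gives access to the HscEnv, RuleBase, annotations and module.

```haskell
-- Toy state threaded through a "pass"; not GHC's actual representation.
data PassState = PassState
  { nextUnique :: Int  -- stand-in for a unique supply
  , tickCount  :: Int  -- stand-in for statistics gathering
  }

-- A state-passing monad over IO, so passes can also do debug printing.
newtype ToyCoreM a = ToyCoreM { runToyCoreM :: PassState -> IO (a, PassState) }

instance Functor ToyCoreM where
  fmap f (ToyCoreM m) = ToyCoreM $ \s -> do
    (a, s') <- m s
    pure (f a, s')

instance Applicative ToyCoreM where
  pure a = ToyCoreM $ \s -> pure (a, s)
  ToyCoreM mf <*> ToyCoreM ma = ToyCoreM $ \s -> do
    (f, s1) <- mf s
    (a, s2) <- ma s1
    pure (f a, s2)

instance Monad ToyCoreM where
  ToyCoreM m >>= k = ToyCoreM $ \s -> do
    (a, s') <- m s
    runToyCoreM (k a) s'

-- Hand out the next unique and advance the supply.
freshUnique :: ToyCoreM Int
freshUnique = ToyCoreM $ \s ->
  pure (nextUnique s, s { nextUnique = nextUnique s + 1 })

-- Record that one transformation step happened.
tick :: ToyCoreM ()
tick = ToyCoreM $ \s -> pure ((), s { tickCount = tickCount s + 1 })

-- Debug printing via the underlying IO.
debugMsg :: String -> ToyCoreM ()
debugMsg msg = ToyCoreM $ \s -> putStrLn msg >> pure ((), s)

-- A toy "pass": give every binder name a fresh suffix.
renamePass :: [String] -> ToyCoreM [String]
renamePass = mapM $ \name -> do
  u <- freshUnique
  tick
  debugMsg ("renaming " ++ name)
  pure (name ++ "_" ++ show u)
```

Running a pass means supplying an initial state, for example `runToyCoreM (renamePass ["f", "g"]) (PassState 0 0)`, which returns the renamed binders together with the final state.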
| |
We really should not float anything out of an _inline_me_ Note,
for reasons described in this new comment:
-- Do no floating at all inside INLINE.
-- The SetLevels pass did not clone the bindings, so it's
-- unsafe to do any floating, even if we dump the results
-- inside the Note (which is what we used to do).
I'm about to get rid of these _inline_me_ Notes, but it's
better to fix it anyway. I found this bug when implementing System IF.
| |
Modules that need it import it themselves instead.
| |
I've gotten this wrong more than once. Hopefully this has it nailed.
The issue is that in float-out we must abstract over the correct
variables.
| |
Older GHCs can't parse OPTIONS_GHC.
This also changes the URL referenced for the -w options from
WorkingConventions#Warnings to CodingStyle#Warnings for the compiler
modules.
| |
We do not want the FloatOut pass to transform
f = \x. e
to
f = let lvl = ... in \x.e
The arity pinned on f isn't right any more; and see
Note [Floating out of RHSs].
Core Lint is now spotting the arity lossage (for a letrec), which is
how I spotted this bug.
I also re-jigged the code around floatBind; it's a bit tidier now.
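The two shapes mentioned above can be written out at the source level as a rough illustration. This is only an analogy: FloatOut works on Core, not on source Haskell, and the names expensive, fLambda and fFloated are hypothetical. The two definitions give the same results; what differs is that in the second the lambda is no longer the outermost form of the right-hand side, which is the situation in which an arity recorded for the first shape stops being accurate.

```haskell
-- A hypothetical subexpression that does not depend on the lambda's argument.
expensive :: Int -> Int
expensive n = sum [1 .. n]

-- Shape 1:  f = \x. e
-- The lambda is outermost, so the right-hand side is manifestly a
-- one-argument function.
fLambda :: Int -> Int
fLambda = \x -> x + expensive 1000

-- Shape 2:  f = let lvl = ... in \x. e
-- The binding has been moved outside the lambda, so the right-hand side
-- is now a let expression rather than a lambda.
fFloated :: Int -> Int
fFloated = let lvl = expensive 1000 in \x -> x + lvl
```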
| |
Fri Aug 4 18:13:20 EDT 2006 Manuel M T Chakravarty <chak@cse.unsw.edu.au>
* Massive patch for the first months work adding System FC to GHC #30
Broken up massive patch -=chak
Original log message:
This is (sadly) all done in one patch to avoid Darcs bugs.
It's not complete work... more FC stuff to come. A compiler
using just this patch will fail dismally.
|
|
Most of the other users of the fptools build system have migrated to
Cabal, and with the move to darcs we can now flatten the source tree
without losing history, so here goes.
The main change is that the ghc/ subdir is gone, and most of what it
contained is now at the top level. The build system now makes no
pretense at being multi-project, it is just the GHC build system.
No doubt this will break many things, and there will be a period of
instability while we fix the dependencies. A straightforward build
should work, but I haven't yet fixed binary/source distributions.
Changes to the Building Guide will follow, too.
|