| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
I forget all the details, but I spent some time trying to
understand the current setup, and tried to simplify it a bit
|
| |
|
| |
|
|
|
|
|
|
| |
The "InlineRule" is gone now, so this is just making the comments line
up with the code. A function does change its name though:
updModeForInlineRules --> updModeForStableUnfoldings
|
|
|
|
| |
More modular, less code. No change in behaviour.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are two main refactorings here
1. Move the uf_arity field
out of CoreUnfolding
into UnfWhen
It's a lot tidier there. If I've got this right, no behaviour
should change.
2. Define specUnfolding and use it in DsBinds and Specialise
a) commons-up some shared code
b) makes sure that Specialise correctly specialises DFun
unfoldings (which it didn't before)
The two got put together because both ended up interacting in the
specialiser.
They cause zero difference to nofib.
|
|
|
|
|
|
| |
Mainly in Simplify.rebuildCase. The old code wasn't wrong, but I kept
mis-understanding it. This patch cuts splits out "pure seq" from "strict
let", which makes it much easier to grok.
|
|
|
|
|
|
|
|
|
| |
It's not obvious why the simplifier generates code that correctly satisfies
the let/app invariant. This patch does some minor refactoring, but the main
point is to document pre-conditions to key functions, namely that the rhs
passed in satisfies the let/app invariant.
There shouldn't be any change in behaviour.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In some cases, the layout of the LANGUAGE/OPTIONS_GHC lines has been
reorganized, while following the convention, to
- place `{-# LANGUAGE #-}` pragmas at the top of the source file, before
any `{-# OPTIONS_GHC #-}`-lines.
- Moreover, if the list of language extensions fit into a single
`{-# LANGUAGE ... -#}`-line (shorter than 80 characters), keep it on one
line. Otherwise split into `{-# LANGUAGE ... -#}`-lines for each
individual language extension. In both cases, try to keep the
enumeration alphabetically ordered.
(The latter layout is preferable as it's more diff-friendly)
While at it, this also replaces obsolete `{-# OPTIONS ... #-}` pragma
occurences by `{-# OPTIONS_GHC ... #-}` pragmas.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch makes the simplifier eliminate a redundant seq like
case x of y -> ...y....
where y is used strictly. GHC used to do this, but I made it less
aggressive in
commit 28d9a03253e8fd613667526a170b684f2017d299 (Jan 2013)
However #8900 shows that doing so sometimes loses good
transformations; and the transformation is valid according to "A
semantics for imprecise exceptions". So I'm restoring the old
behaviour.
See Note [Eliminating redundant seqs]
|
| |
|
|
|
|
|
|
|
|
|
| |
For DFunUnfoldings we were failing to occurrence-analyse the unfolding,
and that meant that a loop breaker wasn't marked as such, which in turn
meant it was inlined away when it still had occurrence sites. See
Note [Occurrrence analysis of unfoldings] in CoreUnfold.
This is a pretty long-standing bug, happily nailed by John Lato.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were re-using the super-magical "state token" type (which has
VoidRep and is zero bits wide) for situations in which we simply want
to lambda-abstract over a zero-bit argument. For example, join points:
case (case x of { True -> e1; False -> e2 }) of
Red -> f1
Blue -> True
==>
let $j1 = \voidArg::Void# -> f1
in
case x of
True -> case e1 of
Red -> $j1 void
Blue -> True
False -> case e2 of
Red -> $j1 void
Blue -> True
This patch introduces
* The new primitive type GHC.Prim.Void#, with PrimRep = VoidRep
* A new global Id GHC.Prim.voidPrimId :: Void#.
This has no binding because the code generator drops it,
but is used as an argument (eg in the call of $j1)
* A new local Id, MkId.voidArgId, which can be lambda-bound
when you need to lambda-abstract over it.
and uses them throughout.
Now the State# thing is used only when we need state!
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The presenting issue was that we were never eta-expanding
f (\x -> case x of (a,b) -> \s -> blah)
and that meant we were allocating two lambdas instead of one.
See Note [Eta expanding lambdas] in SimplUtils.
However I didn't want to eta expand the lambda, and then try all over
again for tryEtaExpandRhs. Yet the latter is important in the context
of a let-binding it can do simple arity analysis. So I ended up
refactoring CallCtxt so that it tells when we are on the RHS of a let.
I also moved findRhsArity from SimplUtils to CoreArity.
Performance increases nicely. Here are the ones where allocation improved
by more than 0.5%. Notice the nice decrease in binary size too.
--------------------------------------------------------------------------------
Program Size Allocs Runtime Elapsed TotalMem
--------------------------------------------------------------------------------
ansi -2.3% -0.9% 0.00 0.00 +0.0%
bspt -2.1% -9.7% 0.01 0.01 -33.3%
fasta -1.8% -11.7% -3.4% -3.6% +0.0%
fft -1.9% -1.3% 0.06 0.06 +11.1%
reverse-complem -1.9% -18.1% -1.9% -2.8% +0.0%
sphere -1.8% -4.5% 0.09 0.09 +0.0%
transform -1.8% -2.3% -4.6% -3.1% +0.0%
--------------------------------------------------------------------------------
Min -3.0% -18.1% -13.9% -14.6% -35.7%
Max -1.3% +0.0% +7.7% +7.7% +50.0%
Geometric Mean -1.9% -0.6% -2.1% -2.1% -0.2%
|
|
|
|
|
|
| |
and add new, simpler topNormaliseType
This is just a minor refactoring
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
In 6579a6c we removed existing comparison primops and introduced new ones
returning Int# instead of Bool. This commit (and associated commits in
array, base, dph, ghc-prim, integer-gmp, integer-simple, primitive, testsuite and
template-haskell) restores old names of primops. This allows us to keep
our API cleaner at the price of not having backwards compatibility.
This patch also temporalily disables fix for #8317 (optimization of
tagToEnum# at Core level). We need to fix #8326 first, otherwise
our primops code will be very slow.
|
|
|
|
| |
See Note [Optimising tagToEnum#] in Simplify
|
|
|
|
| |
cf http://ghc.haskell.org/trac/ghc/wiki/LateDmd
|
|
|
|
|
|
|
| |
E.g. (f e1 |> g) e2
If f is strict in two aguments, we want to see that in e2
Hence ArgSpec in SimplUtils
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a long-standing regression (Trac #7797), which meant that in
particular the Eq [Char] instance does not get specialised.
(The *methods* do, but the dictionary itself doesn't.) So when you
call a function
f :: Eq a => blah
on a string type (ie a=[Char]), 7.6 passes a dictionary of un-specialised
methods.
This only matters when calling an overloaded function from a
specialised context, but that does matter in some programs. I
remember (though I cannot find the details) that Nick Frisby discovered
this to be the source of some pretty solid performanc regresisons.
Anyway it works now. The key change is that a DFunUnfolding now takes
a form that is both simpler than before (the DFunArg type is eliminated)
and more general:
data Unfolding
= ...
| DFunUnfolding { -- The Unfolding of a DFunId
-- See Note [DFun unfoldings]
-- df = /\a1..am. \d1..dn. MkD t1 .. tk
-- (op1 a1..am d1..dn)
-- (op2 a1..am d1..dn)
df_bndrs :: [Var], -- The bound variables [a1..m],[d1..dn]
df_con :: DataCon, -- The dictionary data constructor (never a newtype datacon)
df_args :: [CoreExpr] -- Args of the data con: types, superclasses and methods,
} -- in positional order
That in turn allowed me to re-enable the DFunUnfolding specialisation in
DsBinds. Lots of details here in TcInstDcls:
Note [SPECIALISE instance pragmas]
I also did some refactoring, in particular to pass the InScopeSet to
exprIsConApp_maybe (which in turn means it has to go to a RuleFun).
NB: Interface file format has changed!
|
| |
|
|\
| |
| |
| |
| | |
Conflicts:
compiler/basicTypes/DataCon.lhs
|
| |\ |
|
| | |
| | |
| | |
| | |
| | |
| | | |
This makes CaseElim happen a bit more often.
See Note [Case binder next] in Simplify.
This came up when investigating Trac #7542.
|
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The main payload of this patch is to extend CPR so that it
detects when a function always returns a result constructed
with the *same* constructor, even if the constructor comes from
a sum type. This doesn't matter very often, but it does improve
some things (results below).
Binary sizes increase a little bit, I think because there are more
wrappers. This with -split-objs. Without split-ojbs binary sizes
increased by 6% even for HelloWorld.hs. It's hard to see exactly why,
but I think it was because System.Posix.Types.o got included in the
linked binary, whereas it didn't before.
Program Size Allocs Runtime Elapsed TotalMem
fluid +1.8% -0.3% 0.01 0.01 +0.0%
tak +2.2% -0.2% 0.02 0.02 +0.0%
ansi +1.7% -0.3% 0.00 0.00 +0.0%
cacheprof +1.6% -0.3% +0.6% +0.5% +1.4%
parstof +1.4% -4.4% 0.00 0.00 +0.0%
reptile +2.0% +0.3% 0.02 0.02 +0.0%
----------------------------------------------------------------------
Min +1.1% -4.4% -4.7% -4.7% -15.0%
Max +2.3% +0.3% +8.3% +9.4% +50.0%
Geometric Mean +1.9% -0.1% +0.6% +0.7% +0.3%
Other things in this commit
~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Got rid of the Lattice class in Demand
* Refactored the way that products and newtypes are
decomposed (no change in functionality)
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch is the result of Ilya Sergey's internship at MSR. It
constitutes a thorough overhaul and simplification of the demand
analyser. It makes a solid foundation on which we can now build.
Main changes are
* Instead of having one combined type for Demand, a Demand is
now a pair (JointDmd) of
- a StrDmd and
- an AbsDmd.
This allows strictness and absence to be though about quite
orthogonally, and greatly reduces brain melt-down.
* Similarly in the DmdResult type, it's a pair of
- a PureResult (indicating only divergence/non-divergence)
- a CPRResult (which deals only with the CPR property
* In IdInfo, the
strictnessInfo field contains a StrictSig, not a Maybe StrictSig
demandInfo field contains a Demand, not a Maybe Demand
We don't need Nothing (to indicate no strictness/demand info)
any more; topSig/topDmd will do.
* Remove "boxity" analysis entirely. This was an attempt to
avoid "reboxing", but it added complexity, is extremely
ad-hoc, and makes very little difference in practice.
* Remove the "unboxing strategy" computation. This was an an
attempt to ensure that a worker didn't get zillions of
arguments by unboxing big tuples. But in fact removing it
DRAMATICALLY reduces allocation in an inner loop of the
I/O library (where the threshold argument-count had been
set just too low). It's exceptional to have a zillion arguments
and I don't think it's worth the complexity, especially since
it turned out to have a serious performance hit.
* Remove quite a bit of ad-hoc cruft
* Move worthSplittingFun, worthSplittingThunk from WorkWrap to
Demand. This allows JointDmd to be fully abstract, examined
only inside Demand.
Everything else really follows from these changes.
All of this is really just refactoring, so we don't expect
big performance changes, but acutally the numbers look quite
good. Here is a full nofib run with some highlights identified:
Program Size Allocs Runtime Elapsed TotalMem
--------------------------------------------------------------------------------
expert -2.6% -15.5% 0.00 0.00 +0.0%
fluid -2.4% -7.1% 0.01 0.01 +0.0%
gg -2.5% -28.9% 0.02 0.02 -33.3%
integrate -2.6% +3.2% +2.6% +2.6% +0.0%
mandel2 -2.6% +4.2% 0.01 0.01 +0.0%
nucleic2 -2.0% -16.3% 0.11 0.11 +0.0%
para -2.6% -20.0% -11.8% -11.7% +0.0%
parser -2.5% -17.9% 0.05 0.05 +0.0%
prolog -2.6% -13.0% 0.00 0.00 +0.0%
puzzle -2.6% +2.2% +0.8% +0.8% +0.0%
sorting -2.6% -35.9% 0.00 0.00 +0.0%
treejoin -2.6% -52.2% -9.8% -9.9% +0.0%
--------------------------------------------------------------------------------
Min -2.7% -52.2% -11.8% -11.7% -33.3%
Max -1.8% +4.2% +10.5% +10.5% +7.7%
Geometric Mean -2.5% -2.8% -0.4% -0.5% -0.4%
Things to note
* Binary sizes are smaller. I don't know why, but it's good.
* Allocation is sometiemes a *lot* smaller. I believe that all the big numbers
(I checked treejoin, gg, sorting) arise from one place, namely a function
GHC.IO.Encoding.UTF8.utf8_decode, which is strict in two Buffers both of
which have several arugments. Not w/w'ing both arguments (which is what
we did before) has a big effect. So the big win in actually somewhat
accidental, gained by removing the "unboxing strategy" code.
* A couple of benchmarks allocate slightly more. This turns out
to be due to reboxing (integrate). But the biggest increase is
mandel2, and *that* turned out also to be a somewhat accidental
loss of CSE, and pointed the way to doing better CSE: see Trac
#7596.
* Runtimes are never very reliable, but seem to improve very slightly.
All in all, a good piece of work. Thank you Ilya!
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
See Note [Case elimination: lifted case]:
We used to do case elimination if
(c) the scrutinee is a variable and 'x' is used strictly
But that changes
case x of { _ -> error "bad" }
--> error "bad"
which is very puzzling if 'x' is later bound to (error "good").
Where the order of evaluation is specified (via seq or case)
we should respect it.
c.f. Note [Empty case alternatives] in CoreSyn, which is how
I came across this.
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This fixes most of Trac #3990. Consider
data family D a
data instance D Double = CD Int Int
data T = T {-# UNPACK #-} !(D Double)
Then we want the (D Double unpacked).
To do this we need to construct a suitable coercion, and it's much
safer to record that coercion in the interface file, lest the in-scope
instances differ somehow. That in turn means elaborating the HsBang
type to include a coercion.
To do that I moved HsBang from BasicTypes to DataCon, which caused
quite a few minor knock-on changes.
Interface-file format has changed!
Still to do: need to do knot-tying to allow instances to take effect
within the same module.
|
| | |
|
| | |
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This minor refactoring re-attaches Note [Add unfolding for scrutinee].
It had become detached, which led me on a bit of a wild goose
chase.
While I was at it, I made the code work right for the case where
the scrutinee is of form (x |> co); I don't think this is an important
improvement.
I also make simplAlt unconditionally zap occurrence information on
case-alternative binders (see Note [Case alternative occ info]);
it was almost always being zapped and the additional complexity seems
not worth it.
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We were being inconsistent about how we tested whether dump flags
were enabled; in particular, sometimes we also checked the verbosity,
and sometimes we didn't.
This lead to oddities such as "ghc -v4" printing an "Asm code" section
which didn't contain any code, and "-v4" enabled some parts of
"-ddump-deriv" but not others.
Now all the tests use dopt, which also takes the verbosity into account
as appropriate.
|
| |
| |
| |
| |
| | |
Mostly d -> g (matching DynFlag -> GeneralFlag).
Also renamed if* to when*, matching the Haskell if/when names
|
| |
| |
| |
| |
| | |
This avoids confusion due to [DynFlag] and DynFlags being completely
different types.
|
|/ |
|
| |
|
|
|
|
|
|
|
|
| |
I also removed the default values from the "Discounts and thresholds"
note: most of them were no longer up-to-date.
Along the way I added FloatSuffix to the argument parser, analogous to
IntSuffix.
|
| |
|
| |
|
|
|
|
|
| |
All uses of it are now in an IO Monad, so we don't need to use
a trace-like function.
|