summaryrefslogtreecommitdiff
path: root/compiler/Simon-log
diff options
context:
space:
mode:
Diffstat (limited to 'compiler/Simon-log')
-rw-r--r--compiler/Simon-log1260
1 files changed, 1260 insertions, 0 deletions
diff --git a/compiler/Simon-log b/compiler/Simon-log
new file mode 100644
index 0000000000..9d60ccc6eb
--- /dev/null
+++ b/compiler/Simon-log
@@ -0,0 +1,1260 @@
+ ------------------------------------
+ GHCI hacking
+ ------------------------------------
+
+* Don't forget to put deferred-type-decls back into RnIfaces
+
+* Do we want to record a package name in a .hi file?
+ Does pi_mod have a ModuleName or a Module?
+
+ ------------------------------------
+ Mainly FunDeps (23 Jan 01)
+ ------------------------------------
+
+This commit re-engineers the handling of functional dependencies.
+A functional dependency is no longer an Inst; instead, the necessary
+dependencies are snaffled out of their Class when necessary.
+
+As part of this exercise I found that I had to re-work how to do generalisation
+in a binding group. There is rather exhaustive documentation on the new Plan
+at the top of TcSimplify.
+
+ ******************
+ WARNING: I have compiled all the libraries with this new compiler
+ and all looks well, but I have not run many programs.
+ Things may break. Let me know if so.
+ ******************
+
+The main changes are these:
+
+1. typecheck/TcBinds and TcSimplify have a lot of changes due to the
+ new generalisation and context reduction story. There are extensive
+ comments at the start of TcSimplify
+
+2. typecheck/TcImprove is removed altogether. Instead, improvement is
+ interleaved with context reduction (until a fixpoint is reached).
+ All this is done in TcSimplify.
+
+3. types/FunDeps has new exports
+ * 'improve' does improvement, returning a list of equations
+ * 'grow' and 'oclose' close a list of type variables wrt a set of
+ PredTypes, but in slightly different ways. Comments in file.
+
+4. I improved the way in which we check that main::IO t. It's tidier now.
+
+In addition
+
+* typecheck/TcMatches:
+ a) Tidy up, introducing a common function tcCheckExistentialPat
+
+ b) Improve the typechecking of parallel list comprehensions,
+ which wasn't quite right before. (see comments with tcStmts)
+
+ WARNING: (b) is untested! Jeff, you might want to check.
+
+* Numerous other incidental changes in the typechecker
+
+* Manuel found that rules don't fire well when you have partial applications
+ from overloading. For example, we may get
+
+ f a (d::Ord a) = let m_g = g a d
+ in
+ \y :: a -> ...(m_g (h y))...
+
+ The 'method' m_g doesn't get inlined because (g a d) might be a redex.
+ Yet a rule that looks like
+ g a d (h y) = ...
+ won't fire because that doesn't show up. One way out would be to make
+ the rule matcher a bit less paranoid about duplicating work, but instead
+ I've added a flag
+ -fno-method-sharing
+ which controls whether we generate things like m_g in the first place.
+ It's not clear that they are a win in the first place.
+
+ The flag is actually consulted in Inst.tcInstId
+
+
+
+ ------------------------------------
+ Mainly PredTypes (28 Sept 00)
+ ------------------------------------
+
+Three things in this commit:
+
+ 1. Main thing: tidy up PredTypes
+ 2. Move all Keys into PrelNames
+ 3. Check for unboxed tuples in function args
+
+1. Tidy up PredTypes
+~~~~~~~~~~~~~~~~~~~~
+The main thing in this commit is to modify the representation of Types
+so that they are a (much) better for the qualified-type world. This
+should simplify Jeff's life as he proceeds with implicit parameters
+and functional dependencies. In particular, PredType, introduced by
+Jeff, is now blessed and dignified with a place in TypeRep.lhs:
+
+ data PredType = Class Class [Type]
+ | IParam Name Type
+
+Consider these examples:
+ f :: (Eq a) => a -> Int
+ g :: (?x :: Int -> Int) => a -> Int
+ h :: (r\l) => {r} => {l::Int | r}
+
+Here the "Eq a" and "?x :: Int -> Int" and "r\l" are all called
+*predicates*, and are represented by a PredType. (We don't support
+TREX records yet, but the setup is designed to expand to allow them.)
+
+In addition, Type gains an extra constructor:
+
+ data Type = .... | PredTy PredType
+
+so that PredType is injected directly into Type. So the type
+ p => t
+is represented by
+ PredType p `FunTy` t
+
+I have deleted the hackish IPNote stuff; predicates are dealt with entirely
+through PredTys, not through NoteTy at all.
+
+
+2. Move Keys into PrelNames
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+This is just a housekeeping operation. I've moved all the pre-assigned Uniques
+(aka Keys) from Unique.lhs into PrelNames.lhs. I've also moved knowKeyRdrNames
+from PrelInfo down into PrelNames. This localises in PrelNames lots of stuff
+about predefined names. Previously one had to alter three files to add one,
+now only one.
+
+3. Unboxed tuples
+~~~~~~~~~~~~~~~~~~
+Add a static check for unboxed tuple arguments. E.g.
+ data T = T (# Int, Int #)
+is illegal
+
+
+
+ ---------------------------------------
+ Update in place
+ ---------------------------------------
+
+-funfolding-update-in-place
+Switching it on doesn't affect many programs, except these
+sphere is because it makes a critical function (vecsub) more inlinable
+
+ sphere 66465k -20.61%
+ infer 13390k +1.27%
+ parstof 1461k +1.18%
+ fluid 3442k +1.61%
+ atom 177163k +13.20%
+ bspt 4837k +4.85%
+ cichelli 33546k +2.69%
+ typecheck 146023k +1.47%
+
+
+ ---------------------------------------
+ Simon's tuning changes: early Sept 2000
+ ---------------------------------------
+
+Library changes
+~~~~~~~~~~~~~~~
+* Eta expand PrelShow.showLitChar. It's impossible to compile this well,
+ and it makes a big difference to some programs (e.g. gen_regexps)
+
+* Make PrelList.concat into a good producer (in the foldr/build sense)
+
+
+Flag changes
+~~~~~~~~~~~~
+* Add -ddump-hi-diffs to print out changes in interface files. Useful
+ when watching what the compiler is doing
+
+* Add -funfolding-update-in-place to enable the experimental optimisation
+ that makes the inliner a bit keener to inline if it's in the RHS of
+ a thunk that might be updated in place. Sometimes this is a bad idea
+ (one example is in spectral/sphere; see notes in nofib/Simon-nofib-notes)
+
+
+Tuning things
+~~~~~~~~~~~~~
+* Fix a bug in SetLevels.lvlMFE. (change ctxt_lvl to dest_level)
+ I don't think this has any performance effect, but it saves making
+ a redundant let-binding that is later eliminated.
+
+* Desugar.dsProgram and DsForeign
+ Glom together all the bindings into a single Rec. Previously the
+ bindings generated by 'foreign' declarations were not glommed together, but
+ this led to an infelicity (i.e. poorer code than necessary) in the modules
+ that actually declare Float and Double (explained a bit more in Desugar.dsProgram)
+
+* OccurAnal.shortMeOut and IdInfo.shortableIdInfo
+ Don't do the occurrence analyser's shorting out stuff for things which
+ have rules. Comments near IdInfo.shortableIdInfo.
+ This is deeply boring, and mainly to do with making rules work well.
+ Maybe rules should have phases attached too....
+
+* CprAnalyse.addIdCprInfo
+ Be a bit more willing to add CPR information to thunks;
+ in particular, if the strictness analyser has just discovered that this
+ is a strict let, then the let-to-case transform will happen, and CPR is fine.
+ This made a big difference to PrelBase.modInt, which had something like
+ modInt = \ x -> let r = ... -> I# v in
+ ...body strict in r...
+ r's RHS isn't a value yet; but modInt returns r in various branches, so
+ if r doesn't have the CPR property then neither does modInt
+
+* MkId.mkDataConWrapId
+ Arrange that vanilla constructors, like (:) and I#, get unfoldings that are
+ just a simple variable $w:, $wI#. This ensures they'll be inlined even into
+ rules etc, which makes matching a bit more reliable. The downside is that in
+ situations like (map (:) xs), we'll end up with (map (\y ys. $w: y ys) xs.
+ Which is tiresome but it doesn't happen much.
+
+* SaAbsInt.findStrictness
+ Deal with the case where a thing with no arguments is bottom. This is Good.
+ E.g. module M where { foo = error "help" }
+ Suppose we have in another module
+ case M.foo of ...
+ Then we'd like to do the case-of-error transform, without inlining foo.
+
+
+Tidying up things
+~~~~~~~~~~~~~~~~~
+* Reorganised Simplify.completeBinding (again).
+
+* Removed the is_bot field in CoreUnfolding (is_cheap is true if is_bot is!)
+ This is just a tidy up
+
+* HsDecls and others
+ Remove the NewCon constructor from ConDecl. It just added code, and nothing else.
+ And it led to a bug in MkIface, which though that a newtype decl was always changing!
+
+* IdInfo and many others
+ Remove all vestiges of UpdateInfo (hasn't been used for years)
+
+ ------------------------------
+ Join points Sept 2000
+ ------------------------------
+
+With Andrew Kennedy, I found out why a few of the join points introduced by
+the simplifier end up as *not* let-no-escpaed. Here's an example:
+
+f x y = case (pwr x b) == 1 of
+ False -> False
+ True -> pwr x c == 1
+
+This compiles to:
+ f = \ @ t w :: Integer ->
+ let {
+ $j :: (State# RealWorld -> Bool)
+ P
+ $j
+ = \ w1 :: (State# RealWorld) ->
+ case pwr w c of wild {
+ S# i -> case i of wild1 { 1 -> $wTrue; __DEFAULT -> $wFalse };
+ J# s d1 ->
+ case cmpIntegerInt# s d1 1 of wild2 {
+ 0 -> $wTrue; __DEFAULT -> $wFalse
+ }
+ }
+ } in
+ case pwr w b of wild {
+ S# i ->
+ case i of wild1 { 1 -> $j realWorld#; __DEFAULT -> $wFalse };
+ J# s d1 ->
+ case cmpIntegerInt# s d1 1 of wild2 {
+ 0 -> $j realWorld#; __DEFAULT -> $wFalse
+ }
+ }
+
+Now consider
+
+ case (f x) of
+ True -> False
+ False -> True
+
+Suppose f is inlined into this case. No new join points are introduced,
+because the alternatives are both small. But the consumer
+ case [.] of {True -> False; False -> True}
+will move into the body of f, be duplicated 4 ways, and end up consuming
+the result of the four outcomes at the body of f. This yields:
+ $j :: (State# RealWorld -> Bool)
+ P
+ $j
+ = \ w1 :: (State# RealWorld) ->
+ case pwr w c of wild {
+ S# i -> case i of wild1 { 1 -> $wTrue; __DEFAULT -> $wFalse };
+ J# s d1 ->
+ case cmpIntegerInt# s d1 1 of wild2 {
+ 0 -> $wTrue; __DEFAULT -> $wFalse
+ }
+ }
+ } in
+ case pwr w b of wild {
+ S# i ->
+ case i of wild1 { 1 -> case $j realWorld# of {T->F; F->T}
+ ; __DEFAULT -> $wTrue };
+ J# s d1 ->
+ case cmpIntegerInt# s d1 1 of wild2 {
+ 0 -> case $j realWorld# of {T->F; F->T}
+ ; __DEFAULT -> $wTrue
+ }
+ }
+
+And, voila, the join point $j isn't let-no-escaped any more.
+The point is that the consuming context can't "see inside" the join point.
+It's a phase ordering thing. If f is inlined before the join points
+are built in the first place, then all is well.
+
+
+
+ -----------------------------
+ Sept 7 2000
+ -----------------------------
+
+* Make the simplifier's Stop continuation record whether the expression being
+ simplified is the RHS of a thunk, or (say) the body of a lambda or case RHS.
+ In the thunk case we want to be a bit keener about inlining if the type of
+ the thunk is amenable to update in place.
+
+* SetLevels was being a bit too eager to float things to the top
+ level; e.g. _inline_me_ (\a -> e); here e got floated...
+ Easily fixed by a change to ltMajLvl
+
+* Make CoreUnfold.calcUnfoldingGuidance a bit less keen to make case expressions
+ seem small. The original idea was to make inlined wrappers look small, so that
+ when we inline a wrapper it doesn't make call site (much) bigger
+ Otherwise we get nasty phase ordering stuff:
+ -- f x = g x x
+ -- h y = ...(f e)...
+ If we inline g's wrapper, f looks big, and doesn't get inlined
+ into h; if we inline f first, while it looks small, then g's
+ wrapper will get inlined later anyway. To avoid this nasty
+ ordering difference, we make (case a of (x,y) -> ...),
+ *where a is one of the arguments* look free.
+
+ BUT (a) It's too eager. We don't want to inline a wrapper into a
+ context with no benefit.
+ E.g. \ x. f (x+x) o point in inlining (+) here!
+
+ (b) It's ineffective. Once g's wrapper is inlined, its case-expressions
+ aren't scrutinising arguments any more
+
+ So I've rescinded this idea for now. cases still look fairly small.
+
+* Fix interestingArg, which was being too liberal, and hence doing
+ too much inlining.
+
+* Extended CoreUtils.exprIsCheap to make two more things cheap:
+ - case (coerce x) of ...
+ - let x = y +# z
+ This makes a bit more eta expansion happen. It was provoked by
+ a program of Marcin's.
+
+* The simplifier used to glom together all the top-level bindings into
+ a single Rec every time it was invoked. The reason for this is explained
+ in SimplCore.lhs, but for at least one simple program it meant that the
+ simplifier never got around to unravelling the recursive group into
+ non-recursive pieces. So I've put the glomming under explicit flag
+ control with a -fglom-binds simplifier pass. A side benefit is
+ that because it happens less often, the (expensive) SCC algorithm
+ runs less often.
+
+* MkIface.ifaceBinds. Make sure that we emit rules for things
+ (like class operations) that don't get a top-level binding in the
+ interface file. Previously such rules were silently forgotten.
+
+* Move transformRhs to *after* simplification, which makes it a
+ little easier to do, and means that the arity it computes is
+ readily available to completeBinding. This gets much better
+ arities.
+
+* Do coerce splitting in completeBinding. This gets good code for
+ newtype CInt = CInt Int
+
+ test:: CInt -> Int
+ test x = case x of
+ 1 -> 2
+ 2 -> 4
+ 3 -> 8
+ 4 -> 16
+ _ -> 0
+
+* Modify the meaning of "arity" so that during compilation it means
+ "if you apply this function to fewer args, it will do virtually
+ no work". So, for example
+ f = coerce t (\x -> e)
+ has arity at least 1. When a function is exported, it's arity becomes
+ the number of exposed, top-level lambdas, which is subtly different.
+ But that's ok.
+
+ I removed CoreUtils.exprArity altogether: it looked only at the exposed
+ lambdas. Instead, we use exprEtaExpandArity exclusively.
+
+ All of this makes I/O programs work much better.
+
+
+ -----------------------------
+ Sept 4 2000
+ -----------------------------
+
+* PrimRep, TysPrim. Add PrimPtrRep as the representation for
+ MVars and MutVars. Previously they were given PtrRep, but that
+ crashed dataReturnConvPrim! Here's the program the killed it:
+ data STRef s a = STRef (MutVar# s a)
+ from (STRef x) = x
+
+* Make the desugarer use string equality for string literal
+ patterns longer than 1 character. And put a specialised
+ eqString into PrelBase, with a suitable specialisation rule.
+ This makes a huge difference to the size of the code generated
+ by deriving(Read) notably in Time.lhs
+
+ -----------------------------
+ Marktoberdorf Commits (Aug 2000)
+ -----------------------------
+
+1. Tidy up the renaming story for "system binders", such as
+dictionary functions, default methods, constructor workers etc. These
+are now documented in HsDecls. The main effect of the change, apart
+from tidying up, is to make the *type-checker* (instead of the
+renamer) generate names for dict-funs and default-methods. This is
+good because Sergei's generic-class stuff generates new classes at
+typecheck time.
+
+
+2. Fix the CSE pass so it does not require the no-shadowing invariant.
+Keith discovered that the simplifier occasionally returns a result
+with shadowing. After much fiddling around (which has improved the
+code in the simplifier a bit) I found that it is nearly impossible to
+arrange that it really does do no-shadowing. So I gave up and fixed
+the CSE pass (which is the only one to rely on it) instead.
+
+
+3. Fix a performance bug in the simplifier. The change is in
+SimplUtils.interestingArg. It computes whether an argment should
+be considered "interesting"; if a function is applied to an interesting
+argument, we are more likely to inline that function.
+Consider this case
+ let x = 3 in f x
+The 'x' argument was considered "uninteresting" for a silly reason.
+Since x only occurs once, it was unconditionally substituted, but
+interestingArg didn't take account of that case. Now it does.
+
+I also made interestingArg a bit more liberal. Let's see if we
+get too much inlining now.
+
+
+4. In the occurrence analyser, we were choosing a bad loop breaker.
+Here's the comment that's now in OccurAnal.reOrderRec
+
+ score ((bndr, rhs), _, _)
+ | exprIsTrivial rhs = 3 -- Practically certain to be inlined
+ -- Used to have also: && not (isExportedId bndr)
+ -- But I found this sometimes cost an extra iteration when we have
+ -- rec { d = (a,b); a = ...df...; b = ...df...; df = d }
+ -- where df is the exported dictionary. Then df makes a really
+ -- bad choice for loop breaker
+
+I also increased the score for bindings with a non-functional type, so that
+dictionaries have a better chance of getting inlined early
+
+
+5. Add a hash code to the InScopeSet (and make it properly abstract)
+This should make uniqAway a lot more robust. Simple experiments suggest
+that uniqAway no longer gets into the long iteration chains that it used
+to.
+
+
+6. Fix a bug in the inliner that made the simplifier tend to get into
+a loop where it would keep iterating ("4 iterations, bailing out" message).
+In SimplUtils.mkRhsTyLam we float bindings out past a big lambda, thus:
+ x = /\ b -> let g = \x -> f x x
+ in E
+becomes
+ g* = /\a -> \x -> f x x
+ x = /\ b -> let g = g* b in E
+
+It's essential that we don't simply inling g* back into the RHS of g,
+else we will be back to square 1. The inliner is meant not to do this
+because there's no benefit to the inlining, but the size calculation
+was a little off in CoreUnfold.
+
+
+7. In SetLevels we were bogus-ly building a Subst with an empty in-scope
+set, so a WARNING popped up when compiling some modules. (knights/ChessSetList
+was the example that tickled it.) Now in fact the warning wasn't an error,
+but the Right Thing to do is to carry down a proper Subst in SetLevels, so
+that is what I have now done. It is very little more expensive.
+
+
+
+ ~~~~~~~~~~~~
+ Apr/May 2000
+ ~~~~~~~~~~~~
+
+This is a pretty big commit! It adds stuff I've been working on
+over the last month or so. DO NOT MERGE IT WITH 4.07!
+
+Recompilation checking
+~~~~~~~~~~~~~~~~~~~~~~
+Substantial improvement in recompilation checking. The version management
+is now entirely internal to GHC. ghc-iface.lprl is dead!
+
+The trick is to generate the new interface file in two steps:
+ - first convert Types etc to HsTypes etc, and thereby
+ build a new ParsedIface
+ - then compare against the parsed (but not renamed) version of the old
+ interface file
+Doing this meant adding code to convert *to* HsSyn things, and to
+compare HsSyn things for equality. That is the main tedious bit.
+
+Another improvement is that we now track version info for
+fixities and rules, which was missing before.
+
+
+Interface file reading
+~~~~~~~~~~~~~~~~~~~~~~
+Make interface files reading more robust.
+ * If the old interface file is unreadable, don't fail. [bug fix]
+
+ * If the old interface file mentions interfaces
+ that are unreadable, don't fail. [bug fix]
+
+ * When we can't find the interface file,
+ print the directories we are looking in. [feature]
+
+
+Type signatures
+~~~~~~~~~~~~~~~
+ * New flag -ddump-types to print type signatures
+
+
+Type pruning
+~~~~~~~~~~~~
+When importing
+ data T = T1 A | T2 B | T3 C
+it seems excessive to import the types A, B, C as well, unless
+the constructors T1, T2 etc are used. A,B,C might be more types,
+and importing them may mean reading more interfaces, and so on.
+ So the idea is that the renamer will just import the decl
+ data T
+unless one of the constructors is used. This turns out to be quite
+easy to implement. The downside is that we must make sure the
+constructors are always available if they are really needed, so
+I regard this as an experimental feature.
+
+
+Elimininate ThinAir names
+~~~~~~~~~~~~~~~~~~~~~~~~~
+Eliminate ThinAir.lhs and all its works. It was always a hack, and now
+the desugarer carries around an environment I think we can nuke ThinAir
+altogether.
+
+As part of this, I had to move all the Prelude RdrName defns from PrelInfo
+to PrelMods --- so I renamed PrelMods as PrelNames.
+
+I also had to move the builtinRules so that they are injected by the renamer
+(rather than appearing out of the blue in SimplCore). This is if anything simpler.
+
+Miscellaneous
+~~~~~~~~~~~~~
+* Tidy up the data types involved in Rules
+
+* Eliminate RnEnv.better_provenance; use Name.hasBetterProv instead
+
+* Add Unique.hasKey :: Uniquable a => a -> Unique -> Bool
+ It's useful in a lot of places
+
+* Fix a bug in interface file parsing for __U[!]
+
+
+=======================================
+To-do
+~~~~~
+* Try the effect of enhancing update in place with the CPR
+ idea in CoreUnfold.calcUnfoldingGuidance
+
+* Check with Simon M re srt on Lit
+
+* Make all primops return a data type so that we can't over-apply a primop
+ This makes code gen simpler. Currently the only primops with a polymorphic
+ return type are:
+ raise# :: a -> b
+ catch# :: a -> (b->a) -> a
+ tagToEnum# :: Int -> a
+
+ Very strange code for PrelException.catchException! What has STret got
+ to do with it?
+
+* Liberate case
+
+* Missing w/w for coerce in go2 functions of fibToList' in fibheaps
+
+* Watch out for re-boxing in workers; sometimes it happens
+ and then w/w is a Bad Thing
+
+* Only two uses of mkCompulsoryUnfolding -- try to nuke it
+
+* Note that mkDupAlt makes alts that have binders that
+ are guaranteed to appear just once or not at all
+ (a,b) -> j a
+ Same for case binder, but that's harder to take into account.
+
+* max :: Int -> Int -> Int could be CPRd but isn't.
+
+* In mandel2 we do a little less well than 4.04 because we aren't
+ inlining point_colour, and that means we have to box up an argument
+ before calling it. [This was due to a bug in 4.04]
+ There's also a great opportunity for liberateCase
+ in check_radius, where it loops around with two lazy F# built each time
+
+* In PrelShow.itos' we find a thunk like:
+ tpl = case chrzh {(zpzh {(remIntzh {x{-aMf-} 10}) 48})}
+ of tpl{-X1j-} __D P { __DEFAULT ->
+ PrelBase.Czh{-62,s-} {tpl{-X1j-}}
+ }
+ This is a pity. The remInt# can't fail because the divisor isn't 0,
+ so we could do the sum eagerly and allocate a charcter instead of a thunk.
+
+* It's good to do let-to-case before we wrap up. Consider
+ f b xs = let ys = partition isUpper xs
+ zs = case ys of (a,b) -> a
+ in case b of
+ True -> case ys of
+ (a,b) -> (zs,[])
+ False -> case ys of
+ (a,b) -> (zs ++ xs,[])
+ If we don't do let-to-case at all, we get 3 redundant case ys left.
+ On the other hand we don't want to do it too early, because it
+ prevents inlining into strict arg positions, which is important for
+ rules to work.
+
+* Strict dictionaries.
+
+* INLINE functions are not always INLINEd, so it's sad to leave
+ stuff in their bodies like constructors that havn't been inlined.
+
+* If let x = e in b is strict, then CPR can use the CPR info from x
+ This bites in the mod method of Integral Int
+
+* Inline wrappers if they are the RHS of a let, so that update in place
+ can happen?
+
+* Consider doing unboxing on strict constr args in a pattern match,
+ as part of w/w.
+
+* In spectral/expert/Search.ask there's a statically visible CSE. Catching this
+ depends almost entirely on chance, which is a pity.
+
+* Think about exprEtaExpandArity in WwLib. Perhaps eliminate eta expand in simplify?
+ Perhaps use even if no coerces etc, just eta expansion. (e.g. PrelArr.done)
+
+* In knights/KnightHeuristic, we don't find that possibleMoves is strict
+ (with important knock-on effects) unless we apply rules before floating
+ out the literal list [A,B,C...].
+ Similarly, in f_se (F_Cmp ...) in listcompr (but a smaller effect)
+
+* Floating can float the entire body of an INLINE thing out.
+ e.g. PrelArr.done
+ This is sad, and a bit stupid.
+
+* In spectral/multiplier, we have
+ xor = lift21 forceBit f
+ where f :: Bit -> Bit -> Bit
+ f 0 0 = 0
+ f 0 1 = 1
+ f 1 0 = 1
+ f 1 1 = 0
+ Trouble is, f is CPR'd, and that means that instead of returning
+ the constants I# 0, I# 1, it returns 0,1 and then boxes them.
+ So allocation goes up. I don't see a way around this.
+
+* spectral/hartel/parstof ends up saying
+ case (unpackCString "x") of { c:cs -> ... }
+ quite a bit. We should spot these and behave accordingly.
+
+* Try a different hashing algorithms in hashUFM. This might reduce long CSE lists
+ as well as making uniqAway faster.
+
+* [I'm not sure this is really important in the end.]
+ Don't float out partial applications in lvlMFE. E.g. (in hPutStr defn of shoveString)
+ \x -> case .. of
+ [] -> setBufWPtr a b
+ ...
+ setBufWPtr has arity 3. Floating it out is plain silly. And in this particular
+ case it's harmful, because it ends up preventing eta expansion on the \x.
+ That in turn leads to a big extra cost in hPutStr.
+
+ *** Try not doing lvlMFE on the body of a lambda and case alternative ***
+
+* PrelNumExtra.lhs we get three copies of dropTrailing0s. Too much inlining!
+ drop0 has cost 21, but gets a discount of 6 (3 * #constrs) for its arg.
+ With a keen-neess factor of 2, that makes a discount of 12. Add two for
+ the arguments and we get 21-12-2, which is just small enough to inline.
+ But that is plainly stupid.
+
+ Add one for cases; and decrease discount for constructors.
+
+* IO.hGetContents still doesn't see that it is strict in the handle.
+ Coerces still getting in the way.
+
+* Try not having really_interesting_cont (subsumed by changes in the
+ way guidance is calculated for inline things?)
+
+* Enumeration types in worker/wrapper for strictness analysis
+
+* This should be reported as an error:
+ data T k = MkT (k Int#)
+
+* Bogus report of overlapped pattern for
+ f (R {field = [c]}) = 1
+ f (R {}) = 2
+ This shows up for TyCon.maybeTyConSingleCon
+
+* > module Main( main ) where
+
+ > f :: String -> Int
+ > f "=<" = 0
+ > f "=" = 0
+
+ > g :: [Char] -> Int
+ > g ['=','<'] = 0
+ > g ['='] = 0
+
+ > main = return ()
+
+ For ``f'' the following is reported.
+
+ tmp.lhs:4:
+ Pattern match(es) are overlapped in the definition of function `f'
+ "=" = ...
+
+ There are no complaints for definition for ``g''.
+
+* Without -O I don't think we need change the module version
+ if the usages change; I forget why it changes even with -O
+
+* Record selectors for existential type; no good! What to do?
+ Record update doesn't make sense either.
+
+ Need to be careful when figuring out strictness, and when generating
+ worker-wrapper split.
+
+ Also when deriving.
+
+
+ Jan 2000
+ ~~~~~~~~
+
+A fairly big pile of work originally aimed at
+removing the Con form of Core expression, and replacing it with simple
+Lit form. However, I wanted to make sure that the resulting thing
+performed better than the original, so I ended up making an absolute
+raft of other changes.
+
+Removing the Con form of Core expressions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The big thing is that
+
+ For every constructor C there are now *two* Ids:
+
+ C is the constructor's *wrapper*. It evaluates and unboxes arguments
+ before calling $wC. It has a perfectly ordinary top-level defn
+ in the module defining the data type.
+
+ $wC is the constructor's *worker*. It is like a primop that simply
+ allocates and builds the constructor value. Its arguments are the
+ actual representation arguments of the constructor.
+
+ For every primop P there is *one* Id, its (curried) Id
+
+ Neither contructor worker Id nor the primop Id have a defminition anywhere.
+ Instead they are saturated during the core-to-STG pass, and the code generator
+ generates code for them directly. The STG language still has saturated
+ primops and constructor applications.
+
+* The Const type disappears, along with Const.lhs. The literal part
+ of Const.lhs reappears as Literal.lhs. Much tidying up in here,
+ to bring all the range checking into this one module.
+
+* I got rid of NoRep literals entirely. They just seem to be too much trouble.
+
+* Because Con's don't exist any more, the funny C { args } syntax
+ disappears from inteface files.
+
+* Every constructor, C, comes with a
+
+ *wrapper*, called C, whose type is exactly what it looks like
+ in the source program. It is an ordinary function,
+ and it gets a top-level binding like any other function
+
+ *worker*, called $wC, which is the actual data constructor.
+ Its type may be different to C, because:
+ - useless dict args are dropped
+ - strict args may be flattened
+ It does not have a binding.
+
+ The worker is very like a primop, in that it has no binding,
+
+
+Parsing
+~~~~~~~
+* Result type signatures now work
+ f :: Int -> Int = \x -> x
+ -- The Int->Int is the type of f
+
+ g x y :: Int = x+y
+ -- The Int is the type of the result of (g x y)
+
+
+Recompilation checking and make
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+* The .hi file for a modules is not touched if it doesn't change. (It used to
+ be touched regardless, forcing a chain of recompilations.) The penalty for this
+ is that we record exported things just as if they were mentioned in the body of
+ the module. And the penalty for that is that we may recompile a module when
+ the only things that have changed are the things it is passing on without using.
+ But it seems like a good trade.
+
+* -recomp is on by default
+
+Foreign declarations
+~~~~~~~~~~~~~~~~~~~~
+* If you say
+ foreign export zoo :: Int -> IO Int
+ then you get a C produre called 'zoo', not 'zzoo' as before.
+ I've also added a check that complains if you export (or import) a C
+ procedure whose name isn't legal C.
+
+
+Code generation and labels
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+* Now that constructor workers and wrappers have distinct names, there's
+ no need to have a Foo_static_closure and a Foo_closure for constructor Foo.
+ I nuked the entire StaticClosure story. This has effects in some of
+ the RTS headers (i.e. s/static_closure/closure/g)
+
+
+Rules, constant folding
+~~~~~~~~~~~~~~~~~~~~~~~
+* Constant folding becomes just another rewrite rule, attached to the Id for the
+ PrimOp. To achieve this, there's a new form of Rule, a BuiltinRule (see CoreSyn.lhs).
+ The prelude rules are in prelude/PrelRules.lhs, while simplCore/ConFold.lhs has gone.
+
+* Appending of constant strings now works, using fold/build fusion, plus
+ the rewrite rule
+ unpack "foo" c (unpack "baz" c n) = unpack "foobaz" c n
+ Implemented in PrelRules.lhs
+
+* The CCall primop is tidied up quite a bit. There is now a data type CCall,
+ defined in PrimOp, that packages up the info needed for a particular CCall.
+ There is a new Id for each new ccall, with an big "occurrence name"
+ {__ccall "foo" gc Int# -> Int#}
+ In interface files, this is parsed as a single Id, which is what it is, really.
+
+Miscellaneous
+~~~~~~~~~~~~~
+* There were numerous places where the host compiler's
+ minInt/maxInt was being used as the target machine's minInt/maxInt.
+ I nuked all of these; everything is localised to inIntRange and inWordRange,
+ in Literal.lhs
+
+* Desugaring record updates was broken: it didn't generate correct matches when
+ used withe records with fancy unboxing etc. It now uses matchWrapper.
+
+* Significant tidying up in codeGen/SMRep.lhs
+
+* Add __word, __word64, __int64 terminals to signal the obvious types
+ in interface files. Add the ability to print word values in hex into
+ C code.
+
+* PrimOp.lhs is no longer part of a loop. Remove PrimOp.hi-boot*
+
+
+Types
+~~~~~
+* isProductTyCon no longer returns False for recursive products, nor
+ for unboxed products; you have to test for these separately.
+ There's no reason not to do CPR for recursive product types, for example.
+ Ditto splitProductType_maybe.
+
+Simplification
+~~~~~~~~~~~~~~~
+* New -fno-case-of-case flag for the simplifier. We use this in the first run
+ of the simplifier, where it helps to stop messing up expressions that
+ the (subsequent) full laziness pass would otherwise find float out.
+ It's much more effective than previous half-baked hacks in inlining.
+
+ Actually, it turned out that there were three places in Simplify.lhs that
+ needed to know use this flag.
+
+* Make the float-in pass push duplicatable bindings into the branches of
+ a case expression, in the hope that we never have to allocate them.
+ (see FloatIn.sepBindsByDropPoint)
+
+* Arrange that top-level bottoming Ids get a NOINLINE pragma
+ This reduced gratuitous inlining of error messages.
+ But arrange that such things still get w/w'd.
+
+* Arrange that a strict argument position is regarded as an 'interesting'
+ context, so that if we see
+ foldr k z (g x)
+ then we'll be inclined to inline g; this can expose a build.
+
+* There was a missing case in CoreUtils.exprEtaExpandArity that meant
+ we were missing some obvious cases for eta expansion
+ Also improve the code when handling applications.
+
+* Make record selectors (identifiable by their IdFlavour) into "cheap" operations.
+ [The change is a 2-liner in CoreUtils.exprIsCheap]
+ This means that record selection may be inlined into function bodies, which
+ greatly improves the arities of overloaded functions.
+
+* Make a cleaner job of inlining "lone variables". There was some distributed
+ cunning, but I've centralised it all now in SimplUtils.analyseCont, which
+ analyses the context of a call to decide whether it is "interesting".
+
+* Don't specialise very small functions in Specialise.specDefn
+ It's better to inline it. Rather like the worker/wrapper case.
+
+* Be just a little more aggressive when floating out of let rhss.
+ See comments with Simplify.wantToExpose
+ A small change with an occasional big effect.
+
+* Make the inline-size computation think that
+ case x of I# x -> ...
+ is *free*.
+
+
+CPR analysis
+~~~~~~~~~~~~
+* Fix what was essentially a bug in CPR analysis. Consider
+
+ letrec f x = let g y = let ... in f e1
+ in
+ if ... then (a,b) else g x
+
+ g has the CPR property if f does; so when generating the final annotated
+ RHS for f, we must use an envt in which f is bound to its final abstract
+ value. This wasn't happening. Instead, f was given the CPR tag but g
+ wasn't; but of course the w/w pass gives rotten results in that case!!
+ (Because f's CPR-ness relied on g's.)
+
+ On they way I tidied up the code in CprAnalyse. It's quite a bit shorter.
+
+ The fact that some data constructors return a constructed product shows
+ up in their CPR info (MkId.mkDataConId) not in CprAnalyse.lhs
+
+
+
+Strictness analysis and worker/wrapper
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+* BIG THING: pass in the demand to StrictAnal.saExpr. This affects situations
+ like
+ f (let x = e1 in (x,x))
+ where f turns out to have strictness u(SS), say. In this case we can
+ mark x as demanded, and use a case expression for it.
+
+ The situation before is that we didn't "know" that there is the u(SS)
+ demand on the argument, so we simply computed that the body of the let
+ expression is lazy in x, and marked x as lazily-demanded. Then even after
+ f was w/w'd we got
+
+ let x = e1 in case (x,x) of (a,b) -> $wf a b
+
+ and hence
+
+ let x = e1 in $wf a b
+
+ I found a much more complicated situation in spectral/sphere/Main.shade,
+ which improved quite a bit with this change.
+
+* Moved the StrictnessInfo type from IdInfo to Demand. It's the logical
+ place for it, and helps avoid module loops
+
+* Do worker/wrapper for coerces even if the arity is zero. Thus:
+ stdout = coerce Handle (..blurg..)
+ ==>
+ wibble = (...blurg...)
+ stdout = coerce Handle wibble
+ This is good because I found places where we were saying
+ case coerce t stdout of { MVar a ->
+ ...
+ case coerce t stdout of { MVar b ->
+ ...
+ and the redundant case wasn't getting eliminated because of the coerce.
+
+
+
+End December
+~~~~~~~~~~~~
+* Fix a few renamer bugs
+
+* Substantially reorganise the Prelude to eliminate all orphan declarations.
+ Details in PrelBase.lhs
+
+* Do a much better job of appending literal strings
+ - remove NoRepStr
+ - move unpackCString stuff to PrelBase
+ - add BuiltinRules to the Rule type
+ - add fold/build rules for literal strings
+
+
+
+Week of Mon 25 Oct
+~~~~~~~~~~~~~~~~~~
+* Fix a terrible bug in Simplify.mkDupableAlt; we were duplicating a small
+ *InAlt*, but doing so invalidated occurrence info, which could lead to
+ substantial code duplication.
+
+* Fix a bug in WwLib.mkWWcpr; I was generating CPR wrappers like
+ I# (case x of ...)
+ which is utterly wrong. It should be
+ case x of ...(I# r)
+ (The effect was to make functions stricter than they really are.)
+
+* Try doing no inlining at all in phase 0. This noticeably improved
+ spectral/fish (esp Main.hs I think), by improving floating.
+ This single change has quite a large effect on some programs (allocation)
+
+ Don't inline Don't inline
+ wrappers anything
+ in phase 0 in phase 0
+ awards 113k -7.08%
+ cichelli 28962k -3.12%
+ wave4main 88089k +130.45%
+ fibheaps 31731k +19.01%
+ fish 8273k -1.64%
+ typecheck 148713k +4.91%
+
+ But I found that fish worked much better if we inline *local* things
+ in phase 0, but not *imported* things.
+
+* Fix a terrible bug in Simplify.mkLamBndrZapper. It was counting
+ type args in one place, but not type binders, so it was sometimes
+ inlining into unsaturated lambdas!
+
+* I found that there were some very bad loss-of-arity cases in PrelShow.
+ In particular, we had:
+
+ showl "" = showChar '"' s
+ showl ('"':xs) = showString "\\\"" . showl xs
+ showl (x:xs) = showLitChar x . showl xs
+
+ Trouble is, we get
+ showl = \xs -> case xs of
+ ...
+ (x:xs) -> let f = showLitChar x
+ g = showl xs
+ in \s -> f (g x)
+ which is TERRIBLE. We can't spot that showLitChar has arity 2 because
+ it looks like this:
+
+ ...other eqns...
+ showLitChar c = showString ('\\' : asciiTab!!ord c)
+
+ notice that the (asciiTab!!orc c) is outside the \s, so GHC can't rewrite it to
+
+ showLitChar c = \s -> showString ('\\' : asciiTab!!ord c) s
+
+ So I've changed PrelShow.showLitChar to use explicit \s. Even then, showl
+ doesn't work, because GHC can't see that showl xs can be pushed inside the \s.
+ So I've put an explict \s there too.
+
+ showl "" s = showChar '"' s
+ showl ('"':xs) s = showString "\\\"" (showl xs s)
+ showl (x:xs) s = showLitChar x (showl xs s)
+
+ Net result: imaginary/gen_regexps more than halves in allocation!
+
+ Turns out that the mkLamBndrZapper bug (above) meant that showl was
+ erroneously inlining showLitChar x and showl xs, which is why this
+ problem hasn't shown up before.
+
+* Improve CSE a bit. In ptic
+ case h x of y -> ...(h x)...
+ replaces (h x) by y.
+
+* Inline INLINE things very agressively, even though we get code duplication
+ thereby. Reason: otherwise we sometimes call the original un-inlined INLINE
+ defns, which have constructors etc still un-inlined in their RHSs. The
+ improvement is dramatic for a few programs:
+
+ typecheck 150865k -1.43%
+ wave4main 114216k -22.87%
+ boyer 28793k -7.86%
+ cichelli 33786k -14.28%
+ ida 59505k -1.79%
+ rewrite 14665k -4.91%
+ sched 17641k -4.22%
+
+ Code size increases by 10% which is not so good. There must be a better way.
+ Another bad thing showed up in fish/Main.hs. Here we have
+ (x1,y1) `vec_add` (x2,y2) = (x1+x2, y1+y2)
+ which tends to get inlined. But if we first inline (+), it looks big,
+ so we don't inline it. Sigh.
+
+
+* Don't inline constructors in INLINE RHSs. Ever. Otherwise rules don't match.
+ E.g. build
+
+* In ebnf2ps/Lexer.uncommentString, it would be a good idea to inline a constructor
+ that occurs once in each branch of a case. That way it doesn't get allocated
+ in the branches that don't use it. And in fact in this particular case
+ something else good happens. So CoreUnfold now does that.
+
+* Reverted to n_val_binders+2 in calcUnfoldingGuidance
+ Otherwise wrappers are inlined even if there's no benefit.
+
+
+Week of Mon 18 Oct
+~~~~~~~~~~
+* Arrange that simplConArgs works in one less pass than before.
+ This exposed a bug: a bogus call to completeBeta.
+
+* Add a top-level flag in CoreUnfolding, used in callSiteInline
+
+* Extend w/w to use etaExpandArity, so it does eta/coerce expansion
+
+* Don't float anything out of an INLINE.
+ Don't float things to top level unless they also escape a value lambda.
+ [see comments with SetLevels.lvlMFE
+ Without at least one of these changes, I found that
+ {-# INLINE concat #-}
+ concat = __inline (/\a -> foldr (++) [])
+ was getting floated to
+ concat = __inline( /\a -> lvl a )
+ lvl = ...inlined version of foldr...
+
+ Subsequently I found that not floating constants out of an INLINE
+ gave really bad code like
+ __inline (let x = e in \y -> ...)
+ so I now let things float out of INLINE
+
+* Implement inline phases. The meaning of the inline pragmas is
+ described in CoreUnfold.lhs
+
+* Implement the "reverse-mapping" idea for CSE; actually it turned out to be easier
+ to implement it in SetLevels, and may benefit full laziness too.
+
+Thurs 14 Oct
+~~~~~~~~~~~~
+* It's a good idea to inline inRange. Consider
+
+ index (l,h) i = case inRange (l,h) i of
+ True -> l+i
+ False -> error
+ inRange itself isn't strict in h, but if it't inlined then 'index'
+ *does* become strict in h. Interesting!
+
+* Big change to the way unfoldings and occurrence info is propagated in the simplifier
+ The plan is described in Subst.lhs with the Subst type
+ Occurrence info is now in a separate IdInfo field than user pragmas
+
+* I found that
+ (coerce T (coerce S (\x.e))) y
+ didn't simplify in one round. First we get to
+ (\x.e) y
+ and only then do the beta. Solution: cancel the coerces in the continuation
+
+* Amazingly, CoreUnfold wasn't counting the cost of a function an application.
+
+Early Oct
+~~~~~~~~~
+* No commas between for-alls in RULES
+
+* Disable rules in initial simplifier run. Otherwise full laziness
+ doesn't get a chance to lift out a MFE before a rule (e.g. fusion)
+ zaps it. queens is a case in point
+
+* Improve float-out stuff significantly. The big change is that if we have
+
+ \x -> ... /\a -> ...let p = ..a.. in let q = ...p...
+
+ where p's rhs doesn't x, we abstract a from p, so that we can get p past x.
+ (We did that before.) But we also substitute (p a) for p in q, and then
+ we can do the same thing for q. (We didn't do that, so q got stuck.)
+ This is much better. It involves doing a substitution "as we go" in SetLevels,
+ though.
+
+
+Weds 15 Sept
+~~~~~~~~~~~~
+* exprIsDupable for an application (f e1 .. en) wasn't calling exprIsDupable
+ on the arguments!! So applications with few, but large, args were being dupliated.
+
+* sizeExpr on an application wasn't doing a nukeScrutDiscount on the arg of
+ an application!! So bogus discounts could accumulate from arguments!
+
+* Improve handling of INLINE pragmas in calcUnfoldingGuidance. It was really
+ wrong before
+
+* Substantially improve handling of coerces in worker/wrapper
+
+Tuesday 6 June
+~~~~~~~~~~~~~~
+* Fix Kevin Atkinson's cant-find-instance bug. Turns out that Rename.slurpSourceRefs
+ needs to repeatedly call getImportedInstDecls, and then go back to slurping
+ source-refs. Comments with Rename.slurpSourceRefs.
+
+* Add a case to Simplify.mkDupableAlt for the quite-common case where there's
+ a very simple alternative, in which case there's no point in creating a
+ join-point binding.
+
+* Fix CoreUtils.exprOkForSpeculation so that it returns True of (==# a# b#).
+ This lack meant that
+ case ==# a# b# of { True -> x; False -> x }
+ was not simplifying
+
+* Make float-out dump bindings at the top of a function argument, as
+ at the top of a let(rec) rhs. See notes with FloatOut.floatRhs
+
+* Make the ArgOf case of mkDupableAlt generate a OneShot lambda.
+ This gave a noticeable boost to spectral/boyer2
+
+
+Monday 5 June
+~~~~~~~~~~~~~
+Work, using IO.hPutStr as an example, to reduce the number of coerces.
+The main idea is in WwLib.mkWWcoerce. The gloss is that we must do
+the w/w split even for small non-recursive things. See notes with
+WorkWrap.tryWw.
+
+
+Friday 2 June
+~~~~~~~~~~~~~
+Study why gen_regexps is slower than before. Problem is in IO.writeLines,
+in particular the local defn shoveString. Two things are getting
+in the way of arity expansion, which means we build far more function
+closures than we should:
+ shove = \ x -> let lvl = \s -> ...
+ in \s -> ... lvl ...
+
+The two things are:
+ a) coerces
+ b) full laziness floats
+
+
+Solution to (a): add coerces to the worker/wrapper stuff.
+See notes with WwLib.mkWWcoerce.
+
+This further complicated getWorkerId, so I finally bit the bullet and
+make the workerInfo field of the IdInfo work properly, including
+under substitutions. Death to getWorkerId.
+
+
+
+Solution to (b): make all lambdas over realWorldStatePrimTy
+into one-shot lambdas. This is a GROSS HACK.
+
+* Also make the occurrence analyser aware of one-shot lambdas.
+
+
+Thurs 1 June
+~~~~~~~~~~~~
+Fix SetLevels so that it does not clone top-level bindings, but it
+*does* clone bindings that are destined for the top level.
+
+The global invariant is that the top level bindings are always
+unique, and never cloned.