summaryrefslogtreecommitdiff
path: root/compiler/cmm/CmmCommonBlockElim.hs
Commit message (Collapse)AuthorAgeFilesLines
* Module hierarchy: Cmm (cf #13009)Sylvain Henry2020-01-251-320/+0
|
* Remove unused imports of the form 'import foo ()' (Fixes #17065)James Foster2019-08-151-1/+0
| | | | | | | | | | | These kinds of imports are necessary in some cases such as importing instances of typeclasses or intentionally creating dependencies in the build system, but '-Wunused-imports' can't detect when they are no longer needed. This commit removes the unused ones currently in the code base (not including test files or submodules), with the hope that doing so may increase parallelism in the build system by removing unnecessary dependencies.
* Fix unused-import warningsDavid Eichmann2018-11-221-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a fairly long-standing bug (dating back to 2015) in RdrName.bestImport, namely commit 9376249b6b78610db055a10d05f6592d6bbbea2f Author: Simon Peyton Jones <simonpj@microsoft.com> Date: Wed Oct 28 17:16:55 2015 +0000 Fix unused-import stuff in a better way In that patch got the sense of the comparison back to front, and thereby failed to implement the unused-import rules described in Note [Choosing the best import declaration] in RdrName This led to Trac #13064 and #15393 Fixing this bug revealed a bunch of unused imports in libraries; the ones in the GHC repo are part of this commit. The two important changes are * Fix the bug in bestImport * Modified the rules by adding (a) in Note [Choosing the best import declaration] in RdrName Reason: the previosu rules made Trac #5211 go bad again. And the new rule (a) makes sense to me. In unravalling this I also ended up doing a few other things * Refactor RnNames.ImportDeclUsage to use a [GlobalRdrElt] for the things that are used, rather than [AvailInfo]. This is simpler and more direct. * Rename greParentName to greParent_maybe, to follow GHC naming conventions * Delete dead code RdrName.greUsedRdrName Bumps a few submodules. Reviewers: hvr, goldfire, bgamari, simonmar, jrtc27 Subscribers: rwbarton, carter Differential Revision: https://phabricator.haskell.org/D5312
* Optimizations for CmmBlockElim.klebinger.andreas@gmx.at2018-06-021-17/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Use toBlockList instead of revPostorder. Block elimination works on a given Cmm graph by: * Getting a list of blocks. * Looking for duplicates in these blocks. * Removing all but one instance of duplicates. There are two (reasonable) ways to get the list of blocks. * The fast way: `toBlockList` This just flattens the underlying map into a list. * The convenient way: `revPostorder` Start at the entry label, scan for reachable blocks and return only these. This has the advantage of removing all dead code. If there is dead code the later is better. Work done on unreachable blocks is clearly wasted work. However by the point we run the common block elimination pass the input graph already had all dead code removed. This is done during control flow optimization in CmmContFlowOpt which is our first Cmm pass. This means common block elimination is free to use toBlockList because revPostorder would return the same blocks. (Although in a different order). * Change the triemap used for grouping by a label list from `(TM.ListMap UniqDFM)` to `ListMap (GenMap LabelMap)`. * Using GenMap offers leaf compression. Which is a trie optimization described by the Note [Compressed TrieMap] in CoreSyn/TrieMap.hs * Using LabelMap removes the overhead associated with UniqDFM. This is deterministic since if we have the same input keys the same LabelMap will be constructed. Test Plan: ci, profiling output Reviewers: bgamari, simonmar Reviewed By: bgamari Subscribers: dfeuer, thomie, carter GHC Trac Issues: #15103 Differential Revision: https://phabricator.haskell.org/D4597
* Allow CmmLabelDiffOff with different widthsSimon Marlow2018-05-161-1/+1
| | | | | | | | | | | | | | | | | | Summary: This change makes it possible to generate a static 32-bit relative label offset on x86_64. Currently we can only generate word-sized label offsets. This will be used in D4634 to shrink info tables. See D4632 for more details. Test Plan: See D4632 Reviewers: bgamari, niteria, michalt, erikd, jrtc27, osa1 Subscribers: thomie, carter Differential Revision: https://phabricator.haskell.org/D4633
* Revert "CmmPipeline: add a second pass of CmmCommonBlockElim"Michal Terepeta2018-04-131-4/+4
| | | | | | | | | | | | | | | | This reverts commit d5c4d46a62ce6a0cfa6440344f707136eff18119. Please see #14989 for details. Test Plan: ./validate Reviewers: bgamari, simonmar Subscribers: thomie, carter GHC Trac Issues: #14989 Differential Revision: https://phabricator.haskell.org/D4577
* CmmPipeline: add a second pass of CmmCommonBlockElimMichal Terepeta2018-03-271-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sinking pass often gets rid of unnecessary registers registers/assignements exposing more opportunities for CBE, so this commit adds a second round of CBE after the sinking pass and should fix #12915 (and some examples in #14226). Nofib results: * Binary size: 0.9% reduction on average * Compile allocations: 0.7% increase on average * Runtime: noisy, two separate runs of nofib showed a tiny reduction on average, (~0.2-0.3%), but I think this is mostly noise * Compile time: very noisy, but generally within +/- 0.5% (one run faster, one slower) One interesting part of this change is that running CBE invalidates results of proc-point analysis. But instead of re-doing the whole analysis, we can use the map that CBE creates for replacing/comparing block labels (maps a redundant label to a useful one) to update the results of proc-point analysis. This lowers the overhead compared to the previous experiment in #12915. Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: ./validate Reviewers: bgamari, simonmar Reviewed By: simonmar Subscribers: rwbarton, thomie, carter GHC Trac Issues: #12915, #14226 Differential Revision: https://phabricator.haskell.org/D4417
* Hoopl: improve postorder calculationMichal Terepeta2018-03-191-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Fix the naming and comments to indicate that we are calculating *reverse* postorder (and not the standard postorder). - Rewrite the calculation to avoid CPS code. I found it fairly difficult to understand and the new one seems faster (according to nofib, decreases compiler allocations by 0.2%) - Remove `LabelsPtr`, which seems unnecessary and could be *really* confusing. For instance, previously: `postorder_dfs_from <block with label X>` and `postorder_dfs_from <label X>` would actually mean quite different things (and give different results). - Change the `Dataflow` module to always use entry of the graph for reverse postorder calculation. This should be the only change in behavior of this commit. Previously, if the caller provided initial facts for some of the labels, we would use those labels for our postorder calculation. However, I don't think that's correct in general - if the initial facts did not contain the entry of the graph, we would never analyze the blocks reachable from the entry but unreachable from the labels provided with the initial facts. It seems that the only analysis that used this was proc-point analysis, which I think would always include the entry block (so I don't think there's any bug due to this). Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: ./validate Reviewers: bgamari, simonmar Reviewed By: simonmar Subscribers: rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4464
* cmm/: Avoid using lazy left foldsMichal Terepeta2018-03-061-1/+2
| | | | | | | | | | | | | | | | | | This basically replaces all uses of `foldl` with `foldl'`. I've looked at all the call sites and there doesn't seem to be any reason to prefer the lazy version. Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: ./validate Reviewers: bgamari, simonmar Reviewed By: bgamari Subscribers: rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4463
* CBE: re-introduce bgamari's fixesMichal Terepeta2018-02-181-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | During some recent work on CBE we discovered that `zipWith` is used to check for equality, but that doesn't quite work if lists are of different lengths! This was fixed by bgamari, but unfortunately the fix had to be rolled back due to other changes in CBE in 50adbd7c5fe5894d3e6e2a58b353ed07e5f8949d. Since I wanted to have another look at CBE anyway, we agreed that the first thing to do would be to re-introduce the fix. Sadly I don't have any actual test case that would exercise this. Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: ./validate Reviewers: bgamari, simonmar Reviewed By: bgamari Subscribers: rwbarton, thomie, carter GHC Trac Issues: #14226 Differential Revision: https://phabricator.haskell.org/D4387
* cmm: Revert more aggressive CBE due to #14226Ben Gamari2018-02-031-235/+85
| | | | | | | | | | | | | | | | | Trac #14226 noted that the C-- CBE pass frequently fails to common up semantically identical blocks due to the differences in local register naming. These patches fixed this by making the pass consider equality up to alpha-renaming. However, the new logic failed to consider the possibility that local register naming *may* matter across multiple blocks. This lead to the regression #14754. I'll need to do a bit of thinking on a proper solution to this but in the meantime I'm reverting all four patches. This reverts commit a27056f9823f8bbe2302f1924b3ab38fd6752e37. This reverts commit 6f990c54f922beae80362fe62426beededc21290. This reverts commit 9aa73892e10e90a1799b9277da593e816a827364. This reverts commit 7920a7d9c53083b234e060a3e72f00b601a46808.
* Hoopl.Collections: change right folds to strict left foldsMichal Terepeta2018-02-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | It seems that most uses of these folds should be strict left folds (I could only find a single place that benefits from a right fold). So this removes the existing `setFold`/`mapFold`/`mapFoldWihKey` replaces them with: - `setFoldl`/`mapFoldl`/`mapFoldlWithKey` (strict left folds) - `setFoldr`/`mapFoldr` (for the less common case where a right fold actually makes sense, e.g., `CmmProcPoint`) Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: ./validate Reviewers: bgamari, simonmar Reviewed By: bgamari Subscribers: rwbarton, thomie, carter, kavon Differential Revision: https://phabricator.haskell.org/D4356
* cmm/CBE: Fix a few more zip usesBen Gamari2017-11-061-3/+8
| | | | | | | | | | | | | | | Ensure that we don't consider lists of equal length to be equal when they are not. I noticed these while working on the fix for #14361. Reviewers: austin, simonmar, michalt Reviewed By: michalt Subscribers: rwbarton, thomie GHC Trac Issues: #14361 Differential Revision: https://phabricator.haskell.org/D4153
* cmm/CBE: Fix comparison between blocks of different lengthsBen Gamari2017-11-061-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously CBE computed equality by taking the lists of middle nodes of the blocks being compared and zipping them together. It would then map over this list with the equality relation, and accumulate the result. However, this is completely wrong: Consider what will happen when we compare a block with no middle nodes with one with one or more. The result of `zip` will be empty and consequently the pass may conclude that the two are indeed equivalent (if their last nodes also match). This is very bad and the cause of #14361. The solution I chose was just to write out an explicit recursion, like I distinctly recall considering doing when I first wrote this code. Unfortunately I was feeling clever at the time. Unfortunately this case was just rare enough not to be triggered by the testsuite. I still need to find a testcase that doesn't have external dependencies. Test Plan: Need to find a more minimal testcase Reviewers: austin, simonmar, michalt Reviewed By: michalt Subscribers: michalt, rwbarton, thomie, hvr GHC Trac Issues: #14361 Differential Revision: https://phabricator.haskell.org/D4152
* cmm/CBE: Use foldLocalRegsDefdBen Gamari2017-09-211-74/+94
| | | | | | | | | | | | | | | | | | Simonpj suggested this as a follow-on to #14226 to avoid code duplication. This also gives us the ability to CBE cases involving foreign calls for free. Test Plan: Validate Reviewers: austin, simonmar, simonpj Reviewed By: simonpj Subscribers: michalt, simonpj, rwbarton, thomie GHC Trac Issues: #14226 Differential Revision: https://phabricator.haskell.org/D3999
* cmm/CBE: Collapse blocks equivalent up to alpha renaming of local registersBen Gamari2017-09-191-58/+179
| | | | | | | | | | | | | | | | | | | | | | | As noted in #14226, the common block elimination pass currently implements an extremely strict equivalence relation, demanding that two blocks are equivalent including the names of their local registers. This is quite restrictive and severely hampers the effectiveness of the pass. Here we allow the CBE pass to collapse blocks which are equivalent up to alpha renaming of locally-bound local registers. This is completely safe and catches many more duplicate blocks. Test Plan: Validate Reviewers: austin, simonmar, michalt Reviewed By: michalt Subscribers: rwbarton, thomie GHC Trac Issues: #14226 Differential Revision: https://phabricator.haskell.org/D3973
* compiler: introduce custom "GhcPrelude" PreludeHerbert Valerio Riedel2017-09-191-1/+2
| | | | | | | | | | | | | | | | | | This switches the compiler/ component to get compiled with -XNoImplicitPrelude and a `import GhcPrelude` is inserted in all modules. This is motivated by the upcoming "Prelude" re-export of `Semigroup((<>))` which would cause lots of name clashes in every modulewhich imports also `Outputable` Reviewers: austin, goldfire, bgamari, alanz, simonmar Reviewed By: bgamari Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari Differential Revision: https://phabricator.haskell.org/D3989
* Hoopl: remove dependency on Hoopl packageMichal Terepeta2017-06-231-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This copies the subset of Hoopl's functionality needed by GHC to `cmm/Hoopl` and removes the dependency on the Hoopl package. The main motivation for this change is the confusing/noisy interface between GHC and Hoopl: - Hoopl has `Label` which is GHC's `BlockId` but different than GHC's `CLabel` - Hoopl has `Unique` which is different than GHC's `Unique` - Hoopl has `Unique{Map,Set}` which are different than GHC's `Uniq{FM,Set}` - GHC has its own specialized copy of `Dataflow`, so `cmm/Hoopl` is needed just to filter the exposed functions (filter out some of the Hoopl's and add the GHC ones) With this change, we'll be able to simplify this significantly. It'll also be much easier to do invasive changes (Hoopl is a public package on Hackage with users that depend on the current behavior) This should introduce no changes in functionality - it merely copies the relevant code. Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: ./validate Reviewers: austin, bgamari, simonmar Reviewed By: bgamari, simonmar Subscribers: simonpj, kavon, rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3616
* CmmCommonBlockElim: Ignore CmmUnwind nodesBen Gamari2017-01-101-1/+1
| | | | | | | | | | | | | | | | We don't want unwind information to affect the code we produce. Consequently we need to ensure that CBE ignores unwind nodes for the purposes of equality. Test Plan: Validate Reviewers: scpmw, simonmar, austin Reviewed By: simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2739
* BlockId: remove BlockMap and BlockSet synonymsMichal Terepeta2016-12-081-4/+4
| | | | | | | | | | | | | | | | | | | | This continues removal of `BlockId` module in favor of Hoopl's `Label`. Most of the changes here are mechanical, apart from the orphan `Outputable` instances for `LabelMap` and `LabelSet`. For now I've moved them to `cmm/Hoopl`, since it's already trying to manage all imports from Hoopl (to avoid any collisions). Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: validate Reviewers: bgamari, austin, simonmar Reviewed By: simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2800
* Document some codegen nondeterminismBartosz Nitka2016-07-071-1/+2
| | | | | | | | | Bit-for-bit reproducible binaries are not a goal for now, so this is just marking places that could be a problem. Doing this will allow eltsUFM to be removed and will leave only nonDetEltsUFM. GHC Trac: #4012
* Delete Ord UniqueBartosz Nitka2016-06-301-3/+5
| | | | | | | | | | | | | | | | | | | | | | Ord Unique can be a source of invisible, accidental nondeterminism as explained in Note [No Ord for Unique]. This removes it, leaving a note with rationale. It's unfortunate that I had to write Ord instances for codegen data structures by hand, but I believe that it's a right trade-off here. Test Plan: ./validate Reviewers: simonmar, austin, bgamari Reviewed By: simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2370 GHC Trac Issues: #4012
* Annotate CmmBranch with an optional likely targetSimon Marlow2015-09-231-3/+3
| | | | | | | | | | | | | | | | | Summary: This allows the code generator to give hints to later code generation steps about which branch is most likely to be taken. Right now it is only taken into account in one place: a special case in CmmContFlowOpt that swapped branches over to maximise the chance of fallthrough, which is now disabled when there is a likelihood setting. Test Plan: validate Reviewers: austin, simonpj, bgamari, ezyang, tibbe Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1273
* CmmCommonBlockElim: Improve hash functionJoachim Breitner2015-05-181-45/+26
| | | | | | | | | | | | | | | | Previously, the hash function used to cut down the number of block comparisons did not take local registers into account, causing far too many similar, but different bocks to be considered candidates for the (expensive!) comparision. Adding register to the hash takes CmmCommonBlockElim's share of the runtime of the example in #10397 from 17% to 2.5%, and eliminates all unwanted hash collisions. This patch also replaces the fancy trie by a plain Data.Map. It turned out to be not performance critical, so this simplifies the code. Differential Revision: https://phabricator.haskell.org/D896
* Speed up elimCommonBlocks by grouping blocks also by outgoing labelsJoachim Breitner2015-05-161-31/+112
| | | | | | | | | This is an attempt to improve the situation described in #10397, where the linear scan of possible candidates for commoning up is far too expensive. There is (ever) more room for improvement, but this is a start. Differential Revision: https://phabricator.haskell.org/D892
* Refactor the story around switches (#10137)Joachim Breitner2015-03-301-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This re-implements the code generation for case expressions at the Stg → Cmm level, both for data type cases as well as for integral literal cases. (Cases on float are still treated as before). The goal is to allow for fancier strategies in implementing them, for a cleaner separation of the strategy from the gritty details of Cmm, and to run this later than the Common Block Optimization, allowing for one way to attack #10124. The new module CmmSwitch contains a number of notes explaining this changes. For example, it creates larger consecutive jump tables than the previous code, if possible. nofib shows little significant overall improvement of runtime. The rather large wobbling comes from changes in the code block order (see #8082, not much we can do about it). But the decrease in code size alone makes this worthwhile. ``` Program Size Allocs Runtime Elapsed TotalMem Min -1.8% 0.0% -6.1% -6.1% -2.9% Max -0.7% +0.0% +5.6% +5.7% +7.8% Geometric Mean -1.4% -0.0% -0.3% -0.3% +0.0% ``` Compilation time increases slightly: ``` -1 s.d. ----- -2.0% +1 s.d. ----- +2.5% Average ----- +0.3% ``` The test case T783 regresses a lot, but it is the only one exhibiting any regression. The cause is the changed order of branches in an if-then-else tree, which makes the hoople data flow analysis traverse the blocks in a suboptimal order. Reverting that gets rid of this regression, but has a consistent, if only very small (+0.2%), negative effect on runtime. So I conclude that this test is an extreme outlier and no reason to change the code. Differential Revision: https://phabricator.haskell.org/D720
* Add unwind information to CmmPeter Wortmann2014-12-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Unwind information allows the debugger to discover more information about a program state, by allowing it to "reconstruct" other states of the program. In practice, this means that we explain to the debugger how to unravel stack frames, which comes down mostly to explaining how to find their Sp and Ip register values. * We declare yet another new constructor for CmmNode - and this time there's actually little choice, as unwind information can and will change mid-block. We don't actually make use of these capabilities, and back-end support would be tricky (generate new labels?), but it feels like the right way to do it. * Even though we only use it for Sp so far, we allow CmmUnwind to specify unwind information for any register. This is pretty cheap and could come in useful in future. * We allow full CmmExpr expressions for specifying unwind values. The advantage here is that we don't have to make up new syntax, and can e.g. use the WDS macro directly. On the other hand, the back-end will now have to simplify the expression until it can sensibly be converted into DWARF byte code - a process which might fail, yielding NCG panics. On the other hand, when you're writing Cmm by hand you really ought to know what you're doing. (From Phabricator D169)
* Tick scopesPeter Wortmann2014-12-161-6/+13
| | | | | | | | | | | | | | | | | | | | | | This patch solves the scoping problem of CmmTick nodes: If we just put CmmTicks into blocks we have no idea what exactly they are meant to cover. Here we introduce tick scopes, which allow us to create sub-scopes and merged scopes easily. Notes: * Given that the code often passes Cmm around "head-less", we have to make sure that its intended scope does not get lost. To keep the amount of passing-around to a minimum we define a CmmAGraphScoped type synonym here that just bundles the scope with a portion of Cmm to be assembled later. * We introduce new scopes at somewhat random places, aligning with getCode calls. This works surprisingly well, but we might have to add new scopes into the mix later on if we find things too be too coarse-grained. (From Phabricator D169)
* Source notes (Cmm support)Peter Wortmann2014-12-161-4/+31
| | | | | | | | | | | | | | | | | | | | This patch adds CmmTick nodes to Cmm code. This is relatively straight-forward, but also not very useful, as many blocks will simply end up with no annotations whatosever. Notes: * We use this design over, say, putting ticks into the entry node of all blocks, as it seems to work better alongside existing optimisations. Now granted, the reason for this is that currently GHC's main Cmm optimisations seem to mainly reorganize and merge code, so this might change in the future. * We have the Cmm parser generate a few source notes as well. This is relatively easy to do - worst part is that it complicates the CmmParse implementation a bit. (From Phabricator D169)
* Fix a bug in stack layout with safe foreign calls (#8083)Simon Marlow2013-07-241-1/+1
| | | | | | | We weren't properly tracking the number of stack arguments in the continuation of a foreign call. It happened to work when the continuation was not a join point, but when it was a join point we were using the wrong amount of stack fixup.
* Remove warning-suppression (not needed)Simon Peyton Jones2013-03-091-5/+0
|
* Add Cmm support for representing 128-bit-wide SIMD vectors.Geoffrey Mainland2013-02-011-0/+1
|
* Track liveness of GlobalRegs in the new code generatorSimon Marlow2012-07-091-3/+3
| | | | | | This gives the register allocator access to R1.., F1.., D1.. etc. for the new code generator, and is a cheap way to eliminate all the extra "x = R1" assignments that we get from copyIn.
* Remove "fuel", adapt to Hoopl changes, fix warningsSimon Marlow2012-07-051-3/+2
|
* Improve common-block eliminationSimon Marlow2012-03-071-2/+42
| | | | | | | | We need to compare middle nodes and expressions modulo the BlockId mapping too, because there are references to BlockIds in CmmStackSlot and CmmBlock. This lets us catch more common blocks - in particular we can share the heap-check fail code between multiple case alternatives, which is most cool.
* Fix a bug in common block eliminationSimon Marlow2012-03-071-1/+2
| | | | | When we had more than two identical blocks, we weren't eliminating all the duplicates properly.
* SnapshotSimon Marlow2012-01-171-8/+9
|
* More codegen refactoring with simonpjSimon Marlow2011-12-191-42/+34
|
* Clarify some commentsIan Lynagh2011-11-051-1/+2
|
* Snapshot of codegen refactoring to share with simonpjSimon Marlow2011-08-251-1/+1
|
* Merge in new code generator branch.Simon Marlow2011-01-241-0/+174
This changes the new code generator to make use of the Hoopl package for dataflow analysis. Hoopl is a new boot package, and is maintained in a separate upstream git repository (as usual, GHC has its own lagging darcs mirror in http://darcs.haskell.org/packages/hoopl). During this merge I squashed recent history into one patch. I tried to rebase, but the history had some internal conflicts of its own which made rebase extremely confusing, so I gave up. The history I squashed was: - Update new codegen to work with latest Hoopl - Add some notes on new code gen to cmm-notes - Enable Hoopl lag package. - Add SPJ note to cmm-notes - Improve GC calls on new code generator. Work in this branch was done by: - Milan Straka <fox@ucw.cz> - John Dias <dias@cs.tufts.edu> - David Terei <davidterei@gmail.com> Edward Z. Yang <ezyang@mit.edu> merged in further changes from GHC HEAD and fixed a few bugs.