summaryrefslogtreecommitdiff
path: root/compiler/cmm/CmmPipeline.hs
Commit message (Collapse)AuthorAgeFilesLines
* Module hierarchy: Cmm (cf #13009)Sylvain Henry2020-01-251-367/+0
|
* Fix typos, via a Levenshtein-style correctorBrian Wignall2020-01-041-1/+1
|
* Add GHC-API logging hooksSylvain Henry2019-12-181-12/+13
| | | | | | | | | | | | | | | | | | | | | | | * Add 'dumpAction' hook to DynFlags. It allows GHC API users to catch dumped intermediate codes and information. The format of the dump (Core, Stg, raw text, etc.) is now reported allowing easier automatic handling. * Add 'traceAction' hook to DynFlags. Some dumps go through the trace mechanism (for instance unfoldings that have been considered for inlining). This is problematic because: 1) dumps aren't written into files even with -ddump-to-file on 2) dumps are written on stdout even with GHC API 3) in this specific case, dumping depends on unsafe globally stored DynFlags which is bad for GHC API users We introduce 'traceAction' hook which allows GHC API to catch those traces and to avoid using globally stored DynFlags. * Avoid dumping empty logs via dumpAction/traceAction (but still write empty files to keep the existing behavior)
* Make dynflag argument for withTiming pure.Andreas Klebinger2019-10-231-1/+1
| | | | | | | | | | | | 19 times out of 20 we already have dynflags in scope. We could just always use `return dflags`. But this is in fact not free. When looking at some STG code I noticed that we always allocate a closure for this expression in the heap. Clearly a waste in these cases. For the other cases we can either just modify the callsite to get dynflags or use the _D variants of withTiming I added which will use getDynFlags under the hood.
* ErrUtils: split withTiming into withTiming and withTimingSilentAlp Mestanogullari2019-09-191-1/+1
| | | | | | | | | | | | | | | | | | | 'withTiming' becomes a function that, when passed '-vN' (N >= 2) or '-ddump-timings', will print timing (and possibly allocations) related information. When additionally built with '-eventlog' and executed with '+RTS -l', 'withTiming' will also emit both 'traceMarker' and 'traceEvent' events to the eventlog. 'withTimingSilent' on the other hand will never print any timing information, under any circumstance, and will only emit 'traceEvent' events to the eventlog. As pointed out in !1672, 'traceMarker' is better suited for things that we might want to visualize in tools like eventlog2html, while 'traceEvent' is better suited for internal events that occur a lot more often and that we don't necessarily want to visualize. This addresses #17138 by using 'withTimingSilent' for all the codegen bits that are expressed as a bunch of small computations over streams of codegen ASTs.
* compiler: emit finer grained codegen events to eventlogAlp Mestanogullari2019-08-021-1/+5
|
* Change behaviour of -ddump-cmm-verbose to dump each Cmm pass output to a ↵nineonine2019-07-261-6/+7
| | | | separate file and add -ddump-cmm-verbose-by-proc to keep old behaviour (#16930)
* Add two CmmSwitch optimizations.Andreas Klebinger2019-07-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move switch expressions into a local variable when generating switches. This avoids duplicating the expression if we translate the switch to a tree search. This fixes #16933. Further we now check if all branches of a switch have the same destination, replacing the switch with a direct branch if that is the case. Both of these patterns appear in the ENTER macro used by the RTS but are unlikely to occur in intermediate Cmm generated by GHC. Nofib result summary: -------------------------------------------------------------------------------- Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- Min -0.0% -0.0% -15.7% -15.6% 0.0% Max -0.0% 0.0% +5.4% +5.5% 0.0% Geometric Mean -0.0% -0.0% -1.0% -1.0% -0.0% Compiler allocations go up slightly: +0.2% Example output before and after the change taken from RTS code below. All but one of the memory loads `I32[_c3::I64 - 8]` are eliminated. Instead the data is loaded once from memory in block c6. Also the switch in block `ud` in the original code has been eliminated completely. Cmm without this commit: ``` stg_ap_0_fast() { // [R1] { [] } {offset ca: _c1::P64 = R1; // CmmAssign goto c2; // CmmBranch c2: if (_c1::P64 & 7 != 0) goto c4; else goto c6; c6: _c3::I64 = I64[_c1::P64]; if (I32[_c3::I64 - 8] < 26 :: W32) goto ub; else goto ug; ub: if (I32[_c3::I64 - 8] < 15 :: W32) goto uc; else goto ue; uc: if (I32[_c3::I64 - 8] < 8 :: W32) goto c7; else goto ud; ud: switch [8 .. 14] (%MO_SS_Conv_W32_W64(I32[_c3::I64 - 8])) { case 8, 9, 10, 11, 12, 13, 14 : goto c4; } ue: if (I32[_c3::I64 - 8] >= 25 :: W32) goto c4; else goto uf; uf: if (%MO_SS_Conv_W32_W64(I32[_c3::I64 - 8]) != 23) goto c7; else goto c4; c4: R1 = _c1::P64; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; ug: if (I32[_c3::I64 - 8] < 28 :: W32) goto uh; else goto ui; uh: if (I32[_c3::I64 - 8] < 27 :: W32) goto c7; else goto c8; ui: if (I32[_c3::I64 - 8] < 29 :: W32) goto c8; else goto c7; c8: _c1::P64 = P64[_c1::P64 + 8]; goto c2; c7: R1 = _c1::P64; call (_c3::I64)(R1) args: 8, res: 0, upd: 8; } } ``` Cmm with this commit: ``` stg_ap_0_fast() { // [R1] { [] } {offset ca: _c1::P64 = R1; goto c2; c2: if (_c1::P64 & 7 != 0) goto c4; else goto c6; c6: _c3::I64 = I64[_c1::P64]; _ub::I64 = %MO_SS_Conv_W32_W64(I32[_c3::I64 - 8]); if (_ub::I64 < 26) goto uc; else goto uh; uc: if (_ub::I64 < 15) goto ud; else goto uf; ud: if (_ub::I64 < 8) goto c7; else goto c4; uf: if (_ub::I64 >= 25) goto c4; else goto ug; ug: if (_ub::I64 != 23) goto c7; else goto c4; c4: R1 = _c1::P64; call (P64[Sp])(R1) args: 8, res: 0, upd: 8; uh: if (_ub::I64 < 28) goto ui; else goto uj; ui: if (_ub::I64 < 27) goto c7; else goto c8; uj: if (_ub::I64 < 29) goto c8; else goto c7; c8: _c1::P64 = P64[_c1::P64 + 8]; goto c2; c7: R1 = _c1::P64; call (_c3::I64)(R1) args: 8, res: 0, upd: 8; } } ```
* Move 'Platform' to ghc-bootJohn Ericson2019-06-191-1/+1
| | | | | | | ghc-pkg needs to be aware of platforms so it can figure out which subdire within the user package db to use. This is admittedly roundabout, but maybe Cabal could use the same notion of a platform as GHC to good affect too.
* PPC NCG: Remove Darwin supportPeter Trommler2019-01-011-7/+0
| | | | | | | Support for Mac OS X on PowerPC has been dropped by Apple years ago. We follow suit and remove PowerPC support for Darwin. Fixes #16106.
* NCG: New code layout algorithm.Andreas Klebinger2018-11-171-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch implements a new code layout algorithm. It has been tested for x86 and is disabled on other platforms. Performance varies slightly be CPU/Machine but in general seems to be better by around 2%. Nofib shows only small differences of about +/- ~0.5% overall depending on flags/machine performance in other benchmarks improved significantly. Other benchmarks includes at least the benchmarks of: aeson, vector, megaparsec, attoparsec, containers, text and xeno. While the magnitude of gains differed three different CPUs where tested with all getting faster although to differing degrees. I tested: Sandy Bridge(Xeon), Haswell, Skylake * Library benchmark results summarized: * containers: ~1.5% faster * aeson: ~2% faster * megaparsec: ~2-5% faster * xml library benchmarks: 0.2%-1.1% faster * vector-benchmarks: 1-4% faster * text: 5.5% faster On average GHC compile times go down, as GHC compiled with the new layout is faster than the overhead introduced by using the new layout algorithm, Things this patch does: * Move code responsilbe for block layout in it's own module. * Move the NcgImpl Class into the NCGMonad module. * Extract a control flow graph from the input cmm. * Update this cfg to keep it in sync with changes during asm codegen. This has been tested on x64 but should work on x86. Other platforms still use the old codelayout. * Assign weights to the edges in the CFG based on type and limited static analysis which are then used for block layout. * Once we have the final code layout eliminate some redundant jumps. In particular turn a sequences of: jne .foo jmp .bar foo: into je bar foo: .. Test Plan: ci Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott Reviewed By: RyanGlScott Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton GHC Trac Issues: #15124 Differential Revision: https://phabricator.haskell.org/D4726
* An overhaul of the SRT representationSimon Marlow2018-05-161-10/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: - Previously we would hvae a single big table of pointers per module, with a set of bitmaps to reference entries within it. The new representation is identical to a static constructor, which is much simpler for the GC to traverse, and we get to remove the complicated bitmap-traversal code from the GC. - Rewrite all the code to generate SRTs in CmmBuildInfoTables, and document it much better (see Note [SRTs]). This has been something I've wanted to do since we moved to the new code generator, I finally had the opportunity to finish it while on a transatlantic flight recently :) There are a series of 4 diffs: 1. D4632 (this one), which does the bulk of the changes 2. D4633 which adds support for smaller `CmmLabelDiffOff` constants 3. D4634 which takes advantage of D4632 and D4633 to save a word in info tables that have an SRT on x86_64. This is where most of the binary size improvement comes from. 4. D4637 which makes a further optimisation to merge some SRTs with static FUN closures. This adds some complexity and the benefits are fairly modest, so it's not clear yet whether we should do this. Results (after (3), on x86_64) - GHC itself (staticaly linked) is 5.2% smaller - -1.7% binary sizes in nofib, -2.9% module sizes. Full nofib results: P176 - I measured the overhead of traversing all the static objects in a major GC in GHC itself by doing `replicateM_ 1000 performGC` as the first thing in `Main.main`. The new version was 5-10% faster, but the results did vary quite a bit. - I'm not sure if there's a compile-time difference, the results are too unreliable. Test Plan: validate Reviewers: bgamari, michalt, niteria, simonpj, erikd, osa1 Subscribers: thomie, carter Differential Revision: https://phabricator.haskell.org/D4632
* Revert "CmmPipeline: add a second pass of CmmCommonBlockElim"Michal Terepeta2018-04-131-39/+3
| | | | | | | | | | | | | | | | This reverts commit d5c4d46a62ce6a0cfa6440344f707136eff18119. Please see #14989 for details. Test Plan: ./validate Reviewers: bgamari, simonmar Subscribers: thomie, carter GHC Trac Issues: #14989 Differential Revision: https://phabricator.haskell.org/D4577
* CmmPipeline: add a second pass of CmmCommonBlockElimMichal Terepeta2018-03-271-3/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sinking pass often gets rid of unnecessary registers registers/assignements exposing more opportunities for CBE, so this commit adds a second round of CBE after the sinking pass and should fix #12915 (and some examples in #14226). Nofib results: * Binary size: 0.9% reduction on average * Compile allocations: 0.7% increase on average * Runtime: noisy, two separate runs of nofib showed a tiny reduction on average, (~0.2-0.3%), but I think this is mostly noise * Compile time: very noisy, but generally within +/- 0.5% (one run faster, one slower) One interesting part of this change is that running CBE invalidates results of proc-point analysis. But instead of re-doing the whole analysis, we can use the map that CBE creates for replacing/comparing block labels (maps a redundant label to a useful one) to update the results of proc-point analysis. This lowers the overhead compared to the previous experiment in #12915. Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: ./validate Reviewers: bgamari, simonmar Reviewed By: simonmar Subscribers: rwbarton, thomie, carter GHC Trac Issues: #12915, #14226 Differential Revision: https://phabricator.haskell.org/D4417
* cmm: Revert more aggressive CBE due to #14226Ben Gamari2018-02-031-1/+1
| | | | | | | | | | | | | | | | | Trac #14226 noted that the C-- CBE pass frequently fails to common up semantically identical blocks due to the differences in local register naming. These patches fixed this by making the pass consider equality up to alpha-renaming. However, the new logic failed to consider the possibility that local register naming *may* matter across multiple blocks. This lead to the regression #14754. I'll need to do a bit of thinking on a proper solution to this but in the meantime I'm reverting all four patches. This reverts commit a27056f9823f8bbe2302f1924b3ab38fd6752e37. This reverts commit 6f990c54f922beae80362fe62426beededc21290. This reverts commit 9aa73892e10e90a1799b9277da593e816a827364. This reverts commit 7920a7d9c53083b234e060a3e72f00b601a46808.
* cmm/CBE: Use foldLocalRegsDefdBen Gamari2017-09-211-1/+1
| | | | | | | | | | | | | | | | | | Simonpj suggested this as a follow-on to #14226 to avoid code duplication. This also gives us the ability to CBE cases involving foreign calls for free. Test Plan: Validate Reviewers: austin, simonmar, simonpj Reviewed By: simonpj Subscribers: michalt, simonpj, rwbarton, thomie GHC Trac Issues: #14226 Differential Revision: https://phabricator.haskell.org/D3999
* compiler: introduce custom "GhcPrelude" PreludeHerbert Valerio Riedel2017-09-191-0/+2
| | | | | | | | | | | | | | | | | | This switches the compiler/ component to get compiled with -XNoImplicitPrelude and a `import GhcPrelude` is inserted in all modules. This is motivated by the upcoming "Prelude" re-export of `Semigroup((<>))` which would cause lots of name clashes in every modulewhich imports also `Outputable` Reviewers: austin, goldfire, bgamari, alanz, simonmar Reviewed By: bgamari Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari Differential Revision: https://phabricator.haskell.org/D3989
* Add support for producing position-independent executablesBen Gamari2017-08-221-1/+1
| | | | | | | | | | | | | | | | Previously due to #12759 we disabled PIE support entirely. However, this breaks the user's ability to produce PIEs. Add an explicit flag, -fPIE, allowing the user to build PIEs. Test Plan: Validate Reviewers: rwbarton, austin, simonmar Subscribers: trommler, simonmar, trofi, jrtc27, thomie GHC Trac Issues: #12759, #13702 Differential Revision: https://phabricator.haskell.org/D3589
* Hoopl: remove dependency on Hoopl packageMichal Terepeta2017-06-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This copies the subset of Hoopl's functionality needed by GHC to `cmm/Hoopl` and removes the dependency on the Hoopl package. The main motivation for this change is the confusing/noisy interface between GHC and Hoopl: - Hoopl has `Label` which is GHC's `BlockId` but different than GHC's `CLabel` - Hoopl has `Unique` which is different than GHC's `Unique` - Hoopl has `Unique{Map,Set}` which are different than GHC's `Uniq{FM,Set}` - GHC has its own specialized copy of `Dataflow`, so `cmm/Hoopl` is needed just to filter the exposed functions (filter out some of the Hoopl's and add the GHC ones) With this change, we'll be able to simplify this significantly. It'll also be much easier to do invasive changes (Hoopl is a public package on Hackage with users that depend on the current behavior) This should introduce no changes in functionality - it merely copies the relevant code. Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: ./validate Reviewers: austin, bgamari, simonmar Reviewed By: bgamari, simonmar Subscribers: simonpj, kavon, rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3616
* procPointAnalysis doesn't need UniqSMMichal Terepeta2016-12-151-2/+2
| | | | | | | | | | | | | | | | | | `procPointAnalysis` doesn't need to run in `UniqSM` (it consists of a single `return` and the call to `analyzeCmm` function which is pure). Making it non-monadic simplifies the code a bit. Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: validate Reviewers: austin, bgamari, simonmar Reviewed By: simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2837
* CodeGen: Way to dump cmm only once (#11717)Vladimir Trubilov2016-07-171-13/+14
| | | | | | | | | | | | | | | | | | | | | | | | | The `-ddump-cmm` put all stages of Cmm processing into one output. This patch changes its behavior and adds two more options to make Cmm dumping flexible. - `-ddump-cmm-from-stg` dumps only initial version of Cmm right after STG->Cmm codegen - `-ddump-cmm` dumps the final result of the Cmm pipeline processing - `-ddump-cmm-verbose` dumps intermediate output of each Cmm pipeline step - `-ddump-cmm-proc` and `-ddump-cmm-caf` seems were lost. Now enabled Test Plan: ./validate Reviewers: thomie, simonmar, austin, bgamari Reviewed By: thomie, simonmar Subscribers: simonpj, thomie Differential Revision: https://phabricator.haskell.org/D2393 GHC Trac Issues: #11717
* Refactor the story around switches (#10137)Joachim Breitner2015-03-301-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This re-implements the code generation for case expressions at the Stg → Cmm level, both for data type cases as well as for integral literal cases. (Cases on float are still treated as before). The goal is to allow for fancier strategies in implementing them, for a cleaner separation of the strategy from the gritty details of Cmm, and to run this later than the Common Block Optimization, allowing for one way to attack #10124. The new module CmmSwitch contains a number of notes explaining this changes. For example, it creates larger consecutive jump tables than the previous code, if possible. nofib shows little significant overall improvement of runtime. The rather large wobbling comes from changes in the code block order (see #8082, not much we can do about it). But the decrease in code size alone makes this worthwhile. ``` Program Size Allocs Runtime Elapsed TotalMem Min -1.8% 0.0% -6.1% -6.1% -2.9% Max -0.7% +0.0% +5.6% +5.7% +7.8% Geometric Mean -1.4% -0.0% -0.3% -0.3% +0.0% ``` Compilation time increases slightly: ``` -1 s.d. ----- -2.0% +1 s.d. ----- +2.5% Average ----- +0.3% ``` The test case T783 regresses a lot, but it is the only one exhibiting any regression. The cause is the changed order of branches in an if-then-else tree, which makes the hoople data flow analysis traverse the blocks in a suboptimal order. Reverting that gets rid of this regression, but has a consistent, if only very small (+0.2%), negative effect on runtime. So I conclude that this test is an extreme outlier and no reason to change the code. Differential Revision: https://phabricator.haskell.org/D720
* update commentSimon Marlow2014-08-011-4/+3
|
* Don't use showPass in the backend (#8973)Simon Marlow2014-06-081-2/+0
|
* Add LANGUAGE pragmas to compiler/ source filesHerbert Valerio Riedel2014-05-151-0/+2
| | | | | | | | | | | | | | | | | | In some cases, the layout of the LANGUAGE/OPTIONS_GHC lines has been reorganized, while following the convention, to - place `{-# LANGUAGE #-}` pragmas at the top of the source file, before any `{-# OPTIONS_GHC #-}`-lines. - Moreover, if the list of language extensions fit into a single `{-# LANGUAGE ... -#}`-line (shorter than 80 characters), keep it on one line. Otherwise split into `{-# LANGUAGE ... -#}`-lines for each individual language extension. In both cases, try to keep the enumeration alphabetically ordered. (The latter layout is preferable as it's more diff-friendly) While at it, this also replaces obsolete `{-# OPTIONS ... #-}` pragma occurences by `{-# OPTIONS_GHC ... #-}` pragmas.
* Eliminate duplicate code in Cmm pipelineJan Stolarek2014-02-031-51/+30
| | | | | | | | | End of Cmm pipeline used to be split into two alternative flows, depending on whether we did proc-point splitting or not. There was a lot of code duplication between these two branches. But it wasn't really necessary as the differences can be easily enclosed within an if-then-else. I observed no impact of this change on compilation performance.
* Remove unnecessary LANGUAGE pragmaJan Stolarek2014-02-011-5/+0
|
* -ddump-cmm: don't dump the proc point stage if we didn't do anythingSimon Marlow2013-11-281-3/+6
|
* Add debug dump of the list of Cmm proc pointsSimon Peyton Jones2013-11-221-0/+1
|
* Rename -ddump-cmm-rewrite to -ddump-cmm-sinkJan Stolarek2013-09-131-1/+1
| | | | This makes it consistent with the corresponding -cmm-sink flag
* Improve sinking passJan Stolarek2013-09-121-10/+130
| | | | | | | | | | | | | | | | | | | | This commit does two things: * Allows duplicating of global registers and literals by inlining them. Previously we would only inline global register or literal if it was used only once. * Changes method of determining conflicts between a node and an assignment. New method has two advantages. It relies on DefinerOfRegs and UserOfRegs typeclasses, so if a set of registers defined or used by a node should ever change, `conflicts` function will use the changed definition. This definition also catches more cases than the previous one (namely CmmCall and CmmForeignCall) which is a step towards making it possible to run sinking pass before stack layout (currently this doesn't work). This patch also adds a lot of comments that are result of about two-week long investigation of how sinking pass works and why it does what it does.
* Drop proc-points that don't exist in the graph (#8205)Jan Stolarek2013-09-111-3/+3
| | | | | | | | On some architectures it might happen that stack layout pass will invalidate the list of calculated procpoints by dropping some of them. We fix this by checking whether a proc-point is in a graph at the beginning of proc-point analysis. This is a speculative fix for #8205.
* Fix a bug in stack layout with safe foreign calls (#8083)Simon Marlow2013-07-241-4/+2
| | | | | | | We weren't properly tracking the number of stack arguments in the continuation of a foreign call. It happened to work when the continuation was not a join point, but when it was a join point we were using the wrong amount of stack fixup.
* Temporarily disable common block elimination; fixes #8083 for nowIan Lynagh2013-07-231-3/+5
|
* Rewrite usingInconsistentPicReg as a table for clarityGabor Greif2013-04-061-5/+5
| | | | No change in functionality intended
* Fix typosGabor Greif2013-04-061-3/+3
|
* Rename all of the 'cmmz' flags and make them more consistent.Austin Seipp2012-12-191-19/+18
| | | | | | | | | | | | | | | | There's only a single compiler backend now, so the 'z' suffix means nothing. Also, the flags were confusingly named ('cmm-foo' vs 'foo-cmm',) and counter-intuitively, '-ddump-cmm' did not do at all what you expected since the new backend went live. Basically, all of the -ddump-cmmz-* flags are now -ddump-cmm-*. Some were renamed to be more consistent. This doesn't update the manual; it already mentions '-ddump-cmm' and that flag implies all the others anyway, which is probably what you want. Signed-off-by: Austin Seipp <mad.one@gmail.com>
* Tweak commentsIan Lynagh2012-12-021-2/+3
|
* Fix broken -fPIC on Darwin/PPC (#7442)PHO2012-11-241-4/+12
| | | | The workaround described in note [darwin-x86-pic] applies to Darwin/PPC too.
* Remove OldCmm, convert backends to consume new CmmSimon Marlow2012-11-121-0/+12
| | | | | | | | | | | | | | | | | | This removes the OldCmm data type and the CmmCvt pass that converts new Cmm to OldCmm. The backends (NCGs, LLVM and C) have all been converted to consume new Cmm. The main difference between the two data types is that conditional branches in new Cmm have both true/false successors, whereas in OldCmm the false case was a fallthrough. To generate slightly better code we occasionally need to invert a conditional to ensure that the branch-not-taken becomes a fallthrough; this was previously done in CmmCvt, and it is now done in CmmContFlowOpt. We could go further and use the Hoopl Block representation for native code, which would mean that we could use Hoopl's postorderDfs and analyses for native code, but for now I've left it as is, using the old ListGraph representation for native code.
* Attach global register liveness info to Cmm procedures.Geoffrey Mainland2012-10-301-3/+3
| | | | | | | All Cmm procedures now include the set of global registers that are live on procedure entry, i.e., the global registers used to pass arguments to the procedure. Only global registers that are use to pass arguments are included in this list.
* Comment to explain why we need to split proc points on x86/Darwin with -fPICSimon Marlow2012-10-241-1/+31
|
* Fix -fPIC on OS X x86Ian Lynagh2012-10-231-0/+6
|
* Refactor the way dump flags are handledIan Lynagh2012-10-181-4/+4
| | | | | | | | | | | | | We were being inconsistent about how we tested whether dump flags were enabled; in particular, sometimes we also checked the verbosity, and sometimes we didn't. This lead to oddities such as "ghc -v4" printing an "Asm code" section which didn't contain any code, and "-v4" enabled some parts of "-ddump-deriv" but not others. Now all the tests use dopt, which also takes the verbosity into account as appropriate.
* Some alpha renamingIan Lynagh2012-10-161-4/+4
| | | | | Mostly d -> g (matching DynFlag -> GeneralFlag). Also renamed if* to when*, matching the Haskell if/when names
* Rename DynFlag to GeneralFlagIan Lynagh2012-10-161-2/+2
| | | | | This avoids confusion due to [DynFlag] and DynFlags being completely different types.
* Produce new-style Cmm from the Cmm parserSimon Marlow2012-10-081-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The main change here is that the Cmm parser now allows high-level cmm code with argument-passing and function calls. For example: foo ( gcptr a, bits32 b ) { if (b > 0) { // we can make tail calls passing arguments: jump stg_ap_0_fast(a); } return (x,y); } More details on the new cmm syntax are in Note [Syntax of .cmm files] in CmmParse.y. The old syntax is still more-or-less supported for those occasional code fragments that really need to explicitly manipulate the stack. However there are a couple of differences: it is now obligatory to give a list of live GlobalRegs on every jump, e.g. jump %ENTRY_CODE(Sp(0)) [R1]; Again, more details in Note [Syntax of .cmm files]. I have rewritten most of the .cmm files in the RTS into the new syntax, except for AutoApply.cmm which is generated by the genapply program: this file could be generated in the new syntax instead and would probably be better off for it, but I ran out of enthusiasm. Some other changes in this batch: - The PrimOp calling convention is gone, primops now use the ordinary NativeNodeCall convention. This means that primops and "foreign import prim" code must be written in high-level cmm, but they can now take more than 10 arguments. - CmmSink now does constant-folding (should fix #7219) - .cmm files now go through the cmmPipeline, and as a result we generate better code in many cases. All the object files generated for the RTS .cmm files are now smaller. Performance should be better too, but I haven't measured it yet. - RET_DYN frames are removed from the RTS, lots of code goes away - we now have some more canned GC points to cover unboxed-tuples with 2-4 pointers, which will reduce code size a little.
* splitAtProcPoints: jump to the right place when tablesNextToCode == FalseSimon Marlow2012-09-201-1/+2
|
* make some debug output conditional on -ddump-cmmzSimon Marlow2012-09-181-1/+1
|
* Move wORD_SIZE into platformConstantsIan Lynagh2012-09-161-2/+2
|