path: root/compiler/cmm/CmmSink.hs
Commit message | Author | Age | Files | Lines
* Module hierarchy: Cmm (cf #13009) | Sylvain Henry | 2020-01-25 | 1 | -854/+0
* Module hierarchy: StgToCmm (#13009) | Sylvain Henry | 2019-09-10 | 1 | -2/+2
  Add StgToCmm module hierarchy. Platform modules that are used in several other places (NCG, LLVM codegen, Cmm transformations) are put into GHC.Platform.
* Remove unused imports of the form 'import foo ()' (Fixes #17065) | James Foster | 2019-08-15 | 1 | -1/+0
  These kinds of imports are necessary in some cases such as importing instances of typeclasses or intentionally creating dependencies in the build system, but '-Wunused-imports' can't detect when they are no longer needed. This commit removes the unused ones currently in the code base (not including test files or submodules), with the hope that doing so may increase parallelism in the build system by removing unnecessary dependencies.
* Move 'Platform' to ghc-boot | John Ericson | 2019-06-19 | 1 | -1/+1
  ghc-pkg needs to be aware of platforms so it can figure out which subdirectory within the user package db to use. This is admittedly roundabout, but maybe Cabal could use the same notion of a platform as GHC to good effect too.
* Update Trac ticket URLs to point to GitLab | Ryan Scott | 2019-03-15 | 1 | -1/+1
  This moves all URL references to Trac tickets to their corresponding GitLab counterparts.
* A few typos [ci skip] | Gabor Greif | 2018-08-30 | 1 | -1/+1
* Replace most occurrences of foldl with foldl'. | klebinger.andreas@gmx.at | 2018-08-21 | 1 | -1/+0
  This patch adds foldl' to GhcPrelude and changes most occurrences of foldl to foldl'. This leads to better performance, especially for quick builds where GHC does not perform strictness analysis.

  It does change strictness behaviour when we use foldl' to turn an argument list into function applications. But this is only a drawback if code looks ONLY at the last argument but not at the first. And, as the benchmarks show, it leads to fewer allocations in practice at O2.

  Compiler performance for Nofib:

  O2 Allocations:
      -1 s.d.    -----    -0.0%
      +1 s.d.    -----    -0.0%
      Average    -----    -0.0%

  O2 Compile Time:
      -1 s.d.    -----    -2.8%
      +1 s.d.    -----    +1.3%
      Average    -----    -0.8%

  O0 Allocations:
      -1 s.d.    -----    -0.2%
      +1 s.d.    -----    -0.1%
      Average    -----    -0.2%

  Test Plan: ci
  Reviewers: goldfire, bgamari, simonmar, tdammers, monoidal
  Reviewed By: bgamari, monoidal
  Subscribers: tdammers, rwbarton, thomie, carter
  Differential Revision: https://phabricator.haskell.org/D4929
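
The strictness change described in this commit can be illustrated with a small sketch. It is not taken from the GHC sources; `sumLazy`, `sumStrict` and `lastLazy` are hypothetical names chosen for illustration. The lazy fold never forces intermediate accumulators, so a step function that ignores its accumulator can skip over a bottom, whereas `foldl'` would force it.

```haskell
import Data.List (foldl')

-- Lazy left fold: builds a chain of unevaluated (+) thunks before
-- anything is added, unless the optimiser's strictness analysis steps in.
sumLazy :: [Int] -> Int
sumLazy = foldl (+) 0

-- Strict left fold: the accumulator is forced at every step, so no
-- thunk chain is built even without strictness analysis.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0

-- The step function only looks at the element, never at the accumulator.
-- With foldl the intermediate accumulator `undefined` is never forced,
-- so the result is the last element. Replacing foldl with foldl' here
-- would force that intermediate `undefined` and raise an exception.
lastLazy :: Int
lastLazy = foldl (\_ x -> x) 0 [undefined, 3]
```

Loading this in GHCi and evaluating `lastLazy` shows the lazy behaviour; the two sum functions agree on every finite list and differ only in how much memory they use along the way.
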
* Hoopl: improve postorder calculation | Michal Terepeta | 2018-03-19 | 1 | -1/+1
  - Fix the naming and comments to indicate that we are calculating *reverse* postorder (and not the standard postorder).
  - Rewrite the calculation to avoid CPS code. I found it fairly difficult to understand and the new one seems faster (according to nofib, decreases compiler allocations by 0.2%)
  - Remove `LabelsPtr`, which seems unnecessary and could be *really* confusing. For instance, previously: `postorder_dfs_from <block with label X>` and `postorder_dfs_from <label X>` would actually mean quite different things (and give different results).
  - Change the `Dataflow` module to always use entry of the graph for reverse postorder calculation. This should be the only change in behavior of this commit. Previously, if the caller provided initial facts for some of the labels, we would use those labels for our postorder calculation. However, I don't think that's correct in general - if the initial facts did not contain the entry of the graph, we would never analyze the blocks reachable from the entry but unreachable from the labels provided with the initial facts. It seems that the only analysis that used this was proc-point analysis, which I think would always include the entry block (so I don't think there's any bug due to this).

  Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
  Test Plan: ./validate
  Reviewers: bgamari, simonmar
  Reviewed By: simonmar
  Subscribers: rwbarton, thomie, carter
  Differential Revision: https://phabricator.haskell.org/D4464
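
For readers unfamiliar with the term, reverse postorder from the graph entry can be sketched as a plain depth-first search without CPS. This is a toy model, not the code from the commit: `Label`, `Graph`, `revPostorder` and `exampleGraph` are names assumed here, and blocks are reduced to integer labels with successor lists.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical stand-ins for block labels and a control-flow graph.
type Label = Int
type Graph = Map.Map Label [Label]

-- Reverse postorder of the blocks reachable from the entry label.
-- A block is prepended to the result only after all of its successors
-- have been finished, so it ends up in front of them.
revPostorder :: Graph -> Label -> [Label]
revPostorder g entry = snd (go (Set.empty, []) entry)
  where
    go (seen, acc) l
      | l `Set.member` seen = (seen, acc)
      | otherwise =
          let succs = Map.findWithDefault [] l g
              (seen', acc') = foldl' go (Set.insert l seen, acc) succs
          in (seen', l : acc')

-- A diamond (0 -> 1, 2 -> 3) plus block 4, which is unreachable from 0.
exampleGraph :: Graph
exampleGraph = Map.fromList [(0, [1, 2]), (1, [3]), (2, [3]), (3, []), (4, [3])]
```

Starting from the entry, as the commit message argues the `Dataflow` module should, block 4 never appears in the result because nothing reachable from the entry leads to it.
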
* Be more selective in which conditionals we invert | Simon Marlow | 2018-03-19 | 1 | -22/+33
  Test Plan: validate
  Reviewers: bgamari, AndreasK, erikd
  Reviewed By: AndreasK
  Subscribers: rwbarton, thomie, carter
  Differential Revision: https://phabricator.haskell.org/D4398
* cmm/: Avoid using lazy left folds | Michal Terepeta | 2018-03-06 | 1 | -2/+3
  This basically replaces all uses of `foldl` with `foldl'`. I've looked at all the call sites and there doesn't seem to be any reason to prefer the lazy version.

  Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
  Test Plan: ./validate
  Reviewers: bgamari, simonmar
  Reviewed By: bgamari
  Subscribers: rwbarton, thomie, carter
  Differential Revision: https://phabricator.haskell.org/D4463
* Tidy up and consolidate canned CmmReg and CmmGlobals | Simon Marlow | 2018-02-18 | 1 | -1/+1
  Test Plan: validate
  Reviewers: bgamari, erikd
  Reviewed By: bgamari
  Subscribers: rwbarton, thomie, carter
  Differential Revision: https://phabricator.haskell.org/D4380
* Invert likeliness when improving conditionals | Alexander Biehl | 2018-01-29 | 1 | -1/+5
  ... in CmmSink
* CmmSink: Use an IntSet instead of a list | alexbiehl | 2017-11-02 | 1 | -7/+25
  CmmProcs which have *lots* of local variables take a considerable amount of time in CmmSink. This was noticed by @tdammers in #7258 while compiling files with large records (~200-400 fields).

  Before:

  ```
  Sun Oct 29 19:58 2017 Time and Allocation Profiling Report  (Final)

     ghc-stage2 +RTS -p -RTS -B/Users/alexbiehl/git/ghc/inplace/lib /Users/alexbiehl/Downloads/W2.hs -fforce-recomp -O2

  total time  =       26.00 secs   (25996 ticks @ 1000 us, 1 processor)
  total alloc = 14,921,627,912 bytes  (excludes profiling overheads)

  COST CENTRE      MODULE       SRC                                                  %time %alloc

  sink             CmmPipeline  compiler/cmm/CmmPipeline.hs:(104,13)-(105,59)         55.7   15.9
  SimplTopBinds    SimplCore    compiler/simplCore/SimplCore.hs:761:39-74             19.5   30.6
  FloatOutwards    SimplCore    compiler/simplCore/SimplCore.hs:471:40-66              4.2    9.0
  RegAlloc-linear  AsmCodeGen   compiler/nativeGen/AsmCodeGen.hs:(658,27)-(660,55)     4.0   11.1
  pprNativeCode    AsmCodeGen   compiler/nativeGen/AsmCodeGen.hs:(529,37)-(530,65)     2.8    6.3
  NewStranal       SimplCore    compiler/simplCore/SimplCore.hs:480:40-63              1.6    3.7
  OccAnal          SimplCore    compiler/simplCore/SimplCore.hs:(739,22)-(740,67)      1.5    3.5
  StgCmm           HscMain      compiler/main/HscMain.hs:(1426,13)-(1427,62)           1.2    2.4
  regLiveness      AsmCodeGen   compiler/nativeGen/AsmCodeGen.hs:(591,17)-(593,52)     1.2    1.9
  genMachCode      AsmCodeGen   compiler/nativeGen/AsmCodeGen.hs:(580,17)-(582,62)     0.9    1.8
  NativeCodeGen    CodeOutput   compiler/main/CodeOutput.hs:171:18-78                  0.9    2.1
  CoreTidy         HscMain      compiler/main/HscMain.hs:1253:27-67                    0.8    1.9
  ```

  After:

  ```
  Sun Oct 29 19:18 2017 Time and Allocation Profiling Report  (Final)

     ghc-stage2 +RTS -p -RTS -B/Users/alexbiehl/git/ghc/inplace/lib /Users/alexbiehl/Downloads/W2.hs -fforce-recomp -O2

  total time  =       13.31 secs   (13307 ticks @ 1000 us, 1 processor)
  total alloc = 15,772,184,488 bytes  (excludes profiling overheads)

  COST CENTRE      MODULE          SRC                                                  %time %alloc

  SimplTopBinds    SimplCore       compiler/simplCore/SimplCore.hs:761:39-74             38.3   29.0
  sink             CmmPipeline     compiler/cmm/CmmPipeline.hs:(104,13)-(105,59)         13.2   20.3
  RegAlloc-linear  AsmCodeGen      compiler/nativeGen/AsmCodeGen.hs:(658,27)-(660,55)     8.3   10.5
  FloatOutwards    SimplCore       compiler/simplCore/SimplCore.hs:471:40-66              8.1    8.5
  pprNativeCode    AsmCodeGen      compiler/nativeGen/AsmCodeGen.hs:(529,37)-(530,65)     5.4    5.9
  NewStranal       SimplCore       compiler/simplCore/SimplCore.hs:480:40-63              3.1    3.5
  OccAnal          SimplCore       compiler/simplCore/SimplCore.hs:(739,22)-(740,67)      2.9    3.3
  StgCmm           HscMain         compiler/main/HscMain.hs:(1426,13)-(1427,62)           2.3    2.3
  regLiveness      AsmCodeGen      compiler/nativeGen/AsmCodeGen.hs:(591,17)-(593,52)     2.1    1.8
  NativeCodeGen    CodeOutput      compiler/main/CodeOutput.hs:171:18-78                  1.7    2.0
  genMachCode      AsmCodeGen      compiler/nativeGen/AsmCodeGen.hs:(580,17)-(582,62)     1.6    1.7
  CoreTidy         HscMain         compiler/main/HscMain.hs:1253:27-67                    1.4    1.8
  foldNodesBwdOO   Hoopl.Dataflow  compiler/cmm/Hoopl/Dataflow.hs:(397,1)-(403,17)        1.1    0.8
  ```

  Reviewers: austin, bgamari, simonmar
  Reviewed By: bgamari
  Subscribers: duog, rwbarton, thomie, tdammers
  GHC Trac Issues: #7258
  Differential Revision: https://phabricator.haskell.org/D4145
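
The data-structure swap named in the title can be sketched roughly as follows. This is only an illustration under the assumption that local registers can be identified by an `Int` key; `RegKey`, `memberList`, `memberSet` and `skippable` are hypothetical names and not the identifiers used in CmmSink.

```haskell
import qualified Data.IntSet as IntSet

-- Hypothetical: a local register identified by an integer key.
type RegKey = Int

-- List membership walks the whole list in the worst case, which adds up
-- when a procedure has hundreds of local variables.
memberList :: RegKey -> [RegKey] -> Bool
memberList = elem

-- IntSet membership is bounded by the key width, not by the set size.
memberSet :: RegKey -> IntSet.IntSet -> Bool
memberSet = IntSet.member

-- Keep only the registers that are NOT in the given set, e.g. to find
-- assignments whose target does not need special treatment.
skippable :: IntSet.IntSet -> [RegKey] -> [RegKey]
skippable special = filter (\r -> not (r `IntSet.member` special))
```

The `containers` package that provides `Data.IntSet` ships with GHC, so the sketch loads in a plain GHCi session.
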
* compiler: introduce custom "GhcPrelude" Prelude | Herbert Valerio Riedel | 2017-09-19 | 1 | -0/+2
  This switches the compiler/ component to get compiled with -XNoImplicitPrelude, and an `import GhcPrelude` is inserted in all modules.

  This is motivated by the upcoming "Prelude" re-export of `Semigroup((<>))`, which would cause lots of name clashes in every module which also imports `Outputable`.

  Reviewers: austin, goldfire, bgamari, alanz, simonmar
  Reviewed By: bgamari
  Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari
  Differential Revision: https://phabricator.haskell.org/D3989
* Hoopl: remove dependency on Hoopl package | Michal Terepeta | 2017-06-23 | 1 | -1/+4
  This copies the subset of Hoopl's functionality needed by GHC to `cmm/Hoopl` and removes the dependency on the Hoopl package.

  The main motivation for this change is the confusing/noisy interface between GHC and Hoopl:
  - Hoopl has `Label` which is GHC's `BlockId` but different than GHC's `CLabel`
  - Hoopl has `Unique` which is different than GHC's `Unique`
  - Hoopl has `Unique{Map,Set}` which are different than GHC's `Uniq{FM,Set}`
  - GHC has its own specialized copy of `Dataflow`, so `cmm/Hoopl` is needed just to filter the exposed functions (filter out some of the Hoopl's and add the GHC ones)

  With this change, we'll be able to simplify this significantly. It'll also be much easier to do invasive changes (Hoopl is a public package on Hackage with users that depend on the current behavior)

  This should introduce no changes in functionality - it merely copies the relevant code.

  Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
  Test Plan: ./validate
  Reviewers: austin, bgamari, simonmar
  Reviewed By: bgamari, simonmar
  Subscribers: simonpj, kavon, rwbarton, thomie
  Differential Revision: https://phabricator.haskell.org/D3616
* Improve code generation for conditionals | Simon Peyton Jones | 2017-04-28 | 1 | -6/+33
  This patch is in preparation for the fix to Trac #13397

  The code generator has a special case for

      case tagToEnum (a>#b) of
        False -> e1
        True  -> e2

  but it was not doing nearly so well on

      case a>#b of
        DEFAULT -> e1
        1#      -> e2

  This patch arranges to behave essentially identically in both cases. In due course we can eliminate the special case for tagToEnum#, once we've completed Trac #13397.

  The changes are:

  * Make CmmSink swizzle the order of a conditional where necessary; see Note [Improving conditionals] in CmmSink

  * Hack the general case of StgCmmExpr.cgCase so that it uses NoGcInAlts for conditionals. This doesn't seem right, but it's the same choice as the tagToEnum version. Without it, code size increases a lot (more heap checks).

    There's a loose end here.

  * Add comments in CmmOpt.cmmMachOpFoldM
* Tweaks and typos in manual, note refs, comments | Gabor Greif | 2017-02-09 | 1 | -1/+1
* Typos in comments [ci skip] | Gabor Greif | 2017-01-25 | 1 | -1/+1
* BlockId: remove BlockMap and BlockSet synonyms | Michal Terepeta | 2016-12-08 | 1 | -4/+3
  This continues removal of `BlockId` module in favor of Hoopl's `Label`.

  Most of the changes here are mechanical, apart from the orphan `Outputable` instances for `LabelMap` and `LabelSet`. For now I've moved them to `cmm/Hoopl`, since it's already trying to manage all imports from Hoopl (to avoid any collisions).

  Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
  Test Plan: validate
  Reviewers: bgamari, austin, simonmar
  Reviewed By: simonmar
  Subscribers: thomie
  Differential Revision: https://phabricator.haskell.org/D2800
* Be aware of overlapping global STG registers in CmmSink (#10521) | Reid Barton | 2015-06-25 | 1 | -8/+7
  Summary:
  On x86_64, commit e2f6bbd3a27685bc667655fdb093734cb565b4cf assigned the STG registers F1 and D1 the same hardware register (xmm1), and the same for the registers F2 and D2, etc. When mixing calls to functions involving Float#s and Double#s, this can cause wrong Cmm optimizations that assume the F1 and D1 registers are independent.

  Reviewers: simonpj, austin
  Reviewed By: austin
  Subscribers: simonpj, thomie, bgamari
  Differential Revision: https://phabricator.haskell.org/D993
  GHC Trac Issues: #10521
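
The hazard described above can be modelled with a toy register type. Everything in this sketch is an assumption made for illustration: the `GlobalReg` constructors, the `hwReg` mapping and the `overlaps` helper are simplified stand-ins, not GHC's actual definitions. The point is only that two registers with different names can still alias once they share a hardware register.

```haskell
-- Simplified stand-in for STG global registers.
data GlobalReg
  = FloatReg Int     -- F1, F2, ...
  | DoubleReg Int    -- D1, D2, ...
  | VanillaReg Int   -- R1, R2, ...
  deriving (Eq, Show)

-- Hypothetical mapping to hardware registers, following the commit
-- message: Fn and Dn share xmm<n> on x86_64. The name used for vanilla
-- registers is a placeholder.
hwReg :: GlobalReg -> String
hwReg (FloatReg n)   = "xmm" ++ show n
hwReg (DoubleReg n)  = "xmm" ++ show n
hwReg (VanillaReg n) = "gpr" ++ show n

-- Two STG registers must be treated as dependent when they map to the
-- same hardware register, even if they are different constructors.
overlaps :: GlobalReg -> GlobalReg -> Bool
overlaps a b = hwReg a == hwReg b
```

An optimisation that compares registers with plain `==` would consider `FloatReg 1` and `DoubleReg 1` independent; comparing through something like `overlaps` is what keeps a write to D1 from being moved past a read of F1.
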
* Fixes the ARM build | Moritz Angermann | 2014-10-21 | 1 | -2/+5
  Summary:
  CodeGen.Platform.hs was changed with the following diff:

      -#endif
       globalRegMaybe _ = Nothing
      +#elif MACHREGS_NO_REGS
      +globalRegMaybe _ = Nothing
      +#else
      +globalRegMaybe = panic "globalRegMaybe not defined for this platform"
      +#endif

  which causes globalRegMaybe to panic for arch ARM.

  This patch ensures globalRegMaybe is not called on ARM.

  Signed-off-by: Moritz Angermann <moritz@lichtzwerge.de>
  Test Plan: Building arm cross-compiler (e.g. --target=arm-apple-darwin10)
  Reviewers: hvr, ezyang, simonmar, rwbarton, austin
  Reviewed By: austin
  Subscribers: dterei, bgamari, simonmar, ezyang, carter
  Differential Revision: https://phabricator.haskell.org/D208
  GHC Trac Issues: #9593
* Re-add more primops for atomic ops on byte arrays | Johan Tibell | 2014-06-30 | 1 | -0/+4
  This is the second attempt to add this functionality. The first attempt was reverted in 950fcae46a82569e7cd1fba1637a23b419e00ecd, due to register allocator failure on x86. Given how the register allocator currently works, we don't have enough registers on x86 to support cmpxchg using complicated addressing modes. Instead we fall back to a simpler addressing mode on x86.

  Adds the following primops:

  * atomicReadIntArray#
  * atomicWriteIntArray#
  * fetchSubIntArray#
  * fetchOrIntArray#
  * fetchXorIntArray#
  * fetchAndIntArray#

  Makes these pre-existing out-of-line primops inline:

  * fetchAddIntArray#
  * casIntArray#
* Revert "Add more primops for atomic ops on byte arrays" | Johan Tibell | 2014-06-26 | 1 | -4/+0
  This commit caused the register allocator to fail on i386.

  This reverts commit d8abf85f8ca176854e9d5d0b12371c4bc402aac3 and 04dd7cb3423f1940242fdfe2ea2e3b8abd68a177 (the second being a fix to the first).
* Add more primops for atomic ops on byte arrays | Johan Tibell | 2014-06-24 | 1 | -0/+4
  Summary:
  Add more primops for atomic ops on byte arrays

  Adds the following primops:

  * atomicReadIntArray#
  * atomicWriteIntArray#
  * fetchSubIntArray#
  * fetchOrIntArray#
  * fetchXorIntArray#
  * fetchAndIntArray#

  Makes these pre-existing out-of-line primops inline:

  * fetchAddIntArray#
  * casIntArray#
* Don't inline non-register GlobalRegs | Simon Marlow | 2014-04-29 | 1 | -12/+100
* Typos in comments | Gabor Greif | 2014-04-13 | 1 | -1/+1
* Revert "Revert ad15c2, which causes Windows seg-faults (Trac #8834)" | Austin Seipp | 2014-04-04 | 1 | -21/+64
  This reverts commit a79613a75c7da0d3d225850382f0f578a07113b5.
* Revert ad15c2, which causes Windows seg-faults (Trac #8834) | Simon Peyton Jones | 2014-03-17 | 1 | -64/+21
  We don't yet understand WHY commit ad15c2, which is to do with CmmSink, causes seg-faults on Windows, but it certainly seems to. So reverting it is a stop-gap, but we need to un-block the 7.8 release.

  Many thanks to awson for identifying the offending commit.
* Allow the argument to 'reserve' to be a compile-time expression | Simon Marlow | 2014-01-16 | 1 | -8/+2
  By using the constant-folder to reduce it to an integer.
* Discard dead assignments in tryToInline | Simon Marlow | 2013-10-25 | 1 | -4/+26
  Inlining global registers and constants made code slightly larger in some cases. I finally got around to looking into why, and discovered one reason: we weren't discarding dead code in some cases. This patch fixes it.
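
The general idea of discarding dead assignments can be sketched with a backwards walk over a straight-line block. This is a generic liveness-based sketch, not the logic of `tryToInline` itself; `Assign`, `dropDead` and `exampleBlock` are hypothetical names, and right-hand sides are assumed to be free of side effects.

```haskell
import qualified Data.IntSet as IntSet

type Reg = Int

-- A toy assignment: the register written and the registers read by its
-- (assumed side-effect-free) right-hand side.
data Assign = Assign { target :: Reg, uses :: [Reg] }
  deriving (Eq, Show)

-- Walk the block from the last assignment to the first. An assignment
-- is kept only if its target is live afterwards; keeping it makes the
-- registers it reads live for everything before it.
dropDead :: IntSet.IntSet -> [Assign] -> [Assign]
dropDead liveOut = snd . foldr step (liveOut, [])
  where
    step a (live, kept)
      | target a `IntSet.member` live =
          ( IntSet.union (IntSet.fromList (uses a))
                         (IntSet.delete (target a) live)
          , a : kept )
      | otherwise = (live, kept)

-- r1 = ...; r2 = f(r1); r3 = ...
exampleBlock :: [Assign]
exampleBlock = [Assign 1 [], Assign 2 [1], Assign 3 []]
```

With only register 2 live at the end of `exampleBlock`, the assignment to register 3 is dropped while the assignment to register 1 survives because register 2 depends on it.
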
* Typos | Krzysztof Gogolewski | 2013-10-12 | 1 | -1/+1
* Comments only | Jan Stolarek | 2013-09-20 | 1 | -0/+1
* 80 columns | Simon Marlow | 2013-09-14 | 1 | -5/+8
* Improve sinking pass | Jan Stolarek | 2013-09-12 | 1 | -22/+75
  This commit does two things:

  * Allows duplicating of global registers and literals by inlining them. Previously we would only inline a global register or literal if it was used only once.

  * Changes the method of determining conflicts between a node and an assignment. The new method has two advantages. It relies on the DefinerOfRegs and UserOfRegs typeclasses, so if the set of registers defined or used by a node should ever change, the `conflicts` function will use the changed definition. This definition also catches more cases than the previous one (namely CmmCall and CmmForeignCall), which is a step towards making it possible to run the sinking pass before stack layout (currently this doesn't work).

  This patch also adds a lot of comments that are the result of an about two-week-long investigation of how the sinking pass works and why it does what it does.
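
A conflict check driven by typeclasses, in the spirit of the second bullet, might look roughly like this. The classes below are deliberately simplified stand-ins that return plain lists; they are not GHC's actual `UserOfRegs` and `DefinerOfRegs` definitions, and `Node`, `Pending` and the register names are invented for the sketch.

```haskell
type Reg = String

-- Simplified stand-ins for the typeclasses named in the commit message.
class UserOfRegs n where
  usedRegs :: n -> [Reg]

class DefinerOfRegs n where
  definedRegs :: n -> [Reg]

-- A toy node type: a plain assignment or a call with results and arguments.
data Node
  = Assign Reg [Reg]   -- target register, registers read
  | Call [Reg] [Reg]   -- result registers, argument registers
  deriving Show

instance UserOfRegs Node where
  usedRegs (Assign _ us) = us
  usedRegs (Call _ args) = args

instance DefinerOfRegs Node where
  definedRegs (Assign r _) = [r]
  definedRegs (Call rs _)  = rs

-- An assignment we would like to sink past a node.
data Pending = Pending { pendingReg :: Reg, pendingUses :: [Reg] }

-- The pending assignment cannot move past the node if the node
-- overwrites something the assignment reads, or reads the register
-- the assignment writes.
conflicts :: (UserOfRegs n, DefinerOfRegs n) => Pending -> n -> Bool
conflicts p node =
     any (`elem` definedRegs node) (pendingUses p)
  || pendingReg p `elem` usedRegs node
```

Because `conflicts` only goes through the two class methods, adding a new node constructor requires updating the instances and nothing else, which is the maintenance benefit the commit message points to.
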
* Fix definition of DefinerOfRegs for CmmForeignCall | Jan Stolarek | 2013-09-04 | 1 | -3/+4
  And update comments
* Comments and type synonym in CmmSink | Jan Stolarek | 2013-09-03 | 1 | -22/+33
* Comments only | Jan Stolarek | 2013-09-02 | 1 | -20/+40
* Comment only | Simon Peyton Jones | 2013-04-19 | 1 | -1/+1
* Generalize register sets and liveness calculations. | Geoffrey Mainland | 2012-10-30 | 1 | -15/+15
  We would like to calculate register liveness for global registers as well as local registers, so this patch generalizes the existing infrastructure to set the stage.
* Fix a bug in CmmSink exposed by a recent optimisation (#7366) | Simon Marlow | 2012-10-25 | 1 | -0/+10
* Fix bug in 88a6f863d9f127fc1b03a1e2f068fd20ecbe096c (#7366) | Simon Marlow | 2012-10-25 | 1 | -20/+20
* Small optimisation: always sink/inline reg1 = reg2 assignments | Simon Marlow | 2012-10-23 | 1 | -6/+5
* Foreign calls can clobber heap & stack memory too | Simon Marlow | 2012-10-22 | 1 | -2/+17
  We were making an aggressive assumption that foreign calls cannot clobber heap or stack memory, which for the majority of foreign calls is true, but we violate the assumption in the implementation of primops in the RTS. This was causing crashes in some STM tests.
* Produce new-style Cmm from the Cmm parser | Simon Marlow | 2012-10-08 | 1 | -10/+48
  The main change here is that the Cmm parser now allows high-level cmm code with argument-passing and function calls. For example:

      foo ( gcptr a, bits32 b )
      {
        if (b > 0) {
           // we can make tail calls passing arguments:
           jump stg_ap_0_fast(a);
        }

        return (x,y);
      }

  More details on the new cmm syntax are in Note [Syntax of .cmm files] in CmmParse.y.

  The old syntax is still more-or-less supported for those occasional code fragments that really need to explicitly manipulate the stack. However there are a couple of differences: it is now obligatory to give a list of live GlobalRegs on every jump, e.g.

      jump %ENTRY_CODE(Sp(0)) [R1];

  Again, more details in Note [Syntax of .cmm files].

  I have rewritten most of the .cmm files in the RTS into the new syntax, except for AutoApply.cmm which is generated by the genapply program: this file could be generated in the new syntax instead and would probably be better off for it, but I ran out of enthusiasm.

  Some other changes in this batch:

  - The PrimOp calling convention is gone, primops now use the ordinary NativeNodeCall convention. This means that primops and "foreign import prim" code must be written in high-level cmm, but they can now take more than 10 arguments.

  - CmmSink now does constant-folding (should fix #7219)

  - .cmm files now go through the cmmPipeline, and as a result we generate better code in many cases. All the object files generated for the RTS .cmm files are now smaller. Performance should be better too, but I haven't measured it yet.

  - RET_DYN frames are removed from the RTS, lots of code goes away

  - we now have some more canned GC points to cover unboxed-tuples with 2-4 pointers, which will reduce code size a little.
* Add a ToDo comment | Simon Marlow | 2012-10-05 | 1 | -0/+21
* Misc tidyup | Simon Marlow | 2012-09-24 | 1 | -1/+1
* no functional changes | Simon Marlow | 2012-09-24 | 1 | -7/+16
* Pass DynFlags down to bWord | Ian Lynagh | 2012-09-12 | 1 | -20/+20
  I've switched to passing DynFlags rather than Platform, as (a) it's simpler to not have to extract targetPlatform in so many places, and (b) it may be useful to have DynFlags around in future.
* Define callerSaves for all platforms | Ian Lynagh | 2012-08-07 | 1 | -31/+35
  This means that we now generate the same code whatever platform we are on, which should help avoid changes on one platform breaking the build on another.

  It's also another step towards full cross-compilation.
* fix warning | Simon Marlow | 2012-08-06 | 1 | -2/+2