summaryrefslogtreecommitdiff
path: root/compiler/codeGen/StgCmmBind.hs
Commit message (Collapse)AuthorAgeFilesLines
* Revert "Place static closures in their own section."Edward Z. Yang2014-10-201-2/+2
| | | | | | | | | | This reverts commit b23ba2a7d612c6b466521399b33fe9aacf5c4f75. Conflicts: compiler/cmm/PprCmmDecl.hs compiler/nativeGen/PPC/Ppr.hs compiler/nativeGen/SPARC/Ppr.hs compiler/nativeGen/X86/Ppr.hs
* Place static closures in their own section.Edward Z. Yang2014-10-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | Summary: The primary reason for doing this is assisting debuggability: if static closures are all in the same section, they are guaranteed to be adjacent to one another. This will help later when we add some code that takes section start/end and uses this to sanity-check the sections. Part of remove HEAP_ALLOCED patch set (#8199) Signed-off-by: Edward Z. Yang <ezyang@mit.edu> Test Plan: validate Reviewers: simonmar, austin Subscribers: simonmar, ezyang, carter, thomie Differential Revision: https://phabricator.haskell.org/D263 GHC Trac Issues: #8199
* Make Applicative a superclass of MonadAustin Seipp2014-09-091-0/+4
| | | | | | | | | | | | | | | | | | | | | Summary: This includes pretty much all the changes needed to make `Applicative` a superclass of `Monad` finally. There's mostly reshuffling in the interests of avoid orphans and boot files, but luckily we can resolve all of them, pretty much. The only catch was that Alternative/MonadPlus also had to go into Prelude to avoid this. As a result, we must update the hsc2hs and haddock submodules. Signed-off-by: Austin Seipp <austin@well-typed.com> Test Plan: Build things, they might not explode horribly. Reviewers: hvr, simonmar Subscribers: simonmar Differential Revision: https://phabricator.haskell.org/D13
* Add LANGUAGE pragmas to compiler/ source filesHerbert Valerio Riedel2014-05-151-0/+2
| | | | | | | | | | | | | | | | | | In some cases, the layout of the LANGUAGE/OPTIONS_GHC lines has been reorganized, while following the convention, to - place `{-# LANGUAGE #-}` pragmas at the top of the source file, before any `{-# OPTIONS_GHC #-}`-lines. - Moreover, if the list of language extensions fit into a single `{-# LANGUAGE ... -#}`-line (shorter than 80 characters), keep it on one line. Otherwise split into `{-# LANGUAGE ... -#}`-lines for each individual language extension. In both cases, try to keep the enumeration alphabetically ordered. (The latter layout is preferable as it's more diff-friendly) While at it, this also replaces obsolete `{-# OPTIONS ... #-}` pragma occurences by `{-# OPTIONS_GHC ... #-}` pragmas.
* Add SmallArray# and SmallMutableArray# typesJohan Tibell2014-03-291-4/+4
| | | | | | | | | | | | | | | These array types are smaller than Array# and MutableArray# and are faster when the array size is small, as they don't have the overhead of a card table. Having no card table reduces the closure size with 2 words in the typical small array case and leads to less work when updating or GC:ing the array. Reduces both the runtime and memory allocation by 8.8% on my insert benchmark for the HashMap type in the unordered-containers package, which makes use of lots of small arrays. With tuned GC settings (i.e. `+RTS -A6M`) the runtime reduction is 15%. Fixes #8923.
* Represent offsets into heap objects with byte, not word, offsetsSimon Marlow2014-03-111-6/+7
| | | | | I'd like to be able to pack together non-pointer fields that are less than a word in size, and this is a necessary prerequisite.
* Cleaned up Maybes.lhsBaldur Blöndal2014-02-131-2/+2
|
* Loopification jump between stack and heap checksJan Stolarek2014-02-011-9/+5
| | | | | | | | | | | | Fixes #8585 When emmiting label of a self-recursive tail call (ie. when performing loopification optimization) we emit the loop header label after a stack check but before the heap check. The reason is that tail-recursive functions use constant amount of stack space so we don't need to repeat the check in every loop. But they can grow the heap so heap check must be repeated in every call. See Note [Self-recursive tail calls] and [Self-recursive loop header].
* Update and deduplicate the comments on CAF management (#8590)Patrick Palka2013-12-041-31/+4
|
* Move the allocation of CAF blackholes into 'newCAF' (#8590)Patrick Palka2013-12-041-30/+10
| | | | | | | | | | We now do the allocation of the blackhole indirection closure inside the RTS procedure 'newCAF' instead of generating the allocation code inline in the closure body of each CAF. This slightly decreases code size in modules with a lot of CAFs. As a result of this change, for example, the size of DynFlags.o drops by ~60KB and HsExpr.o by ~100KB.
* Move the LDV code below the self-loop label (#8275)Patrick Palka2013-12-011-1/+1
|
* Don't explicitly refer to nodeReg in ldvEnterClosurePatrick Palka2013-12-011-2/+2
|
* Document solution to #8275Jan Stolarek2013-12-011-2/+13
|
* Fix loopification with profiling and enable it by default (#8275)Patrick Palka2013-12-011-4/+2
|
* Explicit import lists for StgCmmProf.Edward Z. Yang2013-09-011-1/+2
| | | | Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* Optimize self-recursive tail callsJan Stolarek2013-08-291-2/+14
| | | | | | | | | | | | | | | | | | | | | | This patch implements loopification optimization. It was described in "Low-level code optimisations in the Glasgow Haskell Compiler" by Krzysztof Woś, but we use a different approach here. Krzysztof's approach was to perform optimization as a Cmm-to-Cmm pass. Our approach is to generate properly optimized tail calls in the code generator, which saves us the trouble of processing Cmm. This idea was proposed by Simon Marlow. Implementation details are explained in Note [Self-recursive tail calls]. Performance of most nofib benchmarks is not affected. There are some benchmarks that show 5-7% improvement, with an average improvement of 2.6%. It would require some further investigation to check if this is related to benchamrking noise or does this optimization really help make some class of programs faster. As a minor cleanup, this patch renames forkProc to forkLneBody. It also moves some data declarations from StgCmmMonad to StgCmmClosure, because they are needed there and it seems that StgCmmClosure is on top of the whole StgCmm* hierarchy.
* Merge cgTailCall and cgLneJump into one functionJan Stolarek2013-08-201-1/+1
| | | | | | | | | | | | | | | | | | | | | Previosly logic of these functions was sth like this: cgIdApp x = case x of A -> cgLneJump x _ -> cgTailCall x cgTailCall x = case x of B -> ... C -> ... _ -> ... After merging there is no nesting of cases: cgIdApp x = case x of A -> -- body of cgLneJump B -> ... C -> ... _ -> ...
* Remove unused moduleJan Stolarek2013-08-201-3/+0
| | | | | | This commit removes module StgCmmGran which has only no-op functions. According to comments in the module, it was used by GpH, but GpH project seems to be dead for a couple of years now.
* Cleanup StgCmm passJan Stolarek2013-08-201-19/+19
| | | | | | | | | | | | | | This cleanup includes: * removing dead code. This includes forkStatics function, which was in fact one big noop, and global bindings in CgInfoDownwards, * converting functions that used FCode monad only to access DynFlags into functions that take DynFlags as a parameter and don't work in a monad, * addBindC function is now smarter. It extracts Id from CgIdInfo passed to it in the same way addBindsC does. Previously this was done at every call site, which was redundant.
* Trailing whitespaces, code formatting, detabifyJan Stolarek2013-08-201-4/+4
| | | | | A major cleanup of trailing whitespaces and tabs in codeGen/ directory. I also adjusted code formatting in some places.
* Add final remaining bits to fix #7978.Geoffrey Mainland2013-07-221-30/+1
|
* Add a work-around for #7978.Geoffrey Mainland2013-06-221-2/+7
| | | | | This patch fixes profiling at the cost of losing cost centre accounting in a very small number of cases. I am working on a better fix.
* Wibbles (merg-os) to ticky-tickySimon Peyton Jones2013-06-061-2/+2
|
* Implement cardinality analysisSimon Peyton Jones2013-06-061-10/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This major patch implements the cardinality analysis described in our paper "Higher order cardinality analysis". It is joint work with Ilya Sergey and Dimitrios Vytiniotis. The basic is augment the absence-analysis part of the demand analyser so that it can tell when something is used never at most once some other way The "at most once" information is used a) to enable transformations, and in particular to identify one-shot lambdas b) to allow updates on thunks to be omitted. There are two new flags, mainly there so you can do performance comparisons: -fkill-absence stops GHC doing absence analysis at all -fkill-one-shot stops GHC spotting one-shot lambdas and single-entry thunks The big changes are: * The Demand type is substantially refactored. In particular the UseDmd is factored as follows data UseDmd = UCall Count UseDmd | UProd [MaybeUsed] | UHead | Used data MaybeUsed = Abs | Use Count UseDmd data Count = One | Many Notice that UCall recurses straight to UseDmd, whereas UProd goes via MaybeUsed. The "Count" embodies the "at most once" or "many" idea. * The demand analyser itself was refactored a lot * The previously ad-hoc stuff in the occurrence analyser for foldr and build goes away entirely. Before if we had build (\cn -> ...x... ) then the "\cn" was hackily made one-shot (by spotting 'build' as special. That's essential to allow x to be inlined. Now the occurrence analyser propagates info gotten from 'build's stricness signature (so build isn't special); and that strictness sig is in turn derived entirely automatically. Much nicer! * The ticky stuff is improved to count single-entry thunks separately. One shortcoming is that there is no DEBUG way to spot if an allegedly-single-entry thunk is acually entered more than once. It would not be hard to generate a bit of code to check for this, and it would be reassuring. But it's fiddly and I have not done it. Despite all this fuss, the performance numbers are rather under-whelming. See the paper for more discussion. nucleic2 -0.8% -10.9% 0.10 0.10 +0.0% sphere -0.7% -1.5% 0.08 0.08 +0.0% -------------------------------------------------------------------------------- Min -4.7% -10.9% -9.3% -9.3% -50.0% Max -0.4% +0.5% +2.2% +2.3% +7.4% Geometric Mean -0.8% -0.2% -1.3% -1.3% -1.8% I don't quite know how much credence to place in the runtime changes, but movement seems generally in the right direction.
* extended ticky to also track "let"s that are not conventional closuresNicolas Frisby2013-05-021-9/+13
| | | | | | | This includes selector, ap, and constructor thunks. They are still guarded by the -ticky-dyn-thk flag. (This is 024df664b600a with a small bug fix.)
* In CMM, only allow foreign calls to labels, not arbitrary expressionsIan Lynagh2013-04-241-1/+1
| | | | | | | | | I'm not sure if we want to make this change permanently, but for now it fixes the unreg build. I've also removed some redundant special-case code that generated prototypes for foreign functions. The standard pprTempAndExternDecls now generates them.
* Revert "extended ticky to also track "let"s that are not closures"Nicolas Frisby2013-04-121-14/+9
| | | | | | This reverts commit 024df664b600a622cb8189ccf31789688505fc1c. Of course I gaff on my last day...
* extended ticky to also track "let"s that are not closuresNicolas Frisby2013-04-121-9/+14
| | | | | This includes selector, ap, and constructor thunks. They are still guarded by the -ticky-dyn-thk flag.
* ticky enhancementsNicolas Frisby2013-03-291-26/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * the new StgCmmArgRep module breaks a dependency cycle; I also untabified it, but made no real changes * updated the documentation in the wiki and change the user guide to point there * moved the allocation enters for ticky and CCS to after the heap check * I left LDV where it was, which was before the heap check at least once, since I have no idea what it is * standardized all (active?) ticky alloc totals to bytes * in order to avoid double counting StgCmmLayout.adjustHpBackwards no longer bumps ALLOC_HEAP_ctr * I resurrected the SLOW_CALL counters * the new module StgCmmArgRep breaks cyclic dependency between Layout and Ticky (which the SLOW_CALL counters cause) * renamed them SLOW_CALL_fast_<pattern> and VERY_SLOW_CALL * added ALLOC_RTS_ctr and _tot ticky counters * eg allocation by Storage.c:allocate or a BUILD_PAP in stg_ap_*_info * resurrected ticky counters for ALLOC_THK, ALLOC_PAP, and ALLOC_PRIM * added -ticky and -DTICKY_TICKY in ways.mk for debug ways * added a ticky counter for total LNE entries * new flags for ticky: -ticky-allocd -ticky-dyn-thunk -ticky-LNE * all off by default * -ticky-allocd: tracks allocation *of* closure in addition to allocation *by* that closure * -ticky-dyn-thunk tracks dynamic thunks as if they were functions * -ticky-LNE tracks LNEs as if they were functions * updated the ticky report format, including making the argument categories (more?) accurate again * the printed name for things in the report include the unique of their ticky parent as well as if they are not top-level
* Tidy up: move info-table related stuff to CmmInfoSimon Marlow2013-01-231-0/+1
| | | | Prep for #709
* Code-size optimisation for top-level indirections (#7308)Simon Marlow2012-11-191-9/+31
| | | | | | | | | | | | | | | Top-level indirections are often generated when there is a cast, e.g. foo :: T foo = bar `cast` (some coercion) For these we were generating a full-blown CAF, which is a fair chunk of code. This patch makes these indirections generate a single IND_STATIC closure (4 words) instead. This is exactly what the CAF would evaluate to eventually anyway, we're just shortcutting the whole process.
* Fix the Slow calling convention (#7192)Simon Marlow2012-11-131-10/+11
| | | | | | | | The Slow calling convention passes the closure in R1, but we were ignoring this and hoping it would work, which it often did. However, this bug seems to have been the cause of #7192, because the graph-colouring allocator is more sensitive to having correct liveness information on jumps.
* Some alpha renamingIan Lynagh2012-10-161-5/+5
| | | | | Mostly d -> g (matching DynFlag -> GeneralFlag). Also renamed if* to when*, matching the Haskell if/when names
* Produce new-style Cmm from the Cmm parserSimon Marlow2012-10-081-16/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The main change here is that the Cmm parser now allows high-level cmm code with argument-passing and function calls. For example: foo ( gcptr a, bits32 b ) { if (b > 0) { // we can make tail calls passing arguments: jump stg_ap_0_fast(a); } return (x,y); } More details on the new cmm syntax are in Note [Syntax of .cmm files] in CmmParse.y. The old syntax is still more-or-less supported for those occasional code fragments that really need to explicitly manipulate the stack. However there are a couple of differences: it is now obligatory to give a list of live GlobalRegs on every jump, e.g. jump %ENTRY_CODE(Sp(0)) [R1]; Again, more details in Note [Syntax of .cmm files]. I have rewritten most of the .cmm files in the RTS into the new syntax, except for AutoApply.cmm which is generated by the genapply program: this file could be generated in the new syntax instead and would probably be better off for it, but I ran out of enthusiasm. Some other changes in this batch: - The PrimOp calling convention is gone, primops now use the ordinary NativeNodeCall convention. This means that primops and "foreign import prim" code must be written in high-level cmm, but they can now take more than 10 arguments. - CmmSink now does constant-folding (should fix #7219) - .cmm files now go through the cmmPipeline, and as a result we generate better code in many cases. All the object files generated for the RTS .cmm files are now smaller. Performance should be better too, but I haven't measured it yet. - RET_DYN frames are removed from the RTS, lots of code goes away - we now have some more canned GC points to cover unboxed-tuples with 2-4 pointers, which will reduce code size a little.
* Change some "else return ()"s to use when/unlessIan Lynagh2012-09-201-2/+1
|
* Move tAG_BITS into platformConstantsIan Lynagh2012-09-161-2/+2
|
* Move wORD_SIZE into platformConstantsIan Lynagh2012-09-161-2/+1
|
* Start moving other constants from (Haskell)Constants to platformConstantsIan Lynagh2012-09-141-2/+2
|
* Use oFFSET_* from platformConstants rather than ConstantsIan Lynagh2012-09-131-1/+1
|
* Use sIZEOF_* from platformConstants rather than ConstantsIan Lynagh2012-09-131-1/+1
|
* Pass DynFlags down to wordWidthIan Lynagh2012-09-121-4/+4
|
* Pass DynFlags down to bWordIan Lynagh2012-09-121-9/+11
| | | | | | I've switched to passing DynFlags rather than Platform, as (a) it's simpler to not have to extract targetPlatform in so many places, and (b) it may be useful to have DynFlags around in future.
* Cleanup: add mkIntExpr and zeroExpr utilsSimon Marlow2012-08-311-1/+1
|
* remove tabsSimon Marlow2012-08-211-124/+117
|
* Remove uses of fixC from the codeGen, and make the FCode monad strictSimon Marlow2012-08-091-74/+110
|
* fix warningSimon Marlow2012-08-071-1/+1
|
* entryHeapCheck: fix calls to stg_gc_fun and stg_gc_enter_1Simon Marlow2012-08-071-2/+2
| | | | | | | | | We weren't passing the arguments correctly to the GC functions, which usually happened to work because the arguments were in the right registers already. After this fix the profiling tests go through with the new code generator.
* Small optimisationSimon Marlow2012-08-071-5/+6
| | | | | | When calling newCAF, refer to the closure using its LocalReg rather than R1. Using R1 here was preventing the register allocator from coalescing the assignment x=R1 at the beginning of the function.
* fix a warningSimon Marlow2012-08-071-1/+1
|
* Fix update frames for profilingSimon Marlow2012-08-071-12/+16
|