summaryrefslogtreecommitdiff
path: root/compiler/codeGen/StgCmmExpr.hs
Commit message (Collapse)AuthorAgeFilesLines
* Make Applicative a superclass of MonadAustin Seipp2014-09-091-0/+4
| | | | | | | | | | | | | | | | | | | | | Summary: This includes pretty much all the changes needed to make `Applicative` a superclass of `Monad` finally. There's mostly reshuffling in the interests of avoid orphans and boot files, but luckily we can resolve all of them, pretty much. The only catch was that Alternative/MonadPlus also had to go into Prelude to avoid this. As a result, we must update the hsc2hs and haddock submodules. Signed-off-by: Austin Seipp <austin@well-typed.com> Test Plan: Build things, they might not explode horribly. Reviewers: hvr, simonmar Subscribers: simonmar Differential Revision: https://phabricator.haskell.org/D13
* Add LANGUAGE pragmas to compiler/ source filesHerbert Valerio Riedel2014-05-151-0/+2
| | | | | | | | | | | | | | | | | | In some cases, the layout of the LANGUAGE/OPTIONS_GHC lines has been reorganized, while following the convention, to - place `{-# LANGUAGE #-}` pragmas at the top of the source file, before any `{-# OPTIONS_GHC #-}`-lines. - Moreover, if the list of language extensions fit into a single `{-# LANGUAGE ... -#}`-line (shorter than 80 characters), keep it on one line. Otherwise split into `{-# LANGUAGE ... -#}`-lines for each individual language extension. In both cases, try to keep the enumeration alphabetically ordered. (The latter layout is preferable as it's more diff-friendly) While at it, this also replaces obsolete `{-# OPTIONS ... #-}` pragma occurences by `{-# OPTIONS_GHC ... #-}` pragmas.
* codeGen: inline allocation optimization for clone array primopsJohan Tibell2014-03-221-11/+22
| | | | | | | | | | | | | | | | | | | | | | | | The inline allocation version is 69% faster than the out-of-line version, when cloning an array of 16 unit elements on a 64-bit machine. Comparing the new and the old primop implementations isn't straightforward. The old version had a missing heap check that I discovered during the development of the new version. Comparing the old and the new version would requiring fixing the old version, which in turn means reimplementing the equivalent of MAYBE_CG in StgCmmPrim. The inline allocation threshold is configurable via -fmax-inline-alloc-size which gives the maximum array size, in bytes, to allocate inline. The size does not include the closure header size. Allowing the same primop to be either inline or out-of-line has some implication for how we lay out heap checks. We always place a heap check around out-of-line primops, as they may allocate outside of our knowledge. However, for the inline primops we only allow allocation via the standard means (i.e. virtHp). Since the clone primops might be either inline or out-of-line the heap check layout code now consults shouldInlinePrimOp to know whether a primop will be inlined.
* Loopification jump between stack and heap checksJan Stolarek2014-02-011-5/+11
| | | | | | | | | | | | Fixes #8585 When emmiting label of a self-recursive tail call (ie. when performing loopification optimization) we emit the loop header label after a stack check but before the heap check. The reason is that tail-recursive functions use constant amount of stack space so we don't need to repeat the check in every loop. But they can grow the heap so heap check must be repeated in every call. See Note [Self-recursive tail calls] and [Self-recursive loop header].
* Move isVoidRep, isGcPtrRep to TyCon to join primRepSizeW etcSimon Peyton Jones2013-11-221-1/+1
| | | | This is just a modest refactoring
* Fix some cases where we were leaving slop in the heap (#8515, #8298)Simon Marlow2013-11-141-6/+14
|
* Add flag to control loopificationJan Stolarek2013-09-181-1/+6
| | | | | It is off by default, which is meant to be a workaround for #8275. Once #8275 is fixed we will enable this option by default.
* Explicit import lists for StgCmmProf.Edward Z. Yang2013-09-011-1/+1
| | | | Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* Optimize self-recursive tail callsJan Stolarek2013-08-291-15/+89
| | | | | | | | | | | | | | | | | | | | | | This patch implements loopification optimization. It was described in "Low-level code optimisations in the Glasgow Haskell Compiler" by Krzysztof Woś, but we use a different approach here. Krzysztof's approach was to perform optimization as a Cmm-to-Cmm pass. Our approach is to generate properly optimized tail calls in the code generator, which saves us the trouble of processing Cmm. This idea was proposed by Simon Marlow. Implementation details are explained in Note [Self-recursive tail calls]. Performance of most nofib benchmarks is not affected. There are some benchmarks that show 5-7% improvement, with an average improvement of 2.6%. It would require some further investigation to check if this is related to benchamrking noise or does this optimization really help make some class of programs faster. As a minor cleanup, this patch renames forkProc to forkLneBody. It also moves some data declarations from StgCmmMonad to StgCmmClosure, because they are needed there and it seems that StgCmmClosure is on top of the whole StgCmm* hierarchy.
* Comments only, relating to #8166 fixSimon Peyton Jones2013-08-271-3/+4
|
* Properly externalise codegen identifiers (#8166)Austin Seipp2013-08-261-3/+7
| | | | | | | | | | | | | | | | | | | | 388e14e2 unfortunately broke a subtle invariant in the code generator: when generating code for an application, names may need to be externalised, in case you're building against something external with was built with -split-objs. We were never externalising the ids of the applied functions. This means if the libraries are split and we call into them, then the compiler won't may not generate correct ids when making references to functions in the library (causing linker failure). I'm not entirely sure how this didn't break everything, but it certainly caused several failures for a bunch of people. I had to fiddle with my tree a little to make this occur. This should fix #8166. Signed-off-by: Austin Seipp <aseipp@pobox.com>
* Merge cgTailCall and cgLneJump into one functionJan Stolarek2013-08-201-30/+16
| | | | | | | | | | | | | | | | | | | | | Previosly logic of these functions was sth like this: cgIdApp x = case x of A -> cgLneJump x _ -> cgTailCall x cgTailCall x = case x of B -> ... C -> ... _ -> ... After merging there is no nesting of cases: cgIdApp x = case x of A -> -- body of cgLneJump B -> ... C -> ... _ -> ...
* Cleanup StgCmm passJan Stolarek2013-08-201-4/+4
| | | | | | | | | | | | | | This cleanup includes: * removing dead code. This includes forkStatics function, which was in fact one big noop, and global bindings in CgInfoDownwards, * converting functions that used FCode monad only to access DynFlags into functions that take DynFlags as a parameter and don't work in a monad, * addBindC function is now smarter. It extracts Id from CgIdInfo passed to it in the same way addBindsC does. Previously this was done at every call site, which was redundant.
* Trailing whitespaces, code formatting, detabifyJan Stolarek2013-08-201-1/+1
| | | | | A major cleanup of trailing whitespaces and tabs in codeGen/ directory. I also adjusted code formatting in some places.
* Comments onlySimon Peyton Jones2013-08-191-1/+8
|
* Comparison primops return Int# (Fixes #6135)Jan Stolarek2013-08-141-14/+18
| | | | | | | | | | | | This patch modifies all comparison primops for Char#, Int#, Word#, Double#, Float# and Addr# to return Int# instead of Bool. A value of 1# represents True and 0# represents False. For a more detailed description of motivation for this change, discussion of implementation details and benchmarking results please visit the wiki page: http://hackage.haskell.org/trac/ghc/wiki/PrimBool There's also some cleanup: whitespace fixes in files that were extensively edited in this patch and constant folding rules for Integer div and mod operators (which for some reason have been left out up till now).
* Avoid needlessly splitting a UniqSupply when extracting a Unique (#8041)Patrick Palka2013-07-061-3/+2
| | | | | | | | | | | | In many places, 'splitUniqSupply' + 'uniqFromSupply' is used to split a UniqSupply into a Unique and a new UniqSupply. In such places we should instead use the more efficient and more appropriate 'takeUniqFromSupply' (or equivalent). Not only is the former method slower, it also generates and throws away an extra Unique. Signed-off-by: Austin Seipp <aseipp@pobox.com>
* extended ticky to also track "let"s that are not conventional closuresNicolas Frisby2013-05-021-3/+4
| | | | | | | This includes selector, ap, and constructor thunks. They are still guarded by the -ticky-dyn-thk flag. (This is 024df664b600a with a small bug fix.)
* Revert "extended ticky to also track "let"s that are not closures"Nicolas Frisby2013-04-121-4/+3
| | | | | | This reverts commit 024df664b600a622cb8189ccf31789688505fc1c. Of course I gaff on my last day...
* extended ticky to also track "let"s that are not closuresNicolas Frisby2013-04-121-3/+4
| | | | | This includes selector, ap, and constructor thunks. They are still guarded by the -ticky-dyn-thk flag.
* ticky enhancementsNicolas Frisby2013-03-291-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * the new StgCmmArgRep module breaks a dependency cycle; I also untabified it, but made no real changes * updated the documentation in the wiki and change the user guide to point there * moved the allocation enters for ticky and CCS to after the heap check * I left LDV where it was, which was before the heap check at least once, since I have no idea what it is * standardized all (active?) ticky alloc totals to bytes * in order to avoid double counting StgCmmLayout.adjustHpBackwards no longer bumps ALLOC_HEAP_ctr * I resurrected the SLOW_CALL counters * the new module StgCmmArgRep breaks cyclic dependency between Layout and Ticky (which the SLOW_CALL counters cause) * renamed them SLOW_CALL_fast_<pattern> and VERY_SLOW_CALL * added ALLOC_RTS_ctr and _tot ticky counters * eg allocation by Storage.c:allocate or a BUILD_PAP in stg_ap_*_info * resurrected ticky counters for ALLOC_THK, ALLOC_PAP, and ALLOC_PRIM * added -ticky and -DTICKY_TICKY in ways.mk for debug ways * added a ticky counter for total LNE entries * new flags for ticky: -ticky-allocd -ticky-dyn-thunk -ticky-LNE * all off by default * -ticky-allocd: tracks allocation *of* closure in addition to allocation *by* that closure * -ticky-dyn-thunk tracks dynamic thunks as if they were functions * -ticky-LNE tracks LNEs as if they were functions * updated the ticky report format, including making the argument categories (more?) accurate again * the printed name for things in the report include the unique of their ticky parent as well as if they are not top-level
* Tidy up: move info-table related stuff to CmmInfoSimon Marlow2013-01-231-0/+1
| | | | Prep for #709
* Fix the Slow calling convention (#7192)Simon Marlow2012-11-131-2/+2
| | | | | | | | The Slow calling convention passes the closure in R1, but we were ignoring this and hoping it would work, which it often did. However, this bug seems to have been the cause of #7192, because the graph-colouring allocator is more sensitive to having correct liveness information on jumps.
* Attach global register liveness info to Cmm procedures.Geoffrey Mainland2012-10-301-1/+1
| | | | | | | All Cmm procedures now include the set of global registers that are live on procedure entry, i.e., the global registers used to pass arguments to the procedure. Only global registers that are use to pass arguments are included in this list.
* Produce new-style Cmm from the Cmm parserSimon Marlow2012-10-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The main change here is that the Cmm parser now allows high-level cmm code with argument-passing and function calls. For example: foo ( gcptr a, bits32 b ) { if (b > 0) { // we can make tail calls passing arguments: jump stg_ap_0_fast(a); } return (x,y); } More details on the new cmm syntax are in Note [Syntax of .cmm files] in CmmParse.y. The old syntax is still more-or-less supported for those occasional code fragments that really need to explicitly manipulate the stack. However there are a couple of differences: it is now obligatory to give a list of live GlobalRegs on every jump, e.g. jump %ENTRY_CODE(Sp(0)) [R1]; Again, more details in Note [Syntax of .cmm files]. I have rewritten most of the .cmm files in the RTS into the new syntax, except for AutoApply.cmm which is generated by the genapply program: this file could be generated in the new syntax instead and would probably be better off for it, but I ran out of enthusiasm. Some other changes in this batch: - The PrimOp calling convention is gone, primops now use the ordinary NativeNodeCall convention. This means that primops and "foreign import prim" code must be written in high-level cmm, but they can now take more than 10 arguments. - CmmSink now does constant-folding (should fix #7219) - .cmm files now go through the cmmPipeline, and as a result we generate better code in many cases. All the object files generated for the RTS .cmm files are now smaller. Performance should be better too, but I haven't measured it yet. - RET_DYN frames are removed from the RTS, lots of code goes away - we now have some more canned GC points to cover unboxed-tuples with 2-4 pointers, which will reduce code size a little.
* Partially fix #367 by adding HpLim checks to entry with -fno-omit-yields.Edward Z. Yang2012-09-261-3/+1
| | | | | | | | | | | | | | | | | | | | | The current fix is relatively dumb as far as where to add HpLim checks: it will always perform a check unless we know that we're returning from a closure or we are doing a non let-no-escape case analysis. The performance impact on the nofib suite looks like this: Min +5.7% -0.0% -6.5% -6.4% -50.0% Max +6.3% +5.8% +5.0% +5.5% +0.8% Geometric Mean +6.2% +0.1% +0.5% +0.5% -0.8% Overall, the executable bloat is the biggest problem, so we keep the old omit-yields optimization on by default. Remember that if you need an interruptibility guarantee, you need to recompile all of your libraries with -fno-omit-yields. A better fix would involve only inserting the yields necessary to break loops; this is left as future work. Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* Move tAG_BITS into platformConstantsIan Lynagh2012-09-161-1/+1
|
* Pass DynFlags down to wordWidthIan Lynagh2012-09-121-3/+3
|
* Pass DynFlags down to bWordIan Lynagh2012-09-121-22/+35
| | | | | | I've switched to passing DynFlags rather than Platform, as (a) it's simpler to not have to extract targetPlatform in so many places, and (b) it may be useful to have DynFlags around in future.
* A further fix for -split-objs with the new codegenSimon Marlow2012-09-051-1/+4
|
* remove tabsSimon Marlow2012-08-211-93/+86
|
* Remove uses of fixC from the codeGen, and make the FCode monad strictSimon Marlow2012-08-091-22/+26
|
* fix maybeSaveCostCentre: cases were reversedSimon Marlow2012-08-071-2/+2
|
* Add "Unregisterised" as a field in the settings fileIan Lynagh2012-08-071-3/+3
| | | | | | To explicitly choose whether you want an unregisterised build you now need to use the "--enable-unregisterised"/"--disable-unregisterised" configure flags.
* Make tablesNextToCode "dynamic"Ian Lynagh2012-08-061-3/+4
| | | | | This is a bit odd by itself, but it's a stepping stone on the way to putting "target unregisterised" into the settings file.
* Explicitly share some return continuationsSimon Marlow2012-08-021-104/+73
| | | | | | | Instead of relying on common-block-elimination to share return continuations in the common case (case-alternative heap checks) we do it explicitly. This isn't hard to do, is more robust, and saves some compilation time. Full commentary in Note [sharing continuations].
* Make -fscc-profiling a dynamic flagIan Lynagh2012-07-241-8/+8
| | | | All the flags that 'ways' imply are now dynamic
* adjustHpBackwards before calling a let-no-escapeSimon Marlow2012-07-111-1/+2
|
* remove some redundant SRT-related stuffSimon Marlow2012-07-111-9/+9
|
* Track liveness of GlobalRegs in the new code generatorSimon Marlow2012-07-091-9/+7
| | | | | | This gives the register allocator access to R1.., F1.., D1.. etc. for the new code generator, and is a cheap way to eliminate all the extra "x = R1" assignments that we get from copyIn.
* Remove "fuel", adapt to Hoopl changes, fix warningsSimon Marlow2012-07-051-1/+1
|
* Merge remote-tracking branch 'origin/master' into newcgSimon Marlow2012-07-041-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * origin/master: (756 commits) don't crash if argv[0] == NULL (#7037) -package P was loading all versions of P in GHCi (#7030) Add a Note, copying text from #2437 improve the --help docs a bit (#7008) Copy Data.HashTable's hashString into our Util module Build fix Build fixes Parse error: suggest brackets and indentation. Don't build the ghc DLL on Windows; works around trac #5987 On Windows, detect if DLLs have too many symbols; trac #5987 Add some more Integer rules; fixes #6111 Fix PA dfun construction with silent superclass args Add silent superclass parameters to the vectoriser Add silent superclass parameters (again) Mention Generic1 in the user's guide Make the GHC API a little more powerful. tweak llvm version warning message New version of the patch for #5461. Fix Word64ToInteger conversion rule. Implemented feature request on reconfigurable pretty-printing in GHCi (#5461) ... Conflicts: compiler/basicTypes/UniqSupply.lhs compiler/cmm/CmmBuildInfoTables.hs compiler/cmm/CmmLint.hs compiler/cmm/CmmOpt.hs compiler/cmm/CmmPipeline.hs compiler/cmm/CmmStackLayout.hs compiler/cmm/MkGraph.hs compiler/cmm/OldPprCmm.hs compiler/codeGen/CodeGen.lhs compiler/codeGen/StgCmm.hs compiler/codeGen/StgCmmBind.hs compiler/codeGen/StgCmmLayout.hs compiler/codeGen/StgCmmUtils.hs compiler/main/CodeOutput.lhs compiler/main/HscMain.hs compiler/nativeGen/AsmCodeGen.lhs compiler/simplStg/SimplStg.lhs
| * Support code generation for unboxed-tuple function argumentsunboxed-tuple-arguments2Max Bolingbroke2012-05-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | This is done by a 'unarisation' pre-pass at the STG level which translates away all (live) binders binding something of unboxed tuple type. This has the following knock-on effects: * The subkind hierarchy is vastly simplified (no UbxTupleKind or ArgKind) * Various relaxed type checks in typechecker, 'foreign import prim' etc * All case binders may be live at the Core level
* | Improve the case-alternative heap checksSimon Marlow2012-03-071-20/+51
| | | | | | | | | | | | | | The code we were generating for heap-checks in algebraic case alternatives wasn't working well with the common-block eliminator. A small tweak to make the heap-check failure jump back to the same place in all branches lets the common-block eliminator squash more code.
* | New codegen: fix bad code for comparisons (see Note [case on bool])Simon Marlow2012-02-151-43/+76
| |
* | New stack layout algorithmSimon Marlow2012-02-081-10/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | Also: - improvements to code generation: push slow-call continuations on the stack instead of generating explicit continuations - remove unused CmmInfo wrapper type (replace with CmmInfoTable) - squash Area and AreaId together, remove now-unused RegSlot - comment out old unused stack-allocation code that no longer compiles after removal of RegSlot
* | small refactorSimon Marlow2012-01-251-5/+6
| |
* | Different implementation of MkGraphSimon Marlow2012-01-251-14/+17
|/
* Use -fwarn-tabs when validatingIan Lynagh2011-11-041-0/+7
| | | | | We only use it for "compiler" sources, i.e. not for libraries. Many modules have a -fno-warn-tabs kludge for now.
* Overhaul of infrastructure for profiling, coverage (HPC) and breakpointsSimon Marlow2011-11-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | User visible changes ==================== Profilng -------- Flags renamed (the old ones are still accepted for now): OLD NEW --------- ------------ -auto-all -fprof-auto -auto -fprof-exported -caf-all -fprof-cafs New flags: -fprof-auto Annotates all bindings (not just top-level ones) with SCCs -fprof-top Annotates just top-level bindings with SCCs -fprof-exported Annotates just exported bindings with SCCs -fprof-no-count-entries Do not maintain entry counts when profiling (can make profiled code go faster; useful with heap profiling where entry counts are not used) Cost-centre stacks have a new semantics, which should in most cases result in more useful and intuitive profiles. If you find this not to be the case, please let me know. This is the area where I have been experimenting most, and the current solution is probably not the final version, however it does address all the outstanding bugs and seems to be better than GHC 7.2. Stack traces ------------ +RTS -xc now gives more information. If the exception originates from a CAF (as is common, because GHC tends to lift exceptions out to the top-level), then the RTS walks up the stack and reports the stack in the enclosing update frame(s). Result: +RTS -xc is much more useful now - but you still have to compile for profiling to get it. I've played around a little with adding 'head []' to GHC itself, and +RTS -xc does pinpoint the problem quite accurately. I plan to add more facilities for stack tracing (e.g. in GHCi) in the future. Coverage (HPC) -------------- * derived instances are now coloured yellow if they weren't used * likewise record field names * entry counts are more accurate (hpc --fun-entry-count) * tab width is now correct (markup was previously off in source with tabs) Internal changes ================ In Core, the Note constructor has been replaced by Tick (Tickish b) (Expr b) which is used to represent all the kinds of source annotation we support: profiling SCCs, HPC ticks, and GHCi breakpoints. Depending on the properties of the Tickish, different transformations apply to Tick. See CoreUtils.mkTick for details. Tickets ======= This commit closes the following tickets, test cases to follow: - Close #2552: not a bug, but the behaviour is now more intuitive (test is T2552) - Close #680 (test is T680) - Close #1531 (test is result001) - Close #949 (test is T949) - Close #2466: test case has bitrotted (doesn't compile against current version of vector-space package)