path: root/compiler/codeGen
Commit message (Author, Age, Files, Lines)
...
* Globally replace "hackage.haskell.org" with "ghc.haskell.org" (Simon Marlow, 2013-10-01, 1 file, -1/+1)
|
* Check that SIMD vector instructions are compatible with current set of dynamic flags. (Geoffrey Mainland, 2013-09-22, 1 file, -14/+59)
| SIMD vector instructions currently require the LLVM back-end. The set of
| available instructions also depends on the set of architecture flags
| specified on the command line.
* Pass 512-bit-wide vectors in registers. (Geoffrey Mainland, 2013-09-22, 1 file, -0/+7)
|
* Add support for 512-bit-wide vectors. (Geoffrey Mainland, 2013-09-22, 2 files, -0/+6)
|
* Pass 256-bit-wide vectors in registers. (Geoffrey Mainland, 2013-09-22, 1 file, -0/+7)
|
* Add support for 256-bit-wide vectors. (Geoffrey Mainland, 2013-09-22, 2 files, -3/+9)
|
* SIMD primops are now generated using schemas that are polymorphic in width and element type. (Geoffrey Mainland, 2013-09-22, 1 file, -125/+163)
| SIMD primops are now polymorphic in vector size and element type, but
| only internally to the compiler. More specifically, utils/genprimopcode
| has been extended so that it "knows" about SIMD vectors. This allows us
| to, for example, write a single definition for the "add two vectors"
| primop in primops.txt.pp and have it instantiated at many vector types.
| This generates a primop in GHC.Prim for each vector type at which "add
| two vectors" is instantiated, but only one data constructor for the
| PrimOp data type, so the code generator is much, much simpler.
* Add flag to control loopification (Jan Stolarek, 2013-09-18, 2 files, -3/+9)
| It is off by default, which is meant to be a workaround for #8275. Once
| #8275 is fixed we will enable this option by default.
* New primops for byte range copies ByteArray# <-> Addr# (Duncan Coutts, 2013-09-15, 1 file, -0/+34)
| We have primops for copying ranges of bytes between ByteArray#s:
|  * ByteArray# -> MutableByteArray#
|  * MutableByteArray# -> MutableByteArray#
| This extends it with three further cases:
|  * Addr# -> MutableByteArray#
|  * ByteArray# -> Addr#
|  * MutableByteArray# -> Addr#
| One use case for these is copying between ForeignPtr-based
| representations and in-heap arrays (like Text, UArray etc).
|
| The implementation is essentially the same as for the existing primops,
| and shares the memcpy stuff in the code generators.
|
| Deficiencies / future directions: none of these primops (existing or
| the new ones) let one take advantage of knowing that ByteArray#s are
| word-aligned in memory. Though it is unclear that any of the code
| generators would make use of this information unless the size to copy
| is also known at compile time.
|
| Signed-off-by: Austin Seipp <austin@well-typed.com>
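The Addr# cases described above can be exercised from ordinary Haskell. The following is a minimal sketch, assuming the primop names `copyAddrToByteArray#` and `copyMutableByteArrayToAddr#` as exported by `GHC.Exts` in GHC 7.8 and later; the helper name `roundTrip` is hypothetical and not part of the patch.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import Data.Word (Word8)
import Foreign.Marshal.Array (allocaArray, peekArray, withArrayLen)
import GHC.Exts (Int (I#), copyAddrToByteArray#,
                 copyMutableByteArrayToAddr#, newByteArray#)
import GHC.IO (IO (IO))
import GHC.Ptr (Ptr (Ptr))

-- Hypothetical helper: copy bytes from foreign memory into a fresh
-- MutableByteArray#, then back out to a second foreign buffer.
roundTrip :: [Word8] -> IO [Word8]
roundTrip bytes =
  withArrayLen bytes $ \n@(I# n#) (Ptr src#) ->
    allocaArray n $ \dst@(Ptr dst#) -> do
      IO $ \s0 ->
        case newByteArray# n# s0 of
          (# s1, mba# #) ->
            -- Addr# -> MutableByteArray#
            case copyAddrToByteArray# src# mba# 0# n# s1 of
              s2 ->
                -- MutableByteArray# -> Addr#
                case copyMutableByteArrayToAddr# mba# 0# dst# n# s2 of
                  s3 -> (# s3, () #)
      peekArray n dst
```

Both copies take a byte offset into the array and a length in bytes; the sketch always uses offset 0 and the full length.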
* Fix AMP warnings. (Austin Seipp, 2013-09-11, 2 files, -0/+15)
| Authored-by: David Luposchainsky <dluposchainsky@gmail.com>
| Signed-off-by: Austin Seipp <austin@well-typed.com>
* Explicit import lists for StgCmmProf. (Edward Z. Yang, 2013-09-01, 8 files, -8/+9)
| Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* Optimize self-recursive tail calls (Jan Stolarek, 2013-08-29, 4 files, -94/+222)
| This patch implements the loopification optimization. It was described in
| "Low-level code optimisations in the Glasgow Haskell Compiler" by
| Krzysztof Woś, but we use a different approach here. Krzysztof's
| approach was to perform the optimization as a Cmm-to-Cmm pass. Our
| approach is to generate properly optimized tail calls in the code
| generator, which saves us the trouble of processing Cmm. This idea was
| proposed by Simon Marlow. Implementation details are explained in
| Note [Self-recursive tail calls].
|
| Performance of most nofib benchmarks is not affected. There are some
| benchmarks that show 5-7% improvement, with an average improvement of
| 2.6%. It would require some further investigation to check whether this
| is benchmarking noise or whether this optimization really helps make
| some class of programs faster.
|
| As a minor cleanup, this patch renames forkProc to forkLneBody. It also
| moves some data declarations from StgCmmMonad to StgCmmClosure, because
| they are needed there and it seems that StgCmmClosure is on top of the
| whole StgCmm* hierarchy.
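To make the target of this optimization concrete, here is a minimal sketch of a self-recursive tail call at the Haskell source level. The names `sumTo` and `go` are illustrative and not taken from the patch, and the comments describe the general idea of a call becoming a jump, not the exact Cmm that GHC emits.

```haskell
-- Sum the integers from 1 to n using an accumulator.
sumTo :: Int -> Int
sumTo n = go 0 1
  where
    -- 'go' calls itself in tail position with all arguments supplied.
    -- This is the shape of call that loopification aims to compile as
    -- "overwrite the arguments, jump back to the start of the body"
    -- instead of going through the general function-call sequence.
    go :: Int -> Int -> Int
    go acc i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)
```

A call such as `sumTo 10` evaluates to 55; whether loopification actually speeds up a loop like this is exactly the open question the commit message raises.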
* Whitespaces and comment formatting (Jan Stolarek, 2013-08-29, 2 files, -33/+31)
|
* Comments only, relating to #8166 fix (Simon Peyton Jones, 2013-08-27, 3 files, -9/+15)
|
* Properly externalise codegen identifiers (#8166) (Austin Seipp, 2013-08-26, 1 file, -3/+7)
| 388e14e2 unfortunately broke a subtle invariant in the code generator:
| when generating code for an application, names may need to be
| externalised, in case you're building against something external which
| was built with -split-objs.
|
| We were never externalising the ids of the applied functions. This means
| if the libraries are split and we call into them, then the compiler may
| not generate correct ids when making references to functions in the
| library (causing linker failure).
|
| I'm not entirely sure how this didn't break everything, but it certainly
| caused several failures for a bunch of people. I had to fiddle with my
| tree a little to make this occur.
|
| This should fix #8166.
|
| Signed-off-by: Austin Seipp <aseipp@pobox.com>
* Comments only (Jan Stolarek, 2013-08-22, 1 file, -3/+1)
| This comment is no longer true
* Detabify (Jan Stolarek, 2013-08-21, 1 file, -39/+32)
| I missed that file yesterday when I was cleaning up codeGen/ directory.
* Comments only (Jan Stolarek, 2013-08-20, 1 file, -1/+2)
|
* Merge cgTailCall and cgLneJump into one function (Jan Stolarek, 2013-08-20, 2 files, -31/+17)
| Previously the logic of these functions was something like this:
|
|     cgIdApp x = case x of
|                   A -> cgLneJump x
|                   _ -> cgTailCall x
|
|     cgTailCall x = case x of
|                   B -> ...
|                   C -> ...
|                   _ -> ...
|
| After merging there is no nesting of cases:
|
|     cgIdApp x = case x of
|                   A -> -- body of cgLneJump
|                   B -> ...
|                   C -> ...
|                   _ -> ...
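The before/after shapes in this message can be written out as a small compilable sketch. Everything here is a hypothetical stand-in: the `Target` type, its constructors, and the string results only model the dispatch structure and say nothing about what the real cgIdApp generates.

```haskell
-- Hypothetical stand-ins for the cases A, B and C of the commit message.
data Target = LneJoin | KnownFun | UnknownFun
  deriving (Eq, Show)

-- Before: cgIdApp dispatches on one case, then delegates to a second
-- function that dispatches again.
cgIdAppOld :: Target -> String
cgIdAppOld t = case t of
  LneJoin -> cgLneJump t
  _       -> cgTailCall t

cgLneJump :: Target -> String
cgLneJump _ = "jump to let-no-escape"

cgTailCall :: Target -> String
cgTailCall t = case t of
  KnownFun -> "direct call"
  _        -> "generic apply"

-- After: a single flat case with the bodies inlined.
cgIdAppNew :: Target -> String
cgIdAppNew t = case t of
  LneJoin    -> "jump to let-no-escape"
  KnownFun   -> "direct call"
  UnknownFun -> "generic apply"
```

The two versions agree on every constructor; the flat form simply makes all the alternatives visible in one place.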
* Remove unused module (Jan Stolarek, 2013-08-20, 3 files, -133/+2)
| This commit removes module StgCmmGran, which has only no-op functions.
| According to comments in the module, it was used by GpH, but the GpH
| project seems to have been dead for a couple of years now.
* Cleanup StgCmm pass (Jan Stolarek, 2013-08-20, 7 files, -115/+66)
| This cleanup includes:
|  * removing dead code. This includes forkStatics function,
|    which was in fact one big noop, and global bindings in
|    CgInfoDownwards,
|  * converting functions that used FCode monad only to
|    access DynFlags into functions that take DynFlags
|    as a parameter and don't work in a monad,
|  * addBindC function is now smarter. It extracts Id from
|    CgIdInfo passed to it in the same way addBindsC does.
|    Previously this was done at every call site, which
|    was redundant.
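The second bullet (functions that were monadic only to read DynFlags) can be illustrated with a toy model. This is a sketch under loud assumptions: `DynFlags` and `FCode` below are drastically simplified, hypothetical stand-ins for GHC's real types, and `headerBytes` is an invented example function.

```haskell
-- Hypothetical, tiny stand-in for GHC's DynFlags.
newtype DynFlags = DynFlags { wordSizeInBytes :: Int }

-- Hypothetical stand-in for FCode: just a reader of DynFlags.
newtype FCode a = FCode { runFCode :: DynFlags -> a }

instance Functor FCode where
  fmap f (FCode g) = FCode (f . g)

instance Applicative FCode where
  pure x = FCode (const x)
  FCode f <*> FCode g = FCode (\d -> f d (g d))

instance Monad FCode where
  FCode g >>= k = FCode (\d -> runFCode (k (g d)) d)

getDynFlags :: FCode DynFlags
getDynFlags = FCode id

-- Before: monadic, although the monad is used only to fetch the flags.
headerBytesM :: Int -> FCode Int
headerBytesM nWords = do
  dflags <- getDynFlags
  return (nWords * wordSizeInBytes dflags)

-- After: a pure function that takes DynFlags as a parameter.
headerBytes :: DynFlags -> Int -> Int
headerBytes dflags nWords = nWords * wordSizeInBytes dflags
```

The pure version can be called from any context that has the flags at hand, including other pure code, which is the point of the refactoring.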
* Trailing whitespaces, code formatting, detabify (Jan Stolarek, 2013-08-20, 13 files, -481/+467)
| A major cleanup of trailing whitespaces and tabs in codeGen/ directory.
| I also adjusted code formatting in some places.
* Comments only (Simon Peyton Jones, 2013-08-19, 1 file, -1/+8)
|
* Comparison primops return Int# (Fixes #6135) (Jan Stolarek, 2013-08-14, 2 files, -23/+18)
| This patch modifies all comparison primops for Char#, Int#, Word#,
| Double#, Float# and Addr# to return Int# instead of Bool. A value of 1#
| represents True and 0# represents False. For a more detailed description
| of motivation for this change, discussion of implementation details and
| benchmarking results please visit the wiki page:
| http://hackage.haskell.org/trac/ghc/wiki/PrimBool
|
| There's also some cleanup: whitespace fixes in files that were
| extensively edited in this patch and constant folding rules for Integer
| div and mod operators (which for some reason have been left out up till
| now).
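A short sketch of what the Int#-returning comparisons look like to a user, assuming the `GHC.Exts` exports of GHC 7.8 and later (`isTrue#`, `orI#`, and comparison primops of type `Int# -> Int# -> Int#`). The wrapper names `gtInt` and `geInt` are hypothetical.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), isTrue#, orI#, (==#), (>#))

-- (>#) yields 1# for True and 0# for False; isTrue# converts that
-- Int# back into an ordinary Bool.
gtInt :: Int -> Int -> Bool
gtInt (I# x) (I# y) = isTrue# (x ># y)

-- Because the results are plain Int#s, two comparisons can be combined
-- with a bitwise OR before a single conversion to Bool.
geInt :: Int -> Int -> Bool
geInt (I# x) (I# y) = isTrue# ((x ># y) `orI#` (x ==# y))
```

Combining results with `orI#` instead of `||` is the kind of branch-free formulation that the linked wiki page gives as motivation.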
* Fix a bug in stack layout with safe foreign calls (#8083) (Simon Marlow, 2013-07-24, 1 file, -1/+2)
| We weren't properly tracking the number of stack arguments in the
| continuation of a foreign call. It happened to work when the
| continuation was not a join point, but when it was a join point we were
| using the wrong amount of stack fixup.
* Add final remaining bits to fix #7978. (Geoffrey Mainland, 2013-07-22, 1 file, -30/+1)
|
* Add support for byte endian swapping for Word 16/32/64. (Austin Seipp, 2013-07-17, 1 file, -0/+12)
| * Exposes bSwap{,16,32,64}# primops
| * Add a new machop: MO_BSwap
| * Use a Stg implementation (hs_bswap{16,32,64}) for other implementation
|   in NCG.
| * Generate bswap in X86 NCG for 32 and 64 bits, and for 16 bits,
|   bswap+shr instead of using xchg.
| * Generate llvm.bswap intrinsics in llvm codegen.
|
| Authored-by: Vincent Hanquez <tab@snarc.org>
| Signed-off-by: Austin Seipp <aseipp@pobox.com>
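For readers who want to see byte swapping at the value level, here is a minimal sketch that uses the library functions `byteSwap16` and `byteSwap32` from `Data.Word` (available in base 4.7 and later) rather than the raw primops. The helper `swap16ViaSwap32` is hypothetical; it mirrors the "bswap+shr" idea for 16-bit values as plain arithmetic and is not the code the native code generator emits.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word16, Word32, byteSwap16, byteSwap32)

-- A 32-bit example value and its byte-reversed form.
swapped32 :: Word32
swapped32 = byteSwap32 0x11223344

-- Hypothetical helper: swap a 16-bit value by widening it to 32 bits,
-- swapping all four bytes, and shifting the result right by 16 so the
-- two interesting bytes land in the low half again.
swap16ViaSwap32 :: Word16 -> Word16
swap16ViaSwap32 w = fromIntegral (byteSwap32 (fromIntegral w) `shiftR` 16)
```

Widening 0x1234 gives 0x00001234, the 32-bit swap gives 0x34120000, and the shift leaves 0x3412, which agrees with `byteSwap16`.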
* Avoid needlessly splitting a UniqSupply when extracting a Unique (#8041) (Patrick Palka, 2013-07-06, 2 files, -5/+6)
| In many places, 'splitUniqSupply' + 'uniqFromSupply' is used to split a
| UniqSupply into a Unique and a new UniqSupply. In such places we should
| instead use the more efficient and more appropriate
| 'takeUniqFromSupply' (or equivalent).
|
| Not only is the former method slower, it also generates and throws away
| an extra Unique.
|
| Signed-off-by: Austin Seipp <aseipp@pobox.com>
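The difference between the two idioms can be seen in a toy model of a splittable supply. This is only a sketch: the `Supply` type and the functions `mkSupply`, `splitSupply`, `uniqFrom` and `takeUniq` are hypothetical simplifications (an infinite lazy tree of Ints), not GHC's UniqSupply implementation.

```haskell
-- Toy splittable supply: a lazily built, infinite binary tree whose
-- nodes each carry one fresh Int.
data Supply = Supply Int Supply Supply

mkSupply :: Int -> Supply
mkSupply n = Supply n (mkSupply (2 * n)) (mkSupply (2 * n + 1))

splitSupply :: Supply -> (Supply, Supply)
splitSupply (Supply _ l r) = (l, r)

uniqFrom :: Supply -> Int
uniqFrom (Supply n _ _) = n

-- Old idiom: split, take a unique from one half, continue with the
-- other half. The root's own unique is never used.
oldStyle :: Supply -> (Int, Supply)
oldStyle s =
  let (s1, s2) = splitSupply s
  in (uniqFrom s1, s2)

-- New idiom: take the root's unique directly and continue with a child.
takeUniq :: Supply -> (Int, Supply)
takeUniq (Supply n l _) = (n, l)
```

In this model `oldStyle (mkSupply 1)` hands out 2 and discards the root's 1, while `takeUniq (mkSupply 1)` hands out 1 without forcing an extra node.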
* Fix bumpTickyLitBy[E] on Win64; fixes #7940 (Ian Lynagh, 2013-07-02, 1 file, -4/+2)
| A comment claimed that the ticky counters are unsigned longs, but as far
| as I can see that isn't the case: They're already word-sized values.
* Fix typos (Gabor Greif, 2013-06-25, 1 file, -2/+2)
|
* Make noteMustPointToIt true of all non-top-level thunks (Simon Peyton Jones, 2013-06-25, 1 file, -23/+44)
| See Note [GC recovery]. To come: clean-up of StgCmmBind.cgRhs.
* Add a work-around for #7978. (Geoffrey Mainland, 2013-06-22, 1 file, -2/+7)
| This patch fixes profiling at the cost of losing cost centre accounting
| in a very small number of cases. I am working on a better fix.
* Fix many ASSERT uses under Clang. (Austin Seipp, 2013-06-18, 1 file, -1/+1)
| Clang doesn't like whitespace between macro and arguments.
|
| Signed-off-by: Austin Seipp <aseipp@pobox.com>
* Remove redundant import, revealed by the fix to #7963 (Simon Peyton Jones, 2013-06-18, 1 file, -1/+0)
|
* Revert "Add support for byte endian swapping for Word 16/32/64." (Simon Peyton Jones, 2013-06-11, 1 file, -12/+0)
| This reverts commit 1c5b0511a89488f5280523569d45ee61c0d09ffa.
* Add support for byte endian swapping for Word 16/32/64. (Ian Lynagh, 2013-06-09, 1 file, -0/+12)
| * Exposes bSwap{,16,32,64}# primops
| * Add a new machop: MO_BSwap
| * Use a Stg implementation (hs_bswap{16,32,64}) for other implementation
|   in NCG.
| * Generate bswap in X86 NCG for 32 and 64 bits, and for 16 bits,
|   bswap+shr instead of using xchg.
| * Generate llvm.bswap intrinsics in llvm codegen.
|
| Patch from Vincent Hanquez.
* Wibbles (merg-os) to ticky-ticky (Simon Peyton Jones, 2013-06-06, 2 files, -3/+3)
|
* Implement cardinality analysis (Simon Peyton Jones, 2013-06-06, 2 files, -18/+51)
| This major patch implements the cardinality analysis described
| in our paper "Higher order cardinality analysis". It is joint
| work with Ilya Sergey and Dimitrios Vytiniotis.
|
| The basic idea is to augment the absence-analysis part of the demand
| analyser so that it can tell when something is used
|      never
|      at most once
|      some other way
|
| The "at most once" information is used
|     a) to enable transformations, and
|        in particular to identify one-shot lambdas
|     b) to allow updates on thunks to be omitted.
|
| There are two new flags, mainly there so you can do performance
| comparisons:
|     -fkill-absence   stops GHC doing absence analysis at all
|     -fkill-one-shot  stops GHC spotting one-shot lambdas
|                      and single-entry thunks
|
| The big changes are:
|
| * The Demand type is substantially refactored.  In particular
|   the UseDmd is factored as follows
|       data UseDmd
|         = UCall Count UseDmd
|         | UProd [MaybeUsed]
|         | UHead
|         | Used
|
|       data MaybeUsed = Abs | Use Count UseDmd
|
|       data Count = One | Many
|
|   Notice that UCall recurses straight to UseDmd, whereas
|   UProd goes via MaybeUsed.
|
|   The "Count" embodies the "at most once" or "many" idea.
|
| * The demand analyser itself was refactored a lot
|
| * The previously ad-hoc stuff in the occurrence analyser for foldr and
|   build goes away entirely.  Before if we had build (\cn -> ...x... )
|   then the "\cn" was hackily made one-shot (by spotting 'build' as
|   special).  That's essential to allow x to be inlined.  Now the
|   occurrence analyser propagates info gotten from 'build's strictness
|   signature (so build isn't special); and that strictness sig is
|   in turn derived entirely automatically.  Much nicer!
|
| * The ticky stuff is improved to count single-entry thunks separately.
|
| One shortcoming is that there is no DEBUG way to spot if an
| allegedly-single-entry thunk is actually entered more than once.  It
| would not be hard to generate a bit of code to check for this, and it
| would be reassuring.  But it's fiddly and I have not done it.
|
| Despite all this fuss, the performance numbers are rather under-whelming.
| See the paper for more discussion.
|
|        nucleic2   -0.8%  -10.9%   0.10   0.10   +0.0%
|          sphere   -0.7%   -1.5%   0.08   0.08   +0.0%
| ------------------------------------------------------
|             Min   -4.7%  -10.9%  -9.3%  -9.3%  -50.0%
|             Max   -0.4%   +0.5%  +2.2%  +2.3%   +7.4%
|  Geometric Mean   -0.8%   -0.2%  -1.3%  -1.3%   -1.8%
|
| I don't quite know how much credence to place in the runtime changes,
| but movement seems generally in the right direction.
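The three-way split ("never", "at most once", "some other way") maps directly onto the types quoted in the message. The sketch below restates those declarations and adds two helpers; `Cardinality`, `cardinality` and `canOmitUpdate` are hypothetical names invented for illustration and greatly simplify what the real demand analyser does.

```haskell
-- Declarations as given in the commit message.
data Count = One | Many
  deriving (Eq, Show)

data UseDmd
  = UCall Count UseDmd
  | UProd [MaybeUsed]
  | UHead
  | Used
  deriving (Eq, Show)

data MaybeUsed = Abs | Use Count UseDmd
  deriving (Eq, Show)

-- Hypothetical summary of the three outcomes listed in the message.
data Cardinality = Never | AtMostOnce | SomeOtherWay
  deriving (Eq, Show)

cardinality :: MaybeUsed -> Cardinality
cardinality Abs          = Never
cardinality (Use One _)  = AtMostOnce
cardinality (Use Many _) = SomeOtherWay

-- Hypothetical reading of point (b): a thunk demanded at most once
-- (or never) does not need to be updated after evaluation.
canOmitUpdate :: MaybeUsed -> Bool
canOmitUpdate d = cardinality d /= SomeOtherWay
```

Note how `Count` sits on `Use` and `UCall` only, which is what lets `UProd` describe each component of a product with its own usage.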
* Comments and white space only (Simon Peyton Jones, 2013-06-06, 1 file, -2/+2)
|
* Fix the GHC package DLL-splitting (Ian Lynagh, 2013-05-14, 1 file, -1/+2)
| There's now an internal -dll-split flag, which we use to tell GHC how
| the GHC package is split into 2 separate DLLs. This is used by
| Packages.isDllName to determine whether a call is within the same DLL,
| or whether it is a call to another DLL.
* extended ticky to also track "let"s that are not conventional closures (Nicolas Frisby, 2013-05-02, 6 files, -47/+71)
| This includes selector, ap, and constructor thunks. They are still
| guarded by the -ticky-dyn-thk flag.
|
| (This is 024df664b600a with a small bug fix.)
* In CMM, only allow foreign calls to labels, not arbitrary expressions (Ian Lynagh, 2013-04-24, 3 files, -10/+8)
| I'm not sure if we want to make this change permanently, but for now it
| fixes the unreg build.
|
| I've also removed some redundant special-case code that generated
| prototypes for foreign functions. The standard pprTempAndExternDecls now
| generates them.
* Small refactoring in StgCmmExtCode (Ian Lynagh, 2013-04-23, 1 file, -6/+7)
|
* Don't duplicate decls unnecessarily in the environment (Ian Lynagh, 2013-04-23, 1 file, -1/+1)
| In loopDecls, as far as I can see the globalDecls will always already be
| in the environment, so don't add them again.
* Make CmmParse abstract (Ian Lynagh, 2013-04-23, 1 file, -1/+1)
|
* Revert "extended ticky to also track "let"s that are not closures" (Nicolas Frisby, 2013-04-12, 6 files, -69/+47)
| This reverts commit 024df664b600a622cb8189ccf31789688505fc1c.
|
| Of course I gaff on my last day...
* extended ticky to also track "let"s that are not closures (Nicolas Frisby, 2013-04-12, 6 files, -47/+69)
| This includes selector, ap, and constructor thunks. They are still
| guarded by the -ticky-dyn-thk flag.
* added ticky counters for heap and stack checks (Nicolas Frisby, 2013-04-11, 2 files, -1/+11)
|
* ticky enhancements (Nicolas Frisby, 2013-03-29, 9 files, -348/+614)
| * the new StgCmmArgRep module breaks a dependency cycle; I also
|   untabified it, but made no real changes
|
| * updated the documentation in the wiki and change the user guide to
|   point there
|
| * moved the allocation enters for ticky and CCS to after the heap check
|
|   * I left LDV where it was, which was before the heap check at least
|     once, since I have no idea what it is
|
| * standardized all (active?) ticky alloc totals to bytes
|
| * in order to avoid double counting StgCmmLayout.adjustHpBackwards
|   no longer bumps ALLOC_HEAP_ctr
|
| * I resurrected the SLOW_CALL counters
|
|   * the new module StgCmmArgRep breaks cyclic dependency between
|     Layout and Ticky (which the SLOW_CALL counters cause)
|
|   * renamed them SLOW_CALL_fast_<pattern> and VERY_SLOW_CALL
|
| * added ALLOC_RTS_ctr and _tot ticky counters
|
|   * eg allocation by Storage.c:allocate or a BUILD_PAP in
|     stg_ap_*_info
|
| * resurrected ticky counters for ALLOC_THK, ALLOC_PAP, and ALLOC_PRIM
|
| * added -ticky and -DTICKY_TICKY in ways.mk for debug ways
|
| * added a ticky counter for total LNE entries
|
| * new flags for ticky: -ticky-allocd -ticky-dyn-thunk -ticky-LNE
|
|   * all off by default
|
|   * -ticky-allocd: tracks allocation *of* closure in addition to
|     allocation *by* that closure
|
|   * -ticky-dyn-thunk tracks dynamic thunks as if they were functions
|
|   * -ticky-LNE tracks LNEs as if they were functions
|
| * updated the ticky report format, including making the argument
|   categories (more?) accurate again
|
| * the printed name for things in the report include the unique of
|   their ticky parent as well as if they are not top-level
* Typo-fix for panic. (Edward Z. Yang, 2013-03-11, 1 file, -1/+1)
| Signed-off-by: Edward Z. Yang <ezyang@mit.edu>