path: root/compiler/codeGen/StgCmmPrim.hs
Commit message | Author | Age | Files | Lines changed

* Make Applicative a superclass of Monad (Austin Seipp, 2014-09-09; 1 file changed, -0/+4)

    Summary: This includes pretty much all the changes needed to finally
    make Applicative a superclass of Monad. It is mostly reshuffling in
    the interest of avoiding orphans and boot files, but luckily we can
    resolve pretty much all of them. The only catch was that
    Alternative/MonadPlus also had to go into Prelude to avoid this. As a
    result, we must update the hsc2hs and haddock submodules.

    Signed-off-by: Austin Seipp <austin@well-typed.com>

    Test Plan: Build things, they might not explode horribly.

    Reviewers: hvr, simonmar
    Subscribers: simonmar
    Differential Revision: https://phabricator.haskell.org/D13

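    For orientation, a minimal sketch of the class hierarchy this change
    establishes (the real base definitions carry more methods and default
    implementations):

        class Functor f where
            fmap :: (a -> b) -> f a -> f b

        class Functor f => Applicative f where
            pure  :: a -> f a
            (<*>) :: f (a -> b) -> f a -> f b

        -- After this change, every Monad must also be Applicative:
        class Applicative m => Monad m where
            return :: a -> m a
            (>>=)  :: m a -> (a -> m b) -> m b
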
* Add MO_AddIntC, MO_SubIntC MachOps and implement in X86 backend (Reid Barton, 2014-08-23; 1 file changed, -2/+4)

    Summary: These MachOps are used by addIntC# and subIntC#, which in
    turn are used in integer-gmp when adding or subtracting small
    Integers. The following benchmark shows a ~6% speedup after this
    commit on x86_64 (building GHC with BuildFlavour=perf).

        {-# LANGUAGE MagicHash #-}
        import GHC.Exts
        import Criterion.Main

        count :: Int -> Integer
        count (I# n#) = go n# 0
          where
            go :: Int# -> Integer -> Integer
            go 0# acc = acc
            go n# acc = go (n# -# 1#) $! acc + 1

        main = defaultMain [bgroup "count" [bench "100" $ whnf count 100]]

    Differential Revision: https://phabricator.haskell.org/D140

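    As a hedged usage sketch: addIntC# (and analogously subIntC#) returns
    the wrapped result together with an Int# that is nonzero on overflow
    (signatures as in GHC.Prim; the wrapper name is illustrative):

        {-# LANGUAGE MagicHash, UnboxedTuples #-}
        import GHC.Exts

        -- Add two Ints, also reporting whether the addition overflowed.
        addWithOverflow :: Int -> Int -> (Int, Bool)
        addWithOverflow (I# x) (I# y) =
          case addIntC# x y of
            (# r, c #) -> (I# r, isTrue# (c /=# 0#))
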
* Implement new CLZ and CTZ primops (re #9340) (Herbert Valerio Riedel, 2014-08-14; 1 file changed, -0/+28)

    This implements the new primops

        clz#, clz32#, clz64#,
        ctz#, ctz32#, ctz64#

    which provide efficient implementations of the popular
    count-leading-zeros and count-trailing-zeros operations respectively
    (see testcase for a pure Haskell reference implementation).

    On x86, the NCG as well as LLVM generates code based on the BSF/BSR
    instructions (which need extra logic to make the 0-case
    well-defined).

    Test Plan: validate and successful tests on i686 and amd64
    Reviewers: rwbarton, simonmar, ezyang, austin
    Subscribers: simonmar, relrod, ezyang, carter
    Differential Revision: https://phabricator.haskell.org/D144
    GHC Trac Issues: #9340

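    As a usage sketch (clz# and ctz# take and return Word#; the wrappers
    are illustrative):

        {-# LANGUAGE MagicHash #-}
        import GHC.Exts

        -- Count of leading zero bits in a full machine word; on a
        -- 64-bit platform, leadingZeros 1 == 63.
        leadingZeros :: Word -> Word
        leadingZeros (W# w) = W# (clz# w)

        -- Count of trailing zero bits; trailingZeros 8 == 3.
        trailingZeros :: Word -> Word
        trailingZeros (W# w) = W# (ctz# w)
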
* StgCmmPrim: add note to stop using fixed size signed types for sizes (Johan Tibell, 2014-08-12; 1 file changed, -0/+5)

    We use fixed-size signed types to e.g. represent array sizes. This
    means that the size can overflow.

* shouldInlinePrimOp: Fix Int overflow (Johan Tibell, 2014-08-12; 1 file changed, -22/+38)

    There were two overflow issues in shouldInlinePrimOp. The first was
    that a negative CmmInt literal was created if the array size was
    given as larger than 2^63-1 (on a 64-bit platform), which meant that
    large array sizes could compare as being smaller than
    maxInlineAllocSize. The second was that we cast the Integer to an Int
    in the comparison, which again meant that large array sizes could
    compare as being smaller than maxInlineAllocSize.

    The attempt to allocate a large array inline then caused a segfault.

    Fixes #9416.

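    A minimal sketch of the second pitfall, under illustrative names
    (this is not GHC's actual code): converting an unbounded Integer to
    Int before comparing truncates it, so the comparison must happen in
    the Integer domain.

        maxInlineAllocSize :: Int
        maxInlineAllocSize = 128  -- illustrative threshold, not GHC's value

        -- Buggy: on a 64-bit platform fromInteger wraps, so e.g. 2^64
        -- becomes 0 :: Int and a huge request passes the size check.
        shouldInlineBuggy :: Integer -> Bool
        shouldInlineBuggy n = fromInteger n <= maxInlineAllocSize

        -- Fixed: compare in the unbounded Integer domain instead.
        shouldInlineFixed :: Integer -> Bool
        shouldInlineFixed n = n <= toInteger maxInlineAllocSize
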
* Make IntAddCOp, IntSubCOp into GenericOps (Reid Barton, 2014-08-10; 1 file changed, -57/+65)

    ... in preparation for backend-specific implementations. No
    functional changes in this commit (except in panic messages for
    ill-formed Cmm).

    Differential Revision: https://phabricator.haskell.org/D138

* Re-add more primops for atomic ops on byte arrays (Johan Tibell, 2014-06-30; 1 file changed, -0/+94)

    This is the second attempt to add this functionality. The first
    attempt was reverted in 950fcae46a82569e7cd1fba1637a23b419e00ecd, due
    to register allocator failure on x86. Given how the register
    allocator currently works, we don't have enough registers on x86 to
    support cmpxchg using complicated addressing modes. Instead we fall
    back to a simpler addressing mode on x86.

    Adds the following primops:

      * atomicReadIntArray#
      * atomicWriteIntArray#
      * fetchSubIntArray#
      * fetchOrIntArray#
      * fetchXorIntArray#
      * fetchAndIntArray#

    Makes these pre-existing out-of-line primops inline:

      * fetchAddIntArray#
      * casIntArray#

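    A hedged usage sketch (signatures as in GHC.Prim; the wrapper is
    illustrative, and whether the returned Int# is the old or the new
    value has varied across GHC releases, so check your GHC's docs):

        {-# LANGUAGE MagicHash, UnboxedTuples #-}
        import GHC.Exts
        import GHC.IO (IO(..))

        -- Illustrative: atomically add n to the Int stored at word
        -- index 0 of a mutable byte array.
        fetchAdd :: MutableByteArray# RealWorld -> Int -> IO Int
        fetchAdd mba (I# n) = IO $ \s ->
          case fetchAddIntArray# mba 0# n s of
            (# s', r #) -> (# s', I# r #)
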
* Revert "Add more primops for atomic ops on byte arrays" (Johan Tibell, 2014-06-26; 1 file changed, -94/+0)

    This commit caused the register allocator to fail on i386.

    This reverts commits d8abf85f8ca176854e9d5d0b12371c4bc402aac3 and
    04dd7cb3423f1940242fdfe2ea2e3b8abd68a177 (the second being a fix to
    the first).

* Add more primops for atomic ops on byte arrays (Johan Tibell, 2014-06-24; 1 file changed, -0/+94)

    Summary: Add more primops for atomic ops on byte arrays

    Adds the following primops:

      * atomicReadIntArray#
      * atomicWriteIntArray#
      * fetchSubIntArray#
      * fetchOrIntArray#
      * fetchXorIntArray#
      * fetchAndIntArray#

    Makes these pre-existing out-of-line primops inline:

      * fetchAddIntArray#
      * casIntArray#

* Add LANGUAGE pragmas to compiler/ source files (Herbert Valerio Riedel, 2014-05-15; 1 file changed, -0/+2)

    In some cases, the layout of the LANGUAGE/OPTIONS_GHC lines has been
    reorganized, while following the convention, to

    - place `{-# LANGUAGE #-}` pragmas at the top of the source file,
      before any `{-# OPTIONS_GHC #-}` lines.

    - if the list of language extensions fits into a single
      `{-# LANGUAGE ... #-}` line (shorter than 80 characters), keep it
      on one line; otherwise split it into one `{-# LANGUAGE ... #-}`
      line per language extension. In both cases, try to keep the
      enumeration alphabetically ordered. (The latter layout is
      preferable as it is more diff-friendly.)

    While at it, this also replaces obsolete `{-# OPTIONS ... #-}` pragma
    occurrences by `{-# OPTIONS_GHC ... #-}` pragmas.

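    An illustrative file header following this convention (the extension
    names are invented for the example):

        -- Short list: one sorted LANGUAGE line, before any OPTIONS_GHC.
        {-# LANGUAGE BangPatterns, CPP, MagicHash #-}
        {-# OPTIONS_GHC -fno-warn-orphans #-}

        -- If the list would exceed 80 characters, one extension per line:
        {-# LANGUAGE DataKinds #-}
        {-# LANGUAGE FlexibleContexts #-}
        {-# LANGUAGE ScopedTypeVariables #-}
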
* Add inline versions of copy ops for small arrays (Johan Tibell, 2014-03-30; 1 file changed, -0/+63)

    If the number of elements being copied is known statically, this
    might lead to the copy loop being unrolled in the backend.

* Add SmallArray# and SmallMutableArray# types (Johan Tibell, 2014-03-29; 1 file changed, -30/+138)

    These array types are smaller than Array# and MutableArray# and are
    faster when the array size is small, as they don't have the overhead
    of a card table. Having no card table reduces the closure size by two
    words in the typical small array case and leads to less work when
    updating or garbage-collecting the array.

    Reduces both the runtime and memory allocation by 8.8% on my insert
    benchmark for the HashMap type in the unordered-containers package,
    which makes use of lots of small arrays. With tuned GC settings
    (i.e. +RTS -A6M) the runtime reduction is 15%.

    Fixes #8923.

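    A hedged usage sketch of the new primops (signatures as in GHC.Prim;
    the wrapper type is invented for the example):

        {-# LANGUAGE MagicHash, UnboxedTuples #-}
        import GHC.Exts
        import GHC.IO (IO(..))

        -- Boxed wrapper, since IO cannot return an unlifted array.
        data SmallMutArr a = SmallMutArr (SmallMutableArray# RealWorld a)

        -- Illustrative: allocate n copies of x (assumes n >= 1) and
        -- overwrite slot 0. No card table is maintained for the array.
        newSmall :: Int -> a -> a -> IO (SmallMutArr a)
        newSmall (I# n) x y = IO $ \s ->
          case newSmallArray# n x s of
            (# s1, arr #) -> case writeSmallArray# arr 0# y s1 of
              s2 -> (# s2, SmallMutArr arr #)
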
* Make copy array ops out-of-line by default (Johan Tibell, 2014-03-28; 1 file changed, -32/+45)

    This should reduce code size when there's little to gain from
    inlining these primops, while still retaining the inlining benefit
    when the size of the copy is known statically.

* codeGen: inline allocation optimization for clone array primops (Johan Tibell, 2014-03-22; 1 file changed, -80/+52)

    The inline allocation version is 69% faster than the out-of-line
    version, when cloning an array of 16 unit elements on a 64-bit
    machine.

    Comparing the new and the old primop implementations isn't
    straightforward. The old version had a missing heap check that I
    discovered during the development of the new version. Comparing the
    old and the new version would require fixing the old version, which
    in turn means reimplementing the equivalent of MAYBE_GC in
    StgCmmPrim.

    The inline allocation threshold is configurable via
    -fmax-inline-alloc-size, which gives the maximum array size, in
    bytes, to allocate inline. The size does not include the closure
    header size.

    Allowing the same primop to be either inline or out-of-line has some
    implications for how we lay out heap checks. We always place a heap
    check around out-of-line primops, as they may allocate outside of our
    knowledge. However, for the inline primops we only allow allocation
    via the standard means (i.e. virtHp). Since the clone primops might
    be either inline or out-of-line, the heap check layout code now
    consults shouldInlinePrimOp to know whether a primop will be inlined.

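    A hedged sketch of the user-visible pieces (cloneArray# is the real
    primop; the helper and the flag value are illustrative):

        {-# LANGUAGE MagicHash #-}
        import GHC.Exts

        -- Illustrative: clone 16 elements starting at index 0 (assumes
        -- the source has at least 16 elements). Small statically-known
        -- clones like this are candidates for inline allocation.
        cloneSmall :: Array# a -> Array# a
        cloneSmall arr = cloneArray# arr 0# 16#

    Compiling with, say, `ghc -O2 -fmax-inline-alloc-size=256` raises the
    inline-allocation threshold to 256 bytes (excluding the closure
    header).
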
* codeGen: allocate small byte arrays of statically known size inline (Johan Tibell, 2014-03-14; 1 file changed, -10/+39)

    This results in a 57% runtime decrease when allocating an array of
    128 bytes on a 64-bit machine.

    Fixes #8876.

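    For example (a hedged sketch; the wrapper type is invented), a
    128-byte allocation whose size is a compile-time constant can take
    the inline path:

        {-# LANGUAGE MagicHash, UnboxedTuples #-}
        import GHC.Exts
        import GHC.IO (IO(..))

        data MutBytes = MutBytes (MutableByteArray# RealWorld)

        -- 128# is statically known, so the allocation can happen inline
        -- on the Haskell heap instead of via an out-of-line RTS call.
        newBuffer128 :: IO MutBytes
        newBuffer128 = IO $ \s ->
          case newByteArray# 128# s of
            (# s', mba #) -> (# s', MutBytes mba #)
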
* Fix incorrect loop condition in inline array allocation (Johan Tibell, 2014-03-11; 1 file changed, -5/+6)

    Also make sure allocHeapClosure updates profiling counters with the
    memory allocated.

* Refactor inline array allocation (Simon Marlow, 2014-03-11; 1 file changed, -56/+19)

    - Move array representation knowledge into SMRep

    - Separate out low-level heap-object allocation so that we can reuse
      it from doNewArrayOp

    - Remove card-table initialisation; we can safely ignore the card
      table for newly allocated arrays.

* codeGen: allocate small arrays of statically known size inline (Johan Tibell, 2014-03-11; 1 file changed, -38/+159)

    This results in a 46% runtime decrease when allocating an array of 16
    unit elements on a 64-bit machine.

    In order to allow newArray# to have both an inline and an out-of-line
    implementation, cgOpApp is refactored slightly. The new
    implementation of cgOpApp should make it easier to add other primops
    with both inline and out-of-line implementations in the future.

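    A hedged sketch of the inline-eligible case (newArray# is the real
    primop; the wrapper type is invented):

        {-# LANGUAGE MagicHash, UnboxedTuples #-}
        import GHC.Exts
        import GHC.IO (IO(..))

        data MutArr a = MutArr (MutableArray# RealWorld a)

        -- 16# is statically known, so this allocation can use the new
        -- inline code path; a size known only at runtime stays
        -- out of line.
        newSixteen :: a -> IO (MutArr a)
        newSixteen x = IO $ \s ->
          case newArray# 16# x s of
            (# s', arr #) -> (# s', MutArr arr #)
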
* Add support for prefetch with locality levels. (Austin Seipp, 2013-10-01; 1 file changed, -23/+30)

    This patch adds support for several new primitive operations which
    support using processor-specific instructions to help guide data and
    cache locality decisions. We have locality levels ranging from
    [0..3].

    For LLVM, we generate llvm.prefetch intrinsics at the proper locality
    level (similar to GCC). For x86 we generate prefetch{NTA, t2, t1, t0}
    instructions. On SPARC and PowerPC, the locality levels are ignored.

    This closes #8256.

    Authored-by: Carter Tazio Schonwald <carter.schonwald@gmail.com>
    Signed-off-by: Austin Seipp <austin@well-typed.com>

* Check that SIMD vector instructions are compatible with current set of dynamic flags (Geoffrey Mainland, 2013-09-22; 1 file changed, -14/+59)

    SIMD vector instructions currently require the LLVM back-end. The set
    of available instructions also depends on the set of architecture
    flags specified on the command line.

* SIMD primops are now generated using schemas that are polymorphic in width and element type (Geoffrey Mainland, 2013-09-22; 1 file changed, -125/+163)

    SIMD primops are now polymorphic in vector size and element type, but
    only internally to the compiler. More specifically,
    utils/genprimopcode has been extended so that it "knows" about SIMD
    vectors. This allows us to, for example, write a single definition
    for the "add two vectors" primop in primops.txt.pp and have it
    instantiated at many vector types. This generates a primop in
    GHC.Prim for each vector type at which "add two vectors" is
    instantiated, but only one data constructor for the PrimOp data type,
    so the code generator is much, much simpler.

* New primops for byte range copies ByteArray# <-> Addr# (Duncan Coutts, 2013-09-15; 1 file changed, -0/+34)

    We have primops for copying ranges of bytes between ByteArray#s:

      * ByteArray# -> MutableByteArray#
      * MutableByteArray# -> MutableByteArray#

    This extends it with three further cases:

      * Addr# -> MutableByteArray#
      * ByteArray# -> Addr#
      * MutableByteArray# -> Addr#

    One use case for these is copying between ForeignPtr-based
    representations and in-heap arrays (like Text, UArray etc).

    The implementation is essentially the same as for the existing
    primops, and shares the memcpy stuff in the code generators.

    Deficiencies / future directions: none of these primops (existing or
    the new ones) let one take advantage of knowing that ByteArray#s are
    word-aligned in memory. Though it is unclear that any of the code
    generators would make use of this information unless the size to copy
    is also known at compile time.

    Signed-off-by: Austin Seipp <austin@well-typed.com>

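    A hedged usage sketch for one of the new directions
    (copyByteArrayToAddr# is the real primop; the wrapper is illustrative
    and assumes the ranges are in bounds):

        {-# LANGUAGE MagicHash, UnboxedTuples #-}
        import GHC.Exts
        import GHC.IO (IO(..))

        -- Copy n bytes from offset 0 of an in-heap ByteArray# out to a
        -- raw address, e.g. the payload of a ForeignPtr.
        copyOut :: ByteArray# -> Addr# -> Int -> IO ()
        copyOut ba addr (I# n) = IO $ \s ->
          (# copyByteArrayToAddr# ba 0# addr n s, () #)
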
* Explicit import lists for StgCmmProf. (Edward Z. Yang, 2013-09-01; 1 file changed, -1/+1)

    Signed-off-by: Edward Z. Yang <ezyang@mit.edu>

* Trailing whitespaces, code formatting, detabify (Jan Stolarek, 2013-08-20; 1 file changed, -7/+7)

    A major cleanup of trailing whitespaces and tabs in the codeGen/
    directory. I also adjusted code formatting in some places.

* Comparison primops return Int# (Fixes #6135) (Jan Stolarek, 2013-08-14; 1 file changed, -9/+0)

    This patch modifies all comparison primops for Char#, Int#, Word#,
    Double#, Float# and Addr# to return Int# instead of Bool. A value of
    1# represents True and 0# represents False. For a more detailed
    description of the motivation for this change, discussion of
    implementation details and benchmarking results, please visit the
    wiki page: http://hackage.haskell.org/trac/ghc/wiki/PrimBool

    There's also some cleanup: whitespace fixes in files that were
    extensively edited in this patch, and constant folding rules for
    Integer div and mod operators (which for some reason had been left
    out up till now).

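    A sketch of what this looks like at a use site: isTrue# (from
    GHC.Exts) converts the Int# result back to Bool where a branch is
    needed (the helper below is illustrative):

        {-# LANGUAGE MagicHash #-}
        import GHC.Exts

        -- (>#) now returns 1# or 0# rather than Bool, so the branch
        -- converts explicitly via isTrue#.
        maxInt :: Int -> Int -> Int
        maxInt a@(I# x) b@(I# y) = if isTrue# (x ># y) then a else b
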
* Add support for byte endian swapping for Word 16/32/64. (Austin Seipp, 2013-07-17; 1 file changed, -0/+12)

    * Exposes bSwap{,16,32,64}# primops
    * Add a new machop: MO_BSwap
    * Use a Stg implementation (hs_bswap{16,32,64}) for other
      implementation in NCG.
    * Generate bswap in X86 NCG for 32 and 64 bits, and for 16 bits,
      bswap+shr instead of using xchg.
    * Generate llvm.bswap intrinsics in llvm codegen.

    Authored-by: Vincent Hanquez <tab@snarc.org>
    Signed-off-by: Austin Seipp <aseipp@pobox.com>

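    A hedged usage sketch; in released GHCs these surface from GHC.Prim
    as byteSwap16#/byteSwap32#/byteSwap64# (the wrapper is illustrative):

        {-# LANGUAGE MagicHash #-}
        import GHC.Exts

        -- Swap the byte order of the low 32 bits of a Word, e.g. when
        -- converting between little- and big-endian encodings.
        swap32 :: Word -> Word
        swap32 (W# w) = W# (byteSwap32# w)
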
* Revert "Add support for byte endian swapping for Word 16/32/64." (Simon Peyton Jones, 2013-06-11; 1 file changed, -12/+0)

    This reverts commit 1c5b0511a89488f5280523569d45ee61c0d09ffa.

* Add support for byte endian swapping for Word 16/32/64. (Ian Lynagh, 2013-06-09; 1 file changed, -0/+12)

    * Exposes bSwap{,16,32,64}# primops
    * Add a new machop, MO_BSwap
    * Use a Stg implementation (hs_bswap{16,32,64}) for other
      implementation in NCG.
    * Generate bswap in X86 NCG for 32 and 64 bits, and for 16 bits,
      bswap+shr instead of using xchg.
    * Generate llvm.bswap intrinsics in llvm codegen.

    Patch from Vincent Hanquez.

* In CMM, only allow foreign calls to labels, not arbitrary expressions (Ian Lynagh, 2013-04-24; 1 file changed, -3/+2)

    I'm not sure if we want to make this change permanent, but for now it
    fixes the unreg build.

    I've also removed some redundant special-case code that generated
    prototypes for foreign functions. The standard pprTempAndExternDecls
    now generates them.

* Typo-fix for panic. (Edward Z. Yang, 2013-03-11; 1 file changed, -1/+1)

    Signed-off-by: Edward Z. Yang <ezyang@mit.edu>

* Primitive bitwise operations on Int# (Fixes #7689) (Jan Stolarek, 2013-02-18; 1 file changed, -0/+4)

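    The primops added for #7689 are andI#, orI#, xorI# and notI# (names
    as in GHC.Prim); a small usage sketch, written against GHC 7.8-era
    GHC.Exts since it uses isTrue# from the PrimBool change above (the
    helper is illustrative):

        {-# LANGUAGE MagicHash #-}
        import GHC.Exts

        -- Test evenness directly on Int# without converting to Word#.
        isEven :: Int -> Bool
        isEven (I# n) = isTrue# (andI# n 1# ==# 0#)
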
* Add prefetch primops. (Geoffrey Mainland, 2013-02-01; 1 file changed, -0/+47)

* Add support for passing SSE vectors in registers. (Geoffrey Mainland, 2013-02-01; 1 file changed, -5/+13)

    This patch adds support for 6 XMM registers on x86-64 which overlap
    with the F and D registers and may hold 128-bit wide SIMD vectors.
    Because there is not a good way to attach type information to STG
    registers, we aggressively bitcast in the LLVM back-end.

* Add the Int64X2# primitive type and associated primops. (Geoffrey Mainland, 2013-02-01; 1 file changed, -0/+37)

* Add the DoubleX2# primitive type and associated primops. (Geoffrey Mainland, 2013-02-01; 1 file changed, -0/+36)

* Add the Int32X4# primitive type and associated primops. (Paul Monday, 2013-02-01; 1 file changed, -0/+37)

* Add the Float32X4# primitive type and associated primops. (Geoffrey Mainland, 2013-02-01; 1 file changed, -137/+337)

    This patch lays the groundwork needed for primop support for SIMD
    vectors. In addition to the groundwork, we add support for the
    FloatX4# primitive type and associated primops.

    * Add the FloatX4# primitive type and associated primops.
    * Add CodeGen support for Float vectors.
    * Compile vector operations to LLVM vector operations in the LLVM
      code generator.
    * Make the x86 native backend fail gracefully when encountering
      vector primops.
    * Only generate primop wrappers for vector primops when using LLVM.

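    A hedged usage sketch, using the names these operations carry in
    later released GHC.Prim versions (the commit-era names may differ);
    requires -fllvm:

        {-# LANGUAGE MagicHash #-}
        import GHC.Exts

        -- Splat a scalar across all four lanes, then add element-wise.
        addSplat :: Float# -> FloatX4# -> FloatX4#
        addSplat x v = plusFloatX4# (broadcastFloatX4# x) v
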
* Tidy up: move info-table related stuff to CmmInfo (Simon Marlow, 2013-01-23; 1 file changed, -0/+1)

    Prep for #709

* Implement word2Float# and word2Double# (Johan Tibell, 2012-12-13; 1 file changed, -0/+6)

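    These convert unsigned words directly to floating point; a usage
    sketch (the wrappers are illustrative):

        {-# LANGUAGE MagicHash #-}
        import GHC.Exts

        -- Direct Word -> Double conversion; going via Int# instead
        -- would misinterpret values above maxBound :: Int as negative.
        wordToDouble :: Word -> Double
        wordToDouble (W# w) = D# (word2Double# w)

        wordToFloat :: Word -> Float
        wordToFloat (W# w) = F# (word2Float# w)
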
* Fix popcnt calls (Ian Lynagh, 2012-11-01; 1 file changed, -10/+5)

    We don't want to narrow the argument size before making the foreign
    call: a Word8 still gets passed as a Word-sized argument.

* Whitespace only in codeGen/StgCmmPrim.hs (Ian Lynagh, 2012-11-01; 1 file changed, -90/+83)

* Some alpha renaming (Ian Lynagh, 2012-10-16; 1 file changed, -1/+1)

    Mostly d -> g (matching DynFlag -> GeneralFlag). Also renamed if* to
    when*, matching the Haskell if/when names.

* Fix copyArray# bug in new code generator (Roman Leshchinskiy, 2012-10-08; 1 file changed, -17/+22)

* Produce new-style Cmm from the Cmm parser (Simon Marlow, 2012-10-08; 1 file changed, -2/+2)

    The main change here is that the Cmm parser now allows high-level cmm
    code with argument-passing and function calls. For example:

        foo ( gcptr a, bits32 b )
        {
          if (b > 0) {
            // we can make tail calls passing arguments:
            jump stg_ap_0_fast(a);
          }
          return (x,y);
        }

    More details on the new cmm syntax are in Note [Syntax of .cmm files]
    in CmmParse.y.

    The old syntax is still more-or-less supported for those occasional
    code fragments that really need to explicitly manipulate the stack.
    However there are a couple of differences: it is now obligatory to
    give a list of live GlobalRegs on every jump, e.g.

        jump %ENTRY_CODE(Sp(0)) [R1];

    Again, more details in Note [Syntax of .cmm files].

    I have rewritten most of the .cmm files in the RTS into the new
    syntax, except for AutoApply.cmm which is generated by the genapply
    program: this file could be generated in the new syntax instead and
    would probably be better off for it, but I ran out of enthusiasm.

    Some other changes in this batch:

    - The PrimOp calling convention is gone, primops now use the ordinary
      NativeNodeCall convention. This means that primops and "foreign
      import prim" code must be written in high-level cmm, but they can
      now take more than 10 arguments.

    - CmmSink now does constant-folding (should fix #7219)

    - .cmm files now go through the cmmPipeline, and as a result we
      generate better code in many cases. All the object files generated
      for the RTS .cmm files are now smaller. Performance should be
      better too, but I haven't measured it yet.

    - RET_DYN frames are removed from the RTS, lots of code goes away

    - we now have some more canned GC points to cover unboxed-tuples with
      2-4 pointers, which will reduce code size a little.

* Move wORD_SIZE into platformConstants (Ian Lynagh, 2012-09-16; 1 file changed, -13/+12)

* Move wORD_SIZE_IN_BITS to DynFlags (Ian Lynagh, 2012-09-14; 1 file changed, -2/+2)

    This frees wORD_SIZE up to be moved out of HaskellConstants.

* Move some more constants to platformConstants (Ian Lynagh, 2012-09-14; 1 file changed, -3/+3)

* Use oFFSET_* from platformConstants rather than Constants (Ian Lynagh, 2012-09-13; 1 file changed, -5/+5)

* Use sIZEOF_* from platformConstants rather than Constants (Ian Lynagh, 2012-09-13; 1 file changed, -1/+1)

* Pass DynFlags down to wordWidth (Ian Lynagh, 2012-09-12; 1 file changed, -249/+253)