summaryrefslogtreecommitdiff
path: root/compiler/codeGen
Commit message (Collapse)AuthorAgeFilesLines
* Make Applicative a superclass of MonadAustin Seipp2014-09-099-12/+40
| | | | | | | | | | | | | | | | | | | | | Summary: This includes pretty much all the changes needed to make `Applicative` a superclass of `Monad` finally. There's mostly reshuffling in the interests of avoid orphans and boot files, but luckily we can resolve all of them, pretty much. The only catch was that Alternative/MonadPlus also had to go into Prelude to avoid this. As a result, we must update the hsc2hs and haddock submodules. Signed-off-by: Austin Seipp <austin@well-typed.com> Test Plan: Build things, they might not explode horribly. Reviewers: hvr, simonmar Subscribers: simonmar Differential Revision: https://phabricator.haskell.org/D13
* Add MO_AddIntC, MO_SubIntC MachOps and implement in X86 backendReid Barton2014-08-231-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | Summary: These MachOps are used by addIntC# and subIntC#, which in turn are used in integer-gmp when adding or subtracting small Integers. The following benchmark shows a ~6% speedup after this commit on x86_64 (building GHC with BuildFlavour=perf). {-# LANGUAGE MagicHash #-} import GHC.Exts import Criterion.Main count :: Int -> Integer count (I# n#) = go n# 0 where go :: Int# -> Integer -> Integer go 0# acc = acc go n# acc = go (n# -# 1#) $! acc + 1 main = defaultMain [bgroup "count" [bench "100" $ whnf count 100]] Differential Revision: https://phabricator.haskell.org/D140
* Implement new CLZ and CTZ primops (re #9340)Herbert Valerio Riedel2014-08-141-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | This implements the new primops clz#, clz32#, clz64#, ctz#, ctz32#, ctz64# which provide efficient implementations of the popular count-leading-zero and count-trailing-zero respectively (see testcase for a pure Haskell reference implementation). On x86, NCG as well as LLVM generates code based on the BSF/BSR instructions (which need extra logic to make the 0-case well-defined). Test Plan: validate and succesful tests on i686 and amd64 Reviewers: rwbarton, simonmar, ezyang, austin Subscribers: simonmar, relrod, ezyang, carter Differential Revision: https://phabricator.haskell.org/D144 GHC Trac Issues: #9340
* StgCmmPrim: add note to stop using fixed size signed types for sizesJohan Tibell2014-08-121-0/+5
| | | | | We use fixed size signed types to e.g. represent array sizes. This means that the size can overflow.
* shouldInlinePrimOp: Fix Int overflowJohan Tibell2014-08-121-22/+38
| | | | | | | | | | | | | | | | There were two overflow issues in shouldInlinePrimOp. The first one is due to a negative CmmInt literal being created if the array size was given as larger than 2^63-1 (on a 64-bit platform.) This meant that large array sizes could compare as being smaller than maxInlineAllocSize. The second issue is that we casted the Integer to an Int in the comparison, which again meant that large array sizes could compare as being smaller than maxInlineAllocSize. The attempt to allocate a large array inline then caused a segfault. Fixes #9416.
* Make IntAddCOp, IntSubCOp into GenericOpsReid Barton2014-08-101-57/+65
| | | | | | | | | ... in preparation for backend-specific implementations. No functional changes in this commit (except in panic messages for ill-formed Cmm). Differential Revision: https://phabricator.haskell.org/D138
* Rename PackageId to PackageKey, distinguishing it from Cabal's PackageId.Edward Z. Yang2014-07-218-20/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Previously, both Cabal and GHC defined the type PackageId, and we expected them to be roughly equivalent (but represented differently). This refactoring separates these two notions. A package ID is a user-visible identifier; it's the thing you write in a Cabal file, e.g. containers-0.9. The components of this ID are semantically meaningful, and decompose into a package name and a package vrsion. A package key is an opaque identifier used by GHC to generate linking symbols. Presently, it just consists of a package name and a package version, but pursuant to #9265 we are planning to extend it to record other information. Within a single executable, it uniquely identifies a package. It is *not* an InstalledPackageId, as the choice of a package key affects the ABI of a package (whereas an InstalledPackageId is computed after compilation.) Cabal computes a package key for the package and passes it to GHC using -package-name (now *extremely* misnamed). As an added bonus, we don't have to worry about shadowing anymore. As a follow on, we should introduce -current-package-key having the same role as -package-name, and deprecate the old flag. This commit is just renaming. The haddock submodule needed to be updated. Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu> Test Plan: validate Reviewers: simonpj, simonmar, hvr, austin Subscribers: simonmar, relrod, carter Differential Revision: https://phabricator.haskell.org/D79 Conflicts: compiler/main/HscTypes.lhs compiler/main/Packages.lhs utils/haddock
* Re-add more primops for atomic ops on byte arraysJohan Tibell2014-06-301-0/+94
| | | | | | | | | | | | | | | | | | | | | | | This is the second attempt to add this functionality. The first attempt was reverted in 950fcae46a82569e7cd1fba1637a23b419e00ecd, due to register allocator failure on x86. Given how the register allocator currently works, we don't have enough registers on x86 to support cmpxchg using complicated addressing modes. Instead we fall back to a simpler addressing mode on x86. Adds the following primops: * atomicReadIntArray# * atomicWriteIntArray# * fetchSubIntArray# * fetchOrIntArray# * fetchXorIntArray# * fetchAndIntArray# Makes these pre-existing out-of-line primops inline: * fetchAddIntArray# * casIntArray#
* Revert "Add more primops for atomic ops on byte arrays"Johan Tibell2014-06-261-94/+0
| | | | | | | | This commit caused the register allocator to fail on i386. This reverts commit d8abf85f8ca176854e9d5d0b12371c4bc402aac3 and 04dd7cb3423f1940242fdfe2ea2e3b8abd68a177 (the second being a fix to the first).
* Add more primops for atomic ops on byte arraysJohan Tibell2014-06-241-0/+94
| | | | | | | | | | | | | | | | | | | Summary: Add more primops for atomic ops on byte arrays Adds the following primops: * atomicReadIntArray# * atomicWriteIntArray# * fetchSubIntArray# * fetchOrIntArray# * fetchXorIntArray# * fetchAndIntArray# Makes these pre-existing out-of-line primops inline: * fetchAddIntArray# * casIntArray#
* Don't use showPass in the backend (#8973)Simon Marlow2014-06-081-4/+1
|
* Add LANGUAGE pragmas to compiler/ source filesHerbert Valerio Riedel2014-05-1522-4/+37
| | | | | | | | | | | | | | | | | | In some cases, the layout of the LANGUAGE/OPTIONS_GHC lines has been reorganized, while following the convention, to - place `{-# LANGUAGE #-}` pragmas at the top of the source file, before any `{-# OPTIONS_GHC #-}`-lines. - Moreover, if the list of language extensions fit into a single `{-# LANGUAGE ... -#}`-line (shorter than 80 characters), keep it on one line. Otherwise split into `{-# LANGUAGE ... -#}`-lines for each individual language extension. In both cases, try to keep the enumeration alphabetically ordered. (The latter layout is preferable as it's more diff-friendly) While at it, this also replaces obsolete `{-# OPTIONS ... #-}` pragma occurences by `{-# OPTIONS_GHC ... #-}` pragmas.
* Revert "Per-thread allocation counters and limits"Simon Marlow2014-05-041-196/+72
| | | | | | | | Problems were found on 32-bit platforms, I'll commit again when I have a fix. This reverts the following commits: 54b31f744848da872c7c6366dea840748e01b5cf b0534f78a73f972e279eed4447a5687bd6a8308e
* Per-thread allocation counters and limitsSimon Marlow2014-05-021-72/+196
| | | | | | | | | | | | | | | | | | | | | | | This tracks the amount of memory allocation by each thread in a counter stored in the TSO. Optionally, when the counter drops below zero (it counts down), the thread can be sent an asynchronous exception: AllocationLimitExceeded. When this happens, given a small additional limit so that it can handle the exception. See documentation in GHC.Conc for more details. Allocation limits are similar to timeouts, but - timeouts use real time, not CPU time. Allocation limits do not count anything while the thread is blocked or in foreign code. - timeouts don't re-trigger if the thread catches the exception, allocation limits do. - timeouts can catch non-allocating loops, if you use -fno-omit-yields. This doesn't work for allocation limits. I couldn't measure any impact on benchmarks with these changes, even for nofib/smp.
* Add inline versions of copy ops for small arraysJohan Tibell2014-03-301-0/+63
| | | | | If the number of elements being copied is known statically this might lead to the copy loop being unrolled in the backend.
* Add SmallArray# and SmallMutableArray# typesJohan Tibell2014-03-295-38/+149
| | | | | | | | | | | | | | | These array types are smaller than Array# and MutableArray# and are faster when the array size is small, as they don't have the overhead of a card table. Having no card table reduces the closure size with 2 words in the typical small array case and leads to less work when updating or GC:ing the array. Reduces both the runtime and memory allocation by 8.8% on my insert benchmark for the HashMap type in the unordered-containers package, which makes use of lots of small arrays. With tuned GC settings (i.e. `+RTS -A6M`) the runtime reduction is 15%. Fixes #8923.
* Make copy array ops out-of-line by defaultJohan Tibell2014-03-281-32/+45
| | | | | | This should reduce code size when there's little to gain from inlining these primops, while still retaining the inlining benefit when the size of the copy is known statically.
* codeGen: inline allocation optimization for clone array primopsJohan Tibell2014-03-222-91/+74
| | | | | | | | | | | | | | | | | | | | | | | | The inline allocation version is 69% faster than the out-of-line version, when cloning an array of 16 unit elements on a 64-bit machine. Comparing the new and the old primop implementations isn't straightforward. The old version had a missing heap check that I discovered during the development of the new version. Comparing the old and the new version would requiring fixing the old version, which in turn means reimplementing the equivalent of MAYBE_CG in StgCmmPrim. The inline allocation threshold is configurable via -fmax-inline-alloc-size which gives the maximum array size, in bytes, to allocate inline. The size does not include the closure header size. Allowing the same primop to be either inline or out-of-line has some implication for how we lay out heap checks. We always place a heap check around out-of-line primops, as they may allocate outside of our knowledge. However, for the inline primops we only allow allocation via the standard means (i.e. virtHp). Since the clone primops might be either inline or out-of-line the heap check layout code now consults shouldInlinePrimOp to know whether a primop will be inlined.
* codeGen: allocate small byte arrays of statically known size inlineJohan Tibell2014-03-141-10/+39
| | | | | | | This results in a 57% runtime decrease when allocating an array of 128 bytes on a 64-bit machine. Fixes #8876.
* Comments on virtHp, realHp (Trac #8864)Simon Peyton Jones2014-03-133-6/+37
| | | | | | | Documentation in response to Johan's questions Plus, don't export hpRel from StgCmmHeap, StgCmmLayout (it is only used locally in StgCmmLayout)
* Fix incorrect loop condition in inline array allocationJohan Tibell2014-03-113-8/+12
| | | | | Also make sure allocHeapClosure updates profiling counters with the memory allocated.
* Refactor inline array allocationSimon Marlow2014-03-114-96/+67
| | | | | | | | | | - Move array representation knowledge into SMRep - Separate out low-level heap-object allocation so that we can reuse it from doNewArrayOp - remove card-table initialisation, we can safely ignore the card table for newly allocated arrays.
* Represent offsets into heap objects with byte, not word, offsetsSimon Marlow2014-03-115-29/+40
| | | | | I'd like to be able to pack together non-pointer fields that are less than a word in size, and this is a necessary prerequisite.
* codeGen: allocate small arrays of statically known size inlineJohan Tibell2014-03-111-38/+159
| | | | | | | | | | This results in a 46% runtime decrease when allocating an array of 16 unit elements on a 64-bit machine. In order to allow newArray# to have both an inline and an out-of-line implementation, cgOpApp is refactored slightly. The new implementation of cgOpApp should make it easier to add other primops with both inline and out-of-line implementations in the future.
* Fix a bug in codegen for non-updatable selector thunks (#8817)Simon Marlow2014-02-271-23/+35
| | | | | | | To evaluate most non-updatable thunks, we can jump directly to the entry code if we know what it is. But not for a selector thunk: these might be updated by the garbage collector, so we have to enter the closure with an indirect jump through its info pointer.
* Cleaned up Maybes.lhsBaldur Blöndal2014-02-131-2/+2
|
* Loopification jump between stack and heap checksJan Stolarek2014-02-013-18/+53
| | | | | | | | | | | | Fixes #8585 When emmiting label of a self-recursive tail call (ie. when performing loopification optimization) we emit the loop header label after a stack check but before the heap check. The reason is that tail-recursive functions use constant amount of stack space so we don't need to repeat the check in every loop. But they can grow the heap so heap check must be repeated in every call. See Note [Self-recursive tail calls] and [Self-recursive loop header].
* Remove the LFBlackHole constructorPatrick Palka2013-12-051-27/+0
| | | | After commit 55c703b8fdb0, this code is no longer used anywhere.
* Update and deduplicate the comments on CAF management (#8590)Patrick Palka2013-12-041-31/+4
|
* Move the allocation of CAF blackholes into 'newCAF' (#8590)Patrick Palka2013-12-041-30/+10
| | | | | | | | | | We now do the allocation of the blackhole indirection closure inside the RTS procedure 'newCAF' instead of generating the allocation code inline in the closure body of each CAF. This slightly decreases code size in modules with a lot of CAFs. As a result of this change, for example, the size of DynFlags.o drops by ~60KB and HsExpr.o by ~100KB.
* Move the LDV code below the self-loop label (#8275)Patrick Palka2013-12-011-1/+1
|
* Don't explicitly refer to nodeReg in ldvEnterClosurePatrick Palka2013-12-012-7/+8
|
* Document solution to #8275Jan Stolarek2013-12-011-2/+13
|
* Fix loopification with profiling and enable it by default (#8275)Patrick Palka2013-12-011-4/+2
|
* Comments on slow-call-shortcuttingSimon Marlow2013-11-281-0/+36
|
* Fix up shortcut for slow callsPatrick Palka2013-11-281-7/+7
|
* Implement shortcuts for slow calls (#6084)Simon Marlow2013-11-281-7/+43
|
* Move isVoidRep, isGcPtrRep to TyCon to join primRepSizeW etcSimon Peyton Jones2013-11-223-14/+4
| | | | This is just a modest refactoring
* Fix some cases where we were leaving slop in the heap (#8515, #8298)Simon Marlow2013-11-141-6/+14
|
* commentsSimon Marlow2013-11-141-5/+5
|
* Revert "Implement shortcuts for slow calls that would require PAPs (#6084)"Austin Seipp2013-10-261-43/+7
| | | | This reverts commit 2f5db98e90cf0cff1a11971c85f108a7480528ed.
* Revert "comments"Austin Seipp2013-10-261-27/+0
| | | | This reverts commit 9026c77a07533bda3773c3c3f3df1c6592bc80c7.
* commentsSimon Marlow2013-10-251-0/+27
|
* Implement shortcuts for slow calls that would require PAPs (#6084)Simon Marlow2013-10-251-7/+43
|
* More comments about stack layoutSimon Peyton Jones2013-10-181-9/+27
|
* Comments (about the stack overflow check) onlySimon Peyton Jones2013-10-181-15/+23
|
* Generate (old + 0) instead of Sp in stack checksJan Stolarek2013-10-161-2/+25
| | | | | | | | | | | | | | | | | | | | When compiling a function we can determine how much stack space it will use. We therefore need to perform only a single stack check at the beginning of a function to see if we have enough stack space. Instead of referring directly to Sp - as we used to do in the past - the code generator uses (old + 0) in the stack check. Stack layout phase turns (old + 0) into Sp. The idea here is that, while we need to perform only one stack check for each function, we could in theory place more stack checks later in the function. They would be redundant, but not incorrect (in a sense that they should not change program behaviour). We need to make sure however that a stack check inserted after incrementing the stack pointer checks for a respectively smaller stack space. This would not be the case if the code generator produced direct references to Sp. By referencing (old + 0) we make sure that we always check for a correct amount of stack: when converting (old + 0) to Sp the stack layout phase takes into account changes already made to stack pointer. The idea for this change came from observations made while debugging #8275.
* Fix a bug in the canned selector code when profiling.Simon Marlow2013-10-111-1/+6
|
* Comments onlySimon Peyton Jones2013-10-041-1/+7
|
* Add support for prefetch with locality levels.Austin Seipp2013-10-011-23/+30
| | | | | | | | | | | | | | | | | This patch adds support for several new primitive operations which support using processor-specific instructions to help guide data and cache locality decisions. We have levels ranging from [0..3] For LLVM, we generate llvm.prefetch intrinsics at the proper locality level (similar to GCC.) For x86 we generate prefetch{NTA, t2, t1, t0} instructions. On SPARC and PowerPC, the locality levels are ignored. This closes #8256. Authored-by: Carter Tazio Schonwald <carter.schonwald@gmail.com> Signed-off-by: Austin Seipp <austin@well-typed.com>