summaryrefslogtreecommitdiff
path: root/compiler/nativeGen
Commit message (Collapse)AuthorAgeFilesLines
* Allow -dead_strip linking on platforms with .subsections_via_symbolsMoritz Angermann2014-11-193-18/+3
| | | | | | | | | | | | | | | | | Summary: This allows to link objects produced with the llvm code generator to be linked with -dead_strip. This applies to at least the iOS cross compiler and OS X compiler. Signed-off-by: Moritz Angermann <moritz@lichtzwerge.de> Test Plan: Create a ffi library and link it with -dead_strip. If the resulting binary does not crash, the patch works as advertised. Reviewers: rwbarton, simonmar, hvr, dterei, mzero, ezyang, austin Reviewed By: dterei, ezyang, austin Subscribers: thomie, mzero, simonmar, ezyang, carter Differential Revision: https://phabricator.haskell.org/D206
* Per-thread allocation counters and limitsSimon Marlow2014-11-123-21/+60
| | | | | | | | This reverts commit f0fcc41d755876a1b02d1c7c79f57515059f6417. New changes: now works on 32-bit platforms too. I added some basic support for 64-bit subtraction and comparison operations to the x86 NCG.
* Some refactoring around endPass and debug dumpingSimon Peyton Jones2014-11-041-4/+4
| | | | | I forget all the details, but I spent some time trying to understand the current setup, and tried to simplify it a bit
* Revert "Place static closures in their own section."Edward Z. Yang2014-10-203-6/+0
| | | | | | | | | | This reverts commit b23ba2a7d612c6b466521399b33fe9aacf5c4f75. Conflicts: compiler/cmm/PprCmmDecl.hs compiler/nativeGen/PPC/Ppr.hs compiler/nativeGen/SPARC/Ppr.hs compiler/nativeGen/X86/Ppr.hs
* Indentation and non-semantic changes only.Edward Z. Yang2014-10-193-76/+80
| | | | | | | | | | | | | | | Summary: Get these lines fitting in 80 columns, and replace ptext (sLit ...) with text Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu> Test Plan: validate Reviewers: simonmar, austin Subscribers: thomie, carter, ezyang, simonmar Differential Revision: https://phabricator.haskell.org/D342
* Implement optimized NCG `MO_Ctz W64` op for i386 (#9340)Herbert Valerio Riedel2014-10-181-9/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is an optimization to the CTZ primops introduced for #9340 Previously we called out to `hs_ctz64`, but we can actually generate better hand-tuned code while avoiding the FFI ccall. With this patch, the code {-# LANGUAGE MagicHash #-} module TestClz0 where import GHC.Prim ctz64 :: Word64# -> Word# ctz64 x = ctz64# x results in the following assembler generated by NCG on i386: TestClz.ctz64_info: movl (%ebp),%eax movl 4(%ebp),%ecx movl %ecx,%edx orl %eax,%edx movl $64,%edx je _nAO bsf %ecx,%ecx addl $32,%ecx bsf %eax,%eax cmovne %eax,%ecx movl %ecx,%edx _nAO: movl %edx,%esi addl $8,%ebp jmp *(%ebp) For comparision, here's what LLVM 3.4 currently generates: 000000fc <TestClzz_ctzz64_info>: fc: 0f bc 45 04 bsf 0x4(%ebp),%eax 100: b9 20 00 00 00 mov $0x20,%ecx 105: 0f 45 c8 cmovne %eax,%ecx 108: 83 c1 20 add $0x20,%ecx 10b: 8b 45 00 mov 0x0(%ebp),%eax 10e: 8b 55 08 mov 0x8(%ebp),%edx 111: 0f bc f0 bsf %eax,%esi 114: 85 c0 test %eax,%eax 116: 0f 44 f1 cmove %ecx,%esi 119: 83 c5 08 add $0x8,%ebp 11c: ff e2 jmp *%edx Reviewed By: austin Auditors: simonmar Differential Revision: https://phabricator.haskell.org/D163
* Code size micro-optimizations in the X86 backendReid Barton2014-10-071-1/+34
| | | | | | | | | | | | | | | | | | | | | Summary: Carter Schonwald suggested looking for opportunities to replace instructions in GHC's output by equivalent ones that are shorter, as recommended by the Intel optimization manuals. This patch reduces the module sizes as reported by nofib by about 1.5% on x86_64. Test Plan: Built an i386 cross-compiler and ran the test suite; the same (rather large) set of tests failed before and after this commit. Will let Harbormaster validate on x86_64. Reviewers: austin Subscribers: thomie, carter, ezyang, simonmar Differential Revision: https://phabricator.haskell.org/D320
* Place static closures in their own section.Edward Z. Yang2014-10-013-0/+6
| | | | | | | | | | | | | | | | | | | | | | | Summary: The primary reason for doing this is assisting debuggability: if static closures are all in the same section, they are guaranteed to be adjacent to one another. This will help later when we add some code that takes section start/end and uses this to sanity-check the sections. Part of remove HEAP_ALLOCED patch set (#8199) Signed-off-by: Edward Z. Yang <ezyang@mit.edu> Test Plan: validate Reviewers: simonmar, austin Subscribers: simonmar, ezyang, carter, thomie Differential Revision: https://phabricator.haskell.org/D263 GHC Trac Issues: #8199
* Stop exporting, and stop using, functions marked as deprecatedThomas Miedema2014-09-275-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | Don't export `getUs` and `getUniqueUs`. `UniqSM` has a `MonadUnique` instance: instance MonadUnique UniqSM where getUniqueSupplyM = getUs getUniqueM = getUniqueUs getUniquesM = getUniquesUs Commandline-fu used: git grep -l 'getUs\>' | grep -v compiler/basicTypes/UniqSupply.lhs | xargs sed -i 's/getUs/getUniqueSupplyM/g git grep -l 'getUniqueUs\>' | grep -v combiler/basicTypes/UniqSupply.lhs | xargs sed -i 's/getUniqueUs/getUniqueM/g' Follow up on b522d3a3f970a043397a0d6556ca555648e7a9c3 Reviewed By: austin, hvr Differential Revision: https://phabricator.haskell.org/D220
* Delete hack that was once needed to fix the buildThomas Miedema2014-09-252-0/+2
| | | | | | | | | | | | | | | | | Summary: Introduced in 6c7b41cc2b24f533697a62bf1843507ae043fc97. I checked the rest of that commit, and this is all that was left to revert. Test Plan: x Reviewers: ezyang, austin Reviewed By: ezyang, austin Subscribers: simonmar, ezyang, carter, thomie Differential Revision: https://phabricator.haskell.org/D241
* Make Applicative a superclass of MonadAustin Seipp2014-09-093-5/+11
| | | | | | | | | | | | | | | | | | | | | Summary: This includes pretty much all the changes needed to make `Applicative` a superclass of `Monad` finally. There's mostly reshuffling in the interests of avoid orphans and boot files, but luckily we can resolve all of them, pretty much. The only catch was that Alternative/MonadPlus also had to go into Prelude to avoid this. As a result, we must update the hsc2hs and haddock submodules. Signed-off-by: Austin Seipp <austin@well-typed.com> Test Plan: Build things, they might not explode horribly. Reviewers: hvr, simonmar Subscribers: simonmar Differential Revision: https://phabricator.haskell.org/D13
* `M-x delete-trailing-whitespace` & `M-x untabify`...Herbert Valerio Riedel2014-08-311-2/+2
| | | | | | ...some files more or less recently touched by me [ci skip]
* Add MO_AddIntC, MO_SubIntC MachOps and implement in X86 backendReid Barton2014-08-235-0/+29
| | | | | | | | | | | | | | | | | | | | | | | | Summary: These MachOps are used by addIntC# and subIntC#, which in turn are used in integer-gmp when adding or subtracting small Integers. The following benchmark shows a ~6% speedup after this commit on x86_64 (building GHC with BuildFlavour=perf). {-# LANGUAGE MagicHash #-} import GHC.Exts import Criterion.Main count :: Int -> Integer count (I# n#) = go n# 0 where go :: Int# -> Integer -> Integer go 0# acc = acc go n# acc = go (n# -# 1#) $! acc + 1 main = defaultMain [bgroup "count" [bench "100" $ whnf count 100]] Differential Revision: https://phabricator.haskell.org/D140
* Implement new CLZ and CTZ primops (re #9340)Herbert Valerio Riedel2014-08-144-0/+89
| | | | | | | | | | | | | | | | | | | | | | | | This implements the new primops clz#, clz32#, clz64#, ctz#, ctz32#, ctz64# which provide efficient implementations of the popular count-leading-zero and count-trailing-zero respectively (see testcase for a pure Haskell reference implementation). On x86, NCG as well as LLVM generates code based on the BSF/BSR instructions (which need extra logic to make the 0-case well-defined). Test Plan: validate and succesful tests on i686 and amd64 Reviewers: rwbarton, simonmar, ezyang, austin Subscribers: simonmar, relrod, ezyang, carter Differential Revision: https://phabricator.haskell.org/D144 GHC Trac Issues: #9340
* x86: zero extend the result of 16-bit popcnt instructions (#9435)Reid Barton2014-08-121-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The 'popcnt r16, r/m16' instruction only writes the low 16 bits of the destination register, so we have to zero-extend the result to a full word as popCnt16# is supposed to return a Word#. For popCnt8# we could instead zero-extend the input to 32 bits and then do a 32-bit popcnt, and not have to zero-extend the result. LLVM produces the 16-bit popcnt sequence with two zero extensions, though, and who am I to argue? Test Plan: - ran "make TEST=cgrun071 EXTRA_HC_OPTS=-msse42" - then ran again adding "WAY=optasm", and verified that the popcnt sequences we generate match the ones produced by LLVM for its @llvm.ctpop.* intrinsics Reviewers: austin, hvr, tibbe Reviewed By: austin, hvr, tibbe Subscribers: phaskell, hvr, simonmar, relrod, ezyang, carter Differential Revision: https://phabricator.haskell.org/D147 GHC Trac Issues: #9435
* Add CMOVcc insns to x86 NCGHerbert Valerio Riedel2014-08-122-0/+18
| | | | | | | | | | | | | | This is a pre-requisite for implementing count-{leading,trailing}-zero prim-ops (re #9340) and may be useful to NCG to help turn some code into branch-less code sequences. Test Plan: Compiles and validates in combination with clz/ctz primop impl Reviewers: ezyang, rwbarton, simonmar, austin Subscribers: simonmar, relrod, ezyang, carter Differential Revision: https://phabricator.haskell.org/D141
* Add bit scan {forward,reverse} insns to x86 NCGHerbert Valerio Riedel2014-08-122-2/+10
| | | | | | | | | | | This is a pre-requisite for implementing count-{leading,trailing}-zero prim-ops (re #9340) Reviewers: ezyang, rwbarton, simonmar, austin Subscribers: simonmar, relrod, ezyang, carter Differential Revision: https://phabricator.haskell.org/D141
* x86: Always generate add instruction in MO_Add2 (#9013)Reid Barton2014-08-113-2/+14
| | | | | | | | | | | | | Test Plan: - ran validate - ran T9013 test with all ways - ran CarryOverflow test with all ways, for good measure Reviewers: austin, simonmar Reviewed By: simonmar Differential Revision: https://phabricator.haskell.org/D137
* Eliminate some code duplication in x86 backend (genCCall32/64)Reid Barton2014-08-101-101/+13
| | | | | | | | | | | | | | | | | | | | | | | | Summary: No functional changes except in panic messages. These functions were identical except for - x87 operations in genCCall32 - the fallback to genCCall32'/64' - "32" vs "64" in panic messages (one case was wrong!) - minor syntactic or otherwise non-functional differences. Test Plan: Ran "validate --no-dph --slow" before and after the change. Only differences were two tests that failed before the change but not after, further investigation revealed that those tests are in fact erratic. Reviewers: simonmar, austin Reviewed By: austin Subscribers: phaskell, simonmar, relrod, ezyang, carter Differential Revision: https://phabricator.haskell.org/D139
* refactor to fix 80column overflowSimon Marlow2014-08-011-16/+20
|
* Allow multiple entry points when allocating recursive groups (#9303)Simon Marlow2014-07-312-35/+39
| | | | | | | | | | | | | | | | | Summary: In this example we ended up with some code that was only reachable via an info table, because a branch had been optimised away by the native code generator. The register allocator then got confused because it was only considering the first block of the proc to be an entry point, when actually any of the info tables are entry points. Test Plan: validate Reviewers: simonpj, austin Subscribers: simonmar, relrod, carter Differential Revision: https://phabricator.haskell.org/D88
* Add missing memory fence to atomicWriteIntArray#Johan Tibell2014-07-233-1/+7
|
* X86 codegen: make LOCK a real instruction prefixJohan Tibell2014-07-233-12/+8
| | | | | | | | | | Before LOCK was a separate instruction and this led to the register allocator separating it from the instruction it was supposed to be a prefix of, leading to illegal assembly such as lock mov Fix contributed by PÁLI Gábor János.
* Rename PackageId to PackageKey, distinguishing it from Cabal's PackageId.Edward Z. Yang2014-07-212-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Previously, both Cabal and GHC defined the type PackageId, and we expected them to be roughly equivalent (but represented differently). This refactoring separates these two notions. A package ID is a user-visible identifier; it's the thing you write in a Cabal file, e.g. containers-0.9. The components of this ID are semantically meaningful, and decompose into a package name and a package vrsion. A package key is an opaque identifier used by GHC to generate linking symbols. Presently, it just consists of a package name and a package version, but pursuant to #9265 we are planning to extend it to record other information. Within a single executable, it uniquely identifies a package. It is *not* an InstalledPackageId, as the choice of a package key affects the ABI of a package (whereas an InstalledPackageId is computed after compilation.) Cabal computes a package key for the package and passes it to GHC using -package-name (now *extremely* misnamed). As an added bonus, we don't have to worry about shadowing anymore. As a follow on, we should introduce -current-package-key having the same role as -package-name, and deprecate the old flag. This commit is just renaming. The haddock submodule needed to be updated. Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu> Test Plan: validate Reviewers: simonpj, simonmar, hvr, austin Subscribers: simonmar, relrod, carter Differential Revision: https://phabricator.haskell.org/D79 Conflicts: compiler/main/HscTypes.lhs compiler/main/Packages.lhs utils/haddock
* nativeGen: detabify/dewhitespace SPARC/CodeGen/BaseAustin Seipp2014-07-201-53/+41
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* nativeGen: detabify/dewhitespace SPARC/CodeGen/Gen32Austin Seipp2014-07-201-373/+361
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* nativeGen: detabify/dewhitespace SPARC/CodeGen/SanityAustin Seipp2014-07-201-53/+42
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* nativeGen: detabify/dewhitespace SPARC/CodeGen/ExpandAustin Seipp2014-07-201-106/+91
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* nativeGen: detabify/dewhitespace SPARC/CodeGen/AmodeAustin Seipp2014-07-201-23/+15
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* nativeGen: detabify/dewhitespace SPARC/CodeGen/CondCodeAustin Seipp2014-07-201-29/+21
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* nativeGen: detabify/dewhitespace SPARC/CondAustin Seipp2014-07-201-29/+21
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* nativeGen: detabify/dewhitespace SPARC/RegsAustin Seipp2014-07-201-145/+137
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* nativeGen: detabify/dewhitespace SPARC/InstrAustin Seipp2014-07-201-311/+302
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* nativeGen: detabify/dewhitespace SPARC/ShortcutJumpAustin Seipp2014-07-201-21/+10
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* nativeGen: detabify/dewhitespace SPARC/ImmAustin Seipp2014-07-201-42/+32
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* nativeGen: detabify/dewhitespace SPARC/StackAustin Seipp2014-07-201-29/+20
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* nativeGen: detabify/dewhitespace TargetRegAustin Seipp2014-07-201-22/+12
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* nativeGen: detabify/dewhitespace RegClassAustin Seipp2014-07-201-28/+20
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* nativeGen: detabify/dewhitespace PPC/RegInfoAustin Seipp2014-07-201-16/+7
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* nativeGen: detabify/dewhitespace PPC/CondAustin Seipp2014-07-201-25/+17
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* nativeGen: detabify/dewhitespace X86/RegInfoAustin Seipp2014-07-201-19/+11
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* nativeGen: detabify/dewhitespace RegAustin Seipp2014-07-201-127/+116
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* nativeGen: detabify/dewhitespace SizeAustin Seipp2014-07-201-51/+44
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* remove SPARC related comment in PPC code generatorPeter Trommler2014-07-101-9/+0
| | | | | | | | | | | | | | | | Summary: PowerPC does not do delay slots and there is also no requirement to put extra instructions between FP operations and branches. Test Plan: None. Comment change only. Reviewers: austin, simonmar Reviewed By: austin, simonmar Subscribers: simonmar, relrod, carter Differential Revision: https://phabricator.haskell.org/D40
* Re-add more primops for atomic ops on byte arraysJohan Tibell2014-06-306-2/+212
| | | | | | | | | | | | | | | | | | | | | | | This is the second attempt to add this functionality. The first attempt was reverted in 950fcae46a82569e7cd1fba1637a23b419e00ecd, due to register allocator failure on x86. Given how the register allocator currently works, we don't have enough registers on x86 to support cmpxchg using complicated addressing modes. Instead we fall back to a simpler addressing mode on x86. Adds the following primops: * atomicReadIntArray# * atomicWriteIntArray# * fetchSubIntArray# * fetchOrIntArray# * fetchXorIntArray# * fetchAndIntArray# Makes these pre-existing out-of-line primops inline: * fetchAddIntArray# * casIntArray#
* Revert "Add more primops for atomic ops on byte arrays"Johan Tibell2014-06-266-194/+2
| | | | | | | | This commit caused the register allocator to fail on i386. This reverts commit d8abf85f8ca176854e9d5d0b12371c4bc402aac3 and 04dd7cb3423f1940242fdfe2ea2e3b8abd68a177 (the second being a fix to the first).
* Add more primops for atomic ops on byte arraysJohan Tibell2014-06-246-2/+194
| | | | | | | | | | | | | | | | | | | Summary: Add more primops for atomic ops on byte arrays Adds the following primops: * atomicReadIntArray# * atomicWriteIntArray# * fetchSubIntArray# * fetchOrIntArray# * fetchXorIntArray# * fetchAndIntArray# Makes these pre-existing out-of-line primops inline: * fetchAddIntArray# * casIntArray#
* Some typos in commentsGabor Greif2014-06-111-1/+1
|
* Make better use of the x86 addressing modeJohan Tibell2014-06-101-0/+9
| | | | | | | | | | | | | We now emit movq %rdi,16(%r14,%rsi,8) instead of leaq 16(%r14),%rax movq %rdi,(%rax,%rsi,8) This helps e.g. byte array indexing.
* Fix discarding of unreachable code in the register allocator (#9155)Simon Marlow2014-06-061-4/+10
| | | | | | | A previous fix to this was wrong: f5879acd018494b84233f26fba828ce376d0f81d and left some unreachable code behind. So rather than try to be clever and do this at the same time as the strongly-connected-component analysis, I'm doing a separate reachability pass first.