path: root/compiler/nativeGen
Commit message | Author | Age | Files | Lines
* Make the C-- O and C types constructors with DataKinds | John Ericson | 2019-09-05 | 1 | -2/+5
| This tightens up the kinds a bit. I use type synonyms to avoid adding promotion ticks everywhere.
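The idea of promoting constructors with DataKinds and hiding the ticks behind synonyms can be sketched as below. This is a minimal illustration, not the code from the commit: the names `Extensibility`, `Open`, `Closed`, `Node` and `describe` are assumptions made for the example.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

-- Hypothetical shape index; with DataKinds its constructors are also types.
data Extensibility = Open | Closed

-- Synonyms for the promoted constructors, so users write O and C
-- rather than 'Open and 'Closed.
type O = 'Open
type C = 'Closed

-- A toy node type indexed by entry and exit shape. The kind signatures
-- reject nonsense such as `Node Int Bool`.
data Node (e :: Extensibility) (x :: Extensibility) where
  Entry  :: String -> Node C O
  Middle :: String -> Node O O
  Exit   :: String -> Node O C

-- Works for any combination of shapes.
describe :: Node e x -> String
describe (Entry l)  = "entry " ++ l
describe (Middle s) = "middle " ++ s
describe (Exit t)   = "exit " ++ t
```

Before promotion, `O` and `C` would be ordinary empty data types of kind `Type`, so the index positions would accept any type at all.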
* Few tweaks in -ddump-debug output, minor refactoring | Ömer Sinan Ağacan | 2019-09-02 | 1 | -8/+6
| - Fixes crazy indentation in -ddump-debug output
| - We no longer dump empty sections in -ddump-debug when a code block does not have any generated debug info
| - Minor refactoring in Debug.hs and AsmCodeGen.hs
* Return results of Cmm streams in backends | Ömer Sinan Ağacan | 2019-08-28 | 1 | -12/+14
| This generalizes code generators (outputAsm, outputLlvm, outputC, and the call site codeOutput) so that they'll return the return values of the passed Cmm streams.
|
| This allows accumulating data during Cmm generation and returning it to the call site in HscMain.
|
| Previously the Cmm streams were assumed to return (), so the code generators returned () as well.
|
| This change is required by !1304 and !1530.
|
| Skipping CI as this was tested before and I only updated the commit message.
|
| [skip ci]
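The generalization described above, from a consumer that assumes a `()` result to one that returns whatever the stream returns, can be sketched with a toy stream type. The `Stream`, `consume` and `countingStream` definitions here are assumptions for illustration only and are not GHC's own.

```haskell
-- Toy stream: each step either finishes with a result r or yields an
-- element together with the rest of the stream.
newtype Stream m a r = Stream { runStream :: m (Either r (a, Stream m a r)) }

-- Consume every element with an action and hand back the stream's final
-- result. With r fixed to (), this is the "old" shape of a code generator.
consume :: Monad m => (a -> m ()) -> Stream m a r -> m r
consume act s = do
  step <- runStream s
  case step of
    Left r          -> return r
    Right (a, rest) -> act a >> consume act rest

-- Build a stream from a list whose result is the number of elements,
-- standing in for data accumulated during generation.
countingStream :: Monad m => [a] -> Stream m a Int
countingStream = go 0
  where
    go n []       = Stream (return (Left n))
    go n (x : xs) = Stream (return (Right (x, go (n + 1) xs)))
```

A caller such as `consume print (countingStream "abc")` prints each element and then receives the accumulated count.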
* Remove redundant OPTIONS_GHC in BlockLayout.hs | Andreas Klebinger | 2019-08-27 | 1 | -3/+0
|
* Remove Bag fold specialisations (#16969) | Richard Lupton | 2019-08-19 | 1 | -2/+2
|
* Remove unused imports of the form 'import foo ()' (Fixes #17065) | James Foster | 2019-08-15 | 12 | -15/+6
| These kinds of imports are necessary in some cases such as importing instances of typeclasses or intentionally creating dependencies in the build system, but '-Wunused-imports' can't detect when they are no longer needed. This commit removes the unused ones currently in the code base (not including test files or submodules), with the hope that doing so may increase parallelism in the build system by removing unnecessary dependencies.
* Introduce a type for "platform word size", use it instead of Int | Ömer Sinan Ağacan | 2019-08-06 | 3 | -14/+11
| We introduce a PlatformWordSize type and use it in the platformWordSize field. This removes the panic/error calls that were made when the platform word size is not 32 or 64. We now check for this when reading the platform config.
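A minimal sketch of the approach, assuming a two-constructor type: the word size is validated once when the configuration is read, so later code can pattern match without an error branch. The constructor and function names below are hypothetical and need not match the ones in the commit.

```haskell
-- Hypothetical word-size type: only the supported sizes are representable.
data PlatformWordSize = PW4 | PW8
  deriving (Eq, Show)

-- Validation happens once, at config-reading time.
parseWordSize :: Int -> Either String PlatformWordSize
parseWordSize 4 = Right PW4
parseWordSize 8 = Right PW8
parseWordSize n = Left ("unsupported word size in bytes: " ++ show n)

-- Consumers are total: there is no "otherwise panic" case left.
wordSizeInBytes :: PlatformWordSize -> Int
wordSizeInBytes PW4 = 4
wordSizeInBytes PW8 = 8

wordSizeInBits :: PlatformWordSize -> Int
wordSizeInBits = (* 8) . wordSizeInBytes
```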
* compiler: emit finer grained codegen events to eventlog | Alp Mestanogullari | 2019-08-02 | 1 | -20/+25
|
* Revert "Add support for SIMD operations in the NCG" | Ben Gamari | 2019-07-16 | 16 | -799/+97
| Unfortunately this will require more work; register allocation is quite broken.
|
| This reverts commit acd795583625401c5554f8e04ec7efca18814011.
* Add support for SIMD operations in the NCG | Abhiroop Sarkar | 2019-07-03 | 16 | -97/+799
| This adds support for constructing vector types from Float#, Double#, etc. and performing arithmetic operations on them.
|
| Cleaned-Up-By: Ben Gamari <ben@well-typed.com>
* Correct closure observation, construction, and mutation on weak memory machines. | Travis Whitaker | 2019-06-28 | 3 | -1/+9
| Here the following changes are introduced:
| - A read barrier machine op is added to Cmm.
| - The order in which a closure's fields are read and written is changed.
| - Memory barriers are added to RTS code to ensure correctness on out-of-order machines with weak memory ordering.
|
| Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this is lowered to an instruction that ensures memory reads that occur after said instruction in program order are not performed before reads coming before said instruction in program order. On machines with strong memory ordering properties (e.g. X86, SPARC in TSO mode) no such instruction is necessary, so MO_ReadBarrier is simply erased. However, such an instruction is necessary on weakly ordered machines, e.g. ARM and PowerPC.
|
| Weak memory ordering has consequences for how closures are observed and mutated. For example, consider a closure that needs to be updated to an indirection. In order for the indirection to be safe for concurrent observers to enter, said observers must read the indirection's info table before they read the indirectee. Furthermore, the entering observer makes assumptions about the closure based on its info table contents, e.g. an INFO_TYPE of IND implies the closure has an indirectee pointer that is safe to follow.
|
| When a closure is updated with an indirection, both its info table and its indirectee must be written. With weak memory ordering, these two writes can be arbitrarily reordered, and perhaps even interleaved with other threads' reads and writes (in the absence of memory barrier instructions). Consider this example of a bad reordering:
| - An updater writes to a closure's info table (INFO_TYPE is now IND).
| - A concurrent observer branches upon reading the closure's INFO_TYPE as IND.
| - A concurrent observer reads the closure's indirectee and enters it. (!!!)
| - An updater writes the closure's indirectee.
|
| Here the update to the indirectee comes too late and the concurrent observer has jumped off into the abyss. Speculative execution can also cause us issues. Consider:
| - An observer is about to case on a value in the closure's info table.
| - The observer speculatively reads one or more of the closure's fields.
| - An updater writes to the closure's info table.
| - The observer takes a branch based on the new info table value, but with the old closure fields!
| - The updater writes to the closure's other fields, but it's too late.
|
| Because of these effects, reads and writes to a closure's info table must be ordered carefully with respect to reads and writes to the closure's other fields, and memory barriers must be placed to ensure that reads and writes occur in program order. Specifically, updates to a closure must follow the following pattern:
| - Update the closure's (non-info table) fields.
| - Write barrier.
| - Update the closure's info table.
|
| Observing a closure's fields must follow the following pattern:
| - Read the closure's info pointer.
| - Read barrier.
| - Read the closure's (non-info table) fields.
|
| This patch updates RTS code to obey this pattern. This should fix long-standing SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting out-of-order execution) and PowerPC. This fixes issue #15449.
|
| Co-Authored-By: Ben Gamari <ben@well-typed.com>
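The bad reordering and the prescribed update pattern can be illustrated with a small pure simulation. This is only a model under stated assumptions: it does not use real barriers or threads, every name in it is hypothetical, and the observer is treated as a single atomic step (that is, the read barrier on the observer side is assumed to hold). It shows why the order of the updater's two writes matters.

```haskell
data Info = Thunk | Ind
  deriving (Eq, Show)

-- A toy closure: an info tag plus an optional indirectee.
data Closure = Closure { info :: Info, indirectee :: Maybe Int }

data Event
  = WriteInfo        -- updater sets the info tag to Ind
  | WriteIndirectee  -- updater stores the indirectee
  | Observe          -- observer reads info, then follows the indirectee if Ind
  deriving (Eq, Show)

data Outcome = SawThunk | Entered Int | JumpedIntoAbyss
  deriving (Eq, Show)

-- Apply events in the given global order, which stands for the order in
-- which memory actually sees them, and report what each Observe found.
run :: [Event] -> [Outcome]
run = go (Closure Thunk Nothing)
  where
    go _ []                     = []
    go c (WriteInfo : es)       = go (c { info = Ind }) es
    go c (WriteIndirectee : es) = go (c { indirectee = Just 42 }) es
    go c (Observe : es)         = observe c : go c es

    observe (Closure Thunk _)      = SawThunk
    observe (Closure Ind (Just p)) = Entered p
    observe (Closure Ind Nothing)  = JumpedIntoAbyss

-- Info written first: the observer can see IND with no indirectee yet.
badOrder :: [Event]
badOrder = [WriteInfo, Observe, WriteIndirectee]

-- Fields first, info last (what the write barrier enforces): every
-- placement of the observer is safe.
safeOrders :: [[Event]]
safeOrders =
  [ [Observe, WriteIndirectee, WriteInfo]
  , [WriteIndirectee, Observe, WriteInfo]
  , [WriteIndirectee, WriteInfo, Observe]
  ]
```

In this model `run badOrder` reports `JumpedIntoAbyss`, while no ordering in `safeOrders` does.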
* Move 'Platform' to ghc-boot | John Ericson | 2019-06-19 | 33 | -33/+33
| ghc-pkg needs to be aware of platforms so it can figure out which subdirectory within the user package db to use. This is admittedly roundabout, but maybe Cabal could use the same notion of a platform as GHC to good effect too.
* Use DeriveFunctor throughout the codebase (#15654) | Krzysztof Gogolewski | 2019-06-12 | 3 | -14/+11
|
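For readers unfamiliar with the extension, here is a small sketch of what DeriveFunctor saves on a state-passing monad skeleton. The `Counter` type is an assumption made for the example and is not taken from the code generator.

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- A state-passing computation; the Functor instance is derived.
newtype Counter a = Counter { runCounter :: Int -> (a, Int) }
  deriving Functor

-- The hand-written instance that the derivation replaces would be:
--
--   instance Functor Counter where
--     fmap f (Counter g) = Counter (\s -> let (a, s') = g s in (f a, s'))

-- Return the current counter value and increment the state.
tick :: Counter Int
tick = Counter (\s -> (s, s + 1))
```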
* Introduce log1p and expm1 primops | chessai | 2019-06-09 | 3 | -0/+12
| Previously log and exp were primitives yet log1p and expm1 were FFI calls. Fix this non-uniformity.
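As background on why these two functions exist next to log and exp, the sketch below compares them with the naive formulas for a tiny argument. It only uses `log1p` and `expm1` as exported by `Numeric` in base; the helper names are made up for the example.

```haskell
import Numeric (expm1, log1p)

-- Naive formulations lose all precision once 1 + x rounds to 1.
naiveLog1p :: Double -> Double
naiveLog1p x = log (1 + x)

naiveExpm1 :: Double -> Double
naiveExpm1 x = exp x - 1

-- Small enough that 1 + tiny == 1 in Double arithmetic.
tiny :: Double
tiny = 1.0e-20

-- (naive, dedicated) results for each function at the tiny argument.
comparison :: [(Double, Double)]
comparison =
  [ (naiveLog1p tiny, log1p tiny)
  , (naiveExpm1 tiny, expm1 tiny)
  ]
```

The first component of each pair collapses to zero, while the dedicated function keeps a positive result close to `tiny`.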
* powerpc32: fix stack allocation code generation | Sergei Trofimovich | 2019-05-31 | 1 | -1/+1
| When GHC was built for powerpc32, the build failed.
|
| It's a fallout of commit 3f46cffcc2850e68405a1 ("PPC NCG: Refactor stack allocation code"), where the word size used to be
|     II32/II64
| and changed to
|     II8/panic "no width for given number of bytes"
|     widthFromBytes ((platformWordSize platform) `quot` 8)
|
| The change restores the initial behaviour by removing the extra division.
|
| Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
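The arithmetic behind the failure can be reproduced with a toy version of the lookup. The sketch assumes the word size is already expressed in bytes and that the lookup rejects unknown byte counts; `Width`, `buggyWordWidth` and `fixedWordWidth` are illustrative names, and the real function panics where this one returns `Left`.

```haskell
data Width = W8 | W16 | W32 | W64
  deriving (Eq, Show)

-- Toy stand-in for the byte-count-to-width lookup.
widthFromBytes :: Int -> Either String Width
widthFromBytes 1 = Right W8
widthFromBytes 2 = Right W16
widthFromBytes 4 = Right W32
widthFromBytes 8 = Right W64
widthFromBytes _ = Left "no width for given number of bytes"

-- With the extra division: 8 `quot` 8 = 1 and 4 `quot` 8 = 0.
buggyWordWidth :: Int -> Either String Width
buggyWordWidth wordSizeInBytes = widthFromBytes (wordSizeInBytes `quot` 8)

-- Without it, the byte count is passed through unchanged.
fixedWordWidth :: Int -> Either String Width
fixedWordWidth = widthFromBytes
```

With a word size of 8 bytes the buggy version yields an 8-bit width, and with 4 bytes it yields the error, which matches the "II8/panic" pair in the message.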
* powerpc32: fix 64-bit comparison (#16465) | Sergei Trofimovich | 2019-05-31 | 1 | -0/+1
| On powerpc32, the 64-bit comparison code generated dangling target labels. This caused the GHC build to fail as:
|
|     $ ./configure --target=powerpc-unknown-linux-gnu && make
|     ...
|     SCCs aren't in reverse dependent order
|     bad blockId n3U
|
| This happened because condIntCode' in the PPC codegen generated a label name but did not place the label into the `cmp_lo` code block.
|
| The change adds the `cmp_lo` label into the case of negative comparison.
|
| Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
* Use datatype for unboxed returns when loading ghc into ghci | Michael Sloan | 2019-05-22 | 2 | -37/+69
| See #13101 and #15454
* Remove all target-specific portions of Config.hs | John Ericson | 2019-05-14 | 2 | -25/+25
| 1. If GHC is to be multi-target, these cannot be baked in at compile time.
| 2. Compile-time flags have a higher maintenance burden than run-time flags.
| 3. The old way mixes build system implementation (various bootstrapping details) with the thing being built. E.g. GHC doesn't need to care about which integer library *will* be used; this is purely a crutch so the build system doesn't need to pass flags later when using that library.
| 4. Experience with cross compilation in Nixpkgs has shown things work nicer when compilers can *optionally* delegate the bootstrapping to the package manager. The package manager knows the entire end-goal build plan, and thus can make top-down decisions on bootstrapping. GHC can just worry about GHC, not even core libraries like base and ghc-prim!
* asm-emit-time IND_STATIC elimination | Gabor Greif | 2019-04-15 | 3 | -1/+33
| When a new closure identifier is being established to a local or exported closure already emitted into the same module, refrain from adding an IND_STATIC closure, and instead emit an assembly-language alias.
|
| Inter-module IND_STATIC objects still remain, and need to be addressed by other measures.
|
| Binary-size savings on nofib are around 0.1%.
* codegen: unroll memcpy calls for small bytearrays | Artem Pyanykh | 2019-04-14 | 1 | -5/+6
|
* removing x87 register support from native code gen | Carter Schonwald | 2019-04-10 | 17 | -894/+235
| * simplifies registers to have GPR, Float and Double, by removing the SSE2 and X87 constructors
| * makes -msse2 assumed/default for x86 platforms, fixing a long-standing nondeterminism in rounding behavior in 32-bit Haskell code
| * removes the 80-bit floating point representation from the supported float sizes
| * there's still one tiny bit of x87 support needed, for handling float and double return values in FFI calls wrt the C ABI on x86_32, but this one piece does not leak into the rest of the NCG
| * lots of code that's not been touched in a long time got deleted as a consequence of all of this
|
| All in all, this change paves the way towards a lot of future improvements in how GHC handles floating point computations, along with making the native code gen more accessible to a larger pool of contributors.
* codegen: use newtype for Alignment in BasicTypes | Artem Pyanykh | 2019-04-09 | 2 | -23/+22
|
* codegen: fix memset unroll for small bytearrays, add 64-bit sets | Artem Pyanykh | 2019-04-09 | 1 | -25/+55
| Fixes #16052
|
| When the offset in `setByteArray#` is statically known, we can provide better alignment guarantees than just 1 byte. Also, memset can now do 64-bit wide sets.
|
| The current memset intrinsic is not optimal, however, and can be improved for the case when we know that we deal with
|
|     (baseAddress at known alignment) + offset
|
| For instance, on 64-bit, `setByteArray# s 1# 23# 0#`, given that the bytearray is 8-byte aligned, could be unrolled into `movb, movw, movl, movq, movq`; but currently it is `movb x23`, since an alignment of 1 is all we can embed into the MO_Memset op.
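The unrolling that the message describes as a possible improvement (not what the commit itself implements) can be sketched as a greedy choice of store widths. The sketch assumes an 8-byte aligned base, a non-negative offset, and stores of 1, 2, 4 and 8 bytes; `storeWidths` and `unrollSets` are hypothetical names.

```haskell
-- Store widths in bytes on a 64-bit target: movb, movw, movl, movq.
storeWidths :: [Int]
storeWidths = [1, 2, 4, 8]

-- Given a byte offset from an 8-byte aligned base and a length, pick at
-- each step the widest store that is naturally aligned at the current
-- offset and does not overrun the remaining length.
unrollSets :: Int -> Int -> [Int]
unrollSets off len
  | len <= 0  = []
  | otherwise = w : unrollSets (off + w) (len - w)
  where
    -- Never empty: a 1-byte store always qualifies when len >= 1.
    w = maximum [ c | c <- storeWidths, off `mod` c == 0, c <= len ]
```

For offset 1 and length 23 this gives the widths 1, 2, 4, 8, 8, which corresponds to the `movb, movw, movl, movq, movq` sequence in the message.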
* Add support for bitreverse primop | Alexandre | 2019-04-01 | 4 | -1/+17
| This commit includes the necessary changes in code and documentation to support a primop that reverses a word's bits. It also includes a test.
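What "reversing a word's bits" means can be shown on an 8-bit word with the usual swap-halves technique. This is a reference sketch using only `Data.Bits`; it is not the primop or its lowering, and `reverseBits8` is a name chosen for the example.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- Reverse the bit order of a byte in three swap steps.
reverseBits8 :: Word8 -> Word8
reverseBits8 w0 = w3
  where
    -- swap adjacent bits
    w1 = ((w0 .&. 0x55) `shiftL` 1) .|. ((w0 `shiftR` 1) .&. 0x55)
    -- swap adjacent bit pairs
    w2 = ((w1 .&. 0x33) `shiftL` 2) .|. ((w1 `shiftR` 2) .&. 0x33)
    -- swap the two nibbles
    w3 = (w2 `shiftL` 4) .|. (w2 `shiftR` 4)
```

Wider words follow the same pattern with additional swap steps for bytes and larger groups.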
* Update Wiki URLs to point to GitLab | Takenobu Tani | 2019-03-25 | 2 | -2/+2
| This moves all URL references to Trac Wiki to their corresponding GitLab counterparts.
|
| This substitution is classified as follows:
|
| 1. Automated substitution using sed with Ben's mapping rule [1]
|      Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...
|      New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...
|
| 2. Manual substitution for URLs containing `#` index
|      Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...#Zzz
|      New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...#zzz
|
| 3. Manual substitution for strings starting with `Commentary`
|      Old: Commentary/XxxYyy...
|      New: commentary/xxx-yyy...
|
| See also !539
|
| [1]: https://gitlab.haskell.org/bgamari/gitlab-migration/blob/master/wiki-mapping.json
* PPC NCG: Use liveness information in CmmCall | Peter Trommler | 2019-03-15 | 4 | -42/+49
| We make liveness information for global registers available on `JMP` and `BCTR`, which were the last instructions missing. With complete liveness information we do not need to reserve global registers in `freeReg` anymore. Moreover we assign R9 and R10 to callee-saved registers.
|
| Cleanup by removing `Reg_Su`, which was unused, from `freeReg` and removing unused register definitions.
|
| The calculation of the number of floating point registers is too conservative. Just follow X86 and specify the constants directly.
|
| Overall on PowerPC this results in 0.3% smaller code size in nofib while runtime is slightly better in some tests.
* Update Trac ticket URLs to point to GitLab | Ryan Scott | 2019-03-15 | 3 | -5/+5
| This moves all URL references to Trac tickets to their corresponding GitLab counterparts.
* NCG: correctly escape path strings on Windows (#16389) | Sylvain Henry | 2019-03-09 | 2 | -2/+4
| GHC native code generator generates .incbin and .file directives. We need to escape those strings correctly on Windows (see #16389).
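The underlying problem is that a Windows path such as `C:\tmp\x.dat` contains backslashes, which the assembler treats as escape characters inside a quoted string. A minimal sketch of the kind of escaping involved follows; the function names and the exact set of escaped characters are assumptions, not the code from the fix.

```haskell
-- Escape the characters that are special inside a quoted assembler string.
escapeForAsm :: String -> String
escapeForAsm = concatMap esc
  where
    esc '\\' = "\\\\"   -- a backslash becomes two backslashes
    esc '"'  = "\\\""   -- a double quote becomes backslash + quote
    esc c    = [c]

-- Hypothetical rendering of an .incbin directive for a given path.
incbinDirective :: FilePath -> String
incbinDirective path = "\t.incbin \"" ++ escapeForAsm path ++ "\""
```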
* Rip out object splitting | Ben Gamari | 2019-03-05 | 7 | -65/+27
| The splitter is an evil Perl script that processes assembler code. Its job can be done better by the linker's --gc-sections flag. GHC passes this flag to the linker whenever -split-sections is passed on the command line.
|
| This is based on @DemiMarie's D2768.
|
| Fixes Trac #11315
| Fixes Trac #9832
| Fixes Trac #8964
| Fixes Trac #8685
| Fixes Trac #8629
* Don't wrap the entry map for LiveInfo in Maybe. | klebinger.andreas@gmx.at | 2019-02-15 | 6 | -26/+27
| It never really encoded an invariant.
|
| * The linear register allocator just did partial pattern matches
| * The graph allocator just set it to (Just mapEmpty) for Nothing
|
| So I changed LiveInfo to directly contain the map.
|
| Further, natCmmTopToLive, which filled in Nothing, is no longer exported. Instead we now call cmmTopLiveness, which changes the type AND fills in the map.
* NCG: fast compilation of very large strings (#16190) | Sylvain Henry | 2019-02-14 | 4 | -12/+51
| This patch adds an optimization into the NCG: for large strings (threshold configurable via -fbinary-blob-threshold=NNN flag), instead of printing `.asciz "..."` in the generated ASM source, we print `.incbin "tmpXXX.dat"` and we dump the contents of the string into a temporary "tmpXXX.dat" file.
|
| See the note for more details.
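The decision described above can be sketched as a simple size check. Everything here is an assumption for illustration: the `Emit` type, the function names, the strict `>` comparison, and the use of `show` for quoting (which applies Haskell escaping rather than assembler escaping).

```haskell
-- How a string literal ends up in the generated assembly.
data Emit
  = Asciz String     -- printed inline in the .s file
  | Incbin FilePath  -- contents dumped to a file and included by name
  deriving (Eq, Show)

-- Choose the representation from a size threshold. Whether the real
-- comparison is strict, and how a zero threshold is treated, is not
-- stated in the message; this sketch uses a strict comparison.
chooseEmit :: Int -> FilePath -> String -> Emit
chooseEmit threshold tmpFile s
  | length s > threshold = Incbin tmpFile
  | otherwise            = Asciz s

-- Simplified rendering of the resulting directive.
render :: Emit -> String
render (Asciz s)     = "\t.asciz " ++ show s
render (Incbin path) = "\t.incbin " ++ show path
```

Writing the blob to the temporary file is left out; only the choice between the two directives is modelled.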
* Fix Int overflow on 32 bit platform | Peter Trommler | 2019-02-10 | 1 | -1/+1
|
* Stack: fix name mangling. | Tamar Christina | 2019-02-09 | 1 | -1/+1
|
* Allow resizing the stack for the graph allocator. | klebinger.andreas@gmx.at | 2019-02-08 | 6 | -36/+105
| The graph allocator now dynamically resizes the number of stack slots when running into the limit.
|
| This fixes #8657.
|
| Also, loop membership of basic blocks is now available in the register allocator for cost heuristics.
* Remove unused imports | Sebastian Graf | 2019-02-02 | 1 | -3/+0
|
* PPC NCG: Promote integers to word size in C calls | Peter Trommler | 2019-01-31 | 1 | -13/+23
| Fixes #16222
* Small optimizations to BlockLayout. | klebinger.andreas@gmx.at | 2019-01-31 | 1 | -39/+31
| * Remove `takeL/R 1` occurrences by lastOL/headOL.
| * Make BlockChain an OrdList newtype by removing the set of blocks.
|
| Initially BlockChain contained both a set for membership tests and an ordered list of blocks. The set is not used for any performance-sensitive lookups, so we get rid of it.
* Replace BlockSequence with OrdList in BlockLayout.hs | klebinger.andreas@gmx.at | 2019-01-31 | 1 | -76/+23
| OrdList does the same thing and more, so there is no reason to have both.
* Optimize pprASCII | Sylvain Henry | 2019-01-31 | 1 | -12/+23
| * Use `ByteString.foldr` instead of `(List.foldr . BS.unpack)`
| * Avoid calling `chr` and its test that checks for invalid Unicode codepoints: we stay in the ASCII range so we know we're ok
| * Avoid calling `isPrint` (unsafe FFI call): we can check the ASCII printable range directly
| * Use bit operations (`unsafeShiftR`, `.&.`) instead of `div` and `mod`
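Two of the points above, the direct printable-range check and the octal digits computed with bit operations, can be sketched as follows. This is a simplified illustration over a list of bytes rather than a ByteString, it still calls `chr` for brevity, and all function names are assumptions.

```haskell
import Data.Bits (unsafeShiftR, (.&.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Escape one byte for use inside a quoted assembler string.
escapeByte :: Word8 -> String
escapeByte w
  | w == 0x09              = "\\t"
  | w == 0x0A              = "\\n"
  | w == 0x22              = "\\\""
  | w == 0x5C              = "\\\\"
  | w >= 0x20 && w <= 0x7E = [chr (fromIntegral w)]  -- printable ASCII range
  | otherwise              = '\\' : octal w

-- Three octal digits, taken with shifts and masks instead of div and mod.
octal :: Word8 -> String
octal w = [digit (w `unsafeShiftR` 6), digit (w `unsafeShiftR` 3), digit w]
  where
    digit d = chr (ord '0' + fromIntegral (d .&. 0x07))

-- Escape a whole byte sequence with a right fold.
escapeBytes :: [Word8] -> String
escapeBytes = foldr (\w s -> escapeByte w ++ s) ""
```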
* Use ByteString to represent Cmm string literals (#16198) | Sylvain Henry | 2019-01-31 | 3 | -4/+8
| Also used ByteString in some other relevant places
* Compile count{Leading,Trailing}Zeros to corresponding x86_64 instructions under -mbmi2 | Dmitry Ivanov | 2019-01-30 | 3 | -28/+63
| This works similarly to the existing implementation for popCount.
|
| Trac ticket: #16086.
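For reference, the semantics of the two operations can be stated with a bit-by-bit scan and compared against `countLeadingZeros` and `countTrailingZeros` from `Data.Bits`. The reference functions below are illustrative only and say nothing about the instructions the code generator selects.

```haskell
import Data.Bits (countLeadingZeros, countTrailingZeros, testBit)
import Data.Word (Word64)

-- Count zero bits from the most significant end until the first set bit.
clzRef :: Word64 -> Int
clzRef w = length (takeWhile not [ testBit w i | i <- [63, 62 .. 0] ])

-- Count zero bits from the least significant end until the first set bit.
ctzRef :: Word64 -> Int
ctzRef w = length (takeWhile not [ testBit w i | i <- [0 .. 63] ])

-- The reference scans agree with the library operations; both give 64
-- for a zero argument.
agreesWithLibrary :: Word64 -> Bool
agreesWithLibrary w =
  clzRef w == countLeadingZeros w && ctzRef w == countTrailingZeros w
```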
* Revert "Batch merge" | Ben Gamari | 2019-01-30 | 3 | -63/+28
| This reverts commit 76c8fd674435a652c75a96c85abbf26f1f221876.
* Batch merge | Ben Gamari | 2019-01-30 | 3 | -28/+63
|
* Fix regDotColor for amd64. | klebinger.andreas@gmx.at | 2019-01-27 | 2 | -31/+48
| Add missing color mappings to regDotColor for amd64. Also set fakeRegs to red instead of xmm regs.
* A few typofixes | Gabor Greif | 2019-01-23 | 1 | -1/+1
|
* PPC NCG: Rename constructors | Peter Trommler | 2019-01-17 | 1 | -28/+29
| Rename constructors in the calling convention data type to reflect the fact that they represent an ELF ABI, not only a Linux ABI.
* Fix tab and improve whitespace | Peter Trommler | 2019-01-17 | 1 | -7/+8
|
* PPC NCG: Register definitions for all 64-bit systems | Peter Trommler | 2019-01-17 | 1 | -7/+3
|
* PPC NCG: GOT declaration for all 64-bit ELF systems | Peter Trommler | 2019-01-17 | 1 | -5/+3
|
* PPC NCG: Make `stackHeaderSize` more general | Peter Trommler | 2019-01-17 | 1 | -7/+6
|