path: root/compiler/nativeGen/PPC/CodeGen.hs
Commit message (Author, Age, Files, Lines)
* Clean up `#include`s in the compiler (John Ericson, 2019-10-05, 1 file, -1/+0)
  - Remove unneeded ones
  - Use <..> for inter-package includes
  Besides general cleanup, this helps distinguish between the RTS we link against and the RTS we compile for.
* Remove empty NCG.h (John Ericson, 2019-09-13, 1 file, -1/+0)
* Module hierarchy: StgToCmm (#13009) (Sylvain Henry, 2019-09-10, 1 file, -1/+1)
  Add StgToCmm module hierarchy. Platform modules that are used in several other places (NCG, LLVM codegen, Cmm transformations) are put into GHC.Platform.
* Revert "Add support for SIMD operations in the NCG" (Ben Gamari, 2019-07-16, 1 file, -6/+0)
  Unfortunately this will require more work; register allocation is quite broken.
  This reverts commit acd795583625401c5554f8e04ec7efca18814011.
* Add support for SIMD operations in the NCG (Abhiroop Sarkar, 2019-07-03, 1 file, -0/+6)
  This adds support for constructing vector types from Float#, Double#, etc. and performing arithmetic operations on them.
  Cleaned-Up-By: Ben Gamari <ben@well-typed.com>
* Correct closure observation, construction, and mutation on weak memory machines. (Travis Whitaker, 2019-06-28, 1 file, -0/+3)
  Here the following changes are introduced:
  - A read barrier machine op is added to Cmm.
  - The order in which a closure's fields are read and written is changed.
  - Memory barriers are added to RTS code to ensure correctness on out-of-order machines with weak memory ordering.
  Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this is lowered to an instruction that ensures memory reads that occur after said instruction in program order are not performed before reads coming before said instruction in program order. On machines with strong memory ordering properties (e.g. X86, SPARC in TSO mode) no such instruction is necessary, so MO_ReadBarrier is simply erased. However, such an instruction is necessary on weakly ordered machines, e.g. ARM and PowerPC.
  Weak memory ordering has consequences for how closures are observed and mutated. For example, consider a closure that needs to be updated to an indirection. In order for the indirection to be safe for concurrent observers to enter, said observers must read the indirection's info table before they read the indirectee. Furthermore, the entering observer makes assumptions about the closure based on its info table contents, e.g. an INFO_TYPE of IND implies the closure has an indirectee pointer that is safe to follow.
  When a closure is updated with an indirection, both its info table and its indirectee must be written. With weak memory ordering, these two writes can be arbitrarily reordered, and perhaps even interleaved with other threads' reads and writes (in the absence of memory barrier instructions). Consider this example of a bad reordering:
  - An updater writes to a closure's info table (INFO_TYPE is now IND).
  - A concurrent observer branches upon reading the closure's INFO_TYPE as IND.
  - A concurrent observer reads the closure's indirectee and enters it. (!!!)
  - An updater writes the closure's indirectee.
  Here the update to the indirectee comes too late and the concurrent observer has jumped off into the abyss.
  Speculative execution can also cause issues; consider:
  - An observer is about to case on a value in the closure's info table.
  - The observer speculatively reads one or more of the closure's fields.
  - An updater writes to the closure's info table.
  - The observer takes a branch based on the new info table value, but with the old closure fields!
  - The updater writes to the closure's other fields, but it's too late.
  Because of these effects, reads and writes to a closure's info table must be ordered carefully with respect to reads and writes to the closure's other fields, and memory barriers must be placed to ensure that reads and writes occur in program order. Specifically, updates to a closure must follow this pattern:
  - Update the closure's (non-info table) fields.
  - Write barrier.
  - Update the closure's info table.
  Observing a closure's fields must follow this pattern:
  - Read the closure's info pointer.
  - Read barrier.
  - Read the closure's (non-info table) fields.
  This patch updates RTS code to obey this pattern. This should fix long-standing SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting out-of-order execution) and PowerPC. This fixes issue #15449.
  Co-Authored-By: Ben Gamari <ben@well-typed.com>
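The write-side rule above can be checked in a toy model: write the fields first, then a write barrier, then the info pointer. The sketch below is plain Haskell, not RTS code, and the names `Write`, `visibleOrders` and `canObserveBrokenIndirection` are hypothetical. It lists the orders in which the updater's two writes may become visible to another thread. It then asks whether an observer could ever see an `IND` info pointer before the indirectee has been written. Only reordering of the writes is modelled; the observer's read barrier is assumed to keep its own two reads in program order.

```haskell
import Data.List (permutations)

-- Toy model (hypothetical, not RTS code): turning a closure into an
-- indirection takes two writes, the indirectee field and the info pointer.
data Write = SetIndirectee | SetInfoIND
  deriving (Eq, Show)

-- Orders in which the updater's writes may become visible to another thread.
-- With a write barrier between them only program order (fields first, then
-- info pointer) is possible; without one, a weakly ordered CPU may expose
-- them in either order.
visibleOrders :: Bool -> [[Write]]
visibleOrders writeBarrier
  | writeBarrier = [[SetIndirectee, SetInfoIND]]
  | otherwise    = permutations [SetIndirectee, SetInfoIND]

-- An observer that looks at some point sees a prefix of a visibility order.
-- The snapshot is unsafe if the info pointer already says IND while the
-- indirectee has not been written yet.
unsafeSnapshot :: [Write] -> Bool
unsafeSnapshot seen = SetInfoIND `elem` seen && SetIndirectee `notElem` seen

-- Can any visibility order, observed at any moment, expose a broken indirection?
canObserveBrokenIndirection :: Bool -> Bool
canObserveBrokenIndirection writeBarrier =
  or [ unsafeSnapshot (take n order)
     | order <- visibleOrders writeBarrier
     , n <- [0 .. length order] ]
```

Evaluating `canObserveBrokenIndirection False` yields `True` (the bad reordering described above), while `canObserveBrokenIndirection True` yields `False`.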
* Move 'Platform' to ghc-boot (John Ericson, 2019-06-19, 1 file, -1/+1)
  ghc-pkg needs to be aware of platforms so it can figure out which subdirectory within the user package db to use. This is admittedly roundabout, but maybe Cabal could use the same notion of a platform as GHC to good effect too.
* Introduce log1p and expm1 primops (chessai, 2019-06-09, 1 file, -0/+4)
  Previously log and exp were primitives yet log1p and expm1 were FFI calls. Fix this non-uniformity.
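The point of having log1p and expm1 as separate operations is numerical. For tiny x, `1 + x` rounds to exactly 1.0 in double precision, so the naive formulas return 0. The minimal sketch below shows this using the `log1p` and `expm1` methods exported by `Numeric`, assuming a base version recent enough to provide them. The `naive*` helpers and `smallX` are hypothetical names.

```haskell
import Numeric (expm1, log1p)

-- Naive formulas: for |x| far below machine epsilon, 1 + x and exp x both
-- round to exactly 1.0, so every significant digit of the result is lost.
naiveLog1p, naiveExpm1 :: Double -> Double
naiveLog1p x = log (1 + x)
naiveExpm1 x = exp x - 1

-- A value small enough that 1 + smallX == 1 in Double.
smallX :: Double
smallX = 1e-20
```

Here `naiveLog1p smallX` and `naiveExpm1 smallX` are both `0.0`, while `log1p smallX` and `expm1 smallX` stay positive (about `1.0e-20`).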
* powerpc32: fix 64-bit comparison (#16465) (Sergei Trofimovich, 2019-05-31, 1 file, -0/+1)
  On powerpc32, 64-bit comparison code generated dangling target labels. This caused a ghc build failure:
      $ ./configure --target=powerpc-unknown-linux-gnu && make
      ...
      SCCs aren't in reverse dependent order
      bad blockId n3U
  This happened because condIntCode' in the PPC codegen generated a label name but did not place the label into the `cmp_lo` code block. The change adds the `cmp_lo` label into the case of negative comparison.
  Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
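On powerpc32, a 64-bit comparison has to be split into two 32-bit ones. The sketch below shows that logic in plain Haskell. It assumes the usual register-pair representation: a signed high word and an unsigned low word. `splitI64` and `compareI64Halves` are hypothetical names. The second comparison corresponds to the `cmp_lo` block mentioned in the commit.

```haskell
import Data.Bits (shiftR)
import Data.Int (Int32, Int64)
import Data.Word (Word32)

-- Split a 64-bit value the way a 32-bit target holds it in two registers:
-- a signed high word and an unsigned low word.
splitI64 :: Int64 -> (Int32, Word32)
splitI64 x = (fromIntegral (x `shiftR` 32), fromIntegral x)

-- Signed 64-bit comparison built from 32-bit comparisons: the high words
-- decide unless they are equal, in which case the low words are compared
-- as unsigned values.
compareI64Halves :: Int64 -> Int64 -> Ordering
compareI64Halves a b =
  let (ahi, alo) = splitI64 a
      (bhi, blo) = splitI64 b
  in case compare ahi bhi of
       EQ    -> compare alo blo  -- second ("cmp_lo") comparison
       order -> order            -- decided by the high words alone
```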
* removing x87 register support from native code gen (Carter Schonwald, 2019-04-10, 1 file, -4/+2)
  * simplifies registers to have GPR, Float and Double, by removing the SSE2 and X87 constructors
  * makes -msse2 assumed/default for x86 platforms, fixing a long-standing nondeterminism in rounding behavior in 32-bit Haskell code
  * removes the 80-bit floating point representation from the supported float sizes
  * there's still one tiny bit of x87 support needed, for handling float and double return values in FFI calls wrt the C ABI on x86_32, but this one piece does not leak into the rest of NCG.
  * lots of code that had not been touched in a long time got deleted as a consequence of all of this
  All in all, this change paves the way towards a lot of future improvements in how GHC handles floating point computations, along with making the native code gen more accessible to a larger pool of contributors.
* Add support for bitreverse primop (Alexandre, 2019-04-01, 1 file, -0/+1)
  This commit includes the necessary changes in code and documentation to support a primop that reverses a word's bits. It also includes a test.
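The operation behind the new primop can be given a reference definition: a simple loop over the bits of a 32-bit word. `bitReverseRef32` is a hypothetical helper for pinning down the semantics. It is not the code the NCG emits.

```haskell
import Data.Bits (shiftL, testBit, (.|.))
import Data.Word (Word32)

-- Reference semantics of bit reversal on a 32-bit word:
-- bit i of the input becomes bit (31 - i) of the output.
bitReverseRef32 :: Word32 -> Word32
bitReverseRef32 w = foldl step 0 [0 .. 31]
  where
    step acc i
      | testBit w i = acc .|. (1 `shiftL` (31 - i))
      | otherwise   = acc
```

For example, `bitReverseRef32 1` is `0x80000000`, and applying the function twice returns the original word.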
* PPC NCG: Use liveness information in CmmCall (Peter Trommler, 2019-03-15, 1 file, -17/+23)
  We make liveness information for global registers available on `JMP` and `BCTR`, which were the last instructions missing. With complete liveness information we do not need to reserve global registers in `freeReg` anymore. Moreover, we assign R9 and R10 to callee-saves registers.
  Clean up by removing `Reg_Su`, which was unused, from `freeReg` and removing unused register definitions.
  The calculation of the number of floating point registers is too conservative. Just follow X86 and specify the constants directly.
  Overall on PowerPC this results in 0.3 % smaller code size in nofib while runtime is slightly better in some tests.
* PPC NCG: Promote integers to word size in C calls (Peter Trommler, 2019-01-31, 1 file, -13/+23)
  Fixes #16222
* PPC NCG: Rename constructors (Peter Trommler, 2019-01-17, 1 file, -28/+29)
  Rename constructors in calling convention data type to reflect the fact that they represent an ELF ABI, not only a Linux ABI.
* Fix tab and improve whitespace (Peter Trommler, 2019-01-17, 1 file, -7/+8)
* PPC NCG: Make calling convention more general (Peter Trommler, 2019-01-17, 1 file, -6/+5)
  All operating systems except AIX and Darwin follow the ELF specification.
* PPC NCG: Remove Darwin support (Peter Trommler, 2019-01-01, 1 file, -65/+29)
  Apple dropped support for Mac OS X on PowerPC years ago. We follow suit and remove PowerPC support for Darwin.
  Fixes #16106.
* PPC NCG: Simple 64-bit condition code on 32-bit (Peter Trommler, 2018-12-30, 1 file, -3/+48)
* PPC NCG: Generate MO_?_QuotRem for subword sizes (Peter Trommler, 2018-12-11, 1 file, -21/+24)
  Handle Int*QuotRemOP and Word*QuotRemOp in PPC NCG. Refactor common code with remainder operation.
  Test Plan: validate (I validated on Linux powerpc64le and x86_64)
  Reviewers: erikd, hvr, bgamari, simonmar
  Reviewed By: bgamari
  Subscribers: rwbarton, carter
  Differential Revision: https://phabricator.haskell.org/D5323
* PPC NCG: Implement MachOps for smaller sizes (Peter Trommler, 2018-12-11, 1 file, -161/+146)
  Generate code for MachOps with data smaller than word size. Refactor conversion MachOps.
  Fixes #15854
  Test Plan: validate (I validated on powerpc64le and x86_64 Linux)
  Reviewers: bgamari, hvr, erikd, simonmar
  Subscribers: rwbarton, carter
  GHC Trac Issues: #15854
  Differential Revision: https://phabricator.haskell.org/D5300
* NCG: New code layout algorithm. (Andreas Klebinger, 2018-11-17, 1 file, -1/+3)
  Summary: This patch implements a new code layout algorithm. It has been tested for x86 and is disabled on other platforms.
  Performance varies slightly by CPU/machine but in general seems to be better by around 2%. Nofib shows only small differences of about +/- ~0.5% overall depending on flags/machine; performance in other benchmarks improved significantly.
  Other benchmarks include at least the benchmarks of: aeson, vector, megaparsec, attoparsec, containers, text and xeno.
  While the magnitude of gains differed, three different CPUs were tested, all getting faster although to differing degrees. I tested: Sandy Bridge (Xeon), Haswell, Skylake.
  * Library benchmark results summarized:
    * containers: ~1.5% faster
    * aeson: ~2% faster
    * megaparsec: ~2-5% faster
    * xml library benchmarks: 0.2%-1.1% faster
    * vector-benchmarks: 1-4% faster
    * text: 5.5% faster
  On average GHC compile times go down, as GHC compiled with the new layout is faster than the overhead introduced by using the new layout algorithm.
  Things this patch does:
  * Move code responsible for block layout into its own module.
  * Move the NcgImpl class into the NCGMonad module.
  * Extract a control flow graph from the input cmm.
  * Update this cfg to keep it in sync with changes during asm codegen. This has been tested on x64 but should work on x86. Other platforms still use the old code layout.
  * Assign weights to the edges in the CFG based on type and limited static analysis, which are then used for block layout.
  * Once we have the final code layout, eliminate some redundant jumps. In particular, turn sequences of:
        jne .foo
        jmp .bar
      foo:
    into
        je .bar
      foo:
        ..
  Test Plan: ci
  Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott
  Reviewed By: RyanGlScott
  Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton
  GHC Trac Issues: #15124
  Differential Revision: https://phabricator.haskell.org/D4726
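The final jump-elimination step in the list above can be sketched as a peephole rewrite over a simplified instruction stream. The `Instr` type and `elimRedundantJumps` are hypothetical and much simpler than GHC's own types. The sketch shows only the pattern: when a conditional jump targets the label that immediately follows an unconditional jump, invert the condition, retarget it, and drop the unconditional jump.

```haskell
type Lbl = String

data Cond = Equal | NotEqual
  deriving (Eq, Show)

-- Hypothetical, simplified instruction stream (not GHC's Instr type).
data Instr
  = JumpIf Cond Lbl  -- conditional jump, e.g. jne .foo
  | Jump Lbl         -- unconditional jump, e.g. jmp .bar
  | Label Lbl        -- label definition, e.g. foo:
  | Other String     -- any other instruction
  deriving (Eq, Show)

negateCond :: Cond -> Cond
negateCond Equal    = NotEqual
negateCond NotEqual = Equal

-- Rewrite "jcc L1; jmp L2; L1:" into "j(not cc) L2; L1:". The label is kept
-- because other blocks may still jump to it.
elimRedundantJumps :: [Instr] -> [Instr]
elimRedundantJumps (JumpIf c l1 : Jump l2 : Label l1' : rest)
  | l1 == l1' = JumpIf (negateCond c) l2 : elimRedundantJumps (Label l1' : rest)
elimRedundantJumps (i : rest) = i : elimRedundantJumps rest
elimRedundantJumps [] = []
```

Applied to `[JumpIf NotEqual "foo", Jump "bar", Label "foo"]`, this gives `[JumpIf Equal "bar", Label "foo"]`.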
* Fix precision of asinh/acosh/atanh by making them primops (Artem Pelenitsyn, 2018-08-21, 1 file, -0/+8)
  Reviewers: hvr, bgamari, simonmar, jrtc27
  Reviewed By: bgamari
  Subscribers: alpmestan, rwbarton, thomie, carter
  Differential Revision: https://phabricator.haskell.org/D5034
* Allow CmmLabelDiffOff with different widths (Simon Marlow, 2018-05-16, 1 file, -1/+2)
  Summary: This change makes it possible to generate a static 32-bit relative label offset on x86_64. Currently we can only generate word-sized label offsets. This will be used in D4634 to shrink info tables. See D4632 for more details.
  Test Plan: See D4632
  Reviewers: bgamari, niteria, michalt, erikd, jrtc27, osa1
  Subscribers: thomie, carter
  Differential Revision: https://phabricator.haskell.org/D4633
* Add 'addWordC#' PrimOp (Sebastian Graf, 2018-05-05, 1 file, -0/+7)
  This is mostly for congruence with 'subWordC#' and '{add,sub}IntC#'. I found 'plusWord2#' while implementing this, which both lacks documentation and has a slightly different specification than 'addWordC#', which means the generic implementation is unnecessarily complex. While I was at it, I also added missing meta-information on PrimOps and refactored 'subWordC#'s generic implementation to be branchless.
  Reviewers: bgamari, simonmar, jrtc27, dfeuer
  Reviewed By: bgamari, dfeuer
  Subscribers: dfeuer, thomie, carter
  Differential Revision: https://phabricator.haskell.org/D4592
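The carry of `addWordC#` and the borrow of `subWordC#` can be computed without branches, from the top bits of the operands and the result. The minimal sketch below uses the standard identities from Hacker's Delight. These are not necessarily the exact formulas in GHC's generic implementation, and `addWordC`/`subWordC` here are hypothetical Haskell functions, not the primops.

```haskell
import Data.Bits (complement, finiteBitSize, shiftR, xor, (.&.), (.|.))

wordBits :: Int
wordBits = finiteBitSize (0 :: Word)

-- Add two words, returning (sum, carry) with carry 0 or 1. The carry out of
-- the top bit is recovered from the operands and the wrapped sum, with no
-- comparison or branch.
addWordC :: Word -> Word -> (Word, Word)
addWordC a b = (s, carry)
  where
    s     = a + b
    carry = ((a .&. b) .|. ((a .|. b) .&. complement s)) `shiftR` (wordBits - 1)

-- Subtract, returning (difference, borrow) with borrow 0 or 1, branch-free.
subWordC :: Word -> Word -> (Word, Word)
subWordC a b = (d, borrow)
  where
    d      = a - b
    borrow = ((complement a .&. b) .|. (complement (a `xor` b) .&. d)) `shiftR` (wordBits - 1)
```

For example, `addWordC maxBound 1` is `(0, 1)` and `subWordC 0 1` is `(maxBound, 1)`.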
* PPC nativeGen: Add support for MO_SS_Conv_W32_W64 (Peter Trommler, 2018-03-19, 1 file, -0/+8)
  This is required by D4363. D4362 has the implementation for i386; this commit adds PowerPC.
  Test Plan: validate
  Reviewers: erikd, hvr, simonmar, bgamari
  Reviewed By: bgamari
  Subscribers: rwbarton, thomie, carter
  Differential Revision: https://phabricator.haskell.org/D4468
* Add new mbmi and mbmi2 compiler flags (John Ky, 2018-01-21, 1 file, -0/+2)
  This adds support for the bit deposit and extraction operations provided by the BMI and BMI2 instruction set extensions on modern amd64 machines.
  Implement x86 code generator for pdep and pext.
  Properly initialise bmiVersion field.
  pdep and pext test cases.
  Fix pattern match for pdep and pext instructions.
  Fix build of pdep and pext code for 32-bit architectures.
  Test Plan: Validate
  Reviewers: austin, simonmar, bgamari, angerman
  Reviewed By: bgamari
  Subscribers: trommler, carter, angerman, thomie, rwbarton, newhoggy
  GHC Trac Issues: #14206
  Differential Revision: https://phabricator.haskell.org/D4236
* Revert "Add new mbmi and mbmi2 compiler flags" (Ben Gamari, 2017-11-22, 1 file, -2/+0)
  This broke the 32-bit build.
  This reverts commit f5dc8ccc29429d0a1d011f62b6b430f6ae50290c.
* Add new mbmi and mbmi2 compiler flags (John Ky, 2017-11-15, 1 file, -0/+2)
  This adds support for the bit deposit and extraction operations provided by the BMI and BMI2 instruction set extensions on modern amd64 machines.
  Test Plan: Validate
  Reviewers: austin, simonmar, bgamari, hvr, goldfire, erikd
  Reviewed By: bgamari
  Subscribers: goldfire, erikd, trommler, newhoggy, rwbarton, thomie
  GHC Trac Issues: #14206
  Differential Revision: https://phabricator.haskell.org/D4063
* Fix PPC NCG after blockID patch (Peter Trommler, 2017-11-09, 1 file, -2/+12)
  Commit rGHC8b007ab assigns the same label to the first basic block of a proc and to the proc entry point. This violates the PPC 64-bit ELF v. 1.9 and v. 2.0 ABIs and leads to duplicate symbols.
  This patch fixes duplicate symbols caused by block labels.
  In commit rGHCd7b8da1 an info table label is generated from a block id. Getting the entry label from that info label leads to an undefined symbol because of a suffix "_entry" that is not present in the block label.
  To fix that issue, add a new info table label flavour for labels derived from block ids. Converting such a label with toEntryLabel produces the original block label.
  Fixes #14311
  Test Plan: ./validate
  Reviewers: austin, bgamari, simonmar, erikd, hvr, angerman
  Reviewed By: bgamari
  Subscribers: rwbarton, thomie
  GHC Trac Issues: #14311
  Differential Revision: https://phabricator.haskell.org/D4149
* PPC NCG: Impl branch prediction, atomic ops. (Peter Trommler, 2017-11-02, 1 file, -32/+117)
  Implement AtomicRMW ops, atomic read, and atomic write in the PowerPC native code generator. Also implement branch prediction because we need it in atomic ops anyway.
  This patch improves the issue in #12537 a bit but does not fix it entirely.
  The fallback operations for atomicread and atomicwrite in libraries/ghc-prim/cbits/atomic.c are incorrect. This patch avoids those functions by implementing the operations directly in the native code generator. This is also what the x86/amd64 NCG and the LLVM backend do.
  Test Plan: validate on AIX and PowerPC (32-bit) Linux
  Reviewers: erikd, hvr, austin, bgamari, simonmar
  Reviewed By: hvr, bgamari
  Subscribers: rwbarton, thomie
  GHC Trac Issues: #12537
  Differential Revision: https://phabricator.haskell.org/D3984
* Turn `compareByteArrays#` out-of-line primop into inline primop (alexbiehl, 2017-10-29, 1 file, -0/+1)
  Depends on D4090
  Reviewers: austin, bgamari, erikd, simonmar, alexbiehl
  Reviewed By: bgamari
  Subscribers: rwbarton, thomie
  Differential Revision: https://phabricator.haskell.org/D4091
* A bunch of typofixes (Gabor Greif, 2017-09-26, 1 file, -1/+1)
* compiler: introduce custom "GhcPrelude" Prelude (Herbert Valerio Riedel, 2017-09-19, 1 file, -0/+2)
  This switches the compiler/ component to get compiled with -XNoImplicitPrelude, and an `import GhcPrelude` is inserted in all modules.
  This is motivated by the upcoming "Prelude" re-export of `Semigroup((<>))`, which would cause lots of name clashes in every module which also imports `Outputable`.
  Reviewers: austin, goldfire, bgamari, alanz, simonmar
  Reviewed By: bgamari
  Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari
  Differential Revision: https://phabricator.haskell.org/D3989
* nativeGen: Consistently use blockLbl to generate CLabels from BlockIds (Ben Gamari, 2017-09-19, 1 file, -3/+2)
  This fixes #14221, where the NCG and the DWARF code were apparently giving two different names to the same block.
  Test Plan: Validate with DWARF support enabled.
  Reviewers: simonmar, austin
  Subscribers: rwbarton, thomie
  GHC Trac Issues: #14221
  Differential Revision: https://phabricator.haskell.org/D3977
* Add support for producing position-independent executables (Ben Gamari, 2017-08-22, 1 file, -3/+3)
  Previously, due to #12759, we disabled PIE support entirely. However, this breaks the user's ability to produce PIEs. Add an explicit flag, -fPIE, allowing the user to build PIEs.
  Test Plan: Validate
  Reviewers: rwbarton, austin, simonmar
  Subscribers: trommler, simonmar, trofi, jrtc27, thomie
  GHC Trac Issues: #12759, #13702
  Differential Revision: https://phabricator.haskell.org/D3589
* Hoopl: remove dependency on Hoopl package (Michal Terepeta, 2017-06-23, 1 file, -1/+2)
  This copies the subset of Hoopl's functionality needed by GHC to `cmm/Hoopl` and removes the dependency on the Hoopl package.
  The main motivation for this change is the confusing/noisy interface between GHC and Hoopl:
  - Hoopl has `Label` which is GHC's `BlockId` but different than GHC's `CLabel`
  - Hoopl has `Unique` which is different than GHC's `Unique`
  - Hoopl has `Unique{Map,Set}` which are different than GHC's `Uniq{FM,Set}`
  - GHC has its own specialized copy of `Dataflow`, so `cmm/Hoopl` is needed just to filter the exposed functions (filter out some of Hoopl's and add the GHC ones)
  With this change, we'll be able to simplify this significantly. It'll also be much easier to do invasive changes (Hoopl is a public package on Hackage with users that depend on the current behavior).
  This should introduce no changes in functionality; it merely copies the relevant code.
  Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
  Test Plan: ./validate
  Reviewers: austin, bgamari, simonmar
  Reviewed By: bgamari, simonmar
  Subscribers: simonpj, kavon, rwbarton, thomie
  Differential Revision: https://phabricator.haskell.org/D3616
* PPC NCG: Lower MO_*_Fabs as PowerPC fabs instruction (Peter Trommler, 2017-05-01, 1 file, -0/+8)
  In Phab:D3265 we introduced MO_F32_Fabs and MO_F64_Fabs. This patch improves code generation by generating PowerPC fabs instructions.
  Test Plan: run numeric/should_run/numrun015 or validate
  Reviewers: austin, bgamari, hvr, simonmar, erikd
  Reviewed By: erikd
  Subscribers: rwbarton, thomie
  Differential Revision: https://phabricator.haskell.org/D3512
* PPC NCG: Implement callish prim ops (Peter Trommler, 2017-04-25, 1 file, -64/+400)
  Provide PowerPC optimised implementations of callish prim ops.
  MO_?_QuotRem
    The generic implementation of quotient remainder prim ops uses a division and a remainder operation. There is no remainder on PowerPC, so we need to implement remainder "by hand", which results in a duplication of the divide operation when using the generic code. Avoid this duplication by implementing the prim op in the native code generator.
  MO_U_Mul2
    Use PowerPC's instructions for long multiplication.
  Addition and subtraction
    Use PowerPC add/subtract with carry/overflow instructions.
  MO_Clz and MO_Ctz
    Use PowerPC's CNTLZ instruction and implement count trailing zeros using count leading zeros.
  MO_QuotRem2
    Implement an algorithm given by Henry Warren in "Hacker's Delight" using the PowerPC divide instruction. TODO: Use long division instructions when available (POWER7 and later).
  Test Plan: validate on AIX and 32-bit Linux
  Reviewers: simonmar, erikd, hvr, austin, bgamari
  Reviewed By: erikd, hvr, bgamari
  Subscribers: trofi, kgardas, thomie
  Differential Revision: https://phabricator.haskell.org/D2973
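Two of the tricks listed above can be written in plain Haskell. The first derives the remainder from a single division as `x - q*y`. The second builds count trailing zeros from count leading zeros, using the Hacker's Delight identity ntz(x) = 32 - nlz(~x & (x - 1)), which also gives 32 for x = 0. Both functions are hypothetical illustrations of the arithmetic, not the NCG's instruction selection.

```haskell
import Data.Bits (complement, countLeadingZeros, finiteBitSize, (.&.))
import Data.Int (Int32)
import Data.Word (Word32)

-- Quotient and remainder from one division: the remainder is x - q*y
-- (a multiply and a subtract), so the divide is not repeated.
quotRemViaDiv :: Int32 -> Int32 -> (Int32, Int32)
quotRemViaDiv x y = (q, x - q * y)
  where
    q = x `quot` y

-- Count trailing zeros using only count leading zeros:
-- complement x .&. (x - 1) keeps exactly the trailing zero bits of x as ones,
-- so their number is the width minus the leading zeros of that mask.
ctzViaClz :: Word32 -> Int
ctzViaClz x = finiteBitSize x - countLeadingZeros (complement x .&. (x - 1))
```

For example, `ctzViaClz 12` is `2`, `ctzViaClz 0` is `32`, and `quotRemViaDiv (-7) 2` agrees with `quotRem (-7) 2`, which is `(-3, -1)`.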
* Generate better fp abs for X86 and llvm with default cmm otherwise (Dominic Steinitz, 2017-03-07, 1 file, -0/+2)
  Currently we have this in libraries/base/GHC/Float.hs:
  ```
  abs x | x == 0    = 0 -- handles (-0.0)
        | x > 0     = x
        | otherwise = negateFloat x
  ```
  But 3-4 years ago it was noted that this was inefficient: https://mail.haskell.org/pipermail/libraries/2013-April/019690.html
  We can generate better code for X86 and llvm and for others generate some custom cmm code which is similar to what the compiler generates now.
  Reviewers: austin, simonmar, hvr, bgamari
  Reviewed By: bgamari
  Subscribers: dfeuer, thomie
  Differential Revision: https://phabricator.haskell.org/D3265
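On IEEE 754 values, a faster `abs` needs none of the comparisons in the guarded definition above. Clearing the sign bit is enough, and that is what a `fabs`-style instruction does. The minimal sketch below assumes `castDoubleToWord64` and `castWord64ToDouble` from `GHC.Float` (base 4.10 or later). `absBits` is a hypothetical name. This is an illustration of the technique, not the Cmm that GHC generates for any particular backend.

```haskell
import Data.Bits (clearBit)
import GHC.Float (castDoubleToWord64, castWord64ToDouble)

-- Absolute value by clearing the IEEE 754 sign bit (bit 63 of a Double).
-- No comparisons or branches: -0.0 becomes +0.0, and a NaN stays a NaN.
absBits :: Double -> Double
absBits = castWord64ToDouble . (`clearBit` 63) . castDoubleToWord64
```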
* Typos in comments (Gabor Greif, 2016-10-17, 1 file, -1/+1)
* PPC/CodeGen: fix lwa instruction generation (Peter Trommler, 2016-10-01, 1 file, -4/+12)
  Opcode lwa is a 64-bit opcode and allows a DS-form only. This patch generates lwa opcodes only when the offset is a multiple of 4.
  Fixes #12621
  Test Plan: validate
  Reviewers: erikd, hvr, simonmar, austin, bgamari
  Reviewed By: bgamari
  Subscribers: thomie
  Differential Revision: https://phabricator.haskell.org/D2547
  GHC Trac Issues: #12621
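The DS-form constraint can be written as a predicate. The minimal sketch below assumes the standard DS-form encoding: a 14-bit signed displacement field scaled by 4, which gives byte offsets from -32768 to 32764 in steps of 4. `fitsDSForm` is a hypothetical helper, and the commit does not say which instruction sequence is used when it returns False.

```haskell
-- DS-form instructions such as lwa store a 14-bit signed field that is
-- shifted left by 2, so a byte offset is encodable only if it is a multiple
-- of 4 within the signed 16-bit range [-32768, 32764].
fitsDSForm :: Int -> Bool
fitsDSForm off = off `mod` 4 == 0 && off >= -32768 && off <= 32764
```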
* PPC NCG: Implement minimal stack frame header. (Peter Trommler, 2016-08-31, 1 file, -2/+3)
  According to the ABI specifications, a minimal stack frame consists of a header and a minimum size parameter save area. We reserve the minimal size for each ABI.
  On PowerPC 64-bit Linux and AIX the parameter save area can accommodate up to eight parameters. So calls with eight parameters or fewer can be done without allocating a new stack frame and deallocating that stack frame after the call.
  On AIX one additional spill slot is available on the stack.
  Code size for all nofib benchmarks is 0.3 % smaller on powerpc64.
  Test Plan: validate on AIX
  Reviewers: hvr!, erikd, austin, simonmar, bgamari
  Reviewed By: bgamari
  Subscribers: thomie
  Differential Revision: https://phabricator.haskell.org/D2445
* PPC NCG: Fix and refactor TOC handling. (Peter Trommler, 2016-06-19, 1 file, -28/+28)
  In a call to a fixed function the TOC does not need to be saved. The linker handles TOC saving.
  Refactor TOC handling by folding the two functions toc_before and toc_after into the code generating the call sequence. This saves repeating the case distinction in those two functions.
  Test Plan: validate on PowerPC 32-bit Linux and AIX
  Reviewers: hvr, simonmar, austin, erikd, bgamari
  Reviewed By: bgamari
  Subscribers: thomie
  Differential Revision: https://phabricator.haskell.org/D2328
* PPC NCG: Fix float parameter passing on 64-bit. (Peter Trommler, 2016-06-19, 1 file, -6/+18)
  On Linux 64-bit PowerPC the first 13 floating point parameters are passed in registers. We only passed the first 8 floating point params.
  The alignment of a floating point single precision value in ELF v1.9 is the second word of a doubleword. For ELF v2 we support only little endian and the least significant word of a doubleword is the first word, so no special handling is required.
  Add a regression test.
  Test Plan: validate on powerpc Linux and AIX
  Reviewers: erikd, hvr, austin, simonmar, bgamari
  Reviewed By: simonmar
  Subscribers: thomie
  Differential Revision: https://phabricator.haskell.org/D2327
  GHC Trac Issues: #12134
* PPC NCG: Improve pointer de-tagging code (Peter Trommler, 2016-04-29, 1 file, -5/+22)
  Generate a clrr[wd]i instruction to clear the tag bits in a pointer. This saves one instruction and one temporary register.
  Optimize signed comparison with zero after andi. operation. This saves one instruction when comparing a pointer tag with zero.
  This reduces code size by 0.6 % in all nofib benchmarks.
  Test Plan: validate on AIX and 32-bit Linux
  Reviewed By: erikd, hvr
  Differential Revision: https://phabricator.haskell.org/D2093
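What the de-tagging and tag-test instructions compute can be written as plain bit arithmetic on a 64-bit pointer. The minimal sketch below assumes GHC's usual 3 tag bits on 64-bit targets (2 on 32-bit). `untag` and `getTag` are hypothetical names that mirror `clrrdi` and `andi.`.

```haskell
import Data.Bits (complement, shiftL, (.&.))
import Data.Word (Word64)

-- Pointer tags live in the low bits that alignment leaves free:
-- 3 bits on 64-bit targets (8-byte alignment).
tagBits64 :: Int
tagBits64 = 3

tagMask :: Word64
tagMask = (1 `shiftL` tagBits64) - 1

-- What "clrrdi rD, rS, 3" computes in a single instruction: clear the
-- rightmost 3 bits, with no mask materialised in a temporary register.
untag :: Word64 -> Word64
untag p = p .&. complement tagMask

-- What "andi. rD, rS, 7" computes (the instruction also sets CR0, which is
-- what allows the following comparison with zero to be dropped).
getTag :: Word64 -> Word64
getTag p = p .&. tagMask
```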
* Remove code-duplication in the PPC NCG (Herbert Valerio Riedel, 2016-03-24, 1 file, -26/+19)
  Reviewed By: bgamari, trommler
  Differential Revision: https://phabricator.haskell.org/D2020
* Add NCG support for AIX/ppc32 (Herbert Valerio Riedel, 2016-03-24, 1 file, -5/+99)
  This extends the previous work to revive the unregisterised GHC build for AIX/ppc32. Strictly speaking, AIX runs on POWER4 (and later) hardware, but the PPC32 instructions implemented in the PPC NCG represent a compatible subset of the POWER4 ISA.
  IBM AIX follows the PowerOpen ABI (and shares many similarities with the Linux PPC64 ELF V1 NCG backend) but uses the rather limited XCOFF format (compared to ELF).
  This doesn't support dynamic libraries yet.
  A major limiting factor is that the AIX assembler does not support the `@ha`/`@l` relocation types nor the ha16()/lo16() functions Darwin's assembler supports. Therefore we need to avoid emitting those. In case of numeric literals we simply compute the functions ourselves, while for labels we have to use local TOCs and hope everything fits into a 16-bit offset (for ppc32 this gives us at most 16384 entries per TOC section, which is enough to compile GHC).
  Another issue is that XCOFF doesn't seem to have a relocation type for label-differences, and therefore the label-differences placed into tables-next-to-code can't be relocated, but the linker may rearrange different sections, so we need to place all read-only sections into the same `.text[PR]` section to work around this.
  Finally, the PowerOpen ABI distinguishes between function-descriptors and actual entry-point addresses. For AIX we need to be specific when emitting assembler code whether we want the address of the function descriptor (`printf`) or of the entry-point (`.printf`). So we let the asm pretty-printer prefix a dot to all emitted subroutine calls (i.e. `BL`) on AIX only. For now, STG routines' entry-point labels are not prefixed by a label and don't have any associated function-descriptor.
  Reviewers: austin, trommler, erikd, bgamari
  Reviewed By: trommler, erikd, bgamari
  Differential Revision: https://phabricator.haskell.org/D2019
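The AIX backend computes `ha16`/`lo16` for numeric literals by hand. This works because instructions such as `addi` later sign-extend the 16-bit low half, and the "high adjusted" half compensates for that. The minimal sketch below uses hypothetical function names. `rebuild` models the effect of a `lis`/`addi` pair.

```haskell
import Data.Bits (shiftL, shiftR, (.&.))
import Data.Int (Int16, Int32)
import Data.Word (Word16, Word32)

-- lo16: the low 16 bits. addi sign-extends its immediate, so when bit 15
-- is set this half effectively contributes (value - 0x10000).
lo16 :: Word32 -> Word16
lo16 x = fromIntegral (x .&. 0xFFFF)

-- ha16 ("high adjusted"): the high 16 bits, plus one when lo16 will be
-- sign-extended to a negative value, which cancels that extension.
ha16 :: Word32 -> Word16
ha16 x = fromIntegral (((x + 0x8000) `shiftR` 16) .&. 0xFFFF)

-- Models "lis r, ha16(x); addi r, r, lo16(x)" (32-bit wrap-around arithmetic).
rebuild :: Word16 -> Word16 -> Word32
rebuild hi lo = (fromIntegral hi `shiftL` 16) + signExtend lo
  where
    signExtend :: Word16 -> Word32
    signExtend w = fromIntegral (fromIntegral (fromIntegral w :: Int16) :: Int32)
```

For `0x12348765` the low half `0x8765` has bit 15 set, so `ha16` gives `0x1235` rather than `0x1234`, and `rebuild` recovers the original value.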
* Implement function-sections for Haskell code, #8405 (Simon Brenner, 2015-11-12, 1 file, -6/+5)
  This adds a flag -split-sections that does similar things to -split-objs, but using sections in single object files instead of relying on the Satanic Splitter and other abominations. This is very similar to the GCC flags -ffunction-sections and -fdata-sections.
  The --gc-sections linker flag, which allows unused sections to actually be removed, is added to all link commands (if the linker supports it) so that space savings from having base compiled with sections can be realized.
  Supported both in LLVM and the native code-gen, in theory for all architectures, but really tested on x86 only.
  In the GHC build, a new SplitSections variable enables -split-sections for relevant parts of the build.
  Test Plan: validate with both settings of SplitSections
  Reviewers: dterei, Phyx, austin, simonmar, thomie, bgamari
  Reviewed By: simonmar, thomie, bgamari
  Subscribers: hsyl20, erikd, kgardas, thomie
  Differential Revision: https://phabricator.haskell.org/D1242
  GHC Trac Issues: #8405
* Add subWordC# on x86ish (Nikita Karetnikov, 2015-10-31, 1 file, -0/+1)
  This adds a subWordC# primop which implements subtraction with overflow reporting.
  Reviewers: tibbe, goldfire, rwbarton, bgamari, austin, hvr
  Reviewed By: bgamari
  Subscribers: thomie
  Differential Revision: https://phabricator.haskell.org/D1334
  GHC Trac Issues: #10962
* Annotate CmmBranch with an optional likely target (Simon Marlow, 2015-09-23, 1 file, -3/+4)
  Summary: This allows the code generator to give hints to later code generation steps about which branch is most likely to be taken. Right now it is only taken into account in one place: a special case in CmmContFlowOpt that swapped branches over to maximise the chance of fallthrough, which is now disabled when there is a likelihood setting.
  Test Plan: validate
  Reviewers: austin, simonpj, bgamari, ezyang, tibbe
  Subscribers: thomie
  Differential Revision: https://phabricator.haskell.org/D1273