summaryrefslogtreecommitdiff
path: root/compiler/nativeGen/PPC
Commit message (Collapse)AuthorAgeFilesLines
* Clean up `#include`s in the compilerJohn Ericson2019-10-051-1/+0
| | | | | | | | - Remove unneeded ones - Use <..> for inter-package. Besides general clean up, helps distinguish between the RTS we link against vs the RTS we compile for.
* Remove empty NCG.hJohn Ericson2019-09-134-4/+0
|
* Module hierarchy: StgToCmm (#13009)Sylvain Henry2019-09-103-3/+3
| | | | | | Add StgToCmm module hierarchy. Platform modules that are used in several other places (NCG, LLVM codegen, Cmm transformations) are put into GHC.Platform.
* Remove unused imports of the form 'import foo ()' (Fixes #17065)James Foster2019-08-151-1/+1
| | | | | | | | | | | These kinds of imports are necessary in some cases such as importing instances of typeclasses or intentionally creating dependencies in the build system, but '-Wunused-imports' can't detect when they are no longer needed. This commit removes the unused ones currently in the code base (not including test files or submodules), with the hope that doing so may increase parallelism in the build system by removing unnecessary dependencies.
* Introduce a type for "platform word size", use it instead of IntÖmer Sinan Ağacan2019-08-061-1/+1
| | | | | | | | We introduce a PlatformWordSize type and use it in platformWordSize field. This removes to panic/error calls called when platform word size is not 32 or 64. We now check for this when reading the platform config.
* Revert "Add support for SIMD operations in the NCG"Ben Gamari2019-07-162-12/+6
| | | | | | | Unfortunately this will require more work; register allocation is quite broken. This reverts commit acd795583625401c5554f8e04ec7efca18814011.
* Add support for SIMD operations in the NCGAbhiroop Sarkar2019-07-032-6/+12
| | | | | | | This adds support for constructing vector types from Float#, Double# etc and performing arithmetic operations on them Cleaned-Up-By: Ben Gamari <ben@well-typed.com>
* Correct closure observation, construction, and mutation on weak memory machines.Travis Whitaker2019-06-281-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here the following changes are introduced: - A read barrier machine op is added to Cmm. - The order in which a closure's fields are read and written is changed. - Memory barriers are added to RTS code to ensure correctness on out-or-order machines with weak memory ordering. Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this is lowered to an instruction that ensures memory reads that occur after said instruction in program order are not performed before reads coming before said instruction in program order. On machines with strong memory ordering properties (e.g. X86, SPARC in TSO mode) no such instruction is necessary, so MO_ReadBarrier is simply erased. However, such an instruction is necessary on weakly ordered machines, e.g. ARM and PowerPC. Weam memory ordering has consequences for how closures are observed and mutated. For example, consider a closure that needs to be updated to an indirection. In order for the indirection to be safe for concurrent observers to enter, said observers must read the indirection's info table before they read the indirectee. Furthermore, the entering observer makes assumptions about the closure based on its info table contents, e.g. an INFO_TYPE of IND imples the closure has an indirectee pointer that is safe to follow. When a closure is updated with an indirection, both its info table and its indirectee must be written. With weak memory ordering, these two writes can be arbitrarily reordered, and perhaps even interleaved with other threads' reads and writes (in the absence of memory barrier instructions). Consider this example of a bad reordering: - An updater writes to a closure's info table (INFO_TYPE is now IND). - A concurrent observer branches upon reading the closure's INFO_TYPE as IND. - A concurrent observer reads the closure's indirectee and enters it. (!!!) - An updater writes the closure's indirectee. Here the update to the indirectee comes too late and the concurrent observer has jumped off into the abyss. Speculative execution can also cause us issues, consider: - An observer is about to case on a value in closure's info table. - The observer speculatively reads one or more of closure's fields. - An updater writes to closure's info table. - The observer takes a branch based on the new info table value, but with the old closure fields! - The updater writes to the closure's other fields, but its too late. Because of these effects, reads and writes to a closure's info table must be ordered carefully with respect to reads and writes to the closure's other fields, and memory barriers must be placed to ensure that reads and writes occur in program order. Specifically, updates to a closure must follow the following pattern: - Update the closure's (non-info table) fields. - Write barrier. - Update the closure's info table. Observing a closure's fields must follow the following pattern: - Read the closure's info pointer. - Read barrier. - Read the closure's (non-info table) fields. This patch updates RTS code to obey this pattern. This should fix long-standing SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting out-of-order execution) and PowerPC. This fixes issue #15449. Co-Authored-By: Ben Gamari <ben@well-typed.com>
* Move 'Platform' to ghc-bootJohn Ericson2019-06-194-4/+4
| | | | | | | ghc-pkg needs to be aware of platforms so it can figure out which subdire within the user package db to use. This is admittedly roundabout, but maybe Cabal could use the same notion of a platform as GHC to good affect too.
* Introduce log1p and expm1 primopschessai2019-06-091-0/+4
| | | | | Previously log and exp were primitives yet log1p and expm1 were FFI calls. Fix this non-uniformity.
* powerpc32: fix stack allocation code generationSergei Trofimovich2019-05-311-1/+1
| | | | | | | | | | | | | | | | When ghc was built for powerpc32 built failed as: It's a fallout of commit 3f46cffcc2850e68405a1 ("PPC NCG: Refactor stack allocation code") where word size used to be II32/II64 and changed to II8/panic "no width for given number of bytes" widthFromBytes ((platformWordSize platform) `quot` 8) The change restores initial behaviour by removing extra division. Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
* powerpc32: fix 64-bit comparison (#16465)Sergei Trofimovich2019-05-311-0/+1
| | | | | | | | | | | | | | | | | | On powerpc32 64-bit comparison code generated dangling target labels. This caused ghc build failure as: $ ./configure --target=powerpc-unknown-linux-gnu && make ... SCCs aren't in reverse dependent order bad blockId n3U This happened because condIntCode' in PPC codegen generated label name but did not place the label into `cmp_lo` code block. The change adds the `cmp_lo` label into the case of negative comparison. Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
* asm-emit-time IND_STATIC eliminationGabor Greif2019-04-151-0/+11
| | | | | | | | | | | | When a new closure identifier is being established to a local or exported closure already emitted into the same module, refrain from adding an IND_STATIC closure, and instead emit an assembly-language alias. Inter-module IND_STATIC objects still remain, and need to be addressed by other measures. Binary-size savings on nofib are around 0.1%.
* removing x87 register support from native code genCarter Schonwald2019-04-103-10/+5
| | | | | | | | | | | | | | | | * simplifies registers to have GPR, Float and Double, by removing the SSE2 and X87 Constructors * makes -msse2 assumed/default for x86 platforms, fixing a long standing nondeterminism in rounding behavior in 32bit haskell code * removes the 80bit floating point representation from the supported float sizes * theres still 1 tiny bit of x87 support needed, for handling float and double return values in FFI calls wrt the C ABI on x86_32, but this one piece does not leak into the rest of NCG. * Lots of code thats not been touched in a long time got deleted as a consequence of all of this all in all, this change paves the way towards a lot of future further improvements in how GHC handles floating point computations, along with making the native code gen more accessible to a larger pool of contributors.
* Add support for bitreverse primopAlexandre2019-04-011-0/+1
| | | | | | This commit includes the necessary changes in code and documentation to support a primop that reverses a word's bits. It also includes a test.
* PPC NCG: Use liveness information in CmmCallPeter Trommler2019-03-154-42/+49
| | | | | | | | | | | | | | | | | We make liveness information for global registers available on `JMP` and `BCTR`, which were the last instructions missing. With complete liveness information we do not need to reserve global registers in `freeReg` anymore. Moreover we assign R9 and R10 to callee saves registers. Cleanup by removing `Reg_Su`, which was unused, from `freeReg` and removing unused register definitions. The calculation of the number of floating point registers is too conservative. Just follow X86 and specify the constants directly. Overall on PowerPC this results in 0.3 % smaller code size in nofib while runtime is slightly better in some tests.
* Update Trac ticket URLs to point to GitLabRyan Scott2019-03-151-2/+2
| | | | | This moves all URL references to Trac tickets to their corresponding GitLab counterparts.
* Rip out object splittingBen Gamari2019-03-051-10/+7
| | | | | | | | | | | | | | | The splitter is an evil Perl script that processes assembler code. Its job can be done better by the linker's --gc-sections flag. GHC passes this flag to the linker whenever -split-sections is passed on the command line. This is based on @DemiMarie's D2768. Fixes Trac #11315 Fixes Trac #9832 Fixes Trac #8964 Fixes Trac #8685 Fixes Trac #8629
* NCG: fast compilation of very large strings (#16190)Sylvain Henry2019-02-141-2/+1
| | | | | | | | | | This patch adds an optimization into the NCG: for large strings (threshold configurable via -fbinary-blob-threshold=NNN flag), instead of printing `.asciz "..."` in the generated ASM source, we print `.incbin "tmpXXX.dat"` and we dump the contents of the string into a temporary "tmpXXX.dat" file. See the note for more details.
* PPC NCG: Promote integers to word size in C callsPeter Trommler2019-01-311-13/+23
| | | | Fixes #16222
* PPC NCG: Rename constructorsPeter Trommler2019-01-171-28/+29
| | | | | Rename constructors in calling convention data type to reflect the fact that they represent an ELF ABI not only a Linux ABI.
* Fix tab and improve whitespacePeter Trommler2019-01-171-7/+8
|
* PPC NCG: Register definitions for all 64-bit systemsPeter Trommler2019-01-171-7/+3
|
* PPC NCG: Make `stackHeaderSize` more generalPeter Trommler2019-01-171-7/+6
|
* PPC NCG: Make calling convention more generalPeter Trommler2019-01-171-6/+5
| | | | | All operating systems except AIX and Darwin follow the ELF specification.
* PPC NCG: Refactor stack allocation codePeter Trommler2019-01-162-26/+15
| | | | | There is only one place where UPDATE_SP was used. Instead of the UPDATE_SP pseudo instruction build the list of instructions directly.
* PPC NCG: Reduce memory consumption emitting string literalsPeter Trommler2019-01-131-15/+3
|
* PPC NCG: Remove Darwin supportPeter Trommler2019-01-014-144/+42
| | | | | | | Support for Mac OS X on PowerPC has been dropped by Apple years ago. We follow suit and remove PowerPC support for Darwin. Fixes #16106.
* PPC NCG: Simple 64-bit condition code on 32-bitPeter Trommler2018-12-301-3/+48
|
* PPC NCG: Generate MO_?_QuotRem for subword sizesPeter Trommler2018-12-111-21/+24
| | | | | | | | | | | | | | | Handle Int*QuotRemOP and Word*QuotRemOp in PPC NCG. Refactor common code with remainder operation. Test Plan: validate (I validated on Linux powerpc64le and x86_64) Reviewers: erikd, hvr, bgamari, simonmar Reviewed By: bgamari Subscribers: rwbarton, carter Differential Revision: https://phabricator.haskell.org/D5323
* PPC NCG: Implement MachOps for smaller sizesPeter Trommler2018-12-111-161/+146
| | | | | | | | | | | | | | | | | Generate code for MachOps with smaller than wordsize data. Refactor conversion MachOps. Fixes #15854 Test Plan: validate (I validated on powerpc64le and x86_64 Linux) Reviewers: bgamari, hvr, erikd, simonmar Subscribers: rwbarton, carter GHC Trac Issues: #15854 Differential Revision: https://phabricator.haskell.org/D5300
* Rename literal constructorsSylvain Henry2018-11-221-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | In a previous patch we replaced some built-in literal constructors (MachInt, MachWord, etc.) with a single LitNumber constructor. In this patch we replace the `Mach` prefix of the remaining constructors with `Lit` for consistency (e.g., LitChar, LitLabel, etc.). Sadly the name `LitString` was already taken for a kind of FastString and it would become misleading to have both `LitStr` (literal constructor renamed after `MachStr`) and `LitString` (FastString variant). Hence this patch renames the FastString variant `PtrString` (which is more accurate) and the literal string constructor now uses the least surprising `LitString` name. Both `Literal` and `LitString/PtrString` have recently seen breaking changes so doing this kind of renaming now shouldn't harm much. Reviewers: hvr, goldfire, bgamari, simonmar, jrtc27, tdammers Subscribers: tdammers, rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4881
* NCG: New code layout algorithm.Andreas Klebinger2018-11-173-5/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch implements a new code layout algorithm. It has been tested for x86 and is disabled on other platforms. Performance varies slightly be CPU/Machine but in general seems to be better by around 2%. Nofib shows only small differences of about +/- ~0.5% overall depending on flags/machine performance in other benchmarks improved significantly. Other benchmarks includes at least the benchmarks of: aeson, vector, megaparsec, attoparsec, containers, text and xeno. While the magnitude of gains differed three different CPUs where tested with all getting faster although to differing degrees. I tested: Sandy Bridge(Xeon), Haswell, Skylake * Library benchmark results summarized: * containers: ~1.5% faster * aeson: ~2% faster * megaparsec: ~2-5% faster * xml library benchmarks: 0.2%-1.1% faster * vector-benchmarks: 1-4% faster * text: 5.5% faster On average GHC compile times go down, as GHC compiled with the new layout is faster than the overhead introduced by using the new layout algorithm, Things this patch does: * Move code responsilbe for block layout in it's own module. * Move the NcgImpl Class into the NCGMonad module. * Extract a control flow graph from the input cmm. * Update this cfg to keep it in sync with changes during asm codegen. This has been tested on x64 but should work on x86. Other platforms still use the old codelayout. * Assign weights to the edges in the CFG based on type and limited static analysis which are then used for block layout. * Once we have the final code layout eliminate some redundant jumps. In particular turn a sequences of: jne .foo jmp .bar foo: into je bar foo: .. Test Plan: ci Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott Reviewed By: RyanGlScott Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton GHC Trac Issues: #15124 Differential Revision: https://phabricator.haskell.org/D4726
* Fix precision of asinh/acosh/atanh by making them primopsArtem Pelenitsyn2018-08-211-0/+8
| | | | | | | | | | Reviewers: hvr, bgamari, simonmar, jrtc27 Reviewed By: bgamari Subscribers: alpmestan, rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D5034
* stack: fix stack allocations on WindowsTamar Christina2018-07-181-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: On Windows one is not allowed to drop the stack by more than a page size. The reason for this is that the OS only allocates enough stack till what the TEB specifies. After that a guard page is placed and the rest of the virtual address space is unmapped. The intention is that doing stack allocations will cause you to hit the guard which will then map the next page in and move the guard. This is done to prevent what in the Linux world is known as stack clash vulnerabilities https://access.redhat.com/security/cve/cve-2017-1000364. There are modules in GHC for which the liveliness analysis thinks the reserved 8KB of spill slots isn't enough. One being DynFlags and the other being Cabal. Though I think the Cabal one is likely a bug: ``` 4d6544: 81 ec 00 46 00 00 sub $0x4600,%esp 4d654a: 8d 85 94 fe ff ff lea -0x16c(%ebp),%eax 4d6550: 3b 83 1c 03 00 00 cmp 0x31c(%ebx),%eax 4d6556: 0f 82 de 8d 02 00 jb 4ff33a <_cLpg_info+0x7a> 4d655c: c7 45 fc 14 3d 50 00 movl $0x503d14,-0x4(%ebp) 4d6563: 8b 75 0c mov 0xc(%ebp),%esi 4d6566: 83 c5 fc add $0xfffffffc,%ebp 4d6569: 66 f7 c6 03 00 test $0x3,%si 4d656e: 0f 85 a6 d7 02 00 jne 503d1a <_cLpb_info+0x6> 4d6574: 81 c4 00 46 00 00 add $0x4600,%esp ``` It allocates nearly 18KB of spill slots for a simple 4 line function and doesn't even use it. Note that this doesn't happen on x64 or when making a validate build. Only when making a build without a validate and build.mk. This and the allocation in DynFlags means the stack allocation will jump over the guard page into unmapped memory areas and GHC or an end program segfaults. The pagesize on x86 Windows is 4KB which means we hit it very easily for these two modules, which explains the total DOA of GHC 32bit for the past 3 releases and the "random" segfaults on Windows. ``` 0:000> bp 00503d29 0:000> gn Breakpoint 0 hit WARNING: Stack overflow detected. The unwound frames are extracted from outside normal stack bounds. eax=03b6b9c9 ebx=00dc90f0 ecx=03cac48c edx=03cac43d esi=03b6b9c9 edi=03abef40 eip=00503d29 esp=013e96fc ebp=03cf8f70 iopl=0 nv up ei pl nz na po nc cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202 setup+0x103d29: 00503d29 89442440 mov dword ptr [esp+40h],eax ss:002b:013e973c=???????? WARNING: Stack overflow detected. The unwound frames are extracted from outside normal stack bounds. WARNING: Stack overflow detected. The unwound frames are extracted from outside normal stack bounds. 0:000> !teb TEB at 00384000 ExceptionList: 013effcc StackBase: 013f0000 StackLimit: 013eb000 ``` This doesn't fix the liveliness analysis but does fix the allocations, by emitting a function call to `__chkstk_ms` when doing allocations of larger than a page, this will make sure the stack is probed every page so the kernel maps in the next page. `__chkstk_ms` is provided by `libGCC`, which is under the `GNU runtime exclusion license`, so it's safe to link against it, even for proprietary code. (Technically we already do since we link compiled C code in.) For allocations smaller than a page we drop the stack and probe the new address. This avoids the function call and still makes sure we hit the guard if needed. PS: In case anyone is Wondering why we didn't notice this before, it's because we only test x86_64 and on Windows 10. On x86_64 the page size is 8KB and also the kernel is a bit more lenient on Windows 10 in that it seems to catch the segfault and resize the stack if it was unmapped: ``` 0:000> t eax=03b6b9c9 ebx=00dc90f0 ecx=03cac48c edx=03cac43d esi=03b6b9c9 edi=03abef40 eip=00503d2d esp=013e96fc ebp=03cf8f70 iopl=0 nv up ei pl nz na po nc cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202 setup+0x103d2d: 00503d2d 8b461b mov eax,dword ptr [esi+1Bh] ds:002b:03b6b9e4=03cac431 0:000> !teb TEB at 00384000 ExceptionList: 013effcc StackBase: 013f0000 StackLimit: 013e9000 ``` Likely Windows 10 has a guard page larger than previous versions. This fixes the stack allocations, and as soon as I get the time I will look at the liveliness analysis. I find it highly unlikely that simple Cabal function requires ~2200 spill slots. Test Plan: ./validate Reviewers: simonmar, bgamari Reviewed By: bgamari Subscribers: AndreasK, rwbarton, thomie, carter GHC Trac Issues: #15154 Differential Revision: https://phabricator.haskell.org/D4917
* Allow CmmLabelDiffOff with different widthsSimon Marlow2018-05-163-4/+5
| | | | | | | | | | | | | | | | | | Summary: This change makes it possible to generate a static 32-bit relative label offset on x86_64. Currently we can only generate word-sized label offsets. This will be used in D4634 to shrink info tables. See D4632 for more details. Test Plan: See D4632 Reviewers: bgamari, niteria, michalt, erikd, jrtc27, osa1 Subscribers: thomie, carter Differential Revision: https://phabricator.haskell.org/D4633
* Add 'addWordC#' PrimOpSebastian Graf2018-05-051-0/+7
| | | | | | | | | | | | | | | | | | | This is mostly for congruence with 'subWordC#' and '{add,sub}IntC#'. I found 'plusWord2#' while implementing this, which both lacks documentation and has a slightly different specification than 'addWordC#', which means the generic implementation is unnecessarily complex. While I was at it, I also added lacking meta-information on PrimOps and refactored 'subWordC#'s generic implementation to be branchless. Reviewers: bgamari, simonmar, jrtc27, dfeuer Reviewed By: bgamari, dfeuer Subscribers: dfeuer, thomie, carter Differential Revision: https://phabricator.haskell.org/D4592
* PPC nativeGen: Add support for MO_SS_Conv_W32_W64Peter Trommler2018-03-191-0/+8
| | | | | | | | | | | | | | | This is required by D4363. D4362 has the implementation for i386 this commit adds PowerPC. Test Plan: validate Reviewers: erikd, hvr, simonmar, bgamari Reviewed By: bgamari Subscribers: rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4468
* Add new mbmi and mbmi2 compiler flagsJohn Ky2018-01-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | This adds support for the bit deposit and extraction operations provided by the BMI and BMI2 instruction set extensions on modern amd64 machines. Implement x86 code generator for pdep and pext. Properly initialise bmiVersion field. pdep and pext test cases Fix pattern match for pdep and pext instructions Fix build of pdep and pext code for 32-bit architectures Test Plan: Validate Reviewers: austin, simonmar, bgamari, angerman Reviewed By: bgamari Subscribers: trommler, carter, angerman, thomie, rwbarton, newhoggy GHC Trac Issues: #14206 Differential Revision: https://phabricator.haskell.org/D4236
* cmm: Use LocalBlockLabel instead of AsmTempLabel to represent blocksBen Gamari2017-11-282-5/+5
| | | | | | | | | | blockLbl was originally changed in 8b007abbeb3045900a11529d907a835080129176 to use mkTempAsmLabel to fix an inconsistency resulting in #14221. However, this breaks the C code generator, which doesn't support AsmTempLabels (#14454). Instead let's try going the other direction: use a new CLabel variety, LocalBlockLabel. Then we can teach the C code generator to deal with these as well.
* Revert "Add new mbmi and mbmi2 compiler flags"Ben Gamari2017-11-221-2/+0
| | | | | | This broke the 32-bit build. This reverts commit f5dc8ccc29429d0a1d011f62b6b430f6ae50290c.
* Add new mbmi and mbmi2 compiler flagsJohn Ky2017-11-151-0/+2
| | | | | | | | | | | | | | | | | This adds support for the bit deposit and extraction operations provided by the BMI and BMI2 instruction set extensions on modern amd64 machines. Test Plan: Validate Reviewers: austin, simonmar, bgamari, hvr, goldfire, erikd Reviewed By: bgamari Subscribers: goldfire, erikd, trommler, newhoggy, rwbarton, thomie GHC Trac Issues: #14206 Differential Revision: https://phabricator.haskell.org/D4063
* Fix PPC NCG after blockID patchPeter Trommler2017-11-093-18/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit rGHC8b007ab assigns the same label to the first basic block of a proc and to the proc entry point. This violates the PPC 64-bit ELF v. 1.9 and v. 2.0 ABIs and leads to duplicate symbols. This patch fixes duplicate symbols caused by block labels In commit rGHCd7b8da1 an info table label is generated from a block id. Getting the entry label from that info label leads to an undefined symbol because a suffix "_entry" that is not present in the block label. To fix that issue add a new info table label flavour for labels derived from block ids. Converting such a label with toEntryLabel produces the original block label. Fixes #14311 Test Plan: ./validate Reviewers: austin, bgamari, simonmar, erikd, hvr, angerman Reviewed By: bgamari Subscribers: rwbarton, thomie GHC Trac Issues: #14311 Differential Revision: https://phabricator.haskell.org/D4149
* PPC NCG: Impl branch prediction, atomic ops.Peter Trommler2017-11-023-61/+201
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement AtomicRMW ops, atomic read, atomic write in PowerPC native code generator. Also implement branch prediction because we need it in atomic ops anyway. This patch improves the issue in #12537 a bit but does not fix it entirely. The fallback operations for atomicread and atomicwrite in libraries/ghc-prim/cbits/atomic.c are incorrect. This patch avoids those functions by implementing the operations directly in the native code generator. This is also what the x86/amd64 NCG and the LLVM backend do. Test Plan: validate on AIX and PowerPC (32-bit) Linux Reviewers: erikd, hvr, austin, bgamari, simonmar Reviewed By: hvr, bgamari Subscribers: rwbarton, thomie GHC Trac Issues: #12537 Differential Revision: https://phabricator.haskell.org/D3984
* Turn `compareByteArrays#` out-of-line primop into inline primopalexbiehl2017-10-291-0/+1
| | | | | | | | | | | | Depends on D4090 Reviewers: austin, bgamari, erikd, simonmar, alexbiehl Reviewed By: bgamari Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D4091
* A bunch of typofixesGabor Greif2017-09-261-1/+1
|
* compiler: introduce custom "GhcPrelude" PreludeHerbert Valerio Riedel2017-09-196-0/+12
| | | | | | | | | | | | | | | | | | This switches the compiler/ component to get compiled with -XNoImplicitPrelude and a `import GhcPrelude` is inserted in all modules. This is motivated by the upcoming "Prelude" re-export of `Semigroup((<>))` which would cause lots of name clashes in every modulewhich imports also `Outputable` Reviewers: austin, goldfire, bgamari, alanz, simonmar Reviewed By: bgamari Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari Differential Revision: https://phabricator.haskell.org/D3989
* nativeGen: Consistently use blockLbl to generate CLabels from BlockIdsBen Gamari2017-09-192-7/+7
| | | | | | | | | | | | | | | This fixes #14221, where the NCG and the DWARF code were apparently giving two different names to the same block. Test Plan: Validate with DWARF support enabled. Reviewers: simonmar, austin Subscribers: rwbarton, thomie GHC Trac Issues: #14221 Differential Revision: https://phabricator.haskell.org/D3977
* Fix typos in diagnostics, testsuite and commentsGabor Greif2017-09-071-1/+1
|
* Add support for producing position-independent executablesBen Gamari2017-08-221-3/+3
| | | | | | | | | | | | | | | | Previously due to #12759 we disabled PIE support entirely. However, this breaks the user's ability to produce PIEs. Add an explicit flag, -fPIE, allowing the user to build PIEs. Test Plan: Validate Reviewers: rwbarton, austin, simonmar Subscribers: trommler, simonmar, trofi, jrtc27, thomie GHC Trac Issues: #12759, #13702 Differential Revision: https://phabricator.haskell.org/D3589