summaryrefslogtreecommitdiff
path: root/compiler/nativeGen
Commit message (Collapse)AuthorAgeFilesLines
* Add GHC-API logging hooksSylvain Henry2019-12-181-5/+15
| | | | | | | | | | | | | | | | | | | | | | | * Add 'dumpAction' hook to DynFlags. It allows GHC API users to catch dumped intermediate codes and information. The format of the dump (Core, Stg, raw text, etc.) is now reported allowing easier automatic handling. * Add 'traceAction' hook to DynFlags. Some dumps go through the trace mechanism (for instance unfoldings that have been considered for inlining). This is problematic because: 1) dumps aren't written into files even with -ddump-to-file on 2) dumps are written on stdout even with GHC API 3) in this specific case, dumping depends on unsafe globally stored DynFlags which is bad for GHC API users We introduce 'traceAction' hook which allows GHC API to catch those traces and to avoid using globally stored DynFlags. * Avoid dumping empty logs via dumpAction/traceAction (but still write empty files to keep the existing behavior)
* Add `timesInt2#` primopSylvain Henry2019-12-023-0/+23
|
* Fix more typosBrian Wignall2019-12-022-2/+2
|
* Fix typos, using Wikipedia list of common typosBrian Wignall2019-11-289-10/+10
|
* Fix typosBrian Wignall2019-11-231-2/+1
|
* Set correct length of DWARF .debug_aranges section (fixes #17428)Szymon Nowicki-Korgol2019-11-081-1/+2
|
* Make dynflag argument for withTiming pure.Andreas Klebinger2019-10-231-2/+2
| | | | | | | | | | | | 19 times out of 20 we already have dynflags in scope. We could just always use `return dflags`. But this is in fact not free. When looking at some STG code I noticed that we always allocate a closure for this expression in the heap. Clearly a waste in these cases. For the other cases we can either just modify the callsite to get dynflags or use the _D variants of withTiming I added which will use getDynFlags under the hood.
* Fix bug in the x86 backend involving the CFG.Andreas Klebinger2019-10-236-63/+288
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is part two of fixing #17334. There are two parts to this commit: - A bugfix for computing loop levels - A bugfix of basic block invariants in the NCG. ----------------------------------------------------------- In the first bug we ended up with a CFG of the sort: [A -> B -> C] This was represented via maps as fromList [(A,B),(B,C)] and later transformed into a adjacency array. However the transformation did not include block C in the array (since we only looked at the keys of the map). This was still fine until we tried to look up successors for C and tried to read outside of the array bounds when accessing C. In order to prevent this in the future I refactored to code to include all nodes as keys in the map representation. And make this a invariant which is checked in a few places. Overall I expect this to make the code more robust as now any failed lookup will represent an error, versus failed lookups sometimes being expected and sometimes not. In terms of performance this makes some things cheaper (getting a list of all nodes) and others more expensive (adding a new edge). Overall this adds up to no noteable performance difference. ----------------------------------------------------------- Part 2: When the NCG generated a new basic block, it did not always insert a NEWBLOCK meta instruction in the stream which caused a quite subtle bug. During instruction selection a statement `s` in a block B with control of the sort: B -> C will sometimes result in control flow of the sort: ┌ < ┐ v ^ B -> B1 ┴ -> C as is the case for some atomic operations. Now to keep the CFG in sync when introducing B1 we clearly want to insert it between B and C. However there is a catch when we have to deal with self loops. We might start with code and a CFG of these forms: loop: stmt1 ┌ < ┐ .... v ^ stmtX loop ┘ stmtY .... goto loop: Now we introduce B1: ┌ ─ ─ ─ ─ ─┐ loop: │ ┌ < ┐ │ instrs v │ │ ^ .... loop ┴ B1 ┴ ┘ instrsFromX stmtY goto loop: This is simple, all outgoing edges from loop now simply start from B1 instead and the code generator knows which new edges it introduced for the self loop of B1. Disaster strikes if the statement Y follows the same pattern. If we apply the same rule that all outgoing edges change then we end up with: loop ─> B1 ─> B2 ┬─┐ │ │ └─<┤ │ │ └───<───┘ │ └───────<────────┘ This is problematic. The edge B1->B1 is modified as expected. However the modification is wrong! The assembly in this case looked like this: _loop: <instrs> _B1: ... cmpxchgq ... jne _B1 <instrs> <end _B1> _B2: ... cmpxchgq ... jne _B2 <instrs> jmp loop There is no edge _B2 -> _B1 here. It's still a self loop onto _B1. The problem here is that really B1 should be two basic blocks. Otherwise we have control flow in the *middle* of a basic block. A contradiction! So to account for this we add yet another basic block marker: _B: <instrs> _B1: ... cmpxchgq ... jne _B1 jmp _B1' _B1': <instrs> <end _B1> _B2: ... Now when inserting B2 we will only look at the outgoing edges of B1' and everything will work out nicely. You might also wonder why we don't insert jumps at the end of _B1'. There is no way another block ends up jumping to the labels _B1 or _B2 since they are essentially invisible to other blocks. View them as control flow labels local to the basic block if you'd like. Not doing this ultimately caused (part 2 of) #17334.
* Implement s390x LLVM backend.Stefan Schulze Frielinghaus2019-10-225-0/+11
| | | | | | This patch adds support for the s390x architecture for the LLVM code generator. The patch includes a register mapping of STG registers onto s390x machine registers which enables a registerised build.
* Add loop level analysis to the NCG backend.klebinger.andreas@gmx.at2019-10-165-364/+1131
| | | | | | | | | | | | | | | | | | | | | | | | | | | For backends maintaining the CFG during codegen we can now find loops and their nesting level. This is based on the Cmm CFG and dominator analysis. As a result we can estimate edge frequencies a lot better for methods, resulting in far better code layout. Speedup on nofib: ~1.5% Increase in compile times: ~1.9% To make this feasible this commit adds: * Dominator analysis based on the Lengauer-Tarjan Algorithm. * An algorithm estimating global edge frequences from branch probabilities - In CFG.hs A few static branch prediction heuristics: * Expect to take the backedge in loops. * Expect to take the branch NOT exiting a loop. * Expect integer vs constant comparisons to be false. We also treat heap/stack checks special for branch prediction to avoid them being treated as loops.
* Fix #17334 where NCG did not properly update the CFG.wip/andreask/17334Andreas Klebinger2019-10-135-242/+353
| | | | | | | | | | | | | Statements can change the basic block in which instructions are placed during instruction selection. We have to keep track of this switch of the current basic block as we need this information in order to properly update the CFG. This commit implements this change and fixes #17334. We do so by having stmtToInstr return the new block id if a statement changed the basic block.
* Clean up `#include`s in the compilerJohn Ericson2019-10-053-3/+0
| | | | | | | | - Remove unneeded ones - Use <..> for inter-package. Besides general clean up, helps distinguish between the RTS we link against vs the RTS we compile for.
* Factor out a smaller part of Platform for host fallbackJohn Ericson2019-10-041-4/+4
|
* PmCheck: Only ever check constantly many models against a single patternSebastian Graf2019-09-251-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | Introduces a new flag `-fmax-pmcheck-deltas` to achieve that. Deprecates the old `-fmax-pmcheck-iter` mechanism in favor of this new flag. From the user's guide: Pattern match checking can be exponential in some cases. This limit makes sure we scale polynomially in the number of patterns, by forgetting refined information gained from a partially successful match. For example, when matching `x` against `Just 4`, we split each incoming matching model into two sub-models: One where `x` is not `Nothing` and one where `x` is `Just y` but `y` is not `4`. When the number of incoming models exceeds the limit, we continue checking the next clause with the original, unrefined model. This also retires the incredibly hard to understand "maximum number of refinements" mechanism, because the current mechanism is more general and should catch the same exponential cases like PrelRules at the same time. ------------------------- Metric Decrease: T11822 -------------------------
* ErrUtils: split withTiming into withTiming and withTimingSilentAlp Mestanogullari2019-09-191-3/+4
| | | | | | | | | | | | | | | | | | | 'withTiming' becomes a function that, when passed '-vN' (N >= 2) or '-ddump-timings', will print timing (and possibly allocations) related information. When additionally built with '-eventlog' and executed with '+RTS -l', 'withTiming' will also emit both 'traceMarker' and 'traceEvent' events to the eventlog. 'withTimingSilent' on the other hand will never print any timing information, under any circumstance, and will only emit 'traceEvent' events to the eventlog. As pointed out in !1672, 'traceMarker' is better suited for things that we might want to visualize in tools like eventlog2html, while 'traceEvent' is better suited for internal events that occur a lot more often and that we don't necessarily want to visualize. This addresses #17138 by using 'withTimingSilent' for all the codegen bits that are expressed as a bunch of small computations over streams of codegen ASTs.
* Remove empty NCG.hJohn Ericson2019-09-1315-27/+0
|
* Module hierarchy: StgToCmm (#13009)Sylvain Henry2019-09-1012-12/+12
| | | | | | Add StgToCmm module hierarchy. Platform modules that are used in several other places (NCG, LLVM codegen, Cmm transformations) are put into GHC.Platform.
* Make the C-- O and C types constructors with DataKindsJohn Ericson2019-09-051-2/+5
| | | | | The tightens up the kinds a bit. I use type synnonyms to avoid adding promotion ticks everywhere.
* Few tweaks in -ddump-debug output, minor refactoringÖmer Sinan Ağacan2019-09-021-8/+6
| | | | | | | - Fixes crazy indentation in -ddump-debug output - We no longer dump empty sections in -ddump-debug when a code block does not have any generated debug info - Minor refactoring in Debug.hs and AsmCodeGen.hs
* Return results of Cmm streams in backendsÖmer Sinan Ağacan2019-08-281-12/+14
| | | | | | | | | | | | | | | | | | | This generalizes code generators (outputAsm, outputLlvm, outputC, and the call site codeOutput) so that they'll return the return values of the passed Cmm streams. This allows accumulating data during Cmm generation and returning it to the call site in HscMain. Previously the Cmm streams were assumed to return (), so the code generators returned () as well. This change is required by !1304 and !1530. Skipping CI as this was tested before and I only updated the commit message. [skip ci]
* Remove redundant OPTIONS_GHC in BlockLayout.hsAndreas Klebinger2019-08-271-3/+0
|
* Remove Bag fold specialisations (#16969)Richard Lupton2019-08-191-2/+2
|
* Remove unused imports of the form 'import foo ()' (Fixes #17065)James Foster2019-08-1512-15/+6
| | | | | | | | | | | These kinds of imports are necessary in some cases such as importing instances of typeclasses or intentionally creating dependencies in the build system, but '-Wunused-imports' can't detect when they are no longer needed. This commit removes the unused ones currently in the code base (not including test files or submodules), with the hope that doing so may increase parallelism in the build system by removing unnecessary dependencies.
* Introduce a type for "platform word size", use it instead of IntÖmer Sinan Ağacan2019-08-063-14/+11
| | | | | | | | We introduce a PlatformWordSize type and use it in platformWordSize field. This removes to panic/error calls called when platform word size is not 32 or 64. We now check for this when reading the platform config.
* compiler: emit finer grained codegen events to eventlogAlp Mestanogullari2019-08-021-20/+25
|
* Revert "Add support for SIMD operations in the NCG"Ben Gamari2019-07-1616-799/+97
| | | | | | | Unfortunately this will require more work; register allocation is quite broken. This reverts commit acd795583625401c5554f8e04ec7efca18814011.
* Add support for SIMD operations in the NCGAbhiroop Sarkar2019-07-0316-97/+799
| | | | | | | This adds support for constructing vector types from Float#, Double# etc and performing arithmetic operations on them Cleaned-Up-By: Ben Gamari <ben@well-typed.com>
* Correct closure observation, construction, and mutation on weak memory machines.Travis Whitaker2019-06-283-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here the following changes are introduced: - A read barrier machine op is added to Cmm. - The order in which a closure's fields are read and written is changed. - Memory barriers are added to RTS code to ensure correctness on out-or-order machines with weak memory ordering. Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this is lowered to an instruction that ensures memory reads that occur after said instruction in program order are not performed before reads coming before said instruction in program order. On machines with strong memory ordering properties (e.g. X86, SPARC in TSO mode) no such instruction is necessary, so MO_ReadBarrier is simply erased. However, such an instruction is necessary on weakly ordered machines, e.g. ARM and PowerPC. Weam memory ordering has consequences for how closures are observed and mutated. For example, consider a closure that needs to be updated to an indirection. In order for the indirection to be safe for concurrent observers to enter, said observers must read the indirection's info table before they read the indirectee. Furthermore, the entering observer makes assumptions about the closure based on its info table contents, e.g. an INFO_TYPE of IND imples the closure has an indirectee pointer that is safe to follow. When a closure is updated with an indirection, both its info table and its indirectee must be written. With weak memory ordering, these two writes can be arbitrarily reordered, and perhaps even interleaved with other threads' reads and writes (in the absence of memory barrier instructions). Consider this example of a bad reordering: - An updater writes to a closure's info table (INFO_TYPE is now IND). - A concurrent observer branches upon reading the closure's INFO_TYPE as IND. - A concurrent observer reads the closure's indirectee and enters it. (!!!) - An updater writes the closure's indirectee. Here the update to the indirectee comes too late and the concurrent observer has jumped off into the abyss. Speculative execution can also cause us issues, consider: - An observer is about to case on a value in closure's info table. - The observer speculatively reads one or more of closure's fields. - An updater writes to closure's info table. - The observer takes a branch based on the new info table value, but with the old closure fields! - The updater writes to the closure's other fields, but its too late. Because of these effects, reads and writes to a closure's info table must be ordered carefully with respect to reads and writes to the closure's other fields, and memory barriers must be placed to ensure that reads and writes occur in program order. Specifically, updates to a closure must follow the following pattern: - Update the closure's (non-info table) fields. - Write barrier. - Update the closure's info table. Observing a closure's fields must follow the following pattern: - Read the closure's info pointer. - Read barrier. - Read the closure's (non-info table) fields. This patch updates RTS code to obey this pattern. This should fix long-standing SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting out-of-order execution) and PowerPC. This fixes issue #15449. Co-Authored-By: Ben Gamari <ben@well-typed.com>
* Move 'Platform' to ghc-bootJohn Ericson2019-06-1933-33/+33
| | | | | | | ghc-pkg needs to be aware of platforms so it can figure out which subdire within the user package db to use. This is admittedly roundabout, but maybe Cabal could use the same notion of a platform as GHC to good affect too.
* Use DeriveFunctor throughout the codebase (#15654)Krzysztof Gogolewski2019-06-123-14/+11
|
* Introduce log1p and expm1 primopschessai2019-06-093-0/+12
| | | | | Previously log and exp were primitives yet log1p and expm1 were FFI calls. Fix this non-uniformity.
* powerpc32: fix stack allocation code generationSergei Trofimovich2019-05-311-1/+1
| | | | | | | | | | | | | | | | When ghc was built for powerpc32 built failed as: It's a fallout of commit 3f46cffcc2850e68405a1 ("PPC NCG: Refactor stack allocation code") where word size used to be II32/II64 and changed to II8/panic "no width for given number of bytes" widthFromBytes ((platformWordSize platform) `quot` 8) The change restores initial behaviour by removing extra division. Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
* powerpc32: fix 64-bit comparison (#16465)Sergei Trofimovich2019-05-311-0/+1
| | | | | | | | | | | | | | | | | | On powerpc32 64-bit comparison code generated dangling target labels. This caused ghc build failure as: $ ./configure --target=powerpc-unknown-linux-gnu && make ... SCCs aren't in reverse dependent order bad blockId n3U This happened because condIntCode' in PPC codegen generated label name but did not place the label into `cmp_lo` code block. The change adds the `cmp_lo` label into the case of negative comparison. Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
* Use datatype for unboxed returns when loading ghc into ghciMichael Sloan2019-05-222-37/+69
| | | | See #13101 and #15454
* Remove all target-specific portions of Config.hsJohn Ericson2019-05-142-25/+25
| | | | | | | | | | | | | | | | | | | 1. If GHC is to be multi-target, these cannot be baked in at compile time. 2. Compile-time flags have a higher maintenance than run-time flags. 3. The old way makes build system implementation (various bootstrapping details) with the thing being built. E.g. GHC doesn't need to care about which integer library *will* be used---this is purely a crutch so the build system doesn't need to pass flags later when using that library. 4. Experience with cross compilation in Nixpkgs has shown things work nicer when compiler's can *optionally* delegate the bootstrapping the package manager. The package manager knows the entire end-goal build plan, and thus can make top-down decisions on bootstrapping. GHC can just worry about GHC, not even core library like base and ghc-prim!
* asm-emit-time IND_STATIC eliminationGabor Greif2019-04-153-1/+33
| | | | | | | | | | | | When a new closure identifier is being established to a local or exported closure already emitted into the same module, refrain from adding an IND_STATIC closure, and instead emit an assembly-language alias. Inter-module IND_STATIC objects still remain, and need to be addressed by other measures. Binary-size savings on nofib are around 0.1%.
* codegen: unroll memcpy calls for small bytearraysArtem Pyanykh2019-04-141-5/+6
|
* removing x87 register support from native code genCarter Schonwald2019-04-1017-894/+235
| | | | | | | | | | | | | | | | * simplifies registers to have GPR, Float and Double, by removing the SSE2 and X87 Constructors * makes -msse2 assumed/default for x86 platforms, fixing a long standing nondeterminism in rounding behavior in 32bit haskell code * removes the 80bit floating point representation from the supported float sizes * theres still 1 tiny bit of x87 support needed, for handling float and double return values in FFI calls wrt the C ABI on x86_32, but this one piece does not leak into the rest of NCG. * Lots of code thats not been touched in a long time got deleted as a consequence of all of this all in all, this change paves the way towards a lot of future further improvements in how GHC handles floating point computations, along with making the native code gen more accessible to a larger pool of contributors.
* codegen: use newtype for Alignment in BasicTypesArtem Pyanykh2019-04-092-23/+22
|
* codegen: fix memset unroll for small bytearrays, add 64-bit setsArtem Pyanykh2019-04-091-25/+55
| | | | | | | | | | | | | | | | | | | | | | Fixes #16052 When the offset in `setByteArray#` is statically known, we can provide better alignment guarantees then just 1 byte. Also, memset can now do 64-bit wide sets. The current memset intrinsic is not optimal however and can be improved for the case when we know that we deal with (baseAddress at known alignment) + offset For instance, on 64-bit `setByteArray# s 1# 23# 0#` given that bytearray is 8 bytes aligned could be unrolled into `movb, movw, movl, movq, movq`; but currently it is `movb x23` since alignment of 1 is all we can embed into MO_Memset op.
* Add support for bitreverse primopAlexandre2019-04-014-1/+17
| | | | | | This commit includes the necessary changes in code and documentation to support a primop that reverses a word's bits. It also includes a test.
* Update Wiki URLs to point to GitLabTakenobu Tani2019-03-252-2/+2
| | | | | | | | | | | | | | | | | | | | | | | This moves all URL references to Trac Wiki to their corresponding GitLab counterparts. This substitution is classified as follows: 1. Automated substitution using sed with Ben's mapping rule [1] Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy... New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy... 2. Manual substitution for URLs containing `#` index Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...#Zzz New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...#zzz 3. Manual substitution for strings starting with `Commentary` Old: Commentary/XxxYyy... New: commentary/xxx-yyy... See also !539 [1]: https://gitlab.haskell.org/bgamari/gitlab-migration/blob/master/wiki-mapping.json
* PPC NCG: Use liveness information in CmmCallPeter Trommler2019-03-154-42/+49
| | | | | | | | | | | | | | | | | We make liveness information for global registers available on `JMP` and `BCTR`, which were the last instructions missing. With complete liveness information we do not need to reserve global registers in `freeReg` anymore. Moreover we assign R9 and R10 to callee saves registers. Cleanup by removing `Reg_Su`, which was unused, from `freeReg` and removing unused register definitions. The calculation of the number of floating point registers is too conservative. Just follow X86 and specify the constants directly. Overall on PowerPC this results in 0.3 % smaller code size in nofib while runtime is slightly better in some tests.
* Update Trac ticket URLs to point to GitLabRyan Scott2019-03-153-5/+5
| | | | | This moves all URL references to Trac tickets to their corresponding GitLab counterparts.
* NCG: correctly escape path strings on Windows (#16389)Sylvain Henry2019-03-092-2/+4
| | | | | GHC native code generator generates .incbin and .file directives. We need to escape those strings correctly on Windows (see #16389).
* Rip out object splittingBen Gamari2019-03-057-65/+27
| | | | | | | | | | | | | | | The splitter is an evil Perl script that processes assembler code. Its job can be done better by the linker's --gc-sections flag. GHC passes this flag to the linker whenever -split-sections is passed on the command line. This is based on @DemiMarie's D2768. Fixes Trac #11315 Fixes Trac #9832 Fixes Trac #8964 Fixes Trac #8685 Fixes Trac #8629
* Don't wrap the entry map for LiveInfo in Maybe.klebinger.andreas@gmx.at2019-02-156-26/+27
| | | | | | | | | | | | | It never really encoded a invariant. * The linear register allocator just did partial pattern matches * The graph allocator just set it to (Just mapEmpty) for Nothing So I changed LiveInfo to directly contain the map. Further natCmmTopToLive which filled in Nothing is no longer exported. Instead we know call cmmTopLiveness which changes the type AND fills in the map.
* NCG: fast compilation of very large strings (#16190)Sylvain Henry2019-02-144-12/+51
| | | | | | | | | | This patch adds an optimization into the NCG: for large strings (threshold configurable via -fbinary-blob-threshold=NNN flag), instead of printing `.asciz "..."` in the generated ASM source, we print `.incbin "tmpXXX.dat"` and we dump the contents of the string into a temporary "tmpXXX.dat" file. See the note for more details.
* Fix Int overflow on 32 bit platformPeter Trommler2019-02-101-1/+1
|
* Stack: fix name mangling.Tamar Christina2019-02-091-1/+1
|