summaryrefslogtreecommitdiff
path: root/compiler/nativeGen/AsmCodeGen.hs
Commit message (Collapse)AuthorAgeFilesLines
* Don't wrap the entry map for LiveInfo in Maybe.klebinger.andreas@gmx.at2019-02-151-3/+1
| | | | | | | | | | | | | It never really encoded a invariant. * The linear register allocator just did partial pattern matches * The graph allocator just set it to (Just mapEmpty) for Nothing So I changed LiveInfo to directly contain the map. Further natCmmTopToLive which filled in Nothing is no longer exported. Instead we know call cmmTopLiveness which changes the type AND fills in the map.
* Allow resizing the stack for the graph allocator.klebinger.andreas@gmx.at2019-02-081-3/+15
| | | | | | | | | | The graph allocator now dynamically resizes the number of stack slots when running into the limit. This fixes #8657. Also loop membership of basic blocks is now available in the register allocator for cost heuristics.
* NCG: New code layout algorithm.Andreas Klebinger2018-11-171-177/+157
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch implements a new code layout algorithm. It has been tested for x86 and is disabled on other platforms. Performance varies slightly be CPU/Machine but in general seems to be better by around 2%. Nofib shows only small differences of about +/- ~0.5% overall depending on flags/machine performance in other benchmarks improved significantly. Other benchmarks includes at least the benchmarks of: aeson, vector, megaparsec, attoparsec, containers, text and xeno. While the magnitude of gains differed three different CPUs where tested with all getting faster although to differing degrees. I tested: Sandy Bridge(Xeon), Haswell, Skylake * Library benchmark results summarized: * containers: ~1.5% faster * aeson: ~2% faster * megaparsec: ~2-5% faster * xml library benchmarks: 0.2%-1.1% faster * vector-benchmarks: 1-4% faster * text: 5.5% faster On average GHC compile times go down, as GHC compiled with the new layout is faster than the overhead introduced by using the new layout algorithm, Things this patch does: * Move code responsilbe for block layout in it's own module. * Move the NcgImpl Class into the NCGMonad module. * Extract a control flow graph from the input cmm. * Update this cfg to keep it in sync with changes during asm codegen. This has been tested on x64 but should work on x86. Other platforms still use the old codelayout. * Assign weights to the edges in the CFG based on type and limited static analysis which are then used for block layout. * Once we have the final code layout eliminate some redundant jumps. In particular turn a sequences of: jne .foo jmp .bar foo: into je bar foo: .. Test Plan: ci Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott Reviewed By: RyanGlScott Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton GHC Trac Issues: #15124 Differential Revision: https://phabricator.haskell.org/D4726
* Replace most occurences of foldl with foldl'.klebinger.andreas@gmx.at2018-08-211-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds foldl' to GhcPrelude and changes must occurences of foldl to foldl'. This leads to better performance especially for quick builds where GHC does not perform strictness analysis. It does change strictness behaviour when we use foldl' to turn a argument list into function applications. But this is only a drawback if code looks ONLY at the last argument but not at the first. And as the benchmarks show leads to fewer allocations in practice at O2. Compiler performance for Nofib: O2 Allocations: -1 s.d. ----- -0.0% +1 s.d. ----- -0.0% Average ----- -0.0% O2 Compile Time: -1 s.d. ----- -2.8% +1 s.d. ----- +1.3% Average ----- -0.8% O0 Allocations: -1 s.d. ----- -0.2% +1 s.d. ----- -0.1% Average ----- -0.2% Test Plan: ci Reviewers: goldfire, bgamari, simonmar, tdammers, monoidal Reviewed By: bgamari, monoidal Subscribers: tdammers, rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4929
* Make shortcutting at the asm stage toggleable and default for O2.Andreas Klebinger2018-04-131-2/+4
| | | | | | | | | | | | | | | | | | | Shortcutting during the asm stage of codegen is often redundant as most cases get caught during the Cmm passes. For example during compilation of all of nofib only 508 jumps are eleminated. For this reason I moved the pass from -O1 to -O2. I also made it toggleable with -fasm-shortcutting. Test Plan: ci Reviewers: bgamari Reviewed By: bgamari Subscribers: thomie, carter Differential Revision: https://phabricator.haskell.org/D4555
* nativeGen: Use foldl' instead of foldlBen Gamari2017-11-281-1/+1
|
* nativeGen: Use plusUFMList instead of foldrBen Gamari2017-11-281-2/+2
|
* compiler: introduce custom "GhcPrelude" PreludeHerbert Valerio Riedel2017-09-191-0/+2
| | | | | | | | | | | | | | | | | | This switches the compiler/ component to get compiled with -XNoImplicitPrelude and a `import GhcPrelude` is inserted in all modules. This is motivated by the upcoming "Prelude" re-export of `Semigroup((<>))` which would cause lots of name clashes in every modulewhich imports also `Outputable` Reviewers: austin, goldfire, bgamari, alanz, simonmar Reviewed By: bgamari Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari Differential Revision: https://phabricator.haskell.org/D3989
* Add support for producing position-independent executablesBen Gamari2017-08-221-3/+3
| | | | | | | | | | | | | | | | Previously due to #12759 we disabled PIE support entirely. However, this breaks the user's ability to produce PIEs. Add an explicit flag, -fPIE, allowing the user to build PIEs. Test Plan: Validate Reviewers: rwbarton, austin, simonmar Subscribers: trommler, simonmar, trofi, jrtc27, thomie GHC Trac Issues: #12759, #13702 Differential Revision: https://phabricator.haskell.org/D3589
* Use correct section types syntax for architectureBen Gamari2017-07-111-1/+2
| | | | | | | | | | | | | | | | | | Previously GHC would always assume that section types began with `@` while producing assembly, which is not true. For instance, in ARM assembly syntax section types begin with `%`. This abstracts out section type pretty-printing and adjusts it to correctly account for the target architectures assembly flavor. Reviewers: austin, hvr, Phyx Reviewed By: Phyx Subscribers: Phyx, rwbarton, thomie, erikd GHC Trac Issues: #13937 Differential Revision: https://phabricator.haskell.org/D3712
* Hoopl: remove dependency on Hoopl packageMichal Terepeta2017-06-231-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This copies the subset of Hoopl's functionality needed by GHC to `cmm/Hoopl` and removes the dependency on the Hoopl package. The main motivation for this change is the confusing/noisy interface between GHC and Hoopl: - Hoopl has `Label` which is GHC's `BlockId` but different than GHC's `CLabel` - Hoopl has `Unique` which is different than GHC's `Unique` - Hoopl has `Unique{Map,Set}` which are different than GHC's `Uniq{FM,Set}` - GHC has its own specialized copy of `Dataflow`, so `cmm/Hoopl` is needed just to filter the exposed functions (filter out some of the Hoopl's and add the GHC ones) With this change, we'll be able to simplify this significantly. It'll also be much easier to do invasive changes (Hoopl is a public package on Hackage with users that depend on the current behavior) This should introduce no changes in functionality - it merely copies the relevant code. Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: ./validate Reviewers: austin, bgamari, simonmar Reviewed By: bgamari, simonmar Subscribers: simonpj, kavon, rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3616
* Replace Digraph's Node type synonym with a data typeMatthew Pickering2017-04-041-8/+6
| | | | | | | | | | | | | This refactoring makes it more obvious when we are constructing a Node for the digraph rather than a less useful 3-tuple. Reviewers: austin, goldfire, bgamari, simonmar, dfeuer Reviewed By: dfeuer Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3414
* Generalize CmmUnwind and pass unwind information through NCGBen Gamari2017-02-081-9/+87
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As discussed in D1532, Trac Trac #11337, and Trac Trac #11338, the stack unwinding information produced by GHC is currently quite approximate. Essentially we assume that register values do not change at all within a basic block. While this is somewhat true in normal Haskell code, blocks containing foreign calls often break this assumption. This results in unreliable call stacks, especially in the code containing foreign calls. This is worse than it sounds as unreliable unwinding information can at times result in segmentation faults. This patch set attempts to improve this situation by tracking unwinding information with finer granularity. By dispensing with the assumption of one unwinding table per block, we allow the compiler to accurately represent the areas surrounding foreign calls. Towards this end we generalize the representation of unwind information in the backend in three ways, * Multiple CmmUnwind nodes can occur per block * CmmUnwind nodes can now carry unwind information for multiple registers (while not strictly necessary; this makes emitting unwinding information a bit more convenient in the compiler) * The NCG backend is given an opportunity to modify the unwinding records since it may need to make adjustments due to, for instance, native calling convention requirements for foreign calls (see #11353). This sets the stage for resolving #11337 and #11338. Test Plan: Validate Reviewers: scpmw, simonmar, austin, erikd Subscribers: qnikst, thomie Differential Revision: https://phabricator.haskell.org/D2741
* Fix terminal corruption bug and clean up SDoc interface.Phil Ruffwind2017-01-101-4/+3
| | | | | | | | | | | | | | | | | | | | | | | - Fix #13076 by wrapping `printDoc_` so that the terminal color is reset even if an exception occurs. - Add `printSDoc`, `printSDocLn`, and `bufLeftRenderSDoc` to keep `SDoc` values abstract (they are wrappers of `printDoc_`, `printDoc`, and `bufLeftRender` respectively). - Remove unused function: `printForAsm` Test Plan: manual Reviewers: RyanGlScott, austin, dfeuer, bgamari Reviewed By: dfeuer, bgamari Subscribers: dfeuer, mpickering, thomie Differential Revision: https://phabricator.haskell.org/D2932 GHC Trac Issues: #13076
* BlockId: remove BlockMap and BlockSet synonymsMichal Terepeta2016-12-081-5/+5
| | | | | | | | | | | | | | | | | | | | This continues removal of `BlockId` module in favor of Hoopl's `Label`. Most of the changes here are mechanical, apart from the orphan `Outputable` instances for `LabelMap` and `LabelSet`. For now I've moved them to `cmm/Hoopl`, since it's already trying to manage all imports from Hoopl (to avoid any collisions). Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: validate Reviewers: bgamari, austin, simonmar Reviewed By: simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2800
* Remove most functions from cmm/BlockIdMichal Terepeta2016-11-291-1/+2
| | | | | | | | | | | | | | | | | | | | It seems that `BlockId` module could simply go away in favor of Hoopl's `Label`. This is the first step to do that. In a few places I had to add some type signatures, but most of them seem to help with code readability. Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: ./validate Reviewers: austin, simonmar, bgamari Reviewed By: bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2765
* AsmCodeGen: Refactor worker in cmmNativeGensBen Gamari2016-11-291-10/+13
| | | | | | | | | | | | Test Plan: Validate Reviewers: austin, simonmar, michalt Reviewed By: simonmar, michalt Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2736
* Test for unnecessary register spillsThomas Jakway2016-11-151-1/+12
| | | | | | | | | | | | Reviewers: mainland, simonmar, michalt, bgamari, austin Reviewed By: bgamari Subscribers: simonpj, mpickering, thomie Differential Revision: https://phabricator.haskell.org/D2638 GHC Trac Issues: #12744, #12745
* AsmCodeGen: Give linear-scan and coloring reg. allocators different cc namesÖmer Sinan Ağacan2016-08-061-2/+2
|
* Document some codegen nondeterminismBartosz Nitka2016-07-071-1/+3
| | | | | | | | | Bit-for-bit reproducible binaries are not a goal for now, so this is just marking places that could be a problem. Doing this will allow eltsUFM to be removed and will leave only nonDetEltsUFM. GHC Trac: #4012
* nativeGen: Allow -fregs-graph to be usedBen Gamari2016-06-301-4/+2
| | | | | | | | | | | | | | | | Previously the flag was silently ignored due the #7679 and #8657. This, however, seems unnecessarily brutal and makes experimentation unduly difficult for users. Test Plan: Validate Reviewers: austin, simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2335 GHC Trac Issues: #7679, #8657
* Provide Uniquable version of SCCBartosz Nitka2016-06-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | We want to remove the `Ord Unique` instance because there's no way to implement it in deterministic way and it's too easy to use by accident. We sometimes compute SCC for datatypes whose Ord instance is implemented in terms of Unique. The Ord constraint on SCC is just an artifact of some internal data structures. We can have an alternative implementation with a data structure that uses Uniquable instead. This does exactly that and I'm pleased that I didn't have to introduce any duplication to do that. Test Plan: ./validate I looked at performance tests and it's a tiny bit better. Reviewers: bgamari, simonmar, ezyang, austin, goldfire Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2359 GHC Trac Issues: #4012
* Replace calls to `ptext . sLit` with `text`Jan Stolarek2016-01-181-1/+1
| | | | | | | | | | | | | | | | | | | | Summary: In the past the canonical way for constructing an SDoc string literal was the composition `ptext . sLit`. But for some time now we have function `text` that does the same. Plus it has some rules that optimize its runtime behaviour. This patch takes all uses of `ptext . sLit` in the compiler and replaces them with calls to `text`. The main benefits of this patch are clener (shorter) code and less dependencies between module, because many modules now do not need to import `FastString`. I don't expect any performance benefits - we mostly use SDocs to report errors and it seems there is little to be gained here. Test Plan: ./validate Reviewers: bgamari, austin, goldfire, hvr, alanz Subscribers: goldfire, thomie, mpickering Differential Revision: https://phabricator.haskell.org/D1784
* Remove some redundant definitions/constraintsHerbert Valerio Riedel2015-12-311-1/+0
| | | | | | Starting with GHC 7.10 and base-4.8, `Monad` implies `Applicative`, which allows to simplify some definitions to exploit the superclass relationship. This a first refactoring to that end.
* Drop pre-AMP compatibility CPP conditionalsHerbert Valerio Riedel2015-12-311-3/+0
| | | | | | | | | | | | Since GHC 8.1/8.2 only needs to be bootstrap-able by GHC 7.10 and GHC 8.0 (and GHC 8.2), we can now finally drop all that pre-AMP compatibility CPP-mess for good! Reviewers: austin, goldfire, bgamari Subscribers: goldfire, thomie, erikd Differential Revision: https://phabricator.haskell.org/D1724
* Add sparc64 a known architecture (Ticket #11211)John Paul Adrian Glaubitz2015-12-191-0/+1
| | | | | | | | | | | | Explicitly pass "--no-relax" on ArchSPARC64 (as ArchSPARC does) where gcc's default specs set "-mrelax" which conflicts with "-Wl,-r". Known architecture will also help extending sparc NCG support 64-bit ABI. Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Sergei Trofimovich <siarheit@google.com>
* Support multiple debug output levelsBen Gamari2015-11-231-3/+3
| | | | | | | | | We now only strip block information from DebugBlocks when compiling with `-g1`, intended to be used when only minimal debug information is desired. `-g2` is assumed when `-g` is passed without any integer argument. Differential Revision: https://phabricator.haskell.org/D1281
* Implement function-sections for Haskell code, #8405Simon Brenner2015-11-121-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a flag -split-sections that does similar things to -split-objs, but using sections in single object files instead of relying on the Satanic Splitter and other abominations. This is very similar to the GCC flags -ffunction-sections and -fdata-sections. The --gc-sections linker flag, which allows unused sections to actually be removed, is added to all link commands (if the linker supports it) so that space savings from having base compiled with sections can be realized. Supported both in LLVM and the native code-gen, in theory for all architectures, but really tested on x86 only. In the GHC build, a new SplitSections variable enables -split-sections for relevant parts of the build. Test Plan: validate with both settings of SplitSections Reviewers: dterei, Phyx, austin, simonmar, thomie, bgamari Reviewed By: simonmar, thomie, bgamari Subscribers: hsyl20, erikd, kgardas, thomie Differential Revision: https://phabricator.haskell.org/D1242 GHC Trac Issues: #8405
* Make Monad/Applicative instances MRP-friendlyHerbert Valerio Riedel2015-10-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | This patch refactors pure/(*>) and return/(>>) in MRP-friendly way, i.e. such that the explicit definitions for `return` and `(>>)` match the MRP-style default-implementation, i.e. return = pure and (>>) = (*>) This way, e.g. all `return = pure` definitions can easily be grepped and removed in GHC 8.1; Test Plan: Harbormaster Reviewers: goldfire, alanz, bgamari, quchen, austin Reviewed By: quchen, austin Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1312
* Rename package key to unit ID, and installed package ID to component ID.Edward Z. Yang2015-10-141-3/+3
| | | | | | Comes with Haddock submodule update. Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
* AsmCodeGen: Ensure LLVM .line directives are sortedBen Gamari2015-10-071-2/+6
| | | | | | | | Apparently some Clang 3.6 expects these to be sorted. Patch due to Peter Wortmann. Fixes #10687.
* Annotate CmmBranch with an optional likely targetSimon Marlow2015-09-231-2/+2
| | | | | | | | | | | | | | | | | Summary: This allows the code generator to give hints to later code generation steps about which branch is most likely to be taken. Right now it is only taken into account in one place: a special case in CmmContFlowOpt that swapped branches over to maximise the chance of fallthrough, which is now disabled when there is a likelihood setting. Test Plan: validate Reviewers: austin, simonpj, bgamari, ezyang, tibbe Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1273
* Implement PowerPC 64-bit native code backend for LinuxPeter Trommler2015-07-031-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extend the PowerPC 32-bit native code generator for "64-bit PowerPC ELF Application Binary Interface Supplement 1.9" by Ian Lance Taylor and "Power Architecture 64-Bit ELF V2 ABI Specification -- OpenPOWER ABI for Linux Supplement" by IBM. The latter ABI is mainly used on POWER7/7+ and POWER8 Linux systems running in little-endian mode. The code generator supports both static and dynamic linking. PowerPC 64-bit code for ELF ABI 1.9 and 2 is mostly position independent anyway, and thus so is all the code emitted by the code generator. In other words, -fPIC does not make a difference. rts/stg/SMP.h support is implemented. Following the spirit of the introductory comment in PPC/CodeGen.hs, the rest of the code is a straightforward extension of the 32-bit implementation. Limitations: * Code is generated only in the medium code model, which is also gcc's default * Local symbols are not accessed directly, which seems to also be the case for 32-bit * LLVM does not work, but this does not work on 32-bit either * Must use the system runtime linker in GHCi, because the GHC linker for "static" object files (rts/Linker.c) for PPC 64-bit is not implemented. The system runtime (dynamic) linker works. * The handling of the system stack (register 1) is not ELF- compliant so stack traces break. Instead of allocating a new stack frame, spill code should use the "official" spill area in the current stack frame and deallocation code should restore the back chain * DWARF support is missing Fixes #9863 Test Plan: validate (on powerpc, too) Reviewers: simonmar, trofi, erikd, austin Reviewed By: trofi Subscribers: bgamari, arnons1, kgardas, thomie Differential Revision: https://phabricator.haskell.org/D629 GHC Trac Issues: #9863
* Greatly speed up nativeCodeGen/seqBlocksJoachim Breitner2015-05-161-18/+35
| | | | | | | | | When working on #10397, I noticed that "reorder" in nativeCodeGen/seqBlocks took more than 60% of the time. With this refactoring, it does not even show up in the profile any more. This fixes #10422. Differential Revision: https://phabricator.haskell.org/D893
* Generate DWARF info sectionPeter Wortmann2014-12-161-30/+43
| | | | | | | | | | | | | | | | This is where we actually make GHC emit DWARF code. The info section contains all the general meta information bits as well as an entry for every block of native code. Notes: * We need quite a few new labels in order to properly address starts and ends of blocks. * Thanks to Nathan Howell for taking the iniative to get our own Haskell language ID for DWARF! (From Phabricator D396)
* Generate .loc/.file directives from source ticksPeter Wortmann2014-12-161-51/+78
| | | | | | | | | | | | | | | | | | | | | | | | This generates DWARF, albeit indirectly using the assembler. This is the easiest (and, apparently, quite standard) method of generating the .debug_line DWARF section. Notes: * Note we have to make sure that .file directives appear correctly before the respective .loc. Right now we ppr them manually, which makes them absent from dumps. Fixing this would require .file to become a native instruction. * We have to pass a lot of things around the native code generator. I know Ian did quite a bit of refactoring already, but having one common monad could *really* simplify things here... * To support SplitObjcs, we need to emit/reset all DWARF data at every split. We use the occassion to move split marker generation to cmmNativeGenStream as well, so debug data extraction doesn't have to choke on it. (From Phabricator D396)
* Debug data extraction (NCG support)Peter Wortmann2014-12-161-50/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The purpose of the Debug module is to collect all required information to generate debug information (DWARF etc.) in the back-ends. Our main data structure is the "debug block", which carries all information we have about a block of code that is going to get produced. Notes: * Debug blocks are arranged into a tree according to tick scopes. This makes it easier to reason about inheritance rules. Note however that tick scopes are not guaranteed to form a tree, which requires us to "copy" ticks to not lose them. * This is also where we decide what source location we regard as representing a code block the "best". The heuristic is basically that we want the most specific source reference that comes from the same file we are currently compiling. This seems to be the most useful choice in my experience. * We are careful to not be too lazy so we don't end up breaking streaming. Debug data will be kept alive until the end of codegen, after all. * We change native assembler dumps to happen right away for every Cmm group. This simplifies the code somewhat and is consistent with how pretty much all of GHC handles dumps with respect to streamed code. (From Phabricator D169)
* Unlit AsmCodeGen.lhsHerbert Valerio Riedel2014-11-301-0/+1040
Fwiw, this wasn't really a proper .lhs to begin with...