summaryrefslogtreecommitdiff
path: root/compiler/nativeGen/X86/Regs.hs
Commit message (Collapse)AuthorAgeFilesLines
* Modules: CmmToAsm (#13009)Sylvain Henry2020-02-241-442/+0
|
* Modules: Driver (#13009)Sylvain Henry2020-02-211-1/+1
| | | | submodule updates: nofib, haddock
* Module hierarchy: Cmm (cf #13009)Sylvain Henry2020-01-251-2/+2
|
* Remove empty NCG.hJohn Ericson2019-09-131-1/+0
|
* Module hierarchy: StgToCmm (#13009)Sylvain Henry2019-09-101-1/+1
| | | | | | Add StgToCmm module hierarchy. Platform modules that are used in several other places (NCG, LLVM codegen, Cmm transformations) are put into GHC.Platform.
* Revert "Add support for SIMD operations in the NCG"Ben Gamari2019-07-161-1/+0
| | | | | | | Unfortunately this will require more work; register allocation is quite broken. This reverts commit acd795583625401c5554f8e04ec7efca18814011.
* Add support for SIMD operations in the NCGAbhiroop Sarkar2019-07-031-0/+1
| | | | | | | This adds support for constructing vector types from Float#, Double# etc and performing arithmetic operations on them Cleaned-Up-By: Ben Gamari <ben@well-typed.com>
* Move 'Platform' to ghc-bootJohn Ericson2019-06-191-1/+1
| | | | | | | ghc-pkg needs to be aware of platforms so it can figure out which subdire within the user package db to use. This is admittedly roundabout, but maybe Cabal could use the same notion of a platform as GHC to good affect too.
* removing x87 register support from native code genCarter Schonwald2019-04-101-58/+45
| | | | | | | | | | | | | | | | * simplifies registers to have GPR, Float and Double, by removing the SSE2 and X87 Constructors * makes -msse2 assumed/default for x86 platforms, fixing a long standing nondeterminism in rounding behavior in 32bit haskell code * removes the 80bit floating point representation from the supported float sizes * theres still 1 tiny bit of x87 support needed, for handling float and double return values in FFI calls wrt the C ABI on x86_32, but this one piece does not leak into the rest of NCG. * Lots of code thats not been touched in a long time got deleted as a consequence of all of this all in all, this change paves the way towards a lot of future further improvements in how GHC handles floating point computations, along with making the native code gen more accessible to a larger pool of contributors.
* Fix regDotColor for amd64.klebinger.andreas@gmx.at2019-01-271-0/+2
| | | | | Add missing color mappings to regDotColor for amd64. Also set fakeRegs to red instead of xmm regs.
* NCG: New code layout algorithm.Andreas Klebinger2018-11-171-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch implements a new code layout algorithm. It has been tested for x86 and is disabled on other platforms. Performance varies slightly be CPU/Machine but in general seems to be better by around 2%. Nofib shows only small differences of about +/- ~0.5% overall depending on flags/machine performance in other benchmarks improved significantly. Other benchmarks includes at least the benchmarks of: aeson, vector, megaparsec, attoparsec, containers, text and xeno. While the magnitude of gains differed three different CPUs where tested with all getting faster although to differing degrees. I tested: Sandy Bridge(Xeon), Haswell, Skylake * Library benchmark results summarized: * containers: ~1.5% faster * aeson: ~2% faster * megaparsec: ~2-5% faster * xml library benchmarks: 0.2%-1.1% faster * vector-benchmarks: 1-4% faster * text: 5.5% faster On average GHC compile times go down, as GHC compiled with the new layout is faster than the overhead introduced by using the new layout algorithm, Things this patch does: * Move code responsilbe for block layout in it's own module. * Move the NcgImpl Class into the NCGMonad module. * Extract a control flow graph from the input cmm. * Update this cfg to keep it in sync with changes during asm codegen. This has been tested on x64 but should work on x86. Other platforms still use the old codelayout. * Assign weights to the edges in the CFG based on type and limited static analysis which are then used for block layout. * Once we have the final code layout eliminate some redundant jumps. In particular turn a sequences of: jne .foo jmp .bar foo: into je bar foo: .. Test Plan: ci Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott Reviewed By: RyanGlScott Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton GHC Trac Issues: #15124 Differential Revision: https://phabricator.haskell.org/D4726
* Allow CmmLabelDiffOff with different widthsSimon Marlow2018-05-161-1/+1
| | | | | | | | | | | | | | | | | | Summary: This change makes it possible to generate a static 32-bit relative label offset on x86_64. Currently we can only generate word-sized label offsets. This will be used in D4634 to shrink info tables. See D4632 for more details. Test Plan: See D4632 Reviewers: bgamari, niteria, michalt, erikd, jrtc27, osa1 Subscribers: thomie, carter Differential Revision: https://phabricator.haskell.org/D4633
* Mark xmm6 as caller saved in the register allocator for windows.klebinger.andreas@gmx.at2018-01-311-2/+4
| | | | | | | | | | | | | | | | | | This prevents the register being picked up as a scratch register. Otherwise the allocator would be free to use it before a call. This fixes #14619. Test Plan: ci, repro case on #14619 Reviewers: bgamari, Phyx, erikd, simonmar, RyanGlScott, simonpj Reviewed By: Phyx, RyanGlScott, simonpj Subscribers: simonpj, RyanGlScott, Phyx, rwbarton, thomie, carter GHC Trac Issues: #14619 Differential Revision: https://phabricator.haskell.org/D4348
* compiler: introduce custom "GhcPrelude" PreludeHerbert Valerio Riedel2017-09-191-0/+2
| | | | | | | | | | | | | | | | | | This switches the compiler/ component to get compiled with -XNoImplicitPrelude and a `import GhcPrelude` is inserted in all modules. This is motivated by the upcoming "Prelude" re-export of `Semigroup((<>))` which would cause lots of name clashes in every modulewhich imports also `Outputable` Reviewers: austin, goldfire, bgamari, alanz, simonmar Reviewed By: bgamari Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari Differential Revision: https://phabricator.haskell.org/D3989
* Fix typos in diagnostics, testsuite and commentsGabor Greif2017-09-071-1/+1
|
* nativeGen: Don't index into linked listsBen Gamari2017-08-291-4/+6
| | | | | | | | | | | There were a couple places where we indexed into linked lists of register names. Replace these with arrays. Reviewers: austin Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3893
* Refactor: delete most of the module FastTypesThomas Miedema2015-08-211-22/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverses some of the work done in #1405, and goes back to the assumption that the bootstrap compiler understands GHC-haskell. In particular: * use MagicHash instead of _ILIT and _CLIT * pattern matching on I# if possible, instead of using iUnbox unnecessarily * use Int#/Char#/Addr# instead of the following type synonyms: - type FastInt = Int# - type FastChar = Char# - type FastPtr a = Addr# * inline the following functions: - iBox = I# - cBox = C# - fastChr = chr# - fastOrd = ord# - eqFastChar = eqChar# - shiftLFastInt = uncheckedIShiftL# - shiftR_FastInt = uncheckedIShiftRL# - shiftRLFastInt = uncheckedIShiftRL# * delete the following unused functions: - minFastInt - maxFastInt - uncheckedIShiftRA# - castFastPtr - panicDocFastInt and pprPanicFastInt * rename panicFastInt back to panic# These functions remain, since they actually do something: * iUnbox * bitAndFastInt * bitOrFastInt Test Plan: validate Reviewers: austin, bgamari Subscribers: rwbarton Differential Revision: https://phabricator.haskell.org/D1141 GHC Trac Issues: #1405
* Delete FastBoolThomas Miedema2015-08-211-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverses some of the work done in Trac #1405, and assumes GHC is smart enough to do its own unboxing of booleans now. I would like to do some more performance measurements, but the code changes can be reviewed already. Test Plan: With a perf build: ./inplace/bin/ghc-stage2 nofib/spectral/simple/Main.hs -fforce-recomp +RTS -t --machine-readable before: ``` [("bytes allocated", "1300744864") ,("num_GCs", "302") ,("average_bytes_used", "8811118") ,("max_bytes_used", "24477464") ,("num_byte_usage_samples", "9") ,("peak_megabytes_allocated", "64") ,("init_cpu_seconds", "0.001") ,("init_wall_seconds", "0.001") ,("mutator_cpu_seconds", "2.833") ,("mutator_wall_seconds", "4.283") ,("GC_cpu_seconds", "0.960") ,("GC_wall_seconds", "0.961") ] ``` after: ``` [("bytes allocated", "1301088064") ,("num_GCs", "310") ,("average_bytes_used", "8820253") ,("max_bytes_used", "24539904") ,("num_byte_usage_samples", "9") ,("peak_megabytes_allocated", "64") ,("init_cpu_seconds", "0.001") ,("init_wall_seconds", "0.001") ,("mutator_cpu_seconds", "2.876") ,("mutator_wall_seconds", "4.474") ,("GC_cpu_seconds", "0.965") ,("GC_wall_seconds", "0.979") ] ``` CPU time seems to be up a bit, but I'm not sure. Unfortunately CPU time measurements are rather noisy. Reviewers: austin, bgamari, rwbarton Subscribers: nomeata Differential Revision: https://phabricator.haskell.org/D1143 GHC Trac Issues: #1405
* Add LANGUAGE pragmas to compiler/ source filesHerbert Valerio Riedel2014-05-151-0/+2
| | | | | | | | | | | | | | | | | | In some cases, the layout of the LANGUAGE/OPTIONS_GHC lines has been reorganized, while following the convention, to - place `{-# LANGUAGE #-}` pragmas at the top of the source file, before any `{-# OPTIONS_GHC #-}`-lines. - Moreover, if the list of language extensions fit into a single `{-# LANGUAGE ... -#}`-line (shorter than 80 characters), keep it on one line. Otherwise split into `{-# LANGUAGE ... -#}`-lines for each individual language extension. In both cases, try to keep the enumeration alphabetically ordered. (The latter layout is preferable as it's more diff-friendly) While at it, this also replaces obsolete `{-# OPTIONS ... #-}` pragma occurences by `{-# OPTIONS_GHC ... #-}` pragmas.
* Use the correct callClobberedRegs on Windows/x64 (#8834)Simon Marlow2014-03-271-0/+3
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* Remove dead codeJan Stolarek2013-09-101-7/+0
|
* Remove OldCmm, convert backends to consume new CmmSimon Marlow2012-11-121-1/+1
| | | | | | | | | | | | | | | | | | This removes the OldCmm data type and the CmmCvt pass that converts new Cmm to OldCmm. The backends (NCGs, LLVM and C) have all been converted to consume new Cmm. The main difference between the two data types is that conditional branches in new Cmm have both true/false successors, whereas in OldCmm the false case was a fallthrough. To generate slightly better code we occasionally need to invert a conditional to ensure that the branch-not-taken becomes a fallthrough; this was previously done in CmmCvt, and it is now done in CmmContFlowOpt. We could go further and use the Hoopl Block representation for native code, which would mean that we could use Hoopl's postorderDfs and analyses for native code, but for now I've left it as is, using the old ListGraph representation for native code.
* Teach the linear register allocator how to allocate more stack if necessarySimon Marlow2012-09-201-3/+3
| | | | | | | | | This squashes the "out of spill slots" panic that occasionally happens on x86, by adding instructions to bump and retreat the C stack pointer as necessary. The panic has become more common since the new codegen, because we lump code into larger blocks, and the register allocator isn't very good at reusing stack slots for spilling (see Note [extra spill slots]).
* Move wORD_SIZE into platformConstantsIan Lynagh2012-09-161-6/+5
|
* Move more constants into platformConstantsIan Lynagh2012-09-141-2/+4
|
* Move more code into codeGen/CodeGen/Platform.hsIan Lynagh2012-08-281-210/+4
| | | | | | | | HaskellMachRegs.h is no longer included in anything under compiler/ Also, includes/CodeGen.Platform.hs now includes "stg/MachRegs.h" rather than <stg/MachRegs.h> which means that we always get the file from the tree, rather than from the bootstrapping compiler.
* Fix -fPIC with the new code generatorSimon Marlow2012-08-281-2/+0
| | | | The CmmBlocks inside CmmExprs were not getting the PIC treatment
* More CPP removal in nativeGen/X86/Regs.hsIan Lynagh2012-08-221-10/+8
|
* More CPP removal in nativeGen/X86/Regs.hsIan Lynagh2012-08-221-15/+10
|
* Remove some CPP in nativeGen/X86/Regs.hsIan Lynagh2012-08-221-24/+18
|
* Pass platform down to lastintIan Lynagh2012-08-211-15/+15
|
* Pass platform down to lastxmmIan Lynagh2012-08-211-19/+23
|
* Reduce the likelihood of x64/x86-64 changes breaking the build on other ↵Erik de Castro Lopo2012-08-211-36/+20
| | | | | | | | | | | | | | arches (#7083). Code that needs to differentiate between i386 and x86-64 should now be written as if x86-64 is the default and i386 is the special case. Eg: # if i386_TARGET_ARCH someFuncion = ..... # else someFuncion = ..... # endif
* Start separating out the RTS and Haskell imports of MachRegs.hIan Lynagh2012-08-061-1/+1
| | | | No functional differences yet
* Fix build for non x86/x86_64 (#7065)Paolo Capriotti2012-07-111-1/+3
|
* Fix overlapping patterns warning (#7065)Paolo Capriotti2012-07-111-8/+10
|
* Don't re-allocate %esi on x86.Simon Marlow2012-07-091-0/+17
| | | | | | | | | | | | | Recent changes have freed up %esi for general use on x86 when it is not being used for R1. However, x86 has a non-uniform register architecture where there is no 8-bit equivalent of %esi. The register allocators aren't sophisticated enough to cope with this, so we have to back off and treat %esi as non-allocatable for now. (of course, LLVM doesn't suffer from this problem) One workaround would be to change the calling convention to use %rbx for R1, however we can't change the calling convention now without patching LLVM too.
* Fix compile failure on non x86/x86-64 (#7054).Erik de Castro Lopo2012-07-091-0/+5
|
* Allow the register allocator access to argument regs (R1.., F1.., etc.)Simon Marlow2012-07-061-52/+21
| | | | | | | | | | | | | | | This was made possible by the recent change to codeGen to attach the live GlobalRegs to every CmmJump, and we'll be relying on it quite heavily in the new code generator too. What this means essentially is that when we see x = R1 the register allocator will automatically assign x to R1 and generate no code at all (also known as "coalescing"). It wasn't possible before because the register allocator had to assume that R1 was always live, because it didn't have access to accurate liveness information.
* Use SDoc rather than Doc in the native gensIan Lynagh2012-06-121-3/+2
| | | | This avoid lots of converting back and forth between the two types.
* Fix compile for CPUs other than x86/x86_64.Erik de Castro Lopo2012-03-221-2/+4
|
* Fixes for the calling convention on Win64Ian Lynagh2012-03-211-12/+20
| | | | In particular, fixes for FP arguments
* Rename allArgRegs to allIntArgRegsIan Lynagh2012-03-211-7/+7
|
* Define allArgRegs correctly for Win64Ian Lynagh2012-03-191-0/+4
|
* Allow the use of R9 and R10 in primops; fixes trac #5423Ian Lynagh2011-11-061-0/+6
|
* Make an import unconditionalIan Lynagh2011-09-061-4/+0
| | | | Fixes build on non-Intel/PPC platforms
* Add missing import (fixes #5460).Erik de Castro Lopo2011-09-061-1/+1
|
* Some CPP removal in X86.RegsIan Lynagh2011-08-311-12/+8
|
* Whitespace only in X86.RegsIan Lynagh2011-08-311-202/+202
|
* Start de-CPPing X86.RegsIan Lynagh2011-08-301-12/+8
|