summaryrefslogtreecommitdiff
path: root/compiler/GHC/CmmToAsm/X86
Commit message (Collapse)AuthorAgeFilesLines
* rts: Ensure that XMM registers are preserved on Win64Ben Gamari2022-05-051-3/+3
| | | | | | | | Previously we only preserved the bottom 64-bits of the callee-saved 128-bit XMM registers, in violation of the Win64 calling convention. Fix this. Fixes #21465.
* nativeGen: Note signed-extended nature of MOVwip/windows-high-codegenBen Gamari2022-04-061-0/+4
|
* Don't assume that labels are 32-bit on WindowsBen Gamari2022-04-061-10/+17
|
* Refactor is32BitLit to take Platform rather than BoolBen Gamari2022-04-061-37/+33
|
* Generate LEA for label expressionsBen Gamari2022-04-061-0/+16
|
* nativeGen/x86: Use %rip-relative addressingBen Gamari2022-04-061-8/+49
| | | | | | | On Windows with high-entropy ASLR we must use %rip-relative addressing to avoid overflowing the signed 32-bit immediate size of x86-64. Since %rip-relative addressing comes essentially for free and can make linking significantly easier, we use it on all platforms.
* codeGen: Fix signedness of jump table indexingBen Gamari2022-03-181-6/+40
| | | | | | | | | | Previously while constructing the jump table index we would zero-extend the discriminant before subtracting the start of the jump-table. This goes subtly wrong in the case of a sub-word, signed discriminant, as described in the included Note. Fix this in both the PPC and X86 NCGs. Fixes #21186.
* NCG: inline some 64-bit primops on x86/32-bit (#5444)Sylvain Henry2022-02-232-37/+272
| | | | | | | | Several 64-bit operation were implemented with FFI calls on 32-bit architectures but we can easily implement them with inline assembly code. Also remove unused hs_int64ToWord64 and hs_word64ToInt64 C functions.
* NCG: refactor the way registers are handledSylvain Henry2022-02-231-137/+91
| | | | | | | | | | | | * add getLocalRegReg to avoid allocating a CmmLocal just to call getRegisterReg * 64-bit registers: in the general case we must always use the virtual higher part of the register, so we might as well always return it with the lower part. The only exception is to implement 64-bit to 32-bit conversions. We now have to explicitly discard the higher part when matching on Reg64/RegCode64 datatypes instead of explicitly fetching the higher part from the lower part: much safer default.
* NCG: refactor X86 codegenSylvain Henry2022-02-231-932/+1054
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Preliminary work done to make working on #5444 easier. Mostly make make control-flow easier to follow: * renamed genCCall into genForeignCall * split genForeignCall into the part dispatching on PrimTarget (genPrim) and the one really generating code for a C call (cf ForeignTarget and genCCall) * made genPrim/genSimplePrim only dispatch on MachOp: each MachOp now has its own code generation function. * out-of-line primops are not handled in a partial `outOfLineCmmOp` anymore but in the code generation functions directly. Helper functions have been introduced (e.g. genLibCCall) for code sharing. * the latter two bullets make code generated for primops that are only sometimes out-of-line (e.g. Pdep or Memcpy) and the logic to select between inline/out-of-line much more localized * avoided passing is32bit as an argument as we can easily get it from NatM state when we really need it * changed genCCall type to avoid it being partial (it can't handle PrimTarget) * globally removed 12 calls to `panic` thanks to better control flow and types ("parse, don't validate" ftw!).
* StgToCmm: Get rid of GHC.Driver.Session importsJohn Ericson2022-02-081-1/+0
| | | | | `DynFlags` is gone, but let's move a few trivial things around to get rid of its module too.
* Fix some notesMatthew Pickering2022-02-081-1/+1
|
* Introduce alignment to CmmStoreBen Gamari2022-02-041-1/+1
|
* Introduce alignment in CmmLoadBen Gamari2022-02-041-24/+24
|
* Fix a few Note inconsistenciesBen Gamari2022-02-013-17/+10
|
* Consistently upper-case "Note ["Ben Gamari2022-02-012-2/+2
| | | | | | This was achieved with git ls-tree --name-only HEAD -r | xargs sed -i -e 's/note \[/Note \[/g'
* CmmToAsm: Drop RegPairBen Gamari2022-01-293-8/+0
| | | | SPARC was its last and only user.
* A few comment cleanupsBen Gamari2022-01-291-7/+0
|
* Perf: avoid using (replicateM . length) when possibleSylvain Henry2021-12-171-2/+1
| | | | Extracted from !6622
* nativeGen/x86: Don't encode large shift offsetsBen Gamari2021-12-021-1/+10
| | | | | | | | | | | Handle the case of a shift larger than the width of the shifted value. This is necessary since x86 applies a mask of 0x1f to the shift amount, meaning that, e.g., `shr 47, $eax` will actually shift by 47 & 0x1f == 15. See #20626. (cherry picked from commit 31370f1afe1e2f071b3569fb5ed4a115096127ca)
* i386: fix codegen of 64-bit comparisonsSylvain Henry2021-11-061-14/+21
|
* Rectifying COMMENT and `mkComment` across platforms to work with SDocBenjamin Maurer2021-09-293-4/+3
| | | | and exhibit similar behaviors. Issue 20400
* ncg: Kill incorrect unreachable codeBen Gamari2021-09-111-3/+3
| | | | | | As noted in #18183, these cases were previously incorrect and unused. Closes #18183.
* Move `/includes` to `/rts/include`, sort per package betterJohn Ericson2021-08-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | In order to make the packages in this repo "reinstallable", we need to associate source code with a specific packages. Having a top level `/includes` dir that mixes concerns (which packages' includes?) gets in the way of this. To start, I have moved everything to `rts/`, which is mostly correct. There are a few things however that really don't belong in the rts (like the generated constants haskell type, `CodeGen.Platform.h`). Those needed to be manually adjusted. Things of note: - No symlinking for sake of windows, so we hard-link at configure time. - `CodeGen.Platform.h` no longer as `.hs` extension (in addition to being moved to `compiler/`) so as not to confuse anyone, since it is next to Haskell files. - Blanket `-Iincludes` is gone in both build systems, include paths now more strictly respect per-package dependencies. - `deriveConstants` has been taught to not require a `--target-os` flag when generating the platform-agnostic Haskell type. Make takes advantage of this, but Hadrian has yet to.
* PrimOps: Add CAS op for all int sizesPeter Trommler2021-08-021-2/+5
| | | | | | | | | | | PPC NCG: Implement CAS inline for 32 and 64 bit testsuite: Add tests for smaller atomic CAS X86 NCG: Catch calls to CAS C fallback Primops: Add atomicCasWord[8|16|32|64]Addr# Add tests for atomicCasWord[8|16|32|64]Addr# Add changelog entry for new primops X86 NCG: Fix MO-Cmpxchg W64 on 32-bit arch ghc-prim: 64-bit CAS C fallback on all archs
* Fix #19931John Ericson2021-07-211-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | The issue was the renderer for x86 addressing modes assumes native size registers, but we were passing in a possibly-smaller index in conjunction with a native-sized base pointer. The easist thing to do is just extend the register first. I also changed the other NGC backends implementing jump tables accordingly. On one hand, I think PowerPC and Sparc don't have the small sub-registers anyways so there is less to worry about. On the other hand, to the extent that's true the zero extension can become a no-op. I should give credit where it's due: @hsyl20 really did all the work for me in https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4717#note_355874, but I was daft and missed the "Oops" and so ended up spending a silly amount of time putting it all back together myself. The unregisterised backend change is a bit different, because here we are translating the actual case not a jump table, and the fix is to handle right-sized literals not addressing modes. But it makes sense to include here too because it's the same change in the subsequent commit that exposes both bugs.
* Add Word64#/Int64# primopsSylvain Henry2021-07-151-0/+30
| | | | | | | | | | | | | | | | | | | | | | | Word64#/Int64# are only used on 32-bit architectures. Before this patch, operations on these types were directly using the FFI. Now we use real primops that are then lowered into ccalls. The advantage of doing this is that we can now perform constant folding on Word64#/Int64# (#19024). Most of this work was done by John Ericson in !3658. However this patch doesn't go as far as e.g. changing Word64 to always be using Word64#. Noticeable performance improvements T9203(normal) run/alloc 89870808.0 66662456.0 -25.8% GOOD haddock.Cabal(normal) run/alloc 14215777340.8 12780374172.0 -10.1% GOOD haddock.base(normal) run/alloc 15420020877.6 13643834480.0 -11.5% GOOD Metric Decrease: T9203 haddock.Cabal haddock.base
* Fix #19889 - Invalid BMI2 instructions generated.wip/andreask/bim-fixAndreas Klebinger2021-07-062-24/+26
| | | | | When arguments are 8 *or 16* bits wide, then truncate before/after and use the 32bit operation.
* Adds AArch64 Native Code GeneratorMoritz Angermann2021-06-051-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In which we add a new code generator to the Glasgow Haskell Compiler. This codegen supports ELF and Mach-O targets, thus covering Linux, macOS, and BSDs in principle. It was tested only on macOS and Linux. The NCG follows a similar structure as the other native code generators we already have, and should therfore be realtively easy to follow. It supports most of the features required for a proper native code generator, but does not claim to be perfect or fully optimised. There are still opportunities for optimisations. Metric Decrease: ManyAlternatives ManyConstructors MultiLayerModules PmSeriesG PmSeriesS PmSeriesT PmSeriesV T10421 T10421a T10858 T11195 T11276 T11303b T11374 T11822 T12227 T12545 T12707 T13035 T13253 T13253-spj T13379 T13701 T13719 T14683 T14697 T15164 T15630 T16577 T17096 T17516 T17836 T17836b T17977 T17977b T18140 T18282 T18304 T18478 T18698a T18698b T18923 T1969 T3064 T5030 T5321FD T5321Fun T5631 T5642 T5837 T783 T9198 T9233 T9630 T9872d T9961 WWRec Metric Increase: T4801
* Cmm: fix sinking after suspendThreadSylvain Henry2021-05-191-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Suppose a safe call: myCall(x,y,z) It is lowered into three unsafe calls in Cmm: r = suspendThread(...); myCall(x,y,z); resumeThread(r); Consider the following situation for myCall arguments: x = Sp[..] -- stack y = Hp[..] -- heap z = R1 -- global register r = suspendThread(...); myCall(x,y,z); resumeThread(r); The sink pass assumes that unsafe calls clobber memory (heap and stack), hence x and y assignments are not sunk after `suspendThread`. The sink pass also correctly handles global register clobbering for all unsafe calls, except `suspendThread`! `suspendThread` is special because it releases the capability the thread is running on. Hence the sink pass must also take into account global registers that are mapped into memory (in the capability). In the example above, we could get: r = suspendThread(...); z = R1 myCall(x,y,z); resumeThread(r); But this transformation isn't valid if R1 is (BaseReg->rR1) as BaseReg is invalid between suspendThread and resumeThread. This caused argument corruption at least with the C backend ("unregisterised") in #19237. Fix #19237
* Remove useless {-# LANGUAGE CPP #-} pragmasSylvain Henry2021-05-125-5/+4
|
* Fully remove HsVersions.hSylvain Henry2021-05-125-10/+0
| | | | | | | | | | Replace uses of WARN macro with calls to: warnPprTrace :: Bool -> SDoc -> a -> a Remove the now unused HsVersions.h Bump haddock submodule
* Replace CPP assertions with Haskell functionsSylvain Henry2021-05-121-13/+15
| | | | | | | | | | | | | | | There is no reason to use CPP. __LINE__ and __FILE__ macros are now better replaced with GHC's CallStack. As a bonus, assert error messages now contain more information (function name, column). Here is the mapping table (HasCallStack omitted): * ASSERT: assert :: Bool -> a -> a * MASSERT: massert :: Bool -> m () * ASSERTM: assertM :: m Bool -> m () * ASSERT2: assertPpr :: Bool -> SDoc -> a -> a * MASSERT2: massertPpr :: Bool -> SDoc -> m () * ASSERTM2: assertPprM :: m Bool -> SDoc -> m ()
* Replace (ptext .. sLit) with `text`Sylvain Henry2021-04-292-214/+209
| | | | | | | | | | | | | | | 1. `text` is as efficient as `ptext . sLit` thanks to the rewrite rules 2. `text` is visually nicer than `ptext . sLit` 3. `ptext . sLit` encourages using one `ptext` for several `sLit` as in: ptext $ case xy of ... -> sLit ... ... -> sLit ... which may allocate SDoc's TextBeside constructors at runtime instead of sharing them into CAFs.
* Re-export GHC.Bits from GHC.Prelude with custom shift implementation.Andreas Klebinger2021-04-092-2/+0
| | | | | | | This allows us to use the unsafe shifts in non-debug builds for performance. For older versions of base we instead export Data.Bits See also #19618
* Transfer tickish things to GHC.Types.TickishLuite Stegeman2021-03-201-1/+1
| | | | | Metric Increase: MultiLayerModules
* Save the type of breakpoints in the Breakpoint tick in STGLuite Stegeman2021-03-201-1/+1
| | | | | | | | GHCi needs to know the types of all breakpoints, but it's not possible to get the exprType of any expression in STG. This is preparation for the upcoming change to make GHCi bytecode from STG instead of Core.
* Require GHC 8.10 as the minimum compiler for bootstrappingRyan Scott2021-03-091-7/+0
| | | | | | | Now that GHC 9.0.1 is released, it is time to drop support for bootstrapping with GHC 8.8, as we only support building with the previous two major GHC releases. As an added bonus, this allows us to remove several bits of CPP that are either always true or no longer reachable.
* Fix typosBrian Wignall2021-02-061-1/+1
|
* Add Addr# atomic primops (#17751)Sylvain Henry2020-11-161-5/+5
| | | | This reuses the codegen used for ByteArray#'s atomic primops.
* nativeGen/dwarf: Fix procedure end addressesBen Gamari2020-11-151-9/+19
| | | | | | | | | | | | Previously the `.debug_aranges` and `.debug_info` (DIE) DWARF information would claim that procedures (represented with a `DW_TAG_subprogram` DIE) would only span the range covered by their entry block. This omitted all of the continuation blocks (represented by `DW_TAG_lexical_block` DIEs), confusing `perf`. Fix this by introducing a end-of-procedure label and using this as the `DW_AT_high_pc` of procedure `DW_TAG_subprogram` DIEs Fixes #17605.
* codeGen: Produce local symbols for module-internal functionsBen Gamari2020-11-111-0/+11
| | | | | | | | | | | | | | | | | | | | It turns out that some important native debugging/profiling tools (e.g. perf) rely only on symbol tables for function name resolution (as opposed to using DWARF DIEs). However, previously GHC would emit temporary symbols (e.g. `.La42b`) to identify module-internal entities. Such symbols are dropped during linking and therefore not visible to runtime tools (in addition to having rather un-helpful unique names). For instance, `perf report` would often end up attributing all cost to the libc `frame_dummy` symbol since Haskell code was no covered by any proper symbol (see #17605). We now rather follow the model of C compilers and emit descriptively-named local symbols for module internal things. Since this will increase object file size this behavior can be disabled with the `-fno-expose-internal-symbols` flag. With this `perf record` can finally be used against Haskell executables. Even more, with `-g3` `perf annotate` provides inline source code.
* Don't use LEA with 8-bit registers (#18614)Sylvain Henry2020-11-041-2/+6
|
* NCG: Fix 64bit int comparisons on 32bit x86Andreas Klebinger2020-11-042-30/+100
| | | | | | | | | | | We no compare these by doing 64bit subtraction and checking the resulting flags. We used to do this differently but the old approach was broken when the high bits compared equal and the comparison was one of >= or <=. The new approach should be both correct and faster.
* Add the proper HLint rules and remove redundant keywords from compilerHécate2020-11-011-20/+21
|
* rts: fix race condition in StgCRunTamar Christina2020-10-091-2/+3
| | | | | | | | | | | | | | | | On windows the stack has to be allocated 4k at a time, otherwise we get a segfault. This is done by using a helper ___chkstk_ms that is provided by libgcc. The Haskell side already knows how to handle this but we need to do the same from STG. Previously we would drop the stack in StgRun but would only make it valid whenever the scheduler loop ran. This approach was fundamentally broken in that it falls apart when you take a signal from the OS. We see it less often because you initially get allocated a 1MB stack block which you have to blow past first. Concretely this means we must always keep the stack valid. Fixes #18601.
* WinIO: Small changes related to atomic request swaps.Andreas Klebinger2020-10-071-0/+2
| | | | | | | | | | | | Move the atomix exchange over the Ptr type to an internal module. Fix a bug caused by us passing ptr-to-ptr instead of ptr to atomic exchange. Renamed interlockedExchange to exchangePtr. I've also added an cas primitive. It turned out we don't need it for WinIO but I'm leaving it in as it's useful for other things.
* Don't import GHC.Unit to reduce the number of dependenciesSylvain Henry2020-10-011-1/+1
|
* Introduce OutputablePSylvain Henry2020-09-172-52/+56
| | | | | | | | | | | | | | | | | | | | | | | | | Some types need a Platform value to be pretty-printed: CLabel, Cmm types, instructions, etc. Before this patch they had an Outputable instance and the Platform value was obtained via sdocWithDynFlags. It meant that the *renderer* of the SDoc was responsible of passing the appropriate Platform value (e.g. via the DynFlags given to showSDoc). It put the burden of passing the Platform value on the renderer while the generator of the SDoc knows the Platform it is generating the SDoc for and there is no point passing a different Platform at rendering time. With this patch, we introduce a new OutputableP class: class OutputableP a where pdoc :: Platform -> a -> SDoc With this class we still have some polymorphism as we have with `ppr` (i.e. we can use `pdoc` on a variety of types instead of having a dedicated `pprXXX` function for each XXX type). One step closer removing `sdocWithDynFlags` (#10143) and supporting several platforms (#14335).
* PPC and X86: Portable printing of IEEE floatsPeter Trommler2020-08-261-13/+4
| | | | | | | | | | | | GNU as and the AIX assembler support floating point literals. SPARC seems to have support too but I cannot test on SPARC. Curiously, `doubleToBytes` is also used in the LLVM backend. To avoid endianness issues when cross-compiling float and double literals are printed as C-style floating point values. The assembler then takes care of memory layout and endianness. This was brought up in #18431 by @hsyl20.