path: root/compiler/GHC/CmmToAsm
Commit message | Author | Age | Files | Lines
* nativeGen: Note sign-extended nature of MOV (wip/windows-high-codegen) | Ben Gamari | 2022-04-06 | 1 | -0/+4
|
* Don't assume that labels are 32-bit on Windows | Ben Gamari | 2022-04-06 | 1 | -10/+17
|
* Refactor is32BitLit to take Platform rather than Bool | Ben Gamari | 2022-04-06 | 1 | -37/+33
|
* Generate LEA for label expressions | Ben Gamari | 2022-04-06 | 1 | -0/+16
|
* nativeGen/x86: Use %rip-relative addressing | Ben Gamari | 2022-04-06 | 1 | -8/+49
| | | | | | | On Windows with high-entropy ASLR we must use %rip-relative addressing to avoid overflowing the signed 32-bit immediate size of x86-64. Since %rip-relative addressing comes essentially for free and can make linking significantly easier, we use it on all platforms.
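The range problem described above can be illustrated with a small sketch. It only models the arithmetic: the addresses are made up, and `fitsInSigned32` and `ripDisplacement` are hypothetical helper names, not functions from GHC's code generator.

```haskell
import Data.Int (Int32, Int64)

-- | Does a 64-bit value fit in a sign-extended 32-bit immediate?
fitsInSigned32 :: Int64 -> Bool
fitsInSigned32 x =
  x >= fromIntegral (minBound :: Int32) && x <= fromIntegral (maxBound :: Int32)

-- | Displacement encoded by a %rip-relative reference: the distance from
-- the address of the next instruction to the target.
ripDisplacement :: Int64 -> Int64 -> Int64
ripDisplacement nextInsn target = target - nextInsn

main :: IO ()
main = do
  -- Illustrative addresses only; a high-entropy ASLR image may be loaded
  -- far above the 2 GiB mark.
  let target   = 0x7ff600001000 :: Int64
      nextInsn = 0x7ff600000100 :: Int64
  print (fitsInSigned32 target)                             -- False
  print (fitsInSigned32 (ripDisplacement nextInsn target))  -- True
```

The absolute address does not fit in the immediate field, while the displacement between two nearby locations in the same image does.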
* Refactor handling of global initializers | Ben Gamari | 2022-04-01 | 2 | -1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | GHC uses global initializers for a number of things including cost-center registration, info-table provenance registration, and setup of foreign exports. Previously, the global initializer arrays which referenced these initializers would live in the object file of the C stub, which would then be merged into the main object file of the module. Unfortunately, this approach is no longer tenable with the move to Clang/LLVM on Windows (see #21019). Specifically, lld's PE backend does not support object merging (that is, the -r flag). Instead we are now rather packaging a module's object files into a static library. However, this is problematic in the case of initializers as there are no references to the C stub object in the archive, meaning that the linker may drop the object from the final link. This patch refactors our handling of global initializers to instead place initializer arrays within the object file of the module to which they belong. We do this by introducing a Cmm data declaration containing the initializer array in the module's Cmm stream. While the initializer functions themselves remain in separate C stub objects, the reference from the module's object ensures that they are not dropped from the final link. In service of #21068.
* Fix all invalid haddock comments in the compiler | Zubin Duggal | 2022-03-29 | 4 | -8/+8
| | | | Fixes #20935 and #20924
* hi haddock: Lex and store haddock docs in interface files | Zubin Duggal | 2022-03-23 | 1 | -2/+2
| | | | | | | | | | | | | | | | | | Names appearing in Haddock docstrings are lexed and renamed like any other names appearing in the AST. We currently rename names irrespective of the namespace, so both type and constructor names corresponding to an identifier will appear in the docstring. Haddock will select a given name as the link destination based on its own heuristics. This patch also restricts the limitation of `-haddock` being incompatible with `Opt_KeepRawTokenStream`. The export and documentation structure is now computed in GHC and serialised in .hi files. This can be used by haddock to directly generate doc pages without reparsing or renaming the source. At the moment the operation of haddock is not modified; that's left to a future patch. Updates the haddock submodule with the minimum changes needed.
* codeGen: Fix signedness of jump table indexing | Ben Gamari | 2022-03-18 | 2 | -11/+47
| | | | | | | | | | Previously while constructing the jump table index we would zero-extend the discriminant before subtracting the start of the jump-table. This goes subtly wrong in the case of a sub-word, signed discriminant, as described in the included Note. Fix this in both the PPC and X86 NCGs. Fixes #21186.
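A rough model of the failure mode described above, assuming an 8-bit signed discriminant and a jump table whose first entry corresponds to -1. The names `zeroExtend8`, `signExtend8` and `jumpIndex` are illustrative and do not come from the NCG sources.

```haskell
import Data.Int (Int8)
import Data.Word (Word8)

-- | Widen by zero-extension: the bit pattern is reinterpreted as unsigned.
zeroExtend8 :: Int8 -> Int
zeroExtend8 d = fromIntegral (fromIntegral d :: Word8)

-- | Widen by sign-extension: the numeric value is preserved.
signExtend8 :: Int8 -> Int
signExtend8 = fromIntegral

-- | Index into a jump table that starts at 'tableStart', using the given
-- widening function for the sub-word discriminant.
jumpIndex :: (Int8 -> Int) -> Int -> Int8 -> Int
jumpIndex extend tableStart d = extend d - tableStart

main :: IO ()
main = do
  print (jumpIndex zeroExtend8 (-1) (-1))  -- 256, far outside the table
  print (jumpIndex signExtend8 (-1) (-1))  -- 0, the first entry
```

With zero-extension, the discriminant -1 becomes 255 before the subtraction, so the computed index lands well past the end of a small table.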
* NCG: inline some 64-bit primops on x86/32-bit (#5444) | Sylvain Henry | 2022-02-23 | 2 | -37/+272
| | | | | | | | Several 64-bit operations were implemented with FFI calls on 32-bit architectures but we can easily implement them with inline assembly code. Also remove unused hs_int64ToWord64 and hs_word64ToInt64 C functions.
* NCG: refactor the way registers are handled | Sylvain Henry | 2022-02-23 | 3 | -269/+229
| | | | | | | | | | | | * add getLocalRegReg to avoid allocating a CmmLocal just to call getRegisterReg * 64-bit registers: in the general case we must always use the virtual higher part of the register, so we might as well always return it with the lower part. The only exception is to implement 64-bit to 32-bit conversions. We now have to explicitly discard the higher part when matching on Reg64/RegCode64 datatypes instead of explicitly fetching the higher part from the lower part: much safer default.
* NCG: refactor X86 codegen | Sylvain Henry | 2022-02-23 | 1 | -932/+1054
| Preliminary work done to make working on #5444 easier.
|
| Mostly make control-flow easier to follow:
|
| * renamed genCCall into genForeignCall
| * split genForeignCall into the part dispatching on PrimTarget (genPrim) and the one really generating code for a C call (cf ForeignTarget and genCCall)
| * made genPrim/genSimplePrim only dispatch on MachOp: each MachOp now has its own code generation function.
| * out-of-line primops are not handled in a partial `outOfLineCmmOp` anymore but in the code generation functions directly. Helper functions have been introduced (e.g. genLibCCall) for code sharing.
| * the latter two bullets make code generated for primops that are only sometimes out-of-line (e.g. Pdep or Memcpy) and the logic to select between inline/out-of-line much more localized
| * avoided passing is32bit as an argument as we can easily get it from NatM state when we really need it
| * changed genCCall type to avoid it being partial (it can't handle PrimTarget)
| * globally removed 12 calls to `panic` thanks to better control flow and types ("parse, don't validate" ftw!).
* NCG: minor code factorization | Sylvain Henry | 2022-02-09 | 2 | -51/+35
|
* StgToCmm: Get rid of GHC.Driver.Session imports | John Ericson | 2022-02-08 | 1 | -1/+0
| | | | | `DynFlags` is gone, but let's move a few trivial things around to get rid of its module too.
* Fix some notes | Matthew Pickering | 2022-02-08 | 1 | -1/+1
|
* Introduce alignment to CmmStore | Ben Gamari | 2022-02-04 | 3 | -3/+3
|
* Introduce alignment in CmmLoad | Ben Gamari | 2022-02-04 | 3 | -37/+37
|
* cmm: Introduce cmmLoadBWord and cmmLoadGCWord | Ben Gamari | 2022-02-04 | 1 | -1/+2
|
* Fix a few Note inconsistencies | Ben Gamari | 2022-02-01 | 11 | -41/+34
|
* Consistently upper-case "Note [" | Ben Gamari | 2022-02-01 | 5 | -5/+5
| | | | | | This was achieved with git ls-tree --name-only HEAD -r | xargs sed -i -e 's/note \[/Note \[/g'
* CmmToAsm: Drop ncgExpandTop | Ben Gamari | 2022-01-29 | 4 | -4/+0
| | | | This was only needed for SPARC's synthetic instructions.
* CmmToAsm: Drop RegPair | Ben Gamari | 2022-01-29 | 14 | -53/+0
| | | | SPARC was its last and only user.
* Rip out remaining SPARC support | Ben Gamari | 2022-01-29 | 4 | -20/+0
|
* A few comment cleanups | Ben Gamari | 2022-01-29 | 2 | -12/+0
|
* Drop SPARC NCG | Ben Gamari | 2022-01-29 | 24 | -4214/+10
|
* Fix typos | Krzysztof Gogolewski | 2021-12-25 | 2 | -2/+2
|
* Cmm: DynFlags to CmmConfig refactor | doyougnu | 2021-12-22 | 1 | -9/+17
| add files GHC.Cmm.Config, GHC.Driver.Config.Cmm
| Cmm: DynFlag references --> CmmConfig
| Cmm.Pipeline: reorder imports, add handshake
| Cmm: DynFlag references --> CmmConfig
| Cmm.Pipeline: DynFlag references --> CmmConfig
| Cmm.LayoutStack: DynFlag references -> CmmConfig
| Cmm.Info.Build: DynFlag references -> CmmConfig
| Cmm.Config: use profile to retrieve platform
| Cmm.CLabel: unpack NCGConfig in labelDynamic
| Cmm.Config: reduce CmmConfig surface area
| Cmm.Config: add cmmDoCmmSwitchPlans field
| Cmm.Config: correct cmmDoCmmSwitchPlans flag
|   The original implementation dispatches work in cmmImplementSwitchPlans in an `otherwise` branch, hence we must add a `not` to correctly dispatch
| Cmm.Config: add cmmSplitProcPoints
| simplify Config remove cmmBackend, and cmmPosInd
| Cmm.CmmToAsm: move ncgLabelDynamic to CmmToAsm
| Cmm.CLabel: remove cmmLabelDynamic function
| Cmm.Config: rename cmmOptDoLinting -> cmmDoLinting
| testsuite: update CountDepsAst CountDepsParser
* Perf: avoid using (replicateM . length) when possible | Sylvain Henry | 2021-12-17 | 3 | -6/+3
| | | | Extracted from !6622
* nativeGen/aarch64: Fix handling of subword values | Ben Gamari | 2021-12-02 | 1 | -80/+189
| | | | | | | | | | | | | | | | Here we rework the handling of sub-word operations in the AArch64 backend, fixing a number of bugs and inconsistencies. In short, we now impose the invariant that all subword values are represented in registers in zero-extended form. Signed arithmetic operations are then responsible for sign-extending as necessary. Possible future work: * Use `CMP`s extended register form to avoid burning an instruction in sign-extending the second operand. * Track sign-extension state of registers to elide redundant sign extensions in blocks with frequent sub-word signed arithmetic.
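The invariant above can be modelled in a few lines. This is a sketch of the arithmetic only, assuming 8-bit values held in a 64-bit register; `toRegister8`, `signExtend8`, `signedLess8` and `naiveLess` are made-up names, not backend functions.

```haskell
import Data.Int (Int8, Int64)
import Data.Word (Word8, Word64)

-- | Register representation of an 8-bit value: always zero-extended.
toRegister8 :: Int8 -> Word64
toRegister8 x = fromIntegral (fromIntegral x :: Word8)

-- | What a signed operation must do first: recover the signed value from
-- the low 8 bits of the register.
signExtend8 :: Word64 -> Int64
signExtend8 r = fromIntegral (fromIntegral r :: Int8)

-- | Signed comparison that sign-extends both operands as required.
signedLess8 :: Word64 -> Word64 -> Bool
signedLess8 a b = signExtend8 a < signExtend8 b

-- | Comparing the raw register contents gives the wrong answer for
-- negative values, because -1 is stored as 255.
naiveLess :: Word64 -> Word64 -> Bool
naiveLess a b = (fromIntegral a :: Int64) < fromIntegral b

main :: IO ()
main = do
  let minusOne = toRegister8 (-1)
      one      = toRegister8 1
  print minusOne                    -- 255
  print (signedLess8 minusOne one)  -- True
  print (naiveLess minusOne one)    -- False
```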
* nativeGen/aarch64: Don't rely on register width to determine amode | Ben Gamari | 2021-12-02 | 1 | -12/+16
| | | | | We might be loading, e.g., a 16- or 8-bit value, in which case the register width is not reflective of the loaded element size.
* ncg/aarch64: Don't sign extend loads | Ben Gamari | 2021-12-02 | 1 | -2/+2
| | | | | Previously we would emit the sign-extending LDS[HB] instructions for sub-word loads. However, this is wrong, as noted in #20638.
* nativeGen/x86: Don't encode large shift offsets | Ben Gamari | 2021-12-02 | 1 | -1/+10
| | | | | | | | | | | Handle the case of a shift larger than the width of the shifted value. This is necessary since x86 applies a mask of 0x1f to the shift amount, meaning that, e.g., `shr 47, $eax` will actually shift by 47 & 0x1f == 15. See #20626. (cherry picked from commit 31370f1afe1e2f071b3569fb5ed4a115096127ca)
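The masking arithmetic quoted above can be checked with a short sketch. It assumes a 32-bit operand and assumes that the intended result of an over-wide logical right shift is zero; the function names are illustrative and are not taken from the x86 backend.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word32)

-- | The shift count the hardware actually uses for a 32-bit operand.
maskedCount :: Int -> Int
maskedCount n = n .&. 0x1f

-- | What encoding the raw count would compute: a shift by the masked count.
hardwareShr32 :: Word32 -> Int -> Word32
hardwareShr32 x n = x `shiftR` maskedCount n

-- | Assumed intended semantics: shifting by the operand width or more
-- clears the value.
intendedShr32 :: Word32 -> Int -> Word32
intendedShr32 x n
  | n >= 32   = 0
  | otherwise = x `shiftR` n

main :: IO ()
main = do
  print (maskedCount 47)               -- 15
  print (hardwareShr32 0x80000000 47)  -- 65536
  print (intendedShr32 0x80000000 47)  -- 0
```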
* i386: fix codegen of 64-bit comparisons | Sylvain Henry | 2021-11-06 | 1 | -14/+21
|
* Do not sign extend CmmInt's unless negative. | Moritz Angermann | 2021-10-22 | 1 | -0/+5
| | | Might fix #20526.
* code gen: Disable dead code elimination when -finfo-table-map is enabled | Matthew Pickering | 2021-10-08 | 1 | -0/+1
| It's important that, when -finfo-table-map is enabled, we generate IPE entries just for those info tables which are actually used. To this end, the info tables which are used are collected just before code generation starts and entries only created for those tables.
|
| Not accounted for in this scheme was the dead code elimination in the native code generator. When compiling GHC this optimisation removed an info table which had an IPE entry, which resulted in the following kind of linker error:
|
| ```
| /home/matt/ghc-with-debug/_build/stage1/lib/../lib/x86_64-linux-ghc-9.3.20210928/libHSCabal-3.5.0.0-ghc9.3.20210928.so: error: undefined reference to '.Lc5sS_info'
| /home/matt/ghc-with-debug/_build/stage1/lib/../lib/x86_64-linux-ghc-9.3.20210928/libHSCabal-3.5.0.0-ghc9.3.20210928.so: error: undefined reference to '.Lc5sH_info'
| /home/matt/ghc-with-debug/_build/stage1/lib/../lib/x86_64-linux-ghc-9.3.20210928/libHSCabal-3.5.0.0-ghc9.3.20210928.so: error: undefined reference to '.Lc5sm_info'
| collect2: error: ld returned 1 exit status
| `cc' failed in phase `Linker'. (Exit code: 1)
| Development.Shake.cmd, system command failed
| ```
|
| Unfortunately, by the time this optimisation happens the structure of the CmmInfoTable has been lost; we only have the generated code for the info table to play with, so we can no longer just collect all the used info tables and generate the IPE map.
|
| This leaves us with two options:
|
| 1. Return a list of the names of the discarded info tables and then remove them from the map. This is awkward because we need to do code generation for the map as well.
| 2. Just disable this small code size optimisation when -finfo-table-map is enabled. The option produces very big object files anyway.
|
| Option 2 is much easier to implement and means we don't have to thread information around awkwardly. It's at the cost of slightly larger object files (as dead code is not eliminated).
|
| Disabling this optimisation allows an IPE build of GHC to complete successfully.
|
| Fixes #20428
* Don't use FastString for UTF-8 encoding only | Sylvain Henry | 2021-10-02 | 1 | -1/+1
|
* code gen: Improve efficiency of findPrefRealReg | Matthew Pickering | 2021-10-01 | 3 | -19/+50
| | | | | | | | | | | | | | | | | | Old strategy: For each variable, linearly scan through all the blocks and check to see if the variable is in any of the block register mappings. This is very slow when you have a lot of blocks. New strategy: Maintain a map from virtual registers to the first real register the virtual register was assigned to. Consult this map in findPrefRealReg. The map is updated when the register mapping is updated and is hidden behind the BlockAssignment abstraction. On the mmark package this reduces compilation time from about 44s to 32s. Ticket: #19471
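The new strategy can be sketched roughly as follows. Registers are plain `Int`s here, and `FirstAssigned`, `recordAssignment` and `findPref` are hypothetical names for illustration; the real allocator hides this state behind its own abstraction.

```haskell
import qualified Data.Map.Strict as M

type VirtualReg = Int
type RealReg    = Int

-- | For each virtual register, the first real register it was assigned to.
type FirstAssigned = M.Map VirtualReg RealReg

-- | Record an assignment, keeping any earlier entry for the same virtual
-- register so that the map always holds the first assignment.
recordAssignment :: VirtualReg -> RealReg -> FirstAssigned -> FirstAssigned
recordAssignment = M.insertWith (\_new old -> old)

-- | A single map lookup replaces the linear scan over all blocks.
findPref :: FirstAssigned -> VirtualReg -> Maybe RealReg
findPref assigned v = M.lookup v assigned

main :: IO ()
main = do
  let assigned = recordAssignment 7 1 (recordAssignment 7 3 M.empty)
  print (findPref assigned 7)  -- Just 3
  print (findPref assigned 8)  -- Nothing
```

The lookup cost then depends on the number of virtual registers recorded rather than on the number of blocks.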
* NCG: Linear-reg-alloc: A few small implementation tweaks. | Andreas Klebinger | 2021-09-30 | 1 | -19/+21
| | | | | | Removed an intermediate list via a fold. realRegsAlias: Manually inlined the list functions to get better code. Linear.hs added a bang somewhere.
* Rectifying COMMENT and `mkComment` across platforms to work with SDoc | Benjamin Maurer | 2021-09-29 | 12 | -25/+27
| | | | and exhibit similar behaviors. Issue 20400
* Typo [skip ci] (wip/typo-cg) | Matthew Pickering | 2021-09-23 | 1 | -1/+1
|
* Code Gen: Rewrite shortcutWeightMap more efficiently | Matthew Pickering | 2021-09-17 | 1 | -33/+53
| | | | | | | | | | | | This function was one of the main sources of allocation in a ticky profile due to how it repeatedly deleted nodes from a large map. Now the cuts are first normalised, so that chains of cuts are eliminated before any rewrites are applied. Then the CFG is traversed and reconstructed once whilst applying the necessary rewrites to remove shortcutted edges (based on the normalised cuts). Ticket: #19471
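The normalisation step might look something like the sketch below. It assumes cuts can be modelled as a map from a block to the block it is shortcut to, and that the map is acyclic; `normaliseCuts` and the `Int` block ids are illustrative, not the representation used in the CFG code.

```haskell
import qualified Data.Map.Strict as M

type BlockId = Int

-- | Resolve chains of shortcuts so that every source maps directly to its
-- final target. Assumes the shortcut map is acyclic; a cycle would make
-- 'follow' loop forever.
normaliseCuts :: M.Map BlockId BlockId -> M.Map BlockId BlockId
normaliseCuts cuts = M.map follow cuts
  where
    follow target = maybe target follow (M.lookup target cuts)

main :: IO ()
main =
  -- The chain 1 -> 2 -> 3 collapses so that 1 points straight at 3.
  print (M.toList (normaliseCuts (M.fromList [(1, 2), (2, 3), (4, 5)])))
  -- [(1,3),(2,3),(4,5)]
```

Once the chains are collapsed, a single pass over the graph can apply each rewrite without revisiting earlier results.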
* Code Gen: Use more efficient block merging algorithm | Matthew Pickering | 2021-09-17 | 1 | -27/+17
| | | | | | | | | | | | | | | | | | The previous algorithm scaled poorly when there was a large number of blocks and edges. The algorithm links together block chains which have edges between them in the CFG. The new algorithm uses a union find data structure in order to efficiently merge together blocks and calculate which block chain each block id belongs to. I copied the UnionFind data structure which already existed in Cabal into the GHC library rather than reimplement it myself. This change results in a very significant reduction in allocations when compiling the mmark package. Ticket: #19471
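For readers unfamiliar with the idea, here is a deliberately simplified, pure union-find over block ids. It is not the UnionFind module mentioned above, and it omits path compression and union by rank, which a performance-oriented version would normally include; all names are illustrative.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as M

type BlockId = Int

-- | Parent pointers; a block with no entry is the root of its own chain.
newtype UnionFind = UnionFind (M.Map BlockId BlockId)

emptyUF :: UnionFind
emptyUF = UnionFind M.empty

-- | Representative of the chain that a block belongs to.
find :: UnionFind -> BlockId -> BlockId
find uf@(UnionFind parents) b =
  case M.lookup b parents of
    Just p | p /= b -> find uf p
    _               -> b

-- | Merge the chains containing the two blocks.
union :: UnionFind -> (BlockId, BlockId) -> UnionFind
union uf@(UnionFind parents) (a, b)
  | ra == rb  = uf
  | otherwise = UnionFind (M.insert ra rb parents)
  where
    ra = find uf a
    rb = find uf b

-- | Link chains along every CFG edge.
mergeAlongEdges :: [(BlockId, BlockId)] -> UnionFind
mergeAlongEdges = foldl' union emptyUF

main :: IO ()
main = do
  let uf = mergeAlongEdges [(1, 2), (2, 3), (4, 5)]
  print (find uf 1 == find uf 3)  -- True: same chain
  print (find uf 1 == find uf 4)  -- False: different chains
```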
* Code Gen: Optimise successors calculation in loop calculation | Matthew Pickering | 2021-09-17 | 1 | -5/+4
| | | | | | | | | | Before this change, the whole map would be traversed in order to delete a node from the graph before calculating successors. This is quite inefficient if the CFG is big, as was the case in the mmark package. A more efficient alternative is to leave the CFG untouched and then just delete the node once after the lookups have been performed. Ticket: #19471
* Code Gen: Replace another lazy fmap with strict mapMap | Matthew Pickering | 2021-09-17 | 1 | -1/+1
|
* Code Gen: Use strict map rather than lazy map in loop analysis | Matthew Pickering | 2021-09-17 | 1 | -1/+3
| | | | | | | | | | | We were ending up with a big 1GB thunk spike as the `fmap` operation did not force the key values promptly. This fixes the high maximum memory consumption when compiling the mmark package. Compilation is still slow and allocates a lot more than previous releases. Related to #19471
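The difference between the two mapping functions can be illustrated with `Data.Map` from the containers package as a stand-in for the compiler's own map types. `weigh`, `lazyWeights` and `strictWeights` are hypothetical names for this sketch.

```haskell
import qualified Data.Map.Strict as MS

type BlockId = Int

-- | Stand-in for some per-block computation.
weigh :: Int -> Int
weigh n = n * n + 1

-- | The Functor instance is lazy in the values: each element becomes an
-- unevaluated thunk that keeps its input alive until it is demanded.
lazyWeights :: MS.Map BlockId Int -> MS.Map BlockId Int
lazyWeights = fmap weigh

-- | 'Data.Map.Strict.map' evaluates each result to weak head normal form
-- as the new map is built, so no thunks accumulate.
strictWeights :: MS.Map BlockId Int -> MS.Map BlockId Int
strictWeights = MS.map weigh

main :: IO ()
main = do
  let blocks = MS.fromList (zip [1 ..] [10, 20, 30])
  -- Both produce the same values; they differ in when the work happens.
  print (lazyWeights blocks == strictWeights blocks)  -- True
```

The results are identical, so the choice only affects memory behaviour, which is where a thunk spike of the kind described above would show up.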
* ncg: Kill incorrect unreachable code | Ben Gamari | 2021-09-11 | 1 | -3/+3
| | | | | | As noted in #18183, these cases were previously incorrect and unused. Closes #18183.
* AArch64 NCG: Emit FABS instructions for fabsFloat# and fabsDouble# | ARATA Mizuki | 2021-08-28 | 3 | -2/+21
| | | | Closes #20275
* Move `/includes` to `/rts/include`, sort per package better | John Ericson | 2021-08-09 | 3 | -3/+3
| In order to make the packages in this repo "reinstallable", we need to associate source code with specific packages. Having a top level `/includes` dir that mixes concerns (which packages' includes?) gets in the way of this.
|
| To start, I have moved everything to `rts/`, which is mostly correct. There are a few things however that really don't belong in the rts (like the generated constants haskell type, `CodeGen.Platform.h`). Those needed to be manually adjusted.
|
| Things of note:
|
| - No symlinking for sake of windows, so we hard-link at configure time.
| - `CodeGen.Platform.h` no longer has a `.hs` extension (in addition to being moved to `compiler/`) so as not to confuse anyone, since it is next to Haskell files.
| - Blanket `-Iincludes` is gone in both build systems, include paths now more strictly respect per-package dependencies.
| - `deriveConstants` has been taught to not require a `--target-os` flag when generating the platform-agnostic Haskell type. Make takes advantage of this, but Hadrian has yet to.
* [AArch64/Darwin] fix packed calling conv alignment | Moritz Angermann | 2021-08-02 | 1 | -6/+38
| | | | | | Apparently we need some padding as well. Fixes #20137
* PrimOps: Add CAS op for all int sizes | Peter Trommler | 2021-08-02 | 2 | -3/+37
| PPC NCG: Implement CAS inline for 32 and 64 bit
| testsuite: Add tests for smaller atomic CAS
| X86 NCG: Catch calls to CAS C fallback
| Primops: Add atomicCasWord[8|16|32|64]Addr#
| Add tests for atomicCasWord[8|16|32|64]Addr#
| Add changelog entry for new primops
| X86 NCG: Fix MO-Cmpxchg W64 on 32-bit arch
| ghc-prim: 64-bit CAS C fallback on all archs