| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
| |
On Windows with high-entropy ASLR we must use %rip-relative addressing
to avoid overflowing the signed 32-bit immediate size of x86-64.
Since %rip-relative addressing comes essentially for free and can make
linking significantly easier, we use it on all platforms.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GHC uses global initializers for a number of things including
cost-center registration, info-table provenance registration, and setup
of foreign exports. Previously, the global initializer arrays which
referenced these initializers would live in the object file of the C
stub, which would then be merged into the main object file of the
module.
Unfortunately, this approach is no longer tenable with the move to
Clang/LLVM on Windows (see #21019). Specifically, lld's PE backend does
not support object merging (that is, the -r flag). Instead we are now
rather packaging a module's object files into a static library. However,
this is problematic in the case of initializers as there are no
references to the C stub object in the archive, meaning that the linker
may drop the object from the final link.
This patch refactors our handling of global initializers to instead
place initializer arrays within the object file of the module to which
they belong. We do this by introducing a Cmm data declaration containing
the initializer array in the module's Cmm stream. While the initializer
functions themselves remain in separate C stub objects, the reference
from the module's object ensures that they are not dropped from the
final link.
In service of #21068.
|
|
|
|
| |
Fixes #20935 and #20924
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Names appearing in Haddock docstrings are lexed and renamed like any other names
appearing in the AST. We currently rename names irrespective of the namespace,
so both type and constructor names corresponding to an identifier will appear in
the docstring. Haddock will select a given name as the link destination based on
its own heuristics.
This patch also restricts the limitation of `-haddock` being incompatible with
`Opt_KeepRawTokenStream`.
The export and documenation structure is now computed in GHC and serialised in
.hi files. This can be used by haddock to directly generate doc pages without
reparsing or renaming the source. At the moment the operation of haddock
is not modified, that's left to a future patch.
Updates the haddock submodule with the minimum changes needed.
|
|
|
|
|
|
|
|
|
|
| |
Previously while constructing the jump table index we would
zero-extend the discriminant before subtracting the start of the
jump-table. This goes subtly wrong in the case of a sub-word, signed
discriminant, as described in the included Note. Fix this in both the
PPC and X86 NCGs.
Fixes #21186.
|
|
|
|
|
|
|
|
| |
Several 64-bit operation were implemented with FFI calls on 32-bit
architectures but we can easily implement them with inline assembly
code.
Also remove unused hs_int64ToWord64 and hs_word64ToInt64 C functions.
|
|
|
|
|
|
|
|
|
|
|
|
| |
* add getLocalRegReg to avoid allocating a CmmLocal just to call
getRegisterReg
* 64-bit registers: in the general case we must always use the virtual
higher part of the register, so we might as well always return it with
the lower part. The only exception is to implement 64-bit to 32-bit
conversions. We now have to explicitly discard the higher part when
matching on Reg64/RegCode64 datatypes instead of explicitly fetching
the higher part from the lower part: much safer default.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Preliminary work done to make working on #5444 easier.
Mostly make make control-flow easier to follow:
* renamed genCCall into genForeignCall
* split genForeignCall into the part dispatching on PrimTarget (genPrim) and
the one really generating code for a C call (cf ForeignTarget and genCCall)
* made genPrim/genSimplePrim only dispatch on MachOp: each MachOp now
has its own code generation function.
* out-of-line primops are not handled in a partial `outOfLineCmmOp`
anymore but in the code generation functions directly. Helper
functions have been introduced (e.g. genLibCCall) for code sharing.
* the latter two bullets make code generated for primops that are only
sometimes out-of-line (e.g. Pdep or Memcpy) and the logic to select
between inline/out-of-line much more localized
* avoided passing is32bit as an argument as we can easily get it from NatM
state when we really need it
* changed genCCall type to avoid it being partial (it can't handle
PrimTarget)
* globally removed 12 calls to `panic` thanks to better control flow and
types ("parse, don't validate" ftw!).
|
| |
|
|
|
|
|
| |
`DynFlags` is gone, but let's move a few trivial things around to get
rid of its module too.
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
This was achieved with
git ls-tree --name-only HEAD -r | xargs sed -i -e 's/note \[/Note \[/g'
|
|
|
|
| |
This was only needed for SPARC's synthetic instructions.
|
|
|
|
| |
SPARC was its last and only user.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
add files GHC.Cmm.Config, GHC.Driver.Config.Cmm
Cmm: DynFlag references --> CmmConfig
Cmm.Pipeline: reorder imports, add handshake
Cmm: DynFlag references --> CmmConfig
Cmm.Pipeline: DynFlag references --> CmmConfig
Cmm.LayoutStack: DynFlag references -> CmmConfig
Cmm.Info.Build: DynFlag references -> CmmConfig
Cmm.Config: use profile to retrieve platform
Cmm.CLabel: unpack NCGConfig in labelDynamic
Cmm.Config: reduce CmmConfig surface area
Cmm.Config: add cmmDoCmmSwitchPlans field
Cmm.Config: correct cmmDoCmmSwitchPlans flag
The original implementation dispatches work in cmmImplementSwitchPlans
in an `otherwise` branch, hence we must add a not to correctly dispatch
Cmm.Config: add cmmSplitProcPoints simplify Config
remove cmmBackend, and cmmPosInd
Cmm.CmmToAsm: move ncgLabelDynamic to CmmToAsm
Cmm.CLabel: remove cmmLabelDynamic function
Cmm.Config: rename cmmOptDoLinting -> cmmDoLinting
testsuite: update CountDepsAst CountDepsParser
|
|
|
|
| |
Extracted from !6622
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Here we rework the handling of sub-word operations in the AArch64
backend, fixing a number of bugs and inconsistencies. In short,
we now impose the invariant that all subword values are represented in
registers in zero-extended form. Signed arithmetic operations are then
responsible for sign-extending as necessary.
Possible future work:
* Use `CMP`s extended register form to avoid burning an instruction
in sign-extending the second operand.
* Track sign-extension state of registers to elide redundant sign
extensions in blocks with frequent sub-word signed arithmetic.
|
|
|
|
|
| |
We might be loading, e.g., a 16- or 8-bit value, in which case the
register width is not reflective of the loaded element size.
|
|
|
|
|
| |
Previously we would emit the sign-extending LDS[HB] instructions for
sub-word loads. However, this is wrong, as noted in #20638.
|
|
|
|
|
|
|
|
|
|
|
| |
Handle the case of a shift larger than the width of the shifted value.
This is necessary since x86 applies a mask of 0x1f to the shift amount,
meaning that, e.g., `shr 47, $eax` will actually shift by
47 & 0x1f == 15.
See #20626.
(cherry picked from commit 31370f1afe1e2f071b3569fb5ed4a115096127ca)
|
| |
|
|
|
| |
Might fix #20526.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's important that when -finfo-table-map is enabled that we generate
IPE entries just for those info tables which are actually used. To this
end, the info tables which are used are collected just before code
generation starts and entries only created for those tables.
Not accounted for in this scheme was the dead code elimination in the
native code generator. When compiling GHC this optimisation removed an
info table which had an IPE entry which resulting in the following kind
of linker error:
```
/home/matt/ghc-with-debug/_build/stage1/lib/../lib/x86_64-linux-ghc-9.3.20210928/libHSCabal-3.5.0.0-ghc9.3.20210928.so: error: undefined reference to '.Lc5sS_info'
/home/matt/ghc-with-debug/_build/stage1/lib/../lib/x86_64-linux-ghc-9.3.20210928/libHSCabal-3.5.0.0-ghc9.3.20210928.so: error: undefined reference to '.Lc5sH_info'
/home/matt/ghc-with-debug/_build/stage1/lib/../lib/x86_64-linux-ghc-9.3.20210928/libHSCabal-3.5.0.0-ghc9.3.20210928.so: error: undefined reference to '.Lc5sm_info'
collect2: error: ld returned 1 exit status
`cc' failed in phase `Linker'. (Exit code: 1)
Development.Shake.cmd, system command failed
```
Unfortunately, by the time this optimisation happens the structure of
the CmmInfoTable has been lost, we only have the generated code for the
info table to play with so we can no longer just collect all the used
info tables and generate the IPE map.
This leaves us with two options:
1. Return a list of the names of the discarded info tables and then
remove them from the map. This is awkward because we need to do code
generation for the map as well.
2. Just disable this small code size optimisation when -finfo-table-map
is enabled. The option produces very big object files anyway.
Option 2 is much easier to implement and means we don't have to thread
information around awkwardly. It's at the cost of slightly larger object
files (as dead code is not eliminated).
Disabling this optimisation allows an IPE build of GHC to complete
successfully.
Fixes #20428
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Old strategy: For each variable linearly scan through all the blocks and
check to see if the variable is any of the block register mappings. This
is very slow when you have a lot of blocks.
New strategy: Maintain a map from virtual registers to the first real
register the virtual register was assigned to. Consult this map in
findPrefRealReg.
The map is updated when the register mapping is updated and is hidden
behind the BlockAssigment abstraction.
On the mmark package this reduces compilation time from about 44s to
32s.
Ticket: #19471
|
|
|
|
|
|
| |
Removed an intermediate list via a fold.
realRegsAlias: Manually inlined the list functions to get better code.
Linear.hs added a bang somewhere.
|
|
|
|
| |
and exhibit similar behaviors. Issue 20400
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This function was one of the main sources of allocation in a ticky
profile due to how it repeatedly deleted nodes from a large map.
Now firstly the cuts are normalised, so that chains of cuts are elimated
before any rewrites are applied. Then the CFG is traversed and
reconstructed once whilst applying the necessary rewrites to remove
shortcutted edges (based on the normalised cuts).
Ticket: #19471
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous algorithm scaled poorly when there was a large number of
blocks and edges.
The algorithm links together block chains which have edges between them
in the CFG. The new algorithm uses a union find data structure in order
to efficiently merge together blocks and calculate which block chain
each block id belonds to.
I copied the UnionFind data structure which already existed in Cabal
into the GHC library rathert than reimplement it myself.
This change results in a very significant reduction in allocations when
compiling the mmark package.
Ticket: #19471
|
|
|
|
|
|
|
|
|
|
| |
Before this change, the whole map would be traversed in order to delete
a node from the graph before calculating successors. This is quite
inefficient if the CFG is big, as was the case in the mmark package. A
more efficient alternative is to leave the CFG untouched and then just
delete the node once after the lookups have been performed.
Ticket: #19471
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
We were ending up with a big 1GB thunk spike as the `fmap` operation did
not force the key values promptly.
This fixes the high maximum memory consumption when compiling the mmark
package. Compilation is still slow and allocates a lot more than
previous releases.
Related to #19471
|
|
|
|
|
|
| |
As noted in #18183, these cases were previously incorrect and unused.
Closes #18183.
|
|
|
|
| |
Closes #20275
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to make the packages in this repo "reinstallable", we need to
associate source code with a specific packages. Having a top level
`/includes` dir that mixes concerns (which packages' includes?) gets in
the way of this.
To start, I have moved everything to `rts/`, which is mostly correct.
There are a few things however that really don't belong in the rts (like
the generated constants haskell type, `CodeGen.Platform.h`). Those
needed to be manually adjusted.
Things of note:
- No symlinking for sake of windows, so we hard-link at configure time.
- `CodeGen.Platform.h` no longer as `.hs` extension (in addition to
being moved to `compiler/`) so as not to confuse anyone, since it is
next to Haskell files.
- Blanket `-Iincludes` is gone in both build systems, include paths now
more strictly respect per-package dependencies.
- `deriveConstants` has been taught to not require a `--target-os` flag
when generating the platform-agnostic Haskell type. Make takes
advantage of this, but Hadrian has yet to.
|
|
|
|
|
|
| |
Apparently we need some padding as well.
Fixes #20137
|
|
|
|
|
|
|
|
|
|
|
| |
PPC NCG: Implement CAS inline for 32 and 64 bit
testsuite: Add tests for smaller atomic CAS
X86 NCG: Catch calls to CAS C fallback
Primops: Add atomicCasWord[8|16|32|64]Addr#
Add tests for atomicCasWord[8|16|32|64]Addr#
Add changelog entry for new primops
X86 NCG: Fix MO-Cmpxchg W64 on 32-bit arch
ghc-prim: 64-bit CAS C fallback on all archs
|