| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
| |
Previously we only preserved the bottom 64-bits of the callee-saved
128-bit XMM registers, in violation of the Win64 calling convention.
Fix this.
Fixes #21465.
|
| | |
|
| | |
|
| | |
|
| | |
|
| |
|
|
|
|
|
| |
On Windows with high-entropy ASLR we must use %rip-relative addressing
to avoid overflowing the signed 32-bit immediate size of x86-64.
Since %rip-relative addressing comes essentially for free and can make
linking significantly easier, we use it on all platforms.
|
| |
|
|
|
|
|
|
|
|
| |
Previously while constructing the jump table index we would
zero-extend the discriminant before subtracting the start of the
jump-table. This goes subtly wrong in the case of a sub-word, signed
discriminant, as described in the included Note. Fix this in both the
PPC and X86 NCGs.
Fixes #21186.
|
| |
|
|
|
|
|
|
| |
Several 64-bit operation were implemented with FFI calls on 32-bit
architectures but we can easily implement them with inline assembly
code.
Also remove unused hs_int64ToWord64 and hs_word64ToInt64 C functions.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
* add getLocalRegReg to avoid allocating a CmmLocal just to call
getRegisterReg
* 64-bit registers: in the general case we must always use the virtual
higher part of the register, so we might as well always return it with
the lower part. The only exception is to implement 64-bit to 32-bit
conversions. We now have to explicitly discard the higher part when
matching on Reg64/RegCode64 datatypes instead of explicitly fetching
the higher part from the lower part: much safer default.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Preliminary work done to make working on #5444 easier.
Mostly make make control-flow easier to follow:
* renamed genCCall into genForeignCall
* split genForeignCall into the part dispatching on PrimTarget (genPrim) and
the one really generating code for a C call (cf ForeignTarget and genCCall)
* made genPrim/genSimplePrim only dispatch on MachOp: each MachOp now
has its own code generation function.
* out-of-line primops are not handled in a partial `outOfLineCmmOp`
anymore but in the code generation functions directly. Helper
functions have been introduced (e.g. genLibCCall) for code sharing.
* the latter two bullets make code generated for primops that are only
sometimes out-of-line (e.g. Pdep or Memcpy) and the logic to select
between inline/out-of-line much more localized
* avoided passing is32bit as an argument as we can easily get it from NatM
state when we really need it
* changed genCCall type to avoid it being partial (it can't handle
PrimTarget)
* globally removed 12 calls to `panic` thanks to better control flow and
types ("parse, don't validate" ftw!).
|
| |
|
|
|
| |
`DynFlags` is gone, but let's move a few trivial things around to get
rid of its module too.
|
| | |
|
| | |
|
| | |
|
| | |
|
| |
|
|
|
|
| |
This was achieved with
git ls-tree --name-only HEAD -r | xargs sed -i -e 's/note \[/Note \[/g'
|
| |
|
|
| |
SPARC was its last and only user.
|
| | |
|
| |
|
|
| |
Extracted from !6622
|
| |
|
|
|
|
|
|
|
|
|
| |
Handle the case of a shift larger than the width of the shifted value.
This is necessary since x86 applies a mask of 0x1f to the shift amount,
meaning that, e.g., `shr 47, $eax` will actually shift by
47 & 0x1f == 15.
See #20626.
(cherry picked from commit 31370f1afe1e2f071b3569fb5ed4a115096127ca)
|
| | |
|
| |
|
|
| |
and exhibit similar behaviors. Issue 20400
|
| |
|
|
|
|
| |
As noted in #18183, these cases were previously incorrect and unused.
Closes #18183.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to make the packages in this repo "reinstallable", we need to
associate source code with a specific packages. Having a top level
`/includes` dir that mixes concerns (which packages' includes?) gets in
the way of this.
To start, I have moved everything to `rts/`, which is mostly correct.
There are a few things however that really don't belong in the rts (like
the generated constants haskell type, `CodeGen.Platform.h`). Those
needed to be manually adjusted.
Things of note:
- No symlinking for sake of windows, so we hard-link at configure time.
- `CodeGen.Platform.h` no longer as `.hs` extension (in addition to
being moved to `compiler/`) so as not to confuse anyone, since it is
next to Haskell files.
- Blanket `-Iincludes` is gone in both build systems, include paths now
more strictly respect per-package dependencies.
- `deriveConstants` has been taught to not require a `--target-os` flag
when generating the platform-agnostic Haskell type. Make takes
advantage of this, but Hadrian has yet to.
|
| |
|
|
|
|
|
|
|
|
|
| |
PPC NCG: Implement CAS inline for 32 and 64 bit
testsuite: Add tests for smaller atomic CAS
X86 NCG: Catch calls to CAS C fallback
Primops: Add atomicCasWord[8|16|32|64]Addr#
Add tests for atomicCasWord[8|16|32|64]Addr#
Add changelog entry for new primops
X86 NCG: Fix MO-Cmpxchg W64 on 32-bit arch
ghc-prim: 64-bit CAS C fallback on all archs
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The issue was the renderer for x86 addressing modes assumes native size
registers, but we were passing in a possibly-smaller index in
conjunction with a native-sized base pointer.
The easist thing to do is just extend the register first.
I also changed the other NGC backends implementing jump tables
accordingly. On one hand, I think PowerPC and Sparc don't have the small
sub-registers anyways so there is less to worry about. On the other
hand, to the extent that's true the zero extension can become a no-op.
I should give credit where it's due: @hsyl20 really did all the work for
me in
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4717#note_355874,
but I was daft and missed the "Oops" and so ended up spending a silly
amount of time putting it all back together myself.
The unregisterised backend change is a bit different, because here we
are translating the actual case not a jump table, and the fix is to
handle right-sized literals not addressing modes. But it makes sense to
include here too because it's the same change in the subsequent commit
that exposes both bugs.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Word64#/Int64# are only used on 32-bit architectures. Before this patch,
operations on these types were directly using the FFI. Now we use real
primops that are then lowered into ccalls.
The advantage of doing this is that we can now perform constant folding on
Word64#/Int64# (#19024).
Most of this work was done by John Ericson in !3658. However this patch
doesn't go as far as e.g. changing Word64 to always be using Word64#.
Noticeable performance improvements
T9203(normal) run/alloc 89870808.0 66662456.0 -25.8% GOOD
haddock.Cabal(normal) run/alloc 14215777340.8 12780374172.0 -10.1% GOOD
haddock.base(normal) run/alloc 15420020877.6 13643834480.0 -11.5% GOOD
Metric Decrease:
T9203
haddock.Cabal
haddock.base
|
| |
|
|
|
| |
When arguments are 8 *or 16* bits wide, then truncate before/after
and use the 32bit operation.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In which we add a new code generator to the Glasgow Haskell
Compiler. This codegen supports ELF and Mach-O targets, thus covering
Linux, macOS, and BSDs in principle. It was tested only on macOS and
Linux. The NCG follows a similar structure as the other native code
generators we already have, and should therfore be realtively easy to
follow.
It supports most of the features required for a proper native code
generator, but does not claim to be perfect or fully optimised. There
are still opportunities for optimisations.
Metric Decrease:
ManyAlternatives
ManyConstructors
MultiLayerModules
PmSeriesG
PmSeriesS
PmSeriesT
PmSeriesV
T10421
T10421a
T10858
T11195
T11276
T11303b
T11374
T11822
T12227
T12545
T12707
T13035
T13253
T13253-spj
T13379
T13701
T13719
T14683
T14697
T15164
T15630
T16577
T17096
T17516
T17836
T17836b
T17977
T17977b
T18140
T18282
T18304
T18478
T18698a
T18698b
T18923
T1969
T3064
T5030
T5321FD
T5321Fun
T5631
T5642
T5837
T783
T9198
T9233
T9630
T9872d
T9961
WWRec
Metric Increase:
T4801
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Suppose a safe call: myCall(x,y,z)
It is lowered into three unsafe calls in Cmm:
r = suspendThread(...);
myCall(x,y,z);
resumeThread(r);
Consider the following situation for myCall arguments:
x = Sp[..] -- stack
y = Hp[..] -- heap
z = R1 -- global register
r = suspendThread(...);
myCall(x,y,z);
resumeThread(r);
The sink pass assumes that unsafe calls clobber memory (heap and stack),
hence x and y assignments are not sunk after `suspendThread`. The sink
pass also correctly handles global register clobbering for all unsafe
calls, except `suspendThread`!
`suspendThread` is special because it releases the capability the thread
is running on. Hence the sink pass must also take into account global
registers that are mapped into memory (in the capability).
In the example above, we could get:
r = suspendThread(...);
z = R1
myCall(x,y,z);
resumeThread(r);
But this transformation isn't valid if R1 is (BaseReg->rR1) as BaseReg
is invalid between suspendThread and resumeThread. This caused argument
corruption at least with the C backend ("unregisterised") in #19237.
Fix #19237
|
| | |
|
| |
|
|
|
|
|
|
|
|
| |
Replace uses of WARN macro with calls to:
warnPprTrace :: Bool -> SDoc -> a -> a
Remove the now unused HsVersions.h
Bump haddock submodule
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is no reason to use CPP. __LINE__ and __FILE__ macros are now
better replaced with GHC's CallStack. As a bonus, assert error messages
now contain more information (function name, column).
Here is the mapping table (HasCallStack omitted):
* ASSERT: assert :: Bool -> a -> a
* MASSERT: massert :: Bool -> m ()
* ASSERTM: assertM :: m Bool -> m ()
* ASSERT2: assertPpr :: Bool -> SDoc -> a -> a
* MASSERT2: massertPpr :: Bool -> SDoc -> m ()
* ASSERTM2: assertPprM :: m Bool -> SDoc -> m ()
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. `text` is as efficient as `ptext . sLit` thanks to the rewrite rules
2. `text` is visually nicer than `ptext . sLit`
3. `ptext . sLit` encourages using one `ptext` for several `sLit` as in:
ptext $ case xy of
... -> sLit ...
... -> sLit ...
which may allocate SDoc's TextBeside constructors at runtime instead
of sharing them into CAFs.
|
| |
|
|
|
|
|
| |
This allows us to use the unsafe shifts in non-debug builds for performance.
For older versions of base we instead export Data.Bits
See also #19618
|
| |
|
|
|
| |
Metric Increase:
MultiLayerModules
|
| |
|
|
|
|
|
|
| |
GHCi needs to know the types of all breakpoints, but it's
not possible to get the exprType of any expression in STG.
This is preparation for the upcoming change to make GHCi
bytecode from STG instead of Core.
|
| |
|
|
|
|
|
| |
Now that GHC 9.0.1 is released, it is time to drop support for bootstrapping
with GHC 8.8, as we only support building with the previous two major GHC
releases. As an added bonus, this allows us to remove several bits of CPP that
are either always true or no longer reachable.
|
| | |
|
| |
|
|
| |
This reuses the codegen used for ByteArray#'s atomic primops.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Previously the `.debug_aranges` and `.debug_info` (DIE) DWARF
information would claim that procedures (represented with a
`DW_TAG_subprogram` DIE) would only span the range covered by their entry
block. This omitted all of the continuation blocks (represented by
`DW_TAG_lexical_block` DIEs), confusing `perf`. Fix this by introducing
a end-of-procedure label and using this as the `DW_AT_high_pc` of
procedure `DW_TAG_subprogram` DIEs
Fixes #17605.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It turns out that some important native debugging/profiling tools (e.g.
perf) rely only on symbol tables for function name resolution (as
opposed to using DWARF DIEs). However, previously GHC would emit
temporary symbols (e.g. `.La42b`) to identify module-internal
entities. Such symbols are dropped during linking and therefore not
visible to runtime tools (in addition to having rather un-helpful unique
names). For instance, `perf report` would often end up attributing all
cost to the libc `frame_dummy` symbol since Haskell code was no covered
by any proper symbol (see #17605).
We now rather follow the model of C compilers and emit
descriptively-named local symbols for module internal things. Since this
will increase object file size this behavior can be disabled with the
`-fno-expose-internal-symbols` flag.
With this `perf record` can finally be used against Haskell executables.
Even more, with `-g3` `perf annotate` provides inline source code.
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
| |
We no compare these by doing 64bit subtraction and
checking the resulting flags.
We used to do this differently but the old approach was
broken when the high bits compared equal and the comparison
was one of >= or <=.
The new approach should be both correct and faster.
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On windows the stack has to be allocated 4k at a time, otherwise we get
a segfault. This is done by using a helper ___chkstk_ms that is provided
by libgcc. The Haskell side already knows how to handle this but we need
to do the same from STG. Previously we would drop the stack in StgRun
but would only make it valid whenever the scheduler loop ran.
This approach was fundamentally broken in that it falls apart when you
take a signal from the OS. We see it less often because you initially
get allocated a 1MB stack block which you have to blow past first.
Concretely this means we must always keep the stack valid.
Fixes #18601.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Move the atomix exchange over the Ptr type to an internal module.
Fix a bug caused by us passing ptr-to-ptr instead of ptr to
atomic exchange.
Renamed interlockedExchange to exchangePtr.
I've also added an cas primitive. It turned out we don't need it
for WinIO but I'm leaving it in as it's useful for other things.
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some types need a Platform value to be pretty-printed: CLabel, Cmm
types, instructions, etc.
Before this patch they had an Outputable instance and the Platform value
was obtained via sdocWithDynFlags. It meant that the *renderer* of the
SDoc was responsible of passing the appropriate Platform value (e.g. via
the DynFlags given to showSDoc). It put the burden of passing the
Platform value on the renderer while the generator of the SDoc knows the
Platform it is generating the SDoc for and there is no point passing a
different Platform at rendering time.
With this patch, we introduce a new OutputableP class:
class OutputableP a where
pdoc :: Platform -> a -> SDoc
With this class we still have some polymorphism as we have with `ppr`
(i.e. we can use `pdoc` on a variety of types instead of having a
dedicated `pprXXX` function for each XXX type).
One step closer removing `sdocWithDynFlags` (#10143) and supporting
several platforms (#14335).
|
| |
|
|
|
|
|
|
|
|
|
|
| |
GNU as and the AIX assembler support floating point literals.
SPARC seems to have support too but I cannot test on SPARC.
Curiously, `doubleToBytes` is also used in the LLVM backend.
To avoid endianness issues when cross-compiling float and double literals
are printed as C-style floating point values. The assembler then takes
care of memory layout and endianness.
This was brought up in #18431 by @hsyl20.
|