| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
| |
|
| |
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
| |
|
|
|
|
|
|
| |
This way CPP conditionals can be avoided for the transition period.
Signed-off-by: Herbert Valerio Riedel <hvr@gnu.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds support for several new primitive operations which
support using processor-specific instructions to help guide data and
cache locality decisions. We have levels ranging from [0..3]
For LLVM, we generate llvm.prefetch intrinsics at the proper locality
level (similar to GCC.)
For x86 we generate prefetch{NTA, t2, t1, t0} instructions. On SPARC and
PowerPC, the locality levels are ignored.
This closes #8256.
Authored-by: Carter Tazio Schonwald <carter.schonwald@gmail.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The problem with unreachable code is that it might refer to undefined
registers. This happens accidentally: a block can be orphaned by an
optimisation, for example when the result of a comparsion becomes
known.
The register allocator panics when it finds an undefined register,
because they shouldn't occur in generated code. So we need to also
discard unreachable code to prevent this panic being triggered by
optimisations.
The register alloator already does a strongly-connected component
analysis, so it ought to be easy to make it discard unreachable code
as part of that traversal. It turns out that we need a different
variant of the scc algorithm to do that (see Digraph), however the new
variant also generates slightly better code by putting the blocks
within a loop in a better order for register allocation.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
width and element type.
SIMD primops are now polymorphic in vector size and element type, but
only internally to the compiler. More specifically, utils/genprimopcode
has been extended so that it "knows" about SIMD vectors. This allows us
to, for example, write a single definition for the "add two vectors"
primop in primops.txt.pp and have it instantiated at many vector types.
This generates a primop in GHC.Prim for each vector type at which "add
two vectors" is instantiated, but only one data constructor for the
PrimOp data type, so the code generator is much, much simpler.
|
|
|
|
|
| |
Authored-by: David Luposchainsky <dluposchainsky@gmail.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch encompasses most of the basic infrastructure for GHCJS. It
includes:
* A new extension, -XJavaScriptFFI
* A new architecture, ArchJavaScript
* Parser and lexer support for 'foreign import javascript', only
available under -XJavaScriptFFI, using ArchJavaScript.
* As a knock-on, there is also a new 'WayCustom' constructor in
DynFlags, so clients of the GHC API can add custom 'tags' to their
built files. This should be useful for other users as well.
The remaining changes are really just the resulting fallout, making sure
all the cases are handled appropriately for DynFlags and Platform.
Authored-by: Luite Stegeman <stegeman@gmail.com>
Signed-off-by: Austin Seipp <aseipp@pobox.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Exposes bSwap{,16,32,64}# primops
* Add a new machop: MO_BSwap
* Use a Stg implementation (hs_bswap{16,32,64}) for other implementation
in NCG.
* Generate bswap in X86 NCG for 32 and 64 bits, and for 16 bits, bswap+shr
instead of using xchg.
* Generate llvm.bswap intrinsics in llvm codegen.
Authored-by: Vincent Hanquez <tab@snarc.org>
Signed-off-by: Austin Seipp <aseipp@pobox.com>
|
|
|
|
|
|
| |
Clang doesn't like whitespace between macro and arguments.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
|
|
|
|
| |
This reverts commit 1c5b0511a89488f5280523569d45ee61c0d09ffa.
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Exposes bSwap{,16,32,64}# primops
* Add a new machops MO_BSwap
* Use a Stg implementation (hs_bswap{16,32,64}) for other implementation
in NCG.
* Generate bswap in X86 NCG for 32 and 64 bits, and for 16 bits, bswap+shr
instead of using xchg.
* Generate llvm.bswap intrinsics in llvm codegen.
Patch from Vincent Hanquez.
|
|
|
|
| |
It doesn't actually use it yet
|
| |
|
|
|
|
|
| |
It now has its own class, and the addImport function is defined in that
class, rather than needing to be passed as an argument.
|
| |
|
|
|
|
|
| |
Darwin x86 has inconsistent PIC base register, so splitting (which happened before)
ensures that each cmm procedure only has one entry point (namely the first block).
|
|
|
|
|
|
| |
per cmm procedure on Darwin/PPC, because of splitting.
x86 should be treated the same way, I'll come back to that later.
|
|
|
|
| |
add FETCHPC to all of them (this fixes #7814).
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
I don't think the x86-64 version is quite right, but this ought to be
enough to pass cgrun071.
This code is terrible and needs a complete refactor. There's a lot of
duplication, and we ought to be specifying the ABI in a much more
abstract way (like LLVM).
|
|
|
|
| |
Thanks to @PHO on #7498 for pointing this out.
|
|
|
|
| |
This patch is ported from #7510, which fixes the same bug in the x86 nativeGen.
|
|
|
|
| |
We have to reduce the maximum number of instructions to jump over depending on the number of info tables in a proc.
|
|
|
|
| |
Its implementation is totally specific to PPC.
|
| |
|
|
|
|
|
|
|
| |
This patch adds support for 6 XMM registers on x86-64 which overlap with the F
and D registers and may hold 128-bit wide SIMD vectors. Because there is not a
good way to attach type information to STG registers, we aggressively bitcast in
the LLVM back-end.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch lays the groundwork needed for primop support for SIMD vectors. In
addition to the groundwork, we add support for the FloatX4# primitive type and
associated primops.
* Add the FloatX4# primitive type and associated primops.
* Add CodeGen support for Float vectors.
* Compile vector operations to LLVM vector operations in the LLVM code
generator.
* Make the x86 native backend fail gracefully when encountering vector primops.
* Only generate primop wrappers for vector primops when using LLVM.
|
| |
|
|\ |
|
| |
| |
| |
| |
| |
| | |
I don't actually know if suggesting -fllvm as a workaround is useful
advice, but -fvia-C certainly won't help as it doesn't do anything
any more.
|
| | |
|
|/
|
|
|
|
|
|
|
|
|
| |
This will add the following preprocessor defines when Haskell source
files are compiled:
* __SSE__ - If any version of SSE is enabled
* __SSE2__ - If SSE2 or greater is enabled
* __SSE4_2_ - If SSE4.2 is enabled
Note that SSE2 is enabled by default on x86-64.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There were four bugs here. Clearly I didn't test this enough to
expose the bugs - it appeared to work on x86/Linux, but completely by
accident it seems.
1. the delta was wrong by a factor of the slot size (as noted on #7498)
2. we weren't correctly aligning the stack pointer (sp needs to be
16-byte aligned on x86/x86_64)
3. we were doing the adjustment multiple times in the case of a block
that was both a return point and a local branch target. To fix this I
had to add new shim blocks to adjust the stack pointer, and retarget
the original branches. See comment for details.
4. we were doing the adjustment for CALL instructions, which is
unnecessary and wrong; only JMPs should be preceded by a stack
adjustment.
(Someone with a PPC box will need to update the PPC version of
allocMoreStack to fix the above bugs, using the x86 version as a
guide.)
|
| |
|
|
|
|
| |
Fixes #7498.
|
| |
|
| |
|
| |
|
| |
|