| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
- Remove unneeded ones
- Use <..> for inter-package.
Besides general clean up, helps distinguish between the RTS we link
against vs the RTS we compile for.
|
| |
|
|
|
|
|
|
| |
Add StgToCmm module hierarchy. Platform modules that are used in several
other places (NCG, LLVM codegen, Cmm transformations) are put into
GHC.Platform.
|
|
|
|
|
|
|
| |
Unfortunately this will require more work; register allocation is
quite broken.
This reverts commit acd795583625401c5554f8e04ec7efca18814011.
|
|
|
|
|
|
|
| |
This adds support for constructing vector types from Float#, Double# etc
and performing arithmetic operations on them
Cleaned-Up-By: Ben Gamari <ben@well-typed.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Here the following changes are introduced:
- A read barrier machine op is added to Cmm.
- The order in which a closure's fields are read and written is changed.
- Memory barriers are added to RTS code to ensure correctness on
out-or-order machines with weak memory ordering.
Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this
is lowered to an instruction that ensures memory reads that occur after said
instruction in program order are not performed before reads coming before said
instruction in program order. On machines with strong memory ordering properties
(e.g. X86, SPARC in TSO mode) no such instruction is necessary, so
MO_ReadBarrier is simply erased. However, such an instruction is necessary on
weakly ordered machines, e.g. ARM and PowerPC.
Weam memory ordering has consequences for how closures are observed and mutated.
For example, consider a closure that needs to be updated to an indirection. In
order for the indirection to be safe for concurrent observers to enter, said
observers must read the indirection's info table before they read the
indirectee. Furthermore, the entering observer makes assumptions about the
closure based on its info table contents, e.g. an INFO_TYPE of IND imples the
closure has an indirectee pointer that is safe to follow.
When a closure is updated with an indirection, both its info table and its
indirectee must be written. With weak memory ordering, these two writes can be
arbitrarily reordered, and perhaps even interleaved with other threads' reads
and writes (in the absence of memory barrier instructions). Consider this
example of a bad reordering:
- An updater writes to a closure's info table (INFO_TYPE is now IND).
- A concurrent observer branches upon reading the closure's INFO_TYPE as IND.
- A concurrent observer reads the closure's indirectee and enters it. (!!!)
- An updater writes the closure's indirectee.
Here the update to the indirectee comes too late and the concurrent observer has
jumped off into the abyss. Speculative execution can also cause us issues,
consider:
- An observer is about to case on a value in closure's info table.
- The observer speculatively reads one or more of closure's fields.
- An updater writes to closure's info table.
- The observer takes a branch based on the new info table value, but with the
old closure fields!
- The updater writes to the closure's other fields, but its too late.
Because of these effects, reads and writes to a closure's info table must be
ordered carefully with respect to reads and writes to the closure's other
fields, and memory barriers must be placed to ensure that reads and writes occur
in program order. Specifically, updates to a closure must follow the following
pattern:
- Update the closure's (non-info table) fields.
- Write barrier.
- Update the closure's info table.
Observing a closure's fields must follow the following pattern:
- Read the closure's info pointer.
- Read barrier.
- Read the closure's (non-info table) fields.
This patch updates RTS code to obey this pattern. This should fix long-standing
SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting
out-of-order execution) and PowerPC. This fixes issue #15449.
Co-Authored-By: Ben Gamari <ben@well-typed.com>
|
|
|
|
|
|
|
| |
ghc-pkg needs to be aware of platforms so it can figure out which
subdire within the user package db to use. This is admittedly
roundabout, but maybe Cabal could use the same notion of a platform as
GHC to good affect too.
|
|
|
|
|
| |
Previously log and exp were primitives yet log1p and expm1 were FFI
calls. Fix this non-uniformity.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On powerpc32 64-bit comparison code generated dangling
target labels. This caused ghc build failure as:
$ ./configure --target=powerpc-unknown-linux-gnu && make
...
SCCs aren't in reverse dependent order
bad blockId n3U
This happened because condIntCode' in PPC codegen generated
label name but did not place the label into `cmp_lo` code block.
The change adds the `cmp_lo` label into the case of negative
comparison.
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* simplifies registers to have GPR, Float and Double, by removing the SSE2 and X87 Constructors
* makes -msse2 assumed/default for x86 platforms, fixing a long standing nondeterminism in rounding
behavior in 32bit haskell code
* removes the 80bit floating point representation from the supported float sizes
* theres still 1 tiny bit of x87 support needed,
for handling float and double return values in FFI calls wrt the C ABI on x86_32,
but this one piece does not leak into the rest of NCG.
* Lots of code thats not been touched in a long time got deleted as a
consequence of all of this
all in all, this change paves the way towards a lot of future further
improvements in how GHC handles floating point computations, along with
making the native code gen more accessible to a larger pool of contributors.
|
|
|
|
|
|
| |
This commit includes the necessary changes in code and
documentation to support a primop that reverses a word's
bits. It also includes a test.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We make liveness information for global registers
available on `JMP` and `BCTR`, which were the last instructions
missing. With complete liveness information we do not need to
reserve global registers in `freeReg` anymore. Moreover we
assign R9 and R10 to callee saves registers.
Cleanup by removing `Reg_Su`, which was unused, from `freeReg`
and removing unused register definitions.
The calculation of the number of floating point registers is too
conservative. Just follow X86 and specify the constants directly.
Overall on PowerPC this results in 0.3 % smaller code size in nofib
while runtime is slightly better in some tests.
|
|
|
|
| |
Fixes #16222
|
|
|
|
|
| |
Rename constructors in calling convention data type to reflect the
fact that they represent an ELF ABI not only a Linux ABI.
|
| |
|
|
|
|
|
| |
All operating systems except AIX and Darwin follow the ELF
specification.
|
|
|
|
|
|
|
| |
Support for Mac OS X on PowerPC has been dropped by Apple years ago. We
follow suit and remove PowerPC support for Darwin.
Fixes #16106.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Handle Int*QuotRemOP and Word*QuotRemOp in PPC NCG.
Refactor common code with remainder operation.
Test Plan: validate (I validated on Linux powerpc64le and x86_64)
Reviewers: erikd, hvr, bgamari, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5323
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Generate code for MachOps with smaller than wordsize data.
Refactor conversion MachOps.
Fixes #15854
Test Plan: validate (I validated on powerpc64le and x86_64 Linux)
Reviewers: bgamari, hvr, erikd, simonmar
Subscribers: rwbarton, carter
GHC Trac Issues: #15854
Differential Revision: https://phabricator.haskell.org/D5300
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch implements a new code layout algorithm.
It has been tested for x86 and is disabled on other platforms.
Performance varies slightly be CPU/Machine but in general seems to be better
by around 2%.
Nofib shows only small differences of about +/- ~0.5% overall depending on
flags/machine performance in other benchmarks improved significantly.
Other benchmarks includes at least the benchmarks of: aeson, vector, megaparsec, attoparsec,
containers, text and xeno.
While the magnitude of gains differed three different CPUs where tested with
all getting faster although to differing degrees. I tested: Sandy Bridge(Xeon), Haswell,
Skylake
* Library benchmark results summarized:
* containers: ~1.5% faster
* aeson: ~2% faster
* megaparsec: ~2-5% faster
* xml library benchmarks: 0.2%-1.1% faster
* vector-benchmarks: 1-4% faster
* text: 5.5% faster
On average GHC compile times go down, as GHC compiled with the new layout
is faster than the overhead introduced by using the new layout algorithm,
Things this patch does:
* Move code responsilbe for block layout in it's own module.
* Move the NcgImpl Class into the NCGMonad module.
* Extract a control flow graph from the input cmm.
* Update this cfg to keep it in sync with changes during
asm codegen. This has been tested on x64 but should work on x86.
Other platforms still use the old codelayout.
* Assign weights to the edges in the CFG based on type and limited static
analysis which are then used for block layout.
* Once we have the final code layout eliminate some redundant jumps.
In particular turn a sequences of:
jne .foo
jmp .bar
foo:
into
je bar
foo:
..
Test Plan: ci
Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott
Reviewed By: RyanGlScott
Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton
GHC Trac Issues: #15124
Differential Revision: https://phabricator.haskell.org/D4726
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: hvr, bgamari, simonmar, jrtc27
Reviewed By: bgamari
Subscribers: alpmestan, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D5034
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This change makes it possible to generate a static 32-bit relative label
offset on x86_64. Currently we can only generate word-sized label
offsets.
This will be used in D4634 to shrink info tables. See D4632 for more
details.
Test Plan: See D4632
Reviewers: bgamari, niteria, michalt, erikd, jrtc27, osa1
Subscribers: thomie, carter
Differential Revision: https://phabricator.haskell.org/D4633
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is mostly for congruence with 'subWordC#' and '{add,sub}IntC#'.
I found 'plusWord2#' while implementing this, which both lacks
documentation and has a slightly different specification than
'addWordC#', which means the generic implementation is unnecessarily
complex.
While I was at it, I also added lacking meta-information on PrimOps
and refactored 'subWordC#'s generic implementation to be branchless.
Reviewers: bgamari, simonmar, jrtc27, dfeuer
Reviewed By: bgamari, dfeuer
Subscribers: dfeuer, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4592
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is required by D4363. D4362 has the implementation for i386
this commit adds PowerPC.
Test Plan: validate
Reviewers: erikd, hvr, simonmar, bgamari
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4468
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds support for the bit deposit and extraction operations provided
by the BMI and BMI2 instruction set extensions on modern amd64 machines.
Implement x86 code generator for pdep and pext. Properly initialise
bmiVersion field.
pdep and pext test cases
Fix pattern match for pdep and pext instructions
Fix build of pdep and pext code for 32-bit architectures
Test Plan: Validate
Reviewers: austin, simonmar, bgamari, angerman
Reviewed By: bgamari
Subscribers: trommler, carter, angerman, thomie, rwbarton, newhoggy
GHC Trac Issues: #14206
Differential Revision: https://phabricator.haskell.org/D4236
|
|
|
|
|
|
| |
This broke the 32-bit build.
This reverts commit f5dc8ccc29429d0a1d011f62b6b430f6ae50290c.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds support for the bit deposit and extraction operations provided
by the BMI and BMI2 instruction set extensions on modern amd64 machines.
Test Plan: Validate
Reviewers: austin, simonmar, bgamari, hvr, goldfire, erikd
Reviewed By: bgamari
Subscribers: goldfire, erikd, trommler, newhoggy, rwbarton, thomie
GHC Trac Issues: #14206
Differential Revision: https://phabricator.haskell.org/D4063
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit rGHC8b007ab assigns the same label to the first basic block
of a proc and to the proc entry point. This violates the PPC 64-bit ELF
v. 1.9 and v. 2.0 ABIs and leads to duplicate symbols.
This patch fixes duplicate symbols caused by block labels
In commit rGHCd7b8da1 an info table label is generated from a block id.
Getting the entry label from that info label leads to an undefined
symbol because a suffix "_entry" that is not present in the block label.
To fix that issue add a new info table label flavour for labels
derived from block ids. Converting such a label with toEntryLabel
produces the original block label.
Fixes #14311
Test Plan: ./validate
Reviewers: austin, bgamari, simonmar, erikd, hvr, angerman
Reviewed By: bgamari
Subscribers: rwbarton, thomie
GHC Trac Issues: #14311
Differential Revision: https://phabricator.haskell.org/D4149
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement AtomicRMW ops, atomic read, atomic write
in PowerPC native code generator. Also implement
branch prediction because we need it in atomic ops
anyway.
This patch improves the issue in #12537 a bit but
does not fix it entirely.
The fallback operations for atomicread and atomicwrite
in libraries/ghc-prim/cbits/atomic.c are incorrect.
This patch avoids those functions by implementing the
operations directly in the native code generator. This
is also what the x86/amd64 NCG and the LLVM backend
do.
Test Plan: validate on AIX and PowerPC (32-bit) Linux
Reviewers: erikd, hvr, austin, bgamari, simonmar
Reviewed By: hvr, bgamari
Subscribers: rwbarton, thomie
GHC Trac Issues: #12537
Differential Revision: https://phabricator.haskell.org/D3984
|
|
|
|
|
|
|
|
|
|
|
|
| |
Depends on D4090
Reviewers: austin, bgamari, erikd, simonmar, alexbiehl
Reviewed By: bgamari
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D4091
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This switches the compiler/ component to get compiled with
-XNoImplicitPrelude and a `import GhcPrelude` is inserted in all
modules.
This is motivated by the upcoming "Prelude" re-export of
`Semigroup((<>))` which would cause lots of name clashes in every
modulewhich imports also `Outputable`
Reviewers: austin, goldfire, bgamari, alanz, simonmar
Reviewed By: bgamari
Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari
Differential Revision: https://phabricator.haskell.org/D3989
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes #14221, where the NCG and the DWARF code were apparently
giving two different names to the same block.
Test Plan: Validate with DWARF support enabled.
Reviewers: simonmar, austin
Subscribers: rwbarton, thomie
GHC Trac Issues: #14221
Differential Revision: https://phabricator.haskell.org/D3977
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously due to #12759 we disabled PIE support entirely. However, this
breaks the user's ability to produce PIEs. Add an explicit flag, -fPIE,
allowing the user to build PIEs.
Test Plan: Validate
Reviewers: rwbarton, austin, simonmar
Subscribers: trommler, simonmar, trofi, jrtc27, thomie
GHC Trac Issues: #12759, #13702
Differential Revision: https://phabricator.haskell.org/D3589
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This copies the subset of Hoopl's functionality needed by GHC to
`cmm/Hoopl` and removes the dependency on the Hoopl package.
The main motivation for this change is the confusing/noisy interface
between GHC and Hoopl:
- Hoopl has `Label` which is GHC's `BlockId` but different than
GHC's `CLabel`
- Hoopl has `Unique` which is different than GHC's `Unique`
- Hoopl has `Unique{Map,Set}` which are different than GHC's
`Uniq{FM,Set}`
- GHC has its own specialized copy of `Dataflow`, so `cmm/Hoopl` is
needed just to filter the exposed functions (filter out some of the
Hoopl's and add the GHC ones)
With this change, we'll be able to simplify this significantly.
It'll also be much easier to do invasive changes (Hoopl is a public
package on Hackage with users that depend on the current behavior)
This should introduce no changes in functionality - it merely
copies the relevant code.
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate
Reviewers: austin, bgamari, simonmar
Reviewed By: bgamari, simonmar
Subscribers: simonpj, kavon, rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3616
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In Phab:D3265 we introduced MO_F32_Fabs and MO_F64_Fabs.
This patch improves code generation by generating PowerPC fabs
instructions.
Test Plan: run numeric/should_run/numrun015 or validate
Reviewers: austin, bgamari, hvr, simonmar, erikd
Reviewed By: erikd
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3512
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Provide PowerPC optimised implementations of callish prim ops.
MO_?_QuotRem
The generic implementation of quotient remainder prim ops uses
a division and a remainder operation. There is no remainder on
PowerPC and so we need to implement remainder "by hand" which
results in a duplication of the divide operation when using the
generic code.
Avoid this duplication by implementing the prim op in the native
code generator.
MO_U_Mul2
Use PowerPC's instructions for long multiplication.
Addition and subtraction
Use PowerPC add/subtract with carry/overflow instructions
MO_Clz and MO_Ctz
Use PowerPC's CNTLZ instruction and implement count trailing
zeros using count leading zeros
MO_QuotRem2
Implement an algorithm given by Henry Warren in "Hacker's Delight"
using PowerPC divide instruction. TODO: Use long division instructions
when available (POWER7 and later).
Test Plan: validate on AIX and 32-bit Linux
Reviewers: simonmar, erikd, hvr, austin, bgamari
Reviewed By: erikd, hvr, bgamari
Subscribers: trofi, kgardas, thomie
Differential Revision: https://phabricator.haskell.org/D2973
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently we have this in libraries/base/GHC/Float.hs:
```
abs x | x == 0 = 0 -- handles (-0.0)
| x > 0 = x
| otherwise = negateFloat x
```
But 3-4 years ago it was noted that this was inefficient:
https://mail.haskell.org/pipermail/libraries/2013-April/019690.html
We can generate better code for X86 and llvm and for others generate
some custom cmm code which is similar to what the compiler generates
now.
Reviewers: austin, simonmar, hvr, bgamari
Reviewed By: bgamari
Subscribers: dfeuer, thomie
Differential Revision: https://phabricator.haskell.org/D3265
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Opcode lwa is a 64-bit opcode and allows a DS-form only. This patch
generates lwa opcodes only when the offset is a multiple of 4.
Fixes #12621
Test Plan: validate
Reviewers: erikd, hvr, simonmar, austin, bgamari
Reviewed By: bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2547
GHC Trac Issues: #12621
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to the ABI specifications a minimal stack frame consists
of a header and a minimum size parameter save area. We reserve the
minimal size for each ABI.
On PowerPC 64-bil Linux and AIX the parameter save area can accomodate
up to eight parameters. So calls with eight parameters and fewer
can be done without allocating a new stack frame and deallocating
that stack frame after the call. On AIX one additional spill slot
is available on the stack.
Code size for all nofib benchmarks is 0.3 % smaller on powerpc64.
Test Plan: validate on AIX
Reviewers: hvr!, erikd, austin, simonmar, bgamari
Reviewed By: bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2445
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a call to a fixed function the TOC does not need to be saved.
The linker handles TOC saving.
Refactor TOC handling by folding the two functions toc_before and
toc_after into the code generating the call sequence. This saves
repeating the case distinction in those two functions.
Test Plan: validate on PowerPC 32-bit Linux and AIX
Reviewers: hvr, simonmar, austin, erikd, bgamari
Reviewed By: bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2328
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Linux 64-bit PowerPC the first 13 floating point parameters are
passed in registers. We only passed the first 8 floating point params.
The alignment of a floating point single precision value in ELF v1.9 is
the second word of a doubleword. For ELF v2 we support only little
endian and the least significant word of a doubleword is the first word,
so no special handling is required.
Add a regression test.
Test Plan: validate on powerpc Linux and AIX
Reviewers: erikd, hvr, austin, simonmar, bgamari
Reviewed By: simonmar
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2327
GHC Trac Issues: #12134
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Generate a clrr[wd]i instruction to clear the tag bits in a pointer.
This saves one instruction and one temporary register.
Optimize signed comparison with zero after andi. operation This saves
one instruction when comparing a pointer tag with zero.
This reduces code size by 0.6 % in all nofib benchmarks.
Test Plan: validate on AIX and 32-bit Linux
Reviewed By: erikd, hvr
Differential Revision: https://phabricator.haskell.org/D2093
|
|
|
|
|
|
| |
Reviewed By: bgamari, trommler
Differential Revision: https://phabricator.haskell.org/D2020
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This extends the previous work to revive the unregisterised GHC build
for AIX/ppc32. Strictly speaking, AIX runs on POWER4 (and later)
hardware, but the PPC32 instructions implemented in the PPC NCG
represent a compatible subset of the POWER4 ISA.
IBM AIX follows the PowerOpen ABI (and shares many similiarites with the
Linux PPC64 ELF V1 NCG backend) but uses the rather limited XCOFF
format (compared to ELF).
This doesn't support dynamic libraries yet.
A major limiting factor is that the AIX assembler does not support the
`@ha`/`@l` relocation types nor the ha16()/lo16() functions Darwin's
assembler supports. Therefore we need to avoid emitting those. In case
of numeric literals we simply compute the functions ourselves, while for
labels we have to use local TOCs and hope everything fits into a 16bit
offset (for ppc32 this gives us at most 16384 entries per TOC section,
which is enough to compile GHC).
Another issue is that XCOFF doesn't seem to have a relocation type for
label-differences, and therefore the label-differences placed into
tables-next-to-code can't be relocated, but the linker may rearrange
different sections, so we need to place all read-only sections into the
same `.text[PR]` section to workaround this.
Finally, the PowerOpen ABI distinguishes between function-descriptors
and actualy entry-point addresses. For AIX we need to be specific when
emitting assembler code whether we want the address of the function
descriptor `printf`) or for the entry-point (`.printf`). So we let the
asm pretty-printer prefix a dot to all emitted subroutine
calls (i.e. `BL`) on AIX only. For now, STG routines' entry-point labels
are not prefixed by a label and don't have any associated
function-descriptor.
Reviewers: austin, trommler, erikd, bgamari
Reviewed By: trommler, erikd, bgamari
Differential Revision: https://phabricator.haskell.org/D2019
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a flag -split-sections that does similar things to
-split-objs, but using sections in single object files instead of
relying on the Satanic Splitter and other abominations. This is very
similar to the GCC flags -ffunction-sections and -fdata-sections.
The --gc-sections linker flag, which allows unused sections to actually
be removed, is added to all link commands (if the linker supports it) so
that space savings from having base compiled with sections can be
realized.
Supported both in LLVM and the native code-gen, in theory for all
architectures, but really tested on x86 only.
In the GHC build, a new SplitSections variable enables -split-sections
for relevant parts of the build.
Test Plan: validate with both settings of SplitSections
Reviewers: dterei, Phyx, austin, simonmar, thomie, bgamari
Reviewed By: simonmar, thomie, bgamari
Subscribers: hsyl20, erikd, kgardas, thomie
Differential Revision: https://phabricator.haskell.org/D1242
GHC Trac Issues: #8405
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a subWordC# primop which implements subtraction with overflow
reporting.
Reviewers: tibbe, goldfire, rwbarton, bgamari, austin, hvr
Reviewed By: bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1334
GHC Trac Issues: #10962
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This allows the code generator to give hints to later code generation
steps about which branch is most likely to be taken. Right now it
is only taken into account in one place: a special case in
CmmContFlowOpt that swapped branches over to maximise the chance of
fallthrough, which is now disabled when there is a likelihood setting.
Test Plan: validate
Reviewers: austin, simonpj, bgamari, ezyang, tibbe
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1273
|