| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
submodule updates: nofib, haddock
|
| |
|
| |
|
|
|
|
|
|
| |
Add StgToCmm module hierarchy. Platform modules that are used in several
other places (NCG, LLVM codegen, Cmm transformations) are put into
GHC.Platform.
|
|
|
|
|
|
|
| |
ghc-pkg needs to be aware of platforms so it can figure out which
subdire within the user package db to use. This is admittedly
roundabout, but maybe Cabal could use the same notion of a platform as
GHC to good affect too.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* simplifies registers to have GPR, Float and Double, by removing the SSE2 and X87 Constructors
* makes -msse2 assumed/default for x86 platforms, fixing a long standing nondeterminism in rounding
behavior in 32bit haskell code
* removes the 80bit floating point representation from the supported float sizes
* theres still 1 tiny bit of x87 support needed,
for handling float and double return values in FFI calls wrt the C ABI on x86_32,
but this one piece does not leak into the rest of NCG.
* Lots of code thats not been touched in a long time got deleted as a
consequence of all of this
all in all, this change paves the way towards a lot of future further
improvements in how GHC handles floating point computations, along with
making the native code gen more accessible to a larger pool of contributors.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We make liveness information for global registers
available on `JMP` and `BCTR`, which were the last instructions
missing. With complete liveness information we do not need to
reserve global registers in `freeReg` anymore. Moreover we
assign R9 and R10 to callee saves registers.
Cleanup by removing `Reg_Su`, which was unused, from `freeReg`
and removing unused register definitions.
The calculation of the number of floating point registers is too
conservative. Just follow X86 and specify the constants directly.
Overall on PowerPC this results in 0.3 % smaller code size in nofib
while runtime is slightly better in some tests.
|
| |
|
|
|
|
|
|
|
| |
Support for Mac OS X on PowerPC has been dropped by Apple years ago. We
follow suit and remove PowerPC support for Darwin.
Fixes #16106.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This change makes it possible to generate a static 32-bit relative label
offset on x86_64. Currently we can only generate word-sized label
offsets.
This will be used in D4634 to shrink info tables. See D4632 for more
details.
Test Plan: See D4632
Reviewers: bgamari, niteria, michalt, erikd, jrtc27, osa1
Subscribers: thomie, carter
Differential Revision: https://phabricator.haskell.org/D4633
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This switches the compiler/ component to get compiled with
-XNoImplicitPrelude and a `import GhcPrelude` is inserted in all
modules.
This is motivated by the upcoming "Prelude" re-export of
`Semigroup((<>))` which would cause lots of name clashes in every
modulewhich imports also `Outputable`
Reviewers: austin, goldfire, bgamari, alanz, simonmar
Reviewed By: bgamari
Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari
Differential Revision: https://phabricator.haskell.org/D3989
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Linux 64-bit PowerPC the first 13 floating point parameters are
passed in registers. We only passed the first 8 floating point params.
The alignment of a floating point single precision value in ELF v1.9 is
the second word of a doubleword. For ELF v2 we support only little
endian and the least significant word of a doubleword is the first word,
so no special handling is required.
Add a regression test.
Test Plan: validate on powerpc Linux and AIX
Reviewers: erikd, hvr, austin, simonmar, bgamari
Reviewed By: simonmar
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2327
GHC Trac Issues: #12134
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This extends the previous work to revive the unregisterised GHC build
for AIX/ppc32. Strictly speaking, AIX runs on POWER4 (and later)
hardware, but the PPC32 instructions implemented in the PPC NCG
represent a compatible subset of the POWER4 ISA.
IBM AIX follows the PowerOpen ABI (and shares many similiarites with the
Linux PPC64 ELF V1 NCG backend) but uses the rather limited XCOFF
format (compared to ELF).
This doesn't support dynamic libraries yet.
A major limiting factor is that the AIX assembler does not support the
`@ha`/`@l` relocation types nor the ha16()/lo16() functions Darwin's
assembler supports. Therefore we need to avoid emitting those. In case
of numeric literals we simply compute the functions ourselves, while for
labels we have to use local TOCs and hope everything fits into a 16bit
offset (for ppc32 this gives us at most 16384 entries per TOC section,
which is enough to compile GHC).
Another issue is that XCOFF doesn't seem to have a relocation type for
label-differences, and therefore the label-differences placed into
tables-next-to-code can't be relocated, but the linker may rearrange
different sections, so we need to place all read-only sections into the
same `.text[PR]` section to workaround this.
Finally, the PowerOpen ABI distinguishes between function-descriptors
and actualy entry-point addresses. For AIX we need to be specific when
emitting assembler code whether we want the address of the function
descriptor `printf`) or for the entry-point (`.printf`). So we let the
asm pretty-printer prefix a dot to all emitted subroutine
calls (i.e. `BL`) on AIX only. For now, STG routines' entry-point labels
are not prefixed by a label and don't have any associated
function-descriptor.
Reviewers: austin, trommler, erikd, bgamari
Reviewed By: trommler, erikd, bgamari
Differential Revision: https://phabricator.haskell.org/D2019
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement access to spill slots at offsets larger than 16 bits.
Also allocation and deallocation of spill slots was restricted to
16 bit offsets. Now 32 bit offsets are supported on all PowerPC
platforms.
The implementation of 32 bit offsets requires more than one instruction
but the native code generator wants one instruction. So we implement
pseudo-instructions that are pretty printed into multiple assembly
instructions.
With pseudo-instructions for spill slot allocation and deallocation
we can also implement handling of the back chain pointer according
to the ELF ABIs.
Test Plan: validate (especially on powerpc (32 bit))
Reviewers: bgamari, austin, erikd
Reviewed By: erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1296
GHC Trac Issues: #7830
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverses some of the work done in #1405, and goes back to the
assumption that the bootstrap compiler understands GHC-haskell.
In particular:
* use MagicHash instead of _ILIT and _CLIT
* pattern matching on I# if possible, instead of using iUnbox
unnecessarily
* use Int#/Char#/Addr# instead of the following type synonyms:
- type FastInt = Int#
- type FastChar = Char#
- type FastPtr a = Addr#
* inline the following functions:
- iBox = I#
- cBox = C#
- fastChr = chr#
- fastOrd = ord#
- eqFastChar = eqChar#
- shiftLFastInt = uncheckedIShiftL#
- shiftR_FastInt = uncheckedIShiftRL#
- shiftRLFastInt = uncheckedIShiftRL#
* delete the following unused functions:
- minFastInt
- maxFastInt
- uncheckedIShiftRA#
- castFastPtr
- panicDocFastInt and pprPanicFastInt
* rename panicFastInt back to panic#
These functions remain, since they actually do something:
* iUnbox
* bitAndFastInt
* bitOrFastInt
Test Plan: validate
Reviewers: austin, bgamari
Subscribers: rwbarton
Differential Revision: https://phabricator.haskell.org/D1141
GHC Trac Issues: #1405
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverses some of the work done in Trac #1405, and assumes GHC is
smart enough to do its own unboxing of booleans now.
I would like to do some more performance measurements, but the code
changes can be reviewed already.
Test Plan:
With a perf build:
./inplace/bin/ghc-stage2 nofib/spectral/simple/Main.hs -fforce-recomp
+RTS -t --machine-readable
before:
```
[("bytes allocated", "1300744864")
,("num_GCs", "302")
,("average_bytes_used", "8811118")
,("max_bytes_used", "24477464")
,("num_byte_usage_samples", "9")
,("peak_megabytes_allocated", "64")
,("init_cpu_seconds", "0.001")
,("init_wall_seconds", "0.001")
,("mutator_cpu_seconds", "2.833")
,("mutator_wall_seconds", "4.283")
,("GC_cpu_seconds", "0.960")
,("GC_wall_seconds", "0.961")
]
```
after:
```
[("bytes allocated", "1301088064")
,("num_GCs", "310")
,("average_bytes_used", "8820253")
,("max_bytes_used", "24539904")
,("num_byte_usage_samples", "9")
,("peak_megabytes_allocated", "64")
,("init_cpu_seconds", "0.001")
,("init_wall_seconds", "0.001")
,("mutator_cpu_seconds", "2.876")
,("mutator_wall_seconds", "4.474")
,("GC_cpu_seconds", "0.965")
,("GC_wall_seconds", "0.979")
]
```
CPU time seems to be up a bit, but I'm not sure. Unfortunately CPU time
measurements are rather noisy.
Reviewers: austin, bgamari, rwbarton
Subscribers: nomeata
Differential Revision: https://phabricator.haskell.org/D1143
GHC Trac Issues: #1405
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit renames the Size module in the native code generator to
Format, as proposed by a todo, as well as adjusting parameter names in
other modules that use it.
Test Plan: validate
Reviewers: austin, simonmar, bgamari
Reviewed By: simonmar, bgamari
Subscribers: bgamari, simonmar, thomie
Projects: #ghc
Differential Revision: https://phabricator.haskell.org/D865
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Extend the PowerPC 32-bit native code generator for "64-bit
PowerPC ELF Application Binary Interface Supplement 1.9" by
Ian Lance Taylor and "Power Architecture 64-Bit ELF V2 ABI Specification --
OpenPOWER ABI for Linux Supplement" by IBM.
The latter ABI is mainly used on POWER7/7+ and POWER8
Linux systems running in little-endian mode. The code generator
supports both static and dynamic linking. PowerPC 64-bit
code for ELF ABI 1.9 and 2 is mostly position independent
anyway, and thus so is all the code emitted by the code
generator. In other words, -fPIC does not make a difference.
rts/stg/SMP.h support is implemented.
Following the spirit of the introductory comment in
PPC/CodeGen.hs, the rest of the code is a straightforward
extension of the 32-bit implementation.
Limitations:
* Code is generated only in the medium code model, which
is also gcc's default
* Local symbols are not accessed directly, which seems to
also be the case for 32-bit
* LLVM does not work, but this does not work on 32-bit either
* Must use the system runtime linker in GHCi, because the
GHC linker for "static" object files (rts/Linker.c) for
PPC 64-bit is not implemented. The system runtime
(dynamic) linker works.
* The handling of the system stack (register 1) is not ELF-
compliant so stack traces break. Instead of allocating a new
stack frame, spill code should use the "official" spill area
in the current stack frame and deallocation code should restore
the back chain
* DWARF support is missing
Fixes #9863
Test Plan: validate (on powerpc, too)
Reviewers: simonmar, trofi, erikd, austin
Reviewed By: trofi
Subscribers: bgamari, arnons1, kgardas, thomie
Differential Revision: https://phabricator.haskell.org/D629
GHC Trac Issues: #9863
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
And fix things all the way down to it. Namely:
- remove 'r30' from free registers, it's an .LCTOC1 register
for gcc. generated .plt stubs expect it to be initialised.
- fix PicBase computation, which originally forgot to use 'tmp'
reg in 'initializePicBase_ppc.fetchPC'
- mark 'ForeighTarget's as implicitly using 'PicBase' register
(see comment for details)
- add 64-bit MO_Sub and test on alloclimit3/4 regtests
- fix dynamic label offsets to match with .LCTOC1 offset
Signed-off-by: Sergei Trofimovich <siarheit@google.com>
Test Plan: validate passes equal amount of vanilla/dyn tests
Reviewers: simonmar, erikd, austin
Reviewed By: erikd, austin
Subscribers: carter, thomie
Differential Revision: https://phabricator.haskell.org/D560
GHC Trac Issues: #8024, #9831
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In some cases, the layout of the LANGUAGE/OPTIONS_GHC lines has been
reorganized, while following the convention, to
- place `{-# LANGUAGE #-}` pragmas at the top of the source file, before
any `{-# OPTIONS_GHC #-}`-lines.
- Moreover, if the list of language extensions fit into a single
`{-# LANGUAGE ... -#}`-line (shorter than 80 characters), keep it on one
line. Otherwise split into `{-# LANGUAGE ... -#}`-lines for each
individual language extension. In both cases, try to keep the
enumeration alphabetically ordered.
(The latter layout is preferable as it's more diff-friendly)
While at it, this also replaces obsolete `{-# OPTIONS ... #-}` pragma
occurences by `{-# OPTIONS_GHC ... #-}` pragmas.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This removes the OldCmm data type and the CmmCvt pass that converts
new Cmm to OldCmm. The backends (NCGs, LLVM and C) have all been
converted to consume new Cmm.
The main difference between the two data types is that conditional
branches in new Cmm have both true/false successors, whereas in OldCmm
the false case was a fallthrough. To generate slightly better code we
occasionally need to invert a conditional to ensure that the
branch-not-taken becomes a fallthrough; this was previously done in
CmmCvt, and it is now done in CmmContFlowOpt.
We could go further and use the Hoopl Block representation for native
code, which would mean that we could use Hoopl's postorderDfs and
analyses for native code, but for now I've left it as is, using the
old ListGraph representation for native code.
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
HaskellMachRegs.h is no longer included in anything under compiler/
Also, includes/CodeGen.Platform.hs now includes "stg/MachRegs.h"
rather than <stg/MachRegs.h> which means that we always get the file
from the tree, rather than from the bootstrapping compiler.
|
|
|
|
| |
The CmmBlocks inside CmmExprs were not getting the PIC treatment
|
| |
|
| |
|
|
|
|
| |
No functional differences yet
|
|
|
|
| |
This avoid lots of converting back and forth between the two types.
|
| |
|
|
|
|
|
| |
We only use it for "compiler" sources, i.e. not for libraries.
Many modules have a -fno-warn-tabs kludge for now.
|
|
|
|
| |
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This changes the new code generator to make use of the Hoopl package
for dataflow analysis. Hoopl is a new boot package, and is maintained
in a separate upstream git repository (as usual, GHC has its own
lagging darcs mirror in http://darcs.haskell.org/packages/hoopl).
During this merge I squashed recent history into one patch. I tried
to rebase, but the history had some internal conflicts of its own
which made rebase extremely confusing, so I gave up. The history I
squashed was:
- Update new codegen to work with latest Hoopl
- Add some notes on new code gen to cmm-notes
- Enable Hoopl lag package.
- Add SPJ note to cmm-notes
- Improve GC calls on new code generator.
Work in this branch was done by:
- Milan Straka <fox@ucw.cz>
- John Dias <dias@cs.tufts.edu>
- David Terei <davidterei@gmail.com>
Edward Z. Yang <ezyang@mit.edu> merged in further changes from GHC HEAD
and fixed a few bugs.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was done as part of an honours thesis at UNSW, the paper describing the
work and results can be found at:
http://www.cse.unsw.edu.au/~pls/thesis/davidt-thesis.pdf
A Homepage for the backend can be found at:
http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/Backends/LLVM
Quick summary of performance is that for the 'nofib' benchmark suite, runtimes
are within 5% slower than the NCG and generally better than the C code
generator. For some code though, such as the DPH projects benchmark, the LLVM
code generator outperforms the NCG and C code generator by about a 25%
reduction in run times.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new flag -msse2 enables code generation for SSE2 on x86. It
results in substantially faster floating-point performance; the main
reason for doing this was that our x87 code generation is appallingly
bad, and since we plan to drop -fvia-C soon, we need a way to generate
half-decent floating-point code.
The catch is that SSE2 is only available on CPUs that support it (P4+,
AMD K8+). We'll have to think hard about whether we should enable it
by default for the libraries we ship. In the meantime, at least
-msse2 should be an acceptable replacement for "-fvia-C
-optc-ffast-math -fexcess-precision".
SSE2 also has the advantage of performing all operations at the
correct precision, so floating-point results are consistent with other
platforms.
I also tweaked the x87 code generation a bit while I was here, now
it's slighlty less bad than before.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The first phase of this tidyup is focussed on the header files, and in
particular making sure we are exposinng publicly exactly what we need
to, and no more.
- Rts.h now includes everything that the RTS exposes publicly,
rather than a random subset of it.
- Most of the public header files have moved into subdirectories, and
many of them have been renamed. But clients should not need to
include any of the other headers directly, just #include the main
public headers: Rts.h, HsFFI.h, RtsAPI.h.
- All the headers needed for via-C compilation have moved into the
stg subdirectory, which is self-contained. Most of the headers for
the rest of the RTS APIs have moved into the rts subdirectory.
- I left MachDeps.h where it is, because it is so widely used in
Haskell code.
- I left a deprecated stub for RtsFlags.h in place. The flag
structures are now exposed by Rts.h.
- Various internal APIs are no longer exposed by public header files.
- Various bits of dead code and declarations have been removed
- More gcc warnings are turned on, and the RTS code is more
warning-clean.
- More source files #include "PosixSource.h", and hence only use
standard POSIX (1003.1c-1995) interfaces.
There is a lot more tidying up still to do, this is just the first
pass. I also intend to standardise the names for external RTS APIs
(e.g use the rts_ prefix consistently), and declare the internal APIs
as hidden for shared libraries.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* The old Reg type is now split into VirtualReg and RealReg.
* For the graph coloring allocator, the type of the register graph
is now (Graph VirtualReg RegClass RealReg), which shows that it colors
in nodes representing virtual regs with colors representing real regs.
(as was intended)
* RealReg contains two contructors, RealRegSingle and RealRegPair,
where RealRegPair is used to represent a SPARC double reg
constructed from two single precision FP regs.
* On SPARC we can now allocate double regs into an arbitrary register
pair, instead of reserving some reg ranges to only hold float/double values.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- nativeGen/Instruction defines a type class for a generic
instruction set. Each of the instruction sets we have,
X86, PPC and SPARC are instances of it.
- The register alloctors use this type class when they need
info about a certain register or instruction, such as
regUsage, mkSpillInstr, mkJumpInstr, patchRegs..
- nativeGen/Platform defines some data types enumerating
the architectures and operating systems supported by the
native code generator.
- DynFlags now keeps track of the current build platform, and
the PositionIndependentCode module uses this to decide what
to do instead of relying of #ifdefs.
- It's not totally retargetable yet. Some info info about the
build target is still hardwired, but I've tried to contain
most of it to a single module, TargetRegs.
- Moved the SPILL and RELOAD instructions into LiveInstr.
- Reg and RegClass now have their own modules, and are shared
across all architectures.
|
| |
|
| |
|
|
|