| Commit message (Collapse) | Author | Age | Files | Lines |
| |
Fixes build problems on platforms for which we did not have
an Arch constructor.
| |
In particular, the "#error" for platforms without a NCG is gone,
which means the module should now build on all platforms again.
I'm not sure if this is the nicest way to handle multiple platforms
here, but it works for now.
| |
From Erik de Castro Lopo.
| | |
coloured-core
| | |
The SDoc type now passes around an abstract SDocContext rather than
just a PprStyle which required touching a few more files. This should
also make it easier to integrate DynFlags passing, so that we can get
rid of global variables.
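A minimal sketch of the idea, assuming heavily simplified types. The names
PprStyle, SDocContext, text, (<+>), ifDebugStyle and runSDoc below are
illustrative stand-ins chosen for this example, not the compiler's actual
Outputable API, and the sdocLineWidth field is an invented extra field:

```haskell
-- Hypothetical, simplified stand-ins for the real types.
data PprStyle = UserStyle | DebugStyle
  deriving (Eq, Show)

-- An abstract context: further fields (for example, flags) can be added
-- later without changing the type of every function that builds an SDoc.
data SDocContext = SDocContext
  { sdocStyle     :: PprStyle
  , sdocLineWidth :: Int      -- assumed extra field, for illustration only
  }

-- A document is a function from the rendering context to output text.
newtype SDoc = SDoc { runSDoc :: SDocContext -> String }

-- A literal piece of text ignores the context.
text :: String -> SDoc
text s = SDoc (const s)

-- Horizontal composition threads the same context to both halves.
(<+>) :: SDoc -> SDoc -> SDoc
SDoc f <+> SDoc g = SDoc (\ctx -> f ctx ++ " " ++ g ctx)

-- Consult the context to decide which document to render.
ifDebugStyle :: SDoc -> SDoc -> SDoc
ifDebugStyle dbg usr = SDoc $ \ctx ->
  case sdocStyle ctx of
    DebugStyle -> runSDoc dbg ctx
    UserStyle  -> runSDoc usr ctx
```

Because callers only ever hand over an SDocContext, adding a field to it
does not ripple through every pretty-printing function the way adding a
second argument next to PprStyle would.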
| | |
can't emit the ffrees before a conditional jump, because we don't want
to ffree the stack registers if the jump isn't taken (d'oh).
This commit fixes it properly, by moving the pass that inserts the
ffrees to *before* we do the jump-shortcutting which introduces the
conditional non-local jumps.
| | |
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
| | |
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
| |
We achieve this by splitting up instruction selection for case
switches into two parts: the actual code generation, and the
generation of the accompanying jump table. With this scheme,
the jump fixup code can modify the contents of the jump table
stored within the JMP_TBL (or BCTL) instruction, before the
actual data section is created.
SPARC and PPC patches are untested; they might not work!
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
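The two-part scheme described above can be sketched roughly as follows.
This is a toy model under stated assumptions: the Instr and DataEntry
types and the functions patchJumps and generateJumpTable are hypothetical
names made up for this example, and only the JMP_TBL constructor name is
taken from the message.

```haskell
type BlockId = Int

-- Hypothetical miniature instruction set.
data Instr
  = JMP BlockId
  | JMP_TBL [Maybe BlockId]   -- indirect jump; carries its own table entries
  | NOP
  deriving (Eq, Show)

-- Jump fixup rewrites branch targets, including the ones stored inside
-- a JMP_TBL instruction.
patchJumps :: (BlockId -> BlockId) -> Instr -> Instr
patchJumps f (JMP b)      = JMP (f b)
patchJumps f (JMP_TBL bs) = JMP_TBL (map (fmap f) bs)
patchJumps _ i            = i

-- Hypothetical data-section entries.
data DataEntry = Label BlockId | Zero
  deriving (Eq, Show)

-- Separate, later step: build the table data from the (already patched)
-- instruction, so the data section always agrees with the final targets.
generateJumpTable :: Instr -> Maybe [DataEntry]
generateJumpTable (JMP_TBL bs) = Just (map (maybe Zero Label) bs)
generateJumpTable _            = Nothing
```

Since the table is generated only after patchJumps has run, a shortcut
applied to block 2 shows up in the emitted data as well as in the code.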
| |
This changes the new code generator to make use of the Hoopl package
for dataflow analysis. Hoopl is a new boot package, and is maintained
in a separate upstream git repository (as usual, GHC has its own
lagging darcs mirror in http://darcs.haskell.org/packages/hoopl).
During this merge I squashed recent history into one patch. I tried
to rebase, but the history had some internal conflicts of its own
which made rebase extremely confusing, so I gave up. The history I
squashed was:
- Update new codegen to work with latest Hoopl
- Add some notes on new code gen to cmm-notes
- Enable Hoopl lag package.
- Add SPJ note to cmm-notes
- Improve GC calls on new code generator.
Work in this branch was done by:
- Milan Straka <fox@ucw.cz>
- John Dias <dias@cs.tufts.edu>
- David Terei <davidterei@gmail.com>
Edward Z. Yang <ezyang@mit.edu> merged in further changes from GHC HEAD
and fixed a few bugs.
| |
Using Haskell conditionals means the compiler sees all the code, so
there should be less rot of code specific to uncommon arches. Code
for other platforms should still be optimised away, although if we want
to support targeting other arches then we'll need to compile it
for-real anyway.
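As a rough illustration of the difference, here is a sketch in which the
platform choice is an ordinary value rather than a preprocessor symbol.
The Arch constructors, the function names and the values returned are
assumptions for this example only:

```haskell
-- Hypothetical enumeration of target architectures.
data Arch = ArchX86 | ArchX86_64 | ArchPPC | ArchSPARC | ArchUnknown
  deriving (Eq, Show)

-- With CPP, only one branch of an #if would ever reach the type checker.
-- Here every branch is ordinary Haskell and is checked on every build,
-- whatever platform the compiler itself is being built on.
hasNativeCodeGen :: Arch -> Bool
hasNativeCodeGen arch = case arch of
  ArchX86     -> True
  ArchX86_64  -> True
  ArchPPC     -> True
  ArchSPARC   -> True
  ArchUnknown -> False   -- illustrative: no native back end available

-- Code that used to sit under "#error" can instead report the problem
-- as a value at run time.
describeBackend :: Arch -> String
describeBackend arch
  | hasNativeCodeGen arch = "native code generator for " ++ show arch
  | otherwise             = "no native code generator for " ++ show arch
```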
| |
* I've pushed the SPILL and RELOAD instrs down into the
LiveInstr type to make them easier to work with.
* When the graph allocator does a spill cycle it now just
re-annotates the LiveCmmTops instead of converting them
to NatCmmTops and back.
* This saves working out the SCCs again, and avoids rewriting
the SPILL and RELOAD meta instructions into real machine
instructions.
| |
This was done as part of an honours thesis at UNSW. The paper describing the
work and results can be found at:
http://www.cse.unsw.edu.au/~pls/thesis/davidt-thesis.pdf
A homepage for the backend can be found at:
http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/Backends/LLVM
A quick summary of performance: for the 'nofib' benchmark suite, runtimes
are no more than 5% slower than the NCG and generally better than the C code
generator. For some code though, such as the DPH project's benchmarks, the
LLVM code generator outperforms both the NCG and the C code generator,
reducing run times by about 25%.
| |
The error messages eliminated are:
> compiler/nativeGen/AsmCodeGen.lhs:875:31:
> Not in scope: `mkRtsCodeLabel'
> compiler/nativeGen/AsmCodeGen.lhs:879:31:
> Not in scope: `mkRtsCodeLabel'
> compiler/nativeGen/AsmCodeGen.lhs:883:31:
> Not in scope: `mkRtsCodeLabel'
| |
The native back ends had difficulties with loops;
in particular the code for branch-chain elimination
could run in infinite loops or drop basic blocks.
The old codeGen didn't expose these problems.
Also, my fix for T3286 in the new codegen was getting
applied to too many (some wrong) cases; a better pattern
match fixed that.
| |
* The old Reg type is now split into VirtualReg and RealReg.
* For the graph coloring allocator, the type of the register graph
is now (Graph VirtualReg RegClass RealReg), which shows that it colors
in nodes representing virtual regs with colors representing real regs.
(as was intended)
* RealReg contains two constructors, RealRegSingle and RealRegPair,
where RealRegPair is used to represent a SPARC double reg
constructed from two single precision FP regs.
* On SPARC we can now allocate double regs into an arbitrary register
pair, instead of reserving some reg ranges to only hold float/double values.
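The split can be pictured with the following simplified sketch. The
constructor names RealRegSingle and RealRegPair come from the message;
RegNo, the single-constructor VirtualReg, and the helpers realRegSingles
and realRegsAlias are assumptions made up for this illustration.

```haskell
type RegNo = Int

-- A virtual register is just a name handed out before allocation
-- (simplified here to one constructor).
newtype VirtualReg = VirtualReg Int
  deriving (Eq, Ord, Show)

-- A real register is either one machine register or a pair of them,
-- e.g. a SPARC double built from two single precision FP registers.
data RealReg
  = RealRegSingle RegNo
  | RealRegPair RegNo RegNo
  deriving (Eq, Ord, Show)

-- The machine registers physically occupied by a real register.
realRegSingles :: RealReg -> [RegNo]
realRegSingles (RealRegSingle r) = [r]
realRegSingles (RealRegPair a b) = [a, b]

-- Two real registers clash if they share any machine register. This is
-- what lets a pair live in an arbitrary position rather than in a
-- reserved range.
realRegsAlias :: RealReg -> RealReg -> Bool
realRegsAlias x y = any (`elem` realRegSingles y) (realRegSingles x)
```

Keeping VirtualReg and RealReg as distinct types means an allocator that
forgets to replace a virtual register fails to type check instead of
emitting bad code.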
| |
- nativeGen/Instruction defines a type class for a generic
instruction set. Each of the instruction sets we have,
X86, PPC and SPARC are instances of it.
- The register allocators use this type class when they need
info about a certain register or instruction, such as
regUsage, mkSpillInstr, mkJumpInstr, patchRegs, etc.
- nativeGen/Platform defines some data types enumerating
the architectures and operating systems supported by the
native code generator.
- DynFlags now keeps track of the current build platform, and
the PositionIndependentCode module uses this to decide what
to do instead of relying on #ifdefs.
- It's not totally retargetable yet. Some info about the
build target is still hardwired, but I've tried to contain
most of it to a single module, TargetRegs.
- Moved the SPILL and RELOAD instructions into LiveInstr.
- Reg and RegClass now have their own modules, and are shared
across all architectures.
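To make the type class idea concrete, here is a small sketch. The method
names regUsage, patchRegs and mkSpillInstr are the ones mentioned above,
but their signatures, the RegUsage record, the ToyInstr instruction set
and the usedRegs helper are all assumptions invented for this example.

```haskell
import Data.List (nub)

type Reg = Int

-- Registers read and written by one instruction (assumed shape).
data RegUsage = RegUsage [Reg] [Reg]
  deriving (Eq, Show)

-- What a generic register allocator needs to know about instructions.
class Instruction instr where
  regUsage     :: instr -> RegUsage
  patchRegs    :: (Reg -> Reg) -> instr -> instr
  mkSpillInstr :: Reg -> Int -> instr   -- register, stack slot

-- A made-up instruction set standing in for X86, PPC or SPARC.
data ToyInstr
  = Mov Reg Reg        -- source, destination
  | Add Reg Reg Reg    -- two sources, destination
  | Store Reg Int      -- register, stack slot
  deriving (Eq, Show)

instance Instruction ToyInstr where
  regUsage (Mov s d)   = RegUsage [s] [d]
  regUsage (Add a b d) = RegUsage [a, b] [d]
  regUsage (Store r _) = RegUsage [r] []

  patchRegs f (Mov s d)   = Mov (f s) (f d)
  patchRegs f (Add a b d) = Add (f a) (f b) (f d)
  patchRegs f (Store r n) = Store (f r) n

  mkSpillInstr = Store

-- Allocator-side code is written once against the class, not per target.
usedRegs :: Instruction instr => [instr] -> [Reg]
usedRegs = nub . concatMap regsOf
  where regsOf i = let RegUsage rs ws = regUsage i in rs ++ ws
```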
| |
I noticed while working on the new IO library that GHC was writing out
the .s file in lots of little chunks. It turns out that this is a
result of using multiple printDocs to avoid space leaks in the NCG,
where each printDoc is finishing up with an hFlush.
What's worse is that this makes poor use of the optimisation inside
printDoc that uses its own buffering to avoid hitting the Handle all
the time.
So I hacked around this by making the buffering optimisation inside
Pretty visible from the outside, for use in the NCG. The changes are
quite small.
| |
This merge does not turn on the new codegen (which only compiles
a select few programs at this point),
but it does introduce some changes to the old code generator.
The high bits:
1. The Rep Swamp patch is finally here.
The highlight is that the representation of types at the
machine level has changed.
Consequently, this patch contains updates across several back ends.
2. The new Stg -> Cmm path is here, although it appears to have a
fair number of bugs lurking.
3. Many improvements along the CmmCPSZ path, including:
o stack layout
o some code for infotables, half of which is right and half wrong
o proc-point splitting
| |
Eager blackholing can improve parallel performance by reducing the
chances that two threads perform the same computation. However, it
has a cost: one extra memory write per thunk entry.
To get the best results, any code which may be executed in parallel
should be compiled with eager blackholing turned on. But since
there's a cost for sequential code, we make it optional and turn it on
for the parallel package only. It might be a good idea to compile
applications (or modules) with parallel code in with
-feager-blackholing.
ToDo: document -feager-blackholing.
| |
The i386 native code generator has to arrange that the FPU stack is
clear on exit from any function that uses the FPU. Unfortunately it
was getting this wrong (and has been ever since this code was written,
I think): it was looking for basic blocks that used the FPU and adding
the code to clear the FPU stack on any non-local exit from the block.
In fact it should be doing this on a whole-function basis, rather than
individual basic blocks.
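The difference between the per-block and the whole-function decision can
be sketched on a toy model. Everything here (the Instr type, usesFPU,
clearBeforeExits, fixFunction) is a hypothetical simplification for
illustration, not the code generator's real representation.

```haskell
-- Hypothetical miniature instruction set.
data Instr
  = FpuOp                 -- any instruction that touches the FPU stack
  | IntOp
  | FFree                 -- clears the FPU stack
  | JmpLocal Int          -- jump within the function
  | JmpNonLocal String    -- jump out of the function
  deriving (Eq, Show)

type Block = [Instr]

-- Whole-function question: does any block use the FPU?
usesFPU :: [Block] -> Bool
usesFPU = any (elem FpuOp)

-- Put a clear in front of every non-local exit of one block.
clearBeforeExits :: Block -> Block
clearBeforeExits = concatMap step
  where
    step i@(JmpNonLocal _) = [FFree, i]
    step i                 = [i]

-- If the function uses the FPU anywhere, every exit must clear the
-- stack, including exits from blocks that contain no FPU code.
fixFunction :: [Block] -> [Block]
fixFunction blocks
  | usesFPU blocks = map clearBeforeExits blocks
  | otherwise      = blocks
```

In the function [[FpuOp, JmpLocal 1], [IntOp, JmpNonLocal "f"]] the exit
sits in a block with no FPU instructions, which is exactly the case a
per-block check misses.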
| |
C-- no longer has 'hints'; to guide parameter passing, it
has 'kinds'. Renamed type constructor, data constructor, and record
fields accordingly
| |
If these modules use UniqFM then we get a stack overflow when compiling
modules that use fundeps. I haven't tracked down the actual cause.
| |
If these modules use UniqFM then we get a stack overflow when compiling
modules that use fundeps. I haven't tracked down the actual cause.
| |
This allows the instance of UserOfLocalRegs to be within Haskell98, and IMHO
makes the code a little cleaner generally.
This is one small (though tedious) step towards making GHC's code more
portable...
| |
To help with debugging / nicer -ddump-asm-regalloc-stages
| |
Iterative coalescing interleaves conservative coalescing with the regular
simplify/scan passes. This increases the chance that nodes will be coalesced
as they will have a lower degree than at the beginning of simplify. The end
result is that more register to register moves will be eliminated in the
output code, though the iterative nature of the algorithm makes it slower
compared to non-iterative coloring.
Use -fregs-iterative for graph coloring allocation with iterative coalescing,
or -fregs-graph for non-iterative coalescing.
The plan is for iterative coalescing to be enabled with -O2 and have a
quicker, non-iterative algorithm otherwise. The time/benefit tradeoff
between iterative and not is still being tuned - optimal graph coloring
is NP-hard, after all.
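For readers unfamiliar with conservative coalescing, the sketch below
shows one well-known conservative test, the Briggs criterion. The message
does not say which criterion the allocator uses, so treat the choice of
Briggs, the Graph representation and the names mkGraph and briggsOK as
assumptions for this example. It relies on the containers package.

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

type Node  = Int
type Graph = M.Map Node (S.Set Node)   -- symmetric conflict edges

-- Build a symmetric conflict graph from a list of edges.
mkGraph :: [(Node, Node)] -> Graph
mkGraph es =
  M.fromListWith S.union
    [ (x, S.singleton y) | (a, b) <- es, (x, y) <- [(a, b), (b, a)] ]

neighbours :: Graph -> Node -> S.Set Node
neighbours g n = M.findWithDefault S.empty n g

degree :: Graph -> Node -> Int
degree g = S.size . neighbours g

-- Briggs test: merging a and b is safe if the combined node would have
-- fewer than k neighbours of significant degree (degree >= k).
briggsOK :: Int -> Graph -> Node -> Node -> Bool
briggsOK k g a b = S.size significant < k
  where
    na       = neighbours g a
    nb       = neighbours g b
    combined = S.delete a (S.delete b (na `S.union` nb))
    -- A neighbour of both a and b loses one edge when they are merged.
    degreeAfter n
      | n `S.member` na && n `S.member` nb = degree g n - 1
      | otherwise                          = degree g n
    significant = S.filter (\n -> degreeAfter n >= k) combined
```

Running the test again after each round of simplification is what makes
the scheme iterative: nodes removed by simplify lower the degrees seen by
degreeAfter, so a merge that was rejected earlier may pass later.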
| |
Changes too numerous to comment on, but here is some old history that
I saved:
Wed Aug 15 11:07:13 BST 2007 Norman Ramsey <nr@eecs.harvard.edu>
* type synonyms made consistent with new Cmm types
M ./compiler/nativeGen/MachInstrs.hs -2 +2
Mon Aug 20 19:22:14 BST 2007 Norman Ramsey <nr@eecs.harvard.edu>
* pushing return info beyond cmm into codegen
M ./compiler/codeGen/Bitmap.hs r3
M ./compiler/codeGen/CgBindery.lhs r3
M ./compiler/codeGen/CgCallConv.hs r3
M ./compiler/codeGen/CgCase.lhs r3
M ./compiler/codeGen/CgClosure.lhs r3
M ./compiler/codeGen/CgCon.lhs r3
M ./compiler/codeGen/CgExpr.lhs r3
M ./compiler/codeGen/CgForeignCall.hs -6 +7 r3
M ./compiler/codeGen/CgHeapery.lhs r3
M ./compiler/codeGen/CgHpc.hs +1 r3
M ./compiler/codeGen/CgInfoTbls.hs r3
M ./compiler/codeGen/CgLetNoEscape.lhs r3
M ./compiler/codeGen/CgMonad.lhs r3
M ./compiler/codeGen/CgParallel.hs r3
M ./compiler/codeGen/CgPrimOp.hs +3 r3
M ./compiler/codeGen/CgProf.hs r3
M ./compiler/codeGen/CgStackery.lhs r3
M ./compiler/codeGen/CgTailCall.lhs r3
M ./compiler/codeGen/CgTicky.hs r3
M ./compiler/codeGen/CgUtils.hs -1 +1 r3
M ./compiler/codeGen/ClosureInfo.lhs r3
M ./compiler/codeGen/CodeGen.lhs r3
M ./compiler/codeGen/SMRep.lhs r3
M ./compiler/nativeGen/AsmCodeGen.lhs -2 +2 r1
M ./compiler/nativeGen/MachCodeGen.hs -3 +3 r1
M ./compiler/nativeGen/MachInstrs.hs r1
M ./compiler/nativeGen/MachRegs.lhs r1
M ./compiler/nativeGen/NCGMonad.hs r1
M ./compiler/nativeGen/PositionIndependentCode.hs r1
M ./compiler/nativeGen/PprMach.hs r1
M ./compiler/nativeGen/RegAllocInfo.hs r1
M ./compiler/nativeGen/RegisterAlloc.hs r1
Mon Aug 20 20:54:41 BST 2007 Norman Ramsey <nr@eecs.harvard.edu>
* put CmmReturnInfo into a CmmCall (and related types)
M ./compiler/cmm/Cmm.hs -2 +1 r3
M ./compiler/cmm/CmmBrokenBlock.hs -13 +12 r1
M ./compiler/cmm/CmmCPS.hs -3 +3
M ./compiler/cmm/CmmCPSGen.hs -8 +6 r1
M ./compiler/cmm/CmmLint.hs -1 +1
M ./compiler/cmm/CmmLive.hs -1 +1
M ./compiler/cmm/CmmOpt.hs -3 +3
M ./compiler/cmm/CmmParse.y -6 +6 r3
M ./compiler/cmm/PprC.hs -3 +3
M ./compiler/cmm/PprCmm.hs -7 +4 r2
M ./compiler/codeGen/CgForeignCall.hs -7 +6 r2
M ./compiler/codeGen/CgHpc.hs -1 r1
M ./compiler/codeGen/CgPrimOp.hs -3 r1
M ./compiler/codeGen/CgUtils.hs -1 +1 r1
M ./compiler/nativeGen/AsmCodeGen.lhs -2 +2
M ./compiler/nativeGen/MachCodeGen.hs -3 +3 r1
Tue Aug 21 18:09:13 BST 2007 Norman Ramsey <nr@eecs.harvard.edu>
* add call info in nativeGen
M ./compiler/nativeGen/AsmCodeGen.lhs r1
M ./compiler/nativeGen/MachInstrs.hs r1
M ./compiler/nativeGen/MachRegs.lhs r1
M ./compiler/nativeGen/NCGMonad.hs r1
M ./compiler/nativeGen/PositionIndependentCode.hs r1
M ./compiler/nativeGen/PprMach.hs r1
M ./compiler/nativeGen/RegAllocInfo.hs r1
Wed Aug 22 16:41:58 BST 2007 Norman Ramsey <nr@eecs.harvard.edu>
* ListGraph is now a newtype, not a synonym
The resultant bookkeeping is unenviable, but the change
greatly simplifies our ability to make Cmm things properly
Outputable for both list-graph and zipper-graph representations.
M ./compiler/cmm/Cmm.hs -5 +3
M ./compiler/cmm/CmmCPS.hs -2 +2
M ./compiler/cmm/CmmCPSGen.hs -1 +1
M ./compiler/cmm/CmmContFlowOpt.hs -3 +3
M ./compiler/cmm/CmmCvt.hs -2 +2
M ./compiler/cmm/CmmInfo.hs -2 +3
M ./compiler/cmm/CmmLint.hs -1 +1
M ./compiler/cmm/CmmOpt.hs -2 +2
M ./compiler/cmm/PprC.hs -1 +1
M ./compiler/cmm/PprCmm.hs -5 +8
M ./compiler/cmm/PprCmmZ.hs -7 +1
M ./compiler/codeGen/CgMonad.lhs -1 +1
M ./compiler/nativeGen/AsmCodeGen.lhs -15 +15
M ./compiler/nativeGen/MachCodeGen.hs -2 +2
M ./compiler/nativeGen/PositionIndependentCode.hs -6 +6
M ./compiler/nativeGen/PprMach.hs -3 +2
M ./compiler/nativeGen/RegAllocColor.hs +1
M ./compiler/nativeGen/RegAllocLinear.hs -4 +5
M ./compiler/nativeGen/RegCoalesce.hs -6 +6
M ./compiler/nativeGen/RegLiveness.hs -12 +12
Thu Aug 23 13:44:49 BST 2007 Norman Ramsey <nr@eecs.harvard.edu>
* diagnostic assistance in case fromJust fails
M ./compiler/nativeGen/MachCodeGen.hs -2 +5
Thu Aug 23 14:07:28 BST 2007 Norman Ramsey <nr@eecs.harvard.edu>
* give every block, even the first, a label
With branch-chain elimination, the first block of a procedure
might be the target of a branch. This actually happens to
a dozen or more procedures in the run-time system.
M ./compiler/nativeGen/PprMach.hs -8 +3
Fri Aug 24 17:27:04 BST 2007 Norman Ramsey <nr@eecs.harvard.edu>
* clean up the code in PprMach
M ./compiler/nativeGen/PprMach.hs -16 +14
Fri Aug 24 19:35:03 BST 2007 Norman Ramsey <nr@eecs.harvard.edu>
* a bunch of impedance matching to get the compiler to build, plus
* the plus is diagnostics for unreachable code, which required
moving a lot of prettyprinting code
M ./compiler/cmm/Cmm.hs -7 +5
M ./compiler/cmm/CmmCPSZ.hs -1 +1
M ./compiler/cmm/CmmCvt.hs -8 +8
M ./compiler/cmm/CmmParse.y -4 +3
M ./compiler/cmm/MkZipCfg.hs -19 +9
M ./compiler/cmm/PprCmmZ.hs -118 +4
M ./compiler/cmm/ZipCfg.hs -1 +13
M ./compiler/cmm/ZipCfgCmm.hs -10 +129
M ./compiler/main/HscMain.lhs -4 +4
M ./compiler/nativeGen/NCGMonad.hs -2 +2
M ./compiler/nativeGen/RegAllocInfo.hs -3 +3
Fri Aug 31 14:38:02 BST 2007 Norman Ramsey <nr@eecs.harvard.edu>
* fix a warning about an import
M ./compiler/nativeGen/RegAllocColor.hs -1 +1
| |
Testing whether a node in the conflict graph is trivially
colorable (triv) is still a somewhat expensive operation.
When we find a triv node during scanning, even though we remove
it and its edges from the graph, this is unlikely to make the
nodes we've just scanned become triv - so there's not much point
re-scanning them right away.
Scanning now takes place in passes. We scan the whole graph for
triv nodes and remove all the ones found in a batch before rescanning
old nodes.
Register allocation for SHA1.lhs now takes (just) 40% of total
compile time with -O2 -fregs-graph on x86
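The pass-based scan can be sketched as below. This is a simplified model:
a node counts as triv when its degree is below k, whereas the real test
also has to account for register classes, and the names Graph, mkGraph,
removeNodes and scanPasses are assumptions for this example. It relies on
the containers package.

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

type Node  = Int
type Graph = M.Map Node (S.Set Node)   -- symmetric conflict edges

-- Build a symmetric conflict graph from a list of edges.
mkGraph :: [(Node, Node)] -> Graph
mkGraph es =
  M.fromListWith S.union
    [ (x, S.singleton y) | (a, b) <- es, (x, y) <- [(a, b), (b, a)] ]

-- Remove a whole batch of nodes, together with their edges, in one go.
removeNodes :: [Node] -> Graph -> Graph
removeNodes ns g = M.map (`S.difference` dead) (foldr M.delete g ns)
  where dead = S.fromList ns

-- One pass scans every node once and collects all triv nodes; only after
-- the batch has been removed do we look at the older nodes again.
-- Returns the batches in removal order and the graph that is left over.
scanPasses :: Int -> Graph -> ([[Node]], Graph)
scanPasses k g
  | null batch = ([], g)
  | otherwise  =
      let (rest, g') = scanPasses k (removeNodes batch g)
      in (batch : rest, g')
  where
    batch = [ n | (n, ns) <- M.toList g, S.size ns < k ]
```

Removing a batch at once is safe because deleting nodes can only lower
the degree of the others, so a node that was triv when scanned is still
triv when the batch is removed.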
| |
trivColorable was soaking up a total of 31% time, 41% alloc when
compiling SHA1.lhs with -O2 -fregs-graph on x86.
Refactoring to use unboxed accumulators and walk directly
over the UniqFM holding the set of conflicts reduces this
to 17% time, 6% alloc.
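The flavour of that refactoring can be shown with a small sketch. It is
not the real trivColorable: it ignores register classes, walks a plain
list instead of a UniqFM, and uses a strict Int accumulator (which the
optimiser can typically unbox) rather than explicit unboxed types. The
function below is a hypothetical simplification.

```haskell
-- A node is trivially colorable when it has fewer than k conflicts.
-- Count conflicts with a single accumulator and stop as soon as the
-- limit is reached, instead of first building the full conflict count.
trivColorable :: Int -> [a] -> Bool
trivColorable k = go 0
  where
    -- The guard forces the accumulator on every step, so no chain of
    -- suspended additions builds up.
    go acc _ | acc >= k = False
    go _   []           = True
    go acc (_ : rest)   = go (acc + 1) rest
```

Because the walk stops at the k-th conflict, the cost for a heavily
constrained node is bounded by k rather than by the size of its conflict
set.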
| |
The type parameter to a C-- procedure now represents a control-flow
graph, not a single instruction. The newtype ListGraph preserves the
current representation while enabling other representations and a
sensible way of prettyprinting. Except for a few changes in the
prettyprinter the new compiler binary should be bit-for-bit identical
to the old.
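A simplified sketch of the change in shape. Only the name ListGraph is
taken from the message; GenBasicBlock, GenProc, the Pretty class and the
rendering format are stand-ins assumed for this example.

```haskell
type BlockId = Int

-- A labelled basic block holding instructions of type i.
data GenBasicBlock i = BasicBlock BlockId [i]
  deriving (Eq, Show)

-- The current representation: a control-flow graph as a list of blocks.
-- Being a newtype rather than a synonym, it can carry its own instances.
newtype ListGraph i = ListGraph [GenBasicBlock i]
  deriving (Eq, Show)

-- A procedure is parameterised over a graph type g, not over a single
-- instruction type, so other graph representations can be plugged in.
data GenProc g = Proc String g
  deriving (Eq, Show)

-- Hypothetical pretty-printing class, standing in for Outputable.
class Pretty a where
  pretty :: a -> String

instance Show i => Pretty (ListGraph i) where
  pretty (ListGraph blocks) = unlines (concatMap ppBlock blocks)
    where
      ppBlock (BasicBlock bid is) =
        ("L" ++ show bid ++ ":") : map (("  " ++) . show) is

instance Pretty g => Pretty (GenProc g) where
  pretty (Proc name graph) = name ++ ":\n" ++ pretty graph
```

With a plain type synonym for a list of blocks, the ListGraph instance
above would have to be an instance for lists in general, which is why
the newtype gives a more sensible way of prettyprinting.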
|