|  | Commit message (Collapse) | Author | Age | Files | Lines | 
|---|
| | |  | 
| | |  | 
| | |  | 
| | 
| 
| 
| 
| | If these modules use UniqFM then we get a stack overflow when compiling
modules that use fundeps. I haven't tracked down the actual cause. | 
| | 
| 
| 
| 
| | If these modules use UniqFM then we get a stack overflow when compiling
modules that use fundeps. I haven't tracked down the actual cause. | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | re-recording to avoid new conflicts was too hard, so I just put it
all in one big patch :-(  (besides, some of the changes depended on
each other.)  Here are what the component patches were:
Fri Dec 28 11:02:55 EST 2007  Isaac Dupree <id@isaac.cedarswampstudios.org>
  * document BreakArray better
Fri Dec 28 11:39:22 EST 2007  Isaac Dupree <id@isaac.cedarswampstudios.org>
  * properly ifdef BreakArray for GHCI
Fri Jan  4 13:50:41 EST 2008  Isaac Dupree <id@isaac.cedarswampstudios.org>
  * change ifs on __GLASGOW_HASKELL__ to account for... (#1405)
  for it not being defined. I assume it being undefined implies
  a compiler with relatively modern libraries but without most
  unportable glasgow extensions.
Fri Jan  4 14:21:21 EST 2008  Isaac Dupree <id@isaac.cedarswampstudios.org>
  * MyEither-->EitherString to allow Haskell98 instance
Fri Jan  4 16:13:29 EST 2008  Isaac Dupree <id@isaac.cedarswampstudios.org>
  * re-portabilize Pretty, and corresponding changes
Fri Jan  4 17:19:55 EST 2008  Isaac Dupree <id@isaac.cedarswampstudios.org>
  * Augment FastTypes to be much more complete
Fri Jan  4 20:14:19 EST 2008  Isaac Dupree <id@isaac.cedarswampstudios.org>
  * use FastFunctions, cleanup FastString slightly
Fri Jan  4 21:00:22 EST 2008  Isaac Dupree <id@isaac.cedarswampstudios.org>
  * Massive de-"#", mostly Int# --> FastInt (#1405)
Fri Jan  4 21:02:49 EST 2008  Isaac Dupree <id@isaac.cedarswampstudios.org>
  * miscellaneous unnecessary-extension-removal
Sat Jan  5 19:30:13 EST 2008  Isaac Dupree <id@isaac.cedarswampstudios.org>
  * add FastFunctions | 
| | 
| 
| 
| | Fixes building with -Werror (i.e. validate) and GHC < 6.6 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | If iterative coalescing isn't turned on, then do a single aggressive
coalescing pass for the first build/color cycle and then back off to 
conservative coalescing for subseqent passes.
Aggressive coalescing is a cheap way to eliminate lots of the reg-reg
moves, but it can make the graph less colorable - if we turn it on 
for every pass then allocation for code with a large amount of register
pressure (ie SHA1) doesn't converge in a sensible number of cycles. | 
| | 
| 
| 
| 
| 
| | Coalescences must be applied to the unsimplified graph in the same order 
they were found by the iterative coalescing algorithm - otherwise
the vreg rewrite mapping will be wrong and badness will ensue. | 
| | 
| 
| 
| 
| 
| 
| 
| | When -dasm-lint is turned on the register conflict graph is checked for 
internal consistency after each build/color pass. Make sure that all 
edges point to valid nodes, that nodes are colored differently to their
neighbours, and if the graph is supposed to be colored, that all nodes
actually have a color. | 
| | |  | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Use Chaitin's formula for calculation of spill costs.
  Cost to spill some vreg = (num writes + num reads) / degree of node
With 2 extra provisos:
  1) Don't spill vregs that live for only 1 instruction.
  2) Always prefer to spill vregs that live for a number of instructions
       more than 10 times the number of vregs in that class.
Proviso 2 is there to help deal with basic blocks containing very long
live ranges - SHA1 has live ranges > 1700 instructions. We don't ever
try to keep these long lived ranges in regs at the expense of others.
Because stack slots are allocated from a global pool, and there is no
slot coalescing yet, without this condition the allocation of SHA1 dosn't
converge fast enough and eventually runs out of stack slots.
Prior to this patch we were just choosing to spill the range with the
longest lifetime, so we didn't bump into this particular problem. | 
| | |  | 
| | |  | 
| | |  | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Iterative coalescing interleaves conservative coalesing with the regular
simplify/scan passes. This increases the chance that nodes will be coalesced
as they will have a lower degree than at the beginning of simplify. The end
result is that more register to register moves will be eliminated in the
output code, though the iterative nature of the algorithm makes it slower
compared to non-iterative coloring.
Use -fregs-iterative  for graph coloring allocation with iterative coalescing
    -fregs-graph      for non-iterative coalescing.
The plan is for iterative coalescing to be enabled with -O2 and have a 
quicker, non-iterative algorithm otherwise. The time/benefit tradeoff
between iterative and not is still being tuned - optimal graph coloring
is NP-hard, afterall.. | 
| | 
| 
| 
| 
| 
| 
| 
| | We now do a deep seq on the graph after it is 'built', but before coloring.
Without this, the colorer will just force bits of it and the heap will
fill up with half evaluated pieces of graph from previous build/spill
stages and zillions of apply thunks. | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Testing whether a node in the conflict graph is trivially 
colorable (triv) is still a somewhat expensive operation.
When we find a triv node during scanning, even though we remove
it and its edges from the graph, this is unlikely to to make the
nodes we've just scanned become triv - so there's not much point
re-scanning them right away.
Scanning now takes place in passes. We scan the whole graph for
triv nodes and remove all the ones found in a batch before rescanning
old nodes.
Register allocation for SHA1.lhs now takes (just) 40% of total
compile time with -O2 -fregs-graph on x86 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | trivColorable was soaking up total 31% time, 41% alloc when
compiling SHA1.lhs with -O2 -fregs-graph on x86.
Refactoring to use unboxed accumulators and walk directly
over the UniqFM holding the set of conflicts reduces this 
to 17% time, 6% alloc. | 
| | |  | 
| | |  | 
| | 
| 
| 
| 
| 
| 
| | Older GHCs can't parse OPTIONS_GHC.
This also changes the URL referenced for the -w options from
WorkingConventions#Warnings to CodingStyle#Warnings for the compiler
modules. | 
| | 
| 
| 
| | Conservative and iterative coalescing come next. | 
| | |  | 
| | |  | 
| | |  | 
| | |  | 
| | |  | 
| | |  | 
| | 
| 
| 
| 
| 
| 
| | Spiller can now insert spill/reload instrs without having to
worry about the current stack delta. Generation of actual machine
instructions for spills/reloads is deferred until after register
allocation proper. | 
| | |  | 
| | |  | 
| | |  | 
| | |  | 
| | |  | 
|  | Refactored linear allocator into separate liveness annotation and allocation stages.
Added graph coloring allocator, use -fregs-graph to enable.
  New dump flags are
    -ddump-asm-native          -- output of cmm -> native transform.
    -ddump-asm-liveness        -- code annotated with register liveness info
    -ddump-asm-coalesce        -- output of register move coalescing
                                    (this is a separate pass when using the coloring allocator)
                                    (this could change in the future)
    -ddump-asm-regalloc        -- code after register allocation
    -ddump-asm-regalloc-stages -- blocks after each build/spill stage of coloring allocator
    -ddump-asm-conflicts       -- a global register liveness graph in graphviz format 
        
The new register allocator will allocate some registers, but it's not
quite ready for prime-time yet. The spill code generator needs some work... |