| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Formerly we punted on these and evaluated constructors always got a tag
of 1.
We now cascade switches because we have to check the tag first and when
it is MAX_PTR_TAG then get the precise tag from the info table and
switch on that. The only technically tricky part is that the default
case needs (logical) duplication. To do this we emit an extra label for
it and branch to that from the second switch. This avoids duplicated
codegen.
Here's a simple example of the new code gen:
data D = D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8
On a 64-bit system previously all constructors would be tagged 1. With
the new code gen D7 and D8 are tagged 7:
[Lib.D7_con_entry() {
...
{offset
c1eu: // global
R1 = R1 + 7;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
}
}]
[Lib.D8_con_entry() {
...
{offset
c1ez: // global
R1 = R1 + 7;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
}
}]
When switching we now look at the info table only when the tag is 7. For
example, if we derive Enum for the type above, the Cmm looks like this:
c2Le:
_s2Js::P64 = R1;
_c2Lq::P64 = _s2Js::P64 & 7;
switch [1 .. 7] _c2Lq::P64 {
case 1 : goto c2Lk;
case 2 : goto c2Ll;
case 3 : goto c2Lm;
case 4 : goto c2Ln;
case 5 : goto c2Lo;
case 6 : goto c2Lp;
case 7 : goto c2Lj;
}
// Read info table for tag
c2Lj:
_c2Lv::I64 = %MO_UU_Conv_W32_W64(I32[I64[_s2Js::P64 & (-8)] - 4]);
if (_c2Lv::I64 != 6) goto c2Lu; else goto c2Lt;
Generated Cmm sizes do not change too much, but binaries are very
slightly larger, due to the fact that the new instructions are longer in
encoded form. E.g. previously entry code for D8 above would be
00000000000001c0 <Lib_D8_con_info>:
1c0: 48 ff c3 inc %rbx
1c3: ff 65 00 jmpq *0x0(%rbp)
With this patch
00000000000001d0 <Lib_D8_con_info>:
1d0: 48 83 c3 07 add $0x7,%rbx
1d4: ff 65 00 jmpq *0x0(%rbp)
This is one byte longer.
Secondly, reading info table directly and then switching is shorter
_c1co:
movq -1(%rbx),%rax
movl -4(%rax),%eax
// Switch on info table tag
jmp *_n1d5(,%rax,8)
than doing the same switch, and then for the tag 7 doing another switch:
// When tag is 7
_c1ct:
andq $-8,%rbx
movq (%rbx),%rax
movl -4(%rax),%eax
// Switch on info table tag
...
Some changes of binary sizes in actual programs:
- In NoFib the worst case is 0.1% increase in benchmark "parser" (see
NoFib results below). All programs get slightly larger.
- Stage 2 compiler size does not change.
- In "containers" (the library) size of all object files increases
0.0005%. Size of the test program "bitqueue-properties" increases
0.03%.
nofib benchmarks kindly provided by Ömer (@osa1):
NoFib Results
=============
--------------------------------------------------------------------------------
Program Size Allocs Instrs Reads Writes
--------------------------------------------------------------------------------
CS +0.0% 0.0% -0.0% -0.0% -0.0%
CSD +0.0% 0.0% 0.0% +0.0% +0.0%
FS +0.0% 0.0% 0.0% +0.0% 0.0%
S +0.0% 0.0% -0.0% 0.0% 0.0%
VS +0.0% 0.0% -0.0% +0.0% +0.0%
VSD +0.0% 0.0% -0.0% +0.0% -0.0%
VSM +0.0% 0.0% 0.0% 0.0% 0.0%
anna +0.0% 0.0% +0.1% -0.9% -0.0%
ansi +0.0% 0.0% -0.0% +0.0% +0.0%
atom +0.0% 0.0% 0.0% 0.0% 0.0%
awards +0.0% 0.0% -0.0% +0.0% 0.0%
banner +0.0% 0.0% -0.0% +0.0% 0.0%
bernouilli +0.0% 0.0% +0.0% +0.0% +0.0%
binary-trees +0.0% 0.0% -0.0% -0.0% -0.0%
boyer +0.0% 0.0% +0.0% 0.0% -0.0%
boyer2 +0.0% 0.0% +0.0% 0.0% -0.0%
bspt +0.0% 0.0% +0.0% +0.0% 0.0%
cacheprof +0.0% 0.0% +0.1% -0.8% 0.0%
calendar +0.0% 0.0% -0.0% +0.0% -0.0%
cichelli +0.0% 0.0% +0.0% 0.0% 0.0%
circsim +0.0% 0.0% -0.0% -0.1% -0.0%
clausify +0.0% 0.0% +0.0% +0.0% 0.0%
comp_lab_zift +0.0% 0.0% +0.0% 0.0% -0.0%
compress +0.0% 0.0% +0.0% +0.0% 0.0%
compress2 +0.0% 0.0% 0.0% 0.0% 0.0%
constraints +0.0% 0.0% -0.0% -0.0% -0.0%
cryptarithm1 +0.0% 0.0% +0.0% 0.0% 0.0%
cryptarithm2 +0.0% 0.0% +0.0% -0.0% 0.0%
cse +0.0% 0.0% +0.0% +0.0% 0.0%
digits-of-e1 +0.0% 0.0% -0.0% -0.0% -0.0%
digits-of-e2 +0.0% 0.0% +0.0% -0.0% -0.0%
dom-lt +0.0% 0.0% +0.0% +0.0% 0.0%
eliza +0.0% 0.0% -0.0% +0.0% 0.0%
event +0.0% 0.0% -0.0% -0.0% -0.0%
exact-reals +0.0% 0.0% +0.0% +0.0% +0.0%
exp3_8 +0.0% 0.0% -0.0% -0.0% -0.0%
expert +0.0% 0.0% +0.0% +0.0% +0.0%
fannkuch-redux +0.0% 0.0% +0.0% 0.0% 0.0%
fasta +0.0% 0.0% -0.0% -0.0% -0.0%
fem +0.0% 0.0% +0.0% +0.0% +0.0%
fft +0.0% 0.0% +0.0% -0.0% -0.0%
fft2 +0.0% 0.0% +0.0% +0.0% +0.0%
fibheaps +0.0% 0.0% +0.0% +0.0% 0.0%
fish +0.0% 0.0% +0.0% +0.0% 0.0%
fluid +0.0% 0.0% +0.0% +0.0% +0.0%
fulsom +0.0% 0.0% +0.0% -0.0% +0.0%
gamteb +0.0% 0.0% +0.0% -0.0% -0.0%
gcd +0.0% 0.0% +0.0% +0.0% 0.0%
gen_regexps +0.0% 0.0% +0.0% -0.0% -0.0%
genfft +0.0% 0.0% -0.0% -0.0% -0.0%
gg +0.0% 0.0% 0.0% -0.0% 0.0%
grep +0.0% 0.0% +0.0% +0.0% +0.0%
hidden +0.0% 0.0% +0.0% -0.0% -0.0%
hpg +0.0% 0.0% +0.0% -0.1% -0.0%
ida +0.0% 0.0% +0.0% -0.0% -0.0%
infer +0.0% 0.0% -0.0% -0.0% -0.0%
integer +0.0% 0.0% -0.0% -0.0% -0.0%
integrate +0.0% 0.0% 0.0% +0.0% 0.0%
k-nucleotide +0.0% 0.0% -0.0% -0.0% -0.0%
kahan +0.0% 0.0% -0.0% -0.0% -0.0%
knights +0.0% 0.0% +0.0% -0.0% -0.0%
lambda +0.0% 0.0% +1.2% -6.1% -0.0%
last-piece +0.0% 0.0% +0.0% -0.0% -0.0%
lcss +0.0% 0.0% +0.0% -0.0% -0.0%
life +0.0% 0.0% +0.0% -0.0% -0.0%
lift +0.0% 0.0% +0.0% +0.0% 0.0%
linear +0.0% 0.0% +0.0% +0.0% +0.0%
listcompr +0.0% 0.0% -0.0% -0.0% -0.0%
listcopy +0.0% 0.0% -0.0% -0.0% -0.0%
maillist +0.0% 0.0% +0.0% -0.0% -0.0%
mandel +0.0% 0.0% +0.0% +0.0% +0.0%
mandel2 +0.0% 0.0% +0.0% +0.0% -0.0%
mate +0.0% 0.0% +0.0% +0.0% +0.0%
minimax +0.0% 0.0% -0.0% +0.0% -0.0%
mkhprog +0.0% 0.0% +0.0% +0.0% +0.0%
multiplier +0.0% 0.0% 0.0% +0.0% -0.0%
n-body +0.0% 0.0% +0.0% -0.0% -0.0%
nucleic2 +0.0% 0.0% +0.0% +0.0% -0.0%
para +0.0% 0.0% +0.0% +0.0% +0.0%
paraffins +0.0% 0.0% +0.0% +0.0% +0.0%
parser +0.1% 0.0% +0.4% -1.7% -0.0%
parstof +0.0% 0.0% -0.0% -0.0% -0.0%
pic +0.0% 0.0% +0.0% 0.0% -0.0%
pidigits +0.0% 0.0% -0.0% -0.0% -0.0%
power +0.0% 0.0% +0.0% -0.0% -0.0%
pretty +0.0% 0.0% +0.0% +0.0% +0.0%
primes +0.0% 0.0% +0.0% 0.0% 0.0%
primetest +0.0% 0.0% +0.0% +0.0% +0.0%
prolog +0.0% 0.0% +0.0% +0.0% +0.0%
puzzle +0.0% 0.0% +0.0% +0.0% +0.0%
queens +0.0% 0.0% 0.0% +0.0% +0.0%
reptile +0.0% 0.0% +0.0% +0.0% 0.0%
reverse-complem +0.0% 0.0% -0.0% -0.0% -0.0%
rewrite +0.0% 0.0% +0.0% 0.0% -0.0%
rfib +0.0% 0.0% +0.0% +0.0% +0.0%
rsa +0.0% 0.0% +0.0% +0.0% +0.0%
scc +0.0% 0.0% +0.0% +0.0% +0.0%
sched +0.0% 0.0% +0.0% +0.0% +0.0%
scs +0.0% 0.0% +0.0% +0.0% 0.0%
simple +0.0% 0.0% +0.0% +0.0% +0.0%
solid +0.0% 0.0% +0.0% +0.0% 0.0%
sorting +0.0% 0.0% +0.0% -0.0% 0.0%
spectral-norm +0.0% 0.0% -0.0% -0.0% -0.0%
sphere +0.0% 0.0% +0.0% -1.0% 0.0%
symalg +0.0% 0.0% +0.0% +0.0% +0.0%
tak +0.0% 0.0% +0.0% +0.0% +0.0%
transform +0.0% 0.0% +0.4% -1.3% +0.0%
treejoin +0.0% 0.0% +0.0% -0.0% 0.0%
typecheck +0.0% 0.0% -0.0% +0.0% 0.0%
veritas +0.0% 0.0% +0.0% -0.1% +0.0%
wang +0.0% 0.0% +0.0% +0.0% +0.0%
wave4main +0.0% 0.0% +0.0% 0.0% -0.0%
wheel-sieve1 +0.0% 0.0% +0.0% +0.0% +0.0%
wheel-sieve2 +0.0% 0.0% +0.0% +0.0% 0.0%
x2n1 +0.0% 0.0% +0.0% +0.0% 0.0%
--------------------------------------------------------------------------------
Min +0.0% 0.0% -0.0% -6.1% -0.0%
Max +0.1% 0.0% +1.2% +0.0% +0.0%
Geometric Mean +0.0% -0.0% +0.0% -0.1% -0.0%
NoFib GC Results
================
--------------------------------------------------------------------------------
Program Size Allocs Instrs Reads Writes
--------------------------------------------------------------------------------
circsim +0.0% 0.0% -0.0% -0.0% -0.0%
constraints +0.0% 0.0% -0.0% 0.0% -0.0%
fibheaps +0.0% 0.0% 0.0% -0.0% -0.0%
fulsom +0.0% 0.0% 0.0% -0.6% -0.0%
gc_bench +0.0% 0.0% 0.0% 0.0% -0.0%
hash +0.0% 0.0% -0.0% -0.0% -0.0%
lcss +0.0% 0.0% 0.0% -0.0% 0.0%
mutstore1 +0.0% 0.0% 0.0% -0.0% -0.0%
mutstore2 +0.0% 0.0% +0.0% -0.0% -0.0%
power +0.0% 0.0% -0.0% 0.0% -0.0%
spellcheck +0.0% 0.0% -0.0% -0.0% -0.0%
--------------------------------------------------------------------------------
Min +0.0% 0.0% -0.0% -0.6% -0.0%
Max +0.0% 0.0% +0.0% 0.0% 0.0%
Geometric Mean +0.0% +0.0% +0.0% -0.1% +0.0%
Fixes #14373
These performance regressions appear to be a fluke in CI. See the
discussion in !1742 for details.
Metric Increase:
T6048
T12234
T12425
Naperian
T12150
T5837
T13035
|
|
|
|
|
|
| |
[skip ci]
This should really be caught by the linters! (#16711)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Smaller than word size integers must be promoted to word size
when passed on the stack. While on little endian systems we can
get away with writing a small integer to a word size stack slot
and read it as a word ignoring the upper bits, on big endian
systems a small integer write ends up in the most significant
bits and a word size read that ignores the upper bits delivers
a random value.
On little endian systems a smaller than word size write to
the stack might be more efficient but that decision is
system specific and should be done as an optimization in the
respective backends.
Fixes #16258
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the first step of implementing:
https://github.com/ghc-proposals/ghc-proposals/pull/74
The main highlights/changes:
primops.txt.pp gets two new sections for two new primitive types for
signed and unsigned 8-bit integers (Int8# and Word8 respectively) along
with basic arithmetic and comparison operations. PrimRep/RuntimeRep get
two new constructors for them. All of the primops translate into the
existing MachOPs.
For CmmCalls the codegen will now zero-extend the values at call
site (so that they can be moved to the right register) and then truncate
them back their original width.
x86 native codegen needed some updates, since it wasn't able to deal
with the new widths, but all the changes are quite localized. LLVM
backend seems to just work.
This is the second attempt at merging this, after the first attempt in
D4475 had to be backed out due to regressions on i386.
Bumps binary submodule.
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate (on both x86-{32,64})
Reviewers: bgamari, hvr, goldfire, simonmar
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5258
|
|
|
|
|
|
|
|
|
| |
This unfortunately broke i386 support since it introduced references to
byte-sized registers that don't exist on that architecture.
Reverts binary submodule
This reverts commit 5d5307f943d7581d7013ffe20af22233273fba06.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the first step of implementing:
https://github.com/ghc-proposals/ghc-proposals/pull/74
The main highlights/changes:
- `primops.txt.pp` gets two new sections for two new primitive types
for signed and unsigned 8-bit integers (`Int8#` and `Word8`
respectively) along with basic arithmetic and comparison
operations. `PrimRep`/`RuntimeRep` get two new constructors for
them. All of the primops translate into the existing `MachOP`s.
- For `CmmCall`s the codegen will now zero-extend the values at call
site (so that they can be moved to the right register) and then
truncate them back their original width.
- x86 native codegen needed some updates, since it wasn't able to deal
with the new widths, but all the changes are quite localized. LLVM
backend seems to just work.
Bumps binary submodule.
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate with new tests
Reviewers: hvr, goldfire, bgamari, simonmar
Subscribers: Abhiroop, dfeuer, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4475
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This removes a bunch of unnecessary includes of `HsVersions.h` along
with unnecessary CPP (e.g., due to checking for DEBUG which can be
achieved by looking at `debugIsOn`)
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate
Reviewers: bgamari, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4462
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: ./validate
Reviewers: goldfire, bgamari, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4367
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This switches the compiler/ component to get compiled with
-XNoImplicitPrelude and a `import GhcPrelude` is inserted in all
modules.
This is motivated by the upcoming "Prelude" re-export of
`Semigroup((<>))` which would cause lots of name clashes in every
modulewhich imports also `Outputable`
Reviewers: austin, goldfire, bgamari, alanz, simonmar
Reviewed By: bgamari
Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari
Differential Revision: https://phabricator.haskell.org/D3989
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This copies the subset of Hoopl's functionality needed by GHC to
`cmm/Hoopl` and removes the dependency on the Hoopl package.
The main motivation for this change is the confusing/noisy interface
between GHC and Hoopl:
- Hoopl has `Label` which is GHC's `BlockId` but different than
GHC's `CLabel`
- Hoopl has `Unique` which is different than GHC's `Unique`
- Hoopl has `Unique{Map,Set}` which are different than GHC's
`Uniq{FM,Set}`
- GHC has its own specialized copy of `Dataflow`, so `cmm/Hoopl` is
needed just to filter the exposed functions (filter out some of the
Hoopl's and add the GHC ones)
With this change, we'll be able to simplify this significantly.
It'll also be much easier to do invasive changes (Hoopl is a public
package on Hackage with users that depend on the current behavior)
This should introduce no changes in functionality - it merely
copies the relevant code.
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate
Reviewers: austin, bgamari, simonmar
Reviewed By: bgamari, simonmar
Subscribers: simonpj, kavon, rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3616
|
|
|
|
| |
Our new CPP linter enforces this.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
And use to mark `stg_stack_underflow_frame`, which we are unable to
determine a caller from.
To simplify parsing at the moment we steal the `return` keyword to
indicate an undefined unwind value. Perhaps this should be revisited.
Reviewers: scpmw, simonmar, austin, erikd
Subscribers: dfeuer, thomie
Differential Revision: https://phabricator.haskell.org/D2738
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As discussed in D1532, Trac Trac #11337, and Trac Trac #11338, the stack
unwinding information produced by GHC is currently quite approximate.
Essentially we assume that register values do not change at all within a
basic block. While this is somewhat true in normal Haskell code, blocks
containing foreign calls often break this assumption. This results in
unreliable call stacks, especially in the code containing foreign calls.
This is worse than it sounds as unreliable unwinding information can at
times result in segmentation faults.
This patch set attempts to improve this situation by tracking unwinding
information with finer granularity. By dispensing with the assumption of
one unwinding table per block, we allow the compiler to accurately
represent the areas surrounding foreign calls.
Towards this end we generalize the representation of unwind information
in the backend in three ways,
* Multiple CmmUnwind nodes can occur per block
* CmmUnwind nodes can now carry unwind information for multiple
registers (while not strictly necessary; this makes emitting
unwinding information a bit more convenient in the compiler)
* The NCG backend is given an opportunity to modify the unwinding
records since it may need to make adjustments due to, for instance,
native calling convention requirements for foreign calls (see
#11353).
This sets the stage for resolving #11337 and #11338.
Test Plan: Validate
Reviewers: scpmw, simonmar, austin, erikd
Subscribers: qnikst, thomie
Differential Revision: https://phabricator.haskell.org/D2741
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idea behind adding special "rubbish" arguments was in unboxed sum types
depending on the tag some arguments are not used and we don't want to move some
special values (like 0 for literals and some special pointer for boxed slots)
for those arguments (to stack locations or registers). "StgRubbishArg" was an
indicator to the code generator that the value won't be used. During Stg-to-Cmm
we were then not generating any move or store instructions at all.
This caused problems in the register allocator because some variables were only
initialized in some code paths. As an example, suppose we have this STG: (after
unarise)
Lib.$WT =
\r [dt_sit]
case
case dt_sit of {
Lib.F dt_siv [Occ=Once] ->
(#,,#) [1# dt_siv StgRubbishArg::GHC.Prim.Int#];
Lib.I dt_siw [Occ=Once] ->
(#,,#) [2# StgRubbishArg::GHC.Types.Any dt_siw];
}
of
dt_six
{ (#,,#) us_giC us_giD us_giE -> Lib.T [us_giC us_giD us_giE];
};
This basically unpacks a sum type to an unboxed sum with 3 fields, and then
moves the unboxed sum to a constructor (`Lib.T`).
This is the Cmm for the inner case expression (case expression in the scrutinee
position of the outer case):
ciN:
...
-- look at dt_sit's tag
if (_ciT::P64 != 1) goto ciS; else goto ciR;
ciS: -- Tag is 2, i.e. Lib.F
_siw::I64 = I64[_siu::P64 + 6];
_giE::I64 = _siw::I64;
_giD::P64 = stg_RUBBISH_ENTRY_info;
_giC::I64 = 2;
goto ciU;
ciR: -- Tag is 1, i.e. Lib.I
_siv::P64 = P64[_siu::P64 + 7];
_giD::P64 = _siv::P64;
_giC::I64 = 1;
goto ciU;
Here one of the blocks `ciS` and `ciR` is executed and then the execution
continues to `ciR`, but only `ciS` initializes `_giE`, in the other branch
`_giE` is not initialized, because it's "rubbish" in the STG and so we don't
generate an assignment during code generator. The code generator then panics
during the register allocations:
ghc-stage1: panic! (the 'impossible' happened)
(GHC version 8.1.20160722 for x86_64-unknown-linux):
LocalReg's live-in to graph ciY {_giE::I64}
(`_giD` is also "rubbish" in `ciS`, but it's still initialized because it's a
pointer slot, we have to initialize it otherwise garbage collector follows the
pointer to some random place. So we only remove assignment if the "rubbish" arg
has unboxed type.)
This patch removes `StgRubbishArg` and `CmmArg`. We now always initialize
rubbish slots. If the slot is for boxed types we use the existing `absentError`,
otherwise we initialize the slot with literal 0.
Reviewers: simonpj, erikd, austin, simonmar, bgamari
Reviewed By: erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2446
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch implements primitive unboxed sum types, as described in
https://ghc.haskell.org/trac/ghc/wiki/UnpackedSumTypes.
Main changes are:
- Add new syntax for unboxed sums types, terms and patterns. Hidden
behind `-XUnboxedSums`.
- Add unlifted unboxed sum type constructors and data constructors,
extend type and pattern checkers and desugarer.
- Add new RuntimeRep for unboxed sums.
- Extend unarise pass to translate unboxed sums to unboxed tuples right
before code generation.
- Add `StgRubbishArg` to `StgArg`, and a new type `CmmArg` for better
code generation when sum values are involved.
- Add user manual section for unboxed sums.
Some other changes:
- Generalize `UbxTupleRep` to `MultiRep` and `UbxTupAlt` to
`MultiValAlt` to be able to use those with both sums and tuples.
- Don't use `tyConPrimRep` in `isVoidTy`: `tyConPrimRep` is really
wrong, given an `Any` `TyCon`, there's no way to tell what its kind
is, but `kindPrimRep` and in turn `tyConPrimRep` returns `PtrRep`.
- Fix some bugs on the way: #12375.
Not included in this patch:
- Update Haddock for new the new unboxed sum syntax.
- `TemplateHaskell` support is left as future work.
For reviewers:
- Front-end code is mostly trivial and adapted from unboxed tuple code
for type checking, pattern checking, renaming, desugaring etc.
- Main translation routines are in `RepType` and `UnariseStg`.
Documentation in `UnariseStg` should be enough for understanding
what's going on.
Credits:
- Johan Tibell wrote the initial front-end and interface file
extensions.
- Simon Peyton Jones reviewed this patch many times, wrote some code,
and helped with debugging.
Reviewers: bgamari, alanz, goldfire, RyanGlScott, simonpj, austin,
simonmar, hvr, erikd
Reviewed By: simonpj
Subscribers: Iceland_jack, ggreif, ezyang, RyanGlScott, goldfire,
thomie, mpickering
Differential Revision: https://phabricator.haskell.org/D2259
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This allows the code generator to give hints to later code generation
steps about which branch is most likely to be taken. Right now it
is only taken into account in one place: a special case in
CmmContFlowOpt that swapped branches over to maximise the chance of
fallthrough, which is now disabled when there is a likelihood setting.
Test Plan: validate
Reviewers: austin, simonpj, bgamari, ezyang, tibbe
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1273
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This re-implements the code generation for case expressions at the Stg →
Cmm level, both for data type cases as well as for integral literal
cases. (Cases on float are still treated as before).
The goal is to allow for fancier strategies in implementing them, for a
cleaner separation of the strategy from the gritty details of Cmm, and
to run this later than the Common Block Optimization, allowing for one
way to attack #10124. The new module CmmSwitch contains a number of
notes explaining this changes. For example, it creates larger
consecutive jump tables than the previous code, if possible.
nofib shows little significant overall improvement of runtime. The
rather large wobbling comes from changes in the code block order
(see #8082, not much we can do about it). But the decrease in code size
alone makes this worthwhile.
```
Program Size Allocs Runtime Elapsed TotalMem
Min -1.8% 0.0% -6.1% -6.1% -2.9%
Max -0.7% +0.0% +5.6% +5.7% +7.8%
Geometric Mean -1.4% -0.0% -0.3% -0.3% +0.0%
```
Compilation time increases slightly:
```
-1 s.d. ----- -2.0%
+1 s.d. ----- +2.5%
Average ----- +0.3%
```
The test case T783 regresses a lot, but it is the only one exhibiting
any regression. The cause is the changed order of branches in an
if-then-else tree, which makes the hoople data flow analysis traverse
the blocks in a suboptimal order. Reverting that gets rid of this
regression, but has a consistent, if only very small (+0.2%), negative
effect on runtime. So I conclude that this test is an extreme outlier
and no reason to change the code.
Differential Revision: https://phabricator.haskell.org/D720
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch solves the scoping problem of CmmTick nodes: If we just put
CmmTicks into blocks we have no idea what exactly they are meant to
cover. Here we introduce tick scopes, which allow us to create
sub-scopes and merged scopes easily.
Notes:
* Given that the code often passes Cmm around "head-less", we have to
make sure that its intended scope does not get lost. To keep the amount
of passing-around to a minimum we define a CmmAGraphScoped type synonym
here that just bundles the scope with a portion of Cmm to be assembled
later.
* We introduce new scopes at somewhat random places, aligning with
getCode calls. This works surprisingly well, but we might have to
add new scopes into the mix later on if we find things too be too
coarse-grained.
(From Phabricator D169)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In some cases, the layout of the LANGUAGE/OPTIONS_GHC lines has been
reorganized, while following the convention, to
- place `{-# LANGUAGE #-}` pragmas at the top of the source file, before
any `{-# OPTIONS_GHC #-}`-lines.
- Moreover, if the list of language extensions fit into a single
`{-# LANGUAGE ... -#}`-line (shorter than 80 characters), keep it on one
line. Otherwise split into `{-# LANGUAGE ... -#}`-lines for each
individual language extension. In both cases, try to keep the
enumeration alphabetically ordered.
(The latter layout is preferable as it's more diff-friendly)
While at it, this also replaces obsolete `{-# OPTIONS ... #-}` pragma
occurences by `{-# OPTIONS_GHC ... #-}` pragmas.
|
|
|
|
| |
Signed-off-by: Herbert Valerio Riedel <hvr@gnu.org>
|
|
|
|
|
|
|
|
| |
The Slow calling convention passes the closure in R1, but we were
ignoring this and hoping it would work, which it often did. However,
this bug seems to have been the cause of #7192, because the
graph-colouring allocator is more sensitive to having correct liveness
information on jumps.
|
| |
|
|
|
|
|
|
|
| |
All Cmm procedures now include the set of global registers that are live on
procedure entry, i.e., the global registers used to pass arguments to the
procedure. Only global registers that are use to pass arguments are included in
this list.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The main change here is that the Cmm parser now allows high-level cmm
code with argument-passing and function calls. For example:
foo ( gcptr a, bits32 b )
{
if (b > 0) {
// we can make tail calls passing arguments:
jump stg_ap_0_fast(a);
}
return (x,y);
}
More details on the new cmm syntax are in Note [Syntax of .cmm files]
in CmmParse.y.
The old syntax is still more-or-less supported for those occasional
code fragments that really need to explicitly manipulate the stack.
However there are a couple of differences: it is now obligatory to
give a list of live GlobalRegs on every jump, e.g.
jump %ENTRY_CODE(Sp(0)) [R1];
Again, more details in Note [Syntax of .cmm files].
I have rewritten most of the .cmm files in the RTS into the new
syntax, except for AutoApply.cmm which is generated by the genapply
program: this file could be generated in the new syntax instead and
would probably be better off for it, but I ran out of enthusiasm.
Some other changes in this batch:
- The PrimOp calling convention is gone, primops now use the ordinary
NativeNodeCall convention. This means that primops and "foreign
import prim" code must be written in high-level cmm, but they can
now take more than 10 arguments.
- CmmSink now does constant-folding (should fix #7219)
- .cmm files now go through the cmmPipeline, and as a result we
generate better code in many cases. All the object files generated
for the RTS .cmm files are now smaller. Performance should be
better too, but I haven't measured it yet.
- RET_DYN frames are removed from the RTS, lots of code goes away
- we now have some more canned GC points to cover unboxed-tuples with
2-4 pointers, which will reduce code size a little.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
I've switched to passing DynFlags rather than Platform, as (a) it's
simpler to not have to extract targetPlatform in so many places, and
(b) it may be useful to have DynFlags around in future.
|
|
|
|
| |
Fixes a perf problem in perf/compiler/T783
|
| |
|
|
|
|
|
|
| |
To explicitly choose whether you want an unregisterised build you now
need to use the "--enable-unregisterised"/"--disable-unregisterised"
configure flags.
|
|
|
|
|
|
|
| |
Instead of relying on common-block-elimination to share return
continuations in the common case (case-alternative heap checks) we do
it explicitly. This isn't hard to do, is more robust, and saves some
compilation time. Full commentary in Note [sharing continuations].
|
|
|
|
|
|
| |
This gives the register allocator access to R1.., F1.., D1.. etc. for
the new code generator, and is a cheap way to eliminate all the extra
"x = R1" assignments that we get from copyIn.
|
| |
|
|
|
|
|
|
|
|
| |
We also generate much better code for safe foreign calls (and maybe
also unsafe foreign calls) than previously. See the two new Notes:
Note [lower safe foreign calls]
Note [safe foreign call convention]
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Also:
- improvements to code generation: push slow-call continuations
on the stack instead of generating explicit continuations
- remove unused CmmInfo wrapper type (replace with CmmInfoTable)
- squash Area and AreaId together, remove now-unused RegSlot
- comment out old unused stack-allocation code that no longer
compiles after removal of RegSlot
|
| |
|
| |
|
| |
|
|
|
|
|
| |
We only use it for "compiler" sources, i.e. not for libraries.
Many modules have a -fno-warn-tabs kludge for now.
|
| |
|
|
|
|
| |
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
|
|
|
|
| |
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
|
|
|
|
| |
...all to do with the new codgen path
|
|
|
|
| |
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
|
|
This changes the new code generator to make use of the Hoopl package
for dataflow analysis. Hoopl is a new boot package, and is maintained
in a separate upstream git repository (as usual, GHC has its own
lagging darcs mirror in http://darcs.haskell.org/packages/hoopl).
During this merge I squashed recent history into one patch. I tried
to rebase, but the history had some internal conflicts of its own
which made rebase extremely confusing, so I gave up. The history I
squashed was:
- Update new codegen to work with latest Hoopl
- Add some notes on new code gen to cmm-notes
- Enable Hoopl lag package.
- Add SPJ note to cmm-notes
- Improve GC calls on new code generator.
Work in this branch was done by:
- Milan Straka <fox@ucw.cz>
- John Dias <dias@cs.tufts.edu>
- David Terei <davidterei@gmail.com>
Edward Z. Yang <ezyang@mit.edu> merged in further changes from GHC HEAD
and fixed a few bugs.
|