| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Add 'dumpAction' hook to DynFlags.
It allows GHC API users to catch dumped intermediate codes and
information. The format of the dump (Core, Stg, raw text, etc.) is now
reported allowing easier automatic handling.
* Add 'traceAction' hook to DynFlags.
Some dumps go through the trace mechanism (for instance unfoldings that
have been considered for inlining). This is problematic because:
1) dumps aren't written into files even with -ddump-to-file on
2) dumps are written on stdout even with GHC API
3) in this specific case, dumping depends on unsafe globally stored
DynFlags which is bad for GHC API users
We introduce 'traceAction' hook which allows GHC API to catch those
traces and to avoid using globally stored DynFlags.
* Avoid dumping empty logs via dumpAction/traceAction (but still write
empty files to keep the existing behavior)
|
|
|
|
|
|
|
|
|
|
|
|
| |
19 times out of 20 we already have dynflags in scope.
We could just always use `return dflags`. But this is in fact not free.
When looking at some STG code I noticed that we always allocate a
closure for this expression in the heap. Clearly a waste in these cases.
For the other cases we can either just modify the callsite to
get dynflags or use the _D variants of withTiming I added which
will use getDynFlags under the hood.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
'withTiming' becomes a function that, when passed '-vN' (N >= 2) or
'-ddump-timings', will print timing (and possibly allocations) related
information. When additionally built with '-eventlog' and executed with
'+RTS -l', 'withTiming' will also emit both 'traceMarker' and 'traceEvent'
events to the eventlog.
'withTimingSilent' on the other hand will never print any timing information,
under any circumstance, and will only emit 'traceEvent' events to the eventlog.
As pointed out in !1672, 'traceMarker' is better suited for things that we
might want to visualize in tools like eventlog2html, while 'traceEvent'
is better suited for internal events that occur a lot more often and that we
don't necessarily want to visualize.
This addresses #17138 by using 'withTimingSilent' for all the codegen bits
that are expressed as a bunch of small computations over streams of codegen
ASTs.
|
| |
|
|
|
|
| |
separate file and add -ddump-cmm-verbose-by-proc to keep old behaviour (#16930)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move switch expressions into a local variable when generating switches.
This avoids duplicating the expression if we translate the switch
to a tree search. This fixes #16933.
Further we now check if all branches of a switch have the same
destination, replacing the switch with a direct branch if that
is the case.
Both of these patterns appear in the ENTER macro used by the RTS
but are unlikely to occur in intermediate Cmm generated by GHC.
Nofib result summary:
--------------------------------------------------------------------------------
Program Size Allocs Runtime Elapsed TotalMem
--------------------------------------------------------------------------------
Min -0.0% -0.0% -15.7% -15.6% 0.0%
Max -0.0% 0.0% +5.4% +5.5% 0.0%
Geometric Mean -0.0% -0.0% -1.0% -1.0% -0.0%
Compiler allocations go up slightly: +0.2%
Example output before and after the change taken from RTS code below.
All but one of the memory loads `I32[_c3::I64 - 8]` are eliminated.
Instead the data is loaded once from memory in block c6.
Also the switch in block `ud` in the original code has been
eliminated completely.
Cmm without this commit:
```
stg_ap_0_fast() { // [R1]
{ []
}
{offset
ca: _c1::P64 = R1; // CmmAssign
goto c2; // CmmBranch
c2: if (_c1::P64 & 7 != 0) goto c4; else goto c6;
c6: _c3::I64 = I64[_c1::P64];
if (I32[_c3::I64 - 8] < 26 :: W32) goto ub; else goto ug;
ub: if (I32[_c3::I64 - 8] < 15 :: W32) goto uc; else goto ue;
uc: if (I32[_c3::I64 - 8] < 8 :: W32) goto c7; else goto ud;
ud: switch [8 .. 14] (%MO_SS_Conv_W32_W64(I32[_c3::I64 - 8])) {
case 8, 9, 10, 11, 12, 13, 14 : goto c4;
}
ue: if (I32[_c3::I64 - 8] >= 25 :: W32) goto c4; else goto uf;
uf: if (%MO_SS_Conv_W32_W64(I32[_c3::I64 - 8]) != 23) goto c7; else goto c4;
c4: R1 = _c1::P64;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
ug: if (I32[_c3::I64 - 8] < 28 :: W32) goto uh; else goto ui;
uh: if (I32[_c3::I64 - 8] < 27 :: W32) goto c7; else goto c8;
ui: if (I32[_c3::I64 - 8] < 29 :: W32) goto c8; else goto c7;
c8: _c1::P64 = P64[_c1::P64 + 8];
goto c2;
c7: R1 = _c1::P64;
call (_c3::I64)(R1) args: 8, res: 0, upd: 8;
}
}
```
Cmm with this commit:
```
stg_ap_0_fast() { // [R1]
{ []
}
{offset
ca: _c1::P64 = R1;
goto c2;
c2: if (_c1::P64 & 7 != 0) goto c4; else goto c6;
c6: _c3::I64 = I64[_c1::P64];
_ub::I64 = %MO_SS_Conv_W32_W64(I32[_c3::I64 - 8]);
if (_ub::I64 < 26) goto uc; else goto uh;
uc: if (_ub::I64 < 15) goto ud; else goto uf;
ud: if (_ub::I64 < 8) goto c7; else goto c4;
uf: if (_ub::I64 >= 25) goto c4; else goto ug;
ug: if (_ub::I64 != 23) goto c7; else goto c4;
c4: R1 = _c1::P64;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
uh: if (_ub::I64 < 28) goto ui; else goto uj;
ui: if (_ub::I64 < 27) goto c7; else goto c8;
uj: if (_ub::I64 < 29) goto c8; else goto c7;
c8: _c1::P64 = P64[_c1::P64 + 8];
goto c2;
c7: R1 = _c1::P64;
call (_c3::I64)(R1) args: 8, res: 0, upd: 8;
}
}
```
|
|
|
|
|
|
|
| |
ghc-pkg needs to be aware of platforms so it can figure out which
subdire within the user package db to use. This is admittedly
roundabout, but maybe Cabal could use the same notion of a platform as
GHC to good affect too.
|
|
|
|
|
|
|
| |
Support for Mac OS X on PowerPC has been dropped by Apple years ago. We
follow suit and remove PowerPC support for Darwin.
Fixes #16106.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch implements a new code layout algorithm.
It has been tested for x86 and is disabled on other platforms.
Performance varies slightly be CPU/Machine but in general seems to be better
by around 2%.
Nofib shows only small differences of about +/- ~0.5% overall depending on
flags/machine performance in other benchmarks improved significantly.
Other benchmarks includes at least the benchmarks of: aeson, vector, megaparsec, attoparsec,
containers, text and xeno.
While the magnitude of gains differed three different CPUs where tested with
all getting faster although to differing degrees. I tested: Sandy Bridge(Xeon), Haswell,
Skylake
* Library benchmark results summarized:
* containers: ~1.5% faster
* aeson: ~2% faster
* megaparsec: ~2-5% faster
* xml library benchmarks: 0.2%-1.1% faster
* vector-benchmarks: 1-4% faster
* text: 5.5% faster
On average GHC compile times go down, as GHC compiled with the new layout
is faster than the overhead introduced by using the new layout algorithm,
Things this patch does:
* Move code responsilbe for block layout in it's own module.
* Move the NcgImpl Class into the NCGMonad module.
* Extract a control flow graph from the input cmm.
* Update this cfg to keep it in sync with changes during
asm codegen. This has been tested on x64 but should work on x86.
Other platforms still use the old codelayout.
* Assign weights to the edges in the CFG based on type and limited static
analysis which are then used for block layout.
* Once we have the final code layout eliminate some redundant jumps.
In particular turn a sequences of:
jne .foo
jmp .bar
foo:
into
je bar
foo:
..
Test Plan: ci
Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott
Reviewed By: RyanGlScott
Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton
GHC Trac Issues: #15124
Differential Revision: https://phabricator.haskell.org/D4726
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
- Previously we would hvae a single big table of pointers per module,
with a set of bitmaps to reference entries within it. The new
representation is identical to a static constructor, which is much
simpler for the GC to traverse, and we get to remove the complicated
bitmap-traversal code from the GC.
- Rewrite all the code to generate SRTs in CmmBuildInfoTables, and
document it much better (see Note [SRTs]). This has been something
I've wanted to do since we moved to the new code generator, I
finally had the opportunity to finish it while on a transatlantic
flight recently :)
There are a series of 4 diffs:
1. D4632 (this one), which does the bulk of the changes
2. D4633 which adds support for smaller `CmmLabelDiffOff` constants
3. D4634 which takes advantage of D4632 and D4633 to save a word in
info tables that have an SRT on x86_64. This is where most of the
binary size improvement comes from.
4. D4637 which makes a further optimisation to merge some SRTs with
static FUN closures. This adds some complexity and the benefits
are fairly modest, so it's not clear yet whether we should do this.
Results (after (3), on x86_64)
- GHC itself (staticaly linked) is 5.2% smaller
- -1.7% binary sizes in nofib, -2.9% module sizes. Full nofib results: P176
- I measured the overhead of traversing all the static objects in a
major GC in GHC itself by doing `replicateM_ 1000 performGC` as the
first thing in `Main.main`. The new version was 5-10% faster, but
the results did vary quite a bit.
- I'm not sure if there's a compile-time difference, the results are
too unreliable.
Test Plan: validate
Reviewers: bgamari, michalt, niteria, simonpj, erikd, osa1
Subscribers: thomie, carter
Differential Revision: https://phabricator.haskell.org/D4632
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit d5c4d46a62ce6a0cfa6440344f707136eff18119.
Please see #14989 for details.
Test Plan: ./validate
Reviewers: bgamari, simonmar
Subscribers: thomie, carter
GHC Trac Issues: #14989
Differential Revision: https://phabricator.haskell.org/D4577
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The sinking pass often gets rid of unnecessary registers
registers/assignements exposing more opportunities for CBE, so this
commit adds a second round of CBE after the sinking pass and should
fix #12915 (and some examples in #14226).
Nofib results:
* Binary size: 0.9% reduction on average
* Compile allocations: 0.7% increase on average
* Runtime: noisy, two separate runs of nofib showed a tiny
reduction on average, (~0.2-0.3%), but I think
this is mostly noise
* Compile time: very noisy, but generally within +/- 0.5% (one
run faster, one slower)
One interesting part of this change is that running CBE invalidates
results of proc-point analysis. But instead of re-doing the whole
analysis, we can use the map that CBE creates for replacing/comparing
block labels (maps a redundant label to a useful one) to update the
results of proc-point analysis. This lowers the overhead compared to the
previous experiment in #12915.
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate
Reviewers: bgamari, simonmar
Reviewed By: simonmar
Subscribers: rwbarton, thomie, carter
GHC Trac Issues: #12915, #14226
Differential Revision: https://phabricator.haskell.org/D4417
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Trac #14226 noted that the C-- CBE pass frequently fails to
common up semantically identical blocks due to the differences in local
register naming. These patches fixed this by making the pass consider
equality up to alpha-renaming.
However, the new logic failed to consider the possibility that local
register naming *may* matter across multiple blocks. This lead to the
regression #14754. I'll need to do a bit of thinking on a proper
solution to this but in the meantime I'm reverting all four patches.
This reverts commit a27056f9823f8bbe2302f1924b3ab38fd6752e37.
This reverts commit 6f990c54f922beae80362fe62426beededc21290.
This reverts commit 9aa73892e10e90a1799b9277da593e816a827364.
This reverts commit 7920a7d9c53083b234e060a3e72f00b601a46808.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Simonpj suggested this as a follow-on to #14226 to avoid code
duplication. This also gives us the ability to CBE cases involving
foreign calls for free.
Test Plan: Validate
Reviewers: austin, simonmar, simonpj
Reviewed By: simonpj
Subscribers: michalt, simonpj, rwbarton, thomie
GHC Trac Issues: #14226
Differential Revision: https://phabricator.haskell.org/D3999
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This switches the compiler/ component to get compiled with
-XNoImplicitPrelude and a `import GhcPrelude` is inserted in all
modules.
This is motivated by the upcoming "Prelude" re-export of
`Semigroup((<>))` which would cause lots of name clashes in every
modulewhich imports also `Outputable`
Reviewers: austin, goldfire, bgamari, alanz, simonmar
Reviewed By: bgamari
Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari
Differential Revision: https://phabricator.haskell.org/D3989
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously due to #12759 we disabled PIE support entirely. However, this
breaks the user's ability to produce PIEs. Add an explicit flag, -fPIE,
allowing the user to build PIEs.
Test Plan: Validate
Reviewers: rwbarton, austin, simonmar
Subscribers: trommler, simonmar, trofi, jrtc27, thomie
GHC Trac Issues: #12759, #13702
Differential Revision: https://phabricator.haskell.org/D3589
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This copies the subset of Hoopl's functionality needed by GHC to
`cmm/Hoopl` and removes the dependency on the Hoopl package.
The main motivation for this change is the confusing/noisy interface
between GHC and Hoopl:
- Hoopl has `Label` which is GHC's `BlockId` but different than
GHC's `CLabel`
- Hoopl has `Unique` which is different than GHC's `Unique`
- Hoopl has `Unique{Map,Set}` which are different than GHC's
`Uniq{FM,Set}`
- GHC has its own specialized copy of `Dataflow`, so `cmm/Hoopl` is
needed just to filter the exposed functions (filter out some of the
Hoopl's and add the GHC ones)
With this change, we'll be able to simplify this significantly.
It'll also be much easier to do invasive changes (Hoopl is a public
package on Hackage with users that depend on the current behavior)
This should introduce no changes in functionality - it merely
copies the relevant code.
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate
Reviewers: austin, bgamari, simonmar
Reviewed By: bgamari, simonmar
Subscribers: simonpj, kavon, rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3616
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`procPointAnalysis` doesn't need to run in `UniqSM` (it consists of a
single `return` and the call to `analyzeCmm` function which is pure).
Making it non-monadic simplifies the code a bit.
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: validate
Reviewers: austin, bgamari, simonmar
Reviewed By: simonmar
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2837
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The `-ddump-cmm` put all stages of Cmm processing into one output.
This patch changes its behavior and adds two more options to make
Cmm dumping flexible.
- `-ddump-cmm-from-stg` dumps only initial version of Cmm right after
STG->Cmm codegen
- `-ddump-cmm` dumps the final result of the Cmm pipeline processing
- `-ddump-cmm-verbose` dumps intermediate output of each Cmm pipeline
step
- `-ddump-cmm-proc` and `-ddump-cmm-caf` seems were lost. Now enabled
Test Plan: ./validate
Reviewers: thomie, simonmar, austin, bgamari
Reviewed By: thomie, simonmar
Subscribers: simonpj, thomie
Differential Revision: https://phabricator.haskell.org/D2393
GHC Trac Issues: #11717
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This re-implements the code generation for case expressions at the Stg →
Cmm level, both for data type cases as well as for integral literal
cases. (Cases on float are still treated as before).
The goal is to allow for fancier strategies in implementing them, for a
cleaner separation of the strategy from the gritty details of Cmm, and
to run this later than the Common Block Optimization, allowing for one
way to attack #10124. The new module CmmSwitch contains a number of
notes explaining this changes. For example, it creates larger
consecutive jump tables than the previous code, if possible.
nofib shows little significant overall improvement of runtime. The
rather large wobbling comes from changes in the code block order
(see #8082, not much we can do about it). But the decrease in code size
alone makes this worthwhile.
```
Program Size Allocs Runtime Elapsed TotalMem
Min -1.8% 0.0% -6.1% -6.1% -2.9%
Max -0.7% +0.0% +5.6% +5.7% +7.8%
Geometric Mean -1.4% -0.0% -0.3% -0.3% +0.0%
```
Compilation time increases slightly:
```
-1 s.d. ----- -2.0%
+1 s.d. ----- +2.5%
Average ----- +0.3%
```
The test case T783 regresses a lot, but it is the only one exhibiting
any regression. The cause is the changed order of branches in an
if-then-else tree, which makes the hoople data flow analysis traverse
the blocks in a suboptimal order. Reverting that gets rid of this
regression, but has a consistent, if only very small (+0.2%), negative
effect on runtime. So I conclude that this test is an extreme outlier
and no reason to change the code.
Differential Revision: https://phabricator.haskell.org/D720
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In some cases, the layout of the LANGUAGE/OPTIONS_GHC lines has been
reorganized, while following the convention, to
- place `{-# LANGUAGE #-}` pragmas at the top of the source file, before
any `{-# OPTIONS_GHC #-}`-lines.
- Moreover, if the list of language extensions fit into a single
`{-# LANGUAGE ... -#}`-line (shorter than 80 characters), keep it on one
line. Otherwise split into `{-# LANGUAGE ... -#}`-lines for each
individual language extension. In both cases, try to keep the
enumeration alphabetically ordered.
(The latter layout is preferable as it's more diff-friendly)
While at it, this also replaces obsolete `{-# OPTIONS ... #-}` pragma
occurences by `{-# OPTIONS_GHC ... #-}` pragmas.
|
|
|
|
|
|
|
|
|
| |
End of Cmm pipeline used to be split into two alternative flows,
depending on whether we did proc-point splitting or not. There
was a lot of code duplication between these two branches. But it
wasn't really necessary as the differences can be easily enclosed
within an if-then-else. I observed no impact of this change on
compilation performance.
|
| |
|
| |
|
| |
|
|
|
|
| |
This makes it consistent with the corresponding -cmm-sink flag
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit does two things:
* Allows duplicating of global registers and literals by inlining
them. Previously we would only inline global register or literal
if it was used only once.
* Changes method of determining conflicts between a node and an
assignment. New method has two advantages. It relies on
DefinerOfRegs and UserOfRegs typeclasses, so if a set of registers
defined or used by a node should ever change, `conflicts` function
will use the changed definition. This definition also catches
more cases than the previous one (namely CmmCall and CmmForeignCall)
which is a step towards making it possible to run sinking pass
before stack layout (currently this doesn't work).
This patch also adds a lot of comments that are result of about two-week
long investigation of how sinking pass works and why it does what it does.
|
|
|
|
|
|
|
|
| |
On some architectures it might happen that stack layout pass will
invalidate the list of calculated procpoints by dropping some of
them. We fix this by checking whether a proc-point is in a graph
at the beginning of proc-point analysis. This is a speculative
fix for #8205.
|
|
|
|
|
|
|
| |
We weren't properly tracking the number of stack arguments in the
continuation of a foreign call. It happened to work when the
continuation was not a join point, but when it was a join point we
were using the wrong amount of stack fixup.
|
| |
|
|
|
|
| |
No change in functionality intended
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's only a single compiler backend now, so the 'z' suffix means
nothing. Also, the flags were confusingly named ('cmm-foo' vs
'foo-cmm',) and counter-intuitively, '-ddump-cmm' did not do at all what
you expected since the new backend went live.
Basically, all of the -ddump-cmmz-* flags are now -ddump-cmm-*. Some were
renamed to be more consistent.
This doesn't update the manual; it already mentions '-ddump-cmm' and
that flag implies all the others anyway, which is probably what you
want.
Signed-off-by: Austin Seipp <mad.one@gmail.com>
|
| |
|
|
|
|
| |
The workaround described in note [darwin-x86-pic] applies to Darwin/PPC too.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This removes the OldCmm data type and the CmmCvt pass that converts
new Cmm to OldCmm. The backends (NCGs, LLVM and C) have all been
converted to consume new Cmm.
The main difference between the two data types is that conditional
branches in new Cmm have both true/false successors, whereas in OldCmm
the false case was a fallthrough. To generate slightly better code we
occasionally need to invert a conditional to ensure that the
branch-not-taken becomes a fallthrough; this was previously done in
CmmCvt, and it is now done in CmmContFlowOpt.
We could go further and use the Hoopl Block representation for native
code, which would mean that we could use Hoopl's postorderDfs and
analyses for native code, but for now I've left it as is, using the
old ListGraph representation for native code.
|
|
|
|
|
|
|
| |
All Cmm procedures now include the set of global registers that are live on
procedure entry, i.e., the global registers used to pass arguments to the
procedure. Only global registers that are use to pass arguments are included in
this list.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were being inconsistent about how we tested whether dump flags
were enabled; in particular, sometimes we also checked the verbosity,
and sometimes we didn't.
This lead to oddities such as "ghc -v4" printing an "Asm code" section
which didn't contain any code, and "-v4" enabled some parts of
"-ddump-deriv" but not others.
Now all the tests use dopt, which also takes the verbosity into account
as appropriate.
|
|
|
|
|
| |
Mostly d -> g (matching DynFlag -> GeneralFlag).
Also renamed if* to when*, matching the Haskell if/when names
|
|
|
|
|
| |
This avoids confusion due to [DynFlag] and DynFlags being completely
different types.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The main change here is that the Cmm parser now allows high-level cmm
code with argument-passing and function calls. For example:
foo ( gcptr a, bits32 b )
{
if (b > 0) {
// we can make tail calls passing arguments:
jump stg_ap_0_fast(a);
}
return (x,y);
}
More details on the new cmm syntax are in Note [Syntax of .cmm files]
in CmmParse.y.
The old syntax is still more-or-less supported for those occasional
code fragments that really need to explicitly manipulate the stack.
However there are a couple of differences: it is now obligatory to
give a list of live GlobalRegs on every jump, e.g.
jump %ENTRY_CODE(Sp(0)) [R1];
Again, more details in Note [Syntax of .cmm files].
I have rewritten most of the .cmm files in the RTS into the new
syntax, except for AutoApply.cmm which is generated by the genapply
program: this file could be generated in the new syntax instead and
would probably be better off for it, but I ran out of enthusiasm.
Some other changes in this batch:
- The PrimOp calling convention is gone, primops now use the ordinary
NativeNodeCall convention. This means that primops and "foreign
import prim" code must be written in high-level cmm, but they can
now take more than 10 arguments.
- CmmSink now does constant-folding (should fix #7219)
- .cmm files now go through the cmmPipeline, and as a result we
generate better code in many cases. All the object files generated
for the RTS .cmm files are now smaller. Performance should be
better too, but I haven't measured it yet.
- RET_DYN frames are removed from the RTS, lots of code goes away
- we now have some more canned GC points to cover unboxed-tuples with
2-4 pointers, which will reduce code size a little.
|
| |
|
| |
|
| |
|