| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
| |
|
|
|
|
|
|
| |
Add StgToCmm module hierarchy. Platform modules that are used in several
other places (NCG, LLVM codegen, Cmm transformations) are put into
GHC.Platform.
|
|
|
|
|
|
|
|
| |
We introduce a PlatformWordSize type and use it in platformWordSize
field.
This removes to panic/error calls called when platform word size is not
32 or 64. We now check for this when reading the platform config.
|
|
|
|
|
|
|
| |
This prepares the way for making Int32# and Word32# the actual size they
claim to be.
Updates binary submodule for (de)serializing the new runtime reps.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move switch expressions into a local variable when generating switches.
This avoids duplicating the expression if we translate the switch
to a tree search. This fixes #16933.
Further we now check if all branches of a switch have the same
destination, replacing the switch with a direct branch if that
is the case.
Both of these patterns appear in the ENTER macro used by the RTS
but are unlikely to occur in intermediate Cmm generated by GHC.
Nofib result summary:
--------------------------------------------------------------------------------
Program Size Allocs Runtime Elapsed TotalMem
--------------------------------------------------------------------------------
Min -0.0% -0.0% -15.7% -15.6% 0.0%
Max -0.0% 0.0% +5.4% +5.5% 0.0%
Geometric Mean -0.0% -0.0% -1.0% -1.0% -0.0%
Compiler allocations go up slightly: +0.2%
Example output before and after the change taken from RTS code below.
All but one of the memory loads `I32[_c3::I64 - 8]` are eliminated.
Instead the data is loaded once from memory in block c6.
Also the switch in block `ud` in the original code has been
eliminated completely.
Cmm without this commit:
```
stg_ap_0_fast() { // [R1]
{ []
}
{offset
ca: _c1::P64 = R1; // CmmAssign
goto c2; // CmmBranch
c2: if (_c1::P64 & 7 != 0) goto c4; else goto c6;
c6: _c3::I64 = I64[_c1::P64];
if (I32[_c3::I64 - 8] < 26 :: W32) goto ub; else goto ug;
ub: if (I32[_c3::I64 - 8] < 15 :: W32) goto uc; else goto ue;
uc: if (I32[_c3::I64 - 8] < 8 :: W32) goto c7; else goto ud;
ud: switch [8 .. 14] (%MO_SS_Conv_W32_W64(I32[_c3::I64 - 8])) {
case 8, 9, 10, 11, 12, 13, 14 : goto c4;
}
ue: if (I32[_c3::I64 - 8] >= 25 :: W32) goto c4; else goto uf;
uf: if (%MO_SS_Conv_W32_W64(I32[_c3::I64 - 8]) != 23) goto c7; else goto c4;
c4: R1 = _c1::P64;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
ug: if (I32[_c3::I64 - 8] < 28 :: W32) goto uh; else goto ui;
uh: if (I32[_c3::I64 - 8] < 27 :: W32) goto c7; else goto c8;
ui: if (I32[_c3::I64 - 8] < 29 :: W32) goto c8; else goto c7;
c8: _c1::P64 = P64[_c1::P64 + 8];
goto c2;
c7: R1 = _c1::P64;
call (_c3::I64)(R1) args: 8, res: 0, upd: 8;
}
}
```
Cmm with this commit:
```
stg_ap_0_fast() { // [R1]
{ []
}
{offset
ca: _c1::P64 = R1;
goto c2;
c2: if (_c1::P64 & 7 != 0) goto c4; else goto c6;
c6: _c3::I64 = I64[_c1::P64];
_ub::I64 = %MO_SS_Conv_W32_W64(I32[_c3::I64 - 8]);
if (_ub::I64 < 26) goto uc; else goto uh;
uc: if (_ub::I64 < 15) goto ud; else goto uf;
ud: if (_ub::I64 < 8) goto c7; else goto c4;
uf: if (_ub::I64 >= 25) goto c4; else goto ug;
ug: if (_ub::I64 != 23) goto c7; else goto c4;
c4: R1 = _c1::P64;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
uh: if (_ub::I64 < 28) goto ui; else goto uj;
ui: if (_ub::I64 < 27) goto c7; else goto c8;
uj: if (_ub::I64 < 29) goto c8; else goto c7;
c8: _c1::P64 = P64[_c1::P64 + 8];
goto c2;
c7: R1 = _c1::P64;
call (_c3::I64)(R1) args: 8, res: 0, upd: 8;
}
}
```
|
|
|
|
|
| |
This moves all URL references to Trac tickets to their corresponding
GitLab counterparts.
|
|
|
|
| |
Also used ByteString in some other relevant places
|
|
|
|
|
|
|
|
|
|
|
|
| |
This builds off of D4475.
Bumps binary submodule.
Reviewers: carter, AndreasK, hvr, goldfire, bgamari, simonmar
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D5006
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the first step of implementing:
https://github.com/ghc-proposals/ghc-proposals/pull/74
The main highlights/changes:
primops.txt.pp gets two new sections for two new primitive types for
signed and unsigned 8-bit integers (Int8# and Word8 respectively) along
with basic arithmetic and comparison operations. PrimRep/RuntimeRep get
two new constructors for them. All of the primops translate into the
existing MachOPs.
For CmmCalls the codegen will now zero-extend the values at call
site (so that they can be moved to the right register) and then truncate
them back their original width.
x86 native codegen needed some updates, since it wasn't able to deal
with the new widths, but all the changes are quite localized. LLVM
backend seems to just work.
This is the second attempt at merging this, after the first attempt in
D4475 had to be backed out due to regressions on i386.
Bumps binary submodule.
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate (on both x86-{32,64})
Reviewers: bgamari, hvr, goldfire, simonmar
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5258
|
|
|
|
|
|
|
|
|
| |
This unfortunately broke i386 support since it introduced references to
byte-sized registers that don't exist on that architecture.
Reverts binary submodule
This reverts commit 5d5307f943d7581d7013ffe20af22233273fba06.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the first step of implementing:
https://github.com/ghc-proposals/ghc-proposals/pull/74
The main highlights/changes:
- `primops.txt.pp` gets two new sections for two new primitive types
for signed and unsigned 8-bit integers (`Int8#` and `Word8`
respectively) along with basic arithmetic and comparison
operations. `PrimRep`/`RuntimeRep` get two new constructors for
them. All of the primops translate into the existing `MachOP`s.
- For `CmmCall`s the codegen will now zero-extend the values at call
site (so that they can be moved to the right register) and then
truncate them back their original width.
- x86 native codegen needed some updates, since it wasn't able to deal
with the new widths, but all the changes are quite localized. LLVM
backend seems to just work.
Bumps binary submodule.
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate with new tests
Reviewers: hvr, goldfire, bgamari, simonmar
Subscribers: Abhiroop, dfeuer, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4475
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: validate
Reviewers: bgamari, simonmar
Reviewed By: simonmar
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4957
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This change makes it possible to generate a static 32-bit relative label
offset on x86_64. Currently we can only generate word-sized label
offsets.
This will be used in D4634 to shrink info tables. See D4632 for more
details.
Test Plan: See D4632
Reviewers: bgamari, niteria, michalt, erikd, jrtc27, osa1
Subscribers: thomie, carter
Differential Revision: https://phabricator.haskell.org/D4633
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`Hoopl.Graph` has almost exactly the same function, so let's use that.
Also, use `IntMap.alter` to make it more efficient.
Also switch `Hoopl` to use strict maps.
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate
Reviewers: bgamari, simonmar
Reviewed By: bgamari
Subscribers: dfeuer, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4493
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Fix the naming and comments to indicate that we are calculating
*reverse* postorder (and not the standard postorder).
- Rewrite the calculation to avoid CPS code. I found it fairly
difficult to understand and the new one seems faster (according to
nofib, decreases compiler allocations by 0.2%)
- Remove `LabelsPtr`, which seems unnecessary and could be *really*
confusing. For instance, previously:
`postorder_dfs_from <block with label X>`
and
`postorder_dfs_from <label X>`
would actually mean quite different things (and give different
results).
- Change the `Dataflow` module to always use entry of the graph for
reverse postorder calculation. This should be the only change in
behavior of this commit.
Previously, if the caller provided initial facts for some of the
labels, we would use those labels for our postorder calculation.
However, I don't think that's correct in general - if the initial
facts did not contain the entry of the graph, we would never analyze
the blocks reachable from the entry but unreachable from the labels
provided with the initial facts. It seems that the only analysis that
used this was proc-point analysis, which I think would always include
the entry block (so I don't think there's any bug due to this).
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate
Reviewers: bgamari, simonmar
Reviewed By: simonmar
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4464
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: validate
Reviewers: bgamari, AndreasK, erikd
Reviewed By: AndreasK
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4398
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: validate
Reviewers: bgamari, erikd
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4380
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It seems that most uses of these folds should be strict left folds
(I could only find a single place that benefits from a right fold).
So this removes the existing `setFold`/`mapFold`/`mapFoldWihKey`
replaces them with:
- `setFoldl`/`mapFoldl`/`mapFoldlWithKey` (strict left folds)
- `setFoldr`/`mapFoldr` (for the less common case where a right fold
actually makes sense, e.g., `CmmProcPoint`)
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate
Reviewers: bgamari, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter, kavon
Differential Revision: https://phabricator.haskell.org/D4356
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This switches the compiler/ component to get compiled with
-XNoImplicitPrelude and a `import GhcPrelude` is inserted in all
modules.
This is motivated by the upcoming "Prelude" re-export of
`Semigroup((<>))` which would cause lots of name clashes in every
modulewhich imports also `Outputable`
Reviewers: austin, goldfire, bgamari, alanz, simonmar
Reviewed By: bgamari
Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari
Differential Revision: https://phabricator.haskell.org/D3989
|
|
|
|
| |
and join type signatures
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This copies the subset of Hoopl's functionality needed by GHC to
`cmm/Hoopl` and removes the dependency on the Hoopl package.
The main motivation for this change is the confusing/noisy interface
between GHC and Hoopl:
- Hoopl has `Label` which is GHC's `BlockId` but different than
GHC's `CLabel`
- Hoopl has `Unique` which is different than GHC's `Unique`
- Hoopl has `Unique{Map,Set}` which are different than GHC's
`Uniq{FM,Set}`
- GHC has its own specialized copy of `Dataflow`, so `cmm/Hoopl` is
needed just to filter the exposed functions (filter out some of the
Hoopl's and add the GHC ones)
With this change, we'll be able to simplify this significantly.
It'll also be much easier to do invasive changes (Hoopl is a public
package on Hackage with users that depend on the current behavior)
This should introduce no changes in functionality - it merely
copies the relevant code.
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate
Reviewers: austin, bgamari, simonmar
Reviewed By: bgamari, simonmar
Subscribers: simonpj, kavon, rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3616
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commits relaxes the invariants of the Core syntax so that a
top-level variable can be bound to a primitive string literal of type
Addr#.
This commit:
* Relaxes the invatiants of the Core, and allows top-level bindings whose
type is Addr# as long as their RHS is either a primitive string literal or
another variable.
* Allows the simplifier and the full-laziness transformer to float out
primitive string literals to the top leve.
* Introduces the new StgGenTopBinding type to accomodate top-level Addr#
bindings.
* Introduces a new type of labels in the object code, with the suffix "_bytes",
for exported top-level Addr# bindings.
* Makes some built-in rules more robust. This was necessary to keep them
functional after the above changes.
This is a continuation of D2554.
Rebasing notes:
This had two slightly suspicious performance regressions:
* T12425: bytes allocated regressed by roughly 5%
* T4029: bytes allocated regressed by a bit over 1%
* T13035: bytes allocated regressed by a bit over 5%
These deserve additional investigation.
Rebased by: bgamari.
Test Plan: ./validate --slow
Reviewers: goldfire, trofi, simonmar, simonpj, austin, hvr, bgamari
Reviewed By: trofi, simonpj, bgamari
Subscribers: trofi, simonpj, gridaphobe, thomie
Differential Revision: https://phabricator.haskell.org/D2605
GHC Trac Issues: #8472
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit implements the proposal in
https://github.com/ghc-proposals/ghc-proposals/pull/29 and
https://github.com/ghc-proposals/ghc-proposals/pull/35.
Here are some of the pieces of that proposal:
* Some of RuntimeRep's constructors have been shortened.
* TupleRep and SumRep are now parameterized over a list of RuntimeReps.
* This
means that two types with the same kind surely have the same
representation.
Previously, all unboxed tuples had the same kind, and thus the fact
above was
false.
* RepType.typePrimRep and friends now return a *list* of PrimReps. These
functions can now work successfully on unboxed tuples. This change is
necessary because we allow abstraction over unboxed tuple types and so
cannot
always handle unboxed tuples specially as we did before.
* We sometimes have to create an Id from a PrimRep. I thus split PtrRep
* into
LiftedRep and UnliftedRep, so that the created Ids have the right
strictness.
* The RepType.RepType type was removed, as it didn't seem to help with
* much.
* The RepType.repType function is also removed, in favor of typePrimRep.
* I have waffled a good deal on whether or not to keep VoidRep in
TyCon.PrimRep. In the end, I decided to keep it there. PrimRep is *not*
represented in RuntimeRep, and typePrimRep will never return a list
including
VoidRep. But it's handy to have in, e.g., ByteCodeGen and friends. I can
imagine another design choice where we have a PrimRepV type that is
PrimRep
with an extra constructor. That seemed to be a heavier design, though,
and I'm
not sure what the benefit would be.
* The last, unused vestiges of # (unliftedTypeKind) have been removed.
* There were several pretty-printing bugs that this change exposed;
* these are fixed.
* We previously checked for levity polymorphism in the types of binders.
* But we
also must exclude levity polymorphism in function arguments. This is
hard to check
for, requiring a good deal of care in the desugarer. See Note [Levity
polymorphism
checking] in DsMonad.
* In order to efficiently check for levity polymorphism in functions, it
* was necessary
to add a new bit of IdInfo. See Note [Levity info] in IdInfo.
* It is now safe for unlifted types to be unsaturated in Core. Core Lint
* is updated
accordingly.
* We can only know strictness after zonking, so several checks around
* strictness
in the type-checker (checkStrictBinds, the check for unlifted variables
under a ~
pattern) have been moved to the desugarer.
* Along the way, I improved the treatment of unlifted vs. banged
* bindings. See
Note [Strict binds checks] in DsBinds and #13075.
* Now that we print type-checked source, we must be careful to print
* ConLikes correctly.
This is facilitated by a new HsConLikeOut constructor to HsExpr.
Particularly troublesome
are unlifted pattern synonyms that get an extra void# argument.
* Includes a submodule update for haddock, getting rid of #.
* New testcases:
typecheck/should_fail/StrictBinds
typecheck/should_fail/T12973
typecheck/should_run/StrictPats
typecheck/should_run/T12809
typecheck/should_fail/T13105
patsyn/should_fail/UnliftedPSBind
typecheck/should_fail/LevPolyBounded
typecheck/should_compile/T12987
typecheck/should_compile/T11736
* Fixed tickets:
#12809
#12973
#11736
#13075
#12987
* This also adds a test case for #13105. This test case is
* "compile_fail" and
succeeds, because I want the testsuite to monitor the error message.
When #13105 is fixed, the test case will compile cleanly.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This continues removal of `BlockId` module in favor of Hoopl's `Label`.
Most of the changes here are mechanical, apart from the orphan
`Outputable` instances for `LabelMap` and `LabelSet`. For now I've
moved them to `cmm/Hoopl`, since it's already trying to manage all
imports from Hoopl (to avoid any collisions).
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: validate
Reviewers: bgamari, austin, simonmar
Reviewed By: simonmar
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2800
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Removed the alignment for strings and mark then as cstring sections in
the generated asm so the linker can merge duplicate sections.
Reviewers: rwbarton, trofi, austin, trommler, simonmar, hvr, bgamari
Reviewed By: hvr, bgamari
Subscribers: simonpj, hvr, thomie
Differential Revision: https://phabricator.haskell.org/D1290
GHC Trac Issues: #9577
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit:
- Moves the remaining few methods concerned with dataflow analysis
from `CmmUtils` to `Hoopl.Dataflow`.
- Refactors the code to not use `FwdPass` and simply pass `FwdTransfer`
and `DataflowLattice` directly.
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: validate
Reviewers: austin, simonmar, bgamari
Reviewed By: simonmar, bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2634
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We had *a lot* of code copied from Hoopl that is for rewriting. But GHC
doesn't use it (it only uses some forked Hoopl code for analysis).
So we can safely kill all this code and make it much easier to refactor
and improve the parts that we do use.
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate
Reviewers: austin, simonmar, bgamari
Reviewed By: bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2612
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idea behind adding special "rubbish" arguments was in unboxed sum types
depending on the tag some arguments are not used and we don't want to move some
special values (like 0 for literals and some special pointer for boxed slots)
for those arguments (to stack locations or registers). "StgRubbishArg" was an
indicator to the code generator that the value won't be used. During Stg-to-Cmm
we were then not generating any move or store instructions at all.
This caused problems in the register allocator because some variables were only
initialized in some code paths. As an example, suppose we have this STG: (after
unarise)
Lib.$WT =
\r [dt_sit]
case
case dt_sit of {
Lib.F dt_siv [Occ=Once] ->
(#,,#) [1# dt_siv StgRubbishArg::GHC.Prim.Int#];
Lib.I dt_siw [Occ=Once] ->
(#,,#) [2# StgRubbishArg::GHC.Types.Any dt_siw];
}
of
dt_six
{ (#,,#) us_giC us_giD us_giE -> Lib.T [us_giC us_giD us_giE];
};
This basically unpacks a sum type to an unboxed sum with 3 fields, and then
moves the unboxed sum to a constructor (`Lib.T`).
This is the Cmm for the inner case expression (case expression in the scrutinee
position of the outer case):
ciN:
...
-- look at dt_sit's tag
if (_ciT::P64 != 1) goto ciS; else goto ciR;
ciS: -- Tag is 2, i.e. Lib.F
_siw::I64 = I64[_siu::P64 + 6];
_giE::I64 = _siw::I64;
_giD::P64 = stg_RUBBISH_ENTRY_info;
_giC::I64 = 2;
goto ciU;
ciR: -- Tag is 1, i.e. Lib.I
_siv::P64 = P64[_siu::P64 + 7];
_giD::P64 = _siv::P64;
_giC::I64 = 1;
goto ciU;
Here one of the blocks `ciS` and `ciR` is executed and then the execution
continues to `ciR`, but only `ciS` initializes `_giE`, in the other branch
`_giE` is not initialized, because it's "rubbish" in the STG and so we don't
generate an assignment during code generator. The code generator then panics
during the register allocations:
ghc-stage1: panic! (the 'impossible' happened)
(GHC version 8.1.20160722 for x86_64-unknown-linux):
LocalReg's live-in to graph ciY {_giE::I64}
(`_giD` is also "rubbish" in `ciS`, but it's still initialized because it's a
pointer slot, we have to initialize it otherwise garbage collector follows the
pointer to some random place. So we only remove assignment if the "rubbish" arg
has unboxed type.)
This patch removes `StgRubbishArg` and `CmmArg`. We now always initialize
rubbish slots. If the slot is for boxed types we use the existing `absentError`,
otherwise we initialize the slot with literal 0.
Reviewers: simonpj, erikd, austin, simonmar, bgamari
Reviewed By: erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2446
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch implements primitive unboxed sum types, as described in
https://ghc.haskell.org/trac/ghc/wiki/UnpackedSumTypes.
Main changes are:
- Add new syntax for unboxed sums types, terms and patterns. Hidden
behind `-XUnboxedSums`.
- Add unlifted unboxed sum type constructors and data constructors,
extend type and pattern checkers and desugarer.
- Add new RuntimeRep for unboxed sums.
- Extend unarise pass to translate unboxed sums to unboxed tuples right
before code generation.
- Add `StgRubbishArg` to `StgArg`, and a new type `CmmArg` for better
code generation when sum values are involved.
- Add user manual section for unboxed sums.
Some other changes:
- Generalize `UbxTupleRep` to `MultiRep` and `UbxTupAlt` to
`MultiValAlt` to be able to use those with both sums and tuples.
- Don't use `tyConPrimRep` in `isVoidTy`: `tyConPrimRep` is really
wrong, given an `Any` `TyCon`, there's no way to tell what its kind
is, but `kindPrimRep` and in turn `tyConPrimRep` returns `PtrRep`.
- Fix some bugs on the way: #12375.
Not included in this patch:
- Update Haddock for new the new unboxed sum syntax.
- `TemplateHaskell` support is left as future work.
For reviewers:
- Front-end code is mostly trivial and adapted from unboxed tuple code
for type checking, pattern checking, renaming, desugaring etc.
- Main translation routines are in `RepType` and `UnariseStg`.
Documentation in `UnariseStg` should be enough for understanding
what's going on.
Credits:
- Johan Tibell wrote the initial front-end and interface file
extensions.
- Simon Peyton Jones reviewed this patch many times, wrote some code,
and helped with debugging.
Reviewers: bgamari, alanz, goldfire, RyanGlScott, simonpj, austin,
simonmar, hvr, erikd
Reviewed By: simonpj
Subscribers: Iceland_jack, ggreif, ezyang, RyanGlScott, goldfire,
thomie, mpickering
Differential Revision: https://phabricator.haskell.org/D2259
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a flag -split-sections that does similar things to
-split-objs, but using sections in single object files instead of
relying on the Satanic Splitter and other abominations. This is very
similar to the GCC flags -ffunction-sections and -fdata-sections.
The --gc-sections linker flag, which allows unused sections to actually
be removed, is added to all link commands (if the linker supports it) so
that space savings from having base compiled with sections can be
realized.
Supported both in LLVM and the native code-gen, in theory for all
architectures, but really tested on x86 only.
In the GHC build, a new SplitSections variable enables -split-sections
for relevant parts of the build.
Test Plan: validate with both settings of SplitSections
Reviewers: dterei, Phyx, austin, simonmar, thomie, bgamari
Reviewed By: simonmar, thomie, bgamari
Subscribers: hsyl20, erikd, kgardas, thomie
Differential Revision: https://phabricator.haskell.org/D1242
GHC Trac Issues: #8405
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
On x86_64, commit e2f6bbd3a27685bc667655fdb093734cb565b4cf assigned
the STG registers F1 and D1 the same hardware register (xmm1), and
the same for the registers F2 and D2, etc. When mixing calls to
functions involving Float#s and Double#s, this can cause wrong Cmm
optimizations that assume the F1 and D1 registers are independent.
Reviewers: simonpj, austin
Reviewed By: austin
Subscribers: simonpj, thomie, bgamari
Differential Revision: https://phabricator.haskell.org/D993
GHC Trac Issues: #10521
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This re-implements the code generation for case expressions at the Stg →
Cmm level, both for data type cases as well as for integral literal
cases. (Cases on float are still treated as before).
The goal is to allow for fancier strategies in implementing them, for a
cleaner separation of the strategy from the gritty details of Cmm, and
to run this later than the Common Block Optimization, allowing for one
way to attack #10124. The new module CmmSwitch contains a number of
notes explaining this changes. For example, it creates larger
consecutive jump tables than the previous code, if possible.
nofib shows little significant overall improvement of runtime. The
rather large wobbling comes from changes in the code block order
(see #8082, not much we can do about it). But the decrease in code size
alone makes this worthwhile.
```
Program Size Allocs Runtime Elapsed TotalMem
Min -1.8% 0.0% -6.1% -6.1% -2.9%
Max -0.7% +0.0% +5.6% +5.7% +7.8%
Geometric Mean -1.4% -0.0% -0.3% -0.3% +0.0%
```
Compilation time increases slightly:
```
-1 s.d. ----- -2.0%
+1 s.d. ----- +2.5%
Average ----- +0.3%
```
The test case T783 regresses a lot, but it is the only one exhibiting
any regression. The cause is the changed order of branches in an
if-then-else tree, which makes the hoople data flow analysis traverse
the blocks in a suboptimal order. Reverting that gets rid of this
regression, but has a consistent, if only very small (+0.2%), negative
effect on runtime. So I conclude that this test is an extreme outlier
and no reason to change the code.
Differential Revision: https://phabricator.haskell.org/D720
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
as obviously, the code does _not_ adhere to the comment, and yet the
compiler does _not_ go into an infinite loop.
Test Plan: Run validate
Reviewers: austin
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D719
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch solves the scoping problem of CmmTick nodes: If we just put
CmmTicks into blocks we have no idea what exactly they are meant to
cover. Here we introduce tick scopes, which allow us to create
sub-scopes and merged scopes easily.
Notes:
* Given that the code often passes Cmm around "head-less", we have to
make sure that its intended scope does not get lost. To keep the amount
of passing-around to a minimum we define a CmmAGraphScoped type synonym
here that just bundles the scope with a portion of Cmm to be assembled
later.
* We introduce new scopes at somewhat random places, aligning with
getCode calls. This works surprisingly well, but we might have to
add new scopes into the mix later on if we find things too be too
coarse-grained.
(From Phabricator D169)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds CmmTick nodes to Cmm code. This is relatively
straight-forward, but also not very useful, as many blocks will simply
end up with no annotations whatosever.
Notes:
* We use this design over, say, putting ticks into the entry node of all
blocks, as it seems to work better alongside existing optimisations.
Now granted, the reason for this is that currently GHC's main Cmm
optimisations seem to mainly reorganize and merge code, so this might
change in the future.
* We have the Cmm parser generate a few source notes as well. This is
relatively easy to do - worst part is that it complicates the CmmParse
implementation a bit.
(From Phabricator D169)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In some cases, the layout of the LANGUAGE/OPTIONS_GHC lines has been
reorganized, while following the convention, to
- place `{-# LANGUAGE #-}` pragmas at the top of the source file, before
any `{-# OPTIONS_GHC #-}`-lines.
- Moreover, if the list of language extensions fit into a single
`{-# LANGUAGE ... -#}`-line (shorter than 80 characters), keep it on one
line. Otherwise split into `{-# LANGUAGE ... -#}`-lines for each
individual language extension. In both cases, try to keep the
enumeration alphabetically ordered.
(The latter layout is preferable as it's more diff-friendly)
While at it, this also replaces obsolete `{-# OPTIONS ... #-}` pragma
occurences by `{-# OPTIONS_GHC ... #-}` pragmas.
|
|
|
|
|
| |
I'd like to be able to pack together non-pointer fields that are less
than a word in size, and this is a necessary prerequisite.
|
| |
|
|
|
|
| |
This reverts commit 2f5db98e90cf0cff1a11971c85f108a7480528ed.
|
| |
|
|
|
|
|
|
| |
Clang doesn't like whitespace between macro and arguments.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
|
|
|
|
|
|
| |
monoidal for submitting.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
|
|
|
|
|
| |
Patch offered by Boris Sukholitko <boriss@gmail.com>
Trac #7757
|
|
|
|
|
|
|
|
|
| |
In OldCmm, the false case of a conditional was a fallthrough. In Cmm,
conditionals have both true and false successors. When we convert Cmm to LLVM,
we now first re-order Cmm blocks so that the false successor of a conditional
occurs next in the list of basic blocks, i.e., it is a fallthrough, just like it
(necessarily) did in OldCmm. Surprisingly, this can make a big performance
difference.
|
|
|
|
|
| |
Vector values are now always passed on the stack. This isn't particularly
efficient, but it will have to do for now.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This removes the OldCmm data type and the CmmCvt pass that converts
new Cmm to OldCmm. The backends (NCGs, LLVM and C) have all been
converted to consume new Cmm.
The main difference between the two data types is that conditional
branches in new Cmm have both true/false successors, whereas in OldCmm
the false case was a fallthrough. To generate slightly better code we
occasionally need to invert a conditional to ensure that the
branch-not-taken becomes a fallthrough; this was previously done in
CmmCvt, and it is now done in CmmContFlowOpt.
We could go further and use the Hoopl Block representation for native
code, which would mean that we could use Hoopl's postorderDfs and
analyses for native code, but for now I've left it as is, using the
old ListGraph representation for native code.
|
| |
|