| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We used to check `GrdVec`s arising from multiple clauses and guards in
isolation. That resulted in a split between `pmCheck` and
`pmCheckGuards`, the implementations of which were similar, but subtly
different in detail. Also the throttling mechanism described in
`Note [Countering exponential blowup]` ultimately got quite complicated
because it had to cater for both checking functions.
This patch realises that pattern match checking doesn't just consider
single guarded RHSs, but that it's always a whole set of clauses, each
of which can have multiple guarded RHSs in turn. We do so by
translating a list of `Match`es to a `GrdTree`:
```haskell
data GrdTree
= Rhs !RhsInfo
| Guard !PmGrd !GrdTree -- captures lef-to-right match semantics
| Sequence !GrdTree !GrdTree -- captures top-to-bottom match semantics
| Empty -- For -XEmptyCase, neutral element of Sequence
```
Then we have a function `checkGrdTree` that matches a given `GrdTree`
against an incoming set of values, represented by `Deltas`:
```haskell
checkGrdTree :: GrdTree -> Deltas -> CheckResult
...
```
Throttling is isolated to the `Sequence` case and becomes as easy as one
would expect: When the union of uncovered values becomes too big, just
return the original incoming `Deltas` instead (which is always a
superset of the union, thus a sound approximation).
The returned `CheckResult` contains two things:
1. The set of values that were not covered by any of the clauses, for
exhaustivity warnings.
2. The `AnnotatedTree` that enriches the syntactic structure of the
input program with divergence and inaccessibility information.
This is `AnnotatedTree`:
```haskell
data AnnotatedTree
= AccessibleRhs !RhsInfo
| InaccessibleRhs !RhsInfo
| MayDiverge !AnnotatedTree
| SequenceAnn !AnnotatedTree !AnnotatedTree
| EmptyAnn
```
Crucially, `MayDiverge` asserts that the tree may force diverging
values, so not all of its wrapped clauses can be redundant.
While the set of uncovered values can be used to generate the missing
equations for warning messages, redundant and proper inaccessible
equations can be extracted from `AnnotatedTree` by
`redundantAndInaccessibleRhss`.
For this to work properly, the interface to the Oracle had to change.
There's only `addPmCts` now, which takes a bag of `PmCt`s. There's a
whole bunch of `PmCt` variants to replace the different oracle functions
from before.
The new `AnnotatedTree` structure allows for more accurate warning
reporting (as evidenced by a number of changes spread throughout GHC's
code base), thus we fix #17465.
Fixes #17646 on the go.
Metric Decrease:
T11822
T9233
PmSeriesS
haddock.compiler
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For backends maintaining the CFG during codegen
we can now find loops and their nesting level.
This is based on the Cmm CFG and dominator analysis.
As a result we can estimate edge frequencies a lot better
for methods, resulting in far better code layout.
Speedup on nofib: ~1.5%
Increase in compile times: ~1.9%
To make this feasible this commit adds:
* Dominator analysis based on the Lengauer-Tarjan Algorithm.
* An algorithm estimating global edge frequences from branch
probabilities - In CFG.hs
A few static branch prediction heuristics:
* Expect to take the backedge in loops.
* Expect to take the branch NOT exiting a loop.
* Expect integer vs constant comparisons to be false.
We also treat heap/stack checks special for branch prediction
to avoid them being treated as loops.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Statements can change the basic block in which instructions
are placed during instruction selection.
We have to keep track of this switch of the current basic block
as we need this information in order to properly update the CFG.
This commit implements this change and fixes #17334.
We do so by having stmtToInstr return the new block id
if a statement changed the basic block.
|
|
|
|
|
| |
The tightens up the kinds a bit. I use type synnonyms to avoid adding
promotion ticks everywhere.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
These kinds of imports are necessary in some cases such as
importing instances of typeclasses or intentionally creating
dependencies in the build system, but '-Wunused-imports' can't
detect when they are no longer needed. This commit removes the
unused ones currently in the code base (not including test files
or submodules), with the hope that doing so may increase
parallelism in the build system by removing unnecessary
dependencies.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This moves all URL references to Trac Wiki to their corresponding
GitLab counterparts.
This substitution is classified as follows:
1. Automated substitution using sed with Ben's mapping rule [1]
Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...
New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...
2. Manual substitution for URLs containing `#` index
Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...#Zzz
New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...#zzz
3. Manual substitution for strings starting with `Commentary`
Old: Commentary/XxxYyy...
New: commentary/xxx-yyy...
See also !539
[1]: https://gitlab.haskell.org/bgamari/gitlab-migration/blob/master/wiki-mapping.json
|
|
|
|
|
|
|
|
|
|
| |
* Remove `takeL/R 1` occurences by lastOL/headOL.
* Make BlockChain a OrdList newtype by removing the set of blocks.
Initially BlockChain contained both, a set for membership test
and a ordered list of blocks. The set is not used for any
performance sensitive lookups so we get rid of it.
|
|
|
|
|
| |
OrdList does the same thing and more so there is no reason
to have both.
|
|
Summary:
This patch implements a new code layout algorithm.
It has been tested for x86 and is disabled on other platforms.
Performance varies slightly be CPU/Machine but in general seems to be better
by around 2%.
Nofib shows only small differences of about +/- ~0.5% overall depending on
flags/machine performance in other benchmarks improved significantly.
Other benchmarks includes at least the benchmarks of: aeson, vector, megaparsec, attoparsec,
containers, text and xeno.
While the magnitude of gains differed three different CPUs where tested with
all getting faster although to differing degrees. I tested: Sandy Bridge(Xeon), Haswell,
Skylake
* Library benchmark results summarized:
* containers: ~1.5% faster
* aeson: ~2% faster
* megaparsec: ~2-5% faster
* xml library benchmarks: 0.2%-1.1% faster
* vector-benchmarks: 1-4% faster
* text: 5.5% faster
On average GHC compile times go down, as GHC compiled with the new layout
is faster than the overhead introduced by using the new layout algorithm,
Things this patch does:
* Move code responsilbe for block layout in it's own module.
* Move the NcgImpl Class into the NCGMonad module.
* Extract a control flow graph from the input cmm.
* Update this cfg to keep it in sync with changes during
asm codegen. This has been tested on x64 but should work on x86.
Other platforms still use the old codelayout.
* Assign weights to the edges in the CFG based on type and limited static
analysis which are then used for block layout.
* Once we have the final code layout eliminate some redundant jumps.
In particular turn a sequences of:
jne .foo
jmp .bar
foo:
into
je bar
foo:
..
Test Plan: ci
Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott
Reviewed By: RyanGlScott
Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton
GHC Trac Issues: #15124
Differential Revision: https://phabricator.haskell.org/D4726
|