| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
See #20725.
The commit includes source-code changes and a test case.
|
|
|
|
| |
These generally expect a particular word size.
|
|
|
|
|
|
|
| |
As noted in Note [When in doubt, cast arguments as unsigned], we
must ensure that arguments have the correct signedness since some
operations (e.g. `%`) have different semantics depending upon
signedness.
|
|
|
|
| |
There were found by the test-primops testsuite.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
No-op assignments like R1 = R1 are not only wasteful. They can also
inhibit other optimizations like inlining assignments that read from
R1.
We now check for assignments being a no-op before and after we
simplify the RHS in Cmm sink which should eliminate most of these
no-ops.
|
| |
|
|
|
|
|
|
|
| |
We don't want regressions like e8f7734d8a052f99b03e1123466dc9f47b48c311
to regress.
Co-Authored-By: Sylvain Henry <hsyl20@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The first change makes the array ones use the proper fixed-size types,
which also means that just like before, they can be used without
explicit conversions with the boxed sized types. (Before, it was Int# /
Word# on both sides, now it is fixed sized on both sides).
For the second change, don't use "extend" or "narrow" in some of the
user-facing primops names for conversions.
- Names like `narrowInt32#` are misleading when `Int` is 32-bits.
- Names like `extendInt64#` are flat-out wrong when `Int is
32-bits.
- `narrow{Int,Word}<N>#` however map a type to itself, and so don't
suffer from this problem. They are left as-is.
These changes are batched together because Alex happend to use the array
ops. We can only use released versions of Alex at this time, sadly, and
I don't want to have to have a release thatwon't work for the final GHC
9.2. So by combining these we get all the changes for Alex done at once.
Bump hackage state in a few places, and also make that workflow slightly
easier for the future.
Bump minimum Alex version
Bump Cabal, array, bytestring, containers, text, and binary submodules
|
|
|
|
|
|
|
|
|
|
|
| |
We no compare these by doing 64bit subtraction and
checking the resulting flags.
We used to do this differently but the old approach was
broken when the high bits compared equal and the comparison
was one of >= or <=.
The new approach should be both correct and faster.
|
|
|
|
|
|
| |
Metric Decrease:
T12150
T12234
|
|
|
|
|
|
|
|
|
| |
This MachOp was introduced by 2c959a1894311e59cd2fd469c1967491c1e488f3
but a wildcard match in cmmMachOpFoldM hid the fact that it wasn't
handled. Ideally we would eliminate the match but this appears to be a
larger task.
Fixes #18141.
|
|
|
|
| |
In BSD grep this flag only affects directory recursion.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before, it was a panic because it was handled above. But there must have
been an error in my reasoning (another caller?) because #17442 reported
the panic was hit.
But, rather than figuring out what happened, I can just make it
impossible by construction. By adding just a bit more bureaucracy in the
return types, I can handle TagToEnum in the same case as all the others,
so the big case is is now total, and the panic is removed.
Fixes #17442
|
|
|
|
|
| |
The tightens up the kinds a bit. I use type synnonyms to avoid adding
promotion ticks everywhere.
|
|
|
|
| |
separate file and add -ddump-cmm-verbose-by-proc to keep old behaviour (#16930)
|
|
|
|
|
| |
This eliminates most uses of run_command in the testsuite in favor of the more
structured makefile_test.
|
|
|
|
| |
This reverts commit 76c8fd674435a652c75a96c85abbf26f1f221876.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch implements a new code layout algorithm.
It has been tested for x86 and is disabled on other platforms.
Performance varies slightly be CPU/Machine but in general seems to be better
by around 2%.
Nofib shows only small differences of about +/- ~0.5% overall depending on
flags/machine performance in other benchmarks improved significantly.
Other benchmarks includes at least the benchmarks of: aeson, vector, megaparsec, attoparsec,
containers, text and xeno.
While the magnitude of gains differed three different CPUs where tested with
all getting faster although to differing degrees. I tested: Sandy Bridge(Xeon), Haswell,
Skylake
* Library benchmark results summarized:
* containers: ~1.5% faster
* aeson: ~2% faster
* megaparsec: ~2-5% faster
* xml library benchmarks: 0.2%-1.1% faster
* vector-benchmarks: 1-4% faster
* text: 5.5% faster
On average GHC compile times go down, as GHC compiled with the new layout
is faster than the overhead introduced by using the new layout algorithm,
Things this patch does:
* Move code responsilbe for block layout in it's own module.
* Move the NcgImpl Class into the NCGMonad module.
* Extract a control flow graph from the input cmm.
* Update this cfg to keep it in sync with changes during
asm codegen. This has been tested on x64 but should work on x86.
Other platforms still use the old codelayout.
* Assign weights to the edges in the CFG based on type and limited static
analysis which are then used for block layout.
* Once we have the final code layout eliminate some redundant jumps.
In particular turn a sequences of:
jne .foo
jmp .bar
foo:
into
je bar
foo:
..
Test Plan: ci
Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott
Reviewed By: RyanGlScott
Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton
GHC Trac Issues: #15124
Differential Revision: https://phabricator.haskell.org/D4726
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This for some reason or the other and makes it into the final
binary. I've added the check to ContFlowOpt as that seems
like a logical place for this.
In a regular nofib run there were 30 occurences of this pattern.
Test Plan: ci
Reviewers: bgamari, simonmar, dfeuer, jrtc27, tdammers
Reviewed By: bgamari, simonmar
Subscribers: tdammers, dfeuer, rwbarton, thomie, carter
GHC Trac Issues: #15188
Differential Revision: https://phabricator.haskell.org/D4740
|
|
- Fix the naming and comments to indicate that we are calculating
*reverse* postorder (and not the standard postorder).
- Rewrite the calculation to avoid CPS code. I found it fairly
difficult to understand and the new one seems faster (according to
nofib, decreases compiler allocations by 0.2%)
- Remove `LabelsPtr`, which seems unnecessary and could be *really*
confusing. For instance, previously:
`postorder_dfs_from <block with label X>`
and
`postorder_dfs_from <label X>`
would actually mean quite different things (and give different
results).
- Change the `Dataflow` module to always use entry of the graph for
reverse postorder calculation. This should be the only change in
behavior of this commit.
Previously, if the caller provided initial facts for some of the
labels, we would use those labels for our postorder calculation.
However, I don't think that's correct in general - if the initial
facts did not contain the entry of the graph, we would never analyze
the blocks reachable from the entry but unreachable from the labels
provided with the initial facts. It seems that the only analysis that
used this was proc-point analysis, which I think would always include
the entry block (so I don't think there's any bug due to this).
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate
Reviewers: bgamari, simonmar
Reviewed By: simonmar
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4464
|