| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
| |
|
|
|
|
|
| |
Previously, `-O1` and `-O2`, by way of their effect on the compilation
pipeline, they implicitly turned on constant folding
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes some abundant reboxing of `DynFlags` in
`GHC.HsToCore.Match.Literal.warnAboutOverflowedLit` (which was the topic
of #19407) by introducing a Boxity analysis to GHC, done as part of demand
analysis. This allows to accurately capture ad-hoc unboxing decisions previously
made in worker/wrapper in demand analysis now, where the boxity info can
propagate through demand signatures.
See the new `Note [Boxity analysis]`. The actual fix for #19407 is described in
`Note [No lazy, Unboxed demand in demand signature]`, but
`Note [Finalising boxity for demand signature]` is probably a better entry-point.
To support the fix for #19407, I had to change (what was)
`Note [Add demands for strict constructors]` a bit
(now `Note [Unboxing evaluated arguments]`). In particular, we now take care of
it in `finaliseBoxity` (which is only called from demand analaysis) instead of
`wantToUnboxArg`.
I also had to resurrect `Note [Product demands for function body]` and rename
it to `Note [Unboxed demand on function bodies returning small products]` to
avoid huge regressions in `join004` and `join007`, thereby fixing #4267 again.
See the updated Note for details.
A nice side-effect is that the worker/wrapper transformation no longer needs to
look at strictness info and other bits such as `InsideInlineableFun` flags
(needed for `Note [Do not unbox class dictionaries]`) at all. It simply collects
boxity info from argument demands and interprets them with a severely simplified
`wantToUnboxArg`. All the smartness is in `finaliseBoxity`, which could be moved
to DmdAnal completely, if it wasn't for the call to `dubiousDataConInstArgTys`
which would be awkward to export.
I spent some time figuring out the reason for why `T16197` failed prior to my
amendments to `Note [Unboxing evaluated arguments]`. After having it figured
out, I minimised it a bit and added `T16197b`, which simply compares computed
strictness signatures and thus should be far simpler to eyeball.
The 12% ghc/alloc regression in T11545 is because of the additional `Boxity`
field in `Poly` and `Prod` that results in more allocation during `lubSubDmd`
and `plusSubDmd`. I made sure in the ticky profiles that the number of calls
to those functions stayed the same. We can bear such an increase here, as we
recently improved it by -68% (in b760c1f).
T18698* regress slightly because there is more unboxing of dictionaries
happening and that causes Lint (mostly) to allocate more.
Fixes #19871, #19407, #4267, #16859, #18907 and #13331.
Metric Increase:
T11545
T18698a
T18698b
Metric Decrease:
T12425
T16577
T18223
T18282
T4267
T9961
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the common case this is a straight performance win
at a compile time cost so we enable it at -O2.
In rare cases it can lead to compile time regressions
because of changed inlining behaviour. Which can very
rarely also affect runtime performance.
Increasing the inlining threshold can help to avoid this
which is documented in the user guide.
In terms of measured results this reduced instructions executed
for nofib by 1%.
However for some cases (e.g. Cabal) enabling this
by default increases compile time by 2-3% so we enable it only
at -O2 where it's clear that a user is willing to trade compile
time for runtime.
Most of the testsuite runs without -O2 so there are few
perf changes.
Increases:
T12545/T18698: We perform more WW work because dicts are now treated strict.
T9198: Also some more work because functions are now subject to W/W
Decreases:
T14697: Compiling empty modules. Probably because of changes inside ghc.
T9203: I can't reproduce this improvement locally. Might be spurious.
-------------------------
Metric Decrease:
T12227
T14697
T9203
Metric Increase:
T9198
T12545
T18698a
T18698b
-------------------------
|
| |
|
|
|
|
| |
See #15304.
|
| |
|
|
|
|
|
|
|
|
|
| |
The update of the Outputable instance resulted in a slew of
documentation changes within Notes that used the old syntax.
The most important doc changes are to `Note [Demand notation]`
and the user's guide.
Fixes #19016.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a new heuristic, controllable via two new flags to
better tune inlining behaviour.
The new flags are -funfolding-case-threshold and
-funfolding-case-scaling which are document both
in the user guide and in
Note [Avoid inlining into deeply nested cases].
Co-authored-by: Andreas Klebinger <klebinger.andreas@gmx.at>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Add missing :since: for NondecreasingIndentation and OverlappingInstances
- Remove duplicated descriptions for Safe Haskell flags and
UndecidableInstances. Instead, the sections contain a link.
- compare-flags: Also check for options supported by ghci.
This uncovered two more that are not documented.
The flag -smp was removed.
- Formatting fixes
- Remove the warning about -XNoImplicitPrelude - it was written in 1996,
the extension is no longer dangerous.
- Fix misspelled :reverse: flags
Fixes #18958.
|
| |
|
|
|
|
|
|
|
|
| |
-fstg-lift-lams-rec-* and -fstg-lift-lams-non-rec-* were setting the same
field.
Fix manual: -fstg-lift-lams-non-rec-args is disabled by
-fstg-lift-lams-non-rec-args-any, there's no -fno-stg-lift-*.
|
|
|
|
| |
Fixes errors introduced by 3a55b3a2574f913d046f3a6f82db48d7f6df32e3.
|
|
|
|
| |
Be more clear on what this optimisation being on by default means
in terms of yields.
|
|
|
|
|
|
| |
“Yield points enabled” is confusing (and probably wrong?
I am not 100% sure what it means). Change it to a simple “on”.
Undo this change from 2c23fff2e03e77187dc4d01f325f5f43a0e7cad2.
|
|
|
|
|
|
|
|
|
| |
The demand signature notation has been undocumented for a long time.
The only source to understand it, apart from reading the `Outputable`
instance, has been an outdated wiki page.
Since the previous commits have reworked the demand lattice, I took
it as an opportunity to also write some documentation about notation.
|
| |
|
|
|
|
|
|
|
|
| |
Makes it possible for GHC to optimize away intermediate Generic representation
for more types.
Metric Increase:
T12227
|
|
|
|
|
|
|
|
|
|
|
| |
* Remove UnliftedFFITypes from conf. Some time ago, this extension
was undocumented and we had to silence a warning.
This is no longer needed.
* Use r'' in conf.py. This fixes a Sphinx warning:
WARNING: Support for evaluating Python 2 syntax is deprecated and will be removed in Sphinx 4.0. Convert docs/users_guide/conf.py to Python 3 syntax.
* Mark GHCForeignImportPrim as documented
* Fix formatting in template_haskell.rst
* Remove 'recursive do' from the list of unsupported items in TH
|
|
|
|
|
|
|
|
|
|
|
| |
It's behaviour is now unconditionally enabled as
it's slightly beneficial.
There are almost no benchmarks which benefit from
disabling it, so it's not worth the keep this
configurable.
This fixes #18429.
|
|
|
|
|
| |
This was changed in 3d2991f8 but I neglected to update the
documentation. Fixes #18419.
|
|
|
|
|
|
|
|
|
|
| |
We should allow a wrapper with up to 82 parameters when the original
function had 82 parameters to begin with.
I verified that this made no difference on NoFib, but then again
it doesn't use huge records...
Fixes #18122.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When assigning registers we now first try registers we
assigned to in the past, instead of picking the "first"
one.
This is in extremely helpful when dealing with loops for
which variables are dead for part of the loop.
This is important for patterns like this:
foo = arg1
loop:
use(foo)
...
foo = getVal()
goto loop;
There we:
* assign foo to the register of arg1.
* use foo, it's dead after this use as it's overwritten after.
* do other things.
* look for a register to put foo in.
If we pick an arbitrary one it might differ from the register the
start of the loop expect's foo to be in.
To fix this we simply look for past register assignments for
the given variable. If we find one and the register is free we
use that register.
This reduces the need for fixup blocks which match the register
assignment between blocks. In the example above between the end
and the head of the loop.
This patch also moves branch weight estimation ahead of register
allocation and adds a flag to control it (cmm-static-pred).
* It means the linear allocator is more likely to assign the hotter
code paths first.
* If it assign these first we are:
+ Less likely to spill on the hot path.
+ Less likely to introduce fixup blocks on the hot path.
These two measure combined are surprisingly effective. Based on nofib
we get in the mean:
* -0.9% instructions executed
* -0.1% reads/writes
* -0.2% code size.
* -0.1% compiler allocations.
* -0.9% compile time.
* -0.8% runtime.
Most of the benefits are simply a result of removing redundant moves
and spills.
Reduced compiler allocations likely are the result of less code being
generated. (The added lookup is mostly non-allocating).
|
| |
|
| |
|
| |
|
|
|
|
| |
-fno-max-relevant-binds
|
| |
|
| |
|
|
|
|
|
| |
-O2 is the highest value of optimization.
-O3 will be reverted to -O2.
|
|
|
|
|
|
|
|
|
|
| |
The user's guide uses the `ghc-wiki` macro, and substitution rules
are complicated. So I manually edited `.rst` files without sed.
I changed `Commentary/Latedmd` only to a different page.
It is more appropriate as an example.
[ci skip]
|
|
|
|
|
| |
This moves all URL references to Trac tickets to their corresponding
GitLab counterparts.
|
|
|
|
|
|
|
|
|
|
| |
This patch adds an optimization into the NCG: for large strings
(threshold configurable via -fbinary-blob-threshold=NNN flag), instead
of printing `.asciz "..."` in the generated ASM source, we print
`.incbin "tmpXXX.dat"` and we dump the contents of the string into a
temporary "tmpXXX.dat" file.
See the note for more details.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This implements a selective lambda-lifting pass late in the STG
pipeline.
Lambda lifting has the effect of avoiding closure allocation at the cost
of having to make former free vars available at call sites, possibly
enlarging closures surrounding call sites in turn.
We identify beneficial cases by means of an analysis that estimates
closure growth.
There's a Wiki page at
https://ghc.haskell.org/trac/ghc/wiki/LateLamLift.
Reviewers: simonpj, bgamari, simonmar
Reviewed By: simonpj
Subscribers: rwbarton, carter
GHC Trac Issues: #9476
Differential Revision: https://phabricator.haskell.org/D5224
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch implements a new code layout algorithm.
It has been tested for x86 and is disabled on other platforms.
Performance varies slightly be CPU/Machine but in general seems to be better
by around 2%.
Nofib shows only small differences of about +/- ~0.5% overall depending on
flags/machine performance in other benchmarks improved significantly.
Other benchmarks includes at least the benchmarks of: aeson, vector, megaparsec, attoparsec,
containers, text and xeno.
While the magnitude of gains differed three different CPUs where tested with
all getting faster although to differing degrees. I tested: Sandy Bridge(Xeon), Haswell,
Skylake
* Library benchmark results summarized:
* containers: ~1.5% faster
* aeson: ~2% faster
* megaparsec: ~2-5% faster
* xml library benchmarks: 0.2%-1.1% faster
* vector-benchmarks: 1-4% faster
* text: 5.5% faster
On average GHC compile times go down, as GHC compiled with the new layout
is faster than the overhead introduced by using the new layout algorithm,
Things this patch does:
* Move code responsilbe for block layout in it's own module.
* Move the NcgImpl Class into the NCGMonad module.
* Extract a control flow graph from the input cmm.
* Update this cfg to keep it in sync with changes during
asm codegen. This has been tested on x64 but should work on x86.
Other platforms still use the old codelayout.
* Assign weights to the edges in the CFG based on type and limited static
analysis which are then used for block layout.
* Once we have the final code layout eliminate some redundant jumps.
In particular turn a sequences of:
jne .foo
jmp .bar
foo:
into
je bar
foo:
..
Test Plan: ci
Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott
Reviewed By: RyanGlScott
Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton
GHC Trac Issues: #15124
Differential Revision: https://phabricator.haskell.org/D4726
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Poor DPH and its vectoriser have long been languishing; sadly it seems there is
little chance that the effort will be rekindled. Every few years we discuss
what to do with this mass of code and at least once we have agreed that it
should be archived on a branch and removed from `master`. Here we do just that,
eliminating heaps of dead code in the process.
Here we drop the ParallelArrays extension, the vectoriser, and the `vector` and
`primitive` submodules.
Test Plan: Validate
Reviewers: simonpj, simonmar, hvr, goldfire, alanz
Reviewed By: simonmar
Subscribers: goldfire, rwbarton, thomie, mpickering, carter
Differential Revision: https://phabricator.haskell.org/D4761
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Shortcutting during the asm stage of codegen is often redundant as most
cases get caught during the Cmm passes. For example during compilation
of all of nofib only 508 jumps are eleminated.
For this reason I moved the pass from -O1 to -O2. I also made it
toggleable with -fasm-shortcutting.
Test Plan: ci
Reviewers: bgamari
Reviewed By: bgamari
Subscribers: thomie, carter
Differential Revision: https://phabricator.haskell.org/D4555
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Runs another specialisation pass towards the end of the optimisation
pipeline. This can catch specialisation opportunities which arose from
the previous specialisation pass or other inlining.
You might want to use this if you are you have a type class method
which returns a constrained type. For example, a type class where one
of the methods implements a traversal.
It is not enabled by default or any optimisation level. Only by
manually enabling the flag `-flate-specialise`.
Reviewers: bgamari
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4457
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is an initial attempt at tackling the issue of how to order the
suggestions provided by the valid substitutions checker, by sorting
them by creating a graph of how they subsume each other. We'd like to
order them in such a manner that the most "relevant" suggestions are
displayed first, so that the suggestion that the user might be looking
for is displayed before more far-fetched suggestions (and thus also
displayed when they'd otherwise be cut-off by the
`-fmax-valid-substitutions` limit). The previous ordering was based on
the order in which the elements appear in the list of imports, which I
believe is less correlated with relevance than this ordering.
A drawback of this approach is that, since we now want to sort the
elements, we can no longer "bail out early" when we've hit the
`-fmax-valid-substitutions` limit.
Reviewers: bgamari, dfeuer
Reviewed By: dfeuer
Subscribers: dfeuer, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4326
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Earlier this year Edward Kmett requested [1] that we enable passing of
vector values in vector registers by default. The GHC calling convention
changes have been in LLVM for a number of years now so let's just flip
the switch.
[1] https://mail.haskell.org/pipermail/ghc-devs/2017-March/013905.html
Reviewers: austin
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D4142
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idea is described in #14152, and can be summarized: Float the exit
path out of a joinrec, so that the simplifier can do more with it.
See the test case for a nice example.
The floating goes against what the simplifier usually does, hence we
need to be careful not inline them back.
The position of exitification in the pipeline was chosen after a small
amount of experimentation, but may need to be improved. For example,
exitification can allow rewrite rules to fire, but for that it would
have to happen before the `simpl_phases`.
Perf.haskell.org reports these nice performance wins:
Nofib allocations
fannkuch-redux 78446640 - 99.92% 64560
k-nucleotide 109466384 - 91.32% 9502040
simple 72424696 - 5.96% 68109560
Nofib instruction counts
fannkuch-redux 1744331636 - 3.86% 1676999519
k-nucleotide 2318221965 - 6.30% 2172067260
scs 1978470869 - 3.35% 1912263779
simple 669858104 - 3.38% 647206739
spectral-norm 186423292 - 5.37% 176411536
Differential Revision: https://phabricator.haskell.org/D3903
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Begins to fix #14214.
[skip ci]
Test Plan: Read it.
Reviewers: austin
Subscribers: rwbarton, thomie
GHC Trac Issues: #14214
Differential Revision: https://phabricator.haskell.org/D4098
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This removes all dependencies the users guide had on `mkUserGuidePart`.
The generation of the flag reference table and the various pieces of the
man page is now entirely contained within the Spinx extension
`flags.py`. You can see the man page generation on the orphan page
https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/ghc.html
The extension works by collecting all of the meta-data attached to the
`ghc-flag` directives and then formatting and displaying it at
`flag-print` directives. There is a single printing directive that can
be customized with two options, what format to display (table, list, or
block of flags) and an optional category to limit the output to
(verbosity, warnings, codegen, etc.).
New display formats can be added by creating a function
`generate_flag_xxx` (where `xxx` is a description of the format) which
takes a list of flags and a category and returns a new `xxx`. Then just
add a reference in the dispatch table `handlers`. That display can now
be run by passing `:type: xxx` to the `flag-print` directive.
`flags.py` contains two maps of settings that can be adjusted. The first
is a canonical list of flag categories, and the second sets default
categories for files.
The only functionality that Sphinx could not replace was the
`what_glasgow_exts_does.gen.rst` file. `mkUserGuidePart` actually just
reads the list of flags from `compiler/main/DynFlags.hs` which Sphinx
cannot do. As the flag is deprecated, I added the list as a static file
which can be updated manually.
Additionally, this patch updates every single documented flag with the
data from `mkUserGuidePart` to generate the reference table.
Fixes #11654 and, incidentally, #12155.
Reviewers: austin, bgamari
Subscribers: rwbarton, thomie
GHC Trac Issues: #11654, #12155
Differential Revision: https://phabricator.haskell.org/D3839
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch does three things:
1.) It simplifies the flag parsing code in `conf.py` to properly display
flag definitions created by `.. (ghc|rts)-flag::`. Additionally, all flag
references must include the associated arguments. Documentation has been
added to `editing-guide.rst` to explain this.
2.) It normalizes all flag definitions to a similar format. Notably, all
instances of `<>` have been replaced with `⟨⟩`. All references across the
users guide have been updated to match.
3.) It fixes a couple issues with the flag reference table's generation code,
which did not handle comma separated flags in the same cell and did not
properly reference flags with arguments.
Test Plan:
`SPHINXOPTS = -n` to activate "nitpicky" mode, which reports all broken
references. All remaining errors are references to flags without any
documentation.
Reviewers: austin, bgamari
Reviewed By: bgamari
Subscribers: rwbarton, thomie
GHC Trac Issues: #13980
Differential Revision: https://phabricator.haskell.org/D3778
|