path: root/docs/users_guide/using-optimisation.rst
Commit message (author, date; files changed, lines -removed/+added)
* release notes: Changes to CPR analysis (Sebastian Graf, 2022-01-13; 1 file, -3/+53)
* Fix typos (Krzysztof Gogolewski, 2021-12-25; 1 file, -1/+1)
* Add `Opt_CoreConstantFolding` to turn on constant folding (#20500) (Gergo ERDI, 2021-12-09; 1 file, -0/+11)
  Previously, `-O1` and `-O2` implicitly turned on constant folding by way of their effect on the compilation pipeline.
* Add specific optimization flag for Cmm control flow analysis (#20500) (Gergo ERDI, 2021-11-25; 1 file, -1/+11)
* DmdAnal: Implement Boxity Analysis (#19871) (Sebastian Graf, 2021-10-24; 1 file, -0/+14)
  This patch fixes some abundant reboxing of `DynFlags` in `GHC.HsToCore.Match.Literal.warnAboutOverflowedLit` (which was the topic of #19407) by introducing a Boxity analysis to GHC, done as part of demand analysis. This allows us to accurately capture ad-hoc unboxing decisions previously made in worker/wrapper in demand analysis now, where the boxity info can propagate through demand signatures. See the new `Note [Boxity analysis]`.

  The actual fix for #19407 is described in `Note [No lazy, Unboxed demand in demand signature]`, but `Note [Finalising boxity for demand signature]` is probably a better entry-point.

  To support the fix for #19407, I had to change (what was) `Note [Add demands for strict constructors]` a bit (now `Note [Unboxing evaluated arguments]`). In particular, we now take care of it in `finaliseBoxity` (which is only called from demand analysis) instead of `wantToUnboxArg`.

  I also had to resurrect `Note [Product demands for function body]` and rename it to `Note [Unboxed demand on function bodies returning small products]` to avoid huge regressions in `join004` and `join007`, thereby fixing #4267 again. See the updated Note for details.

  A nice side-effect is that the worker/wrapper transformation no longer needs to look at strictness info and other bits such as `InsideInlineableFun` flags (needed for `Note [Do not unbox class dictionaries]`) at all. It simply collects boxity info from argument demands and interprets them with a severely simplified `wantToUnboxArg`. All the smartness is in `finaliseBoxity`, which could be moved to DmdAnal completely, if it wasn't for the call to `dubiousDataConInstArgTys` which would be awkward to export.

  I spent some time figuring out the reason for why `T16197` failed prior to my amendments to `Note [Unboxing evaluated arguments]`. After having it figured out, I minimised it a bit and added `T16197b`, which simply compares computed strictness signatures and thus should be far simpler to eyeball.

  The 12% ghc/alloc regression in T11545 is because of the additional `Boxity` field in `Poly` and `Prod` that results in more allocation during `lubSubDmd` and `plusSubDmd`. I made sure in the ticky profiles that the number of calls to those functions stayed the same. We can bear such an increase here, as we recently improved it by -68% (in b760c1f).

  T18698* regress slightly because there is more unboxing of dictionaries happening and that causes Lint (mostly) to allocate more.

  Fixes #19871, #19407, #4267, #16859, #18907 and #13331.

  Metric Increase: T11545 T18698a T18698b
  Metric Decrease: T12425 T16577 T18223 T18282 T4267 T9961
* Documentation: use https links (Krzysztof Gogolewski, 2021-09-08; 1 file, -2/+2)
* Enable strict dicts by default at -O2. (Andreas Klebinger, 2021-05-27; 1 file, -1/+10)
  In the common case this is a straight performance win at a compile time cost, so we enable it at -O2. In rare cases it can lead to compile time regressions because of changed inlining behaviour, which can very rarely also affect runtime performance. Increasing the inlining threshold can help to avoid this, as documented in the user guide.

  In terms of measured results this reduced instructions executed for nofib by 1%. However, for some cases (e.g. Cabal) enabling this by default increases compile time by 2-3%, so we enable it only at -O2, where it's clear that a user is willing to trade compile time for runtime.

  Most of the testsuite runs without -O2, so there are few perf changes.

  Increases:
    T12545/T18698: We perform more WW work because dicts are now treated strict.
    T9198: Also some more work because functions are now subject to W/W.

  Decreases:
    T14697: Compiling empty modules. Probably because of changes inside ghc.
    T9203: I can't reproduce this improvement locally. Might be spurious.

  -------------------------
  Metric Decrease: T12227 T14697 T9203
  Metric Increase: T9198 T12545 T18698a T18698b
  -------------------------
* users guide: Various other cleanups (Ben Gamari, 2021-04-26; 1 file, -1/+0)
* users-guide: Document deprecation of -funfolding-keeness-factor (Ben Gamari, 2021-04-26; 1 file, -0/+9)
  See #15304.
* Worker/wrapper: Consistent names (Sebastian Graf, 2021-04-20; 1 file, -1/+1)
* DmdAnal: Better syntax for demand signatures (#19016) (Sebastian Graf, 2021-03-03; 1 file, -30/+30)
  The update of the Outputable instance resulted in a slew of documentation changes within Notes that used the old syntax. The most important doc changes are to `Note [Demand notation]` and the user's guide.

  Fixes #19016.
* Reduce inlining in deeply-nested cases (Simon Peyton Jones, 2021-02-09; 1 file, -0/+69)
  This adds a new heuristic, controllable via two new flags, to better tune inlining behaviour.

  The new flags are -funfolding-case-threshold and -funfolding-case-scaling, which are documented both in the user guide and in Note [Avoid inlining into deeply nested cases].

  Co-authored-by: Andreas Klebinger <klebinger.andreas@gmx.at>
* Fix typos (Brian Wignall, 2021-02-06; 1 file, -2/+2)
* Documentation fixes (Krzysztof Gogolewski, 2021-01-30; 1 file, -5/+4)
  - Add missing :since: for NondecreasingIndentation and OverlappingInstances
  - Remove duplicated descriptions for Safe Haskell flags and UndecidableInstances. Instead, the sections contain a link.
  - compare-flags: Also check for options supported by ghci. This uncovered two more that are not documented. The flag -smp was removed.
  - Formatting fixes
  - Remove the warning about -XNoImplicitPrelude - it was written in 1996, the extension is no longer dangerous.
  - Fix misspelled :reverse: flags

  Fixes #18958.
* Fix description of -fregs-graph (not implied by -O2, linked issue was closed) (Benjamin Maurer, 2021-01-30; 1 file, -4/+1)
* Fix parsing of -fstg-lift-lams-non-rec (Krzysztof Gogolewski, 2021-01-29; 1 file, -2/+2)
  -fstg-lift-lams-rec-* and -fstg-lift-lams-non-rec-* were setting the same field.

  Fix manual: -fstg-lift-lams-non-rec-args is disabled by -fstg-lift-lams-non-rec-args-any, there's no -fno-stg-lift-*.
* users guide: Fix syntax errors (Ben Gamari, 2020-12-11; 1 file, -23/+21)
  Fixes errors introduced by 3a55b3a2574f913d046f3a6f82db48d7f6df32e3.
* doc: Extra-clarify -fomit-yields (Kirill Elagin, 2020-12-10; 1 file, -1/+1)
  Be more clear on what this optimisation being on by default means in terms of yields.
* doc: Clarify the default for -fomit-yields (Kirill Elagin, 2020-12-10; 1 file, -1/+1)
  “Yield points enabled” is confusing (and probably wrong? I am not 100% sure what it means). Change it to a simple “on”.

  Undo this change from 2c23fff2e03e77187dc4d01f325f5f43a0e7cad2.
* Update user's guide entry on demand analysis and worker/wrapper (Sebastian Graf, 2020-11-20; 1 file, -17/+158)
  The demand signature notation has been undocumented for a long time. The only source to understand it, apart from reading the `Outputable` instance, has been an outdated wiki page.

  Since the previous commits have reworked the demand lattice, I took it as an opportunity to also write some documentation about notation.
* Update inlining flags documentation (Matthew Pickering, 2020-11-03; 1 file, -13/+0)
* Add flags for annotating Generic{,1} methods INLINE[1] (#11068) (Andrzej Rybczak, 2020-10-15; 1 file, -0/+44)
  Makes it possible for GHC to optimize away intermediate Generic representation for more types.

  Metric Increase: T12227
* Various documentation fixes (Krzysztof Gogolewski, 2020-09-25; 1 file, -1/+1)
  * Remove UnliftedFFITypes from conf. Some time ago, this extension was undocumented and we had to silence a warning. This is no longer needed.
  * Use r'' in conf.py. This fixes a Sphinx warning: WARNING: Support for evaluating Python 2 syntax is deprecated and will be removed in Sphinx 4.0. Convert docs/users_guide/conf.py to Python 3 syntax.
  * Mark GHCForeignImportPrim as documented
  * Fix formatting in template_haskell.rst
  * Remove 'recursive do' from the list of unsupported items in TH
* Deprecate -fdmd-tx-dict-sel. (Andreas Klebinger, 2020-07-22; 1 file, -2/+2)
  Its behaviour is now unconditionally enabled as it's slightly beneficial. There are almost no benchmarks which benefit from disabling it, so it's not worth keeping this configurable.

  This fixes #18429.
* docs/users-guide: Update default -funfolding-use-threshold value (Ben Gamari, 2020-07-14; 1 file, -2/+2)
  This was changed in 3d2991f8 but I neglected to update the documentation.

  Fixes #18419.
* Make WorkWrap.Lib.isWorkerSmallEnough aware of the old arity (Sebastian Graf, 2020-05-26; 1 file, -3/+4)
  We should allow a wrapper with up to 82 parameters when the original function had 82 parameters to begin with.

  I verified that this made no difference on NoFib, but then again it doesn't use huge records...

  Fixes #18122.
* Refactor linear reg alloc to remember past assignments. (Andreas Klebinger, 2020-05-21; 1 file, -0/+13)
  When assigning registers we now first try registers we assigned to in the past, instead of picking the "first" one.

  This is extremely helpful when dealing with loops for which variables are dead for part of the loop. This is important for patterns like this:

        foo = arg1
      loop:
        use(foo)
        ...
        foo = getVal()
        goto loop;

  There we:
  * assign foo to the register of arg1.
  * use foo, it's dead after this use as it's overwritten after.
  * do other things.
  * look for a register to put foo in.

  If we pick an arbitrary one it might differ from the register the start of the loop expects foo to be in. To fix this we simply look for past register assignments for the given variable. If we find one and the register is free we use that register.

  This reduces the need for fixup blocks which match the register assignment between blocks. In the example above, between the end and the head of the loop.

  This patch also moves branch weight estimation ahead of register allocation and adds a flag to control it (cmm-static-pred).
  * It means the linear allocator is more likely to assign the hotter code paths first.
  * If it assigns these first we are:
    + Less likely to spill on the hot path.
    + Less likely to introduce fixup blocks on the hot path.

  These two measures combined are surprisingly effective. Based on nofib we get in the mean:
  * -0.9% instructions executed
  * -0.1% reads/writes
  * -0.2% code size.
  * -0.1% compiler allocations.
  * -0.9% compile time.
  * -0.8% runtime.

  Most of the benefits are simply a result of removing redundant moves and spills.

  Reduced compiler allocations likely are the result of less code being generated. (The added lookup is mostly non-allocating).
* Fix typos, via a Levenshtein-style corrector (Brian Wignall, 2020-01-04; 1 file, -1/+1)
* Rephrase note on full-laziness (Leif Metcalf, 2019-11-04; 1 file, -7/+6)
* users-guide: Document -fworker-wrapper (Ben Gamari, 2019-10-08; 1 file, -0/+11)
* users-guide: corrected -fmax-relevant-binds reverse to be -fno-max-relevant-binds (James Foster, 2019-07-19; 1 file, -3/+3)
* Minor spelling fixes to users guide. (P.C. Shyamshankar, 2019-05-29; 1 file, -1/+1)
* Apply suggestion to docs/users_guide/using-optimisation.rst (Giles Anderson, 2019-04-15; 1 file, -1/+1)
* Document how -O3 is handled by GHC (Giles Anderson, 2019-04-15; 1 file, -0/+11)
  -O2 is the highest value of optimization. -O3 will be reverted to -O2.
* users-guide: Update Wiki URLs to point to GitLab (Takenobu Tani, 2019-03-19; 1 file, -1/+1)
  The user's guide uses the `ghc-wiki` macro, and substitution rules are complicated. So I manually edited `.rst` files without sed.

  I changed `Commentary/Latedmd` only to a different page. It is more appropriate as an example.

  [ci skip]
* Update Trac ticket URLs to point to GitLab (Ryan Scott, 2019-03-15; 1 file, -1/+1)
  This moves all URL references to Trac tickets to their corresponding GitLab counterparts.
* NCG: fast compilation of very large strings (#16190) (Sylvain Henry, 2019-02-14; 1 file, -0/+16)
  This patch adds an optimization into the NCG: for large strings (threshold configurable via -fbinary-blob-threshold=NNN flag), instead of printing `.asciz "..."` in the generated ASM source, we print `.incbin "tmpXXX.dat"` and we dump the contents of the string into a temporary "tmpXXX.dat" file.

  See the note for more details.
* Fix broken links (#16125) (Sven Tennie, 2019-01-05; 1 file, -6/+7)
* Implement late lambda lift (Sebastian Graf, 2018-11-23; 1 file, -0/+52)
  Summary:
  This implements a selective lambda-lifting pass late in the STG pipeline.

  Lambda lifting has the effect of avoiding closure allocation at the cost of having to make former free vars available at call sites, possibly enlarging closures surrounding call sites in turn.

  We identify beneficial cases by means of an analysis that estimates closure growth.

  There's a Wiki page at https://ghc.haskell.org/trac/ghc/wiki/LateLamLift.

  Reviewers: simonpj, bgamari, simonmar
  Reviewed By: simonpj
  Subscribers: rwbarton, carter
  GHC Trac Issues: #9476
  Differential Revision: https://phabricator.haskell.org/D5224
* NCG: New code layout algorithm. (Andreas Klebinger, 2018-11-17; 1 file, -1/+53)
  Summary:
  This patch implements a new code layout algorithm. It has been tested for x86 and is disabled on other platforms.

  Performance varies slightly by CPU/machine but in general seems to be better by around 2%. Nofib shows only small differences of about +/- ~0.5% overall depending on flags/machine. Performance in other benchmarks improved significantly.

  Other benchmarks include at least the benchmarks of: aeson, vector, megaparsec, attoparsec, containers, text and xeno.

  While the magnitude of gains differed, three different CPUs were tested, with all getting faster although to differing degrees. I tested: Sandy Bridge (Xeon), Haswell, Skylake.

  * Library benchmark results summarized:
    * containers: ~1.5% faster
    * aeson: ~2% faster
    * megaparsec: ~2-5% faster
    * xml library benchmarks: 0.2%-1.1% faster
    * vector-benchmarks: 1-4% faster
    * text: 5.5% faster

  On average GHC compile times go down, as GHC compiled with the new layout is faster than the overhead introduced by using the new layout algorithm.

  Things this patch does:
  * Move code responsible for block layout in its own module.
  * Move the NcgImpl Class into the NCGMonad module.
  * Extract a control flow graph from the input cmm.
  * Update this cfg to keep it in sync with changes during asm codegen. This has been tested on x64 but should work on x86. Other platforms still use the old codelayout.
  * Assign weights to the edges in the CFG based on type and limited static analysis which are then used for block layout.
  * Once we have the final code layout eliminate some redundant jumps. In particular turn a sequence of:

        jne .foo
        jmp .bar
      foo:

    into

        je bar
      foo:
        ..

  Test Plan: ci
  Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott
  Reviewed By: RyanGlScott
  Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton
  GHC Trac Issues: #15124
  Differential Revision: https://phabricator.haskell.org/D4726
* vectorise: Put it out of its misery (Ben Gamari, 2018-06-02; 1 file, -52/+0)
  Poor DPH and its vectoriser have long been languishing; sadly it seems there is little chance that the effort will be rekindled. Every few years we discuss what to do with this mass of code and at least once we have agreed that it should be archived on a branch and removed from `master`. Here we do just that, eliminating heaps of dead code in the process.

  Here we drop the ParallelArrays extension, the vectoriser, and the `vector` and `primitive` submodules.

  Test Plan: Validate
  Reviewers: simonpj, simonmar, hvr, goldfire, alanz
  Reviewed By: simonmar
  Subscribers: goldfire, rwbarton, thomie, mpickering, carter
  Differential Revision: https://phabricator.haskell.org/D4761
* Make shortcutting at the asm stage toggleable and default for O2. (Andreas Klebinger, 2018-04-13; 1 file, -0/+17)
  Shortcutting during the asm stage of codegen is often redundant as most cases get caught during the Cmm passes. For example during compilation of all of nofib only 508 jumps are eliminated.

  For this reason I moved the pass from -O1 to -O2. I also made it toggleable with -fasm-shortcutting.

  Test Plan: ci
  Reviewers: bgamari
  Reviewed By: bgamari
  Subscribers: thomie, carter
  Differential Revision: https://phabricator.haskell.org/D4555
* Fix syntax in -flate-specialise docs (Simon Jakobi, 2018-03-29; 1 file, -0/+2)
* Add -flate-specialise which runs a later specialisation pass (Matthew Pickering, 2018-03-19; 1 file, -0/+14)
  Runs another specialisation pass towards the end of the optimisation pipeline. This can catch specialisation opportunities which arose from the previous specialisation pass or other inlining.

  You might want to use this if you have a type class method which returns a constrained type. For example, a type class where one of the methods implements a traversal.

  It is not enabled by default or by any optimisation level, only by manually enabling the flag `-flate-specialise`.

  Reviewers: bgamari
  Reviewed By: bgamari
  Subscribers: rwbarton, thomie, carter
  Differential Revision: https://phabricator.haskell.org/D4457
* Sort valid substitutions for typed holes by "relevance" (Matthías Páll Gissurarson, 2018-01-26; 1 file, -15/+1)
  This is an initial attempt at tackling the issue of how to order the suggestions provided by the valid substitutions checker, by sorting them by creating a graph of how they subsume each other. We'd like to order them in such a manner that the most "relevant" suggestions are displayed first, so that the suggestion that the user might be looking for is displayed before more far-fetched suggestions (and thus also displayed when they'd otherwise be cut-off by the `-fmax-valid-substitutions` limit). The previous ordering was based on the order in which the elements appear in the list of imports, which I believe is less correlated with relevance than this ordering.

  A drawback of this approach is that, since we now want to sort the elements, we can no longer "bail out early" when we've hit the `-fmax-valid-substitutions` limit.

  Reviewers: bgamari, dfeuer
  Reviewed By: dfeuer
  Subscribers: dfeuer, rwbarton, thomie, carter
  Differential Revision: https://phabricator.haskell.org/D4326
* llvmGen: Pass vector arguments in vector registers by default (Ben Gamari, 2017-11-02; 1 file, -0/+12)
  Earlier this year Edward Kmett requested [1] that we enable passing of vector values in vector registers by default. The GHC calling convention changes have been in LLVM for a number of years now so let's just flip the switch.

  [1] https://mail.haskell.org/pipermail/ghc-devs/2017-March/013905.html

  Reviewers: austin
  Subscribers: rwbarton, thomie
  Differential Revision: https://phabricator.haskell.org/D4142
* Implement a dedicated exitification pass #14152 (Joachim Breitner, 2017-10-29; 1 file, -0/+10)
  The idea is described in #14152, and can be summarized: Float the exit path out of a joinrec, so that the simplifier can do more with it. See the test case for a nice example.

  The floating goes against what the simplifier usually does, hence we need to be careful not to inline them back.

  The position of exitification in the pipeline was chosen after a small amount of experimentation, but may need to be improved. For example, exitification can allow rewrite rules to fire, but for that it would have to happen before the `simpl_phases`.

  Perf.haskell.org reports these nice performance wins:

    Nofib allocations
      fannkuch-redux     78446640   -99.92%        64560
      k-nucleotide      109466384   -91.32%      9502040
      simple             72424696    -5.96%     68109560

    Nofib instruction counts
      fannkuch-redux   1744331636    -3.86%   1676999519
      k-nucleotide     2318221965    -6.30%   2172067260
      scs              1978470869    -3.35%   1912263779
      simple            669858104    -3.38%    647206739
      spectral-norm     186423292    -5.37%    176411536

  Differential Revision: https://phabricator.haskell.org/D3903
* user-guide: Clarify default optimization flags (Ben Gamari, 2017-10-25; 1 file, -6/+7)
  Begins to fix #14214.

  [skip ci]

  Test Plan: Read it.
  Reviewers: austin
  Subscribers: rwbarton, thomie
  GHC Trac Issues: #14214
  Differential Revision: https://phabricator.haskell.org/D4098
* users_guide: Convert mkUserGuidePart generation to a Sphinx extension (Patrick Dougherty, 2017-08-18; 1 file, -2/+283)
  This removes all dependencies the users guide had on `mkUserGuidePart`. The generation of the flag reference table and the various pieces of the man page is now entirely contained within the Sphinx extension `flags.py`. You can see the man page generation on the orphan page https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/ghc.html

  The extension works by collecting all of the meta-data attached to the `ghc-flag` directives and then formatting and displaying it at `flag-print` directives. There is a single printing directive that can be customized with two options, what format to display (table, list, or block of flags) and an optional category to limit the output to (verbosity, warnings, codegen, etc.).

  New display formats can be added by creating a function `generate_flag_xxx` (where `xxx` is a description of the format) which takes a list of flags and a category and returns a new `xxx`. Then just add a reference in the dispatch table `handlers`. That display can now be run by passing `:type: xxx` to the `flag-print` directive.

  `flags.py` contains two maps of settings that can be adjusted. The first is a canonical list of flag categories, and the second sets default categories for files.

  The only functionality that Sphinx could not replace was the `what_glasgow_exts_does.gen.rst` file. `mkUserGuidePart` actually just reads the list of flags from `compiler/main/DynFlags.hs` which Sphinx cannot do. As the flag is deprecated, I added the list as a static file which can be updated manually.

  Additionally, this patch updates every single documented flag with the data from `mkUserGuidePart` to generate the reference table.

  Fixes #11654 and, incidentally, #12155.

  Reviewers: austin, bgamari
  Subscribers: rwbarton, thomie
  GHC Trac Issues: #11654, #12155
  Differential Revision: https://phabricator.haskell.org/D3839
* users-guide: Standardize and repair all flag references (Patrick Dougherty, 2017-07-23; 1 file, -26/+29)
  This patch does three things:

  1.) It simplifies the flag parsing code in `conf.py` to properly display flag definitions created by `.. (ghc|rts)-flag::`. Additionally, all flag references must include the associated arguments. Documentation has been added to `editing-guide.rst` to explain this.

  2.) It normalizes all flag definitions to a similar format. Notably, all instances of `<>` have been replaced with `⟨⟩`. All references across the users guide have been updated to match.

  3.) It fixes a couple issues with the flag reference table's generation code, which did not handle comma separated flags in the same cell and did not properly reference flags with arguments.

  Test Plan: `SPHINXOPTS = -n` to activate "nitpicky" mode, which reports all broken references. All remaining errors are references to flags without any documentation.

  Reviewers: austin, bgamari
  Reviewed By: bgamari
  Subscribers: rwbarton, thomie
  GHC Trac Issues: #13980
  Differential Revision: https://phabricator.haskell.org/D3778