author | Sebastian Graf <sgraf1337@gmail.com> | 2019-08-19 13:01:49 +0000 |
---|---|---|
committer | Sebastian Graf <sebastian.graf@kit.edu> | 2021-02-04 16:07:55 +0100 |
commit | 52a045fc390f54968bfd6c17ee28fc7baef7708e (patch) | |
tree | 9c09a7c5b69b4feb78a7c883def782e736a6c7fb /docs/users_guide/using-optimisation.rst | |
parent | 89188be1ed7c1fb44e18f5ec68bf9750f425ac10 (diff) | |
download | haskell-wip/nested-cpr-2019.tar.gz |
Nested CPR analysis (#18174)
This commit extends CPR analysis to unbox nested constructors.
See `Note [Nested CPR]` for examples.
Unboxing a function's result beyond the first level risks making the
function more strict, rendering the transformation unsound.
See `Note [Nested CPR needs Termination information]`.
To justify unboxing anyway, Nested CPR interleaves a termination
analysis that is like a higher-order `exprOkForSpeculation`.
The termination analysis accounts for the bulk of the complexity in this patch.
In principle, we can use the results of that analysis in many more ways
in the future to do speculative execution.
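To make the idea concrete, here is a hand-written sketch of the worker/wrapper split that Nested CPR aims for. It is not GHC's actual output; the names `sumProd`, `wsumProd`, `sumProd'` and `lazySnd` are hypothetical and chosen only for illustration.

```haskell
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}

import GHC.Exts (Int (I#), Int#, (*#), (+#))

-- Source function: strict in both arguments, returns a pair of boxed Ints.
sumProd :: Int -> Int -> (Int, Int)
sumProd (I# x) (I# y) = (I# (x +# y), I# (x *# y))

-- Hypothetical worker after nested CPR: the pair *and* its Int fields
-- are returned unboxed, so no heap allocation is needed for the result.
wsumProd :: Int# -> Int# -> (# Int#, Int# #)
wsumProd x y = (# x +# y, x *# y #)

-- Hypothetical wrapper: reboxes the worker's result. Once inlined at a
-- call site, the boxes can cancel against the caller's case expressions.
sumProd' :: Int -> Int -> (Int, Int)
sumProd' (I# x) (I# y) = case wsumProd x y of
  (# s, p #) -> (I# s, I# p)

-- The strictness hazard: the second field never terminates when forced.
-- A worker returning (# Int#, Int# #) would have to evaluate it eagerly,
-- so `fst (lazySnd 1)` would diverge instead of returning 1.
lazySnd :: Int -> (Int, Int)
lazySnd n = (n, undefined)
```

Unboxing the fields of `sumProd` is safe because both fields are cheap primitive operations on already-evaluated arguments. For `lazySnd` it is not, which is the kind of case the termination analysis has to rule out.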
Although there are quite a few examples in test cases that are now
properly optimised (e.g., `T1600`, `T18174`, `T18894`), the results on
NoFib are rather meager:
```
--------------------------------------------------------------------------------
        Program             Allocs    Instrs
--------------------------------------------------------------------------------
        cacheprof            -0.3%     -1.4%
        compress2            -1.9%     -0.9%
        fannkuch-redux        0.0%     -1.3%
        gamteb               -1.6%     -0.3%
        nucleic2             -1.2%     -0.6%
        sched                -0.0%     +0.9%
        x2n1                 -0.0%     -5.0%
--------------------------------------------------------------------------------
        Min                  -1.9%     -5.0%
        Max                  +0.1%     +0.9%
        Geometric Mean       -0.1%     -0.1%
```
Allocation while compiling NoFib increases by 0.5%.
Binary sizes on NoFib increase by 0.7%.
This patch manages to fix a few tickets:
Fixes #1600, #18174, #18109
`ghc/alloc` performance generally increases.
`run/alloc` metrics improve throughout.
Justifications for metric increases:
- `MultiLayerModules` increases due to #19293.
- I could reproduce the 2.5% increase on `T13701` on fedora in a `-O0`
perf-flavoured build. With `-fno-code` or `-O2` this patch is
faster. I investigated the `-v2` output but found nothing obvious. It's very
similar to #19293, so I'm just going to accept it.
- The +15% `ghc/alloc` increase on `T15164` in a registerised,
validate-flavoured build does not show up under `-dshow-passes` and
has no impact on runtime. See #19311.
- I verified that `T13253` simply does one more round of
Simplification after Nested CPR.
- I looked at heap profiles for the `ghc/max_bytes_used` increases,
which didn't show any obvious offenders.
Metric Decrease:
T1969
T9203
T9233
T9872a
T9872b
T9872c
T9872d
T12425
T12545
Metric Increase ['bytes allocated']:
T13253
MultiLayerModules
Metric Increase ['bytes allocated'] (test_env='x86_64-linux-deb9-unreg-hadrian'):
T15164
Metric Increase ['bytes allocated'] (test_env='x86_64-linux-fedora27'):
T13701
Metric Increase ['max_bytes_used'] (test_env='x86_64-darwin'):
T9675
Metric Increase ['max_bytes_used'] (test_env='x86_64-linux-deb9-dwarf'):
T9675
Metric Increase ['max_bytes_used', 'peak_megabytes_allocated']:
T10370
Diffstat (limited to 'docs/users_guide/using-optimisation.rst')
-rw-r--r-- | docs/users_guide/using-optimisation.rst | 35 |
1 files changed, 35 insertions, 0 deletions
diff --git a/docs/users_guide/using-optimisation.rst b/docs/users_guide/using-optimisation.rst
index ee5b1de95e..92fd62cde7 100644
--- a/docs/users_guide/using-optimisation.rst
+++ b/docs/users_guide/using-optimisation.rst
@@ -306,6 +306,41 @@ by saying ``-fno-wombat``.
 
     Turn on CPR analysis in the demand analyser.
 
+.. ghc-flag:: -fcase-binder-cpr-depth
+    :shortdesc: Maximum depth at which case binders have the CPR property.
+    :type: dynamic
+    :category:
+
+    :default: 1
+
+    Normally, case binders get the CPR property if their scrutinee had it.
+    But depending on whether the case binder occurs on a cold path, it may make sense
+    to give it the CPR property unconditionally.
+
+    This flag controls how deep inside a constructor application we still
+    consider CPR binders to have the CPR property. The default is 1, so the
+    following function will have the CPR property: ::
+
+        f :: Bool -> Int -> Int
+        f False _ = 1
+        f _ x@2 = x
+        f _ _ = 3
+
+    Note that ``x`` did not occur nested inside a constructor, so depth 1.
+
+    On the other hand, the following function will *not* have the Nested CPR
+    property: ::
+
+        g :: Bool -> Int -> (Int, Int)
+        g False _ = (1, 1)
+        g _ x@2 = (x, x)
+        g _ _ = (3, 3)
+
+    Because ``x`` occurs nested inside a pair, so at depth 2.
+
+    Depth 0 will never give any CPR binder the CPR property, unless the
+    scrutinee had it to begin with.
+
 .. ghc-flag:: -fcse
     :shortdesc: Enable common sub-expression elimination. Implied by :ghc-flag:`-O`.
     :type: dynamic
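As a rough illustration of the depth-1 case described in the flag documentation, the following hand-written sketch shows what it means for `f` to have the CPR property when its result is a case binder. It is not the code GHC generates; the worker `wf` and wrapper `f'` are hypothetical names.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Int (I#), Int#)

-- The function from the flag documentation. In the second clause the
-- result is `x` itself, i.e. the binder of the case that scrutinises it.
f :: Bool -> Int -> Int
f False _ = 1
f _ x@2 = x
f _ _ = 3

-- Hypothetical worker: returns the result unboxed. In the `2#` branch the
-- binder `x` is known to be `I# y`, so its payload `y` can be returned
-- directly even though the scrutinee itself had no CPR property.
wf :: Bool -> Int -> Int#
wf False _ = 1#
wf _ (I# y) = case y of
  2# -> y
  _ -> 3#

-- Hypothetical wrapper: reboxes the worker's result.
f' :: Bool -> Int -> Int
f' b x = I# (wf b x)
```

For `g`, the same binder would sit inside a pair, one constructor deeper, which is why the documentation says it does not get the Nested CPR property at the default depth of 1.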