|  | Commit message (Collapse) | Author | Age | Files | Lines | 
|---|
| | |  | 
| | 
| 
| 
| | Update submodule: haddock | 
| | 
| 
| 
| | Clarify code added in #17852 and MR !2724 | 
| | 
| 
| 
| | Update haddock submodule | 
| | 
| 
| 
| | submodule updates: nofib, haddock | 
| | 
| 
| 
| | fixes #17852 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | The reasons for that can be found in the wiki:
https://gitlab.haskell.org/ghc/ghc/wikis/nested-cpr/split-off-cpr
We now run CPR after demand analysis (except for after the final demand
analysis run just before code gen). CPR got its own dump flags
(`-ddump-cpr-anal`, `-ddump-cpr-signatures`), but not its own flag to
activate/deactivate. It will run with `-fstrictness`/`-fworker-wrapper`.
As explained on the wiki page, this step is necessary for a sane Nested
CPR analysis. And it has quite positive impact on compiler performance:
Metric Decrease:
    T9233
    T9675
    T9961
    T15263 | 
| | |  | 
| | |  | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This reverts commit ce64b397777408731c6dd3f5c55ea8415f9f565b on the
grounds of the regression it would introduce in a couple of packages.
Fixes #17653.
Also undoes a slight metric increase in #13701 introduced by that commit
that we didn't see prior to !1983.
Metric Decrease:
    T13701 | 
| | |  | 
| | |  | 
| | |  | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | * Add 'dumpAction' hook to DynFlags.
It allows GHC API users to catch dumped intermediate codes and
information. The format of the dump (Core, Stg, raw text, etc.) is now
reported allowing easier automatic handling.
* Add 'traceAction' hook to DynFlags.
Some dumps go through the trace mechanism (for instance unfoldings that
have been considered for inlining). This is problematic because:
1) dumps aren't written into files even with -ddump-to-file on
2) dumps are written on stdout even with GHC API
3) in this specific case, dumping depends on unsafe globally stored
DynFlags which is bad for GHC API users
We introduce 'traceAction' hook which allows GHC API to catch those
traces and to avoid using globally stored DynFlags.
* Avoid dumping empty logs via dumpAction/traceAction (but still write
empty files to keep the existing behavior) | 
| | 
| 
| 
| 
| 
| | A join point was getting too large an arity, leading to #17294.
I've tightened up the invariant: see
  CoreSyn, Note [Invariants on join points], invariant 2b | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This introduces three new modules:
 - basicTypes/Predicate.hs describes predicates, moving
   this logic out of Type. Predicates don't really exist
   in Core, and so don't belong in Type.
 - typecheck/TcOrigin.hs describes the origin of constraints
   and types. It was easy to remove from other modules and
   can often be imported instead of other, scarier modules.
 - typecheck/Constraint.hs describes constraints as used in
   the solver. It is taken from TcRnTypes.
No work other than module splitting is in this patch.
This is the first step toward homogeneous equality, which will
rely more strongly on predicates. And homogeneous equality is the
next step toward a dependently typed core language. | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | In #14998 I realised that the notion of speculative execution
*exactly matches* eager evaluation of expressions in a case alternative
where the scrutinee is an IO action.
Normally we have to `deferIO` any result from that single case
alternative to prevent this speculative execution, so we had a special
case in place in the demand analyser that would check if the scrutinee
was a prim-op, in which case we assumed that it would be ok to do the
eager evaluation.
Now we just check if the scrutinee is `exprOkForSpeculation`,
corresponding to the notion that we want to push evaluation of the
scrutinee *after* eagerly evaluating stuff from the case alternative.
This fixes #14988, because it resolves the last open Item 4 there. | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Issue #17056 revealed that we were sometimes building a case
expression whose type field (in the Case constructor) was bogus.
Consider a phantom type synonym
   type S a = Int
and we want to form the case expression
   case x of K (a::*) -> (e :: S a)
We must not make the type field of the Case constructor be (S a)
because 'a' isn't in scope.  We must instead expand the synonym.
Changes in this patch:
* Expand synonyms in the new function CoreUtils.mkSingleAltCase.
* Use mkSingleAltCase in MkCore.wrapFloat, which was the proximate
  source of the bug (when called by exprIsConApp_maybe)
* Use mkSingleAltCase elsewhere
* Documentation
    CoreSyn   new invariant (6) in Note [Case expression invariants]
    CoreSyn   Note [Why does Case have a 'Type' field?]
    CoreUtils Note [Care with the type of a case expression]
* I improved Core Lint's error reporting, which was pretty
  confusing in this case, because it didn't mention that the offending
  type was the return type of a case expression.
* A little bit of cosmetic refactoring in CoreUtils | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This does four things:
1. Look at `idArity` instead of manifest lambdas to decide whether to use LetUp
2. Compute the strictness signature in LetDown assuming at least `idArity`
   incoming arguments
3. Remove the special case for trivial RHSs, which is subsumed by 2
4. Don't perform the W/W split when doing so would eta expand a binding.
   Otherwise we would eta expand PAPs, causing unnecessary churn in the
   Simplifier.
NoFib Results
--------------------------------------------------------------------------------
        Program         Allocs    Instrs
--------------------------------------------------------------------------------
 fannkuch-redux          +0.3%      0.0%
             gg          -0.0%     -0.1%
       maillist          +0.2%     +0.2%
        minimax           0.0%     +0.8%
         pretty           0.0%     -0.1%
        reptile          -0.0%     -1.2%
--------------------------------------------------------------------------------
            Min          -0.0%     -1.2%
            Max          +0.3%     +0.8%
 Geometric Mean          +0.0%     -0.0% | 
| | 
| 
| 
| 
| | This moves all URL references to Trac tickets to their corresponding
GitLab counterparts. | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Trac #10069 revealed that small NOINLINE functions didn't get split
into worker and wrapper. This was due to `certainlyWillInline`
saying that any unfoldings with a guidance of `UnfWhen` inline
unconditionally. That isn't the case for NOINLINE functions, so we
catch this case earlier now.
Nofib results:
--------------------------------------------------------------------------------
        Program         Allocs    Instrs
--------------------------------------------------------------------------------
 fannkuch-redux          -0.3%      0.0%
             gg          +0.0%     +0.1%
       maillist          -0.2%     -0.2%
        minimax           0.0%     -0.8%
--------------------------------------------------------------------------------
            Min          -0.3%     -0.8%
            Max          +0.0%     +0.1%
 Geometric Mean          -0.0%     -0.0%
Fixes #10069.
-------------------------
Metric Increase:
    T9233
------------------------- | 
| | |  | 
| | 
| 
| 
| | Also use 'id' | 
| | |  | 
| | 
| 
| 
| 
| 
| 
| | This is a follow-up to
 d77501cd5b Improvements to demand analysis
I forgot to remove some now-dead code | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This patch collects a few improvements triggered by Trac #15696,
and fixing Trac #16029
* Stop making toCleanDmd behave specially for unlifted types.
  This special case was the cause of stupid behaviour in Trac
  #16029.  And to my joy I discovered the let/app invariant
  rendered it unnecessary.  (Maybe the special case pre-dated
  the let/app invariant.)
  Result: less special-case handling in the compiler, and
  better perf for the compiled code.
* In WwLib.mkWWstr_one, treat seqDmd like U(AAA).  It was not
  being so treated before, which again led to stupid code.
* Update and improve Notes
There are .stderr test wibbles because we get slightly different
strictness signatures for an argumment of unlifted type:
    <L,U> rather than <S,U>        for Int#
    <S,U> rather than <S(S),U(U)>  for Int | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Previously datacon strictness was accounted for when we demand analysed a case
analysis. However, this results in pessimistic demands in some cases. For
instance, consider the program (from T10482)
    data family Bar a
    data instance Bar (a, b) = BarPair !(Bar a) !(Bar b)
    newtype instance Bar Int = Bar Int
    foo :: Bar ((Int, Int), Int) -> Int -> Int
    foo f k =
      case f of
        BarPair x y -> case burble of
                          True -> case x of
                                    BarPair p q -> ...
                          False -> ...
We really should be able to assume that `p` is already evaluated since it came
from a strict field of BarPair.
However, as written the demand analyser can not conclude this since we may end
up in the False branch of the case on `burble` (which places no demand on `x`).
By accounting for the data con strictness later, applied to the demand of the
RHS, we get the strict demand signature we want.
See Note [Add demands for strict constructors] for a more comprehensive
discussion.
Test Plan: Validate
Reviewers: simonpj, osa1, goldfire
Subscribers: rwbarton, carter
GHC Trac Issues: #15696
Differential Revision: https://phabricator.haskell.org/D5226 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | In a previous patch we replaced some built-in literal constructors
(MachInt, MachWord, etc.) with a single LitNumber constructor.
In this patch we replace the `Mach` prefix of the remaining constructors
with `Lit` for consistency (e.g., LitChar, LitLabel, etc.).
Sadly the name `LitString` was already taken for a kind of FastString
and it would become misleading to have both `LitStr` (literal
constructor renamed after `MachStr`) and `LitString` (FastString
variant). Hence this patch renames the FastString variant `PtrString`
(which is more accurate) and the literal string constructor now uses the
least surprising `LitString` name.
Both `Literal` and `LitString/PtrString` have recently seen breaking
changes so doing this kind of renaming now shouldn't harm much.
Reviewers: hvr, goldfire, bgamari, simonmar, jrtc27, tdammers
Subscribers: tdammers, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4881 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Test Plan: This validates
Reviewers: simonpj, bgamari
Reviewed By: bgamari
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5225 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
Trac #9279 reminded us that the worker wrapper transformation copes
really badly with absent unlifted boxed bindings.
As `Note [Absent errors]` in WwLib.hs points out, we can't just use
`absentError` for unlifted bindings because there is no bottom to hide
the error in.
So instead, we synthesise a new `RubbishLit` of type
`forall (a :: TYPE 'UnliftedRep). a`, which code-gen may subsitute for
any boxed value. We choose `()`, so that there is a good chance that
the program crashes instead instead of leading to corrupt data, should
absence analysis have been too optimistic (#11126).
Reviewers: simonpj, hvr, goldfire, bgamari, simonmar
Reviewed By: simonpj
Subscribers: osa1, rwbarton, carter
GHC Trac Issues: #15627, #9279, #4306, #11126
Differential Revision: https://phabricator.haskell.org/D5153 | 
| | 
| 
| 
| 
| 
| 
| | This is pure refactoring, just adding a couple of
definitions to BasicTypes, and using them.
Plus some whitespace stuff. | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This patch adds foldl' to GhcPrelude and changes must occurences
of foldl to foldl'. This leads to better performance especially
for quick builds where GHC does not perform strictness analysis.
It does change strictness behaviour when we use foldl' to turn
a argument list into function applications. But this is only a
drawback if code looks ONLY at the last argument but not at the first.
And as the benchmarks show leads to fewer allocations in practice
at O2.
Compiler performance for Nofib:
O2 Allocations:
        -1 s.d.                -----            -0.0%
        +1 s.d.                -----            -0.0%
        Average                -----            -0.0%
O2 Compile Time:
        -1 s.d.                -----            -2.8%
        +1 s.d.                -----            +1.3%
        Average                -----            -0.8%
O0 Allocations:
        -1 s.d.                -----            -0.2%
        +1 s.d.                -----            -0.1%
        Average                -----            -0.2%
Test Plan: ci
Reviewers: goldfire, bgamari, simonmar, tdammers, monoidal
Reviewed By: bgamari, monoidal
Subscribers: tdammers, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4929 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This commit causes significant performance regressions:
```
bytes allocated value is too high:
    Expected    T9872d(normal) bytes allocated: 578498120 +/-5%
    Lower bound T9872d(normal) bytes allocated: 549573214
    Upper bound T9872d(normal) bytes allocated: 607423026
    Actual      T9872d(normal) bytes allocated: 677179968
    Deviation   T9872d(normal) bytes allocated:      17.1 %
bytes allocated value is too high:
    Expected    T9872c(normal) bytes allocated: 3096670112 +/-5%
    Lower bound T9872c(normal) bytes allocated: 2941836606
    Upper bound T9872c(normal) bytes allocated: 3251503618
    Actual      T9872c(normal) bytes allocated: 3601872536
    Deviation   T9872c(normal) bytes allocated:       16.3 %
bytes allocated value is too high:
    Expected    T9872b(normal) bytes allocated: 3730686224 +/-5%
    Lower bound T9872b(normal) bytes allocated: 3544151912
    Upper bound T9872b(normal) bytes allocated: 3917220536
    Actual      T9872b(normal) bytes allocated: 4374298272
    Deviation   T9872b(normal) bytes allocated:       17.3 %
bytes allocated value is too high:
    Expected    T9872a(normal) bytes allocated: 2729927408 +/-5%
    Lower bound T9872a(normal) bytes allocated: 2593431037
    Upper bound T9872a(normal) bytes allocated: 2866423779
    Actual      T9872a(normal) bytes allocated: 3225788896
    Deviation   T9872a(normal) bytes allocated:       18.2 %
```
It's not clear that this was intentional so I'm going to revert for now.
This reverts commit 2110738b280543698407924a16ac92b6d804dc36. | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | Trac #15445 showed that a function with an automatically
generated specialisation RULE coudl be inlined before the
RULE had a chance to fire.
This patch attaches a NOINLINE[2] activation to the Id, to
stop this happening. | 
| | 
| 
| 
| 
| 
| | Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4775 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | The worker/wrapper transform needs to determine the levity of the result to
determine whether it needs to introduce a lambda to preserve laziness of the
result. For this is previously used isUnliftedType. However, this may fail in
the presence of levity polymorphism.
We now instead use isLiftedType_maybe, assuming that a lambda is needed if the
levity of the result cannot be determined.
Fixes #15186.
Test Plan: make test=T15186
Reviewers: simonpj, goldfire, tdammers
Reviewed By: simonpj
Subscribers: rwbarton, thomie, carter
GHC Trac Issues: #15186
Differential Revision: https://phabricator.haskell.org/D4755 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Matthew Pickering uncovered a bad performance hole in the way
that single-method dictionaries work, described in Trac #14955.
See Note [Do not unpack class dictionaries] in WwLib.
I tried to fix this 6 years ago, but got it slightly wrong.  This patch
fixes it, which makes a dramatic improvement in the test case.
Nofib highlights: not much happening:
  Program           Size    Allocs   Runtime   Elapsed  TotalMem
-----------------------------------------------------------------
      VSM          -0.3%     +2.7%     -7.4%     -7.4%      0.0%
cacheprof          -0.0%     +0.1%     +0.3%     +0.7%      0.0%
  integer          -0.0%     +1.1%     +7.5%     +7.5%      0.0%
      tak          -0.1%     -0.2%     0.024     0.024      0.0%
-----------------------------------------------------------------
      Min          -4.4%     -0.2%     -7.4%     -7.4%     -8.0%
      Max          +0.6%     +2.7%     +7.5%     +7.5%      0.0%
Geom Mean          -0.1%     +0.0%     +0.1%     +0.1%     -0.2%
I investigated VSM.  The patch unpacks class dictionaries a bit more
than before (i.e. does so if there is no INLINABLE pragma). And that
gives better code in VSM (less dictionary selection etc), but one closure
gets one word bigger.
I'll accept these changes in exchange for more robust performance.
Some ghci.debugger output wobbled around (order of bindings
being displayed). I have no idea why; but I accepted the changes. | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This patch has a single significant change:
  strictness wrapper functions are inlined earlier,
  in phase 2 rather than phase 0.
As shown by Trac #15056, this gives a better chance for RULEs to fire.
Before this change, a function that would have inlined early without
strictness analyss was instead inlining late. Result: applying
"optimisation" made the program worse.
This does not make too much difference in nofib, but I've stumbled
over the problem more than once, so even a "no-change" result would be
quite acceptable.  Here are the headlines:
--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
      cacheprof          -0.5%     -0.5%     +2.5%     +2.5%      0.0%
         fulsom          -1.0%     +2.6%     -0.1%     -0.1%      0.0%
           mate          -0.6%     +2.4%     -0.9%     -0.9%      0.0%
        veritas          -0.7%    -23.2%     0.002     0.002      0.0%
--------------------------------------------------------------------------------
            Min          -1.4%    -23.2%    -12.5%    -15.3%      0.0%
            Max          +0.6%     +2.6%     +4.4%     +4.3%    +19.0%
 Geometric Mean          -0.7%     -0.2%     -1.4%     -1.7%     +0.2%
* A worthwhile reduction in binary size.
* Runtimes are not to be trusted much but look as if they
  are moving the right way.
* A really big win in veritas, described in comment:1 of
  Trac #15056; more fusion rules fired.
* I investigated the losses in 'mate' and 'fulsom'; see #15056. | 
| | |  | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This dark corner was exposed by Trac #14285.  It involves the
interaction between absence analysis and INLINABLE pragmas.
There is a full explanation in Note [aBSENT_ERROR_ID] in MkCore,
which you can read there.  The changes in this patch are
* Make exprIsHNF return True for absentError, treating
  absentError like an honorary data constructor.
* Make absentError /not/ be diverging, unlike other error Ids.
This is all a bit horrible.
* While doing this I found that exprOkForSpeculation didn't
  have a case for value lambdas so I added one.  It's not
  really called on lifted types much, but it seems like the
  right thing | 
| | |  | 
| | |  | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This switches the compiler/ component to get compiled with
-XNoImplicitPrelude and a `import GhcPrelude` is inserted in all
modules.
This is motivated by the upcoming "Prelude" re-export of
`Semigroup((<>))` which would cause lots of name clashes in every
modulewhich imports also `Outputable`
Reviewers: austin, goldfire, bgamari, alanz, simonmar
Reviewed By: bgamari
Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari
Differential Revision: https://phabricator.haskell.org/D3989 | 
| | 
| 
| 
| 
| 
| 
| 
| | the worker/wrapper creates an artificial INLINE pragma, which caused CSE
to not do its work. We now recognize such artificial pragmas by using
`NoUserInline` instead of `Inline` as the `InlineSpec`.
Differential Revision: https://phabricator.haskell.org/D3939 | 
| | 
| 
| 
| | This patch fixes the buglet described in Trac #13890. | 
| | |  | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | I realised (Trac #13543) that we can improve demand analysis for
join point quite straightforwardly.
The idea is explained in
    Note [Demand analysis for join points]
in DmdAnal | 
| | |  | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This reduces peak memory usage by ~30% on my test case (DynFlags),
and (probably as a result of reduced GC work) decreases compilation
time by a few percent as well.
Also fix a bug in seqStrDmd so that demeand info is fully evaluated.
Reviewers: simonpj, austin, bgamari
Reviewed By: bgamari
Subscribers: dfeuer, thomie
Differential Revision: https://phabricator.haskell.org/D3400 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | The I/O hack for demand analysis has broader and arguably more
important implications than the note expressed. Broaden it.
[skip ci]
Reviewers: austin, bgamari
Reviewed By: bgamari
Subscribers: carter, rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3307 |