| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This MR fixes #21694, #21755. It also makes sure that #21948 and
fix to #21694.
* For #21694 the underlying problem was that we were calling arityType
on an expression that had free join points. This is a Bad Bad Idea.
See Note [No free join points in arityType].
* To make "no free join points in arityType" work out I had to avoid
trying to use eta-expansion for runRW#. This entailed a few changes
in the Simplifier's treatment of runRW#. See
GHC.Core.Opt.Simplify.Iteration Note [No eta-expansion in runRW#]
* I also made andArityType work correctly with -fpedantic-bottoms;
see Note [Combining case branches: andWithTail].
* Rewrote Note [Combining case branches: optimistic one-shot-ness]
* arityType previously treated join points differently to other
let-bindings. This patch makes them unform; arityType analyses
the RHS of all bindings to get its ArityType, and extends am_sigs.
I realised that, now we have am_sigs giving the ArityType for
let-bound Ids, we don't need the (pre-dating) special code in
arityType for join points. But instead we need to extend the env for
Rec bindings, which weren't doing before. More uniform now. See
Note [arityType for let-bindings].
This meant we could get rid of ae_joins, and in fact get rid of
EtaExpandArity altogether. Simpler.
* And finally, it was the strange treatment of join-point Ids in
arityType (involving a fake ABot type) that led to a serious bug:
#21755. Fixed by this refactoring, which treats them uniformly;
but without breaking #18328.
In fact, the arity for recursive join bindings is pretty tricky;
see the long Note [Arity for recursive join bindings]
in GHC.Core.Opt.Simplify.Utils. That led to more refactoring,
including deciding that an Id could have an Arity that is bigger
than its JoinArity; see Note [Invariants on join points], item
2(b) in GHC.Core
* Make sure that the "demand threshold" for join points in DmdAnal
is no bigger than the join-arity. In GHC.Core.Opt.DmdAnal see
Note [Demand signatures are computed for a threshold arity based on idArity]
* I moved GHC.Core.Utils.exprIsDeadEnd into GHC.Core.Opt.Arity,
where it more properly belongs.
* Remove an old, redundant hack in FloatOut. The old Note was
Note [Bottoming floats: eta expansion] in GHC.Core.Opt.SetLevels.
Compile time improves very slightly on average:
Metrics: compile_time/bytes allocated
---------------------------------------------------------------------------------------
T18223(normal) ghc/alloc 725,808,720 747,839,216 +3.0% BAD
T6048(optasm) ghc/alloc 105,006,104 101,599,472 -3.2% GOOD
geo. mean -0.2%
minimum -3.2%
maximum +3.0%
For some reason Windows was better
T10421(normal) ghc/alloc 125,888,360 124,129,168 -1.4% GOOD
T18140(normal) ghc/alloc 85,974,520 83,884,224 -2.4% GOOD
T18698b(normal) ghc/alloc 236,764,568 234,077,288 -1.1% GOOD
T18923(normal) ghc/alloc 75,660,528 73,994,512 -2.2% GOOD
T6048(optasm) ghc/alloc 112,232,512 108,182,520 -3.6% GOOD
geo. mean -0.6%
I had a quick look at T18223 but it is knee deep in coercions and
the size of everything looks similar before and after. I decided
to accept that 3% increase in exchange for goodness elsewhere.
Metric Decrease:
T10421
T18140
T18698b
T18923
T6048
Metric Increase:
T18223
|
| |
|
|
|
|
| |
Closes #22092.
|
| |
|
|
|
|
|
|
|
| |
This patch improves the uniformity of error message formatting by
printing constraints in quotes, as we do for types.
Fix #21167
|
|
|
|
| |
Now we also filter the local rules (again) which fixes the issue.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 415468fef8a3e9181b7eca86de0e05c0cce31729.
This refactoring introduced quite a severe residency regression (900MB
live from 650MB live when compiling mmark), see #21993 for a reproducer
and more discussion.
Ticket #21993
|
|
|
|
|
|
|
|
| |
If these thunks are not forced then the entire unfolding for the binding
is live throughout the whole of CodeGen despite the fact it should have
been discarded.
Fixes #22071
|
|
|
|
| |
This is another symptom of #19619
|
|
|
|
|
|
| |
It's better to perform this projection from Id to Name strictly so we
don't retain an old Id (hence IdInfo, hence Unfolding, hence everything
etc)
|
|
|
|
|
|
|
|
|
| |
Since 2011 the object-joining implementation has had a hack to pass
`--build-id=none` to `ld` when supported, seemingly to work around a
linker bug. This hack is now unnecessary and may break downstream users
who expect objects to have valid build-ids. Remove it.
Closes #22060.
|
|
|
|
|
|
|
| |
This fixes #22065. We were failing to retain a quantifier that
was mentioned in the kind of another retained quantifier.
Easy to fix.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The -x option is used to manually specify which phase a file should be
started to be compiled from (even if it lacks the correct extension). I
just failed to implement this when refactoring the driver.
In particular Cabal calls GHC with `-E -cpp -x hs Foo.cpphs` to
preprocess source files using GHC.
I added a test to exercise this case.
Fixes #22044
|
| |
|
|
|
|
|
|
|
|
|
| |
Our aliasification logic would previously turn builtin LLVM variables
into aliases, which apparently confuses LLVM. This manifested in
initializers failing to be emitted, resulting in many profiling failures
with the LLVM backend.
Fixes #22019.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For the code
{-# LANGUAGE OverloadedRecordUpdate #-}
operatorUpdate f = f{(+) = 1}
There are no exact print annotations for the parens around the +
symbol, nor does normal ppr print them.
This MR fixes that.
Closes #21805
Updates haddock submodule
|
| |
|
|
|
|
|
|
|
|
|
| |
Apple's ABI documentation [1] says: "The platforms reserve register x18.
Don’t use this register." While this wasn't problematic in previous
Darwin releases, macOS 13 appears to start zeroing this register
periodically. See #21964.
[1] https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a new command-line flag:
-fplugin-library=<file-path>;<unit-id>;<module>;<args>
used like this:
-fplugin-library=path/to/plugin.so;package-123;Plugin.Module;["Argument","List"]
It allows a plugin to be loaded directly from a shared library. With
this approach, GHC doesn't compile anything for the plugin and doesn't
load any .hi file for the plugin and its dependencies. As such GHC
doesn't need to support two environments (one for plugins, one for
target code), which was the more ambitious approach tracked in #14335.
Fix #20964
Co-authored-by: Josh Meredith <joshmeredith2008@gmail.com>
|
|
|
|
|
|
|
|
| |
The size_up_alloc function mistakenly considered any type that isn't
lifted to not allocate anything, which is wrong. What we want instead
is to check the type isn't boxed. This accounts for (BoxedRep Unlifted).
Fixes #21939
|
|
|
|
|
|
|
|
|
|
| |
* Remove hack when printing OccNames. No longer needed since e3dcc0d5
* Remove unused `pprCmms` and `instance Outputable Instr`
* Simplify `pprCLabel` (no need to pass platform)
* Remove evil `Show`/`Eq` instances for `SDoc`. They were needed by
ImmLit, but that can take just a String instead.
* Remove instance `Outputable CLabel` - proper output of labels
needs a platform, and is done by the `OutputableP` instance
|
|
|
|
| |
This addresses one part of #21710.
|
| |
|
| |
|
|
|
|
|
|
|
| |
This eliminates the thread label HashTable and instead tracks this
information in the TSO, allowing us to use proper StgArrBytes arrays for
backing the label and greatly simplifying management of object lifetimes
when we expose them to the user with the coming `threadLabel#` primop.
|
|
|
|
|
|
|
| |
A user came to #ghc yesterday wondering how best to check whether they
were leaking threads. We ended up using the eventlog but it seems to me
like it would be generally useful if Haskell programs could query their
own threads.
|
|
|
|
|
|
| |
These two uses constructed maps, which is a case where foldl' is
generally more efficient since we avoid constructing an intermediate
O(n)-depth stack.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The former behaviour of adding cost centres after optimization but
before unfoldings are created is not available via the flag
`prof-late-inline` instead.
I also reduced the overhead of -fprof-late* by pushing the cost centres
into lambdas. This means the cost centres will only account for
execution of functions and not their partial application.
Further I made LATE_CC cost centres it's own CC flavour so they now
won't clash with user defined ones if a user uses the same string for
a custom scc.
LateCC: Don't put cost centres inside constructor workers.
With -fprof-late they are rarely useful as the worker is usually
inlined. Even if the worker is not inlined or we use -fprof-late-linline
they are generally not helpful but bloat compile and run time
significantly. So we just don't add sccs inside constructor workers.
-------------------------
Metric Decrease:
T13701
-------------------------
|
|
|
|
|
|
|
|
|
|
|
| |
When profiling is enabled we must enter functions that might represent
thunks in order for their sccs to show up in the profile.
We might allocate even if the function is already evaluated in this
case. So we can't consider any potential function thunk to be a simple
scrut when profiling.
Not doing so caused profiled binaries to segfault.
|
| |
|
|
|
|
|
|
|
|
|
| |
Previously ce8745952f99174ad9d3bdc7697fd086b47cdfb5 assumed that it was
safe to clobber the switch variable when generating code for a jump
table since we were at the end of a block. However, this assumption is
wrong; the register could be live in the jump target.
Fixes #21968.
|
| |
|
|
|
|
| |
We don't actually emit rodata16 sections anywhere.
|
|
|
|
|
|
|
|
| |
I realised hydration was completely irrelavant for this cache because
the ModDetails are pruned from the result. So now it simplifies things a
lot to just store the ModIface and Linkable, which we can put into the
cache straight away rather than wait for the final version of a
HomeModInfo to appear.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes quite a tricky leak where we would end up retaining
stale ModDetails due to rehydrating modules against non-finalised
interfaces.
== Loops with multiple boot files
It is possible for a module graph to have a loop (SCC, when ignoring boot files)
which requires multiple boot files to break. In this case we must perform the
necessary hydration steps before and after compiling modules which have boot files
which are described above for corectness but also perform an additional hydration step
at the end of the SCC to remove space leaks.
Consider the following example:
┌───────┐ ┌───────┐
│ │ │ │
│ A │ │ B │
│ │ │ │
└─────┬─┘ └───┬───┘
│ │
┌────▼─────────▼──┐
│ │
│ C │
└────┬─────────┬──┘
│ │
┌────▼──┐ ┌───▼───┐
│ │ │ │
│ A-boot│ │ B-boot│
│ │ │ │
└───────┘ └───────┘
A, B and C live together in a SCC. Say we compile the modules in order
A-boot, B-boot, C, A, B then when we compile A we will perform the hydration steps
(because A has a boot file). Therefore C will be hydrated relative to A, and the
ModDetails for A will reference C/A. Then when B is compiled C will be rehydrated again,
and so B will reference C/A,B, its interface will be hydrated relative to both A and B.
Now there is a space leak because say C is a very big module, there are now two different copies of
ModDetails kept alive by modules A and B.
The way to avoid this space leak is to rehydrate an entire SCC together at the
end of compilation so that all the ModDetails point to interfaces for .hs files.
In this example, when we hydrate A, B and C together then both A and B will refer to
C/A,B.
See #21900 for some more discussion.
-------------------------------------------------------
In addition to this simple case, there is also the potential for a leak
during parallel upsweep which is also fixed by this patch. Transcibed is
Note [ModuleNameSet, efficiency and space leaks]
Note [ModuleNameSet, efficiency and space leaks]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
During unsweep the results of compiling modules are placed into a MVar, to find
the environment the module needs to compile itself in the MVar is consulted and
the HomeUnitGraph is set accordingly. The reason we do this is that precisely tracking
module dependencies and recreating the HUG from scratch each time is very expensive.
In serial mode (-j1), this all works out fine because a module can only be compiled after
its dependencies have finished compiling and not interleaved with compiling module loops.
Therefore when we create the finalised or no loop interfaces, the HUG only contains
finalised interfaces.
In parallel mode, we have to be more careful because the HUG variable can contain
non-finalised interfaces which have been started by another thread. In order to avoid
a space leak where a finalised interface is compiled against a HPT which contains a
non-finalised interface we have to restrict the HUG to only the visible modules.
The visible modules is recording in the ModuleNameSet, this is propagated upwards
whilst compiling and explains which transitive modules are visible from a certain point.
This set is then used to restrict the HUG before the module is compiled to only
the visible modules and thus avoiding this tricky space leak.
Efficiency of the ModuleNameSet is of utmost importance because a union occurs for
each edge in the module graph. Therefore the set is represented directly as an IntSet
which provides suitable performance, even using a UniqSet (which is backed by an IntMap) is
too slow. The crucial test of performance here is the time taken to a do a no-op build in --make mode.
See test "jspace" for an example which used to trigger this problem.
Fixes #21900
|
|
|
|
|
|
| |
I observed some unforced thunks in the NameCache which were retaining a
whole Id, which ends up retaining a Type.. which ends up retaining old
copies of HscEnv containing stale HomeModInfo.
|
|
|
|
|
|
| |
This will only have a (very) modest impact on memory but we don't want
to retain old copies of DynFlags hanging around so best to force this
value.
|
|
|
|
| |
This fixes #21236.
|
|
|
|
|
|
| |
Previously, we had to disable defer-type-errors in splices because of #7276.
But this fix is no longer necessary, the test T7276 no longer segfaults
and is now correctly deferred.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch removes the TCvSubst data type and instead uses Subst as
the environment for both term and type level substitution. This
change is partially motivated by the existential type proposal,
which will introduce types that contain expressions and therefore
forces us to carry around an "IdSubstEnv" even when substituting for
types. It also reduces the amount of code because "Subst" and
"TCvSubst" share a lot of common operations. There isn't any
noticeable impact on performance (geo. mean for ghc/alloc is around
0.0% but we have -94 loc and one less data type to worry abount).
Currently, the "TCvSubst" data type for substitution on types is
identical to the "Subst" data type except the former doesn't store
"IdSubstEnv". Using "Subst" for type-level substitution means there
will be a redundant field stored in the data type. However, in cases
where the substitution starts from the expression, using "Subst" for
type-level substitution saves us from having to project "Subst" into a
"TCvSubst". This probably explains why the allocation is mostly even
despite the redundant field.
The patch deletes "TCvSubst" and moves "Subst" and its relevant
functions from "GHC.Core.Subst" into "GHC.Core.TyCo.Subst".
Substitution on expressions is still defined in "GHC.Core.Subst" so we
don't have to expose the definition of "Expr" in the hs-boot file that
"GHC.Core.TyCo.Subst" must import to refer to "IdSubstEnv" (whose
codomain is "CoreExpr"). Most functions named fooTCvSubst are renamed
into fooSubst with a few exceptions (e.g. "isEmptyTCvSubst" is a
distinct function from "isEmptySubst"; the former ignores the
emptiness of "IdSubstEnv"). These exceptions mainly exist for
performance reasons and will go away when "Expr" and "Type" are
mutually recursively defined (we won't be able to take those
shortcuts if we can't make the assumption that expressions don't
appear in types).
|
|
|
|
|
| |
Instead of `` `cast` <Co:11> :: (Some -> Really -> Large Type)``
simply print `` `cast` <Co:11> :: ... ``
|
|
|
|
| |
Fixes #21866
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There was an assert error, as Gergo pointed out in #21896.
I fixed this by adding an InScopeSet argument to tcUnifyTyWithTFs.
And also to GHC.Core.Unify.niFixTCvSubst.
I also took the opportunity to get a couple more InScopeSets right,
and to change some substTyUnchecked into substTy.
This MR touches a lot of other files, but only because I also took the
opportunity to introduce mkInScopeSetList, and use it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch addresses #21831, point 2. See
Note [generaliseDictPats] in SpecConstr
I took the opportunity to refactor the construction of specialisation
rules a bit, so that the rule name says what type we are specialising
at.
Surprisingly, there's a 20% decrease in compile time for test
perf/compiler/T18223. I took a look at it, and the code size seems the
same throughout. I did a quick ticky profile which seemed to show a
bit less substitution going on. Hmm. Maybe it's the "don't do
eta-expansion in stable unfoldings" patch, which is part of the
same MR as this patch.
Anyway, since it's a move in the right direction, I didn't think it
was worth looking into further.
Metric Decrease:
T18223
|
|
|
|
|
|
| |
I think this change will make little difference except to reduce
clutter. But that's it -- if it causes problems we can switch it
on again.
|
|
|
|
| |
This fixes (1) in #21831. Easy, obviously correct.
|