summaryrefslogtreecommitdiff
path: root/compiler
Commit message (Collapse)AuthorAgeFilesLines
* Fix arityType: -fpedantic-bottoms, join points, etcwip/T21694aSimon Peyton Jones2022-08-2512-492/+708
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This MR fixes #21694, #21755. It also makes sure that #21948 and fix to #21694. * For #21694 the underlying problem was that we were calling arityType on an expression that had free join points. This is a Bad Bad Idea. See Note [No free join points in arityType]. * To make "no free join points in arityType" work out I had to avoid trying to use eta-expansion for runRW#. This entailed a few changes in the Simplifier's treatment of runRW#. See GHC.Core.Opt.Simplify.Iteration Note [No eta-expansion in runRW#] * I also made andArityType work correctly with -fpedantic-bottoms; see Note [Combining case branches: andWithTail]. * Rewrote Note [Combining case branches: optimistic one-shot-ness] * arityType previously treated join points differently to other let-bindings. This patch makes them unform; arityType analyses the RHS of all bindings to get its ArityType, and extends am_sigs. I realised that, now we have am_sigs giving the ArityType for let-bound Ids, we don't need the (pre-dating) special code in arityType for join points. But instead we need to extend the env for Rec bindings, which weren't doing before. More uniform now. See Note [arityType for let-bindings]. This meant we could get rid of ae_joins, and in fact get rid of EtaExpandArity altogether. Simpler. * And finally, it was the strange treatment of join-point Ids in arityType (involving a fake ABot type) that led to a serious bug: #21755. Fixed by this refactoring, which treats them uniformly; but without breaking #18328. In fact, the arity for recursive join bindings is pretty tricky; see the long Note [Arity for recursive join bindings] in GHC.Core.Opt.Simplify.Utils. That led to more refactoring, including deciding that an Id could have an Arity that is bigger than its JoinArity; see Note [Invariants on join points], item 2(b) in GHC.Core * Make sure that the "demand threshold" for join points in DmdAnal is no bigger than the join-arity. In GHC.Core.Opt.DmdAnal see Note [Demand signatures are computed for a threshold arity based on idArity] * I moved GHC.Core.Utils.exprIsDeadEnd into GHC.Core.Opt.Arity, where it more properly belongs. * Remove an old, redundant hack in FloatOut. The old Note was Note [Bottoming floats: eta expansion] in GHC.Core.Opt.SetLevels. Compile time improves very slightly on average: Metrics: compile_time/bytes allocated --------------------------------------------------------------------------------------- T18223(normal) ghc/alloc 725,808,720 747,839,216 +3.0% BAD T6048(optasm) ghc/alloc 105,006,104 101,599,472 -3.2% GOOD geo. mean -0.2% minimum -3.2% maximum +3.0% For some reason Windows was better T10421(normal) ghc/alloc 125,888,360 124,129,168 -1.4% GOOD T18140(normal) ghc/alloc 85,974,520 83,884,224 -2.4% GOOD T18698b(normal) ghc/alloc 236,764,568 234,077,288 -1.1% GOOD T18923(normal) ghc/alloc 75,660,528 73,994,512 -2.2% GOOD T6048(optasm) ghc/alloc 112,232,512 108,182,520 -3.6% GOOD geo. mean -0.6% I had a quick look at T18223 but it is knee deep in coercions and the size of everything looks similar before and after. I decided to accept that 3% increase in exchange for goodness elsewhere. Metric Decrease: T10421 T18140 T18698b T18923 T6048 Metric Increase: T18223
* driver: don't actually merge objects when ar -L worksCheng Shao2022-08-241-1/+1
|
* Unbreak Haddock comments in `GHC.Core.Opt.WorkWrap.Utils`.M Farkas-Dyck2022-08-241-13/+12
| | | | Closes #22092.
* 19217 Implicitly quantify type variables in :kind commandSasha Bogicevic2022-08-191-6/+13
|
* Print constraints in quotes (#21167)Swann Moreau2022-08-191-1/+1
| | | | | | | This patch improves the uniformity of error message formatting by printing constraints in quotes, as we do for types. Fix #21167
* Fix #22048 where we failed to drop rules for -fomit-interface-pragmas.Andreas Klebinger2022-08-191-1/+2
| | | | Now we also filter the local rules (again) which fixes the issue.
* tc: warn about lazy annotations on unlifted arguments (fixes #21951)Zachary Wood2022-08-193-0/+24
|
* Revert "Refactor SpecConstr to use treat bindings uniformly"Matthew Pickering2022-08-191-260/+264
| | | | | | | | | | This reverts commit 415468fef8a3e9181b7eca86de0e05c0cce31729. This refactoring introduced quite a severe residency regression (900MB live from 650MB live when compiling mmark), see #21993 for a reproducer and more discussion. Ticket #21993
* Force unfoldings when they are cleaned-up in Tidy and CorePrepMatthew Pickering2022-08-192-3/+7
| | | | | | | | If these thunks are not forced then the entire unfolding for the binding is live throughout the whole of CodeGen despite the fact it should have been discarded. Fixes #22071
* Force `getOccFS bndr` to avoid retaining reference to Bndr.Matthew Pickering2022-08-191-2/+5
| | | | This is another symptom of #19619
* Make ru_fn field strict to avoid retaining IdsMatthew Pickering2022-08-192-2/+3
| | | | | | It's better to perform this projection from Id to Name strictly so we don't retain an old Id (hence IdInfo, hence Unfolding, hence everything etc)
* compiler: Drop --build-id=none hackBen Gamari2022-08-184-15/+1
| | | | | | | | | Since 2011 the object-joining implementation has had a hack to pass `--build-id=none` to `ld` when supported, seemingly to work around a linker bug. This hack is now unnecessary and may break downstream users who expect objects to have valid build-ids. Remove it. Closes #22060.
* Be more careful in chooseInferredQuantifiersSimon Peyton Jones2022-08-182-30/+46
| | | | | | | This fixes #22065. We were failing to retain a quantifier that was mentioned in the kind of another retained quantifier. Easy to fix.
* driver: Honour -x optionMatthew Pickering2022-08-182-40/+33
| | | | | | | | | | | | | The -x option is used to manually specify which phase a file should be started to be compiled from (even if it lacks the correct extension). I just failed to implement this when refactoring the driver. In particular Cabal calls GHC with `-E -cpp -x hs Foo.cpphs` to preprocess source files using GHC. I added a test to exercise this case. Fixes #22044
* CmmToAsm/AArch64: correct a typoCheng Shao2022-08-161-1/+1
|
* CmmToLlvm: Don't aliasify builtin LLVM variablesBen Gamari2022-08-161-4/+16
| | | | | | | | | Our aliasification logic would previously turn builtin LLVM variables into aliases, which apparently confuses LLVM. This manifested in initializers failing to be emitted, resulting in many profiling failures with the LLVM backend. Fixes #22019.
* EPA: DotFieldOcc does not have exact print annotationsAlan Zimmerman2022-08-1129-69/+135
| | | | | | | | | | | | | | | | | For the code {-# LANGUAGE OverloadedRecordUpdate #-} operatorUpdate f = f{(+) = 1} There are no exact print annotations for the parens around the + symbol, nor does normal ppr print them. This MR fixes that. Closes #21805 Updates haddock submodule
* Note [Trimming auto-rules]: State that this improves compiler perf.Andreas Klebinger2022-08-101-0/+7
|
* ncg/aarch64: Don't use x18 register on AArch64/Darwinnormalcoder2022-08-101-0/+8
| | | | | | | | | Apple's ABI documentation [1] says: "The platforms reserve register x18. Don’t use this register." While this wasn't problematic in previous Darwin releases, macOS 13 appears to start zeroing this register periodically. See #21964. [1] https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms
* Add support for external static plugins (#20964)Sylvain Henry2022-08-105-10/+219
| | | | | | | | | | | | | | | | | | | | This patch adds a new command-line flag: -fplugin-library=<file-path>;<unit-id>;<module>;<args> used like this: -fplugin-library=path/to/plugin.so;package-123;Plugin.Module;["Argument","List"] It allows a plugin to be loaded directly from a shared library. With this approach, GHC doesn't compile anything for the plugin and doesn't load any .hi file for the plugin and its dependencies. As such GHC doesn't need to support two environments (one for plugins, one for target code), which was the more ambitious approach tracked in #14335. Fix #20964 Co-authored-by: Josh Meredith <joshmeredith2008@gmail.com>
* Fix size_up_alloc to account for UnliftedDatatypessheaf2022-08-091-4/+3
| | | | | | | | The size_up_alloc function mistakenly considered any type that isn't lifted to not allocate anything, which is wrong. What we want instead is to check the type isn't boxed. This accounts for (BoxedRep Unlifted). Fixes #21939
* Cleanups around pretty-printingKrzysztof Gogolewski2022-08-0912-62/+27
| | | | | | | | | | * Remove hack when printing OccNames. No longer needed since e3dcc0d5 * Remove unused `pprCmms` and `instance Outputable Instr` * Simplify `pprCLabel` (no need to pass platform) * Remove evil `Show`/`Eq` instances for `SDoc`. They were needed by ImmLit, but that can take just a String instead. * Remove instance `Outputable CLabel` - proper output of labels needs a platform, and is done by the `OutputableP` instance
* dataToTag#: Skip runtime tag check if argument is infered taggedAndreas Klebinger2022-08-082-11/+37
| | | | This addresses one part of #21710.
* NCG(x86): Compile add+shift as lea if possible.wip/andreask/add_mul_leaAndreas Klebinger2022-08-081-0/+36
|
* Add a primop to query the label of a threadBen Gamari2022-08-062-0/+11
|
* rts: Move thread labels into TSOBen Gamari2022-08-061-1/+3
| | | | | | | This eliminates the thread label HashTable and instead tracks this information in the TSO, allowing us to use proper StgArrBytes arrays for backing the label and greatly simplifying management of object lifetimes when we expose them to the user with the coming `threadLabel#` primop.
* Add primop to list threadsBen Gamari2022-08-062-0/+21
| | | | | | | A user came to #ghc yesterday wondering how best to check whether they were leaking threads. We ended up using the eventlog but it seems to me like it would be generally useful if Haskell programs could query their own threads.
* compiler: Eliminate two uses of foldr in favor of foldl'Ben Gamari2022-08-062-2/+2
| | | | | | These two uses constructed maps, which is a case where foldl' is generally more efficient since we avoid constructing an intermediate O(n)-depth stack.
* Change `-fprof-late` to insert cost centres after unfolding creation.Andreas Klebinger2022-08-0610-38/+164
| | | | | | | | | | | | | | | | | | | | | | | | | | The former behaviour of adding cost centres after optimization but before unfoldings are created is not available via the flag `prof-late-inline` instead. I also reduced the overhead of -fprof-late* by pushing the cost centres into lambdas. This means the cost centres will only account for execution of functions and not their partial application. Further I made LATE_CC cost centres it's own CC flavour so they now won't clash with user defined ones if a user uses the same string for a custom scc. LateCC: Don't put cost centres inside constructor workers. With -fprof-late they are rarely useful as the worker is usually inlined. Even if the worker is not inlined or we use -fprof-late-linline they are generally not helpful but bloat compile and run time significantly. So we just don't add sccs inside constructor workers. ------------------------- Metric Decrease: T13701 -------------------------
* StgToCmm: Fix isSimpleScrut when profiling is enabled.Andreas Klebinger2022-08-063-27/+32
| | | | | | | | | | | When profiling is enabled we must enter functions that might represent thunks in order for their sccs to show up in the profile. We might allocate even if the function is already evaluated in this case. So we can't consider any potential function thunk to be a simple scrut when profiling. Not doing so caused profiled binaries to segfault.
* Make dropTail comment a haddock commentAndreas Klebinger2022-08-061-1/+1
|
* codeGen/X86: Don't clobber switch variable in switch generationBen Gamari2022-08-051-2/+3
| | | | | | | | | Previously ce8745952f99174ad9d3bdc7697fd086b47cdfb5 assumed that it was safe to clobber the switch variable when generating code for a jump table since we were at the end of a block. However, this assumption is wrong; the register could be live in the jump target. Fixes #21968.
* findExternalRules: Don't needlessly traverse the list of rules.Andreas Klebinger2022-08-041-4/+3
|
* cmm: Remove unused ReadOnlyData16Cheng Shao2022-08-045-16/+0
| | | | We don't actually emit rodata16 sections anywhere.
* Store interfaces in ModIfaceCache more directlyMatthew Pickering2022-08-042-73/+49
| | | | | | | | I realised hydration was completely irrelavant for this cache because the ModDetails are pruned from the result. So now it simplifies things a lot to just store the ModIface and Linkable, which we can put into the cache straight away rather than wait for the final version of a HomeModInfo to appear.
* Fix leaks in --make mode when there are module loopsMatthew Pickering2022-08-046-96/+307
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes quite a tricky leak where we would end up retaining stale ModDetails due to rehydrating modules against non-finalised interfaces. == Loops with multiple boot files It is possible for a module graph to have a loop (SCC, when ignoring boot files) which requires multiple boot files to break. In this case we must perform the necessary hydration steps before and after compiling modules which have boot files which are described above for corectness but also perform an additional hydration step at the end of the SCC to remove space leaks. Consider the following example: ┌───────┐ ┌───────┐ │ │ │ │ │ A │ │ B │ │ │ │ │ └─────┬─┘ └───┬───┘ │ │ ┌────▼─────────▼──┐ │ │ │ C │ └────┬─────────┬──┘ │ │ ┌────▼──┐ ┌───▼───┐ │ │ │ │ │ A-boot│ │ B-boot│ │ │ │ │ └───────┘ └───────┘ A, B and C live together in a SCC. Say we compile the modules in order A-boot, B-boot, C, A, B then when we compile A we will perform the hydration steps (because A has a boot file). Therefore C will be hydrated relative to A, and the ModDetails for A will reference C/A. Then when B is compiled C will be rehydrated again, and so B will reference C/A,B, its interface will be hydrated relative to both A and B. Now there is a space leak because say C is a very big module, there are now two different copies of ModDetails kept alive by modules A and B. The way to avoid this space leak is to rehydrate an entire SCC together at the end of compilation so that all the ModDetails point to interfaces for .hs files. In this example, when we hydrate A, B and C together then both A and B will refer to C/A,B. See #21900 for some more discussion. ------------------------------------------------------- In addition to this simple case, there is also the potential for a leak during parallel upsweep which is also fixed by this patch. Transcibed is Note [ModuleNameSet, efficiency and space leaks] Note [ModuleNameSet, efficiency and space leaks] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ During unsweep the results of compiling modules are placed into a MVar, to find the environment the module needs to compile itself in the MVar is consulted and the HomeUnitGraph is set accordingly. The reason we do this is that precisely tracking module dependencies and recreating the HUG from scratch each time is very expensive. In serial mode (-j1), this all works out fine because a module can only be compiled after its dependencies have finished compiling and not interleaved with compiling module loops. Therefore when we create the finalised or no loop interfaces, the HUG only contains finalised interfaces. In parallel mode, we have to be more careful because the HUG variable can contain non-finalised interfaces which have been started by another thread. In order to avoid a space leak where a finalised interface is compiled against a HPT which contains a non-finalised interface we have to restrict the HUG to only the visible modules. The visible modules is recording in the ModuleNameSet, this is propagated upwards whilst compiling and explains which transitive modules are visible from a certain point. This set is then used to restrict the HUG before the module is compiled to only the visible modules and thus avoiding this tricky space leak. Efficiency of the ModuleNameSet is of utmost importance because a union occurs for each edge in the module graph. Therefore the set is represented directly as an IntSet which provides suitable performance, even using a UniqSet (which is backed by an IntMap) is too slow. The crucial test of performance here is the time taken to a do a no-op build in --make mode. See test "jspace" for an example which used to trigger this problem. Fixes #21900
* Force name selectors to ensure no reference to Ids enter the NameCacheMatthew Pickering2022-08-041-2/+2
| | | | | | I observed some unforced thunks in the NameCache which were retaining a whole Id, which ends up retaining a Type.. which ends up retaining old copies of HscEnv containing stale HomeModInfo.
* Force safeInferred to avoid retaining extra copy of DynFlagsMatthew Pickering2022-08-041-1/+2
| | | | | | This will only have a (very) modest impact on memory but we don't want to retain old copies of DynFlags hanging around so best to force this value.
* Add a note about about W/W for unlifting strict argumentsAndreas Klebinger2022-08-042-1/+48
| | | | This fixes #21236.
* Fix TH + defer-type-errors interaction (#21920)Krzysztof Gogolewski2022-08-041-6/+0
| | | | | | Previously, we had to disable defer-type-errors in splices because of #7276. But this fix is no longer necessary, the test T7276 no longer segfaults and is now correctly deferred.
* Remove TCvSubst and use Subst for both term and type-level substYiyun Liu2022-08-0447-718/+642
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes the TCvSubst data type and instead uses Subst as the environment for both term and type level substitution. This change is partially motivated by the existential type proposal, which will introduce types that contain expressions and therefore forces us to carry around an "IdSubstEnv" even when substituting for types. It also reduces the amount of code because "Subst" and "TCvSubst" share a lot of common operations. There isn't any noticeable impact on performance (geo. mean for ghc/alloc is around 0.0% but we have -94 loc and one less data type to worry abount). Currently, the "TCvSubst" data type for substitution on types is identical to the "Subst" data type except the former doesn't store "IdSubstEnv". Using "Subst" for type-level substitution means there will be a redundant field stored in the data type. However, in cases where the substitution starts from the expression, using "Subst" for type-level substitution saves us from having to project "Subst" into a "TCvSubst". This probably explains why the allocation is mostly even despite the redundant field. The patch deletes "TCvSubst" and moves "Subst" and its relevant functions from "GHC.Core.Subst" into "GHC.Core.TyCo.Subst". Substitution on expressions is still defined in "GHC.Core.Subst" so we don't have to expose the definition of "Expr" in the hs-boot file that "GHC.Core.TyCo.Subst" must import to refer to "IdSubstEnv" (whose codomain is "CoreExpr"). Most functions named fooTCvSubst are renamed into fooSubst with a few exceptions (e.g. "isEmptyTCvSubst" is a distinct function from "isEmptySubst"; the former ignores the emptiness of "IdSubstEnv"). These exceptions mainly exist for performance reasons and will go away when "Expr" and "Type" are mutually recursively defined (we won't be able to take those shortcuts if we can't make the assumption that expressions don't appear in types).
* Add -dsuppress-coercion-types to make coercions even smaller.Andreas Klebinger2022-08-024-3/+14
| | | | | Instead of `` `cast` <Co:11> :: (Some -> Really -> Large Type)`` simply print `` `cast` <Co:11> :: ... ``
* driver: Don't create LinkNodes when -no-link is enabledMatthew Pickering2022-07-281-1/+1
| | | | Fixes #21866
* Regression test for #21848Simon Peyton Jones2022-07-261-1/+1
|
* Avoid as pipeline when compiling cCheng Shao2022-07-253-14/+22
|
* Add location to cc phaseCheng Shao2022-07-253-6/+6
|
* Get the in-scope set right in FamInstEnv.injectiveBranchesSimon Peyton Jones2022-07-2513-130/+159
| | | | | | | | | | | | | There was an assert error, as Gergo pointed out in #21896. I fixed this by adding an InScopeSet argument to tcUnifyTyWithTFs. And also to GHC.Core.Unify.niFixTCvSubst. I also took the opportunity to get a couple more InScopeSets right, and to change some substTyUnchecked into substTy. This MR touches a lot of other files, but only because I also took the opportunity to introduce mkInScopeSetList, and use it.
* Teach SpecConstr about typeDeterminesValueSimon Peyton Jones2022-07-256-140/+196
| | | | | | | | | | | | | | | | | | | | | | This patch addresses #21831, point 2. See Note [generaliseDictPats] in SpecConstr I took the opportunity to refactor the construction of specialisation rules a bit, so that the rule name says what type we are specialising at. Surprisingly, there's a 20% decrease in compile time for test perf/compiler/T18223. I took a look at it, and the code size seems the same throughout. I did a quick ticky profile which seemed to show a bit less substitution going on. Hmm. Maybe it's the "don't do eta-expansion in stable unfoldings" patch, which is part of the same MR as this patch. Anyway, since it's a move in the right direction, I didn't think it was worth looking into further. Metric Decrease: T18223
* Switch off eta-expansion in rules and unfoldingsSimon Peyton Jones2022-07-251-8/+14
| | | | | | I think this change will make little difference except to reduce clutter. But that's it -- if it causes problems we can switch it on again.
* Fix isEvaldUnfolding and isValueUnfoldingSimon Peyton Jones2022-07-251-0/+2
| | | | This fixes (1) in #21831. Easy, obviously correct.