summaryrefslogtreecommitdiff
path: root/compiler/GHC/Cmm.hs
Commit message (Collapse)AuthorAgeFilesLines
* Scrub some partiality in `GHC.Cmm.Info.Build`: `doSRTs` takes a `[(CAFSet, ↵M Farkas-Dyck2022-12-201-0/+12
| | | | CmmDecl)]` but truly wants a `[(CAFSet, CmmStatics)]`.
* compiler: annotate CmmFileEmbed with blob lengthCheng Shao2022-11-111-4/+4
| | | | | This patch adds the blob length field to CmmFileEmbed. The wasm32 NCG needs to know the precise size of each data segment.
* Introduce a standard thunk for allocating stringsÖmer Sinan Ağacan2022-10-221-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently for a top-level closure in the form hey = unpackCString# x we generate code like this: Main.hey_entry() // [R1] { info_tbls: [(c2T4, label: Main.hey_info rep: HeapRep static { Thunk } srt: Nothing)] stack_info: arg_space: 8 updfr_space: Just 8 } {offset c2T4: // global _rqm::P64 = R1; if ((Sp + 8) - 24 < SpLim) (likely: False) goto c2T5; else goto c2T6; c2T5: // global R1 = _rqm::P64; call (stg_gc_enter_1)(R1) args: 8, res: 0, upd: 8; c2T6: // global (_c2T1::I64) = call "ccall" arg hints: [PtrHint, PtrHint] result hints: [PtrHint] newCAF(BaseReg, _rqm::P64); if (_c2T1::I64 == 0) goto c2T3; else goto c2T2; c2T3: // global call (I64[_rqm::P64])() args: 8, res: 0, upd: 8; c2T2: // global I64[Sp - 16] = stg_bh_upd_frame_info; I64[Sp - 8] = _c2T1::I64; R2 = hey1_r2Gg_bytes; Sp = Sp - 16; call GHC.CString.unpackCString#_info(R2) args: 24, res: 0, upd: 24; } } This code is generated for every string literal. Only difference between top-level closures like this is the argument for the bytes of the string (hey1_r2Gg_bytes in the code above). With this patch we introduce a standard thunk in the RTS, called stg_MK_STRING_info, that does what `unpackCString# x` does, except it gets the bytes address from the payload. Using this, for the closure above, we generate this: Main.hey_closure" { Main.hey_closure: const stg_MK_STRING_info; const 0; // padding for indirectee const 0; // static link const 0; // saved info const hey1_r1Gg_bytes; // the payload } This is much smaller in code. Metric Decrease: T10421 T11195 T12150 T12425 T16577 T18282 T18698a T18698b Co-Authored By: Ben Gamari <ben@well-typed.com>
* Cleanups around pretty-printingKrzysztof Gogolewski2022-08-091-7/+1
| | | | | | | | | | * Remove hack when printing OccNames. No longer needed since e3dcc0d5 * Remove unused `pprCmms` and `instance Outputable Instr` * Simplify `pprCLabel` (no need to pass platform) * Remove evil `Show`/`Eq` instances for `SDoc`. They were needed by ImmLit, but that can take just a String instead. * Remove instance `Outputable CLabel` - proper output of labels needs a platform, and is done by the `OutputableP` instance
* cmm: Remove unused ReadOnlyData16Cheng Shao2022-08-041-3/+0
| | | | We don't actually emit rodata16 sections anywhere.
* cmm: Move toBlockList to GHC.CmmBen Gamari2022-07-161-1/+5
|
* cmm: Eliminate orphan Outputable instancesBen Gamari2022-07-161-2/+177
| | | | | | | Here we reorganize `GHC.Cmm` to eliminate the orphan `Outputable` and `OutputableP` instances for the Cmm AST. This makes it significantly easier to use the Cmm pretty-printers in tracing output without incurring module import cycles.
* Refactor handling of global initializersBen Gamari2022-04-011-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | GHC uses global initializers for a number of things including cost-center registration, info-table provenance registration, and setup of foreign exports. Previously, the global initializer arrays which referenced these initializers would live in the object file of the C stub, which would then be merged into the main object file of the module. Unfortunately, this approach is no longer tenable with the move to Clang/LLVM on Windows (see #21019). Specifically, lld's PE backend does not support object merging (that is, the -r flag). Instead we are now rather packaging a module's object files into a static library. However, this is problematic in the case of initializers as there are no references to the C stub object in the archive, meaning that the linker may drop the object from the final link. This patch refactors our handling of global initializers to instead place initializer arrays within the object file of the module to which they belong. We do this by introducing a Cmm data declaration containing the initializer array in the module's Cmm stream. While the initializer functions themselves remain in separate C stub objects, the reference from the module's object ensures that they are not dropped from the final link. In service of #21068.
* Use Info Table Provenances to decode cloned stack (#18163)Sven Tennie2021-09-231-3/+2
| | | | | | | | | | | | | | | | Emit an Info Table Provenance Entry (IPE) for every stack represeted info table if -finfo-table-map is turned on. To decode a cloned stack, lookupIPE() is used. It provides a mapping between info tables and their source location. Please see these notes for details: - [Stacktraces from Info Table Provenance Entries (IPE based stack unwinding)] - [Mapping Info Tables to Source Positions] Metric Increase: T12545
* [Sized Cmm] properly retain sizes.Moritz Angermann2020-11-261-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | This replaces all Word<N> = W<N># Word# and Int<N> = I<N># Int# with Word<N> = W<N># Word<N># and Int<N> = I<N># Int<N>#, thus providing us with properly sized primitives in the codegenerator instead of pretending they are all full machine words. This came up when implementing darwinpcs for arm64. The darwinpcs reqires us to pack function argugments in excess of registers on the stack. While most procedure call standards (pcs) assume arguments are just passed in 8 byte slots; and thus the caller does not know the exact signature to make the call, darwinpcs requires us to adhere to the prototype, and thus have the correct sizes. If we specify CInt in the FFI call, it should correspond to the C int, and not just be Word sized, when it's only half the size. This does change the expected output of T16402 but the new result is no less correct as it eliminates the narrowing (instead of the `and` as was previously done). Bumps the array, bytestring, text, and binary submodules. Co-Authored-By: Ben Gamari <ben@well-typed.com> Metric Increase: T13701 T14697
* Generalize OutputablePSylvain Henry2020-09-171-4/+7
| | | | | Add a type parameter for the environment required by OutputableP. It avoids tying Platform with OutputableP.
* Introduce OutputablePSylvain Henry2020-09-171-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | Some types need a Platform value to be pretty-printed: CLabel, Cmm types, instructions, etc. Before this patch they had an Outputable instance and the Platform value was obtained via sdocWithDynFlags. It meant that the *renderer* of the SDoc was responsible of passing the appropriate Platform value (e.g. via the DynFlags given to showSDoc). It put the burden of passing the Platform value on the renderer while the generator of the SDoc knows the Platform it is generating the SDoc for and there is no point passing a different Platform at rendering time. With this patch, we introduce a new OutputableP class: class OutputableP a where pdoc :: Platform -> a -> SDoc With this class we still have some polymorphism as we have with `ppr` (i.e. we can use `pdoc` on a variety of types instead of having a dedicated `pprXXX` function for each XXX type). One step closer removing `sdocWithDynFlags` (#10143) and supporting several platforms (#14335).
* DynFlags: refactor GHC.CmmToAsm (#17957, #10143)Sylvain Henry2020-08-181-2/+10
| | | | | | | | | | | | | | | | | This patch removes the use of `sdocWithDynFlags` from GHC.CmmToAsm.*.Ppr To do that I've had to make some refactoring: * X86' and PPC's `Instr` are no longer `Outputable` as they require a `Platform` argument * `Instruction` class now exposes `pprInstr :: Platform -> instr -> SDoc` * as a consequence, I've refactored some modules to avoid .hs-boot files * added (derived) functor instances for some datatypes parametric in the instruction type. It's useful for pretty-printing as we just have to map `pprInstr` before pretty-printing the container datatype.
* Rip out CmmStackInfo(updfr_space)Ben Gamari2020-05-281-3/+0
| | | | | As noted in #18232, this field is currently completely unused and moreover doesn't have a clear meaning.
* llvmGen: Consider Relocatable read-only data as not constantReferences: #18137Tuan Le2020-05-211-12/+28
|
* Modules: Utils and Data (#13009)Sylvain Henry2020-04-261-2/+2
| | | | | | | Update Haddock submodule Metric Increase: haddock.compiler
* Refactor CmmStaticsSylvain Henry2020-04-031-16/+21
| | | | | | | | | | In !2959 we noticed that there was some redundant code (in GHC.Cmm.Utils and GHC.Cmm.StgToCmm.Utils) used to deal with `CmmStatics` datatype (before SRT generation) and `RawCmmStatics` datatype (after SRT generation). This patch removes this redundant code by using a single GADT for (Raw)CmmStatics.
* Move blob handling into StgToCmmSylvain Henry2020-04-031-3/+5
| | | | | | | Move handling of big literal strings from CmmToAsm to StgToCmm. It avoids the use of `sdocWithDynFlags` (cf #10143). We might need to move this handling even higher in the pipeline in the future (cf #17960): this patch will make it easier.
* Modules: Types (#13009)Sylvain Henry2020-03-291-2/+2
| | | | | | | Update Haddock submodule Metric Increase: haddock.compiler
* Module hierarchy: ByteCode and Runtime (cf #13009)Sylvain Henry2020-02-121-1/+1
| | | | Update haddock submodule
* Do CafInfo/SRT analysis in CmmÖmer Sinan Ağacan2020-01-311-11/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes all CafInfo predictions and various hacks to preserve predicted CafInfos from the compiler and assigns final CafInfos to interface Ids after code generation. SRT analysis is extended to support static data, and Cmm generator is modified to allow generating static_link fields after SRT analysis. This also fixes `-fcatch-bottoms`, which introduces error calls in case expressions in CorePrep, which runs *after* CoreTidy (which is where we decide on CafInfos) and turns previously non-CAFFY things into CAFFY. Fixes #17648 Fixes #9718 Evaluation ========== NoFib ----- Boot with: `make boot mode=fast` Run: `make mode=fast EXTRA_RUNTEST_OPTS="-cachegrind" NoFibRuns=1` -------------------------------------------------------------------------------- Program Size Allocs Instrs Reads Writes -------------------------------------------------------------------------------- CS -0.0% 0.0% -0.0% -0.0% -0.0% CSD -0.0% 0.0% -0.0% -0.0% -0.0% FS -0.0% 0.0% -0.0% -0.0% -0.0% S -0.0% 0.0% -0.0% -0.0% -0.0% VS -0.0% 0.0% -0.0% -0.0% -0.0% VSD -0.0% 0.0% -0.0% -0.0% -0.5% VSM -0.0% 0.0% -0.0% -0.0% -0.0% anna -0.1% 0.0% -0.0% -0.0% -0.0% ansi -0.0% 0.0% -0.0% -0.0% -0.0% atom -0.0% 0.0% -0.0% -0.0% -0.0% awards -0.0% 0.0% -0.0% -0.0% -0.0% banner -0.0% 0.0% -0.0% -0.0% -0.0% bernouilli -0.0% 0.0% -0.0% -0.0% -0.0% binary-trees -0.0% 0.0% -0.0% -0.0% -0.0% boyer -0.0% 0.0% -0.0% -0.0% -0.0% boyer2 -0.0% 0.0% -0.0% -0.0% -0.0% bspt -0.0% 0.0% -0.0% -0.0% -0.0% cacheprof -0.0% 0.0% -0.0% -0.0% -0.0% calendar -0.0% 0.0% -0.0% -0.0% -0.0% cichelli -0.0% 0.0% -0.0% -0.0% -0.0% circsim -0.0% 0.0% -0.0% -0.0% -0.0% clausify -0.0% 0.0% -0.0% -0.0% -0.0% comp_lab_zift -0.0% 0.0% -0.0% -0.0% -0.0% compress -0.0% 0.0% -0.0% -0.0% -0.0% compress2 -0.0% 0.0% -0.0% -0.0% -0.0% constraints -0.0% 0.0% -0.0% -0.0% -0.0% cryptarithm1 -0.0% 0.0% -0.0% -0.0% -0.0% cryptarithm2 -0.0% 0.0% -0.0% -0.0% -0.0% cse -0.0% 0.0% -0.0% -0.0% -0.0% digits-of-e1 -0.0% 0.0% -0.0% -0.0% -0.0% digits-of-e2 -0.0% 0.0% -0.0% -0.0% -0.0% dom-lt -0.0% 0.0% -0.0% -0.0% -0.0% eliza -0.0% 0.0% -0.0% -0.0% -0.0% event -0.0% 0.0% -0.0% -0.0% -0.0% exact-reals -0.0% 0.0% -0.0% -0.0% -0.0% exp3_8 -0.0% 0.0% -0.0% -0.0% -0.0% expert -0.0% 0.0% -0.0% -0.0% -0.0% fannkuch-redux -0.0% 0.0% -0.0% -0.0% -0.0% fasta -0.0% 0.0% -0.0% -0.0% -0.0% fem -0.0% 0.0% -0.0% -0.0% -0.0% fft -0.0% 0.0% -0.0% -0.0% -0.0% fft2 -0.0% 0.0% -0.0% -0.0% -0.0% fibheaps -0.0% 0.0% -0.0% -0.0% -0.0% fish -0.0% 0.0% -0.0% -0.0% -0.0% fluid -0.1% 0.0% -0.0% -0.0% -0.0% fulsom -0.0% 0.0% -0.0% -0.0% -0.0% gamteb -0.0% 0.0% -0.0% -0.0% -0.0% gcd -0.0% 0.0% -0.0% -0.0% -0.0% gen_regexps -0.0% 0.0% -0.0% -0.0% -0.0% genfft -0.0% 0.0% -0.0% -0.0% -0.0% gg -0.0% 0.0% -0.0% -0.0% -0.0% grep -0.0% 0.0% -0.0% -0.0% -0.0% hidden -0.0% 0.0% -0.0% -0.0% -0.0% hpg -0.1% 0.0% -0.0% -0.0% -0.0% ida -0.0% 0.0% -0.0% -0.0% -0.0% infer -0.0% 0.0% -0.0% -0.0% -0.0% integer -0.0% 0.0% -0.0% -0.0% -0.0% integrate -0.0% 0.0% -0.0% -0.0% -0.0% k-nucleotide -0.0% 0.0% -0.0% -0.0% -0.0% kahan -0.0% 0.0% -0.0% -0.0% -0.0% knights -0.0% 0.0% -0.0% -0.0% -0.0% lambda -0.0% 0.0% -0.0% -0.0% -0.0% last-piece -0.0% 0.0% -0.0% -0.0% -0.0% lcss -0.0% 0.0% -0.0% -0.0% -0.0% life -0.0% 0.0% -0.0% -0.0% -0.0% lift -0.0% 0.0% -0.0% -0.0% -0.0% linear -0.1% 0.0% -0.0% -0.0% -0.0% listcompr -0.0% 0.0% -0.0% -0.0% -0.0% listcopy -0.0% 0.0% -0.0% -0.0% -0.0% maillist -0.0% 0.0% -0.0% -0.0% -0.0% mandel -0.0% 0.0% -0.0% -0.0% -0.0% mandel2 -0.0% 0.0% -0.0% -0.0% -0.0% mate -0.0% 0.0% -0.0% -0.0% -0.0% minimax -0.0% 0.0% -0.0% -0.0% -0.0% mkhprog -0.0% 0.0% -0.0% -0.0% -0.0% multiplier -0.0% 0.0% -0.0% -0.0% -0.0% n-body -0.0% 0.0% -0.0% -0.0% -0.0% nucleic2 -0.0% 0.0% -0.0% -0.0% -0.0% para -0.0% 0.0% -0.0% -0.0% -0.0% paraffins -0.0% 0.0% -0.0% -0.0% -0.0% parser -0.1% 0.0% -0.0% -0.0% -0.0% parstof -0.1% 0.0% -0.0% -0.0% -0.0% pic -0.0% 0.0% -0.0% -0.0% -0.0% pidigits -0.0% 0.0% -0.0% -0.0% -0.0% power -0.0% 0.0% -0.0% -0.0% -0.0% pretty -0.0% 0.0% -0.3% -0.4% -0.4% primes -0.0% 0.0% -0.0% -0.0% -0.0% primetest -0.0% 0.0% -0.0% -0.0% -0.0% prolog -0.0% 0.0% -0.0% -0.0% -0.0% puzzle -0.0% 0.0% -0.0% -0.0% -0.0% queens -0.0% 0.0% -0.0% -0.0% -0.0% reptile -0.0% 0.0% -0.0% -0.0% -0.0% reverse-complem -0.0% 0.0% -0.0% -0.0% -0.0% rewrite -0.0% 0.0% -0.0% -0.0% -0.0% rfib -0.0% 0.0% -0.0% -0.0% -0.0% rsa -0.0% 0.0% -0.0% -0.0% -0.0% scc -0.0% 0.0% -0.3% -0.5% -0.4% sched -0.0% 0.0% -0.0% -0.0% -0.0% scs -0.0% 0.0% -0.0% -0.0% -0.0% simple -0.1% 0.0% -0.0% -0.0% -0.0% solid -0.0% 0.0% -0.0% -0.0% -0.0% sorting -0.0% 0.0% -0.0% -0.0% -0.0% spectral-norm -0.0% 0.0% -0.0% -0.0% -0.0% sphere -0.0% 0.0% -0.0% -0.0% -0.0% symalg -0.0% 0.0% -0.0% -0.0% -0.0% tak -0.0% 0.0% -0.0% -0.0% -0.0% transform -0.0% 0.0% -0.0% -0.0% -0.0% treejoin -0.0% 0.0% -0.0% -0.0% -0.0% typecheck -0.0% 0.0% -0.0% -0.0% -0.0% veritas -0.0% 0.0% -0.0% -0.0% -0.0% wang -0.0% 0.0% -0.0% -0.0% -0.0% wave4main -0.0% 0.0% -0.0% -0.0% -0.0% wheel-sieve1 -0.0% 0.0% -0.0% -0.0% -0.0% wheel-sieve2 -0.0% 0.0% -0.0% -0.0% -0.0% x2n1 -0.0% 0.0% -0.0% -0.0% -0.0% -------------------------------------------------------------------------------- Min -0.1% 0.0% -0.3% -0.5% -0.5% Max -0.0% 0.0% -0.0% -0.0% -0.0% Geometric Mean -0.0% -0.0% -0.0% -0.0% -0.0% -------------------------------------------------------------------------------- Program Size Allocs Instrs Reads Writes -------------------------------------------------------------------------------- circsim -0.1% 0.0% -0.0% -0.0% -0.0% constraints -0.0% 0.0% -0.0% -0.0% -0.0% fibheaps -0.0% 0.0% -0.0% -0.0% -0.0% gc_bench -0.0% 0.0% -0.0% -0.0% -0.0% hash -0.0% 0.0% -0.0% -0.0% -0.0% lcss -0.0% 0.0% -0.0% -0.0% -0.0% power -0.0% 0.0% -0.0% -0.0% -0.0% spellcheck -0.0% 0.0% -0.0% -0.0% -0.0% -------------------------------------------------------------------------------- Min -0.1% 0.0% -0.0% -0.0% -0.0% Max -0.0% 0.0% -0.0% -0.0% -0.0% Geometric Mean -0.0% +0.0% -0.0% -0.0% -0.0% Manual inspection of programs in testsuite/tests/programs --------------------------------------------------------- I built these programs with a bunch of dump flags and `-O` and compared STG, Cmm, and Asm dumps and file sizes. (Below the numbers in parenthesis show number of modules in the program) These programs have identical compiler (same .hi and .o sizes, STG, and Cmm and Asm dumps): - Queens (1), andre_monad (1), cholewo-eval (2), cvh_unboxing (3), andy_cherry (7), fun_insts (1), hs-boot (4), fast2haskell (2), jl_defaults (1), jq_readsPrec (1), jules_xref (1), jtod_circint (4), jules_xref2 (1), lennart_range (1), lex (1), life_space_leak (1), bargon-mangler-bug (7), record_upd (1), rittri (1), sanders_array (1), strict_anns (1), thurston-module-arith (2), okeefe_neural (1), joao-circular (6), 10queens (1) Programs with different compiler outputs: - jl_defaults (1): For some reason GHC HEAD marks a lot of top-level `[Int]` closures as CAFFY for no reason. With this patch we no longer make them CAFFY and generate less SRT entries. For some reason Main.o is slightly larger with this patch (1.3%) and the executable sizes are the same. (I'd expect both to be smaller) - launchbury (1): Same as jl_defaults: top-level `[Int]` closures marked as CAFFY for no reason. Similarly `Main.o` is 1.4% larger but the executable sizes are the same. - galois_raytrace (13): Differences are in the Parse module. There are a lot, but some of the changes are caused by the fact that for some reason (I think a bug) GHC HEAD marks the dictionary for `Functor Identity` as CAFFY. Parse.o is 0.4% larger, the executable size is the same. - north_array: We now generate less SRT entries because some of array primops used in this program like `NewArrayOp` get eliminated during Stg-to-Cmm and turn some CAFFY things into non-CAFFY. Main.o gets 24% larger (9224 bytes from 9000 bytes), executable sizes are the same. - seward-space-leak: Difference in this program is better shown by this smaller example: module Lib where data CDS = Case [CDS] [(Int, CDS)] | Call CDS CDS instance Eq CDS where Case sels1 rets1 == Case sels2 rets2 = sels1 == sels2 && rets1 == rets2 Call a1 b1 == Call a2 b2 = a1 == a2 && b1 == b2 _ == _ = False In this program GHC HEAD builds a new SRT for the recursive group of `(==)`, `(/=)` and the dictionary closure. Then `/=` points to `==` in its SRT field, and `==` uses the SRT object as its SRT. With this patch we use the closure for `/=` as the SRT and add `==` there. Then `/=` gets an empty SRT field and `==` points to `/=` in its SRT field. This change looks fine to me. Main.o gets 0.07% larger, executable sizes are identical. head.hackage ------------ head.hackage's CI script builds 428 packages from Hackage using this patch with no failures. Compiler performance -------------------- The compiler perf tests report that the compiler allocates slightly more (worst case observed so far is 4%). However most programs in the test suite are small, single file programs. To benchmark compiler performance on something more realistic I build Cabal (the library, 236 modules) with different optimisation levels. For the "max residency" row I run GHC with `+RTS -s -A100k -i0 -h` for more accurate numbers. Other rows are generated with just `-s`. (This is because `-i0` causes running GC much more frequently and as a result "bytes copied" gets inflated by more than 25x in some cases) * -O0 | | GHC HEAD | This MR | Diff | | --------------- | -------------- | -------------- | ------ | | Bytes allocated | 54,413,350,872 | 54,701,099,464 | +0.52% | | Bytes copied | 4,926,037,184 | 4,990,638,760 | +1.31% | | Max residency | 421,225,624 | 424,324,264 | +0.73% | * -O1 | | GHC HEAD | This MR | Diff | | --------------- | --------------- | --------------- | ------ | | Bytes allocated | 245,849,209,992 | 246,562,088,672 | +0.28% | | Bytes copied | 26,943,452,560 | 27,089,972,296 | +0.54% | | Max residency | 982,643,440 | 991,663,432 | +0.91% | * -O2 | | GHC HEAD | This MR | Diff | | --------------- | --------------- | --------------- | ------ | | Bytes allocated | 291,044,511,408 | 291,863,910,912 | +0.28% | | Bytes copied | 37,044,237,616 | 36,121,690,472 | -2.49% | | Max residency | 1,071,600,328 | 1,086,396,256 | +1.38% | Extra compiler allocations -------------------------- Runtime allocations of programs are as reported above (NoFib section). The compiler now allocates more than before. Main source of allocation in this patch compared to base commit is the new SRT algorithm (GHC.Cmm.Info.Build). Below is some of the extra work we do with this patch, numbers generated by profiled stage 2 compiler when building a pathological case (the test 'ManyConstructors') with '-O2': - We now sort the final STG for a module, which means traversing the entire program, generating free variable set for each top-level binding, doing SCC analysis, and re-ordering the program. In ManyConstructors this step allocates 97,889,952 bytes. - We now do SRT analysis on static data, which in a program like ManyConstructors causes analysing 10,000 bindings that we would previously just skip. This step allocates 70,898,352 bytes. - We now maintain an SRT map for the entire module as we compile Cmm groups: data ModuleSRTInfo = ModuleSRTInfo { ... , moduleSRTMap :: SRTMap } (SRTMap is just a strict Map from the 'containers' library) This map gets an entry for most bindings in a module (exceptions are THUNKs and CAFFY static functions). For ManyConstructors this map gets 50015 entries. - Once we're done with code generation we generate a NameSet from SRTMap for the non-CAFFY names in the current module. This set gets the same number of entries as the SRTMap. - Finally we update CafInfos in ModDetails for the non-CAFFY Ids, using the NameSet generated in the previous step. This usually does the least amount of allocation among the work listed here. Only place with this patch where we do less work in the CAF analysis in the tidying pass (CoreTidy). However that doesn't save us much, as the pass still needs to traverse the whole program and update IdInfos for other reasons. Only thing we don't here do is the `hasCafRefs` pass over the RHS of bindings, which is a stateless pass that returns a boolean value, so it doesn't allocate much. (Metric changes blow are all increased allocations) Metric changes -------------- Metric Increase: ManyAlternatives ManyConstructors T13035 T14683 T1969 T9961
* Module hierarchy: Cmm (cf #13009)Sylvain Henry2020-01-251-0/+231