summaryrefslogtreecommitdiff
path: root/compiler/GHC/Cmm
Commit message (Collapse)AuthorAgeFilesLines
* Minor refactor around FastStringsKrzysztof Gogolewski2022-11-052-12/+13
| | | | | | | Pass FastStrings to functions directly, to make sure the rule for fsLit "literal" fires. Remove SDoc indirection in GHCi.UI.Tags and GHC.Unit.Module.Graph.
* Export pprTrace and friends from GHC.Prelude.Andreas Klebinger2022-11-031-1/+0
| | | | | Introduces GHC.Prelude.Basic which can be used in modules which are a dependency of the ppr code.
* Minor SDoc-related cleanupKrzysztof Gogolewski2022-10-283-13/+27
| | | | | | | | | | | * Rename pprCLabel to pprCLabelStyle, and use the name pprCLabel for a function using CStyle (analogous to pprAsmLabel) * Move LabelStyle to the CLabel module, it no longer needs to be in Outputable. * Move calls to 'text' right next to literals, to make sure the text/str rule is triggered. * Remove FastString/String roundtrip in Tc.Deriv.Generate * Introduce showSDocForUser', which abstracts over a pattern in GHCi.UI
* Introduce a standard thunk for allocating stringsÖmer Sinan Ağacan2022-10-223-24/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently for a top-level closure in the form hey = unpackCString# x we generate code like this: Main.hey_entry() // [R1] { info_tbls: [(c2T4, label: Main.hey_info rep: HeapRep static { Thunk } srt: Nothing)] stack_info: arg_space: 8 updfr_space: Just 8 } {offset c2T4: // global _rqm::P64 = R1; if ((Sp + 8) - 24 < SpLim) (likely: False) goto c2T5; else goto c2T6; c2T5: // global R1 = _rqm::P64; call (stg_gc_enter_1)(R1) args: 8, res: 0, upd: 8; c2T6: // global (_c2T1::I64) = call "ccall" arg hints: [PtrHint, PtrHint] result hints: [PtrHint] newCAF(BaseReg, _rqm::P64); if (_c2T1::I64 == 0) goto c2T3; else goto c2T2; c2T3: // global call (I64[_rqm::P64])() args: 8, res: 0, upd: 8; c2T2: // global I64[Sp - 16] = stg_bh_upd_frame_info; I64[Sp - 8] = _c2T1::I64; R2 = hey1_r2Gg_bytes; Sp = Sp - 16; call GHC.CString.unpackCString#_info(R2) args: 24, res: 0, upd: 24; } } This code is generated for every string literal. Only difference between top-level closures like this is the argument for the bytes of the string (hey1_r2Gg_bytes in the code above). With this patch we introduce a standard thunk in the RTS, called stg_MK_STRING_info, that does what `unpackCString# x` does, except it gets the bytes address from the payload. Using this, for the closure above, we generate this: Main.hey_closure" { Main.hey_closure: const stg_MK_STRING_info; const 0; // padding for indirectee const 0; // static link const 0; // saved info const hey1_r1Gg_bytes; // the payload } This is much smaller in code. Metric Decrease: T10421 T11195 T12150 T12425 T16577 T18282 T18698a T18698b Co-Authored By: Ben Gamari <ben@well-typed.com>
* remove a no-warn directive from GHC.Cmm.ContFlowOptCurran McConnell2022-10-212-6/+9
| | | | | | | | | | | | | | | | | This patch is motivated by the desire to remove the {-# OPTIONS_GHC -fno-warn-incomplete-patterns #-} directive at the top of GHC.Cmm.ContFlowOpt. (Based on the text in this coding standards doc, I understand it's a goal of the project to remove such directives.) I chose this task because I'm a new contributor to GHC, and it seemed like a good way to get acquainted with the patching process. In order to address the warning that arose when I removed the no-warn directive, I added a case to removeUnreachableBlocksProc to handle the CmmData constructor. Clearly, since this partial function has not been erroring out in the wild, its inputs are always in practice wrapped by the CmmProc constructor. Therefore the CmmData case is handled by a precise panic (which is an improvement over the partial pattern match from before).
* Scrub various partiality involving lists (again).M Farkas-Dyck2022-10-191-4/+4
| | | | Lets us avoid some use of `head` and `tail`, and some panics.
* Cmm Lint: relax SIMD register assignment checksheaf2022-10-191-3/+15
| | | | | | | | | | As noted in #22297, SIMD vector registers can be used to store different kinds of values, e.g. xmm1 can be used both to store integer and floating point values. The Cmm type system doesn't properly account for this, so we weaken the Cmm register assignment lint check to only compare widths when comparing a vector type with its allocated vector register.
* Remove SIMD conversionssheaf2022-10-191-5/+5
| | | | | | | | | | | This patch makes it so that packing/unpacking SIMD vectors always uses the right sized types, e.g. unpacking a Word16X4# will give a tuple of Word16#s. As a result, we can get rid of the conversion instructions that were previously required. Fixes #22296
* Add VecSlot for unboxed sums of SIMD vectorsDai2022-10-191-1/+2
| | | | | | | | | This patch adds the missing `VecRep` case to `primRepSlot` function and all the necessary machinery to carry this new `VecSlot` through code generation. This allows programs involving unboxed sums of SIMD vectors to be written and compiled. Fixes #22187
* Make `Functor` a superclass of `TrieMap`, which lets us derive the `map` ↵M Farkas-Dyck2022-10-181-1/+0
| | | | functions.
* Make Cmm Lint messages use dump styleKrzysztof Gogolewski2022-10-111-1/+2
| | | | | | | | | | | Lint errors indicate an internal error in GHC, so it makes sense to use it instead of the user style. This is consistent with Core Lint and STG Lint: https://gitlab.haskell.org/ghc/ghc/-/blob/22096652/compiler/GHC/Core/Lint.hs#L429 https://gitlab.haskell.org/ghc/ghc/-/blob/22096652/compiler/GHC/Stg/Lint.hs#L144 Fixes #22218.
* Refactor IPE initializationBen Gamari2022-10-112-5/+13
| | | | | | | | | | | | | | | Here we refactor the representation of info table provenance information in object code to significantly reduce its size and link-time impact. Specifically, we deduplicate strings and represent them as 32-bit offsets into a common string table. In addition, we rework the registration logic to eliminate allocation from the registration path, which is run from a static initializer where things like allocation are technically undefined behavior (although it did previously seem to work). For similar reasons we eliminate lock usage from registration path, instead relying on atomic CAS. Closes #22077.
* CLabel: fix isInfoTableLabelCheng Shao2022-10-111-0/+7
| | | | isInfoTableLabel does not take Cmm info table into account. This patch is required for data section layout of wasm32 NCG to work.
* Scrub various partiality involving empty lists.M Farkas-Dyck2022-09-302-6/+7
| | | | Avoids some uses of `head` and `tail`, and some panics when an argument is null.
* Avoid Data.List.group; prefer Data.List.NonEmpty.groupBodigrim2022-09-281-4/+3
| | | | | This allows to avoid further partiality, e. g., map head . group is replaced by map NE.head . NE.group, and there are less panic calls.
* Minor refactor around OutputableKrzysztof Gogolewski2022-09-222-5/+5
| | | | | | | * Replace 'text . show' and 'ppr' with 'int'. * Remove Outputable.hs-boot, no longer needed * Use pprWithCommas * Factor out instructions in AArch64 codegen
* Clean up some. In particular:M Farkas-Dyck2022-09-174-42/+27
| | | | | | | | | | • Delete some dead code, largely under `GHC.Utils`. • Clean up a few definitions in `GHC.Utils.(Misc, Monad)`. • Clean up `GHC.Types.SrcLoc`. • Derive stock `Functor, Foldable, Traversable` for more types. • Derive more instances for newtypes. Bump haddock submodule.
* Fix typosKrzysztof Gogolewski2022-09-141-1/+1
|
* Fix typosEric Lindblad2022-09-146-9/+9
| | | | | | | This fixes various typos and spelling mistakes in the compiler. Fixes #21891
* Add native delimited continuations to the RTSAlexis King2022-09-111-5/+4
| | | | | | | | | | | | | | | | | | | | | This patch implements GHC proposal 313, "Delimited continuation primops", by adding native support for delimited continuations to the GHC RTS. All things considered, the patch is relatively small. It almost exclusively consists of changes to the RTS; the compiler itself is essentially unaffected. The primops come with fairly extensive Haddock documentation, and an overview of the implementation strategy is given in the Notes in rts/Continuation.c. This first stab at the implementation prioritizes simplicity over performance. Most notably, every continuation is always stored as a single, contiguous chunk of stack. If one of these chunks is particularly large, it can result in poor performance, as the current implementation does not attempt to cleverly squeeze a subset of the stack frames into the existing stack: it must fit all at once. If this proves to be a performance issue in practice, a cleverer strategy would be a worthwhile target for future improvements.
* Minor SDoc cleanupKrzysztof Gogolewski2022-09-072-5/+3
| | | | | | | Change calls to renderWithContext with showSDocOneLine; it's more efficient and explanatory. Remove polyPatSig (unused)
* Remove label style from printing contextKrzysztof Gogolewski2022-08-265-16/+20
| | | | | | | | | | | | Previously, the SDocContext used for code generation contained information whether the labels should use Asm or C style. However, at every individual call site, this is known statically. This removes the parameter to 'PprCode' and replaces every 'pdoc' used to print a label in code style with 'pprCLabel' or 'pprAsmLabel'. The OutputableP instance is now used only for dumps. The output of T15155 changes, it now uses the Asm style (which is faithful to what actually happens).
* Scrub some partiality in `CommonBlockElim`.M Farkas-Dyck2022-08-251-10/+9
|
* Cleanups around pretty-printingKrzysztof Gogolewski2022-08-091-10/+7
| | | | | | | | | | * Remove hack when printing OccNames. No longer needed since e3dcc0d5 * Remove unused `pprCmms` and `instance Outputable Instr` * Simplify `pprCLabel` (no need to pass platform) * Remove evil `Show`/`Eq` instances for `SDoc`. They were needed by ImmLit, but that can take just a String instead. * Remove instance `Outputable CLabel` - proper output of labels needs a platform, and is done by the `OutputableP` instance
* compiler: Eliminate two uses of foldr in favor of foldl'Ben Gamari2022-08-062-2/+2
| | | | | | These two uses constructed maps, which is a case where foldl' is generally more efficient since we avoid constructing an intermediate O(n)-depth stack.
* Change `-fprof-late` to insert cost centres after unfolding creation.Andreas Klebinger2022-08-061-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | The former behaviour of adding cost centres after optimization but before unfoldings are created is not available via the flag `prof-late-inline` instead. I also reduced the overhead of -fprof-late* by pushing the cost centres into lambdas. This means the cost centres will only account for execution of functions and not their partial application. Further I made LATE_CC cost centres it's own CC flavour so they now won't clash with user defined ones if a user uses the same string for a custom scc. LateCC: Don't put cost centres inside constructor workers. With -fprof-late they are rarely useful as the worker is usually inlined. Even if the worker is not inlined or we use -fprof-late-linline they are generally not helpful but bloat compile and run time significantly. So we just don't add sccs inside constructor workers. ------------------------- Metric Decrease: T13701 -------------------------
* cmm: Move toBlockList to GHC.CmmBen Gamari2022-07-163-5/+0
|
* cmm: Eliminate orphan Outputable instancesBen Gamari2022-07-1615-1110/+812
| | | | | | | Here we reorganize `GHC.Cmm` to eliminate the orphan `Outputable` and `OutputableP` instances for the Cmm AST. This makes it significantly easier to use the Cmm pretty-printers in tracing output without incurring module import cycles.
* cmm: Add surface syntax for MO_MulMayOfloBen Gamari2022-06-151-0/+2
|
* Enable USE_INLINE_SRT_FIELD on ARM64Sylvain Henry2022-05-301-11/+2
| | | | | | | It was previously disabled because of: - a confusion about "SRT inlining" (see removed comment in this commit) - a linker bug (overflow) in the handling of ARM64_RELOC_SUBTRACTOR relocation: fixed by a previous commit.
* Some fixes to SRT documentationSylvain Henry2022-05-301-25/+50
| | | | | | | | | - reordered the 3 SRT implementation cases from the most general to the most specific one: USE_SRT_POINTER -> USE_SRT_OFFSET -> USE_INLINE_SRT_FIELD - added requirements for each - found and documented a confusion about "SRT inlining" not supported with MachO. (It is fixed in the following commit)
* Change `Backend` type and remove direct dependencieswip/backend-as-recordNorman Ramsey2022-05-211-8/+1
| | | | | | | | | | | | | | | | | | | With this change, `Backend` becomes an abstract type (there are no more exposed value constructors). Decisions that were formerly made by asking "is the current back end equal to (or different from) this named value constructor?" are now made by interrogating the back end about its properties, which are functions exported by `GHC.Driver.Backend`. There is a description of how to migrate code using `Backend` in the user guide. Clients using the GHC API can find a backdoor to access the Backend datatype in GHC.Driver.Backend.Internal. Bumps haddock submodule. Fixes #20927
* document fields of `DominatorSet`Andreas Klebinger2022-05-201-2/+2
|
* add HasDebugCallStack; remove unneeded extensionsNorman Ramsey2022-05-201-16/+19
|
* add dominator-tree functionNorman Ramsey2022-05-201-0/+12
|
* add dominator analysis of `CmmGraph`Norman Ramsey2022-05-201-0/+200
| | | | | | | | | | | | This commit adds module `GHC.Cmm.Dominators`, which provides a wrapper around two existing algorithms in GHC: the Lengauer-Tarjan dominator analysis from the X86 back end and the reverse postorder ordering from the Cmm Dataflow framework. Issue #20726 proposes that we evaluate some alternatives for dominator analysis, but for the time being, the best path forward is simply to use the existing analysis on `CmmGraph`s. This commit addresses a bullet in #21200.
* Don't store LlvmConfig into DynFlagsSylvain Henry2022-05-171-15/+15
| | | | | | | | | | | | | | | | | | | | | LlvmConfig contains information read from llvm-passes and llvm-targets files in GHC's top directory. Reading these files is done only when needed (i.e. when the LLVM backend is used) and cached for the whole compiler session. This patch changes the way this is done: - Split LlvmConfig into LlvmConfig and LlvmConfigCache - Store LlvmConfigCache in HscEnv instead of DynFlags: there is no good reason to store it in DynFlags. As it is fixed per session, we store it in the session state instead (HscEnv). - Initializing LlvmConfigCache required some changes to driver functions such as newHscEnv. I've used the opportunity to untangle initHscEnv from initGhcMonad (in top-level GHC module) and to move it to GHC.Driver.Main, close to newHscEnv. - I've also made `cmmPipeline` independent of HscEnv in order to remove the call to newHscEnv in regalloc_unit_tests.
* codeGen: Ensure that static datacon apps are included in SRTsBen Gamari2022-05-171-43/+86
| | | | | | | | | | | | | | | | When generating an SRT for a recursive group, GHC.Cmm.Info.Build.oneSRT filters out recursive references, as described in Note [recursive SRTs]. However, doing so for static functions would be unsound, for the reason described in Note [Invalid optimisation: shortcutting]. However, the same argument applies to static data constructor applications, as we discovered in #20959. Fix this by ensuring that static data constructor applications are included in recursive SRTs. The approach here is not entirely satisfactory, but it is a starting point. Fixes #20959.
* CafAnal: Improve code clarityBen Gamari2022-05-172-100/+131
| | | | | | | | | | Here we implement a few measures to improve the clarity of the CAF analysis implementation. Specifically: * Use CafInfo instead of Bool since the former is more descriptive * Rename CAFLabel to CAFfyLabel, since not all CAFfyLabels are in fact CAFs * Add numerous comments
* Move CmmParserConfig and PDConfig into GHC.Cmm.Parser.ConfigAndre Marianiello2022-05-123-11/+27
|
* Decouple dynflags in Cmm parser (related to #17957)Andre Marianiello2022-05-122-23/+29
|
* Add a Note describing lack of object merging on WindowsBen Gamari2022-04-061-6/+6
| | | | See #21068.
* Refactor handling of global initializersBen Gamari2022-04-014-2/+147
| | | | | | | | | | | | | | | | | | | | | | | | | | | GHC uses global initializers for a number of things including cost-center registration, info-table provenance registration, and setup of foreign exports. Previously, the global initializer arrays which referenced these initializers would live in the object file of the C stub, which would then be merged into the main object file of the module. Unfortunately, this approach is no longer tenable with the move to Clang/LLVM on Windows (see #21019). Specifically, lld's PE backend does not support object merging (that is, the -r flag). Instead we are now rather packaging a module's object files into a static library. However, this is problematic in the case of initializers as there are no references to the C stub object in the archive, meaning that the linker may drop the object from the final link. This patch refactors our handling of global initializers to instead place initializer arrays within the object file of the module to which they belong. We do this by introducing a Cmm data declaration containing the initializer array in the module's Cmm stream. While the initializer functions themselves remain in separate C stub objects, the reference from the module's object ensures that they are not dropped from the final link. In service of #21068.
* Fix all invalid haddock comments in the compilerZubin Duggal2022-03-293-12/+12
| | | | Fixes #20935 and #20924
* Avoid redundant imports of GHC.Driver.SessionSylvain Henry2022-03-232-2/+0
| | | | | | Remove GHC.Driver.Session imports that weren't considered as redundant because of the reexport of PlatformConstants. Also remove this reexport as modules using this datatype should import GHC.Platform instead.
* Ticky profiling improvements.Matthew Pickering2022-03-021-0/+1
| | | | | | | | | | | | | | | | | | | | This adds a number of changes to ticky-ticky profiling. When an executable is profiled with IPE profiling it's now possible to associate id-related ticky counters to their source location. This works by emitting the info table address as part of the counter which can be looked up in the IPE table. Add a `-ticky-ap-thunk` flag. This flag prevents the use of some standard thunks which are precompiled into the RTS. This means reduced cache locality and increased code size. But it allows better attribution of execution cost to specific source locations instead of simple attributing it to the standard thunk. ticky-ticky now uses the `arg` field to emit additional information about counters in json format. When ticky-ticky is used in combination with the eventlog eventlog2html can be used to generate a html table from the eventlog similar to the old text output for ticky-ticky.
* CLabel cleanup:Andreas Klebinger2022-02-282-14/+6
| | | | | | | | Remove these smart constructors for these reasons: * mkLocalClosureTableLabel : Does the same as the non-local variant. * mkLocalClosureLabel : Does the same as the non-local variant. * mkLocalInfoTableLabel : Decide if we make a local label based on the name and just use mkInfoTableLabel everywhere.
* Suggestions due to hlintMatthew Pickering2022-02-242-2/+0
| | | | | It turns out this job hasn't been running for quite a while (perhaps ever) so there are quite a few failures when running the linter locally.
* Tag inference work.Andreas Klebinger2022-02-123-10/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This does three major things: * Enforce the invariant that all strict fields must contain tagged pointers. * Try to predict the tag on bindings in order to omit tag checks. * Allows functions to pass arguments unlifted (call-by-value). The former is "simply" achieved by wrapping any constructor allocations with a case which will evaluate the respective strict bindings. The prediction is done by a new data flow analysis based on the STG representation of a program. This also helps us to avoid generating redudant cases for the above invariant. StrictWorkers are created by W/W directly and SpecConstr indirectly. See the Note [Strict Worker Ids] Other minor changes: * Add StgUtil module containing a few functions needed by, but not specific to the tag analysis. ------------------------- Metric Decrease: T12545 T18698b T18140 T18923 LargeRecord Metric Increase: LargeRecord ManyAlternatives ManyConstructors T10421 T12425 T12707 T13035 T13056 T13253 T13253-spj T13379 T15164 T18282 T18304 T18698a T1969 T20049 T3294 T4801 T5321FD T5321Fun T783 T9233 T9675 T9961 T19695 WWRec -------------------------
* Introduce alignment to CmmStoreBen Gamari2022-02-047-15/+22
|