summaryrefslogtreecommitdiff
path: root/compiler/GHC/StgToCmm
Commit message (Collapse)AuthorAgeFilesLines
* A bunch of changes related to eta reductionSimon Peyton Jones2022-05-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a large collection of changes all relating to eta reduction, originally triggered by #18993, but there followed a long saga. Specifics: * Move state-hack stuff from GHC.Types.Id (where it never belonged) to GHC.Core.Opt.Arity (which seems much more appropriate). * Add a crucial mkCast in the Cast case of GHC.Core.Opt.Arity.eta_expand; helps with T18223 * Add clarifying notes about eta-reducing to PAPs. See Note [Do not eta reduce PAPs] * I moved tryEtaReduce from GHC.Core.Utils to GHC.Core.Opt.Arity, where it properly belongs. See Note [Eta reduce PAPs] * In GHC.Core.Opt.Simplify.Utils.tryEtaExpandRhs, pull out the code for when eta-expansion is wanted, to make wantEtaExpansion, and all that same function in GHC.Core.Opt.Simplify.simplStableUnfolding. It was previously inconsistent, but it's doing the same thing. * I did a substantial refactor of ArityType; see Note [ArityType]. This allowed me to do away with the somewhat mysterious takeOneShots; more generally it allows arityType to describe the function, leaving its clients to decide how to use that information. I made ArityType abstract, so that clients have to use functions to access it. * Make GHC.Core.Opt.Simplify.Utils.rebuildLam (was stupidly called mkLam before) aware of the floats that the simplifier builds up, so that it can still do eta-reduction even if there are some floats. (Previously that would not happen.) That means passing the floats to rebuildLam, and an extra check when eta-reducting (etaFloatOk). * In GHC.Core.Opt.Simplify.Utils.tryEtaExpandRhs, make use of call-info in the idDemandInfo of the binder, as well as the CallArity info. The occurrence analyser did this but we were failing to take advantage here. In the end I moved the heavy lifting to GHC.Core.Opt.Arity.findRhsArity; see Note [Combining arityType with demand info], and functions idDemandOneShots and combineWithDemandOneShots. (These changes partly drove my refactoring of ArityType.) * In GHC.Core.Opt.Arity.findRhsArity * I'm now taking account of the demand on the binder to give extra one-shot info. E.g. if the fn is always called with two args, we can give better one-shot info on the binders than if we just look at the RHS. * Don't do any fixpointing in the non-recursive case -- simple short cut. * Trim arity inside the loop. See Note [Trim arity inside the loop] * Make SimpleOpt respect the eta-reduction flag (Some associated refactoring here.) * I made the CallCtxt which the Simplifier uses distinguish between recursive and non-recursive right-hand sides. data CallCtxt = ... | RhsCtxt RecFlag | ... It affects only one thing: - We call an RHS context interesting only if it is non-recursive see Note [RHS of lets] in GHC.Core.Unfold * Remove eta-reduction in GHC.CoreToStg.Prep, a welcome simplification. See Note [No eta reduction needed in rhsToBody] in GHC.CoreToStg.Prep. Other incidental changes * Fix a fairly long-standing outright bug in the ApplyToVal case of GHC.Core.Opt.Simplify.mkDupableContWithDmds. I was failing to take the tail of 'dmds' in the recursive call, which meant the demands were All Wrong. I have no idea why this has not caused problems before now. * Delete dead function GHC.Core.Opt.Simplify.Utils.contIsRhsOrArg Metrics: compile_time/bytes allocated Test Metric Baseline New value Change --------------------------------------------------------------------------------------- MultiLayerModulesTH_OneShot(normal) ghc/alloc 2,743,297,692 2,619,762,992 -4.5% GOOD T18223(normal) ghc/alloc 1,103,161,360 972,415,992 -11.9% GOOD T3064(normal) ghc/alloc 201,222,500 184,085,360 -8.5% GOOD T8095(normal) ghc/alloc 3,216,292,528 3,254,416,960 +1.2% T9630(normal) ghc/alloc 1,514,131,032 1,557,719,312 +2.9% BAD parsing001(normal) ghc/alloc 530,409,812 525,077,696 -1.0% geo. mean -0.1% Nofib: Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- banner +0.0% +0.4% -8.9% -8.7% 0.0% exact-reals +0.0% -7.4% -36.3% -37.4% 0.0% fannkuch-redux +0.0% -0.1% -1.0% -1.0% 0.0% fft2 -0.1% -0.2% -17.8% -19.2% 0.0% fluid +0.0% -1.3% -2.1% -2.1% 0.0% gg -0.0% +2.2% -0.2% -0.1% 0.0% spectral-norm +0.1% -0.2% 0.0% 0.0% 0.0% tak +0.0% -0.3% -9.8% -9.8% 0.0% x2n1 +0.0% -0.2% -3.2% -3.2% 0.0% -------------------------------------------------------------------------------- Min -3.5% -7.4% -58.7% -59.9% 0.0% Max +0.1% +2.2% +32.9% +32.9% 0.0% Geometric Mean -0.0% -0.1% -14.2% -14.8% -0.0% Metric Decrease: MultiLayerModulesTH_OneShot T18223 T3064 T15185 T14766 Metric Increase: T9630
* Make debug a `Bool` not an `Int` in `StgToCmmConfig`John Ericson2022-05-232-5/+5
| | | | | | | We don't need any more resolution than this. Rename the field to `stgToCmmEmitDebugInfo` to indicate it is no longer conveying any "level" information.
* Make floating-point abs IEEE 754 compliantARATA Mizuki2022-05-112-40/+2
| | | | | | The old code used by via-C backend didn't handle the sign bit of NaN. See #21043.
* Fix all invalid haddock comments in the compilerZubin Duggal2022-03-291-2/+2
| | | | Fixes #20935 and #20924
* Reverse dependency between StgToCmm and Runtime.Heap.LayoutSylvain Henry2022-03-231-41/+7
|
* Avoid redundant imports of GHC.Driver.SessionSylvain Henry2022-03-231-2/+0
| | | | | | Remove GHC.Driver.Session imports that weren't considered as redundant because of the reexport of PlatformConstants. Also remove this reexport as modules using this datatype should import GHC.Platform instead.
* Improve -dtag-inference-checks checks.Andreas Klebinger2022-03-112-13/+38
| | | | | FUN closures don't get tagged when evaluated. So no point in checking their tags.
* GenStgAlt 3-tuple synonym --> Record typedoyougnu2022-03-022-8/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit alters GenStgAlt from a type synonym to a Record with field accessors. In pursuit of #21078, this is not a required change but cleans up several areas for nicer code in the upcoming js-backend, and in GHC itself. GenStgAlt: 3-tuple -> record Stg.Utils: GenStgAlt 3-tuple -> record Stg.Stats: StgAlt 3-tuple --> record Stg.InferTags.Rewrite: StgAlt 3-tuple -> record Stg.FVs: GenStgAlt 3-tuple -> record Stg.CSE: GenStgAlt 3-tuple -> record Stg.InferTags: GenStgAlt 3-tuple --> record Stg.Debug: GenStgAlt 3-tuple --> record Stg.Lift.Analysis: GenStgAlt 3-tuple --> record Stg.Lift: GenStgAlt 3-tuple --> record ByteCode.Instr: GenStgAlt 3-tuple --> record Stg.Syntax: add GenStgAlt helper functions Stg.Unarise: GenStgAlt 3-tuple --> record Stg.BcPrep: GenStgAlt 3-tuple --> record CoreToStg: GenStgAlt 3-tuple --> record StgToCmm.Expr: GenStgAlt 3-tuple --> record StgToCmm.Bind: GenStgAlt 3-tuple --> record StgToByteCode: GenStgAlt 3-tuple --> record Stg.Lint: GenStgAlt 3-tuple --> record Stg.Syntax: strictify GenStgAlt GenStgAlt: add haddock, some cleanup fixup: remove calls to pure, single ViewPattern StgToByteCode: use case over viewpatterns
* StgToCmm.cgTopBinding: no isNCG, use binBlobThreshdoyougnu2022-03-021-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | This is a one line change. It is a fixup from MR!7325, was pointed out in review of MR!7442, specifically: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7442#note_406581 The change removes isNCG check from cgTopBinding. Instead it changes the type of binBlobThresh in DynFlags from Word to Maybe Word, where a Just 0 or a Nothing indicates an infinite threshold and thus the disable CmmFileEmbed case in the original check. This improves the cohesion of the module because more NCG related Backend stuff is moved into, and checked in, StgToCmm.Config. Note, that the meaning of a Just 0 or a Nothing in binBlobThresh is indicated in a comment next to its field in GHC.StgToCmm.Config. DynFlags: binBlobThresh: Word -> Maybe Word StgToCmm.Config: binBlobThesh add not ncg check DynFlags.binBlob: move Just 0 check to dflags init StgToCmm.binBlob: only check isNCG, Just 0 check to dflags StgToCmm.Config: strictify binBlobThresh
* Ticky profiling improvements.Matthew Pickering2022-03-026-45/+150
| | | | | | | | | | | | | | | | | | | | This adds a number of changes to ticky-ticky profiling. When an executable is profiled with IPE profiling it's now possible to associate id-related ticky counters to their source location. This works by emitting the info table address as part of the counter which can be looked up in the IPE table. Add a `-ticky-ap-thunk` flag. This flag prevents the use of some standard thunks which are precompiled into the RTS. This means reduced cache locality and increased code size. But it allows better attribution of execution cost to specific source locations instead of simple attributing it to the standard thunk. ticky-ticky now uses the `arg` field to emit additional information about counters in json format. When ticky-ticky is used in combination with the eventlog eventlog2html can be used to generate a html table from the eventlog similar to the old text output for ticky-ticky.
* driver: Properly add an edge between a .hs and its hs-boot fileMatthew Pickering2022-03-011-7/+0
| | | | | | | | | | | | | | As noted in #21071 we were missing adding this edge so there were situations where the .hs file would get compiled before the .hs-boot file which leads to issues with -j. I fixed this properly by adding the edge in downsweep so the definition of nodeDependencies can be simplified to avoid adding this dummy edge in. There are plenty of tests which seem to have these redundant boot files anyway so no new test. #21094 tracks the more general issue of identifying redundant hs-boot and SOURCE imports.
* CLabel cleanup:Andreas Klebinger2022-02-282-12/+3
| | | | | | | | Remove these smart constructors for these reasons: * mkLocalClosureTableLabel : Does the same as the non-local variant. * mkLocalClosureLabel : Does the same as the non-local variant. * mkLocalInfoTableLabel : Decide if we make a local label based on the name and just use mkInfoTableLabel everywhere.
* Ticky: Gate tag-inference dummy ticky-counters behind a flag.Andreas Klebinger2022-02-252-5/+9
| | | | | | | | | | Tag inference included a way to collect stats about avoided tag-checks. This was dony by emitting "dummy" ticky entries with counts corresponding to predicted/unpredicated tag checks. This behaviour for ticky is now gated behind -fticky-tag-checks. I also documented ticky-LNE in the process.
* ghci: show helpful error message when loading module with SIMD vector ↵nineonine2022-02-241-0/+1
| | | | | | | operations (#20214) Previously, when trying to load module with SIMD vector operations, ghci would panic in 'GHC.StgToByteCode.findPushSeq'. Now, a more helpful message is displayed.
* Tag inference work.Andreas Klebinger2022-02-1210-35/+497
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This does three major things: * Enforce the invariant that all strict fields must contain tagged pointers. * Try to predict the tag on bindings in order to omit tag checks. * Allows functions to pass arguments unlifted (call-by-value). The former is "simply" achieved by wrapping any constructor allocations with a case which will evaluate the respective strict bindings. The prediction is done by a new data flow analysis based on the STG representation of a program. This also helps us to avoid generating redudant cases for the above invariant. StrictWorkers are created by W/W directly and SpecConstr indirectly. See the Note [Strict Worker Ids] Other minor changes: * Add StgUtil module containing a few functions needed by, but not specific to the tag analysis. ------------------------- Metric Decrease: T12545 T18698b T18140 T18923 LargeRecord Metric Increase: LargeRecord ManyAlternatives ManyConstructors T10421 T12425 T12707 T13035 T13056 T13253 T13253-spj T13379 T15164 T18282 T18304 T18698a T1969 T20049 T3294 T4801 T5321FD T5321Fun T783 T9233 T9675 T9961 T19695 WWRec -------------------------
* Add late cost centre supportAndreas Klebinger2022-02-101-2/+3
| | | | | This allows cost centres to be inserted after the core optimization pipeline has run.
* StgToCmm: Get rid of GHC.Driver.Session importsJohn Ericson2022-02-084-6/+0
| | | | | `DynFlags` is gone, but let's move a few trivial things around to get rid of its module too.
* Fix some notesMatthew Pickering2022-02-083-4/+4
|
* Fix array primop alignmentBen Gamari2022-02-042-21/+35
|
* Introduce alignment to CmmStoreBen Gamari2022-02-042-2/+3
|
* Introduce alignment in CmmLoadBen Gamari2022-02-046-21/+24
|
* cmm: Introduce cmmLoadBWord and cmmLoadGCWordBen Gamari2022-02-046-17/+16
|
* Add commentsBen Gamari2022-02-041-0/+2
|
* Fix a few Note inconsistenciesBen Gamari2022-02-016-12/+10
|
* Consistently upper-case "Note ["Ben Gamari2022-02-012-3/+3
| | | | | | This was achieved with git ls-tree --name-only HEAD -r | xargs sed -i -e 's/note \[/Note \[/g'
* codeGen: Fix two buglets in -fbounds-check logicBen Gamari2022-01-311-10/+10
| | | | | | | @Bodigrim noticed that the `compareByteArray#` bounds-checking logic had flipped arguments and an off-by-one. For the sake of clarity I also refactored occurrences of `cmmOffset` to rather use `cmmOffsetB`. I suspect the former should be retired.
* StgToCmm: decouple DynFlags, add StgToCmmConfigdoyougnu2022-01-3117-526/+581
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | StgToCmm: add Config, remove CgInfoDownwards StgToCmm: runC api change to take StgToCmmConfig StgToCmm: CgInfoDownad -> StgToCmmConfig StgToCmm.Monad: update getters/setters/withers StgToCmm: remove CallOpts in StgToCmm.Closure StgToCmm: remove dynflag references StgToCmm: PtrOpts removed StgToCmm: add TMap to config, Prof - dynflags StgToCmm: add omit yields to config StgToCmm.ExtCode: remove redundant import StgToCmm.Heap: remove references to dynflags StgToCmm: codeGen api change, DynFlags -> Config StgToCmm: remove dynflags in Env and StgToCmm StgToCmm.DataCon: remove dynflags references StgToCmm: remove dynflag references in DataCon StgToCmm: add backend avx flags to config StgToCmm.Prim: remove dynflag references StgToCmm.Expr: remove dynflag references StgToCmm.Bind: remove references to dynflags StgToCmm: move DoAlignSanitisation to Cmm.Type StgToCmm: remove PtrOpts in Cmm.Parser.y DynFlags: update ipInitCode api StgToCmm: Config Module is single source of truth StgToCmm: Lazy config breaks IORef deadlock testsuite: bump countdeps threshold StgToCmm.Config: strictify fields except UpdFrame Strictifying UpdFrameOffset causes the RTS build with stage1 to deadlock. Additionally, before the deadlock performance of the RTS is noticeably slower. StgToCmm.Config: add field descriptions StgToCmm: revert strictify on Module in config testsuite: update CountDeps tests StgToCmm: update comment, fix exports Specifically update comment about loopification passed into dynflags then stored into stgToCmmConfig. And remove getDynFlags from Monad.hs exports Types.Name: add pprFullName function StgToCmm.Ticky: use pprFullname, fixup ExtCode imports Cmm.Info: revert cmmGetClosureType removal StgToCmm.Bind: use pprFullName, Config update comments StgToCmm: update closureDescription api StgToCmm: SAT altHeapCheck StgToCmm: default render for Info table, ticky Use default rendering contexts for info table and ticky ticky, which should be independent of command line input. testsuite: bump count deps pprFullName: flag for ticky vs normal style output convertInfoProvMap: remove unused parameter StgToCmm.Config: add backend flags to config StgToCmm.Config: remove Backend from Config StgToCmm.Prim: refactor Backend call sites StgToCmm.Prim: remove redundant imports StgToCmm.Config: refactor vec compatibility check StgToCmm.Config: add allowQuotRem2 flag StgToCmm.Ticky: print internal names with parens StgToCmm.Bind: dispatch ppr based on externality StgToCmm: Add pprTickyname, Fix ticky naming Accidently removed the ctx for ticky SDoc output. The only relevant flag is sdocPprDebug which was accidental set to False due to using defaultSDocContext without altering the flag. StgToCmm: remove stateful fields in config fixup: config: remove redundant imports StgToCmm: move Sequel type to its own module StgToCmm: proliferate getCallMethod updated api StgToCmm.Monad: add FCodeState to Monad Api StgToCmm: add second reader monad to FCode fixup: Prim.hs: missed a merge conflict fixup: Match countDeps tests to HEAD StgToCmm.Monad: withState -> withCgState To disambiguate it from mtl withState. This withState shouldn't be returning the new state as a value. However, fixing this means tackling the knot tying in CgState and so is very difficult since it changes when the thunk of the knot is forced which either leads to deadlock or to compiler panic.
* Levity-polymorphic arrays and mutable variablessheaf2022-01-262-44/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes the following types levity-polymorphic in their last argument: - Array# a, SmallArray# a, Weak# b, StablePtr# a, StableName# a - MutableArray# s a, SmallMutableArray# s a, MutVar# s a, TVar# s a, MVar# s a, IOPort# s a The corresponding primops are also made levity-polymorphic, e.g. `newArray#`, `readArray#`, `writeMutVar#`, `writeIOPort#`, etc. Additionally, exception handling functions such as `catch#`, `raise#`, `maskAsyncExceptions#`,... are made levity/representation-polymorphic. Now that Array# and MutableArray# also work with unlifted types, we can simply re-define ArrayArray# and MutableArrayArray# in terms of them. This means that ArrayArray# and MutableArrayArray# are no longer primitive types, but simply unlifted newtypes around Array# and MutableArrayArray#. This completes the implementation of the Pointer Rep proposal https://github.com/ghc-proposals/ghc-proposals/pull/203 Fixes #20911 ------------------------- Metric Increase: T12545 ------------------------- ------------------------- Metric Decrease: T12545 -------------------------
* codeGen: Introduce flag to bounds-check array accessesBen Gamari2021-12-211-0/+118
| | | | | | | Here we introduce code generator support for instrument array primops with bounds checking, enabled with the `-fcheck-prim-bounds` flag. Introduced to debug #20769.
* Change isUnliftedTyCon to marshalablePrimTyCon (#20401)Krzysztof Gogolewski2021-12-171-10/+11
| | | | | | | | | | | | | | | | | | | | | | | | isUnliftedTyCon was used in three places: Ticky, Template Haskell and FFI checks. It was straightforward to remove it from Ticky and Template Haskell. It is now used in FFI only and renamed to marshalablePrimTyCon. Previously, it was fetching information from a field in PrimTyCon called is_unlifted. Instead, I've changed the code to compute liftedness based on the kind. isFFITy and legalFFITyCon are removed. They were only referred from an old comment that I removed. There were three functions to define a PrimTyCon, but the only difference was that they were setting is_unlifted to True or False. Everything is now done in mkPrimTyCon. I also added missing integer types in Ticky.hs, I think it was an oversight. Fixes #20401
* Require all dirty_MUT_VAR callers to do explicit stg_MUT_VAR_CLEAN_info ↵nineonine2021-12-021-4/+11
| | | | comparison (#20088)
* Add specific optimization flag for fast PAP calls (#6084, #20500)Gergo ERDI2021-11-251-1/+1
|
* Increase type sharingBen Gamari2021-11-172-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes #20541 by making mkTyConApp do more sharing of types. In particular, replace * BoxedRep Lifted ==> LiftedRep * BoxedRep Unlifted ==> UnliftedRep * TupleRep '[] ==> ZeroBitRep * TYPE ZeroBitRep ==> ZeroBitType In each case, the thing on the right is a type synonym for the thing on the left, declared in ghc-prim:GHC.Types. See Note [Using synonyms to compress types] in GHC.Core.Type. The synonyms for ZeroBitRep and ZeroBitType are new, but absolutely in the same spirit as the other ones. (These synonyms are mainly for internal use, though the programmer can use them too.) I also renamed GHC.Core.Ty.Rep.isVoidTy to isZeroBitTy, to be compatible with the "zero-bit" nomenclature above. See discussion on !6806. There is a tricky wrinkle: see GHC.Core.Types Note [Care using synonyms to compress types] Compiler allocation decreases by up to 0.8%.
* compiler: Fix racy ticker counter registrationBen Gamari2021-10-043-25/+47
| | | | | | | | | Previously registration of ticky entry counters was racy, performing a read-modify-write to add the new counter to the ticky_entry_ctrs list. This could result in the list becoming cyclic if multiple threads entered the same closure simultaneously. Fixes #20451.
* Use Info Table Provenances to decode cloned stack (#18163)Sven Tennie2021-09-233-5/+11
| | | | | | | | | | | | | | | | Emit an Info Table Provenance Entry (IPE) for every stack represeted info table if -finfo-table-map is turned on. To decode a cloned stack, lookupIPE() is used. It provides a mapping between info tables and their source location. Please see these notes for details: - [Stacktraces from Info Table Provenance Entries (IPE based stack unwinding)] - [Mapping Info Tables to Source Positions] Metric Increase: T12545
* Introduce stack snapshotting / cloning (#18741)Sven Tennie2021-09-231-0/+1
| | | | | | | | | | | | | | Add `StackSnapshot#` primitive type that represents a cloned stack (StgStack). The cloning interface consists of two functions, that clone either the treads own stack (cloneMyStack) or another threads stack (cloneThreadStack). The stack snapshot is offline/cold, i.e. it isn't evaluated any further. This is useful for analyses as it prevents concurrent modifications. For technical details, please see Note [Stack Cloning]. Co-authored-by: Ben Gamari <bgamari.foss@gmail.com> Co-authored-by: Matthew Pickering <matthewtpickering@gmail.com>
* Let LLVM and C handle > native size arithmeticJohn Ericson2021-09-081-0/+2
| | | | | | NCG needs to call slow FFI functions where we "borrow" the C compiler's implementation, but there is no reason why we need to do that for LLVM, or the unregisterized backend where everything is via C anyways!
* AArch64 NCG: Emit FABS instructions for fabsFloat# and fabsDouble#ARATA Mizuki2021-08-281-2/+3
| | | | Closes #20275
* Make Int64#/Word64# unconditionally availableJohn Ericson2021-08-191-9/+0
| | | | | | | | This prepares us to actually use them when the native size is 64 bits too. I more than saitisfied my curiosity finding they were gated since 47774449c9d66b768a70851fe82c5222c1f60689.
* StgToCmm: use correct bounds for switches on sized valueswip/byte-switchSylvain Henry2021-08-171-6/+13
| | | | | | StgToCmm was only using literals signedness to determine whether using Int and Word range in Cmm switches. Now that we have sized literals (Int8#, Int16#, etc.), it needs to take their ranges into account.
* Optimize Info Table Provenance Entries (IPEs) Map creation and lookupSven Tennie2021-08-111-1/+1
| | | | | | | | | | | | | | | Using a hash map reduces the complexity of lookupIPE(), making it non linear. On registration each IPE list is added to a temporary IPE lists buffer, reducing registration time. The hash map is built lazily on first lookup. IPE event output to stderr is added with tests. For details, please see Note [The Info Table Provenance Entry (IPE) Map]. A performance test for IPE registration and lookup can be found here: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/5724#note_370806
* Move `/includes` to `/rts/include`, sort per package betterJohn Ericson2021-08-094-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | In order to make the packages in this repo "reinstallable", we need to associate source code with a specific packages. Having a top level `/includes` dir that mixes concerns (which packages' includes?) gets in the way of this. To start, I have moved everything to `rts/`, which is mostly correct. There are a few things however that really don't belong in the rts (like the generated constants haskell type, `CodeGen.Platform.h`). Those needed to be manually adjusted. Things of note: - No symlinking for sake of windows, so we hard-link at configure time. - `CodeGen.Platform.h` no longer as `.hs` extension (in addition to being moved to `compiler/`) so as not to confuse anyone, since it is next to Haskell files. - Blanket `-Iincludes` is gone in both build systems, include paths now more strictly respect per-package dependencies. - `deriveConstants` has been taught to not require a `--target-os` flag when generating the platform-agnostic Haskell type. Make takes advantage of this, but Hadrian has yet to.
* PrimOps: Add CAS op for all int sizesPeter Trommler2021-08-021-1/+17
| | | | | | | | | | | PPC NCG: Implement CAS inline for 32 and 64 bit testsuite: Add tests for smaller atomic CAS X86 NCG: Catch calls to CAS C fallback Primops: Add atomicCasWord[8|16|32|64]Addr# Add tests for atomicCasWord[8|16|32|64]Addr# Add changelog entry for new primops X86 NCG: Fix MO-Cmpxchg W64 on 32-bit arch ghc-prim: 64-bit CAS C fallback on all archs
* Generalise reallyUnsafePtrEquality# and use itsheaf2021-07-231-25/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fixes #9192 and #17126 updates containers submodule 1. Changes the type of the primop `reallyUnsafePtrEquality#` to the most general version possible (heterogeneous as well as levity-polymorphic): > reallyUnsafePtrEquality# > :: forall {l :: Levity} {k :: Levity} > (a :: TYPE (BoxedRep l)) (b :: TYPE (BoxedRep k)) > . a -> b -> Int# 2. Adds a new internal module, `GHC.Ext.PtrEq`, which contains pointer equality operations that are now subsumed by `reallyUnsafePtrEquality#`. These functions are then re-exported by `GHC.Exts` (so that no function goes missing from the export list of `GHC.Exts`, which is user-facing). More specifically, `GHC.Ext.PtrEq` defines: - A new function: * reallyUnsafePtrEquality :: forall (a :: Type). a -> a -> Int# - Library definitions of ex-primops: * `sameMutableArray#` * `sameSmallMutableArray` * `sameMutableByteArray#` * `sameMutableArrayArray#` * `sameMutVar#` * `sameTVar#` * `sameMVar#` * `sameIOPort#` * `eqStableName#` - New functions for comparing non-mutable arrays: * `sameArray#` * `sameSmallArray#` * `sameByteArray#` * `sameArrayArray#` These were requested in #9192. Generally speaking, existing libraries that use `reallyUnsafePtrEquality#` will continue to work with the new, levity-polymorphic version. But not all! Some (`containers`, `unordered-containers`, `dependent-map`) contain the following: > unsafeCoerce# reallyUnsafePtrEquality# a b If we make `reallyUnsafePtrEquality#` levity-polymorphic, this code fails the current GHC representation-polymorphism checks. We agreed that the right solution here is to modify the library; in this case by deleting the call to `unsafeCoerce#`, since `reallyUnsafePtrEquality#` is now type-heterogeneous too.
* Add Word64#/Int64# primopsSylvain Henry2021-07-151-1/+67
| | | | | | | | | | | | | | | | | | | | | | | Word64#/Int64# are only used on 32-bit architectures. Before this patch, operations on these types were directly using the FFI. Now we use real primops that are then lowered into ccalls. The advantage of doing this is that we can now perform constant folding on Word64#/Int64# (#19024). Most of this work was done by John Ericson in !3658. However this patch doesn't go as far as e.g. changing Word64 to always be using Word64#. Noticeable performance improvements T9203(normal) run/alloc 89870808.0 66662456.0 -25.8% GOOD haddock.Cabal(normal) run/alloc 14215777340.8 12780374172.0 -10.1% GOOD haddock.base(normal) run/alloc 15420020877.6 13643834480.0 -11.5% GOOD Metric Decrease: T9203 haddock.Cabal haddock.base
* Support unlifted datatypes in GHCiLuite Stegeman2021-07-021-0/+4
| | | | fixes #19628
* codeGen: Fix header size for array write barriersGHC GitLab CI2021-06-261-3/+3
| | | | | | | | | Previously the code generator's logic for invoking the nonmoving write barrier was inconsistent with the write barrier itself. Namely, the code generator treated the header size argument as being in words whereas the barrier expected bytes. This was the cause of #19715. Fixes #19715.
* Put tracing functions into their own moduleSylvain Henry2021-06-222-14/+21
| | | | | | | | Now that Outputable is independent of DynFlags, we can put tracing functions using SDocs into their own module that doesn't transitively depend on any GHC.Driver.* module. A few modules needed to be moved to avoid loops in DEBUG mode.
* Re-do rubbish literalsSimon Peyton Jones2021-06-051-3/+7
| | | | | | | | | | As #19882 pointed out, we were simply doing rubbish literals wrong. (I'll refrain from explaining the wrong-ness here -- see the ticket.) This patch fixes it by adding a Type (of kind RuntimeRep) as field of LitRubbish, rather than [PrimRep]. The Note [Rubbish literals] in GHC.Types.Literal explains the details.
* Adds AArch64 Native Code GeneratorMoritz Angermann2021-06-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In which we add a new code generator to the Glasgow Haskell Compiler. This codegen supports ELF and Mach-O targets, thus covering Linux, macOS, and BSDs in principle. It was tested only on macOS and Linux. The NCG follows a similar structure as the other native code generators we already have, and should therfore be realtively easy to follow. It supports most of the features required for a proper native code generator, but does not claim to be perfect or fully optimised. There are still opportunities for optimisations. Metric Decrease: ManyAlternatives ManyConstructors MultiLayerModules PmSeriesG PmSeriesS PmSeriesT PmSeriesV T10421 T10421a T10858 T11195 T11276 T11303b T11374 T11822 T12227 T12545 T12707 T13035 T13253 T13253-spj T13379 T13701 T13719 T14683 T14697 T15164 T15630 T16577 T17096 T17516 T17836 T17836b T17977 T17977b T18140 T18282 T18304 T18478 T18698a T18698b T18923 T1969 T3064 T5030 T5321FD T5321Fun T5631 T5642 T5837 T783 T9198 T9233 T9630 T9872d T9961 WWRec Metric Increase: T4801