|  | Commit message (Collapse) | Author | Age | Files | Lines | 
|---|
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This is a large collection of changes all relating to eta
reduction, originally triggered by #18993, but there followed
a long saga.
Specifics:
* Move state-hack stuff from GHC.Types.Id (where it never belonged)
  to GHC.Core.Opt.Arity (which seems much more appropriate).
* Add a crucial mkCast in the Cast case of
  GHC.Core.Opt.Arity.eta_expand; helps with T18223
* Add clarifying notes about eta-reducing to PAPs.
  See Note [Do not eta reduce PAPs]
* I moved tryEtaReduce from GHC.Core.Utils to GHC.Core.Opt.Arity,
  where it properly belongs.  See Note [Eta reduce PAPs]
* In GHC.Core.Opt.Simplify.Utils.tryEtaExpandRhs, pull out the code for
  when eta-expansion is wanted, to make wantEtaExpansion, and all that
  same function in GHC.Core.Opt.Simplify.simplStableUnfolding.  It was
  previously inconsistent, but it's doing the same thing.
* I did a substantial refactor of ArityType; see Note [ArityType].
  This allowed me to do away with the somewhat mysterious takeOneShots;
  more generally it allows arityType to describe the function, leaving
  its clients to decide how to use that information.
  I made ArityType abstract, so that clients have to use functions
  to access it.
* Make GHC.Core.Opt.Simplify.Utils.rebuildLam (was stupidly called
  mkLam before) aware of the floats that the simplifier builds up, so
  that it can still do eta-reduction even if there are some floats.
  (Previously that would not happen.)  That means passing the floats
  to rebuildLam, and an extra check when eta-reducting (etaFloatOk).
* In GHC.Core.Opt.Simplify.Utils.tryEtaExpandRhs, make use of call-info
  in the idDemandInfo of the binder, as well as the CallArity info. The
  occurrence analyser did this but we were failing to take advantage here.
  In the end I moved the heavy lifting to GHC.Core.Opt.Arity.findRhsArity;
  see Note [Combining arityType with demand info], and functions
  idDemandOneShots and combineWithDemandOneShots.
  (These changes partly drove my refactoring of ArityType.)
* In GHC.Core.Opt.Arity.findRhsArity
  * I'm now taking account of the demand on the binder to give
    extra one-shot info.  E.g. if the fn is always called with two
    args, we can give better one-shot info on the binders
    than if we just look at the RHS.
  * Don't do any fixpointing in the non-recursive
    case -- simple short cut.
  * Trim arity inside the loop. See Note [Trim arity inside the loop]
* Make SimpleOpt respect the eta-reduction flag
  (Some associated refactoring here.)
* I made the CallCtxt which the Simplifier uses distinguish between
  recursive and non-recursive right-hand sides.
     data CallCtxt = ... | RhsCtxt RecFlag | ...
  It affects only one thing:
     - We call an RHS context interesting only if it is non-recursive
       see Note [RHS of lets] in GHC.Core.Unfold
* Remove eta-reduction in GHC.CoreToStg.Prep, a welcome simplification.
  See Note [No eta reduction needed in rhsToBody] in GHC.CoreToStg.Prep.
Other incidental changes
* Fix a fairly long-standing outright bug in the ApplyToVal case of
  GHC.Core.Opt.Simplify.mkDupableContWithDmds. I was failing to take the
  tail of 'dmds' in the recursive call, which meant the demands were All
  Wrong.  I have no idea why this has not caused problems before now.
* Delete dead function GHC.Core.Opt.Simplify.Utils.contIsRhsOrArg
Metrics: compile_time/bytes allocated
                               Test    Metric       Baseline      New value Change
---------------------------------------------------------------------------------------
MultiLayerModulesTH_OneShot(normal) ghc/alloc  2,743,297,692  2,619,762,992  -4.5% GOOD
                     T18223(normal) ghc/alloc  1,103,161,360    972,415,992 -11.9% GOOD
                      T3064(normal) ghc/alloc    201,222,500    184,085,360  -8.5% GOOD
                      T8095(normal) ghc/alloc  3,216,292,528  3,254,416,960  +1.2%
                      T9630(normal) ghc/alloc  1,514,131,032  1,557,719,312  +2.9%  BAD
                 parsing001(normal) ghc/alloc    530,409,812    525,077,696  -1.0%
geo. mean                                 -0.1%
Nofib:
       Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
         banner          +0.0%     +0.4%     -8.9%     -8.7%      0.0%
    exact-reals          +0.0%     -7.4%    -36.3%    -37.4%      0.0%
 fannkuch-redux          +0.0%     -0.1%     -1.0%     -1.0%      0.0%
           fft2          -0.1%     -0.2%    -17.8%    -19.2%      0.0%
          fluid          +0.0%     -1.3%     -2.1%     -2.1%      0.0%
             gg          -0.0%     +2.2%     -0.2%     -0.1%      0.0%
  spectral-norm          +0.1%     -0.2%      0.0%      0.0%      0.0%
            tak          +0.0%     -0.3%     -9.8%     -9.8%      0.0%
           x2n1          +0.0%     -0.2%     -3.2%     -3.2%      0.0%
--------------------------------------------------------------------------------
            Min          -3.5%     -7.4%    -58.7%    -59.9%      0.0%
            Max          +0.1%     +2.2%    +32.9%    +32.9%      0.0%
 Geometric Mean          -0.0%     -0.1%    -14.2%    -14.8%     -0.0%
Metric Decrease:
    MultiLayerModulesTH_OneShot
    T18223
    T3064
    T15185
    T14766
Metric Increase:
    T9630 | 
| | 
| 
| 
| 
| 
| 
| | We don't need any more resolution than this.
Rename the field to `stgToCmmEmitDebugInfo` to indicate it is no longer
conveying any "level" information. | 
| | 
| 
| 
| 
| 
| | The old code used by via-C backend didn't handle the sign bit of NaN.
See #21043. | 
| | 
| 
| 
| | Fixes #20935 and #20924 | 
| | |  | 
| | 
| 
| 
| 
| 
| | Remove GHC.Driver.Session imports that weren't considered as redundant
because of the reexport of PlatformConstants. Also remove this reexport
as modules using this datatype should import GHC.Platform instead. | 
| | 
| 
| 
| 
| | FUN closures don't get tagged when evaluated. So no point in checking their
tags. | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This commit alters GenStgAlt from a type synonym to a Record with field
accessors. In pursuit of #21078, this is not a required change but cleans
up several areas for nicer code in the upcoming js-backend, and in GHC
itself.
GenStgAlt: 3-tuple -> record
Stg.Utils: GenStgAlt 3-tuple -> record
Stg.Stats: StgAlt 3-tuple --> record
Stg.InferTags.Rewrite: StgAlt 3-tuple -> record
Stg.FVs: GenStgAlt 3-tuple -> record
Stg.CSE: GenStgAlt 3-tuple -> record
Stg.InferTags: GenStgAlt 3-tuple --> record
Stg.Debug: GenStgAlt 3-tuple --> record
Stg.Lift.Analysis: GenStgAlt 3-tuple --> record
Stg.Lift: GenStgAlt 3-tuple --> record
ByteCode.Instr: GenStgAlt 3-tuple --> record
Stg.Syntax: add GenStgAlt helper functions
Stg.Unarise: GenStgAlt 3-tuple --> record
Stg.BcPrep: GenStgAlt 3-tuple --> record
CoreToStg: GenStgAlt 3-tuple --> record
StgToCmm.Expr: GenStgAlt 3-tuple --> record
StgToCmm.Bind: GenStgAlt 3-tuple --> record
StgToByteCode: GenStgAlt 3-tuple --> record
Stg.Lint: GenStgAlt 3-tuple --> record
Stg.Syntax: strictify GenStgAlt
GenStgAlt: add haddock, some cleanup
fixup: remove calls to pure, single ViewPattern
StgToByteCode: use case over viewpatterns | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This is a one line change. It is a fixup from MR!7325, was pointed out
in review of MR!7442, specifically: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7442#note_406581
The change removes isNCG check from cgTopBinding. Instead it changes the
type of binBlobThresh in DynFlags from Word to Maybe Word, where a Just
0 or a Nothing indicates an infinite threshold and thus the disable
CmmFileEmbed case in the original check.
This improves the cohesion of the module because more NCG related
Backend stuff is moved into, and checked in, StgToCmm.Config. Note, that
the meaning of a Just 0 or a Nothing in binBlobThresh is indicated in a
comment next to its field in GHC.StgToCmm.Config.
DynFlags: binBlobThresh: Word -> Maybe Word
StgToCmm.Config: binBlobThesh add not ncg check
DynFlags.binBlob: move Just 0 check to dflags init
StgToCmm.binBlob: only check isNCG, Just 0 check to dflags
StgToCmm.Config: strictify binBlobThresh | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This adds a number of changes to ticky-ticky profiling.
When an executable is profiled with IPE profiling it's now possible to
associate id-related ticky counters to their source location.
This works by emitting the info table address as part of the counter
which can be looked up in the IPE table.
Add a `-ticky-ap-thunk` flag. This flag prevents the use of some standard thunks
which are precompiled into the RTS. This means reduced cache locality
and increased code size. But it allows better attribution of execution
cost to specific source locations instead of simple attributing it to
the standard thunk.
ticky-ticky now uses the `arg` field to emit additional information
about counters in json format. When ticky-ticky is used in combination
with the eventlog eventlog2html can be used to generate a html table
from the eventlog similar to the old text output for ticky-ticky. | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | As noted in #21071 we were missing adding this edge so there were
situations where the .hs file would get compiled before the .hs-boot
file which leads to issues with -j.
I fixed this properly by adding the edge in downsweep so the definition
of nodeDependencies can be simplified to avoid adding this dummy edge
in.
There are plenty of tests which seem to have these redundant boot files
anyway so no new test. #21094 tracks the more general issue of
identifying redundant hs-boot and SOURCE imports. | 
| | 
| 
| 
| 
| 
| 
| 
| | Remove these smart constructors for these reasons:
* mkLocalClosureTableLabel : Does the same as the non-local variant.
* mkLocalClosureLabel      : Does the same as the non-local variant.
* mkLocalInfoTableLabel    : Decide if we make a local label based on the name
                             and just use mkInfoTableLabel everywhere. | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | Tag inference included a way to collect stats about avoided tag-checks.
This was dony by emitting "dummy" ticky entries with counts corresponding
to predicted/unpredicated tag checks.
This behaviour for ticky is now gated behind -fticky-tag-checks.
I also documented ticky-LNE in the process. | 
| | 
| 
| 
| 
| 
| 
| | operations (#20214)
Previously, when trying to load module with SIMD vector operations, ghci would panic
in 'GHC.StgToByteCode.findPushSeq'. Now, a more helpful message is displayed. | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This does three major things:
* Enforce the invariant that all strict fields must contain tagged
pointers.
* Try to predict the tag on bindings in order to omit tag checks.
* Allows functions to pass arguments unlifted (call-by-value).
The former is "simply" achieved by wrapping any constructor allocations with
a case which will evaluate the respective strict bindings.
The prediction is done by a new data flow analysis based on the STG
representation of a program. This also helps us to avoid generating
redudant cases for the above invariant.
StrictWorkers are created by W/W directly and SpecConstr indirectly.
See the Note [Strict Worker Ids]
Other minor changes:
* Add StgUtil module containing a few functions needed by, but
  not specific to the tag analysis.
-------------------------
Metric Decrease:
	T12545
	T18698b
	T18140
	T18923
        LargeRecord
Metric Increase:
        LargeRecord
	ManyAlternatives
	ManyConstructors
	T10421
	T12425
	T12707
	T13035
	T13056
	T13253
	T13253-spj
	T13379
	T15164
	T18282
	T18304
	T18698a
	T1969
	T20049
	T3294
	T4801
	T5321FD
	T5321Fun
	T783
	T9233
	T9675
	T9961
	T19695
	WWRec
------------------------- | 
| | 
| 
| 
| 
| | This allows cost centres to be inserted after the core optimization
pipeline has run. | 
| | 
| 
| 
| 
| | `DynFlags` is gone, but let's move a few trivial things around to get
rid of its module too. | 
| | |  | 
| | |  | 
| | |  | 
| | |  | 
| | |  | 
| | |  | 
| | |  | 
| | 
| 
| 
| 
| 
| | This was achieved with
    git ls-tree --name-only HEAD -r | xargs sed -i -e 's/note \[/Note \[/g' | 
| | 
| 
| 
| 
| 
| 
| | @Bodigrim noticed that the `compareByteArray#` bounds-checking logic had
flipped arguments and an off-by-one. For the sake of clarity I also
refactored occurrences of `cmmOffset` to rather use `cmmOffsetB`. I
suspect the former should be retired. | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | StgToCmm: add Config, remove CgInfoDownwards
StgToCmm: runC api change to take StgToCmmConfig
StgToCmm: CgInfoDownad -> StgToCmmConfig
StgToCmm.Monad: update getters/setters/withers
StgToCmm: remove CallOpts in StgToCmm.Closure
StgToCmm: remove dynflag references
StgToCmm: PtrOpts removed
StgToCmm: add TMap to config, Prof - dynflags
StgToCmm: add omit yields to config
StgToCmm.ExtCode: remove redundant import
StgToCmm.Heap: remove references to dynflags
StgToCmm: codeGen api change, DynFlags -> Config
StgToCmm: remove dynflags in Env and StgToCmm
StgToCmm.DataCon: remove dynflags references
StgToCmm: remove dynflag references in DataCon
StgToCmm: add backend avx flags to config
StgToCmm.Prim: remove dynflag references
StgToCmm.Expr: remove dynflag references
StgToCmm.Bind: remove references to dynflags
StgToCmm: move DoAlignSanitisation to Cmm.Type
StgToCmm: remove PtrOpts in Cmm.Parser.y
DynFlags: update ipInitCode api
StgToCmm: Config Module is single source of truth
StgToCmm: Lazy config breaks IORef deadlock
testsuite: bump countdeps threshold
StgToCmm.Config: strictify fields except UpdFrame
Strictifying UpdFrameOffset causes the RTS build with stage1 to
deadlock. Additionally, before the deadlock performance of the RTS
is noticeably slower.
StgToCmm.Config: add field descriptions
StgToCmm: revert strictify on Module in config
testsuite: update CountDeps tests
StgToCmm: update comment, fix exports
Specifically update comment about loopification passed into dynflags
then stored into stgToCmmConfig. And remove getDynFlags from
Monad.hs exports
Types.Name: add pprFullName function
StgToCmm.Ticky: use pprFullname, fixup ExtCode imports
Cmm.Info: revert cmmGetClosureType removal
StgToCmm.Bind: use pprFullName, Config update comments
StgToCmm: update closureDescription api
StgToCmm: SAT altHeapCheck
StgToCmm: default render for Info table, ticky
Use default rendering contexts for info table and ticky ticky, which should be independent of command line input.
testsuite: bump count deps
pprFullName: flag for ticky vs normal style output
convertInfoProvMap: remove unused parameter
StgToCmm.Config: add backend flags to config
StgToCmm.Config: remove Backend from Config
StgToCmm.Prim: refactor Backend call sites
StgToCmm.Prim: remove redundant imports
StgToCmm.Config: refactor vec compatibility check
StgToCmm.Config: add allowQuotRem2 flag
StgToCmm.Ticky: print internal names with parens
StgToCmm.Bind: dispatch ppr based on externality
StgToCmm: Add pprTickyname, Fix ticky naming
Accidently removed the ctx for ticky SDoc output. The only relevant flag
is sdocPprDebug which was accidental set to False due to using
defaultSDocContext without altering the flag.
StgToCmm: remove stateful fields in config
fixup: config: remove redundant imports
StgToCmm: move Sequel type to its own module
StgToCmm: proliferate getCallMethod updated api
StgToCmm.Monad: add FCodeState to Monad Api
StgToCmm: add second reader monad to FCode
fixup: Prim.hs: missed a merge conflict
fixup: Match countDeps tests to HEAD
StgToCmm.Monad: withState -> withCgState
To disambiguate it from mtl withState. This withState shouldn't be
returning the new state as a value. However, fixing this means tackling
the knot tying in CgState and so is very difficult since it changes when
the thunk of the knot is forced which either leads to deadlock or to
compiler panic. | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This patch makes the following types levity-polymorphic in their
last argument:
  - Array# a, SmallArray# a, Weak# b, StablePtr# a, StableName# a
  - MutableArray# s a, SmallMutableArray# s a,
    MutVar# s a, TVar# s a, MVar# s a, IOPort# s a
The corresponding primops are also made levity-polymorphic, e.g.
`newArray#`, `readArray#`, `writeMutVar#`, `writeIOPort#`, etc.
Additionally, exception handling functions such as `catch#`, `raise#`,
`maskAsyncExceptions#`,... are made levity/representation-polymorphic.
Now that Array# and MutableArray# also work with unlifted types,
we can simply re-define ArrayArray# and MutableArrayArray# in terms
of them. This means that ArrayArray# and MutableArrayArray# are no
longer primitive types, but simply unlifted newtypes around Array# and
MutableArrayArray#.
This completes the implementation of the Pointer Rep proposal
  https://github.com/ghc-proposals/ghc-proposals/pull/203
Fixes #20911
-------------------------
Metric Increase:
    T12545
-------------------------
-------------------------
Metric Decrease:
    T12545
------------------------- | 
| | 
| 
| 
| 
| 
| 
| | Here we introduce code generator support for instrument array primops
with bounds checking, enabled with the `-fcheck-prim-bounds` flag.
Introduced to debug #20769. | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | isUnliftedTyCon was used in three places: Ticky, Template Haskell
and FFI checks.
It was straightforward to remove it from Ticky and Template Haskell.
It is now used in FFI only and renamed to marshalablePrimTyCon.
Previously, it was fetching information from a field
in PrimTyCon called is_unlifted. Instead, I've changed the code
to compute liftedness based on the kind.
isFFITy and legalFFITyCon are removed. They were only referred from
an old comment that I removed.
There were three functions to define a PrimTyCon, but the only difference
was that they were setting is_unlifted to True or False.
Everything is now done in mkPrimTyCon.
I also added missing integer types in Ticky.hs, I think it was an oversight.
Fixes #20401 | 
| | 
| 
| 
| | comparison (#20088) | 
| | |  | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Fixes #20541 by making mkTyConApp do more sharing of types.
In particular, replace
* BoxedRep Lifted    ==>  LiftedRep
* BoxedRep Unlifted  ==>  UnliftedRep
* TupleRep '[]       ==>  ZeroBitRep
* TYPE ZeroBitRep    ==>  ZeroBitType
In each case, the thing on the right is a type synonym
for the thing on the left, declared in ghc-prim:GHC.Types.
See Note [Using synonyms to compress types] in GHC.Core.Type.
The synonyms for ZeroBitRep and ZeroBitType are new, but absolutely
in the same spirit as the other ones.   (These synonyms are mainly
for internal use, though the programmer can use them too.)
I also renamed GHC.Core.Ty.Rep.isVoidTy to isZeroBitTy, to be
compatible with the "zero-bit" nomenclature above.  See discussion
on !6806.
There is a tricky wrinkle: see GHC.Core.Types
  Note [Care using synonyms to compress types]
Compiler allocation decreases by up to 0.8%. | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | Previously registration of ticky entry counters was racy, performing a
read-modify-write to add the new counter to the ticky_entry_ctrs list.
This could result in the list becoming cyclic if multiple threads
entered the same closure simultaneously.
Fixes #20451. | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Emit an Info Table Provenance Entry (IPE) for every stack represeted info table
if -finfo-table-map is turned on.
To decode a cloned stack, lookupIPE() is used. It provides a mapping between
info tables and their source location.
Please see these notes for details:
- [Stacktraces from Info Table Provenance Entries (IPE based stack unwinding)]
- [Mapping Info Tables to Source Positions]
Metric Increase:
    T12545 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Add `StackSnapshot#` primitive type that represents a cloned stack (StgStack).
The cloning interface consists of two functions, that clone either the treads
own stack (cloneMyStack) or another threads stack (cloneThreadStack).
The stack snapshot is offline/cold, i.e. it isn't evaluated any further. This is
useful for analyses as it prevents concurrent modifications.
For technical details, please see Note [Stack Cloning].
Co-authored-by: Ben Gamari <bgamari.foss@gmail.com>
Co-authored-by: Matthew Pickering <matthewtpickering@gmail.com> | 
| | 
| 
| 
| 
| 
| | NCG needs to call slow FFI functions where we "borrow" the C compiler's
implementation, but there is no reason why we need to do that for LLVM,
or the unregisterized backend where everything is via C anyways! | 
| | 
| 
| 
| | Closes #20275 | 
| | 
| 
| 
| 
| 
| 
| 
| | This prepares us to actually use them when the native size is 64 bits
too.
I more than saitisfied my curiosity finding they were gated since
47774449c9d66b768a70851fe82c5222c1f60689. | 
| | 
| 
| 
| 
| 
| | StgToCmm was only using literals signedness to determine whether using
Int and Word range in Cmm switches. Now that we have sized literals
(Int8#, Int16#, etc.), it needs to take their ranges into account. | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Using a hash map reduces the complexity of lookupIPE(), making it non linear.
On registration each IPE list is added to a temporary IPE lists buffer, reducing
registration time. The hash map is built lazily on first lookup.
IPE event output to stderr is added with tests.
For details, please see
Note [The Info Table Provenance Entry (IPE) Map].
A performance test for IPE registration and lookup can be found here:
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/5724#note_370806 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | In order to make the packages in this repo "reinstallable", we need to
associate source code with a specific packages. Having a top level
`/includes` dir that mixes concerns (which packages' includes?) gets in
the way of this.
To start, I have moved everything to `rts/`, which is mostly correct.
There are a few things however that really don't belong in the rts (like
the generated constants haskell type, `CodeGen.Platform.h`). Those
needed to be manually adjusted.
Things of note:
 - No symlinking for sake of windows, so we hard-link at configure time.
 - `CodeGen.Platform.h` no longer as `.hs` extension (in addition to
   being moved to `compiler/`) so as not to confuse anyone, since it is
   next to Haskell files.
 - Blanket `-Iincludes` is gone in both build systems, include paths now
   more strictly respect per-package dependencies.
 - `deriveConstants` has been taught to not require a `--target-os` flag
   when generating the platform-agnostic Haskell type. Make takes
   advantage of this, but Hadrian has yet to. | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | PPC NCG: Implement CAS inline for 32 and 64 bit
testsuite: Add tests for smaller atomic CAS
X86 NCG: Catch calls to CAS C fallback
Primops: Add atomicCasWord[8|16|32|64]Addr#
Add tests for atomicCasWord[8|16|32|64]Addr#
Add changelog entry for new primops
X86 NCG: Fix MO-Cmpxchg W64 on 32-bit arch
ghc-prim: 64-bit CAS C fallback on all archs | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | fixes #9192 and #17126
updates containers submodule
1. Changes the type of the primop `reallyUnsafePtrEquality#` to the most
general version possible (heterogeneous as well as levity-polymorphic):
> reallyUnsafePtrEquality#
>   :: forall {l :: Levity} {k :: Levity}
>        (a :: TYPE (BoxedRep l)) (b :: TYPE (BoxedRep k))
>   . a -> b -> Int#
2. Adds a new internal module, `GHC.Ext.PtrEq`, which contains pointer
equality operations that are now subsumed by `reallyUnsafePtrEquality#`.
These functions are then re-exported by `GHC.Exts` (so that no function
goes missing from the export list of `GHC.Exts`, which is user-facing).
More specifically, `GHC.Ext.PtrEq` defines:
  - A new function:
    * reallyUnsafePtrEquality :: forall (a :: Type). a -> a -> Int#
  - Library definitions of ex-primops:
     * `sameMutableArray#`
     * `sameSmallMutableArray`
     * `sameMutableByteArray#`
     * `sameMutableArrayArray#`
     * `sameMutVar#`
     * `sameTVar#`
     * `sameMVar#`
     * `sameIOPort#`
     * `eqStableName#`
  - New functions for comparing non-mutable arrays:
     * `sameArray#`
     * `sameSmallArray#`
     * `sameByteArray#`
     * `sameArrayArray#`
  These were requested in #9192.
Generally speaking, existing libraries that
use `reallyUnsafePtrEquality#` will continue to work with the new,
levity-polymorphic version. But not all!
Some (`containers`, `unordered-containers`, `dependent-map`) contain
the following:
> unsafeCoerce# reallyUnsafePtrEquality# a b
If we make `reallyUnsafePtrEquality#` levity-polymorphic, this code
fails the current GHC representation-polymorphism checks.
We agreed that the right solution here is to modify the library;
in this case by deleting the call to `unsafeCoerce#`,
since `reallyUnsafePtrEquality#` is now type-heterogeneous too. | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Word64#/Int64# are only used on 32-bit architectures. Before this patch,
operations on these types were directly using the FFI. Now we use real
primops that are then lowered into ccalls.
The advantage of doing this is that we can now perform constant folding on
Word64#/Int64# (#19024).
Most of this work was done by John Ericson in !3658. However this patch
doesn't go as far as e.g. changing Word64 to always be using Word64#.
Noticeable performance improvements
          T9203(normal) run/alloc    89870808.0    66662456.0 -25.8% GOOD
  haddock.Cabal(normal) run/alloc 14215777340.8 12780374172.0 -10.1% GOOD
   haddock.base(normal) run/alloc 15420020877.6 13643834480.0 -11.5% GOOD
Metric Decrease:
    T9203
    haddock.Cabal
    haddock.base | 
| | 
| 
| 
| | fixes #19628 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | Previously the code generator's logic for invoking the nonmoving write
barrier was inconsistent with the write barrier itself. Namely, the code
generator treated the header size argument as being in words whereas the
barrier expected bytes. This was the cause of #19715.
Fixes #19715. | 
| | 
| 
| 
| 
| 
| 
| 
| | Now that Outputable is independent of DynFlags, we can put tracing
functions using SDocs into their own module that doesn't transitively
depend on any GHC.Driver.* module.
A few modules needed to be moved to avoid loops in DEBUG mode. | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | As #19882 pointed out, we were simply doing rubbish literals wrong.
(I'll refrain from explaining the wrong-ness here -- see the ticket.)
This patch fixes it by adding a Type (of kind RuntimeRep) as field of
LitRubbish, rather than [PrimRep].
The Note [Rubbish literals] in GHC.Types.Literal explains the details. | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | In which we add a new code generator to the Glasgow Haskell
Compiler. This codegen supports ELF and Mach-O targets, thus covering
Linux, macOS, and BSDs in principle.  It was tested only on macOS and
Linux.  The NCG follows a similar structure as the other native code
generators we already have, and should therfore be realtively easy to
follow.
It supports most of the features required for a proper native code
generator, but does not claim to be perfect or fully optimised.  There
are still opportunities for optimisations.
Metric Decrease:
    ManyAlternatives
    ManyConstructors
    MultiLayerModules
    PmSeriesG
    PmSeriesS
    PmSeriesT
    PmSeriesV
    T10421
    T10421a
    T10858
    T11195
    T11276
    T11303b
    T11374
    T11822
    T12227
    T12545
    T12707
    T13035
    T13253
    T13253-spj
    T13379
    T13701
    T13719
    T14683
    T14697
    T15164
    T15630
    T16577
    T17096
    T17516
    T17836
    T17836b
    T17977
    T17977b
    T18140
    T18282
    T18304
    T18478
    T18698a
    T18698b
    T18923
    T1969
    T3064
    T5030
    T5321FD
    T5321Fun
    T5631
    T5642
    T5837
    T783
    T9198
    T9233
    T9630
    T9872d
    T9961
    WWRec
Metric Increase:
    T4801 |