| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
| |
Add StgToCmm module hierarchy. Platform modules that are used in several
other places (NCG, LLVM codegen, Cmm transformations) are put into
GHC.Platform.
|
|
|
|
|
|
|
| |
Unfortunately this will require more work; register allocation is
quite broken.
This reverts commit acd795583625401c5554f8e04ec7efca18814011.
|
|
|
|
|
|
|
| |
This adds support for constructing vector types from Float#, Double# etc
and performing arithmetic operations on them
Cleaned-Up-By: Ben Gamari <ben@well-typed.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* simplifies registers to have GPR, Float and Double, by removing the SSE2 and X87 Constructors
* makes -msse2 assumed/default for x86 platforms, fixing a long standing nondeterminism in rounding
behavior in 32bit haskell code
* removes the 80bit floating point representation from the supported float sizes
* theres still 1 tiny bit of x87 support needed,
for handling float and double return values in FFI calls wrt the C ABI on x86_32,
but this one piece does not leak into the rest of NCG.
* Lots of code thats not been touched in a long time got deleted as a
consequence of all of this
all in all, this change paves the way towards a lot of future further
improvements in how GHC handles floating point computations, along with
making the native code gen more accessible to a larger pool of contributors.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the first step of implementing:
https://github.com/ghc-proposals/ghc-proposals/pull/74
The main highlights/changes:
primops.txt.pp gets two new sections for two new primitive types for
signed and unsigned 8-bit integers (Int8# and Word8 respectively) along
with basic arithmetic and comparison operations. PrimRep/RuntimeRep get
two new constructors for them. All of the primops translate into the
existing MachOPs.
For CmmCalls the codegen will now zero-extend the values at call
site (so that they can be moved to the right register) and then truncate
them back their original width.
x86 native codegen needed some updates, since it wasn't able to deal
with the new widths, but all the changes are quite localized. LLVM
backend seems to just work.
This is the second attempt at merging this, after the first attempt in
D4475 had to be backed out due to regressions on i386.
Bumps binary submodule.
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate (on both x86-{32,64})
Reviewers: bgamari, hvr, goldfire, simonmar
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5258
|
|
|
|
|
|
|
|
|
| |
This unfortunately broke i386 support since it introduced references to
byte-sized registers that don't exist on that architecture.
Reverts binary submodule
This reverts commit 5d5307f943d7581d7013ffe20af22233273fba06.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the first step of implementing:
https://github.com/ghc-proposals/ghc-proposals/pull/74
The main highlights/changes:
- `primops.txt.pp` gets two new sections for two new primitive types
for signed and unsigned 8-bit integers (`Int8#` and `Word8`
respectively) along with basic arithmetic and comparison
operations. `PrimRep`/`RuntimeRep` get two new constructors for
them. All of the primops translate into the existing `MachOP`s.
- For `CmmCall`s the codegen will now zero-extend the values at call
site (so that they can be moved to the right register) and then
truncate them back their original width.
- x86 native codegen needed some updates, since it wasn't able to deal
with the new widths, but all the changes are quite localized. LLVM
backend seems to just work.
Bumps binary submodule.
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate with new tests
Reviewers: hvr, goldfire, bgamari, simonmar
Subscribers: Abhiroop, dfeuer, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4475
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds foldl' to GhcPrelude and changes must occurences
of foldl to foldl'. This leads to better performance especially
for quick builds where GHC does not perform strictness analysis.
It does change strictness behaviour when we use foldl' to turn
a argument list into function applications. But this is only a
drawback if code looks ONLY at the last argument but not at the first.
And as the benchmarks show leads to fewer allocations in practice
at O2.
Compiler performance for Nofib:
O2 Allocations:
-1 s.d. ----- -0.0%
+1 s.d. ----- -0.0%
Average ----- -0.0%
O2 Compile Time:
-1 s.d. ----- -2.8%
+1 s.d. ----- +1.3%
Average ----- -0.8%
O0 Allocations:
-1 s.d. ----- -0.2%
+1 s.d. ----- -0.1%
Average ----- -0.2%
Test Plan: ci
Reviewers: goldfire, bgamari, simonmar, tdammers, monoidal
Reviewed By: bgamari, monoidal
Subscribers: tdammers, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4929
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
An info table with an SRT normally looks like this:
StgWord64 srt_offset
StgClosureInfo layout
StgWord32 layout
StgWord32 has_srt
But we only need 32 bits for srt_offset on x86_64, because the small
memory model requires that code segments are at most 2GB. So we can
optimise this to
StgClosureInfo layout
StgWord32 layout
StgWord32 srt_offset
saving a word. We can tell whether the info table has an SRT or not,
because zero is not a valid srt_offset, so zero still indicates that
there's no SRT.
Test Plan:
* validate
* For results, see D4632.
Reviewers: bgamari, niteria, osa1, erikd
Subscribers: thomie, carter
Differential Revision: https://phabricator.haskell.org/D4634
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This change makes it possible to generate a static 32-bit relative label
offset on x86_64. Currently we can only generate word-sized label
offsets.
This will be used in D4634 to shrink info tables. See D4632 for more
details.
Test Plan: See D4632
Reviewers: bgamari, niteria, michalt, erikd, jrtc27, osa1
Subscribers: thomie, carter
Differential Revision: https://phabricator.haskell.org/D4633
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This removes a bunch of unnecessary includes of `HsVersions.h` along
with unnecessary CPP (e.g., due to checking for DEBUG which can be
achieved by looking at `debugIsOn`)
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: ./validate
Reviewers: bgamari, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4462
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: validate
Reviewers: bgamari, erikd
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4380
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This switches the compiler/ component to get compiled with
-XNoImplicitPrelude and a `import GhcPrelude` is inserted in all
modules.
This is motivated by the upcoming "Prelude" re-export of
`Semigroup((<>))` which would cause lots of name clashes in every
modulewhich imports also `Outputable`
Reviewers: austin, goldfire, bgamari, alanz, simonmar
Reviewed By: bgamari
Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari
Differential Revision: https://phabricator.haskell.org/D3989
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes the two functions strict in the accumulator - it seems that
there are only two users of those functions: `CmmLive` and `CmmSink`
and in both cases the strict fold fits better.
The commit also removes a few unused functions (`filterRegsUsed`),
instances (for `Maybe` and `RegSet`) and gets rid of unnecessary
inculde of `HsVersions.h`.
The performance effect of avoiding unnecessary thunks is mostly
negligible, although we do allocate a tiny bit less (nofib's section
on compile allocations):
```
-1 s.d. ----- -0.2%
+1 s.d. ----- -0.1%
Average ----- -0.2%
```
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
Test Plan: validate
Reviewers: simonmar, austin, bgamari
Reviewed By: bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2723
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idea behind adding special "rubbish" arguments was in unboxed sum types
depending on the tag some arguments are not used and we don't want to move some
special values (like 0 for literals and some special pointer for boxed slots)
for those arguments (to stack locations or registers). "StgRubbishArg" was an
indicator to the code generator that the value won't be used. During Stg-to-Cmm
we were then not generating any move or store instructions at all.
This caused problems in the register allocator because some variables were only
initialized in some code paths. As an example, suppose we have this STG: (after
unarise)
Lib.$WT =
\r [dt_sit]
case
case dt_sit of {
Lib.F dt_siv [Occ=Once] ->
(#,,#) [1# dt_siv StgRubbishArg::GHC.Prim.Int#];
Lib.I dt_siw [Occ=Once] ->
(#,,#) [2# StgRubbishArg::GHC.Types.Any dt_siw];
}
of
dt_six
{ (#,,#) us_giC us_giD us_giE -> Lib.T [us_giC us_giD us_giE];
};
This basically unpacks a sum type to an unboxed sum with 3 fields, and then
moves the unboxed sum to a constructor (`Lib.T`).
This is the Cmm for the inner case expression (case expression in the scrutinee
position of the outer case):
ciN:
...
-- look at dt_sit's tag
if (_ciT::P64 != 1) goto ciS; else goto ciR;
ciS: -- Tag is 2, i.e. Lib.F
_siw::I64 = I64[_siu::P64 + 6];
_giE::I64 = _siw::I64;
_giD::P64 = stg_RUBBISH_ENTRY_info;
_giC::I64 = 2;
goto ciU;
ciR: -- Tag is 1, i.e. Lib.I
_siv::P64 = P64[_siu::P64 + 7];
_giD::P64 = _siv::P64;
_giC::I64 = 1;
goto ciU;
Here one of the blocks `ciS` and `ciR` is executed and then the execution
continues to `ciR`, but only `ciS` initializes `_giE`, in the other branch
`_giE` is not initialized, because it's "rubbish" in the STG and so we don't
generate an assignment during code generator. The code generator then panics
during the register allocations:
ghc-stage1: panic! (the 'impossible' happened)
(GHC version 8.1.20160722 for x86_64-unknown-linux):
LocalReg's live-in to graph ciY {_giE::I64}
(`_giD` is also "rubbish" in `ciS`, but it's still initialized because it's a
pointer slot, we have to initialize it otherwise garbage collector follows the
pointer to some random place. So we only remove assignment if the "rubbish" arg
has unboxed type.)
This patch removes `StgRubbishArg` and `CmmArg`. We now always initialize
rubbish slots. If the slot is for boxed types we use the existing `absentError`,
otherwise we initialize the slot with literal 0.
Reviewers: simonpj, erikd, austin, simonmar, bgamari
Reviewed By: erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2446
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch implements primitive unboxed sum types, as described in
https://ghc.haskell.org/trac/ghc/wiki/UnpackedSumTypes.
Main changes are:
- Add new syntax for unboxed sums types, terms and patterns. Hidden
behind `-XUnboxedSums`.
- Add unlifted unboxed sum type constructors and data constructors,
extend type and pattern checkers and desugarer.
- Add new RuntimeRep for unboxed sums.
- Extend unarise pass to translate unboxed sums to unboxed tuples right
before code generation.
- Add `StgRubbishArg` to `StgArg`, and a new type `CmmArg` for better
code generation when sum values are involved.
- Add user manual section for unboxed sums.
Some other changes:
- Generalize `UbxTupleRep` to `MultiRep` and `UbxTupAlt` to
`MultiValAlt` to be able to use those with both sums and tuples.
- Don't use `tyConPrimRep` in `isVoidTy`: `tyConPrimRep` is really
wrong, given an `Any` `TyCon`, there's no way to tell what its kind
is, but `kindPrimRep` and in turn `tyConPrimRep` returns `PtrRep`.
- Fix some bugs on the way: #12375.
Not included in this patch:
- Update Haddock for new the new unboxed sum syntax.
- `TemplateHaskell` support is left as future work.
For reviewers:
- Front-end code is mostly trivial and adapted from unboxed tuple code
for type checking, pattern checking, renaming, desugaring etc.
- Main translation routines are in `RepType` and `UnariseStg`.
Documentation in `UnariseStg` should be enough for understanding
what's going on.
Credits:
- Johan Tibell wrote the initial front-end and interface file
extensions.
- Simon Peyton Jones reviewed this patch many times, wrote some code,
and helped with debugging.
Reviewers: bgamari, alanz, goldfire, RyanGlScott, simonpj, austin,
simonmar, hvr, erikd
Reviewed By: simonpj
Subscribers: Iceland_jack, ggreif, ezyang, RyanGlScott, goldfire,
thomie, mpickering
Differential Revision: https://phabricator.haskell.org/D2259
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: validate
Reviewers: austin, bgamari, simonmar
Reviewed By: bgamari, simonmar
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2351
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Ord Unique can be a source of invisible, accidental
nondeterminism as explained in Note [No Ord for Unique].
This removes it, leaving a note with rationale.
It's unfortunate that I had to write Ord instances for
codegen data structures by hand, but I believe that it's a
right trade-off here.
Test Plan: ./validate
Reviewers: simonmar, austin, bgamari
Reviewed By: simonmar
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2370
GHC Trac Issues: #4012
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This implements the ideas originally put forward in
"System FC with Explicit Kind Equality" (ICFP'13).
There are several noteworthy changes with this patch:
* We now have casts in types. These change the kind
of a type. See new constructor `CastTy`.
* All types and all constructors can be promoted.
This includes GADT constructors. GADT pattern matches
take place in type family equations. In Core,
types can now be applied to coercions via the
`CoercionTy` constructor.
* Coercions can now be heterogeneous, relating types
of different kinds. A coercion proving `t1 :: k1 ~ t2 :: k2`
proves both that `t1` and `t2` are the same and also that
`k1` and `k2` are the same.
* The `Coercion` type has been significantly enhanced.
The documentation in `docs/core-spec/core-spec.pdf` reflects
the new reality.
* The type of `*` is now `*`. No more `BOX`.
* Users can write explicit kind variables in their code,
anywhere they can write type variables. For backward compatibility,
automatic inference of kind-variable binding is still permitted.
* The new extension `TypeInType` turns on the new user-facing
features.
* Type families and synonyms are now promoted to kinds. This causes
trouble with parsing `*`, leading to the somewhat awkward new
`HsAppsTy` constructor for `HsType`. This is dispatched with in
the renamer, where the kind `*` can be told apart from a
type-level multiplication operator. Without `-XTypeInType` the
old behavior persists. With `-XTypeInType`, you need to import
`Data.Kind` to get `*`, also known as `Type`.
* The kind-checking algorithms in TcHsType have been significantly
rewritten to allow for enhanced kinds.
* The new features are still quite experimental and may be in flux.
* TODO: Several open tickets: #11195, #11196, #11197, #11198, #11203.
* TODO: Update user manual.
Tickets addressed: #9017, #9173, #7961, #10524, #8566, #11142.
Updates Haddock submodule.
|
|
|
|
|
|
|
|
|
|
| |
We will need to use these to setup proper unwinding information for the
stg_stop_thread closure. This pokes a hole in the STG abstraction,
exposing the machine's stack pointer register so that we can accomplish
this. We also expose a dummy return address register, which corresponds
to the register used to hold the DWARF return address.
Differential Revision: https://phabricator.haskell.org/D1225
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
On x86_64, commit e2f6bbd3a27685bc667655fdb093734cb565b4cf assigned
the STG registers F1 and D1 the same hardware register (xmm1), and
the same for the registers F2 and D2, etc. When mixing calls to
functions involving Float#s and Double#s, this can cause wrong Cmm
optimizations that assume the F1 and D1 registers are independent.
Reviewers: simonpj, austin
Reviewed By: austin
Subscribers: simonpj, thomie, bgamari
Differential Revision: https://phabricator.haskell.org/D993
GHC Trac Issues: #10521
|
|
|
|
| |
-fwarn-redundant-constraints
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The purpose of silent superclass parameters was to solve the
awkward problem of superclass dictinaries being bound to bottom.
See THE PROBLEM in Note [Recursive superclasses] in TcInstDcls
Although the silent-superclass idea worked,
* It had non-local consequences, and had effects even in Haddock,
where we had to discard silent parameters before displaying
instance declarations
* It had unexpected peformance costs, shown up by Trac #3064 and its
test case. In monad-transformer code, when constructing a Monad
dictionary you had to pass an Applicative dictionary; and to
construct that you neede a Functor dictionary. Yet these extra
dictionaries were often never used. (All this got much worse when
we added Applicative as a superclass of Monad.) Test T3064
compiled *far* faster after silent superclasses were eliminated.
* It introduced new bugs. For example SilentParametersOverlapping,
T5051, and T7862, all failed to compile because of instance overlap
directly because of the silent-superclass trick.
So this patch takes a new approach, which I worked out with Dimitrios
in the closing hours before Christmas. It is described in detail
in THE PROBLEM in Note [Recursive superclasses] in TcInstDcls.
Seems to work great!
Quite a bit of knock-on effect
* The main implementation work is in tcSuperClasses in TcInstDcls
Everything else is fall-out
* IdInfo.DFunId no longer needs its n-silent argument
* Ditto IDFunId in IfaceSyn
* Hence interface file format changes
* Now that DFunIds do not have silent superclass parameters, printing
out instance declarations is simpler. There is tiny knock-on effect
in Haddock, so that submodule is updated
* I realised that when computing the "size of a dictionary type"
in TcValidity.sizePred, we should be rather conservative about
type functions, which can arbitrarily increase the size of a type.
Hence the new datatype TypeSize, which has a TSBig constructor for
"arbitrarily big".
* instDFunType moves from TcSMonad to Inst
* Interestingly, CmmNode and CmmExpr both now need a non-silent
(Ord r) in a couple of instance declarations. These were previously
silent but must now be explicit.
* Quite a bit of wibbling in error messages
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In some cases, the layout of the LANGUAGE/OPTIONS_GHC lines has been
reorganized, while following the convention, to
- place `{-# LANGUAGE #-}` pragmas at the top of the source file, before
any `{-# OPTIONS_GHC #-}`-lines.
- Moreover, if the list of language extensions fit into a single
`{-# LANGUAGE ... -#}`-line (shorter than 80 characters), keep it on one
line. Otherwise split into `{-# LANGUAGE ... -#}`-lines for each
individual language extension. In both cases, try to keep the
enumeration alphabetically ordered.
(The latter layout is preferable as it's more diff-friendly)
While at it, this also replaces obsolete `{-# OPTIONS ... #-}` pragma
occurences by `{-# OPTIONS_GHC ... #-}` pragmas.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
| |
This patch adds support for 6 XMM registers on x86-64 which overlap with the F
and D registers and may hold 128-bit wide SIMD vectors. Because there is not a
good way to attach type information to STG registers, we aggressively bitcast in
the LLVM back-end.
|
| |
|
|
|
|
|
|
| |
We would like to calculate register liveness for global registers as well as
local registers, so this patch generalizes the existing infrastructure to set
the stage.
|
| |
|
|
|
|
|
|
| |
I've switched to passing DynFlags rather than Platform, as (a) it's
simpler to not have to extract targetPlatform in so many places, and
(b) it may be useful to have DynFlags around in future.
|
| |
|
| |
|
|
|
|
| |
aargh.
|
| |
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* origin/master: (756 commits)
don't crash if argv[0] == NULL (#7037)
-package P was loading all versions of P in GHCi (#7030)
Add a Note, copying text from #2437
improve the --help docs a bit (#7008)
Copy Data.HashTable's hashString into our Util module
Build fix
Build fixes
Parse error: suggest brackets and indentation.
Don't build the ghc DLL on Windows; works around trac #5987
On Windows, detect if DLLs have too many symbols; trac #5987
Add some more Integer rules; fixes #6111
Fix PA dfun construction with silent superclass args
Add silent superclass parameters to the vectoriser
Add silent superclass parameters (again)
Mention Generic1 in the user's guide
Make the GHC API a little more powerful.
tweak llvm version warning message
New version of the patch for #5461.
Fix Word64ToInteger conversion rule.
Implemented feature request on reconfigurable pretty-printing in GHCi (#5461)
...
Conflicts:
compiler/basicTypes/UniqSupply.lhs
compiler/cmm/CmmBuildInfoTables.hs
compiler/cmm/CmmLint.hs
compiler/cmm/CmmOpt.hs
compiler/cmm/CmmPipeline.hs
compiler/cmm/CmmStackLayout.hs
compiler/cmm/MkGraph.hs
compiler/cmm/OldPprCmm.hs
compiler/codeGen/CodeGen.lhs
compiler/codeGen/StgCmm.hs
compiler/codeGen/StgCmmBind.hs
compiler/codeGen/StgCmmLayout.hs
compiler/codeGen/StgCmmUtils.hs
compiler/main/CodeOutput.lhs
compiler/main/HscMain.hs
compiler/nativeGen/AsmCodeGen.lhs
compiler/simplStg/SimplStg.lhs
|
| | |
|
| | |
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Also:
- improvements to code generation: push slow-call continuations
on the stack instead of generating explicit continuations
- remove unused CmmInfo wrapper type (replace with CmmInfoTable)
- squash Area and AreaId together, remove now-unused RegSlot
- comment out old unused stack-allocation code that no longer
compiles after removal of RegSlot
|
| | |
|
| | |
|
| | |
|
|/ |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This means that both time and heap profiling work for parallel
programs. Main internal changes:
- CCCS is no longer a global variable; it is now another
pseudo-register in the StgRegTable struct. Thus every
Capability has its own CCCS.
- There is a new built-in CCS called "IDLE", which records ticks for
Capabilities in the idle state. If you profile a single-threaded
program with +RTS -N2, you'll see about 50% of time in "IDLE".
- There is appropriate locking in rts/Profiling.c to protect the
shared cost-centre-stack data structures.
This patch does enough to get it working, I have cut one big corner:
the cost-centre-stack data structure is still shared amongst all
Capabilities, which means that multiple Capabilities will race when
updating the "allocations" and "entries" fields of a CCS. Not only
does this give unpredictable results, but it runs very slowly due to
cache line bouncing.
It is strongly recommended that you use -fno-prof-count-entries to
disable the "entries" count when profiling parallel programs. (I shall
add a note to this effect to the docs).
|
|
|
|
|
| |
We only use it for "compiler" sources, i.e. not for libraries.
Many modules have a -fno-warn-tabs kludge for now.
|