| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unlike other targets, wasm requires the function signature of the call
site and callee to strictly match. So in Cmm, when we call a C
function that actually returns a value, we need to add an _unused
local variable to receive it, otherwise type error awaits.
An even bigger problem is calling variadic functions like barf() and
such. Cmm doesn't support CAPI calling convention yet, so calls to
variadic functions just happen to work in some cases with some
target's ABI. But again, it doesn't work with wasm. Fortunately, the
wasm C ABI lowers varargs to a stack pointer argument, and it can be
passed NULL when no other arguments are expected to be passed. So we
also add the additional unused NULL arguments to those functions, so
to fix wasm, while not affecting behavior on other targets.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements GHC proposal 313, "Delimited continuation
primops", by adding native support for delimited continuations to the
GHC RTS.
All things considered, the patch is relatively small. It almost
exclusively consists of changes to the RTS; the compiler itself is
essentially unaffected. The primops come with fairly extensive Haddock
documentation, and an overview of the implementation strategy is given
in the Notes in rts/Continuation.c.
This first stab at the implementation prioritizes simplicity over
performance. Most notably, every continuation is always stored as a
single, contiguous chunk of stack. If one of these chunks is
particularly large, it can result in poor performance, as the current
implementation does not attempt to cleverly squeeze a subset of the
stack frames into the existing stack: it must fit all at once. If this
proves to be a performance issue in practice, a cleverer strategy would
be a worthwhile target for future improvements.
|
|
|
|
| |
Fixes #21251
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Here the following changes are introduced:
- A read barrier machine op is added to Cmm.
- The order in which a closure's fields are read and written is changed.
- Memory barriers are added to RTS code to ensure correctness on
out-or-order machines with weak memory ordering.
Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this
is lowered to an instruction that ensures memory reads that occur after said
instruction in program order are not performed before reads coming before said
instruction in program order. On machines with strong memory ordering properties
(e.g. X86, SPARC in TSO mode) no such instruction is necessary, so
MO_ReadBarrier is simply erased. However, such an instruction is necessary on
weakly ordered machines, e.g. ARM and PowerPC.
Weam memory ordering has consequences for how closures are observed and mutated.
For example, consider a closure that needs to be updated to an indirection. In
order for the indirection to be safe for concurrent observers to enter, said
observers must read the indirection's info table before they read the
indirectee. Furthermore, the entering observer makes assumptions about the
closure based on its info table contents, e.g. an INFO_TYPE of IND imples the
closure has an indirectee pointer that is safe to follow.
When a closure is updated with an indirection, both its info table and its
indirectee must be written. With weak memory ordering, these two writes can be
arbitrarily reordered, and perhaps even interleaved with other threads' reads
and writes (in the absence of memory barrier instructions). Consider this
example of a bad reordering:
- An updater writes to a closure's info table (INFO_TYPE is now IND).
- A concurrent observer branches upon reading the closure's INFO_TYPE as IND.
- A concurrent observer reads the closure's indirectee and enters it. (!!!)
- An updater writes the closure's indirectee.
Here the update to the indirectee comes too late and the concurrent observer has
jumped off into the abyss. Speculative execution can also cause us issues,
consider:
- An observer is about to case on a value in closure's info table.
- The observer speculatively reads one or more of closure's fields.
- An updater writes to closure's info table.
- The observer takes a branch based on the new info table value, but with the
old closure fields!
- The updater writes to the closure's other fields, but its too late.
Because of these effects, reads and writes to a closure's info table must be
ordered carefully with respect to reads and writes to the closure's other
fields, and memory barriers must be placed to ensure that reads and writes occur
in program order. Specifically, updates to a closure must follow the following
pattern:
- Update the closure's (non-info table) fields.
- Write barrier.
- Update the closure's info table.
Observing a closure's fields must follow the following pattern:
- Read the closure's info pointer.
- Read barrier.
- Read the closure's (non-info table) fields.
This patch updates RTS code to obey this pattern. This should fix long-standing
SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting
out-of-order execution) and PowerPC. This fixes issue #15449.
Co-Authored-By: Ben Gamari <ben@well-typed.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SMALL_MUT_ARR_PTRS_FROZEN0 -> SMALL_MUT_ARR_PTRS_FROZEN_DIRTY
SMALL_MUT_ARR_PTRS_FROZEN -> SMALL_MUT_ARR_PTRS_FROZEN_CLEAN
MUT_ARR_PTRS_FROZEN0 -> MUT_ARR_PTRS_FROZEN_DIRTY
MUT_ARR_PTRS_FROZEN -> MUT_ARR_PTRS_FROZEN_CLEAN
Naming is now consistent with other CLEAR/DIRTY objects (MVAR, MUT_VAR,
MUT_ARR_PTRS).
(alternatively we could rename MVAR_DIRTY/MVAR_CLEAN etc. to MVAR0/MVAR)
Removed a few comments in Scav.c about FROZEN0 being on the mut_list
because it's now clear from the closure type.
Reviewers: bgamari, simonmar, erikd
Reviewed By: simonmar
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4784
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was blatantly wrong due to copy-paste blindness:
* labels were shadowed, which GHC doesn't warn about(!), resulting in
plainly wrong behavior
* the sharing check was omitted
* the wrong closure layout was being used
Moreover, the test wasn't being run due to its primitive dependency, so
I didn't even notice. Sillyness.
Test Plan: install `primitive`, `make test TEST=compact_small_array`
Reviewers: simonmar, erikd
Reviewed By: simonmar
Subscribers: rwbarton, thomie, carter
GHC Trac Issues: #13857.
Differential Revision: https://phabricator.haskell.org/D4702
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Provide information about paths more likely to be taken in the cmm files
used by the rts.
This leads to slightly better assembly being generated.
Reviewers: bgamari, erikd, simonmar
Subscribers: alexbiehl, rwbarton, thomie, carter
GHC Trac Issues: #14672
Differential Revision: https://phabricator.haskell.org/D4324
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: Validate
Reviewers: austin, erikd, simonmar, dfeuer
Reviewed By: dfeuer
Subscribers: rwbarton, andrewthad, thomie, dfeuer
GHC Trac Issues: #13860, #13857
Differential Revision: https://phabricator.haskell.org/D3888
|
|
|
|
| |
Our new CPP linter enforces this.
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: austin, erikd, simonmar
Reviewed By: erikd
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3486
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The change does the following:
- Add explicit declaration of exception closures
from base. C backend needs those symbols to be
visible.
- Reorder cmm functions in use order. Again C
backend needs symbol declaration/definition
before use. even for module-local cmm functions.
Fixes the following build failure:
rts_dist_HC rts/dist/build/Compact.o
In file included from /tmp/ghc3348_0/ghc_4.hc:3:0: error:
/tmp/ghc3348_0/ghc_4.hc: In function 'stg_compactAddWithSharingzh':
/tmp/ghc3348_0/ghc_4.hc:27:11: error:
error: 'stg_compactAddWorkerzh' undeclared (first use in this function)
JMP_((W_)&stg_compactAddWorkerzh);
^
...
/tmp/ghc3348_0/ghc_4.hc:230:13: error:
error: 'base_GHCziIOziException_cannotCompactMutable_closure'
undeclared (first use in this function)
R1.w = (W_)&base_GHCziIOziException_cannotCompactMutable_closure;
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Sergei Trofimovich <siarheit@google.com>
|
|
Summary:
This commit makes various improvements and addresses some issues with
Compact Regions (aka Compact Normal Forms).
This was the most important thing I wanted to fix. Compaction
previously prevented GC from running until it was complete, which
would be a problem in a multicore setting. Now, we compact using a
hand-written Cmm routine that can be interrupted at any point. When a
GC is triggered during a sharing-enabled compaction, the GC has to
traverse and update the hash table, so this hash table is now stored
in the StgCompactNFData object.
Previously, compaction consisted of a deepseq using the NFData class,
followed by a traversal in C code to copy the data. This is now done
in a single pass with hand-written Cmm (see rts/Compact.cmm). We no
longer use the NFData instances, instead the Cmm routine evaluates
components directly as it compacts.
The new compaction is about 50% faster than the old one with no
sharing, and a little faster on average with sharing (the cost of the
hash table dominates when we're doing sharing).
Static objects that don't (transitively) refer to any CAFs don't need
to be copied into the compact region. In particular this means we
often avoid copying Char values and small Int values, because these
are static closures in the runtime.
Each Compact# object can support a single compactAdd# operation at any
given time, so the Data.Compact library now enforces mutual exclusion
using an MVar stored in the Compact object.
We now get exceptions rather than killing everything with a barf()
when we encounter an object that cannot be compacted (a function, or a
mutable object). We now also detect pinned objects, which can't be
compacted either.
The Data.Compact API has been refactored and cleaned up. A new
compactSize operation returns the size (in bytes) of the compact
object.
Most of the documentation is in the Haddock docs for the compact
library, which I've expanded and improved here.
Various comments in the code have been improved, especially the main
Note [Compact Normal Forms] in rts/sm/CNF.c.
I've added a few tests, and expanded a few of the tests that were
there. We now also run the tests with GHCi, and in a new test way
that enables sanity checking (+RTS -DS).
There's a benchmark in libraries/compact/tests/compact_bench.hs for
measuring compaction speed and comparing sharing vs. no sharing.
The field totalDataW in StgCompactNFData was unnecessary.
Test Plan:
* new unit tests
* validate
* tested manually that we can compact Data.Aeson data
Reviewers: gcampax, bgamari, ezyang, austin, niteria, hvr, erikd
Subscribers: thomie, simonpj
Differential Revision: https://phabricator.haskell.org/D2751
GHC Trac Issues: #12455
|