| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to make the packages in this repo "reinstallable", we need to
associate source code with specific packages. Having a top-level
`/includes` dir that mixes concerns (which package's includes?) gets in
the way of this.
To start, I have moved everything to `rts/`, which is mostly correct.
There are a few things, however, that really don't belong in the rts (like
the generated constants Haskell type, `CodeGen.Platform.h`). Those
needed to be manually adjusted.
Things of note:
- No symlinking, for the sake of Windows, so we hard-link at configure time.
- `CodeGen.Platform.h` no longer has a `.hs` extension (in addition to
  being moved to `compiler/`), so as not to confuse anyone, since it
  sits next to Haskell files.
- Blanket `-Iincludes` is gone in both build systems, include paths now
more strictly respect per-package dependencies.
- `deriveConstants` has been taught to not require a `--target-os` flag
when generating the platform-agnostic Haskell type. Make takes
advantage of this, but Hadrian has yet to.
|
|
|
|
| |
I need to do this now, or the linter will complain when I move these files.
|
|
|
|
|
|
| |
Sized arrays cannot be used in headers that might be included from C++.
Fixes #20199.
|
|
|
|
| |
Fixes #20160.
|
|
|
|
| |
All uses of these now use ExecPage.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In which we add a new code generator to the Glasgow Haskell
Compiler. This codegen supports ELF and Mach-O targets, thus covering
Linux, macOS, and the BSDs in principle. It has been tested only on macOS
and Linux. The NCG follows a structure similar to the other native code
generators we already have, and should therefore be relatively easy to
follow.
It supports most of the features required for a proper native code
generator, but does not claim to be perfect or fully optimised. There
are still opportunities for optimisation.
Metric Decrease:
ManyAlternatives
ManyConstructors
MultiLayerModules
PmSeriesG
PmSeriesS
PmSeriesT
PmSeriesV
T10421
T10421a
T10858
T11195
T11276
T11303b
T11374
T11822
T12227
T12545
T12707
T13035
T13253
T13253-spj
T13379
T13701
T13719
T14683
T14697
T15164
T15630
T16577
T17096
T17516
T17836
T17836b
T17977
T17977b
T18140
T18282
T18304
T18478
T18698a
T18698b
T18923
T1969
T3064
T5030
T5321FD
T5321Fun
T5631
T5642
T5837
T783
T9198
T9233
T9630
T9872d
T9961
WWRec
Metric Increase:
T4801
|
|
|
|
|
|
|
|
|
| |
This drops allocateExec for Darwin, and replaces it with an
alloc, write, mark-executable strategy instead. This prevents
us from trying to allocate an executable range and then write to
it, which W^X will prohibit on Darwin.
This will *only* work if we can use mmap.
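Roughly, the strategy looks like this (a minimal sketch assuming POSIX
mmap/mprotect, not the RTS code itself):

    #include <string.h>
    #include <sys/mman.h>

    /* Allocate writable memory, copy the code in, then flip the mapping to
     * read/execute. The mapping is never writable and executable at once. */
    static void *alloc_exec(const void *code, size_t len)
    {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return NULL;
        memcpy(p, code, len);                      /* write while writable  */
        if (mprotect(p, len, PROT_READ | PROT_EXEC) != 0) {  /* mark exec   */
            munmap(p, len);
            return NULL;
        }
        return p;
    }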
|
|
|
|
|
|
|
|
|
| |
This function is exposed in RtsAPI.h so that external users have a
blessed way to traverse all the different `bdescr`s which are known to
the RTS.
The main motivation is to use this function in ghc-debug but avoid
having to expose the internal structure of a Capability in the API.
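A hedged sketch of how such a traversal entry point might be consumed; the
callback type, prototype, and include are assumptions here, and the
authoritative declaration is the one in RtsAPI.h:

    #include "Rts.h"   /* assumed to provide bdescr and the RTS API */

    typedef void (*BlockVisitor)(void *user, bdescr *bd);    /* assumed */
    extern void listAllBlocks(BlockVisitor visit, void *user); /* assumed */

    static void countBlock(void *user, bdescr *bd)
    {
        (void)bd;
        (*(unsigned long *)user)++;   /* just count the blocks we are shown */
    }

    /* usage: unsigned long n = 0; listAllBlocks(countBlock, &n); */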
|
|
|
|
|
| |
Having a union in the closure profiling header really just complicates
things, so let's get back to basics: we just have a single StgWord there
for now.
|
|
|
|
|
|
|
|
| |
When threadPaused blackholes a thunk it calls `OVERWRITING_CLOSURE` to
zero the slop for the benefit of the sanity checker. Previously this was
done *before* pushing the thunk's free variables to the update
remembered set. Consequently we would push zeroed pointers onto the update
remembered set.
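A simplified sketch of the corrected ordering, with illustrative (not
actual) helper names standing in for the real RTS routines:

    #include <stddef.h>

    void push_to_upd_rem_set(void *cap, void **payload, size_t n); /* assumed */
    void zero_slop(void **payload, size_t n);                      /* assumed */

    static void blackhole_thunk(void *cap, void **payload, size_t n_free_vars)
    {
        /* push the free variables while they are still intact ... */
        push_to_upd_rem_set(cap, payload, n_free_vars);
        /* ... and only then zero the slop for the sanity checker */
        zero_slop(payload, n_free_vars);
    }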
|
|
|
|
|
|
|
|
| |
This adds the necessary logic to support aarch64 on ELF, as well
as aarch64 on Mach-O, which Apple calls arm64.
We change the architecture name to AArch64, which is the official Arm
naming scheme.
|
| |
|
| |
|
|
|
|
|
|
| |
Co-authored-by: Sven Tennie <sven.tennie@gmail.com>
Co-authored-by: Matthew Pickering <matthewtpickering@gmail.com>
Co-authored-by: Ben Gamari <bgamari.foss@gmail.com>
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Previously the `current_value`, `first_watch_queue_entry`, and
`num_updates` fields of `StgTVar` were marked as `volatile` in an
attempt to provide strong ordering. Of course, this isn't sufficient.
We now use proper atomic operations. In most of these cases I strengthen
the ordering all the way to SEQ_CST although it's possible that some
could be weakened with some thought.
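An illustrative sketch of the change, using C11 atomics on simplified
stand-in fields rather than the RTS's own macros and the real StgTVar:

    #include <stdatomic.h>
    #include <stdint.h>

    /* Ordering now comes from explicit atomic operations instead of a
     * volatile qualifier on the fields. */
    typedef struct {
        _Atomic(void *)   current_value;
        _Atomic(uint64_t) num_updates;
    } TVarSketch;

    static void *read_current(TVarSketch *t)
    {
        return atomic_load_explicit(&t->current_value, memory_order_seq_cst);
    }

    static void bump_updates(TVarSketch *t)
    {
        atomic_fetch_add_explicit(&t->num_updates, 1, memory_order_seq_cst);
    }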
|
|/ |
|
|
|
|
|
| |
Also ensure that we store the info table pointer last, so that the
synchronization covers all stores.
|
| |
|
| |
|
|
|
|
|
| |
This macro is not used and has since been broken, as ENTRY_CODE was
deleted.
|
|
|
|
|
|
|
|
|
| |
When shrinking arrays in the profiling way we currently don't always zero
the leftover slop. This means we can't traverse such closures in the heap
profiler. The old Note [zeroing slop] and #8402 give some rationale for why
this is so, but I believe the reasoning doesn't apply to mutable
closures. There, users already have to ensure that multiple threads don't
step on each other's toes, so zeroing should be safe.
|
|
|
|
|
|
| |
The new ASSERT in LDV_recordDead() was being tripped up by MVars when
removeFromMVarBlockedQueue() calls OVERWRITING_CLOSURE() via
OVERWRITE_INFO().
|
|
|
|
|
|
|
|
|
|
|
| |
The code is just more confusing than it needs to be. We don't need to mix
the threaded check with the LDV profiling check, since LDV's init already
checks for this. Hence they can be two separate checks. Taking the sanity
checking into account is also cleaner via DebugFlags.sanity; there is no
need to check the DEBUG define.
The ZERO_SLOP_FOR_LDV_PROF and ZERO_SLOP_FOR_SANITY_CHECK definitions the
old code had also made things a lot more opaque IMO, so I removed them.
|
|
|
|
|
| |
The comments make it clear that LDV_recordDead should not be called for
inherently used closures, so add an assertion to codify this fact.
|
|
|
|
|
|
|
|
|
|
| |
The additional commentary introduced by commit 8916e64e5437 ("Implement
shrinkSmallMutableArray# and resizeSmallMutableArray#.") unfortunately got
this wrong. We set 'prim' to true in overwritingClosureOfs because we
_don't_ want to call LDV_recordDead().
The reason is the "inherently used" distinction made in the LDV
profiler, so I rename the variable to be more appropriate.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The heap profiler currently cannot traverse pinned blocks because of
alignment slop. This used to just be a minor annoyance as the whole block
is accounted into a special cost center rather than the respective object's
CCS, cf. #7275. However for the new root profiler we would like to be able
to visit _every_ closure on the heap. We need to do this so we can get rid
of the current 'flip' bit hack in the heap traversal code.
Since info pointers are always non-zero we can in principle skip all the
slop in the profiler if we can rely on it being zeroed. This assumption
caused problems in the past though, commit a586b33f8e ("rts: Correct
handling of LARGE ARR_WORDS in LDV profiler"), part of !1118, tried to use
the same trick for BF_LARGE objects but neglected to take into account that
shrink*Array# functions don't ensure that slop is zeroed when not
compiling with profiling.
Later, commit 0c114c6599 ("Handle large ARR_WORDS in heap census (fix
as we will only be assuming slop is zeroed when profiling is on.
This commit also reduces the amount of slop we introduce in the first
place by calculating the needed alignment before doing the allocation for
small objects, where we know the next available address. For large objects
we don't know how much alignment we'll have to do yet, since those details
are hidden behind the allocateMightFail function, so there we continue to
allocate the maximum number of additional words we might need for the
alignment.
So that we don't have to duplicate all this logic in the Cmm code, we pull
it into the RTS allocatePinned function instead.
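A minimal sketch of the small-object case, assuming the next free address
is already known and the alignment is a power of two (names illustrative):

    #include <stdint.h>

    /* Units of padding needed so an allocation starting at next_free is
     * aligned; zero when next_free is already aligned. */
    static inline uintptr_t align_slop(uintptr_t next_free, uintptr_t align)
    {
        return (align - (next_free & (align - 1))) & (align - 1);
    }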
Metric Decrease:
T7257
haddock.Cabal
haddock.base
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes #17937
Previously compacting GC simply ignored CNFs. This is mostly fine, as
most (see "What about small compacts?" below) CNF objects don't have
outgoing pointers and are "large" (allocated in large blocks), and large
objects are not moved or compacted.
However if we do GC *during* sharing-preserving compaction then the CNF
will have a hash table mapping objects that have been moved to the CNF
to their location in the CNF, to be able to preserve sharing.
This case is handled in the copying collector, in `scavenge_compact`,
where we evacuate hash table entries and then rehash the table.
Compacting GC ignored this case.
We now visit CNFs in all generations when threading pointers to the
compacted heap, and we thread hash table keys. A visited CNF is added to
the list `nfdata_chain`. After compaction is done, we re-visit the CNFs
in that list and rehash the tables.
The overhead is minimal: the list is static in `Compact.c`, and a link
field is added to the `StgCompactNFData` closure. Programs that don't
use CNFs should not be affected.
To test this, CNF tests are now also run in a new way, 'compacting_gc',
which just passes `-c` to the RTS, enabling compacting GC for the oldest
generation. Before this patch the result would be:
Unexpected failures:
compact_gc.run compact_gc [bad exit code (139)] (compacting_gc)
compact_huge_array.run compact_huge_array [bad exit code (1)] (compacting_gc)
With this patch all tests pass. I can also pass `-c -DS` without any
failures.
What about small compacts? Small CNFs are still not handled by the
compacting GC. However, so far I'm unable to write a test that triggers a
runtime panic ("update_fwd: unknown/strange object") by allocating a
small CNF in a compacted heap. It's possible that I'm missing something
and it's not possible to have a small CNF.
NoFib Results:
--------------------------------------------------------------------------------
Program Size Allocs Instrs Reads Writes
--------------------------------------------------------------------------------
CS +0.1% 0.0% 0.0% +0.0% +0.0%
CSD +0.1% 0.0% 0.0% 0.0% 0.0%
FS +0.1% 0.0% 0.0% 0.0% 0.0%
S +0.1% 0.0% 0.0% 0.0% 0.0%
VS +0.1% 0.0% 0.0% 0.0% 0.0%
VSD +0.1% 0.0% +0.0% +0.0% -0.0%
VSM +0.1% 0.0% +0.0% -0.0% 0.0%
anna +0.0% 0.0% -0.0% -0.0% -0.0%
ansi +0.1% 0.0% +0.0% +0.0% +0.0%
atom +0.1% 0.0% +0.0% +0.0% +0.0%
awards +0.1% 0.0% +0.0% +0.0% +0.0%
banner +0.1% 0.0% +0.0% +0.0% +0.0%
bernouilli +0.1% 0.0% 0.0% -0.0% +0.0%
binary-trees +0.1% 0.0% -0.0% -0.0% 0.0%
boyer +0.1% 0.0% +0.0% +0.0% +0.0%
boyer2 +0.1% 0.0% +0.0% +0.0% +0.0%
bspt +0.1% 0.0% -0.0% -0.0% -0.0%
cacheprof +0.1% 0.0% -0.0% -0.0% -0.0%
calendar +0.1% 0.0% +0.0% +0.0% +0.0%
cichelli +0.1% 0.0% +0.0% +0.0% +0.0%
circsim +0.1% 0.0% +0.0% +0.0% +0.0%
clausify +0.1% 0.0% -0.0% +0.0% +0.0%
comp_lab_zift +0.1% 0.0% +0.0% +0.0% +0.0%
compress +0.1% 0.0% +0.0% +0.0% 0.0%
compress2 +0.1% 0.0% -0.0% 0.0% 0.0%
constraints +0.1% 0.0% +0.0% +0.0% +0.0%
cryptarithm1 +0.1% 0.0% +0.0% +0.0% +0.0%
cryptarithm2 +0.1% 0.0% +0.0% +0.0% +0.0%
cse +0.1% 0.0% +0.0% +0.0% +0.0%
digits-of-e1 +0.1% 0.0% +0.0% -0.0% -0.0%
digits-of-e2 +0.1% 0.0% -0.0% -0.0% -0.0%
dom-lt +0.1% 0.0% +0.0% +0.0% +0.0%
eliza +0.1% 0.0% +0.0% +0.0% +0.0%
event +0.1% 0.0% +0.0% +0.0% +0.0%
exact-reals +0.1% 0.0% +0.0% +0.0% +0.0%
exp3_8 +0.1% 0.0% +0.0% -0.0% 0.0%
expert +0.1% 0.0% +0.0% +0.0% +0.0%
fannkuch-redux +0.1% 0.0% -0.0% 0.0% 0.0%
fasta +0.1% 0.0% -0.0% +0.0% +0.0%
fem +0.1% 0.0% -0.0% +0.0% 0.0%
fft +0.1% 0.0% -0.0% +0.0% +0.0%
fft2 +0.1% 0.0% +0.0% +0.0% +0.0%
fibheaps +0.1% 0.0% +0.0% +0.0% +0.0%
fish +0.1% 0.0% +0.0% +0.0% +0.0%
fluid +0.0% 0.0% +0.0% +0.0% +0.0%
fulsom +0.1% 0.0% -0.0% +0.0% 0.0%
gamteb +0.1% 0.0% +0.0% +0.0% 0.0%
gcd +0.1% 0.0% +0.0% +0.0% +0.0%
gen_regexps +0.1% 0.0% -0.0% +0.0% 0.0%
genfft +0.1% 0.0% +0.0% +0.0% +0.0%
gg +0.1% 0.0% 0.0% +0.0% +0.0%
grep +0.1% 0.0% -0.0% +0.0% +0.0%
hidden +0.1% 0.0% +0.0% -0.0% 0.0%
hpg +0.1% 0.0% -0.0% -0.0% -0.0%
ida +0.1% 0.0% +0.0% +0.0% +0.0%
infer +0.1% 0.0% +0.0% 0.0% -0.0%
integer +0.1% 0.0% +0.0% +0.0% +0.0%
integrate +0.1% 0.0% -0.0% -0.0% -0.0%
k-nucleotide +0.1% 0.0% +0.0% +0.0% 0.0%
kahan +0.1% 0.0% +0.0% +0.0% +0.0%
knights +0.1% 0.0% -0.0% -0.0% -0.0%
lambda +0.1% 0.0% +0.0% +0.0% -0.0%
last-piece +0.1% 0.0% +0.0% 0.0% 0.0%
lcss +0.1% 0.0% +0.0% +0.0% 0.0%
life +0.1% 0.0% -0.0% +0.0% +0.0%
lift +0.1% 0.0% +0.0% +0.0% +0.0%
linear +0.1% 0.0% -0.0% +0.0% 0.0%
listcompr +0.1% 0.0% +0.0% +0.0% +0.0%
listcopy +0.1% 0.0% +0.0% +0.0% +0.0%
maillist +0.1% 0.0% +0.0% -0.0% -0.0%
mandel +0.1% 0.0% +0.0% +0.0% 0.0%
mandel2 +0.1% 0.0% +0.0% +0.0% +0.0%
mate +0.1% 0.0% +0.0% 0.0% +0.0%
minimax +0.1% 0.0% -0.0% 0.0% -0.0%
mkhprog +0.1% 0.0% +0.0% +0.0% +0.0%
multiplier +0.1% 0.0% +0.0% 0.0% 0.0%
n-body +0.1% 0.0% +0.0% +0.0% +0.0%
nucleic2 +0.1% 0.0% +0.0% +0.0% +0.0%
para +0.1% 0.0% 0.0% +0.0% +0.0%
paraffins +0.1% 0.0% +0.0% -0.0% 0.0%
parser +0.1% 0.0% -0.0% -0.0% -0.0%
parstof +0.1% 0.0% +0.0% +0.0% +0.0%
pic +0.1% 0.0% -0.0% -0.0% 0.0%
pidigits +0.1% 0.0% +0.0% -0.0% -0.0%
power +0.1% 0.0% +0.0% +0.0% +0.0%
pretty +0.1% 0.0% -0.0% -0.0% -0.1%
primes +0.1% 0.0% -0.0% -0.0% -0.0%
primetest +0.1% 0.0% -0.0% -0.0% -0.0%
prolog +0.1% 0.0% -0.0% -0.0% -0.0%
puzzle +0.1% 0.0% -0.0% -0.0% -0.0%
queens +0.1% 0.0% +0.0% +0.0% +0.0%
reptile +0.1% 0.0% -0.0% -0.0% +0.0%
reverse-complem +0.1% 0.0% +0.0% 0.0% -0.0%
rewrite +0.1% 0.0% -0.0% -0.0% -0.0%
rfib +0.1% 0.0% +0.0% +0.0% +0.0%
rsa +0.1% 0.0% -0.0% +0.0% -0.0%
scc +0.1% 0.0% -0.0% -0.0% -0.1%
sched +0.1% 0.0% +0.0% +0.0% +0.0%
scs +0.1% 0.0% +0.0% +0.0% +0.0%
simple +0.1% 0.0% -0.0% -0.0% -0.0%
solid +0.1% 0.0% +0.0% +0.0% +0.0%
sorting +0.1% 0.0% -0.0% -0.0% -0.0%
spectral-norm +0.1% 0.0% +0.0% +0.0% +0.0%
sphere +0.1% 0.0% -0.0% -0.0% -0.0%
symalg +0.1% 0.0% -0.0% -0.0% -0.0%
tak +0.1% 0.0% +0.0% +0.0% +0.0%
transform +0.1% 0.0% +0.0% +0.0% +0.0%
treejoin +0.1% 0.0% +0.0% -0.0% -0.0%
typecheck +0.1% 0.0% +0.0% +0.0% +0.0%
veritas +0.0% 0.0% +0.0% +0.0% +0.0%
wang +0.1% 0.0% 0.0% +0.0% +0.0%
wave4main +0.1% 0.0% +0.0% +0.0% +0.0%
wheel-sieve1 +0.1% 0.0% +0.0% +0.0% +0.0%
wheel-sieve2 +0.1% 0.0% +0.0% +0.0% +0.0%
x2n1 +0.1% 0.0% +0.0% +0.0% +0.0%
--------------------------------------------------------------------------------
Min +0.0% 0.0% -0.0% -0.0% -0.1%
Max +0.1% 0.0% +0.0% +0.0% +0.0%
Geometric Mean +0.1% -0.0% -0.0% -0.0% -0.0%
Bumping numbers of nonsensical perf tests:
Metric Increase:
T12150
T12234
T12425
T13035
T5837
T6048
It's simply not possible for this patch to increase allocations, and
I've wasted enough time on these tests in the past (see #17686). I think
these tests should not be perf tests, but for now I'll bump the numbers.
|
|
|
|
|
|
|
|
| |
- Added a few comments in StgPAP
- Added a few comments and assertions in scavenge_small_bitmap and
walk_large_bitmap
- Did a tiny refactor in GHC.Data.Bitmap: added some comments, deleted
  dead code, used the PlatformWordSize type.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we used INFO_PTR_TO_STRUCT instead of
THUNK_INFO_PTR_TO_STRUCT when looking at a thunk. These two happen to be
equivalent on 64-bit architectures due to alignment considerations, but
they differ on 32-bit platforms. This led to #17487.
To fix this we also employ a small optimization: there is only one thunk
of type WHITEHOLE (namely stg_WHITEHOLE_info). Consequently, we can just
use a plain pointer comparison instead of testing against info->type.
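An illustrative sketch of that shortcut (simplified, not the exact RTS
code; the include is assumed to provide GET_INFO and stg_WHITEHOLE_info):

    #include <stdbool.h>
    #include "Rts.h"

    static bool is_whitehole(const StgClosure *p)
    {
        /* stg_WHITEHOLE_info is the only WHITEHOLE thunk, so a plain
         * pointer comparison suffices instead of decoding info->type. */
        return GET_INFO(p) == &stg_WHITEHOLE_info;
    }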
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a part of GHC Proposal #25: "Offer more array resizing primitives".
Resources related to the proposal:
- Discussion: https://github.com/ghc-proposals/ghc-proposals/pull/121
- Proposal: https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0025-resize-boxed.rst
Only shrinkSmallMutableArray# is implemented as a primop since a
library-space implementation of resizeSmallMutableArray# (in GHC.Exts)
is no less efficient than a primop would be. This may be replaced by
a primop in the future if someone devises a strategy for growing
arrays in-place. The library-space implementation always copies the
array when growing it.
This commit also tweaks the documentation of the deprecated
sizeofMutableByteArray#, removing the mention of concurrency. That
primop is unsound even in single-threaded applications. Additionally,
the non-negativity assertion on the existing shrinkMutableByteArray#
primop has been removed since this predicate is trivially always true.
|
|\ \
| | |
| | |
| | | |
wip/gc/everything2
|
| |/
|/| |
|
| | |
|
|/ |
|
|
|
|
|
|
| |
This required some fiddling around with the location of forward
declarations, since the C sources generated by GHC's C backend only
include Stg.h.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This extends the non-moving collector to allow concurrent collection.
The full design of the collector implemented here is described in detail
in a technical note
B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
Compiler" (2018)
This extension involves the introduction of a capability-local
remembered set, known as the /update remembered set/, which tracks
objects which may no longer be visible to the collector due to mutation.
To maintain this remembered set we introduce a write barrier on
mutations which is enabled while a concurrent mark is underway.
The update remembered set representation is similar to that of the
nonmoving mark queue, being a chunked array of `MarkEntry`s. Each
`Capability` maintains a single accumulator chunk, which it flushes
when (a) it is filled, or (b) the nonmoving collector enters its
post-mark synchronization phase.
While the write barrier touches a significant amount of code it is
conceptually straightforward: the mutator must ensure that the referent
of any pointer it overwrites is added to the update remembered set.
However, there are a few details:
* In the case of objects with a dirty flag (e.g. `MVar`s) we can
exploit the fact that only the *first* mutation requires a write
barrier.
* Weak references, as usual, complicate things. In particular, we must
ensure that the referent of a weak object is marked if dereferenced by
the mutator. For this we (unfortunately) must introduce a read
barrier, as described in Note [Concurrent read barrier on deRefWeak#]
(in `NonMovingMark.c`).
* Stable names are also a bit tricky as described in Note [Sweeping
stable names in the concurrent collector] (`NonMovingSweep.c`).
We take quite some pains to ensure that the high thread count often seen
in parallel Haskell applications doesn't affect pause times. To this end
we allow thread stacks to be marked either by the thread itself (when it
is executed or stack-underflows) or by the concurrent mark thread (if the
thread owning the stack is never scheduled). There is a non-trivial
handshake to ensure that this happens without racing, which is described
in Note [StgStack dirtiness flags and concurrent marking].
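The overall shape of the write barrier can be sketched as follows, with
simplified types and assumed names standing in for the real RTS code:

    #include <stdbool.h>

    extern bool write_barrier_enabled;     /* set while a concurrent mark runs */
    void upd_rem_set_push(void *cap, void *old_referent);     /* assumed helper */

    /* Overwrite a pointer field, first recording the old referent so the
     * concurrent mark cannot lose sight of it (snapshot-at-the-beginning). */
    static inline void write_field(void *cap, void **field, void *new_val)
    {
        if (write_barrier_enabled)
            upd_rem_set_push(cap, *field);
        *field = new_val;
    }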
Co-Authored-by: Ömer Sinan Ağacan <omer@well-typed.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This implements the core heap structure and a serial mark/sweep
collector which can be used to manage the oldest-generation heap.
This is the first step towards a concurrent mark-and-sweep collector
aimed at low-latency applications.
The full design of the collector implemented here is described in detail
in a technical note
B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
Compiler" (2018)
The basic heap structure used in this design is heavily inspired by
K. Ueno & A. Ohori. "A fully concurrent garbage collector for
functional programs on multicore processors." /ACM SIGPLAN Notices/
Vol. 51. No. 9 (presented by ICFP 2016)
This design is intended to allow both marking and sweeping to proceed
concurrently with execution of a multi-core mutator. Unlike the Ueno design,
which requires no global synchronization pauses, the collector
introduced here requires a stop-the-world pause at the beginning and end
of the mark phase.
To avoid heap fragmentation, the allocator consists of a number of
fixed-size /sub-allocators/. Each of these sub-allocators allocates into
its own set of /segments/, themselves allocated from the block
allocator. Each segment is broken into a set of fixed-size allocation
blocks (which back allocations) in addition to a bitmap (used to track
the liveness of blocks) and some additional metadata (also used
to track liveness).
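A rough sketch of that segment layout (field names and sizes are
illustrative, not the RTS's actual structure):

    #include <stdint.h>

    #define SEG_BLOCKS 1024               /* assumed per-segment block count */

    struct segment {
        struct segment *link;               /* chained per sub-allocator       */
        uint16_t        next_free;          /* index of the next free block    */
        uint8_t         bitmap[SEG_BLOCKS]; /* per-block liveness marks        */
        /* the fixed-size allocation blocks that back allocations follow here */
    };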
This heap structure enables collection via mark-and-sweep, which can be
performed concurrently via a snapshot-at-the-beginning scheme (although
concurrent collection is not implemented in this patch).
The mark queue is a fairly straightforward chunked-array structure.
The representation is a bit more verbose than a typical mark queue to
accommodate a combination of two features:
* a mark FIFO, which improves the locality of marking, reducing one of
the major overheads seen in mark/sweep allocators (see [1] for
details)
* the selector optimization and indirection shortcutting, which
requires that we track where we found each reference to an object
in case we need to update the reference at a later point (e.g. when
we find that it is an indirection). See Note [Origin references in
the nonmoving collector] (in `NonMovingMark.h`) for details.
Beyond this the mark/sweep is fairly run-of-the-mill.
[1] R. Garner, S.M. Blackburn, D. Frampton. "Effective Prefetch for
Mark-Sweep Garbage Collection." ISMM 2007.
Co-Authored-By: Ben Gamari <ben@well-typed.com>
|
|\ \
| | |
| | |
| | | |
'wip/gc/aligned-block-allocation' into wip/gc/preparation
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This implements support for block group allocations which are aligned to
an integral number of blocks.
This will be used by the nonmoving garbage collector, which uses the
block allocator to allocate the segments which back its heap. These
segments are a fixed number of blocks in size, with each segment being
aligned to the segment size boundary. This allows us to easily find the
segment metadata stored at the beginning of the segment.
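A minimal sketch of why that alignment matters, assuming a power-of-two
segment size (the constant here is illustrative):

    #include <stdint.h>

    #define SEGMENT_BYTES (32 * 1024)   /* assumed segment size, power of two */

    /* Because each segment starts on a SEGMENT_BYTES boundary, the segment
     * header can be recovered from any interior pointer by masking. */
    static inline void *segment_of(void *p)
    {
        return (void *)((uintptr_t)p & ~((uintptr_t)SEGMENT_BYTES - 1));
    }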
|
| | |
|
| |
| |
| |
| |
| | |
These were previously quite unclear and will change a bit under the
non-moving collector, so let's clear this up now.
|
|/
|
|
|
|
|
| |
Namely ensure that block descriptors are initialized with valid
generation numbers.
Co-Authored-By: Ben Gamari <ben@well-typed.com>
|
| |
|