path: root/includes/rts/storage
Commit log (each entry: title, author, date; files changed, lines -/+)
* Move `/includes` to `/rts/include`, sort per package better (John Ericson, 2021-08-09; 10 files, -2700/+0)
  In order to make the packages in this repo "reinstallable", we need to
  associate source code with specific packages. Having a top-level
  `/includes` dir that mixes concerns (which packages' includes?) gets in
  the way of this. To start, I have moved everything to `rts/`, which is
  mostly correct. There are a few things, however, that really don't
  belong in the rts (like the generated constants Haskell type and
  `CodeGen.Platform.h`). Those needed to be manually adjusted.

  Things of note:

  - No symlinking, for the sake of Windows, so we hard-link at configure
    time.
  - `CodeGen.Platform.h` no longer has a `.hs` extension (in addition to
    being moved to `compiler/`) so as not to confuse anyone, since it is
    next to Haskell files.
  - Blanket `-Iincludes` is gone in both build systems; include paths now
    more strictly respect per-package dependencies.
  - `deriveConstants` has been taught to not require a `--target-os` flag
    when generating the platform-agnostic Haskell type. Make takes
    advantage of this, but Hadrian has yet to.
* Clean up whitespace in /includes (John Ericson, 2021-08-09; 1 file, -9/+9)
  I need to do this now; otherwise the linter will be mad when I move
  these files.
* rts: Fix use of sized array in Heap.h (Ben Gamari, 2021-08-09; 1 file, -2/+6)
  Sized arrays cannot be used in headers that might be included from C++.
  Fixes #20199.
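  The portability issue in miniature, as a sketch (illustrative only; the
  actual Heap.h fix may differ in detail): C99 accepts variably modified
  array parameters, C++ does not, so a header shared between the two
  languages needs the pointer spelling.

      #include <stddef.h>

      /* Fine in C99, but a compile error once the header is pulled
       * into a C++ translation unit: */
      /* size_t sum_vla(size_t n, const int xs[n]); */

      /* Header-safe in both languages: */
      static size_t sum_portable(size_t n, const int *xs)
      {
          size_t acc = 0;
          for (size_t i = 0; i < n; i++)
              acc += (size_t)xs[i];
          return acc;
      }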
* rts: Fix inconsistent signatures for collect_pointers (Ben Gamari, 2021-07-27; 1 file, -1/+1)
  Fixes #20160.
* rts: Drop allocateExec and friends (Ben Gamari, 2021-07-27; 1 file, -7/+0)
  All uses of these now use ExecPage.
* rts: Break up adjustor logic (Ben Gamari, 2021-07-27; 1 file, -3/+0)
* Adds AArch64 Native Code Generator (Moritz Angermann, 2021-06-05; 1 file, -1/+0)
  In which we add a new code generator to the Glasgow Haskell Compiler.
  This codegen supports ELF and Mach-O targets, thus covering Linux,
  macOS, and BSDs in principle. It was tested only on macOS and Linux.
  The NCG follows a similar structure to the other native code generators
  we already have, and should therefore be relatively easy to follow. It
  supports most of the features required of a proper native code
  generator, but does not claim to be perfect or fully optimised; there
  are still opportunities for optimisation.

  Metric Decrease: ManyAlternatives ManyConstructors MultiLayerModules
  PmSeriesG PmSeriesS PmSeriesT PmSeriesV T10421 T10421a T10858 T11195
  T11276 T11303b T11374 T11822 T12227 T12545 T12707 T13035 T13253
  T13253-spj T13379 T13701 T13719 T14683 T14697 T15164 T15630 T16577
  T17096 T17516 T17836 T17836b T17977 T17977b T18140 T18282 T18304
  T18478 T18698a T18698b T18923 T1969 T3064 T5030 T5321FD T5321Fun T5631
  T5642 T5837 T783 T9198 T9233 T9630 T9872d T9961 WWRec

  Metric Increase: T4801
* Allocate Adjustors and mark them readable in two steps (Moritz Angermann, 2021-03-29; 1 file, -1/+7)
  This drops allocateExec for Darwin and replaces it with an alloc,
  write, mark-executable strategy instead. This prevents us from trying
  to allocate an executable range and then write to it, which W^X will
  prohibit on Darwin. This will *only* work if we can use mmap.
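  A minimal sketch of that strategy, assuming POSIX mmap/mprotect (the
  RTS's actual adjustor code differs in detail):

      #include <stddef.h>
      #include <string.h>
      #include <sys/mman.h>

      /* Returns an executable copy of 'code', or NULL on failure. */
      static void *alloc_exec(const void *code, size_t len)
      {
          /* 1. Map the pages writable, but not executable. */
          void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (p == MAP_FAILED)
              return NULL;

          /* 2. Write the code while the mapping is still writable. */
          memcpy(p, code, len);

          /* 3. Flip to read+execute; no mapping is ever writable and
           *    executable at the same time, so W^X is satisfied. */
          if (mprotect(p, len, PROT_READ | PROT_EXEC) != 0) {
              munmap(p, len);
              return NULL;
          }
          return p;
      }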
* rts: Add generic block traversal function, listAllBlocks (Matthew Pickering, 2021-02-18; 1 file, -0/+3)
  This function is exposed in RtsAPI.h so that external users have a
  blessed way to traverse all the different `bdescr`s known to the RTS.
  The main motivation is to use this function in ghc-debug while avoiding
  having to expose the internal structure of a Capability in the API.
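  A usage sketch, assuming the callback shape RtsAPI.h gives it
  (`ListBlocksCb`); check your GHC version's header for the exact
  signature:

      #include "Rts.h"
      #include "RtsAPI.h"

      /* Callback invoked once per block descriptor known to the RTS.
       * Here we only count; ghc-debug would inspect the block. */
      static void count_block(void *user, bdescr *bd)
      {
          (void)bd;
          (*(StgWord *)user)++;
      }

      static StgWord count_all_blocks(void)
      {
          StgWord n = 0;
          listAllBlocks(count_block, &n);
          return n;
      }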
* rts: TraverseHeap: Simplify profiling header (Daniel Gröber, 2021-02-17; 1 file, -14/+5)
  Having a union in the closure profiling header really just complicates
  things, so get back to basics: we just have a single StgWord there for
  now.
* ThreadPaused: Don't zero slop until free vars are pushed (GHC GitLab CI, 2020-11-29; 1 file, -0/+4)
  When threadPaused blackholes a thunk, it calls `OVERWRITING_CLOSURE` to
  zero the slop for the benefit of the sanity checker. Previously this
  was done *before* pushing the thunk's free variables to the update
  remembered set. Consequently we would push zeroed pointers onto the
  update remembered set.
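  A condensed sketch of the corrected ordering (the real logic lives in
  ThreadPaused.c; `updateRemembSetPushThunk` and the macros used here are
  RTS-internal helpers, and the surrounding code is simplified):

      #include "Rts.h"

      static void blackhole_thunk(Capability *cap, StgTSO *tso,
                                  StgThunk *thunk)
      {
          /* 1. The payload is still intact here, so the thunk's free
           *    variables can be pushed to the update remembered set
           *    first. */
          updateRemembSetPushThunk(cap, thunk);

          /* 2. Only now destroy the payload: zero the slop... */
          OVERWRITING_CLOSURE((StgClosure *)thunk);

          /* 3. ...and overwrite the thunk with a blackhole owned by
           *    this thread. */
          ((StgInd *)thunk)->indirectee = (StgClosure *)tso;
          SET_INFO((StgClosure *)thunk, &stg_BLACKHOLE_info);
      }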
* AArch64/arm64 adjustments (Moritz Angermann, 2020-11-15; 1 file, -1/+1)
  This adds the necessary logic to support AArch64 on ELF, as well as
  AArch64 on Mach-O, which Apple calls arm64. We change the architecture
  name to AArch64, which is the official Arm naming scheme.
* rts: Introduce highMemDynamic (GHC GitLab CI, 2020-11-11; 1 file, -0/+4)
* Add code comments for StgInfoTable and StgStack structs (David Eichmann, 2020-11-10; 2 files, -2/+19)
* ghc-heap: expose decoding from heap representation (David Eichmann, 2020-11-10; 1 file, -0/+9)
  Co-authored-by: Sven Tennie <sven.tennie@gmail.com>
  Co-authored-by: Matthew Pickering <matthewtpickering@gmail.com>
  Co-authored-by: Ben Gamari <bgamari.foss@gmail.com>
* Merge branch 'wip/tsan/stm' into wip/tsan/all (Ben Gamari, 2020-11-01; 1 file, -3/+3)
* rts/stm: Strengthen orderings to SEQ_CST instead of volatile [wip/tsan/stm] (Ben Gamari, 2020-10-24; 1 file, -3/+3)
  Previously the `current_value`, `first_watch_queue_entry`, and
  `num_updates` fields of `StgTVar` were marked as `volatile` in an
  attempt to provide strong ordering. Of course, this isn't sufficient.
  We now use proper atomic operations. In most of these cases I
  strengthen the ordering all the way to SEQ_CST, although it's possible
  that some could be weakened with some thought.
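  A sketch of the difference in C11 terms (the RTS spells these through
  its own SEQ_CST_LOAD/SEQ_CST_STORE macros; the struct and field names
  here are abbreviated stand-ins):

      #include <stdatomic.h>

      typedef struct {
          /* previously: volatile pointers and counters */
          _Atomic(void *)        current_value;
          _Atomic(unsigned long) num_updates;
      } tvar_like;

      static void *tvar_read(tvar_like *tv)
      {
          /* 'volatile' only forces the compiler to emit the load;
           * SEQ_CST additionally orders it against surrounding memory
           * operations on other cores. */
          return atomic_load_explicit(&tv->current_value,
                                      memory_order_seq_cst);
      }

      static void tvar_commit(tvar_like *tv, void *v)
      {
          atomic_store_explicit(&tv->current_value, v,
                                memory_order_seq_cst);
          atomic_fetch_add_explicit(&tv->num_updates, 1,
                                    memory_order_seq_cst);
      }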
* rts/GC: Use atomics (Ben Gamari, 2020-10-30; 1 file, -3/+3)
* rts: Introduce SET_HDR_RELEASE [wip/tsan/prep] (Ben Gamari, 2020-10-24; 1 file, -2/+8)
  Also store the info table pointer last, to ensure that the
  synchronization covers all stores.
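  A model of why the info pointer goes last, in C11 terms (a sketch of
  the idea, not the RTS macro itself):

      #include <stdatomic.h>

      typedef struct {
          _Atomic(const void *) info;   /* header: info table pointer */
          void *payload[2];
      } closure_like;

      static void publish(closure_like *c, const void *info,
                          void *p0, void *p1)
      {
          /* Initialise the payload first... */
          c->payload[0] = p0;
          c->payload[1] = p1;
          /* ...then release-store the info pointer. A thread that
           * acquire-loads the info pointer and sees 'info' is
           * guaranteed to also see the payload writes above. */
          atomic_store_explicit(&c->info, info, memory_order_release);
      }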
* rts/ClosureMacros: Use relaxed atomics (Ben Gamari, 2020-10-24; 1 file, -10/+13)
* winio: Add IOPort synchronization primitive (Tamar Christina, 2020-07-15; 1 file, -0/+1)
* rts: Remove unused GET_ENTRY closure macro (Ömer Sinan Ağacan, 2020-06-10; 1 file, -2/+0)
  This macro is not used and got broken in the meantime, as ENTRY_CODE
  was deleted.
* Always zero shrunk mutable array slop when profiling (Daniel Gröber, 2020-06-01; 1 file, -5/+26)
  When shrinking arrays in the profiling way we currently don't always
  zero the leftover slop. This means we can't traverse such closures in
  the heap profiler. The old Note [zeroing slop] and #8402 have some
  rationale for why this is so, but I believe the reasoning doesn't apply
  to mutable closures. Their users already have to ensure multiple
  threads don't step on each other's toes, so zeroing should be safe.
* Fix OVERWRITING_CLOSURE assuming closures are not inherently used (Daniel Gröber, 2020-06-01; 1 file, -21/+17)
  The new ASSERT in LDV_recordDead() was being tripped up by MVars when
  removeFromMVarBlockedQueue() calls OVERWRITING_CLOSURE() via
  OVERWRITE_INFO().
* Cleanup OVERWRITING_CLOSURE logic (Daniel Gröber, 2020-06-01; 1 file, -32/+49)
  The code is just more confusing than it needs to be. We don't need to
  mix the threaded check with the LDV profiling check, since LDV's init
  already checks for this. Hence they can be two separate checks. Taking
  the sanity checking into account is also cleaner via DebugFlags.sanity;
  no need for checking the DEBUG define. The ZERO_SLOP_FOR_LDV_PROF and
  ZERO_SLOP_FOR_SANITY_CHECK definitions the old code had also made
  things a lot more opaque, IMO, so I removed them.
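  The resulting shape of the decision, as a sketch (the function and
  parameter names are illustrative; the real code is in
  ClosureMacros.h):

      #include <stdbool.h>
      #include "Rts.h"

      /* Two independent reasons to zero an overwritten closure's slop. */
      static bool want_zero_slop(bool inherently_used)
      {
          /* 1. The sanity checker wants all slop zeroed; keyed off the
           *    runtime flag rather than the DEBUG define. */
          if (RtsFlags.DebugFlags.sanity)
              return true;
      #if defined(PROFILING)
          /* 2. LDV profiling wants it too, but only for closures that
           *    are not "inherently used" (era > 0 means LDV is live). */
          if (!inherently_used && era > 0)
              return true;
      #endif
          return false;
      }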
* rts: Assert LDV_recordDead is not called for inherently used closures (Daniel Gröber, 2020-04-14; 1 file, -0/+7)
  The comments make it clear LDV_recordDead should not be called for
  inherently used closures, so add an assertion to codify this fact.
* rts: Fix nomenclature in OVERWRITING_CLOSURE macros (Daniel Gröber, 2020-04-14; 1 file, -15/+16)
  The additional commentary introduced by commit 8916e64e5437 ("Implement
  shrinkSmallMutableArray# and resizeSmallMutableArray#.") unfortunately
  got this wrong. We set 'prim' to true in overwritingClosureOfs because
  we _don't_ want to call LDV_recordDead(). The reason is the "inherently
  used" distinction made in the LDV profiler, so I rename the variable to
  something more appropriate.
* rts: Underline some Notes as is conventional (Daniel Gröber, 2020-04-14; 1 file, -0/+1)
* rts: Expand and add more notes regarding slop (Daniel Gröber, 2020-04-14; 1 file, -16/+23)
* Zero out pinned block alignment slop when profiling (Daniel Gröber, 2020-04-14; 1 file, -3/+6)
  The heap profiler currently cannot traverse pinned blocks because of
  alignment slop. This used to be just a minor annoyance, as the whole
  block is accounted into a special cost centre rather than the
  respective object's CCS, cf. #7275. However, for the new root profiler
  we would like to be able to visit _every_ closure on the heap. We need
  to do this so we can get rid of the current 'flip' bit hack in the heap
  traversal code.

  Since info pointers are always non-zero, we can in principle skip all
  the slop in the profiler if we can rely on it being zeroed. This
  assumption caused problems in the past though: commit a586b33f8e ("rts:
  Correct handling of LARGE ARR_WORDS in LDV profiler"), part of !1118,
  tried to use the same trick for BF_LARGE objects but neglected to take
  into account that shrink*Array# functions don't ensure that slop is
  zeroed when not compiling with profiling. Later, commit 0c114c6599
  ("Handle large ARR_WORDS in heap census (fix […]") […] as we will only
  be assuming slop is zeroed when profiling is on.

  This commit also reduces the amount of slop we introduce in the first
  place, by calculating the needed alignment before doing the allocation
  for small objects, where we know the next available address. For large
  objects we don't know how much alignment we'll have to do yet, since
  those details are hidden behind the allocateMightFail function, so
  there we continue to allocate the maximum additional words we'll need
  to do the alignment.

  So we don't have to duplicate all this logic in the Cmm code, we pull
  it into the RTS allocatePinned function instead.

  Metric Decrease: T7257 haddock.Cabal haddock.base
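  The up-front padding computation, sketched (illustrative; the real code
  lives in the RTS's allocatePinned and handles the header/payload split
  more carefully):

      #include <stdint.h>
      #include <stddef.h>

      /* Words of slop needed so that an object written at p + slop has
       * its payload on an 'align'-byte boundary. Assumes 'align' is a
       * power of two and a multiple of the word size. */
      static size_t alignment_slop_words(const void *p, size_t align,
                                         size_t hdr_bytes)
      {
          uintptr_t payload = (uintptr_t)p + hdr_bytes;
          uintptr_t pad = (align - (payload & (align - 1)))
                          & (align - 1);
          return pad / sizeof(void *);
      }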
* Fix CNF handling in compacting GC (Ömer Sinan Ağacan, 2020-04-09; 1 file, -0/+3)
  Fixes #17937.

  Previously compacting GC simply ignored CNFs. This is mostly fine, as
  most CNF objects (see "What about small compacts?" below) don't have
  outgoing pointers and are "large" (allocated in large blocks), and
  large objects are not moved or compacted.

  However, if we do GC *during* sharing-preserving compaction, then the
  CNF will have a hash table mapping objects that have been moved to the
  CNF to their location in the CNF, to be able to preserve sharing. This
  case is handled in the copying collector, in `scavenge_compact`, where
  we evacuate hash table entries and then rehash the table. Compacting GC
  ignored this case.

  We now visit CNFs in all generations when threading pointers to the
  compacted heap, and thread hash table keys. A visited CNF is added to
  the list `nfdata_chain`. After compaction is done, we re-visit the CNFs
  in that list and rehash the tables.

  The overhead is minimal: the list is static in `Compact.c`, and a link
  field is added to the `StgCompactNFData` closure. Programs that don't
  use CNFs should not be affected.

  To test this, CNF tests are now also run in a new way, 'compacting_gc',
  which just passes `-c` to the RTS, enabling compacting GC for the
  oldest generation. Before this patch the result would be:

      Unexpected failures:
         compact_gc.run         compact_gc         [bad exit code (139)] (compacting_gc)
         compact_huge_array.run compact_huge_array [bad exit code (1)]   (compacting_gc)

  With this patch all tests pass. I can also pass `-c -DS` without any
  failures.

  What about small compacts? Small CNFs are still not handled by the
  compacting GC. However, so far I'm unable to write a test that triggers
  a runtime panic ("update_fwd: unknown/strange object") by allocating a
  small CNF in a compacted heap. It's possible that I'm missing something
  and it's not possible to have a small CNF.
  NoFib results: across the entire suite, binary size grows by at most
  +0.1% (a few programs at +0.0%), allocations are unchanged (0.0%
  everywhere), and instructions, reads, and writes move only at noise
  level (within ±0.1%). The summary rows:

      ------------------------------------------------------------
                        Size    Allocs   Instrs   Reads    Writes
      ------------------------------------------------------------
      Min              +0.0%     0.0%    -0.0%    -0.0%    -0.1%
      Max              +0.1%     0.0%    +0.0%    +0.0%    +0.0%
      Geometric Mean   +0.1%    -0.0%    -0.0%    -0.0%    -0.0%
      ------------------------------------------------------------

  Bumping numbers of nonsensical perf tests:

      Metric Increase: T12150 T12234 T12425 T13035 T5837 T6048

  It's simply not possible for this patch to increase allocations, and
  I've wasted enough time on these tests in the past (see #17686). I
  think these tests should not be perf tests, but for now I'll bump the
  numbers.
* Document and refactor a few things around bitmap scavenging (Ömer Sinan Ağacan, 2020-02-29; 1 file, -0/+6)
  - Added a few comments in StgPAP
  - Added a few comments and assertions in scavenge_small_bitmap and
    walk_large_bitmap
  - Did a tiny refactor in GHC.Data.Bitmap: added some comments, deleted
    dead code, used the PlatformWordSize type
* Make Block.h compile with C++ compilers (Matthew Pickering, 2020-01-27; 1 file, -4/+9)
* Changing Thread IDs from 32 bits to 64 bits (Roland Zumkeller, 2019-11-19; 1 file, -2/+2)
* nonmoving: Use correct info table pointer accessor (Ben Gamari, 2019-11-19; 1 file, -14/+0)
  Previously we used INFO_PTR_TO_STRUCT instead of
  THUNK_INFO_PTR_TO_STRUCT when looking at a thunk. These two happen to
  be equivalent on 64-bit architectures due to alignment considerations,
  but they are different on 32-bit platforms. This led to #17487.

  To fix this we also employ a small optimization: there is only one
  thunk of type WHITEHOLE (namely stg_WHITEHOLE_info). Consequently, we
  can just use a plain pointer comparison instead of testing against
  info->type.
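  A sketch of the distinction under tables-next-to-code, where the
  info-table struct sits immediately before the address the info pointer
  holds (the types here are illustrative; the real accessors live in
  ClosureMacros.h):

      typedef struct {
          unsigned long layout;
          unsigned long type;
      } info_table;

      typedef struct {
          info_table    i;
          unsigned long srt;   /* extra thunk-only field */
      } thunk_info_table;

      /* Recovering the struct means subtracting the size of the
       * *right* type; using the plain accessor on a thunk lands at the
       * wrong address whenever the two sizes differ, as they do on
       * 32-bit platforms (hence #17487). */
      static info_table *info_ptr_to_struct(void *info)
      { return (info_table *)info - 1; }

      static thunk_info_table *thunk_info_ptr_to_struct(void *info)
      { return (thunk_info_table *)info - 1; }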
* Implement shrinkSmallMutableArray# and resizeSmallMutableArray# (Andrew Martin, 2019-10-26; 1 file, -3/+10)
  This is part of GHC Proposal #25: "Offer more array resizing
  primitives". Resources related to the proposal:

  - Discussion: https://github.com/ghc-proposals/ghc-proposals/pull/121
  - Proposal: https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0025-resize-boxed.rst

  Only shrinkSmallMutableArray# is implemented as a primop, since a
  library-space implementation of resizeSmallMutableArray# (in GHC.Exts)
  is no less efficient than a primop would be. This may be replaced by a
  primop in the future if someone devises a strategy for growing arrays
  in-place. The library-space implementation always copies the array when
  growing it.

  This commit also tweaks the documentation of the deprecated
  sizeofMutableByteArray#, removing the mention of concurrency. That
  primop is unsound even in single-threaded applications. Additionally,
  the non-negativity assertion on the existing shrinkMutableByteArray#
  primop has been removed, since this predicate is trivially always true.
* Merge branches 'wip/gc/segment-header-to-bdescr' and 'wip/gc/docs' into wip/gc/everything2 (Ben Gamari, 2019-10-22; 2 files, -11/+20)
* NonMoving: Add summarizing Note [wip/gc/docs] (Ben Gamari, 2019-10-22; 1 file, -0/+3)
* NonMoving: Move next_free_snap to block descriptor [wip/gc/segment-header-to-bdescr] (Ben Gamari, 2019-10-22; 1 file, -0/+1)
* NonMoving: Move block size to block descriptor (Ben Gamari, 2019-10-22; 1 file, -11/+16)
* Fix unregisterised build [wip/gc/nonmoving-concurrent] (Ben Gamari, 2019-10-22; 1 file, -1/+1)
  This required some fiddling around with the location of forward
  declarations, since the C sources generated by GHC's C backend only
  include Stg.h.
* rts: Shrink size of STACK's dirty and marking fields (Ben Gamari, 2019-10-20; 1 file, -2/+2)
* rts: Implement concurrent collection in the nonmoving collector (Ben Gamari, 2019-10-20; 3 files, -2/+64)
  This extends the non-moving collector to allow concurrent collection.

  The full design of the collector implemented here is described in
  detail in a technical note:

      B. Gamari. "A Concurrent Garbage Collector For the Glasgow
      Haskell Compiler" (2018)

  This extension involves the introduction of a capability-local
  remembered set, known as the /update remembered set/, which tracks
  objects which may no longer be visible to the collector due to
  mutation. To maintain this remembered set we introduce a write barrier
  on mutations, which is enabled while a concurrent mark is underway.

  The update remembered set representation is similar to that of the
  nonmoving mark queue, being a chunked array of `MarkEntry`s. Each
  `Capability` maintains a single accumulator chunk, which it flushes
  when (a) it is filled, or (b) the nonmoving collector enters its
  post-mark synchronization phase.

  While the write barrier touches a significant amount of code, it is
  conceptually straightforward: the mutator must ensure that the referee
  of any pointer it overwrites is added to the update remembered set.
  However, there are a few details:

  - In the case of objects with a dirty flag (e.g. `MVar`s) we can
    exploit the fact that only the *first* mutation requires a write
    barrier.
  - Weak references, as usual, complicate things. In particular, we must
    ensure that the referee of a weak object is marked if dereferenced
    by the mutator. For this we (unfortunately) must introduce a read
    barrier, as described in Note [Concurrent read barrier on
    deRefWeak#] (in `NonMovingMark.c`).
  - Stable names are also a bit tricky, as described in Note [Sweeping
    stable names in the concurrent collector] (`NonMovingSweep.c`).

  We take quite some pains to ensure that the high thread count often
  seen in parallel Haskell applications doesn't affect pause times. To
  this end we allow thread stacks to be marked either by the thread
  itself (when it is executed or stack-underflows) or by the concurrent
  mark thread (if the thread owning the stack is never scheduled). There
  is a non-trivial handshake to ensure that this happens without racing,
  which is described in Note [StgStack dirtiness flags and concurrent
  marking].

  Co-Authored-by: Ömer Sinan Ağacan <omer@well-typed.com>
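  The conceptual write barrier, as a sketch (condensed; the barrier flag
  and `updateRemembSetPushClosure` are the RTS's actual mechanism in
  NonMovingMark.c, while the wrapper function here is illustrative):

      #include "Rts.h"

      /* Overwrite a pointer field, first saving the old referee while
       * a concurrent mark is underway, so the collector can still
       * find objects that become unreachable through this slot. */
      static void write_barrier_store(Capability *cap,
                                      StgClosure **slot,
                                      StgClosure *new_val)
      {
          if (nonmoving_write_barrier_enabled) {
              updateRemembSetPushClosure(cap, *slot);
          }
          *slot = new_val;
      }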
* rts: Non-concurrent mark and sweep (Ömer Sinan Ağacan, 2019-10-20; 1 file, -1/+10)
  This implements the core heap structure and a serial mark/sweep
  collector which can be used to manage the oldest-generation heap. This
  is the first step towards a concurrent mark-and-sweep collector aimed
  at low-latency applications.

  The full design of the collector implemented here is described in
  detail in a technical note:

      B. Gamari. "A Concurrent Garbage Collector For the Glasgow
      Haskell Compiler" (2018)

  The basic heap structure used in this design is heavily inspired by:

      K. Ueno & A. Ohori. "A fully concurrent garbage collector for
      functional programs on multicore processors." /ACM SIGPLAN
      Notices/ Vol. 51, No. 9 (presented at ICFP 2016)

  This design is intended to allow both marking and sweeping concurrent
  to execution of a multi-core mutator. Unlike the Ueno design, which
  requires no global synchronization pauses, the collector introduced
  here requires a stop-the-world pause at the beginning and end of the
  mark phase.

  To avoid heap fragmentation, the allocator consists of a number of
  fixed-size /sub-allocators/. Each of these sub-allocators allocates
  into its own set of /segments/, themselves allocated from the block
  allocator. Each segment is broken into a set of fixed-size allocation
  blocks (which back allocations), in addition to a bitmap (used to
  track the liveness of blocks) and some additional metadata (also used
  to track liveness).

  This heap structure enables collection via mark-and-sweep, which can
  be performed concurrently via a snapshot-at-the-beginning scheme
  (although concurrent collection is not implemented in this patch).

  The mark queue is a fairly straightforward chunked-array structure.
  The representation is a bit more verbose than a typical mark queue to
  accommodate a combination of two features:

  - a mark FIFO, which improves the locality of marking, reducing one of
    the major overheads seen in mark/sweep allocators (see [1] for
    details)
  - the selector optimization and indirection shortcutting, which
    requires that we track where we found each reference to an object in
    case we need to update the reference at a later point (e.g. when we
    find that it is an indirection). See Note [Origin references in the
    nonmoving collector] (in `NonMovingMark.h`) for details.

  Beyond this the mark/sweep is fairly run-of-the-mill.

  [1] R. Garner, S.M. Blackburn, D. Frampton. "Effective Prefetch for
      Mark-Sweep Garbage Collection." ISMM 2007.

  Co-Authored-By: Ben Gamari <ben@well-typed.com>
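  A sketch of the heap structure described above (all field names and
  sizes are illustrative, not the actual NonMoving.h definitions):

      #include <stdint.h>

      #define SEG_BLOCK_COUNT 512   /* blocks per segment (example) */

      /* A segment: fixed-size allocation blocks, a liveness bitmap,
       * and some metadata, all carved from the block allocator. */
      struct segment {
          struct segment *link;              /* intrusive segment list */
          uint16_t next_free;                /* next unallocated block */
          uint8_t  block_size_log2;          /* one size class/segment */
          uint8_t  bitmap[SEG_BLOCK_COUNT];  /* per-block mark bits    */
          /* ...followed by SEG_BLOCK_COUNT blocks of
           *    2^block_size_log2 bytes each, suitably aligned. */
      };

      /* One sub-allocator per size class; fragmentation stays bounded
       * because every block in a segment has the same size. */
      struct sub_allocator {
          struct segment *current;   /* being bump-allocated into   */
          struct segment *active;    /* partly filled segments      */
          struct segment *filled;    /* full segments, await sweep  */
      };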
* Merge branches 'wip/gc/sync-without-capability' and 'wip/gc/aligned-block-allocation' into wip/gc/preparation (Ben Gamari, 2019-10-20; 1 file, -0/+7)
* rts/BlockAlloc: Allow aligned allocation requests [wip/gc/aligned-block-allocation] (Ömer Sinan Ağacan, 2019-10-18; 1 file, -0/+7)
  This implements support for block group allocations which are aligned
  to an integral number of blocks. This will be used by the nonmoving
  garbage collector, which uses the block allocator to allocate the
  segments which back its heap. These segments are a fixed number of
  blocks in size, with each segment being aligned to the segment size
  boundary. This allows us to easily find the segment metadata stored at
  the beginning of the segment.
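  A usage sketch, assuming the new entry point is named
  `allocAlignedGroupOnNode` as in this patch (check the block allocator
  header for the exact signature; the SM lock macros are RTS-internal):

      #include "Rts.h"

      /* Allocate a group of n blocks whose start address is aligned to
       * an n-block boundary, as the nonmoving collector's segments
       * require. */
      static bdescr *alloc_segment_group(uint32_t node, W_ n_blocks)
      {
          ACQUIRE_SM_LOCK;
          bdescr *bd = allocAlignedGroupOnNode(node, n_blocks);
          RELEASE_SM_LOCK;
          return bd;
      }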
* rts: Fix macro parenthesisation [wip/gc/misc-rts] (Ben Gamari, 2019-10-18; 1 file, -1/+1)
* rts: Give stack flags proper macros (Ben Gamari, 2019-10-18; 1 file, -0/+5)
  These were previously quite unclear and will change a bit under the
  non-moving collector, so let's clear this up now.
* rts/GC: Add an obvious assertion during block initialization (Ömer Sinan Ağacan, 2019-10-18; 1 file, -0/+8)
  Namely, ensure that block descriptors are initialized with valid
  generation numbers.

  Co-Authored-By: Ben Gamari <ben@well-typed.com>
* rts: Add note reference to SET_PROF_HDR for profiling 'flip' bit (Daniel Gröber, 2019-09-22; 1 file, -0/+2)