summaryrefslogtreecommitdiff
path: root/includes/rts/storage/GC.h
Commit message (Collapse)AuthorAgeFilesLines
* Move `/includes` to `/rts/include`, sort per package betterJohn Ericson2021-08-091-261/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | In order to make the packages in this repo "reinstallable", we need to associate source code with a specific packages. Having a top level `/includes` dir that mixes concerns (which packages' includes?) gets in the way of this. To start, I have moved everything to `rts/`, which is mostly correct. There are a few things however that really don't belong in the rts (like the generated constants haskell type, `CodeGen.Platform.h`). Those needed to be manually adjusted. Things of note: - No symlinking for sake of windows, so we hard-link at configure time. - `CodeGen.Platform.h` no longer as `.hs` extension (in addition to being moved to `compiler/`) so as not to confuse anyone, since it is next to Haskell files. - Blanket `-Iincludes` is gone in both build systems, include paths now more strictly respect per-package dependencies. - `deriveConstants` has been taught to not require a `--target-os` flag when generating the platform-agnostic Haskell type. Make takes advantage of this, but Hadrian has yet to.
* rts: Drop allocateExec and friendsBen Gamari2021-07-271-7/+0
| | | | All uses of these now use ExecPage.
* rts: Break up adjustor logicBen Gamari2021-07-271-3/+0
|
* Adds AArch64 Native Code GeneratorMoritz Angermann2021-06-051-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In which we add a new code generator to the Glasgow Haskell Compiler. This codegen supports ELF and Mach-O targets, thus covering Linux, macOS, and BSDs in principle. It was tested only on macOS and Linux. The NCG follows a similar structure as the other native code generators we already have, and should therfore be realtively easy to follow. It supports most of the features required for a proper native code generator, but does not claim to be perfect or fully optimised. There are still opportunities for optimisations. Metric Decrease: ManyAlternatives ManyConstructors MultiLayerModules PmSeriesG PmSeriesS PmSeriesT PmSeriesV T10421 T10421a T10858 T11195 T11276 T11303b T11374 T11822 T12227 T12545 T12707 T13035 T13253 T13253-spj T13379 T13701 T13719 T14683 T14697 T15164 T15630 T16577 T17096 T17516 T17836 T17836b T17977 T17977b T18140 T18282 T18304 T18478 T18698a T18698b T18923 T1969 T3064 T5030 T5321FD T5321Fun T5631 T5642 T5837 T783 T9198 T9233 T9630 T9872d T9961 WWRec Metric Increase: T4801
* Allocate Adjustors and mark them readable in two stepsMoritz Angermann2021-03-291-1/+7
| | | | | | | | | This drops allocateExec for darwin, and replaces it with a alloc, write, mark executable strategy instead. This prevents us from trying to allocate an executable range and then write to it, which X^W will prohibit on darwin. This will *only* work if we can use mmap.
* rts: Add generic block traversal function, listAllBlocksMatthew Pickering2021-02-181-0/+3
| | | | | | | | | This function is exposed in the RtsAPI.h so that external users have a blessed way to traverse all the different `bdescr`s which are known by the RTS. The main motivation is to use this function in ghc-debug but avoid having to expose the internal structure of a Capability in the API.
* AArch64/arm64 adjustmentsMoritz Angermann2020-11-151-1/+1
| | | | | | | | This addes the necessary logic to support aarch64 on elf, as well as aarch64 on mach-o, which Apple calls arm64. We change architecture name to AArch64, which is the official arm naming scheme.
* rts: Introduce highMemDynamicGHC GitLab CI2020-11-111-0/+4
|
* rts/GC: Use atomicsBen Gamari2020-10-301-3/+3
|
* Zero out pinned block alignment slop when profilingDaniel Gröber2020-04-141-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The heap profiler currently cannot traverse pinned blocks because of alignment slop. This used to just be a minor annoyance as the whole block is accounted into a special cost center rather than the respective object's CCS, cf. #7275. However for the new root profiler we would like to be able to visit _every_ closure on the heap. We need to do this so we can get rid of the current 'flip' bit hack in the heap traversal code. Since info pointers are always non-zero we can in principle skip all the slop in the profiler if we can rely on it being zeroed. This assumption caused problems in the past though, commit a586b33f8e ("rts: Correct handling of LARGE ARR_WORDS in LDV profiler"), part of !1118, tried to use the same trick for BF_LARGE objects but neglected to take into account that shrink*Array# functions don't ensure that slop is zeroed when not compiling with profiling. Later, commit 0c114c6599 ("Handle large ARR_WORDS in heap census (fix as we will only be assuming slop is zeroed when profiling is on. This commit also reduces the ammount of slop we introduce in the first place by calculating the needed alignment before doing the allocation for small objects where we know the next available address. For large objects we don't know how much alignment we'll have to do yet since those details are hidden behind the allocateMightFail function so there we continue to allocate the maximum additional words we'll need to do the alignment. So we don't have to duplicate all this logic in the cmm code we pull it into the RTS allocatePinned function instead. Metric Decrease: T7257 haddock.Cabal haddock.base
* rts: Implement concurrent collection in the nonmoving collectorBen Gamari2019-10-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This extends the non-moving collector to allow concurrent collection. The full design of the collector implemented here is described in detail in a technical note B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell Compiler" (2018) This extension involves the introduction of a capability-local remembered set, known as the /update remembered set/, which tracks objects which may no longer be visible to the collector due to mutation. To maintain this remembered set we introduce a write barrier on mutations which is enabled while a concurrent mark is underway. The update remembered set representation is similar to that of the nonmoving mark queue, being a chunked array of `MarkEntry`s. Each `Capability` maintains a single accumulator chunk, which it flushed when it (a) is filled, or (b) when the nonmoving collector enters its post-mark synchronization phase. While the write barrier touches a significant amount of code it is conceptually straightforward: the mutator must ensure that the referee of any pointer it overwrites is added to the update remembered set. However, there are a few details: * In the case of objects with a dirty flag (e.g. `MVar`s) we can exploit the fact that only the *first* mutation requires a write barrier. * Weak references, as usual, complicate things. In particular, we must ensure that the referee of a weak object is marked if dereferenced by the mutator. For this we (unfortunately) must introduce a read barrier, as described in Note [Concurrent read barrier on deRefWeak#] (in `NonMovingMark.c`). * Stable names are also a bit tricky as described in Note [Sweeping stable names in the concurrent collector] (`NonMovingSweep.c`). We take quite some pains to ensure that the high thread count often seen in parallel Haskell applications doesn't affect pause times. To this end we allow thread stacks to be marked either by the thread itself (when it is executed or stack-underflows) or the concurrent mark thread (if the thread owning the stack is never scheduled). There is a non-trivial handshake to ensure that this happens without racing which is described in Note [StgStack dirtiness flags and concurrent marking]. Co-Authored-by: Ömer Sinan Ağacan <omer@well-typed.com>
* rts/GC: Add an obvious assertion during block initializationÖmer Sinan Ağacan2019-10-181-0/+8
| | | | | | | Namely ensure that block descriptors are initialized with valid generation numbers. Co-Authored-By: Ben Gamari <ben@well-typed.com>
* rts: Update some comments, minor refactoringÖmer Sinan Ağacan2018-06-271-1/+6
|
* rts: Add --internal-counters RTS flag and several countersDouglas Wilson2018-03-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The existing internal counters: * gc_alloc_block_sync * whitehole_spin * gen[g].sync * gen[1].sync are now not shown in the -s report unless --internal-counters is also passed. If --internal-counters is passed we now show the counters above, reformatted, as well as several other counters. In particular, we now count the yieldThread() calls that SpinLocks do as well as their spins. The added counters are: * gc_spin (spin and yield) * mut_spin (spin and yield) * whitehole_threadPaused (spin only) * whitehole_executeMessage (spin only) * whitehole_lockClosure (spin only) * waitForGcThreadsd (spin and yield) As well as the following, which are not SpinLock-like things: * any_work * do_work * scav_find_work See the Note for descriptions of what these counters are. We add busy_wait_nops in these loops along with the counter increment where it was absent. Old internal counters output: ``` gc_alloc_block_sync: 0 whitehole_gc_spin: 0 gen[0].sync: 0 gen[1].sync: 0 ``` New internal counters output: ``` Internal Counters: Spins Yields gc_alloc_block_sync 323 0 gc_spin 9016713 752 mut_spin 57360944 47716 whitehole_gc 0 n/a whitehole_threadPaused 0 n/a whitehole_executeMessage 0 n/a whitehole_lockClosure 0 0 waitForGcThreads 2 415 gen[0].sync 6 0 gen[1].sync 1 0 any_work 2017 no_work 2014 scav_find_work 1004 ``` Test Plan: ./validate Check it builds with #define PROF_SPIN removed from includes/rts/Config.h Reviewers: bgamari, erikd, simonmar, hvr Reviewed By: simonmar Subscribers: rwbarton, thomie, carter GHC Trac Issues: #3553, #9221 Differential Revision: https://phabricator.haskell.org/D4302
* rts: Throw proper HeapOverflow exception on allocating large arrayBen Gamari2017-09-261-2/+3
| | | | | | | | | | | | Test Plan: Validate, add tests Reviewers: simonmar, austin, erikd Reviewed By: simonmar Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D4021
* cpp: Use #pragma once instead of #ifndef guardsBen Gamari2017-04-231-4/+1
| | | | | | | | | | | | | | This both says what we mean and silences a bunch of spurious CPP linting warnings. This pragma is supported by all CPP implementations which we support. Reviewers: austin, erikd, simonmar, hvr Reviewed By: simonmar Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3482
* Overhaul GC statsSimon Marlow2016-12-061-55/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Visible API changes: * The C struct `GCDetails` gives the stats about a single GC. This is passed to the `gcDone()` callback if one is set via the RtsConfig. (previously we just passed a collection of values, so this is more extensible, at the expense of breaking the existing API) * `RTSStats` gives cumulative stats since the start of the program, and includes the `GCDetails` for the most recent GC. This struct can be obtained via `getRTSStats()` (the old `getGCStats()` has been removed, and `getGCStatsEnabled()` has been renamed to `getRTSStatsEnabled()`) Improvements: * The per-GC stats and cumulative stats are now cleanly separated. * Inside the RTS we have a top-level `RTSStats` struct to keep all our stats in, previously this was just a collection of strangely-named variables. This struct is mostly just copied in `getRTSStats()`, so the implementation of that function is a lot shorter. * Types are more consistent. We use a uint64_t byte count for all memory values, and Time for all time values. * Names are more consistent. We use a suffix `_bytes` for all byte counts and `_ns` for all time values. * We now collect information about the amount of memory in large objects and compact objects in `GCDetails`. (the latter was the reason I started doing this patch but it seems to have ballooned a bit!) * I fixed a bug in the calculation of the elapsed MUT time, and added an ASSERT to stop the calculations going wrong in the future. For now I kept the Haskell API in `GHC.Stats` the same, by impedence-matching with the new API. We could either break that API and make it match the C API more closely, or we could add a new API and deprecate the old one. Opinions welcome. This stuff is very easy to get wrong, and it's hard to test. Reviews welcome! Test Plan: manual testing validate Reviewers: bgamari, niteria, austin, ezyang, hvr, erikd, rwbarton, Phyx Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2756
* Use C99's boolBen Gamari2016-11-291-2/+2
| | | | | | | | | | | | Test Plan: Validate on lots of platforms Reviewers: erikd, simonmar, austin Reviewed By: erikd, simonmar Subscribers: michalt, thomie Differential Revision: https://phabricator.haskell.org/D2699
* Add mblocks_allocated to GC stats APIBartosz Nitka2016-07-271-0/+1
| | | | | | | | | | | | | | This exposes mblocks_allocated in the GCStats struct. Test Plan: it builds Reviewers: bgamari, simonmar, austin, hvr, erikd Reviewed By: erikd Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2429
* Compact RegionsGiovanni Campagna2016-07-201-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | This brings in initial support for compact regions, as described in the ICFP 2015 paper "Efficient Communication and Collection with Compact Normal Forms" (Edward Z. Yang et.al.) and implemented by Giovanni Campagna. Some things may change before the 8.2 release, but I (Simon M.) wanted to get the main patch committed so that we can iterate. What documentation there is is in the Data.Compact module in the new compact package. We'll need to extend and polish the documentation before the release. Test Plan: validate (new test cases included) Reviewers: ezyang, simonmar, hvr, bgamari, austin Subscribers: vikraman, Yuras, RyanGlScott, qnikst, mboes, facundominguez, rrnewton, thomie, erikd Differential Revision: https://phabricator.haskell.org/D1264 GHC Trac Issues: #11493
* Rts flags cleanupSimon Marlow2016-06-101-23/+17
| | | | | | | | * Remove unused/old flags from the structs * Update old comments * Add missing flags to GHC.RTS * Simplify GHC.RTS, remove C code and use hsc2hs instead * Make ParFlags unconditional, and add support to GHC.RTS
* rts: Replace `nat` with `uint32_t`Erik de Castro Lopo2016-05-051-4/+4
| | | | | | | | | | | | The `nat` type was an alias for `unsigned int` with a comment saying it was at least 32 bits. We keep the typedef in case client code is using it but mark it as deprecated. Test Plan: Validated on Linux, OS X and Windows Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20 Differential Revision: https://phabricator.haskell.org/D2166
* Fix for CAF retention when dynamically loading & unloading codeSimon Marlow2015-06-081-2/+4
| | | | | | | | | | | | | In a situaion where we have some statically-linked code and we want to load and unload a series of objects, we need the CAFs in the statically-linked code to be retained indefinitely, while the CAFs in the dynamically-linked code should be GC'd as normal, so that we can detect when the code is unloadable. This was wrong before - we GC'd CAFs in the static code, leading to a crash in the rare case where we use a CAF, GC it, and then load a new object that uses it again. I also did some tidy up: RtsConfig now has a field keep_cafs to indicate whether we want CAFs to be retained in static code.
* Fix comments (#8254)Simon Marlow2014-12-151-3/+1
|
* Make clearNursery freeSimon Marlow2014-11-251-0/+21
| | | | | | | | | | | | | | | | | | | | | Summary: clearNursery resets all the bd->free pointers of nursery blocks to make the blocks empty. In profiles we've seen clearNursery taking significant amounts of time particularly with large -N and -A values. This patch moves the work of clearNursery to the point at which we actually need the new block, thereby introducing an invariant that blocks to the right of the CurrentNursery pointer still need their bd->free pointer reset. This should make things faster overall, because we don't need to clear blocks that we don't use. Test Plan: validate Reviewers: AndreasVoellmy, ezyang, austin Subscribers: thomie, carter, ezyang, simonmar Differential Revision: https://phabricator.haskell.org/D318
* [ci skip] includes: detabify/dewhitespace rts/storage/GC.hAustin Seipp2014-08-201-27/+27
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* Move the allocation of CAF blackholes into 'newCAF' (#8590)Patrick Palka2013-12-041-2/+2
| | | | | | | | | | We now do the allocation of the blackhole indirection closure inside the RTS procedure 'newCAF' instead of generating the allocation code inline in the closure body of each CAF. This slightly decreases code size in modules with a lot of CAFs. As a result of this change, for example, the size of DynFlags.o drops by ~60KB and HsExpr.o by ~100KB.
* GHCi: Properly generate jump code for ARM (#8380)Austin Seipp2013-11-221-0/+1
| | | | | | | | | | | | | | | | | | | | This adds code for jumping to given addresses for ARM, written by Ben Gamari. However, when allocating new infotables for bytecode (which is where this jump code occurs), we need to be sure to flush the cache on the execute pointer returned from allocateExec() - on systems like ARM, the processor won't reliably read back code or automatically cache flush, where x86 will. So we add a new flushExec primitive to call out to GCC's __builtin___clear_cache primitive, which will properly generate the correct code (nothing on x86, and a call to libgcc's __clear_cache on ARM) and make sure we use it after writing the code out. Authored-by: Ben Gamari <bgamari.foss@gmail.com> Authored-by: Austin Seipp <austin@well-typed.com> Signed-off-by: Austin Seipp <austin@well-typed.com>
* In the DEBUG rts, track when CAFs are GC'dSimon Marlow2013-11-211-2/+2
| | | | | | | | | | | | This resurrects some old code and makes it work again. The idea is that we want to get an error message if we ever enter a CAF that has been GC'd, rather than following its indirection which will likely cause a segfault. Without this patch, these bugs are hard to track down in gdb, because the IND_STATIC code overwrites R1 (the pointer to the CAF) with its indirectee before jumping into bad memory, so we've lost the address of the CAF that got GC'd. Some associated refactoring while I was here.
* Fix freeHaskellFunPtr crash on iOS.Austin Seipp2013-09-151-0/+3
| | | | | Authored-by: Stephen Blackheath <...@blacksapphire.com> Signed-off-by: Austin Seipp <austin@well-typed.com>
* Revert "Default to infinite stack size (#8189)"Austin Seipp2013-09-081-10/+3
| | | | This reverts commit d85044f6b201eae0a9e453b89c0433608e0778f0.
* Default to infinite stack size (#8189)Austin Seipp2013-09-081-3/+10
| | | | | | | | | | | | | When servicing a stack overflows, only throw an exception to the given thread if the user explicitly set a max stack size, using +RTS -K. Otherwise just service it normally and grow the stack. In case we actually run out of *heap* (stack chuncks are allocated on the heap), then we need to bail by calling the stackOverflow() hook and exit immediately. Authored-by: Ben Gamari <bgamari.foss@gmail.com> Signed-off-by: Austin Seipp <aseipp@pobox.com>
* Maintain per-generation lists of weak pointers (#7847)Takano Akio2013-06-151-0/+3
|
* use libffi for iOS adjustors; fixes #7718Ian Lynagh2013-06-081-2/+5
| | | | Based on a patch from Stephen Blackheath.
* Simplify the allocation stats accountingSimon Marlow2013-02-141-1/+1
| | | | | | | | | | | We were doing it in two different ways and asserting that the results were the same. In most cases they were, but I found one case where they weren't: the GC itself allocates some memory for running finalizers, and this memory was accounted for one way but not the other. It was simpler to remove the old way of counting allocation that to try to fix it up, so I did that.
* Cache the result of countOccupied(gen->large_objects) as gen->n_large_words ↵Simon Marlow2012-09-211-0/+1
| | | | | | | | | (#7257) The program in #7257 was spending 90% of its time counting the live data in gen->large_objects. We already avoid doing this for small objects, but in this example the old generation was full of large objects (actually pinned ByteStrings).
* Lots of nat -> StgWord changesSimon Marlow2012-09-071-4/+4
|
* Deprecate lnat, and use StgWord insteadSimon Marlow2012-09-071-2/+2
| | | | | | | | | | | | lnat was originally "long unsigned int" but we were using it when we wanted a 64-bit type on a 64-bit machine. This broke on Windows x64, where long == int == 32 bits. Using types of unspecified size is bad, but what we really wanted was a type with N bits on an N-bit machine. StgWord is exactly that. lnat was mentioned in some APIs that clients might be using (e.g. StackOverflowHook()), so we leave it defined but with a comment to say that it's deprecated.
* Add getGCStatsEnabled function.Paolo Capriotti2012-06-191-0/+1
|
* Change the presentation of parallel GC work balance in +RTS -sDuncan Coutts2012-04-041-2/+2
| | | | | | | | | | | | | | | | | | | | Also rename internal variables to make the names match what they hold. The parallel GC work balance is calculated using the total amount of memory copied by all GC threads, and the maximum copied by any individual thread. You have serial GC when the max is the same as copied, and perfectly balanced GC when total/max == n_caps. Previously we presented this as the ratio total/max and told users that the serial value was 1 and the ideal value N, for N caps, e.g. Parallel GC work balance: 1.05 (4045071 / 3846774, ideal 2) The downside of this is that the user always has to keep in mind the number of cores being used. Our new presentation uses a normalised scale 0--1 as a percentage. The 0% means completely serial and 100% is perfect balance, e.g. Parallel GC work balance: 4.56% (serial 0%, perfect 100%)
* use (GHC) idiomatic typesGabor Greif2012-01-091-4/+4
|
* make CAFs atomic, to fix #5558Simon Marlow2011-10-171-2/+2
| | | | See Note [atomic CAFs] in rts/sm/Storage.c
* Also include basic time statistics in GCStats.Edward Z. Yang2011-08-061-0/+2
| | | | Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* Implement public interface for GC statistics.Edward Z. Yang2011-07-301-0/+44
| | | | | | | | | | | We add a new RTS flag -T for collecting statistics but not giving any new inputs. There is one new struct in rts/storage/GC.h: GCStats. We add two new global counters current_residency and current_slop, which are useful for in-program GC statistics. See GHC.Stats in base for a Haskell interface to this functionality. Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* fix an integer overflow (#5086), and pre-emptively avoid more of theseSimon Marlow2011-05-251-9/+17
| | | | in the future.
* GC refactoring and cleanupSimon Marlow2011-02-021-4/+1
| | | | | | | | | Now we keep any partially-full blocks in the gc_thread[] structs after each GC, rather than moving them to the generation. This should give us slightly better locality (though I wasn't able to measure any difference). Also in this patch: better sanity checking with THREADED.
* A small GC optimisationSimon Marlow2011-02-021-3/+3
| | | | | | Store the *number* of the destination generation in the Bdescr struct, so that in evacuate() we don't have to deref gen to get it. This is another improvement ported over from my GC branch.
* Remove the per-generation mutable listsSimon Marlow2011-02-021-3/+0
| | | | Now that we use the per-capability mutable lists exclusively.
* Count allocations more accuratelySimon Marlow2010-12-211-2/+3
| | | | | | | | | | | The allocation stats (+RTS -s etc.) used to count the slop at the end of each nursery block (except the last) as allocated space, now we count the allocated words accurately. This should make allocation figures more predictable, too. This has the side effect of reducing the apparent allocations by a small amount (~1%), so remember to take this into account when looking at nofib results.
* Make getAllocations() visibleSimon Marlow2010-06-171-0/+7
|