| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
| |
Metric Increase:
T11276
|
|
|
|
|
|
|
|
|
| |
This drops allocateExec for darwin, and replaces it with
a alloc, write, mark executable strategy instead. This prevents
us from trying to allocate an executable range and then write to
it, which X^W will prohibit on darwin.
This will *only* work if we can use mmap.
|
|
|
|
|
|
| |
tuples and sums.
fixes #1257
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Related to #19381 #19359 #14702
After a spike in memory usage we have been conservative about returning
allocated blocks to the OS in case we are still allocating a lot and would
end up just reallocating them. The result of this was that up to 4 * live_bytes
of blocks would be retained once they were allocated even if memory usage ended up
a lot lower.
For a heap of size ~1.5G, this would result in OS memory reporting 6G which is
both misleading and worrying for users.
In long-lived server applications this results in consistent high memory
usage when the live data size is much more reasonable (for example ghcide)
Therefore we have a new (2021) strategy which starts by retaining up to 4 * live_bytes
of blocks before gradually returning uneeded memory back to the OS on subsequent
major GCs which are NOT caused by a heap overflow.
Each major GC which is NOT caused by heap overflow increases the consec_idle_gcs
counter and the amount of memory which is retained is inversely proportional to this number.
By default the excess memory retained is
oldGenFactor (controlled by -F) / 2 ^ (consec_idle_gcs * returnDecayFactor)
On a major GC caused by a heap overflow, the `consec_idle_gcs` variable is reset to 0
(as we could continue to allocate more, so retaining all the memory might make sense).
Therefore setting bigger values for `-Fd` makes the rate at which memory is returned slower.
Smaller values make it get returned faster. Setting `-Fd0` disables the
memory return completely, which is the behaviour of older GHC versions.
The default is `-Fd4` which results in the following scaling:
> mapM print [(x, 1/ (2**(x / 4))) | x <- [1 :: Double ..20]]
(1.0,0.8408964152537146)
(2.0,0.7071067811865475)
(3.0,0.5946035575013605)
(4.0,0.5)
(5.0,0.4204482076268573)
(6.0,0.35355339059327373)
(7.0,0.29730177875068026)
(8.0,0.25)
(9.0,0.21022410381342865)
(10.0,0.17677669529663687)
(11.0,0.14865088937534013)
(12.0,0.125)
(13.0,0.10511205190671433)
(14.0,8.838834764831843e-2)
(15.0,7.432544468767006e-2)
(16.0,6.25e-2)
(17.0,5.255602595335716e-2)
(18.0,4.4194173824159216e-2)
(19.0,3.716272234383503e-2)
(20.0,3.125e-2)
So after 13 consecutive GCs only 0.1 of the maximum memory used will be retained.
Further to this decay factor, the amount of memory we attempt to retain is
also influenced by the GC strategy for the oldest generation. If we are using
a copying strategy then we will need at least 2 * live_bytes for copying to take
place, so we always keep that much. If using compacting or nonmoving then we need a lower number,
so we just retain at least `1.2 * live_bytes` for some protection.
In future we might want to make this behaviour more aggressive, some
relevant literature is
> Ulan Degenbaev, Jochen Eisinger, Manfred Ernst, Ross McIlroy, and Hannes Payer. 2016. Idle time garbage collection scheduling. SIGPLAN Not. 51, 6 (June 2016), 570–583. DOI:https://doi.org/10.1145/2980983.2908106
which describes the "memory reducer" in the V8 javascript engine which
on an idle collection immediately returns as much memory as possible.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The BLOCKS_SIZE event reports the size of the currently allocated blocks
in bytes.
It is like the HEAP_SIZE event, but reports about the blocks rather than
megablocks.
You can work out the current heap fragmentation by looking at the
difference between HEAP_SIZE and BLOCKS_SIZE.
Fixes #19357
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
See #19357
The event reports the
* Current number of megablocks allocated
* The number that the RTS thinks it needs
* The number is managed to return to the OS
When current > need then the difference is returned to the OS, the
successful number of returned mblocks is reported by 'returned'.
In a fragmented heap current > need but returned < current - need.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This new flag embeds a lookup table from the address of an info table
to information about that info table.
The main interface for consulting the map is the `lookupIPE` C function
> InfoProvEnt * lookupIPE(StgInfoTable *info)
The `InfoProvEnt` has the following structure:
> typedef struct InfoProv_{
> char * table_name;
> char * closure_desc;
> char * ty_desc;
> char * label;
> char * module;
> char * srcloc;
> } InfoProv;
>
> typedef struct InfoProvEnt_ {
> StgInfoTable * info;
> InfoProv prov;
> struct InfoProvEnt_ *link;
> } InfoProvEnt;
The source positions are approximated in a similar way to the source
positions for DWARF debugging information. They are only approximate but
in our experience provide a good enough hint about where the problem
might be. It is therefore recommended to use this flag in conjunction
with `-g<n>` for more accurate locations.
The lookup table is also emitted into the eventlog when it is available
as it is intended to be used with the `-hi` profiling mode.
Using this flag will significantly increase the size of the resulting
object file but only by a factor of 2-3x in our experience.
|
|
|
|
|
|
|
|
| |
This profiling mode creates bands by the address of the info table for
each closure. This provides a much more fine-grained profiling output
than any of the other profiling modes.
The `-hi` profiling mode does not require a profiling build.
|
|
|
|
|
|
|
|
|
|
| |
This patch exposes three new functions in `GHC.Profiling` which allow
heap profiling to be enabled and disabled dynamically.
1. startHeapProfTimer - Starts heap profiling with the given RTS options
2. stopHeapProfTimer - Stops heap profiling
3. requestHeapCensus - Perform a heap census on the next context
switch, regardless of whether the timer is enabled or not.
|
| |
|
|
|
|
|
|
|
| |
It should be left to tooling to perform the filtering to remove these
specific closure types from the profile if desired.
Fixes #16795
|
|
|
|
|
|
|
| |
This introduces a flag, --eventlog-flush-interval, which can be used to
set an upper bound on the amount of time for which an eventlog event
will remain enqueued. This can be useful in real-time monitoring
settings.
|
|
|
|
|
|
|
|
|
| |
This function is exposed in the RtsAPI.h so that external users have a
blessed way to traverse all the different `bdescr`s which are known by
the RTS.
The main motivation is to use this function in ghc-debug but avoid
having to expose the internal structure of a Capability in the API.
|
|
|
|
|
| |
Having a union in the closure profiling header really just complicates
things so get back to basics, we just have a single StgWord there for now.
|
| |
|
|
|
|
|
|
|
|
| |
Move them from the external IOInterface.h to the internal IOManager.h.
The functions are all in fact internal. They are not used from the base
library at all.
Remove ioManagerWakeup as an exported symbol. It is not used elsewhere.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Naming is hard. Where we want to get to is to have a clear internal and
external API for the IO manager within the RTS. What we have right now
is just the external API (used in base for the Haskell side of the
threaded IO manager impls) living in includes/rts/IOManager.h.
We want to add a clear RTS internal API, which really ought to live in
rts/IOManager.h. Several people think it's too confusing to have both:
* includes/rts/IOManager.h for the external API
* rts/IOManager.h for the internal API
So the plan is to add rts/IOManager.{h,c} as the internal parts, and
rename the external part to be includes/rts/IOInterface.h.
It is admittidly not great to have .h files in includes/rts/ called
"interface" since by definition, every .h fle under includes/ is an
interface!
Alternative naming scheme suggestions welcome!
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
When threadPaused blackholes a thunk it calls `OVERWRITING_CLOSURE` to
zero the slop for the benefit of the sanity checker. Previously this was
done *before* pushing the thunk's free variables to the update
remembered set. Consequently we would pull zero'd pointers to the update
remembered set.
|
|
|
|
|
|
|
|
|
|
|
|
| |
As noted in #18043, flushTrace failed flush anything beyond the writer.
This means that a significant amount of data sitting in capability-local
event buffers may never get flushed, despite the users' pleads for us to
flush.
Fix this by making flushEventLog flush all of the event buffers before
flushing the writer.
Fixes #18043.
|
|
|
|
|
|
|
|
| |
We currently only post the entry counters, not the other global
counters as in my experience the former are more useful. We use the heap
profiler's census period to decide when to dump.
Also spruces up the documentation surrounding ticky-ticky a bit.
|
|
|
|
|
|
|
|
| |
This addes the necessary logic to support aarch64 on elf, as well
as aarch64 on mach-o, which Apple calls arm64.
We change architecture name to AArch64, which is the official arm
naming scheme.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(This change is originally written by niteria)
This adds two functions:
* `loadNativeObj`
* `unloadNativeObj`
and implements them for Linux.
They are useful if you want to load a shared object with Haskell code
using the system linker and have GHC call dlclose() after the
code is no longer referenced from the heap.
Using the system linker allows you to load the shared object
above outside the low-mem region. It also loads the DWARF sections
in a way that `perf` understands.
`dl_iterate_phdr` is what makes this implementation Linux specific.
|
| |
|
|
|
|
|
|
| |
Co-authored-by: Sven Tennie <sven.tennie@gmail.com>
Co-authored-by: Matthew Pickering <matthewtpickering@gmail.com>
Co-authored-by: Ben Gamari <bgamari.foss@gmail.com>
|
|\ |
|
| |\ |
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Previously the `current_value`, `first_watch_queue_entry`, and
`num_updates` fields of `StgTVar` were marked as `volatile` in an
attempt to provide strong ordering. Of course, this isn't sufficient.
We now use proper atomic operations. In most of these cases I strengthen
the ordering all the way to SEQ_CST although it's possible that some
could be weakened with some thought.
|
| |\ \ |
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Not only is this in general a good idea, but it turns out that GCC
unrolls the retry loop, resulting is massive code bloat in critical
parts of the RTS (e.g. `evacuate`).
|
| | | |
| | | |
| | | |
| | | |
| | | | |
Previously we would take all capabilities but fail to join on the thread
itself, potentially resulting in a leaked thread.
|
| | | | |
|
| | |/
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This fixes two potentially problematic data races in the StablePtr
implementation:
* We would fail to RELEASE the stable pointer table when enlarging it,
causing other cores to potentially see uninitialized memory.
* We would fail to ACQUIRE when dereferencing a stable pointer.
|
| |/ |
|
| |
| |
| |
| |
| | |
Also ensure that we also store the info table pointer last to ensure
that the synchronization covers all stores.
|
| | |
|
| |
| |
| |
| |
| |
| | |
This is fairly straightforward; we just needed to use relaxed operations
for the PROF_SPIN counters and a release store instead of a write
barrier.
|
| | |
|
| | |
|
|/
|
|
|
|
|
|
|
| |
The `rts_pause` and `rts_resume` functions have been added to `RtsAPI.h` and
allow an external process to completely pause and resume the RTS.
Co-authored-by: Sven Tennie <sven.tennie@gmail.com>
Co-authored-by: Matthew Pickering <matthewtpickering@gmail.com>
Co-authored-by: Ben Gamari <bgamari.foss@gmail.com>
|
|
|
|
|
|
|
|
|
|
| |
Some removed globals variables were still declared in the RTS.
They were removed in the following commits:
* 4fc6524a2a4a0003495a96c8b84783286f65c198
* 0dc7985663efa1739aafb480759e2e2e7fca2a36
* bbd3c399939311ec3e308721ab87ca6b9443f358
|
|
|
|
|
| |
There are still global variables but only 3 booleans instead of a single
DynFlags.
|
|
|
|
|
|
| |
Previously we would allocate a linked list cell for each foreign export.
Now we can avoid this by taking advantage of the fact that they are
already broken into groups.
|
|
|
|
|
|
|
|
|
| |
This avoids calling `libc` in the initializers which are responsible for
registering foreign exports. We believe this should avoid the corruption
observed in #18548.
See Note [Tracking foreign exports] in rts/ForeignExports.c for an
overview of the new scheme.
|
| |
|
| |
|