| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
| |
Move the logic for taking censuses of "normal" and pinned blocks to
their own functions.
|
| |
The fix for #18919 was somewhat incomplete: while the MVars were
correctly added to the mut_list via dirty_MVAR(), their info table
remained "clean".
While this is mostly harmless in non-debug builds, it trips an
assertion in the debug build and may result in the MVar being
needlessly added to the mut_list multiple times.
Resolves: #19145
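As a hedged sketch of the invariant (simplified; GHC's real dirty_MVAR
has a different signature and more to do), marking dirty and recording
in the mut_list must happen together:
```c
#include "Rts.h"

/* Illustrative sketch only, not the actual GHC code: dirtying an MVar
 * must both flip its info table and record it in the mut_list, so that
 * debug assertions see a dirty header and the closure is not re-added
 * on every subsequent write. */
void dirty_MVAR_sketch(StgRegTable *reg, StgClosure *p)
{
    Capability *cap = regTableToCapability(reg);
    if (p->header.info == &stg_MVAR_CLEAN_info) {
        p->header.info = &stg_MVAR_DIRTY_info;  /* mark dirty ...      */
        recordClosureMutated(cap, p);           /* ... and remember it */
    }
}
```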
|
| |
|
| |
In general we are less careful about locking closures when running with
only a single capability.
Fixes #19075.
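The pattern in question, as an illustrative sketch (the helper name is
hypothetical; lockClosure and n_capabilities are the RTS's own names):
```c
#include "Rts.h"

/* Hypothetical sketch: taking the closure spin-lock is only necessary
 * when more than one capability could be mutating the heap. */
static const StgInfoTable *maybe_lock_closure(StgClosure *p)
{
    if (n_capabilities > 1) {
        return lockClosure(p);   /* spin until we own the closure */
    }
    return p->header.info;       /* single capability: no race */
}
```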
|
| |
|
| |
This gives a small increase in performance under most circumstances.
For single threaded GC the improvement is on the order of 1-2%.
For multi threaded GC the results are quite noisy but seem to
fall into the same ballpark.
Fixes #16499
|
| |
|
| |
Previously we would push large objects and compact regions to the mark
queue during the deadlock detect GC, resulting in failure to detect
deadlocks.
|
| |
Pull the cold non-moving allocation path out of alloc_for_copy.
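The shape of this refactor, as a generic standalone sketch (hypothetical
names, not the RTS code): keep the common bump-allocation case tiny and
push the rare case behind a cold, out-of-line helper so the hot path
inlines well.
```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Cold slow path: rarely taken, kept out of line so callers stay small. */
__attribute__((cold, noinline))
static uint8_t *alloc_slow(size_t bytes)
{
    return malloc(bytes);   /* stand-in for the non-moving allocator */
}

/* Hot path: simple bump allocation the compiler can inline everywhere. */
static inline uint8_t *alloc_fast(uint8_t **hp, uint8_t *lim, size_t bytes)
{
    if ((size_t)(lim - *hp) >= bytes) {
        uint8_t *p = *hp;
        *hp += bytes;
        return p;
    }
    return alloc_slow(bytes);   /* cold case, out of line */
}
```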
|
| |
Previously the deadlock-detection promotion logic in alloc_for_copy was
just plain wrong: it failed to fire when gct->evac_gen_no !=
oldest_gen->gen_no. The fix is simple: move the promotion logic into
the non-moving allocation path.
|
| |
When performing a deadlock-detection GC we must ensure that all objects
end up in the non-moving generation. Assert this in scavenge.
|
| |
Previously an incorrect semicolon meant that we would fail to call
busy_wait_nop when spinning.
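The bug class in miniature (illustrative, not the original code): a
stray semicolon gives the while loop an empty body, so the intended
body runs once, after spinning has already ended.
```c
#include <stdatomic.h>

extern void busy_wait_nop(void);   /* CPU spin hint, e.g. "pause" */

void spin_until_set(atomic_int *flag)
{
    /* Buggy form:
     *     while (atomic_load_explicit(flag, memory_order_relaxed) == 0);
     *         busy_wait_nop();
     * The ';' terminates the loop with an empty body, so
     * busy_wait_nop() executes exactly once, after the loop exits. */

    /* Fixed form: the hint runs on every spin iteration. */
    while (atomic_load_explicit(flag, memory_order_relaxed) == 0) {
        busy_wait_nop();
    }
}
```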
|
| |
|
| |
The algorithm described in the referenced paper uses this slightly
weaker atomic op.
This is the first "exotic" CAS we're using. I've added a macro in the
<ORDERING>_OP style to match the existing ones.
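In C11 terms such a macro could look like the following sketch (the
macro name and the particular ordering pair are illustrative, not
necessarily the ones added):
```c
#include <stdatomic.h>

/* Illustrative <ORDERING>_OP-style macro: a CAS with explicitly weaker
 * memory orderings than the default sequentially consistent pair. */
#define ACQUIRE_RELAXED_CAS(ptr, expected, desired)        \
    atomic_compare_exchange_strong_explicit(               \
        (ptr), (expected), (desired),                      \
        memory_order_acquire,  /* ordering on success */   \
        memory_order_relaxed)  /* ordering on failure */
```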
|
| |
THREADED_RTS was previously misspelled as THREADEDED_RTS.
Fixes #19057.
|
| |
The ELF size field is 32 bits on 32-bit builds and 64 bits otherwise.
We now upcast to 64 bits before printing.
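The printing pattern, sketched (Linux <elf.h> types; the function name
is illustrative, and on a 32-bit build the parameter would be an
Elf32_Shdr):
```c
#include <elf.h>
#include <inttypes.h>
#include <stdio.h>

/* sh_size is 32 bits in Elf32_Shdr and 64 bits in Elf64_Shdr; the
 * upcast lets one format string serve both builds. */
void print_shdr_size(const Elf64_Shdr *shdr)
{
    printf("section size: %" PRIu64 "\n", (uint64_t) shdr->sh_size);
}
```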
|
| |
|
| |
StgWord has different widths on 32-bit and 64-bit platforms, so use
the properly sized type instead.
|
| |
|
| |
Every time I am asked about how to interpret these events I need to
figure it out from scratch. It's well past time that the users guide
properly documents these.
|
| |
The change fixes build failure on musl:
```
rts/linker/Elf.c:2031:3: error:
warning: implicit declaration of function 'dlclose'; did you mean 'close'? [-Wimplicit-function-declaration]
2031 | dlclose(nc->dlopen_handle);
| ^~~~~~~
| close
```
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
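The warning means the file never declares dlclose; the usual fix (and
presumably what this change does) is to include the header that
declares the dl* family:
```c
#include <dlfcn.h>   /* declares dlopen, dlclose, dlsym, dlerror */
```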
|
| |
|
| |
Instead of relying on RTS_LINKER_USE_MMAP
|
| |
|
| |
Consolidates munmap calls to ensure consistent error handling.
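The consolidation pattern, sketched (the helper name mirrors the
commit's intent; the body is a simplification):
```c
#include <stdio.h>
#include <sys/mman.h>

/* One choke point for unmapping: every caller gets the same
 * error-handling behaviour. */
static void munmapForLinker(void *addr, size_t bytes, const char *caller)
{
    if (munmap(addr, bytes) == -1) {
        perror(caller);   /* consistent report for all callers */
    }
}
```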
|
| |
Previously most of the uses of mmapForLinker were mapping anonymous
memory, resulting in a great deal of unnecessary repetition. Factor this
out into a new helper.
Also fixes a few places where error checking was missing or suboptimal.
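A hedged sketch of such a helper (simplified; the real one also deals
with placement and error reporting):
```c
#include <stddef.h>
#include <sys/mman.h>

/* All anonymous mappings funnel through one helper with a single
 * error convention: NULL on failure. */
static void *mmapAnonForLinker(size_t bytes)
{
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return (p == MAP_FAILED) ? NULL : p;
}
```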
|
| |
This previously resulted in warnings due to spurious unmap failures.
|
| |
|
| |
|
| |
|
| |
While the original head and tail of the TSO queue may be in the same
generation as the MVar, interior elements of the queue could be younger
after a GC run and may then be exposed by a putMVar operation that
updates the queue head.
Resolves #18919
|
| |
In the past some people have confused ASSERT, which is for checking
internal invariants, with CHECK, which should be used when checking
things that might fail due to bad input (and therefore should be enabled
even in the release compiler). Change some of these cases in the linker
to use CHECK.
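Roughly, the distinction (simplified definitions, not GHC's exact
macros):
```c
extern void barf(const char *msg, ...);   /* RTS fatal-error helper */

/* Always enabled: guards against bad input even in release builds. */
#define CHECK(p)   do { if (!(p)) barf("CHECK failed"); } while (0)

/* Debug-only: internal invariants, compiled out of release builds. */
#if defined(DEBUG)
#define ASSERT(p)  do { if (!(p)) barf("ASSERT failed"); } while (0)
#else
#define ASSERT(p)  /* nothing */
#endif
```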
|
| |
Use the GHC wrappers instead of <assert.h>.
|
| |
Previously, in an attempt to reduce fragmentation, each new allocator
would map a region of M32_MAX_PAGES fresh pages to seed itself. However,
this ends up being extremely wasteful since it turns out that we often
use fewer than this. Consequently, these pages end up getting freed,
which ends up fragmenting our address space more than we would
have if we had naively allocated pages on-demand.
Here we refactor m32 to avoid this waste while achieving the
fragmentation mitigation previously desired. In particular, we move all
page allocation into the global m32_alloc_page, which will pull a page
from the free page pool. If the free page pool is empty we then refill
it by allocating a region of M32_MAP_PAGES and adding them to the pool.
Furthermore, we do away with the initial seeding entirely. That is, the
allocator starts with no active pages: pages are rather allocated on an
as-needed basis.
On the whole this ends up being a pleasingly simple change,
simultaneously making m32 more efficient, more robust, and simpler.
Fixes #18980.
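In outline (a hedged sketch: the names follow the commit text, the
bodies and error handling are simplified):
```c
#include <stddef.h>

#define M32_MAP_PAGES 32                    /* batch size; illustrative */

struct m32_page_t { struct m32_page_t *next; };

static struct m32_page_t *free_page_pool;   /* global free page pool */

extern void  *mmapAnonForLinker(size_t bytes);  /* see helper above */
extern size_t getPageSize(void);

static struct m32_page_t *m32_alloc_page(void)
{
    if (free_page_pool == NULL) {
        /* Refill: map M32_MAP_PAGES contiguous pages at once (one
         * region, less fragmentation) and thread them onto the pool.
         * There is no up-front seeding; this only runs on demand. */
        const size_t pgsz = getPageSize();
        char *chunk = mmapAnonForLinker(M32_MAP_PAGES * pgsz);
        for (size_t i = 0; i < M32_MAP_PAGES; i++) {
            struct m32_page_t *pg = (struct m32_page_t *)(chunk + i * pgsz);
            pg->next = free_page_pool;
            free_page_pool = pg;
        }
    }
    struct m32_page_t *pg = free_page_pool;
    free_page_pool = pg->next;
    return pg;
}
```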
|
| |
See Note [Non-moving GC: Marking evacuated objects].
|
| |
|
| |
The mark thread is not joinable as we detach from it on creation.
|
| |
pthread_join returns its error code and apparently doesn't set errno.
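The practical consequence (a small self-contained example): report the
return value, not errno.
```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

void join_or_report(pthread_t t)
{
    int ret = pthread_join(t, NULL);
    if (ret != 0) {
        /* pthread_join returns the error code directly and does not
         * set errno, so strerror() must be given the return value. */
        fprintf(stderr, "pthread_join: %s\n", strerror(ret));
    }
}
```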
|
| |
Ensure that the free variables have been pushed to the update
remembered set before we zero the slop.
|
| |
|
| |
After a THROWTO message has been handled, the message closure is
overwritten by a NULL message. We must ensure that the original
closure's pointers continue to be visible to the nonmoving GC.
|
| |
The TSAN rework (specifically aad1f803) introduced a subtle regression
in GC.c, swapping `g0` in place of `gen`. Whoops!
Fixes #18997.
|
| |
When threadPaused blackholes a thunk it calls `OVERWRITING_CLOSURE` to
zero the slop for the benefit of the sanity checker. Previously this was
done *before* pushing the thunk's free variables to the update
remembered set. Consequently we would push zeroed pointers onto the
update remembered set.
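The required ordering, sketched (the pushing function's name is a
stand-in; OVERWRITING_CLOSURE is the real macro):
```c
/* Within threadPaused(), when blackholing a thunk (sketch only): */
static void blackhole_thunk_sketch(Capability *cap, StgThunk *thunk)
{
    /* 1. First make the thunk's free variables visible to the
     *    non-moving collector via the update remembered set ...  */
    push_thunk_free_vars_to_upd_rem_set(cap, thunk);  /* hypothetical */

    /* 2. ... and only then zero the slop for the sanity checker. */
    OVERWRITING_CLOSURE((StgClosure *) thunk);
}
```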
|
| |
Co-authored-by: Sven Tennie <sven.tennie@gmail.com>
Co-authored-by: Matthew Pickering <matthewtpickering@gmail.com>
Co-authored-by: Ben Gamari <bgamari.foss@gmail.com>
|
| |
As noted in #18991, we would previously allocate heap in low memory.
Due to this the linker, which typically *needs* low memory, would end up
competing with the heap. In longer builds we end up running out of
low memory entirely, leading to linking failures.
|
| |
On Windows with gcc-10, gcc failed to inline copy_tag into evacuate.
To fix this we now set the always_inline attribute for the various
copy* functions in Evac.c. The main motivation here is not the
overhead of the function call, but rather that this allows the code
to "specialize" for the size of the closure we copy, which is often
known at compile time.
An earlier commit also tried to avoid inlining evacuate_large but
didn't quite succeed, so I also marked evacuate_large as noinline.
Fixes #12416
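The attribute usage, as a generic standalone sketch: always_inline lets
a copy loop specialize at call sites where the size is a compile-time
constant, while noinline keeps the rare path out of the hot caller.
```c
#include <stddef.h>
#include <stdint.h>

/* Inlined at every call site; when n is a constant the compiler can
 * unroll this into straight-line moves. */
__attribute__((always_inline)) static inline void
copy_words(uint64_t *to, const uint64_t *from, size_t n)
{
    for (size_t i = 0; i < n; i++) to[i] = from[i];
}

/* Rarely taken: keeping it out of line keeps the hot caller small. */
__attribute__((noinline)) static void
copy_large(uint64_t *to, const uint64_t *from, size_t n)
{
    copy_words(to, from, n);   /* stand-in for the large-object path */
}
```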
|
| |
This replaces all Word<N> = W<N># Word# and Int<N> = I<N># Int# with
Word<N> = W<N># Word<N># and Int<N> = I<N># Int<N>#, thus providing us
with properly sized primitives in the code generator instead of
pretending they are all full machine words.
This came up when implementing darwinpcs for arm64. The darwinpcs
requires us to pack function arguments in excess of registers on the
stack. While most procedure call standards (pcs) assume arguments are
just passed in 8-byte slots (and thus the caller does not need to know
the exact signature to make the call), darwinpcs requires us to adhere
to the prototype and thus have the correct sizes. If we specify CInt in
the FFI call, it should correspond to the C int, and not just be Word
sized, when it's only half the size.
This does change the expected output of T16402 but the new result is no
less correct as it eliminates the narrowing (instead of the `and` as was
previously done).
Bumps the array, bytestring, text, and binary submodules.
Co-Authored-By: Ben Gamari <ben@well-typed.com>
Metric Increase:
T13701
T14697
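To make the calling-convention point concrete (arm64 passes the first
eight integer arguments in registers, so i9 and i10 below go on the
stack):
```c
/* Under AAPCS64, i9 and i10 each occupy an 8-byte stack slot; under
 * darwinpcs they are packed as two 4-byte values. A caller that
 * pretends they are word-sized writes to the wrong stack offsets,
 * which is why the codegen must know the properly sized types. */
int sum10(int i1, int i2, int i3, int i4, int i5,
          int i6, int i7, int i8, int i9, int i10)
{
    return i1 + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + i10;
}
```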
|
| |
Otherwise `opt` fails with:
error: use of undefined value '@memcmp$def'
|
| |
As noted in #18043, flushTrace failed to flush anything beyond the
writer. This means that a significant amount of data sitting in
capability-local event buffers may never get flushed, despite the
user's pleas for us to flush.
Fix this by making flushEventLog flush all of the event buffers before
flushing the writer.
Fixes #18043.
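The shape of the fix, with illustrative helper names (not the exact
RTS code):
```c
#include "Rts.h"

/* Sketch only; the two helpers are hypothetical. Flush every
 * capability-local event buffer before flushing the writer itself. */
void flushEventLog_sketch(void)
{
    for (uint32_t i = 0; i < n_capabilities; i++) {
        flush_cap_events_buf(capabilities[i]);   /* hypothetical */
    }
    flush_event_log_writer();                    /* hypothetical */
}
```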
|
| |
We currently only post the entry counters, not the other global
counters, as in my experience the former are more useful. We use the
heap profiler's census period to decide when to dump.
Also spruces up the documentation surrounding ticky-ticky a bit.
|