| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
This fixes an assertion failure in the m32 allocator due to the
imprecisely specified preconditions of `m32_allocator_push_filled_list`.
Specifically, the caller must ensure that the page type is set to filled
prior to calling `m32_allocator_push_filled_list`.
While this issue did result in an assertion failure in the debug RTS,
the issue is in fact benign.
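
A minimal sketch of the precondition described above, assuming hypothetical names (`Page`, `PageType`, `push_filled`, `retire_page`) rather than the actual m32 allocator definitions: the caller marks the page as filled before pushing it, and the push asserts that it did so.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page model; the real m32 allocator structures differ. */
typedef enum { PAGE_FREE, PAGE_NURSERY, PAGE_FILLED } PageType;

typedef struct Page {
    PageType type;
    struct Page *next;
} Page;

/* Precondition: the caller has already set p->type to PAGE_FILLED. */
static void push_filled(Page **list, Page *p) {
    assert(p->type == PAGE_FILLED);
    p->next = *list;
    *list = p;
}

/* The caller establishes the precondition before pushing. */
static void retire_page(Page **list, Page *p) {
    p->type = PAGE_FILLED;
    push_filled(list, p);
}
```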
|
|
|
|
| |
This reverts commit e09afbf2a998beea7783e3de5dce5dd3c6ff23db.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Here we deprecate the eventlogging RTS ways and instead enable eventlog
support in the remaining ways. This simplifies packaging and reduces GHC
compilation times (as we can eliminate two whole compilations of the RTS)
while simplifying the end-user story. The trade-off is a small increase
in binary sizes in the case that the user does not want eventlogging
support, but we think that this is a fine trade-off.
This also revealed a latent RTS bug: some files which included `Cmm.h`
also assumed that it defined various macros which were in fact defined
by `Config.h`, which `Cmm.h` did not include. Fixing this in turn
revealed that `StgMiscClosures.cmm` failed to import various spinlock
statistics counters, as evidenced by the failed unregisterised build.
Closes #18948.
|
|
|
|
| |
If the user has not configured a writer then there is nothing to flush.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Solves the quadratic worst case performance of freeing megablocks that
was described in issue #19897.
During GC runs, we now keep a secondary free list for megablocks that is
neither sorted, nor coalesced. That way, free becomes an O(1) operation
at the expense of not being able to reuse memory for larger allocations.
At the end of a GC run, the secondary free list is sorted and then
merged into the actual free list in a single pass.
That way, our worst case performance is O(n log(n)) rather than O(n^2).
We postulate that temporarily losing coalescence during a single GC run
won't have any adverse effects in practice because:
- We would need to release enough memory during the GC, and then after
that (but within the same GC run) allocate a megablock group of more
than one megablock. This seems unlikely, as large objects are not
copied during GC, and so we shouldn't need such large allocations
during a GC run.
- Allocations of megablock groups of more than one megablock are rare.
They only happen when a single heap object is large enough to require
that amount of space. Any allocation areas that are supposed to hold
more than one heap object cannot use megablock groups, because only
the first megablock of a megablock group has valid `bdescr`s. Thus,
a heap object can only start in the first megablock of a group, not in
later ones.
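
The scheme above can be sketched roughly as follows. This is an illustration under stated assumptions, not the RTS code: `Group`, `deferred_free`, and `commit_deferred` are hypothetical names, and a group is modelled only by its starting megablock index and length. The sort relies on `qsort`, which is typically O(n log n) although the C standard does not promise it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a megablock group; not the RTS's bdescr. */
typedef struct Group {
    size_t start;          /* index of the first megablock */
    size_t count;          /* number of megablocks in the group */
    struct Group *next;
} Group;

/* During GC: O(1) free onto an unsorted, uncoalesced secondary list. */
static void deferred_free(Group **deferred, Group *g) {
    g->next = *deferred;
    *deferred = g;
}

static int by_start(const void *a, const void *b) {
    const Group *x = *(Group *const *)a, *y = *(Group *const *)b;
    return (x->start > y->start) - (x->start < y->start);
}

/* End of GC: sort the secondary list, then merge it into the sorted main
 * free list in one pass, coalescing adjacent groups along the way.
 * Absorbed nodes are simply dropped from the list; the caller owns them. */
static Group *commit_deferred(Group *free_list, Group *deferred) {
    size_t n = 0, i = 0;
    for (Group *g = deferred; g; g = g->next) n++;
    if (n == 0) return free_list;

    Group **v = malloc(n * sizeof *v);
    if (v == NULL) abort();
    for (Group *g = deferred; g; g = g->next) v[i++] = g;
    qsort(v, n, sizeof *v, by_start);

    Group *head = NULL, *tail = NULL;
    i = 0;
    while (free_list != NULL || i < n) {
        Group *g;
        if (i < n && (free_list == NULL || v[i]->start < free_list->start)) {
            g = v[i++];
        } else {
            g = free_list;
            free_list = free_list->next;
        }
        g->next = NULL;
        if (tail != NULL && tail->start + tail->count == g->start) {
            tail->count += g->count;      /* coalesce with the previous group */
        } else if (tail != NULL) {
            tail->next = g;
            tail = g;
        } else {
            head = tail = g;
        }
    }
    free(v);
    return head;
}
```

Each free during the GC costs a single pointer update, and the sorting and coalescing work is paid once at the end.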
|
|
|
|
| |
Also drops the unused TREC_COMMITTED transaction state.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a bug that @JunmingZhao42 and I noticed while working on her
MMTK port. Specifically, in stg_stop_thread we used stg_enter_info as a
sentinel at the tail of a stack after a thread has completed. However,
stg_enter_info expects to have a two-field payload, which we do not
push. Consequently, if the GC somehow ends up walking the stack it will attempt
to interpret data past the end of the stack as the frame's fields,
resulting in unsound behavior.
To fix this I eliminate this hacky use of `stg_enter_info` and instead
introduce a new stack frame type, `stg_dead_thread_info`. Not only does
this eliminate the potential for the previously mentioned memory
unsoundness but it also more clearly captures the intended structure of
the dead threads' stacks.
|
|
|
|
|
| |
GHC no longer uses libtool for linking and therefore this is no longer
necessary.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As described in Note [Wired-in exceptions are not CAFfy], a small set of
built-in exception closures get special treatment in the code generator,
being declared as non-CAFfy despite potentially containing CAF
references. The original intent of this treatment was for the RTS to then
add StablePtrs for each of the closures, ensuring that they are not
GC'd. However, this logic was not applied consistently and eventually
removed entirely in 951c1fb0. This led to #21141.
Here we fix this bug by reintroducing the StablePtrs and document the
status quo.
Closes #21141.
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
While debugging it is very useful to be able to determine whether a
given info table is a stack frame or not. We have spare bits in the
closure flags array anyway, so use one for this information.
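
As a rough illustration of the idea, assuming made-up closure types and flag names (the real RTS flags table is laid out differently), a spare bit in a per-type flags array can answer the "is this a stack frame?" question with one lookup:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical closure types and flag bits, for illustration only. */
enum { T_CONSTR, T_FUN, T_THUNK, T_RET_SMALL, T_UPDATE_FRAME, N_TYPES };
enum { FLAG_HEAP_NF = 1 << 0, FLAG_THUNK = 1 << 1, FLAG_FRAME = 1 << 2 };

/* One flags word per closure type; FLAG_FRAME occupies a spare bit. */
static const uint16_t type_flags[N_TYPES] = {
    [T_CONSTR]       = FLAG_HEAP_NF,
    [T_FUN]          = FLAG_HEAP_NF,
    [T_THUNK]        = FLAG_THUNK,
    [T_RET_SMALL]    = FLAG_FRAME,
    [T_UPDATE_FRAME] = FLAG_FRAME,
};

/* True if the given closure type describes a stack frame. */
static bool is_stack_frame(unsigned type) {
    assert(type < N_TYPES);
    return (type_flags[type] & FLAG_FRAME) != 0;
}
```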
|
| |
|
|
|
|
|
| |
Previously the interpreter's handling of `RET_BCO` stack frames would
throw away the tag of the returned closure. This resulted in #21390.
|
| |
|
| |
|
|
|
|
|
| |
Since we have switched to Clang the toolchain now links against
ucrt rather than msvcrt.
|
|
|
|
| |
As is necessary on Windows.
|
|\ \ \
| | | |
| | | |
| | | | |
'wip/windows-clang-2' and 'wip/lint-rts-includes' into wip/windows-clang-join
|
| | | |
| | | |
| | | |
| | | | |
This fixes various violations of the newly-added RTS includes linter.
|
| |_|/
|/| |
| | |
| | | |
It's easier to ensure that this is included first than Rts.h
|
| | | |
|
| | | |
|
| | |
| | |
| | |
| | |
| | | |
Clang on Windows does not understand the `gnu_printf` attribute; use
`printf` instead.
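
One way such a switch could look, as a sketch only: the macro name `FMT_PRINTF` and the function `log_to` are hypothetical, and this version picks `printf` for every Clang build rather than only on Windows.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical macro: choose a format archetype the compiler accepts.
 * Clang also defines __GNUC__, so it has to be tested first. */
#if defined(__clang__)
#define FMT_PRINTF(fmt, args) __attribute__((format(printf, fmt, args)))
#elif defined(__GNUC__)
#define FMT_PRINTF(fmt, args) __attribute__((format(gnu_printf, fmt, args)))
#else
#define FMT_PRINTF(fmt, args)
#endif

/* The attribute lets the compiler check the arguments against fmt. */
static int log_to(char *buf, size_t size, const char *fmt, ...) FMT_PRINTF(3, 4);

static int log_to(char *buf, size_t size, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
```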
|
| | | |
|
| |/
|/|
| |
| | |
This is a gcc-specific extension.
|
| |
| |
| |
| |
| | |
Previously `isArchive` could leak a `FILE` handle if the `fread`
returned a short read.
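
A minimal sketch of a leak-free check, assuming the usual `ar` magic string `!<arch>\n`; the function name `looks_like_archive` is hypothetical and this is not the RTS's `isArchive`. The handle is closed before the read result is inspected, so a short read cannot skip the `fclose`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Returns true if the file at `path` begins with the `ar` archive magic. */
static bool looks_like_archive(const char *path) {
    static const char magic[8] = {'!', '<', 'a', 'r', 'c', 'h', '>', '\n'};
    char buf[sizeof magic];

    assert(path != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL) return false;

    size_t got = fread(buf, 1, sizeof buf, f);
    fclose(f);  /* closed on every path, including a short read */

    return got == sizeof buf && memcmp(buf, magic, sizeof magic) == 0;
}
```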
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Previously the RTS linker would call initializers during the
"resolve" phase of linking. However, this is problematic in the
case of cyclic dependencies between objects. In particular, consider
the case where a static library
contains a set of recursive objects:
* object A depends upon symbols in object B
* object B has an initializer that depends upon object A
* we try to load object A
The linker would previously:
1. start resolving object A
2. encounter the reference to object B, loading and resolving it
3. run object B's initializer
4. the initializer will attempt to call into object A,
which hasn't been fully resolved (and therefore protected)
Fix this by moving constructor execution to a new linking
phase, which follows resolution.
Fix #21253.
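
The reordering can be sketched as two separate traversals. Everything here is a simplified model with hypothetical names (`Obj`, `resolve`, `run_initializers`, `load`), not the RTS linker's data structures: resolution runs no user code, and initializers run only once every reachable object has been resolved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPS 4

/* Hypothetical object-file record; field names are illustrative. */
typedef struct Obj {
    struct Obj *deps[MAX_DEPS];
    size_t n_deps;
    bool visited;       /* resolution has started */
    bool resolved;      /* relocations applied, memory protected */
    bool initialized;   /* initializer has been scheduled or run */
    void (*init)(struct Obj *self);
} Obj;

/* Phase 1: resolve an object and everything it references. */
static void resolve(Obj *o) {
    if (o->visited) return;              /* breaks dependency cycles */
    o->visited = true;
    for (size_t i = 0; i < o->n_deps; i++) resolve(o->deps[i]);
    o->resolved = true;
}

/* Phase 2: run initializers, dependencies first. */
static void run_initializers(Obj *o) {
    if (o->initialized) return;
    o->initialized = true;
    for (size_t i = 0; i < o->n_deps; i++) run_initializers(o->deps[i]);
    if (o->init != NULL) o->init(o);
}

static void load(Obj *o) {
    resolve(o);
    run_initializers(o);
}

/* Example initializer: records whether its first dependency was already
 * resolved at the time it ran. */
static bool saw_resolved_dep;
static void check_dep_resolved(Obj *self) {
    saw_resolved_dep = self->n_deps > 0 && self->deps[0]->resolved;
}
```

With A depending on B and B's initializer calling back into A, loading A now finishes resolving both objects before B's initializer runs.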
|
| | |
|
| |
| |
| |
| |
| |
| | |
swprintf deviates from usual `snprintf` semantics in that it does not
guarantee reasonable behavior when the buffer is NULL (that is,
returning the number of bytes that would have been emitted).
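
For reference, the `snprintf` behavior being contrasted here is the C99 sizing idiom sketched below; `format_label` is a hypothetical helper. The first call passes a NULL buffer with size zero and gets back the length that would have been written, which is exactly what cannot be relied upon with `swprintf`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Format "<name>-<n>" into a freshly allocated string.
 * The caller frees the result. Returns NULL on failure. */
static char *format_label(const char *name, int n) {
    /* C99: with a NULL buffer and size 0, snprintf returns the number of
     * characters that would have been written, excluding the NUL. */
    int len = snprintf(NULL, 0, "%s-%d", name, n);
    if (len < 0) return NULL;

    char *out = malloc((size_t)len + 1);
    if (out == NULL) return NULL;

    snprintf(out, (size_t)len + 1, "%s-%d", name, n);
    return out;
}
```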
|
| | |
|
| |
| |
| |
| |
| |
| | |
We now preserve the address that we last mapped, allowing us to resume
our search and avoiding quadratic allocation costs. This fixes the
runtime of T10296a, which allocates many adjustors.
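
The effect of remembering the last mapping can be illustrated with a toy model, in which an address range is a fixed array of slots. `Region` and `map_next` are hypothetical names, and the real allocator searches address space rather than an array. Each search starts at the slot that was handed out last instead of at slot 0, so filling the region costs a number of probes linear in its size rather than quadratic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define N_SLOTS 1024

/* Hypothetical model of an address range split into fixed-size slots. */
typedef struct {
    bool used[N_SLOTS];
    size_t last;     /* slot handed out by the previous successful search */
    size_t probes;   /* total slots examined, kept for illustration */
} Region;

/* Claim a free slot, resuming from the last mapped slot.
 * Returns N_SLOTS if the region is full. */
static size_t map_next(Region *r) {
    for (size_t k = 0; k < N_SLOTS; k++) {
        size_t i = (r->last + k) % N_SLOTS;
        r->probes++;
        if (!r->used[i]) {
            r->used[i] = true;
            r->last = i;
            return i;
        }
    }
    return N_SLOTS;
}
```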
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This is a significant rework of the PEi386 linker, making the linker
compatible with high image base addresses. Specifically, we now use the
m32 allocator instead of `HeapAllocate`.
In addition I found a number of latent bugs in our handling of import
libraries and relocations. I've added quite a few comments describing
what I've learned about Windows import libraries while fixing these.
Thanks to Tamar Christina (@Phyx) for providing the address space search
logic, countless hours of help while debugging, and his boundless
Windows knowledge.
Co-Authored-By: Tamar Christina <tamar@zhox.com>
|
| |
| |
| |
| |
| |
| |
| | |
Tables-next-to-code mandates that we treat symbols with info tables like
data since we cannot relocate them using a jump island.
See #20983.
|
| |
| |
| |
| |
| |
| |
| |
| | |
This fixes handling of overflowed relocations on PEi386 targets:
* Refuse to create jump islands for relocations of data symbols
* Correctly handle the `__imp___acrt_iob_func` symbol, which is a new
type of symbol: `SYM_TYPE_INDIRECT_DATA`
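
The first point can be sketched as a decision function over a symbol's kind. The names below (`SymKind`, `Symbol`, `RelocPlan`, `plan_rel32`) are hypothetical and model only a 32-bit PC-relative relocation; they are not the linker's actual types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical symbol classification and relocation outcomes. */
typedef enum { SYM_CODE, SYM_DATA } SymKind;
typedef enum { RELOC_DIRECT, RELOC_VIA_JUMP_ISLAND, RELOC_FAIL } RelocPlan;

typedef struct {
    const char *name;
    uint64_t addr;
    SymKind kind;
} Symbol;

/* Decide how to handle a 32-bit relative relocation at `site` that
 * targets `sym`. A jump island only makes sense for code: redirecting a
 * data reference to a stub would make it read the stub's bytes. */
static RelocPlan plan_rel32(const Symbol *sym, uint64_t site) {
    int64_t delta = (int64_t)sym->addr - (int64_t)site;
    if (delta >= INT32_MIN && delta <= INT32_MAX) return RELOC_DIRECT;
    return sym->kind == SYM_CODE ? RELOC_VIA_JUMP_ISLAND : RELOC_FAIL;
}
```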
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
As noted in #20978, the linker would previously handle overflowed
relocations by creating a jump island. While this is fine in the case of
code symbols, it's very much not okay in the case of data symbols. To
fix this we must keep track of whether each symbol is code or data and
relocate them appropriately. This patch takes the first step in this
direction, adding a symbol type field to the linker's symbol table. It
doesn't yet change relocation behavior to take advantage of this
knowledge.
Fixes #20978.
|
| |
| |
| |
| |
| | |
Previously we would leak the section information of the `.bss`
section.
|
|/ |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In !7511 (closed) I introduced a new allocator for adjustors,
AdjustorPool, which eliminates the address space fragmentation issues
which adjustors can introduce. In that work I focused on amd64 since
that was the platform where I observed issues.
However, in #21132 we noted that the size of adjustors is also a cause
of CI fragility on i386. In this MR I port i386 to use AdjustorPool.
Sadly the complexity of the i386 adjustor code does require a bit
of generalization which makes the code a bit more opaque but such is the
world.
Closes #21132.
|
|
|
|
| |
Unfortunately the i386 adjustor logic needs this.
|
|
|
|
| |
Since there may be .o files which are in fact archives.
|
|
|
|
| |
See #21068.
|
|
|
|
|
|
|
| |
Check the file's header to catch static archives bearing the `.o`
extension, as may happen on Windows after the Clang refactoring.
See #21068
|
|
|
|
|
|
|
|
|
| |
* The size of End concurrent mark phase looks wrong; it used to be 4 and is now 0.
* The size of Task create is wrong; it used to be 18 and is now 14.
* The event ticky-ticky entry counter begin sample has the wrong name.
* The event ticky-ticky entry counter begin sample has the wrong size; it was 0 and is now 32.
Closes #21070
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When passed a combination of `-N` and `-qn` options the cpu time for
garbage collection was being vastly overcounted because the counters
were not being zeroed appropriately.
When `-qn1` is passed, only 1 of the N available GC threads is chosen to
perform work; the rest are idle. At the end of the GC period, stat_endGC
traverses all the GC threads and adds up the elapsed time from each of
them. For threads which didn't participate in this GC, the value of the
cpu time should be zero, but before this patch, the counters were not
zeroed and hence we would count the same elapsed time on many subsequent
iterations (until the thread participated in a GC again).
The most direct way to zero these fields is to do so immediately after
the value is added into the global counter, after which point they are
never used again.
We also tried another approach where we would zero the counter in
yieldCapability but there are some (undiagnosed) situations where a
capability would not pass through yieldCapability before the GC ended and
the same double counting problem would occur.
Fixes #21082
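
A minimal sketch of the chosen fix, with hypothetical names (`GcThread`, `gc_cpu_ns`, `collect_gc_cpu`) standing in for the RTS structures: each per-thread counter is zeroed right after it is added to the total, so a thread that sits out the next GC contributes nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-GC-thread statistics record. */
typedef struct {
    uint64_t gc_cpu_ns;   /* CPU time spent in the most recent GC */
} GcThread;

/* Add every thread's counter to *total, zeroing each one as it is
 * consumed so that stale values are never counted twice. */
static void collect_gc_cpu(GcThread *threads, size_t n, uint64_t *total) {
    for (size_t i = 0; i < n; i++) {
        *total += threads[i].gc_cpu_ns;
        threads[i].gc_cpu_ns = 0;
    }
}
```

If only the first of four threads works during the second GC, the total grows by that thread's time alone instead of re-adding the three idle threads' old values.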
|