| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
| |
Since we have switched to Clang the toolchain now links against
ucrt rather than msvcrt.
|
|
|
|
| |
As is necessary on Windows.
|
|\ \ \
| | | |
| | | |
| | | | |
'wip/windows-clang-2' and 'wip/lint-rts-includes' into wip/windows-clang-join
|
| | | |
| | | |
| | | |
| | | | |
This fixes various violations of the newly-added RTS includes linter.
|
| |_|/
|/| |
| | |
| | | |
It's easier to ensure that this is included first than Rts.h
|
| | | |
|
| | | |
|
| | |
| | |
| | |
| | |
| | | |
Clang on Windows does not understand the `gnu_printf` attribute; use
`printf` instead.
|
| | | |
|
| |/
|/|
| |
| | |
This is a gcc-specific extension.
|
| |
| |
| |
| |
| | |
Previously `isArchive` could leak a `FILE` handle if the `fread`
returned a short read.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Previously the RTS linker would call initializers during the
"resolve" phase of linking. However, this is problematic in the
case of cyclic dependencies between objects. In particular, consider
the case where we have a situation where a static library
contains a set of recursive objects:
* object A has depends upon symbols in object B
* object B has an initializer that depends upon object A
* we try to load object A
The linker would previously:
1. start resolving object A
2. encounter the reference to object B, loading it resolve object B
3. run object B's initializer
4. the initializer will attempt to call into object A,
which hasn't been fully resolved (and therefore protected)
Fix this by moving constructor execution to a new linking
phase, which follows resolution.
Fix #21253.
|
| | |
|
| |
| |
| |
| |
| |
| | |
swprintf deviates from usual `snprintf` semantics in that it does not
guarantee reasonable behavior when the buffer is NULL (that is,
returning the number of bytes that would have been emitted).
|
| | |
|
| |
| |
| |
| |
| |
| | |
We now preserve the address that we last mapped, allowing us to resume
our search and avoiding quadratic allocation costs. This fixes the
runtime of T10296a, which allocates many adjustors.
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This is a significant rework of the PEi386 linker, making the linker
compatible with high image base addresses. Specifically, we now use the
m32 allocator instead of `HeapAllocate`.
In addition I found a number of latent bugs in our handling of import
libraries and relocations. I've added quite a few comments describing
what I've learned about Windows import libraries while fixing these.
Thanks to Tamar Christina (@Phyx) for providing the address space search
logic, countless hours of help while debugging, and his boundless
Windows knowledge.
Co-Authored-By: Tamar Christina <tamar@zhox.com>
|
| |
| |
| |
| |
| |
| |
| | |
Tables-next-to-code mandates that we treat symbols with info tables like
data since we cannot relocate them using a jump island.
See #20983.
|
| |
| |
| |
| |
| |
| |
| |
| | |
This fixes handling of overflowed relocations on PEi386 targets:
* Refuse to create jump islands for relocations of data symbols
* Correctly handle the `__imp___acrt_iob_func` symbol, which is an new
type of symbol: `SYM_TYPE_INDIRECT_DATA`
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
As noted in #20978, the linker would previously handle overflowed
relocations by creating a jump island. While this is fine in the case of
code symbols, it's very much not okay in the case of data symbols. To
fix this we must keep track of whether each symbol is code or data and
relocate them appropriately. This patch takes the first step in this
direction, adding a symbol type field to the linker's symbol table. It
doesn't yet change relocation behavior to take advantage of this
knowledge.
Fixes #20978.
|
| |
| |
| |
| |
| | |
Previously we would leak the section information of the `.bss`
section.
|
|/ |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In !7511 (closed) I introduced a new allocator for adjustors,
AdjustorPool, which eliminates the address space fragmentation issues
which adjustors can introduce. In that work I focused on amd64 since
that was the platform where I observed issues.
However, in #21132 we noted that the size of adjustors is also a cause
of CI fragility on i386. In this MR I port i386 to use AdjustorPool.
Sadly the complexity of the i386 adjustor code does cause require a bit
of generalization which makes the code a bit more opaque but such is the
world.
Closes #21132.
|
|
|
|
| |
Unfortunately the i386 adjustor logic needs this.
|
|
|
|
| |
Since there may be .o files which are in fact archives.
|
|
|
|
| |
See #21068.
|
|
|
|
|
|
|
| |
Check the file's header to catch static archive bearing the `.o`
extension, as may happen on Windows after the Clang refactoring.
See #21068
|
|
|
|
|
|
|
|
|
| |
* The size of End concurrent mark phase looks wrong and, it used to be 4 and now it's 0.
* The size of Task create is wrong, used to be 18 and now 14.
* The event ticky-ticky entry counter begin sample has the wrong name
* The event ticky-ticky entry counter being sample has the wrong size, was 0 now 32.
Closes #21070
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When passed a combination of `-N` and `-qn` options the cpu time for
garbage collection was being vastly overcounted because the counters
were not being zeroed appropiately.
When -qn1 is passed, only 1 of the N avaiable GC threads is chosen to
perform work, the rest are idle. At the end of the GC period, stat_endGC
traverses all the GC threads and adds up the elapsed time from each of
them. For threads which didn't participate in this GC, the value of the
cpu time should be zero, but before this patch, the counters were not
zeroed and hence we would count the same elapsed time on many subsequent
iterations (until the thread participated in a GC again).
The most direct way to zero these fields is to do so immediately after
the value is added into the global counter, after which point they are
never used again.
We also tried another approach where we would zero the counter in
yieldCapability but there are some (undiagnosed) siations where a
capbility would not pass through yieldCapability before the GC ended and
the same double counting problem would occur.
Fixes #21082
|
|
|
|
|
|
|
|
| |
Previously `markCAFs` would call `markObjectCode` even in non-major GCs.
This is problematic since `prepareUnloadCheck` is not called in such
GCs, meaning that the section index has not been updated.
Fixes #21254
|
|
|
|
|
|
|
|
| |
Previously we failed to untag the function closure when scavenging the
payload of a PAP, resulting in an invalid closure pointer being passed
to scavenge_large_bitmap and consequently #21254. Fix this.
Fixes #21254
|
|
|
|
|
|
|
|
|
|
| |
In !7604 we started placing adjustor templates in the data section on
Linux as some toolchains there reject relocations in the text section.
However, it turns out that OpenBSD also exhibits this restriction.
Fix this by *always* placing adjustor templates in the data section.
Fixes #21155.
|
|
|
|
| |
Fixes #21251
|
| |
|
| |
|
|
|
|
|
|
| |
When ASSIGN_BaseReg is a no-op, we shouldn't generate any C code,
otherwise C compiler complains a bunch of -Wunused-value warnings when
doing unregisterised codegen.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a number of changes to ticky-ticky profiling.
When an executable is profiled with IPE profiling it's now possible to
associate id-related ticky counters to their source location.
This works by emitting the info table address as part of the counter
which can be looked up in the IPE table.
Add a `-ticky-ap-thunk` flag. This flag prevents the use of some standard thunks
which are precompiled into the RTS. This means reduced cache locality
and increased code size. But it allows better attribution of execution
cost to specific source locations instead of simple attributing it to
the standard thunk.
ticky-ticky now uses the `arg` field to emit additional information
about counters in json format. When ticky-ticky is used in combination
with the eventlog eventlog2html can be used to generate a html table
from the eventlog similar to the old text output for ticky-ticky.
|
|
|
|
|
|
| |
@nrnrnr points out that on his machine ld.lld rejects text relocations.
Generalize the Darwin text-relocation avoidance logic to account for
this.
|
|
|
|
|
|
|
|
| |
Several 64-bit operation were implemented with FFI calls on 32-bit
architectures but we can easily implement them with inline assembly
code.
Also remove unused hs_int64ToWord64 and hs_word64ToInt64 C functions.
|
|
|
|
|
|
| |
bitmap_get is only used in the DEBUG RTS configuration.
Fixes #21079.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch relaxes the instruction for load_load_barrier().
Current load_load_barrier() implements full-barrier with `dmb sy`.
It's too strong to order load-load instructions.
We can relax it by using `dmb ld`.
If current load_load_barrier() is used for full-barriers
(load/store - load/store barrier), this patch is not suitable.
See also linux-kernel's smp_rmb() implementation:
https://github.com/torvalds/linux/blob/v5.14/arch/arm64/include/asm/barrier.h#L90
Hopefully, it's better to use `dmb ishld` rather than `dmb ld`
to improve performance. However, I can't validate effects on
a real many-core Arm machine.
|
| |
|
| |
|
| |
|
| |
|
| |
|