path: root/rts
Commit message  Author  Age  Files  Lines
* rts: Fix off-by-one in snwprintf usage [wip/windows-final] [wip/windows-clang-join] Ben Gamari 2022-04-07 1 -2/+5
|
* rts: Fallback to ucrtbase not msvcrtBen Gamari2022-04-071-3/+4
| | | | | Since we have switched to Clang the toolchain now links against ucrt rather than msvcrt.
* rts/CloneStack: Ensure that Rts.h is #included firstBen Gamari2022-04-071-2/+2
| | | | As is necessary on Windows.
*---. Merge branches 'wip/windows-high-codegen', 'wip/windows-high-linker', 'wip/windows-clang-2' and 'wip/lint-rts-includes' into wip/windows-clang-join Ben Gamari 2022-04-07 33 -802/+1080
|\ \ \
| | | * rts: Fix various #include issuesBen Gamari2022-04-0616-30/+28
| | | | | | | | | | | | | | | | This fixes various violations of the newly-added RTS includes linter.
| | | * rts: Move __USE_MINGW_ANSI_STDIO definition to PosixSource.hBen Gamari2022-04-062-12/+12
| |_|/ |/| | | | | | | | It's easier to ensure that this is included first than Rts.h
| | * rts: Adjust RTS symbol table on Windows for ucrtBen Gamari2022-04-071-4/+4
| | |
| | * rts: Add missing newline in error messageBen Gamari2022-04-071-1/+1
| | |
| | * rts: Refactor and fix printf attributes on clangBen Gamari2022-04-073-26/+15
| | | | | | | | | | | | | | | Clang on Windows does not understand the `gnu_printf` attribute; use `printf` instead.
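A minimal sketch of how a format-checking attribute might be chosen per compiler. `RTS_PRINTF_ATTR` and `debug_format` are hypothetical names used only for illustration; this is not the macro the RTS defines, and the compiler test shown is an assumption about one reasonable way to make the choice.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical macro: pick a format archetype that the compiler accepts. */
#if defined(__clang__)
#  define RTS_PRINTF_ATTR(fmt, args) __attribute__((format(printf, fmt, args)))
#elif defined(__GNUC__)
#  define RTS_PRINTF_ATTR(fmt, args) __attribute__((format(gnu_printf, fmt, args)))
#else
#  define RTS_PRINTF_ATTR(fmt, args)
#endif

/* With the attribute, the compiler checks the variadic arguments against
 * the format string at every call site. */
int debug_format(char *buf, size_t len, const char *fmt, ...) RTS_PRINTF_ATTR(3, 4);

int debug_format(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    assert(buf != NULL && len > 0);
    va_start(ap, fmt);
    int n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);
    return n;
}
```

Note that Clang is tested first because it also defines `__GNUC__`.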
| | * linker/PEi386: More descriptive error messageBen Gamari2022-04-061-1/+1
| | |
| | * rts: Eliminate use of nested functionsGHC GitLab CI2022-04-061-9/+11
| |/ |/| | | | | This is a gcc-specific extension.
| * rts/linker/LoadArchive: Fix leaking file handle [wip/windows-high-linker] Ben Gamari 2022-04-06 1 -1/+1
| | | | | | | | | | Previously `isArchive` could leak a `FILE` handle if the `fread` returned a short read.
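The kind of leak described above can be avoided by closing the handle before inspecting the result of `fread`. The following is a sketch only: `looks_like_archive` is a hypothetical function, not the RTS's `isArchive`, and it assumes the standard `ar` magic string `!<arch>\n`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns true if the file starts with the ar magic. */
bool looks_like_archive(const char *path)
{
    static const char magic[] = "!<arch>\n";   /* 8 bytes plus a trailing NUL */
    char header[sizeof magic - 1];

    assert(path != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return false;

    size_t n = fread(header, 1, sizeof header, f);
    fclose(f);   /* closed on every path, including a short read */

    return n == sizeof header && memcmp(header, magic, sizeof header) == 0;
}
```

Because the header check does not depend on the file name, the same test also identifies an archive that carries a `.o` extension.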
| * rts/linker: Split up object resolution and initializationBen Gamari2022-04-062-15/+61
| |
| | Previously the RTS linker would call initializers during the "resolve"
| | phase of linking. However, this is problematic in the case of cyclic
| | dependencies between objects. In particular, consider the case where a
| | static library contains a set of mutually recursive objects:
| |
| |  * object A depends upon symbols in object B
| |  * object B has an initializer that depends upon object A
| |  * we try to load object A
| |
| | The linker would previously:
| |
| |  1. start resolving object A
| |  2. encounter the reference to object B, loading it and resolving object B
| |  3. run object B's initializer
| |  4. the initializer will attempt to call into object A, which hasn't been
| |     fully resolved (and therefore protected)
| |
| | Fix this by moving constructor execution to a new linking phase, which
| | follows resolution.
| |
| | Fix #21253.
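The two-phase idea can be sketched as follows. This is an illustrative model under stated assumptions, not the RTS linker's code: `struct object`, `load_object`, `example_init` and the state names are all hypothetical, and "resolving" is reduced to a state change.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPS 4

enum obj_state { OBJ_LOADED, OBJ_RESOLVED, OBJ_INITIALIZING, OBJ_READY };

struct object {
    const char *name;
    enum obj_state state;
    struct object *deps[MAX_DEPS];
    size_t n_deps;
    void (*init)(struct object *self);
};

int inits_run = 0;

/* Example initializer: it may call into its dependencies, so each of them
 * must already be resolved when it runs. */
void example_init(struct object *self)
{
    for (size_t i = 0; i < self->n_deps; i++)
        assert(self->deps[i]->state != OBJ_LOADED);
    inits_run++;
}

/* Phase 1: resolve everything reachable. No initializer runs here. */
static void resolve(struct object *o)
{
    if (o->state != OBJ_LOADED)
        return;                    /* already visited; this breaks cycles */
    o->state = OBJ_RESOLVED;
    for (size_t i = 0; i < o->n_deps; i++)
        resolve(o->deps[i]);
}

/* Phase 2: run initializers once the whole graph has been resolved. */
static void run_initializers(struct object *o)
{
    if (o->state != OBJ_RESOLVED)
        return;
    o->state = OBJ_INITIALIZING;   /* mark first so cycles terminate */
    for (size_t i = 0; i < o->n_deps; i++)
        run_initializers(o->deps[i]);
    if (o->init != NULL)
        o->init(o);
    o->state = OBJ_READY;
}

void load_object(struct object *root)
{
    resolve(root);
    run_initializers(root);
}
```

With objects A and B depending on each other, `load_object(&a)` resolves both before either initializer runs, so B's initializer never observes an unresolved A.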
| * rts/linker: Report archive member indexBen Gamari2022-04-061-5/+7
| |
| * rts/PathUtils: Define pathprintf in terms of snwprintf on WindowsBen Gamari2022-04-061-1/+1
| | | | | | | | | | | | swprintf deviates from usual `snprintf` semantics in that it does not guarantee reasonable behavior when the buffer is NULL (that is, returning the number of bytes that would have been emitted).
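For reference, the usual `snprintf` semantics mentioned above allow a two-pass pattern: measure with a NULL buffer, then allocate and format. The sketch below uses the narrow-character `vsnprintf` from standard C rather than the wide-character Windows functions the commit concerns, and `format_alloc` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'd formatted string, or NULL on error. */
char *format_alloc(const char *fmt, ...)
{
    va_list ap, ap2;
    assert(fmt != NULL);

    va_start(ap, fmt);
    va_copy(ap2, ap);

    /* Pass 1: NULL buffer and size 0 yields the length that would be
     * written, excluding the terminating NUL. */
    int len = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (len < 0) {
        va_end(ap2);
        return NULL;
    }

    /* Pass 2: allocate len + 1 so the terminator fits, then format. */
    char *buf = malloc((size_t)len + 1);
    if (buf != NULL)
        vsnprintf(buf, (size_t)len + 1, fmt, ap2);
    va_end(ap2);
    return buf;
}
```

The `+ 1` for the terminator is the easy place to make an off-by-one mistake with this pattern.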
| * rts/linker: More descriptive debug outputBen Gamari2022-04-062-12/+21
| |
| * rts/PEi386: Avoid accidentally-quadratic allocation costBen Gamari2022-04-061-19/+45
| | | | | | | | | | | | We now preserve the address that we last mapped, allowing us to resume our search and avoiding quadratic allocation costs. This fixes the runtime of T10296a, which allocates many adjustors.
| * rts/PEi386: Move allocateBytes to MMap.cBen Gamari2022-04-063-110/+92
| |
| * rts/PEi386: Rework linkerBen Gamari2022-04-067-377/+493
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a significant rework of the PEi386 linker, making the linker compatible with high image base addresses. Specifically, we now use the m32 allocator instead of `HeapAllocate`. In addition I found a number of latent bugs in our handling of import libraries and relocations. I've added quite a few comments describing what I've learned about Windows import libraries while fixing these. Thanks to Tamar Christina (@Phyx) for providing the address space search logic, countless hours of help while debugging, and his boundless Windows knowledge. Co-Authored-By: Tamar Christina <tamar@zhox.com>
| * rts: Mark anything that might have an info table as dataGHC GitLab CI2022-04-061-265/+269
| | | | | | | | | | | | | | Tables-next-to-code mandates that we treat symbols with info tables like data since we cannot relocate them using a jump island. See #20983.
| * rts/PEi386: Fix relocation overflow behaviorBen Gamari2022-04-063-16/+27
| |
| | This fixes handling of overflowed relocations on PEi386 targets:
| |
| |  * Refuse to create jump islands for relocations of data symbols
| |  * Correctly handle the `__imp___acrt_iob_func` symbol, which is a new
| |    type of symbol: `SYM_TYPE_INDIRECT_DATA`
| * rts/linker: Preserve information about symbol typesBen Gamari2022-04-0610-41/+128
| | | | | | | | | | | | | | | | | | | | | | | | | | As noted in #20978, the linker would previously handle overflowed relocations by creating a jump island. While this is fine in the case of code symbols, it's very much not okay in the case of data symbols. To fix this we must keep track of whether each symbol is code or data and relocate them appropriately. This patch takes the first step in this direction, adding a symbol type field to the linker's symbol table. It doesn't yet change relocation behavior to take advantage of this knowledge. Fixes #20978.
| * rts/PEi386: Fix memory leakGHC GitLab CI2022-04-061-1/+3
| | | | | | | | | | Previously we would leak the section information of the `.bss` section.
| * rts/PEi386: Move some debugging output to -DLGHC GitLab CI2022-04-061-0/+4
|/
* adjustors/i386: Use AdjustorPoolBen Gamari2022-04-065-133/+163
|
| In !7511 (closed) I introduced a new allocator for adjustors,
| AdjustorPool, which eliminates the address space fragmentation issues
| which adjustors can introduce. In that work I focused on amd64 since
| that was the platform where I observed issues.
|
| However, in #21132 we noted that the size of adjustors is also a cause
| of CI fragility on i386. In this MR I port i386 to use AdjustorPool.
| Sadly the complexity of the i386 adjustor code does require a bit of
| generalization which makes the code a bit more opaque but such is the
| world.
|
| Closes #21132.
* rts/AdjustorPool: Generalize to allow arbitrary contextsBen Gamari2022-04-064-35/+62
| | | | Unfortunately the i386 adjustor logic needs this.
* Build ar archives with -L when "joining" objectsBen Gamari2022-04-061-0/+1
| | | | Since there may be .o files which are in fact archives.
* Add a Note describing lack of object merging on WindowsBen Gamari2022-04-061-0/+2
| | | | See #21068.
* rts/linker: Catch archives masquerading as object filesBen Gamari2022-04-063-2/+33
|
| Check the file's header to catch static archives bearing the `.o`
| extension, as may happen on Windows after the Clang refactoring.
|
| See #21068
* Fix remaining issues in eventlog types (gen_event_types.py)Matthew Pickering2022-04-011-3/+4
|
|  * The size of End concurrent mark phase looks wrong; it used to be 4 and
|    now it's 0.
|  * The size of Task create is wrong, used to be 18 and now 14.
|  * The event ticky-ticky entry counter begin sample has the wrong name
|  * The event ticky-ticky entry counter begin sample has the wrong size,
|    was 0 now 32.
|
| Closes #21070
* RTS: Zero gc_cpu_start and gc_cpu_end after accountingMatthew Pickering2022-03-291-9/+11
|
| When passed a combination of `-N` and `-qn` options the cpu time for
| garbage collection was being vastly overcounted because the counters were
| not being zeroed appropriately.
|
| When -qn1 is passed, only 1 of the N available GC threads is chosen to
| perform work; the rest are idle. At the end of the GC period, stat_endGC
| traverses all the GC threads and adds up the elapsed time from each of
| them. For threads which didn't participate in this GC, the value of the
| cpu time should be zero, but before this patch, the counters were not
| zeroed and hence we would count the same elapsed time on many subsequent
| iterations (until the thread participated in a GC again).
|
| The most direct way to zero these fields is to do so immediately after the
| value is added into the global counter, after which point they are never
| used again.
|
| We also tried another approach where we would zero the counter in
| yieldCapability but there are some (undiagnosed) situations where a
| capability would not pass through yieldCapability before the GC ended and
| the same double counting problem would occur.
|
| Fixes #21082
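The accounting fix can be illustrated with a small model. Only the field names `gc_cpu_start` and `gc_cpu_end` come from the commit title; the struct, the function `account_gc_cpu`, and the time unit are assumptions made for this sketch, which does not reproduce `stat_endGC`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread record of CPU time stamps for one GC. */
struct gc_thread_times {
    uint64_t gc_cpu_start;
    uint64_t gc_cpu_end;
};

/* Sum the CPU time of all GC threads for one collection, zeroing each
 * thread's stamps immediately after they have been added to the total. */
uint64_t account_gc_cpu(struct gc_thread_times *threads, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        assert(threads[i].gc_cpu_end >= threads[i].gc_cpu_start);
        total += threads[i].gc_cpu_end - threads[i].gc_cpu_start;

        /* A thread that sits out the next GC now contributes 0 instead of
         * having its stale interval counted again. */
        threads[i].gc_cpu_start = 0;
        threads[i].gc_cpu_end = 0;
    }
    return total;
}
```

If the two assignments to zero are removed, a thread that is idle in later collections keeps contributing its old interval on every call, which is the overcounting described above.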
* rts: Don't mark object code in markCAFs unless necessaryBen Gamari2022-03-231-2/+4
| | | | | | | | Previously `markCAFs` would call `markObjectCode` even in non-major GCs. This is problematic since `prepareUnloadCheck` is not called in such GCs, meaning that the section index has not been updated. Fixes #21254
* rts: Untag function field in scavenge_PAP_payloadBen Gamari2022-03-231-1/+2
| | | | | | | | Previously we failed to untag the function closure when scavenging the payload of a PAP, resulting in an invalid closure pointer being passed to scavenge_large_bitmap and consequently #21254. Fix this. Fixes #21254
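Pointer tagging stores a few bits of information in the low bits of an aligned pointer, so those bits must be cleared before the pointer is treated as an address. A minimal sketch, assuming three tag bits (as on a 64-bit target with 8-byte alignment); `TAG_BITS`, `add_tag`, `get_tag` and `untag` are illustrative names here, not the RTS's own helpers.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_BITS 3u                                   /* assumption: 8-byte alignment */
#define TAG_MASK (((uintptr_t)1 << TAG_BITS) - 1)

/* Attach a small tag to an aligned pointer. */
static inline void *add_tag(void *p, unsigned tag)
{
    assert(((uintptr_t)p & TAG_MASK) == 0);   /* pointer must be aligned */
    assert(tag <= TAG_MASK);
    return (void *)((uintptr_t)p | tag);
}

/* Read the tag stored in the low bits. */
static inline unsigned get_tag(const void *p)
{
    return (unsigned)((uintptr_t)p & TAG_MASK);
}

/* Clear the tag bits, recovering the real address. */
static inline void *untag(const void *p)
{
    return (void *)((uintptr_t)p & ~TAG_MASK);
}
```

Passing a tagged pointer where an untagged one is expected yields an address that is off by the tag value, which is the kind of invalid closure pointer the commit message describes.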
* rts/adjustor: Place adjustor templates in data section on all OSsBen Gamari2022-03-231-2/+2
| | | | | | | | | | In !7604 we started placing adjustor templates in the data section on Linux as some toolchains there reject relocations in the text section. However, it turns out that OpenBSD also exhibits this restriction. Fix this by *always* placing adjustor templates in the data section. Fixes #21155.
* Compact regions: Maintain tags properlyAndreas Klebinger2022-03-191-2/+2
| | | | Fixes #21251
* linker: Fix ADDR32NB relocations on WindowsTamar Christina2022-03-171-1/+11
|
* linker: Initial Windows C++ exception unwinding supportTamar Christina2022-03-173-3/+121
|
* CmmToC: fix -Wunused-value warning in ASSIGN_BaseRegCheng Shao2022-03-111-1/+1
|
| When ASSIGN_BaseReg is a no-op, we shouldn't generate any C code,
| otherwise the C compiler emits a bunch of -Wunused-value warnings when
| doing unregisterised codegen.
* Ticky profiling improvements.Matthew Pickering2022-03-023-4/+6
|
| This adds a number of changes to ticky-ticky profiling.
|
| When an executable is profiled with IPE profiling it's now possible to
| associate id-related ticky counters to their source location. This works
| by emitting the info table address as part of the counter which can be
| looked up in the IPE table.
|
| Add a `-ticky-ap-thunk` flag. This flag prevents the use of some standard
| thunks which are precompiled into the RTS. This means reduced cache
| locality and increased code size. But it allows better attribution of
| execution cost to specific source locations instead of simply attributing
| it to the standard thunk.
|
| ticky-ticky now uses the `arg` field to emit additional information about
| counters in json format. When ticky-ticky is used in combination with the
| eventlog, eventlog2html can be used to generate an html table from the
| eventlog similar to the old text output for ticky-ticky.
* rts/adjustor: Always place adjustor templates in data sectionBen Gamari2022-02-251-4/+8
| | | | | | @nrnrnr points out that on his machine ld.lld rejects text relocations. Generalize the Darwin text-relocation avoidance logic to account for this.
* NCG: inline some 64-bit primops on x86/32-bit (#5444)Sylvain Henry2022-02-231-2/+0
|
| Several 64-bit operations were implemented with FFI calls on 32-bit
| architectures but we can easily implement them with inline assembly code.
|
| Also remove unused hs_int64ToWord64 and hs_word64ToInt64 C functions.
* rts/AdjustorPool: Silence unused function warningBen Gamari2022-02-171-1/+2
| | | | | | bitmap_get is only used in the DEBUG RTS configuration. Fixes #21079.
* rts: document some closure typesAdam Sandberg Ericsson2022-02-164-68/+198
|
* rts: remove struct StgRetry, it is never usedAdam Sandberg Ericsson2022-02-161-5/+0
|
* Relax load_load_barrier for aarch64Takenobu Tani2022-02-161-1/+1
|
| This patch relaxes the instruction for load_load_barrier(). The current
| load_load_barrier() implements a full barrier with `dmb sy`, which is
| stronger than needed to order load-load instructions. We can relax it by
| using `dmb ld`.
|
| If the current load_load_barrier() is used for full barriers (load/store -
| load/store barrier), this patch is not suitable.
|
| See also linux-kernel's smp_rmb() implementation:
| https://github.com/torvalds/linux/blob/v5.14/arch/arm64/include/asm/barrier.h#L90
|
| It would probably be better to use `dmb ishld` rather than `dmb ld` to
| improve performance. However, I can't validate the effects on a real
| many-core Arm machine.
* adjustors/NativeAmd64Mingw: Use AdjustorPoolBen Gamari2022-02-134-151/+189
|
* adjustors/NativeAmd64: Use AdjustorPoolBen Gamari2022-02-134-124/+160
|
* Introduce initAdjustorsBen Gamari2022-02-138-5/+22
|
* rts: Initial commit of AdjustorPoolBen Gamari2022-02-134-0/+365
|
* rts/adjustor: Split Windows path out of NativeAmd64Ben Gamari2022-02-134-165/+229
|