| Commit message (Collapse) | Author | Age | Files | Lines |
|
| |
This adds a number of changes to ticky-ticky profiling.
When an executable is profiled with IPE profiling it's now possible to
associate id-related ticky counters to their source location.
This works by emitting the info table address as part of the counter
which can be looked up in the IPE table.
Add a `-ticky-ap-thunk` flag. This flag prevents the use of some standard thunks
which are precompiled into the RTS. This means reduced cache locality
and increased code size, but it allows better attribution of execution
cost to specific source locations instead of simply attributing it to
the standard thunk.
ticky-ticky now uses the `arg` field to emit additional information
about counters in JSON format. When ticky-ticky is used in combination
with the eventlog, eventlog2html can be used to generate an HTML table
from the eventlog similar to the old text output for ticky-ticky.
|
|
|
|
|
|
| |
@nrnrnr points out that on his machine ld.lld rejects text relocations.
Generalize the Darwin text-relocation avoidance logic to account for
this.
|
|
|
|
|
|
|
|
| |
Several 64-bit operations were implemented with FFI calls on 32-bit
architectures, but we can easily implement them with inline assembly
code.
Also remove unused hs_int64ToWord64 and hs_word64ToInt64 C functions.
|
|
|
|
|
|
| |
bitmap_get is only used in the DEBUG RTS configuration.
Fixes #21079.
|
| |
|
| |
|
| |
This patch relaxes the instruction used for load_load_barrier().
The current load_load_barrier() implements a full barrier with `dmb sy`,
which is stronger than needed to order load-load instructions.
We can relax it by using `dmb ld`.
If the current load_load_barrier() is relied upon as a full barrier
(load/store - load/store barrier), this patch is not suitable.
See also the Linux kernel's smp_rmb() implementation:
https://github.com/torvalds/linux/blob/v5.14/arch/arm64/include/asm/barrier.h#L90
It would probably be better to use `dmb ishld` rather than `dmb ld`
to improve performance. However, I can't validate the effects on
a real many-core Arm machine.
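A minimal sketch of what the relaxed barrier might look like, assuming GCC/Clang inline assembly. The name `load_load_barrier_sketch` is hypothetical, and the non-AArch64 fallback is an assumption added only so the example compiles elsewhere; it is not taken from the RTS:

```c
#include <stdatomic.h>

/* Hypothetical sketch, not the RTS definition of load_load_barrier(). */
static inline void load_load_barrier_sketch(void)
{
#if defined(__aarch64__)
    /* `dmb ld` orders loads before the barrier against loads and stores
     * after it, which is weaker (and cheaper) than the full `dmb sy`. */
    __asm__ __volatile__ ("dmb ld" ::: "memory");
#else
    /* Assumed portable fallback: an acquire fence is enough to keep
     * later loads from being reordered before earlier loads. */
    atomic_thread_fence(memory_order_acquire);
#endif
}
```

Swapping `dmb ld` for `dmb ishld` would restrict the barrier to the inner shareable domain, which is what the kernel's smp_rmb() linked above uses.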
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
The last Alpha chip was produced in 2004.
|
|
|
|
| |
Previously we failed to handle the case that `allocateExecPage` failed.
|
| |
This does three major things:
* Enforce the invariant that all strict fields must contain tagged
pointers.
* Try to predict the tag on bindings in order to omit tag checks.
* Allow functions to pass arguments unlifted (call-by-value).
The first is "simply" achieved by wrapping any constructor allocations with
a case which will evaluate the respective strict bindings.
The prediction is done by a new data flow analysis based on the STG
representation of a program. This also helps us to avoid generating
redundant cases for the above invariant.
StrictWorkers are created by W/W directly and SpecConstr indirectly.
See the Note [Strict Worker Ids]
Other minor changes:
* Add StgUtil module containing a few functions needed by, but
not specific to the tag analysis.
-------------------------
Metric Decrease:
T12545
T18698b
T18140
T18923
LargeRecord
Metric Increase:
LargeRecord
ManyAlternatives
ManyConstructors
T10421
T12425
T12707
T13035
T13056
T13253
T13253-spj
T13379
T15164
T18282
T18304
T18698a
T1969
T20049
T3294
T4801
T5321FD
T5321Fun
T783
T9233
T9675
T9961
T19695
WWRec
-------------------------
|
| |
|
|
|
|
|
| |
Not all events start with CapNo and there's no logic that I could see which
adds this to the length.
|
|
|
|
|
|
|
| |
This leads to corrupted eventlogs because the size of EVENT_MEM_RETURN is
completely wrong.
Fixes a bug introduced in 2e29edb7421c21902b47d130d45f60d3f584a0de
|
|
|
|
|
|
|
| |
This leads to corrupted eventlogs because the size of EVENT_IPE is
completely wrong.
Fixes a bug introduced in 2e29edb7421c21902b47d130d45f60d3f584a0de
|
| |
Previously `addLibrarySearchPath` failed to normalise the added path to
UNC form before passing it to `AddDllDirectory`. Consequently, the call
was subject to the MAX_PATH restriction, leading to the failure of
`test-defaulting-plugin-fail`, among others. Happily, this also nicely
simplifies the implementation.
Closes #21059.
|
|
|
|
| |
We no longer support Windows Vista.
|
|
|
|
|
|
| |
Here we try to separate the policy decisions of where to place mappings
from the mechanism of creating the mappings. This makes things
significantly easier to follow.
|
|
|
|
|
|
|
|
| |
As noted in #21057, we really shouldn't be using MAP_FIXED. I would much
rather have the process crash with a "failed to map" error than randomly
overwrite existing mappings.
Closes #21057.
|
| |
|
| |
|
|
|
|
| |
They are not particularly related to linking.
|
| |
|
| |
There appears to be some inconsistency in system-call type naming across
Darwin toolchains. Specifically:
* the `address` argument to `mach_vm_region` apparently wants to be a
`mach_vm_address_t *`, not a `vm_address_t *`
* the `vmsize` argument to `mach_vm_region` wants to be a
`mach_vm_size_t`, not a `vm_size_t`
|
|
|
|
|
| |
Recent FreeBSD versions gained the sched_getaffinity function, which caused two
mutually exclusive #ifdef blocks to both be enabled.
|
|
|
|
|
| |
In 35bea01b xxhash.c was removed. Remove the extra-source-files
stanza referring to it.
|
|
|
|
|
|
| |
* Move `PRINTF` macro from `Stats.h` to `Stats.c` as it's only needed in
the latter.
* Undefine `PRINTF` at the end of `Messages.h` to avoid leaking it.
|
|
|
|
| |
Fixes #20992.
|
| |
|
|
|
|
| |
Not entirely convinced that this is worth doing.
|
|
|
|
|
|
|
| |
This adds logic, enabled in the `-debug` RTS, for checking the internal
consistency of the m32 allocator. This area has always made me a bit
nervous so this should help me sleep better at night in exchange for
very little overhead.
|
|
|
|
|
| |
Renamed to mprotectForLinker and allowed setting of arbitrary protection
modes.
|
| |
Previously m32 would assume that the program image was located near the
start of the address space and therefore assume that it wanted pages
in the bottom 4GB of address space. Instead we now check whether they
are within 4GB of wherever the program is loaded.
This is necessary on Windows, which now tends to place the image in high
memory. The eventual goal is to use m32 to allocate memory for linker
sections on Windows.
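The difference between the two checks can be sketched as follows. The function names and the explicit `image_base` parameter are hypothetical; the real m32 code may obtain the image address and express the range differently:

```c
#include <stdbool.h>
#include <stdint.h>

#define RANGE_4GB_SKETCH ((uint64_t)1 << 32)

/* Old assumption (sketch): the image sits near address 0, so a usable
 * page must lie in the bottom 4GB of the address space. */
static bool in_low_4gb_sketch(uint64_t addr)
{
    return addr < RANGE_4GB_SKETCH;
}

/* New check (sketch): a usable page must lie within 4GB of wherever
 * the program image was actually loaded, in either direction. */
static bool within_4gb_of_image_sketch(uint64_t image_base, uint64_t addr)
{
    uint64_t dist = addr > image_base ? addr - image_base
                                      : image_base - addr;
    return dist < RANGE_4GB_SKETCH;
}
```

With an image loaded in high memory, a page just above the image passes the second check but fails the first.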
|
| |
Previously, directly calling a function that pattern matches on an
unlifted data type which has at least two constructors in GHCi resulted
in a segfault.
This happened due to an unaccounted-for return frame info table pointer. The fix is
to pop that info table pointer when unlifted things are
returned. See Note [Popping return frame for unlifted things]
authors: bgamari, nineonine
|
| |
|
| |
|
| |
|
|
|
|
| |
A few %s occurrences have snuck in over the past months.
|
|
|
|
|
|
| |
Previously `closurePtrs#` would allocate an array of the size of the
closure being decoded on the C stack. This was ripe for overflowing the
C stack, and resulted in `T12492` failing on Windows.
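One common remedy for this class of problem, assumed here purely for illustration (the message does not say which fix was chosen), is to take the temporary buffer from the heap so that its size is no longer bounded by the C stack. The helper `collect_ptrs_sketch` is hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `n_words` pointer-sized words into a
 * heap-allocated buffer. A stack version would instead declare
 * `void *tmp[n_words];`, which can overflow the C stack when the
 * closure is large. The caller owns the result and must free() it. */
static void **collect_ptrs_sketch(void *const *payload, size_t n_words)
{
    /* Allocate at least one element so a zero-sized request still
     * yields a valid pointer. */
    void **tmp = calloc(n_words ? n_words : 1, sizeof *tmp);
    if (tmp == NULL) {
        return NULL;
    }
    if (n_words > 0) {
        memcpy(tmp, payload, n_words * sizeof *tmp);
    }
    return tmp;
}
```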
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
Previously we would build the eventTypes array at runtime during RTS
initialization. However, this is completely unnecessary; it is
completely static data.
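A small sketch of the general technique, using C99 designated initializers. The struct layout, event names, descriptions and sizes below are all hypothetical and do not reflect the RTS's real event table:

```c
#include <stdint.h>

/* Hypothetical event descriptor; the real RTS type may differ. */
typedef struct {
    const char *desc;  /* human-readable description */
    uint16_t    size;  /* payload size in bytes */
} EventTypeSketch;

/* Hypothetical event numbers. */
enum { EV_SKETCH_A = 0, EV_SKETCH_B = 1, EV_SKETCH_COUNT };

/* The table is filled in at compile time, so no initialization code
 * has to run at startup and the data can live in read-only memory. */
static const EventTypeSketch event_types_sketch[EV_SKETCH_COUNT] = {
    [EV_SKETCH_A] = { "first hypothetical event",  8 },
    [EV_SKETCH_B] = { "second hypothetical event", 12 },
};
```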
|