| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Just a tiny cleanup inspired by the following comment:
https://gitlab.haskell.org/ghc/ghc/-/issues/19437#note_334271
I was just getting familiar with rts code base so I
thought might as well do this.
|
|
|
|
|
|
| |
is used outside of the rts so we do this rather than just fish it out of
the repo in ad-hoc way, in order to make packages in this repo more
self-contained.
|
| |
|
|
|
|
|
|
|
| |
Since commit be5d74caab the payload of a closure of Int<N> or Word<N>
is not extended anymore to the machines word size. Instead, only the
first N bits of a payload are written. This patch ensures that only
those bits are read/written independent of the machines endianness.
|
|
|
|
|
|
|
|
|
| |
- All rts_mk functions return the tagged closure address
- rts_mkChar/rts_mkInt avoid allocation when the argument is within the
CHARLIKE/INTLIKE range
- rts_getBool avoids a memory load by checking the closure tag
- In rts_mkInt64/rts_mkWord64, allocated closure payload size is either
1 or 2 words depending on target architecture word size
|
|
|
|
| |
This check is merely a service to the user; no reason to synchronize.
|
|
|
|
|
|
|
|
| |
These are used to find the current roots of the garbage collector.
Co-authored-by: Sven Tennie's avatarSven Tennie <sven.tennie@gmail.com>
Co-authored-by: Matthew Pickering's avatarMatthew Pickering <matthewtpickering@gmail.com>
Co-authored-by: default avatarBen Gamari <bgamari.foss@gmail.com>
|
|
|
|
|
|
|
|
|
| |
The `rts_pause` and `rts_resume` functions have been added to `RtsAPI.h` and
allow an external process to completely pause and resume the RTS.
Co-authored-by: Sven Tennie <sven.tennie@gmail.com>
Co-authored-by: Matthew Pickering <matthewtpickering@gmail.com>
Co-authored-by: Ben Gamari <bgamari.foss@gmail.com>
|
|
|
|
|
|
|
|
| |
This diff makes sure that incall threads, when using `rts_setInCallCapability`, will be created as locked.
If the thread is not locked, the thread might end up being scheduled to a different capability.
While this is mentioned in the docs for `rts_setInCallCapability,`, it makes the method significantly less useful as there is no guarantees on the capability being used.
This commit also adds a test to make sure things stay on the correct capability.
|
|
|
|
| |
If hs_try_putmvar was called through an unsafe import, it would lose track of the running cap causing a deadlock
|
|
|
|
|
| |
This moves all URL references to Trac tickets to their corresponding
GitLab counterparts.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Long ago, the stable name table and stable pointer tables were one.
Now, they are separate, and have significantly different
implementations. I believe the time has come to finish the split
that began in #7674.
* Divide `rts/Stable` into `rts/StableName` and `rts/StablePtr`.
* Give each table its own mutex.
* Add FFI functions `hs_lock_stable_ptr_table` and
`hs_unlock_stable_ptr_table` and document them.
These are intended to replace the previously undocumented
`hs_lock_stable_tables` and `hs_lock_stable_tables`,
which are now documented as deprecated synonyms.
* Make `eqStableName#` use pointer equality instead of unnecessarily
comparing stable name table indices.
Reviewers: simonmar, bgamari, erikd
Reviewed By: bgamari
Subscribers: rwbarton, carter
GHC Trac Issues: #15555
Differential Revision: https://phabricator.haskell.org/D5084
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
An info table with an SRT normally looks like this:
StgWord64 srt_offset
StgClosureInfo layout
StgWord32 layout
StgWord32 has_srt
But we only need 32 bits for srt_offset on x86_64, because the small
memory model requires that code segments are at most 2GB. So we can
optimise this to
StgClosureInfo layout
StgWord32 layout
StgWord32 srt_offset
saving a word. We can tell whether the info table has an SRT or not,
because zero is not a valid srt_offset, so zero still indicates that
there's no SRT.
Test Plan:
* validate
* For results, see D4632.
Reviewers: bgamari, niteria, osa1, erikd
Subscribers: thomie, carter
Differential Revision: https://phabricator.haskell.org/D4634
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
- Previously we would hvae a single big table of pointers per module,
with a set of bitmaps to reference entries within it. The new
representation is identical to a static constructor, which is much
simpler for the GC to traverse, and we get to remove the complicated
bitmap-traversal code from the GC.
- Rewrite all the code to generate SRTs in CmmBuildInfoTables, and
document it much better (see Note [SRTs]). This has been something
I've wanted to do since we moved to the new code generator, I
finally had the opportunity to finish it while on a transatlantic
flight recently :)
There are a series of 4 diffs:
1. D4632 (this one), which does the bulk of the changes
2. D4633 which adds support for smaller `CmmLabelDiffOff` constants
3. D4634 which takes advantage of D4632 and D4633 to save a word in
info tables that have an SRT on x86_64. This is where most of the
binary size improvement comes from.
4. D4637 which makes a further optimisation to merge some SRTs with
static FUN closures. This adds some complexity and the benefits
are fairly modest, so it's not clear yet whether we should do this.
Results (after (3), on x86_64)
- GHC itself (staticaly linked) is 5.2% smaller
- -1.7% binary sizes in nofib, -2.9% module sizes. Full nofib results: P176
- I measured the overhead of traversing all the static objects in a
major GC in GHC itself by doing `replicateM_ 1000 performGC` as the
first thing in `Main.main`. The new version was 5-10% faster, but
the results did vary quite a bit.
- I'm not sure if there's a compile-time difference, the results are
too unreliable.
Test Plan: validate
Reviewers: bgamari, michalt, niteria, simonpj, erikd, osa1
Subscribers: thomie, carter
Differential Revision: https://phabricator.haskell.org/D4632
|
|
|
|
| |
Our new CPP linter enforces this.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When rts is forked it doesn't update toplevel handler, so UserInterrupt
exception is sent to Thread1 that doesn't exist in forked process.
We install toplevel handler when fork so signal will be delivered to the
new main thread.
Fixes #12903
Reviewers: simonmar, austin, erikd, bgamari
Reviewed By: bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2770
GHC Trac Issues: #12903
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: Validate on lots of platforms
Reviewers: erikd, simonmar, austin
Reviewed By: erikd, simonmar
Subscribers: michalt, thomie
Differential Revision: https://phabricator.haskell.org/D2699
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is a fast, non-blocking, asynchronous, interface to tryPutMVar that
can be called from C/C++.
It's useful for callback-based C/C++ APIs: the idea is that the callback
invokes hs_try_putmvar(), and the Haskell code waits for the callback to
run by blocking in takeMVar.
The callback doesn't block - this is often a requirement of
callback-based APIs. The callback wakes up the Haskell thread with
minimal overhead and no unnecessary context-switches.
There are a couple of benchmarks in
testsuite/tests/concurrent/should_run. Some example results comparing
hs_try_putmvar() with using a standard foreign export:
./hs_try_putmvar003 1 64 16 100 +RTS -s -N4 0.49s
./hs_try_putmvar003 2 64 16 100 +RTS -s -N4 2.30s
hs_try_putmvar() is 4x faster for this workload (see the source for
hs_try_putmvar003.hs for details of the workload).
An alternative solution is to use the IO Manager for this. We've tried
it, but there are problems with that approach:
* Need to create a new file descriptor for each callback
* The IO Manger thread(s) become a bottleneck
* More potential for things to go wrong, e.g. throwing an exception in
an IO Manager callback kills the IO Manager thread.
Test Plan: validate; new unit tests
Reviewers: niteria, erikd, ezyang, bgamari, austin, hvr
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2501
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In addition to more const-correctness fixes this patch fixes an
infelicity of the previous const-correctness patch (995cf0f356) which
left `UNTAG_CLOSURE` taking a `const StgClosure` pointer parameter
but returning a non-const pointer. Here we restore the original type
signature of `UNTAG_CLOSURE` and add a new function
`UNTAG_CONST_CLOSURE` which takes and returns a const `StgClosure`
pointer and uses that wherever possible.
Test Plan: Validate on Linux, OS X and Windows
Reviewers: Phyx, hsyl20, bgamari, austin, simonmar, trofi
Reviewed By: simonmar, trofi
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2231
|
|
|
|
|
|
|
|
|
|
|
| |
On AIX, C system headers can redirect the token `stat` via
#define stat stat64
to provide large-file support. Simply avoiding the use of `stat` as an
identifier eschews macro-replacement.
Differential Revision: https://phabricator.haskell.org/D1566
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
yieldCapability() was not prepared to be called by a Task that is not
either a worker or a bound Task. This could happen if we ended up in
yieldCapability via this call stack:
performGC()
scheduleDoGC()
requestSync()
yieldCapability()
and there were a few other ways this could happen via requestSync.
The fix is to handle this case in yieldCapability(): when the Task is
not a worker or a bound Task, we put it on the returning_workers
queue, where it will be woken up again.
Summary of changes:
* `yieldCapability`: factored out subroutine waitForWorkerCapability`
* `waitForReturnCapability` renamed to `waitForCapability`, and
factored out subroutine `waitForReturnCapability`
* `releaseCapabilityAndQueue` worker renamed to `enqueueWorker`, does
not take a lock and no longer tests if `!isBoundTask()`
* `yieldCapability` adjusted for refactorings, only change in behavior
is when it is not a worker or bound task.
Test Plan:
* new test concurrent/should_run/performGC
* validate
Reviewers: niteria, austin, ezyang, bgamari
Subscribers: thomie, bgamari
Differential Revision: https://phabricator.haskell.org/D997
GHC Trac Issues: #10545
|
|
|
|
| |
This reverts commit 39b5c1cbd8950755de400933cecca7b8deb4ffcd.
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
|
|
|
|
| |
This will hopefully help ensure some basic consistency in the forward by
overriding buffer variables. In particular, it sets the wrap length, the
offset to 4, and turns off tabs.
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
| |
See documentation for details.
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
|
|
|
|
|
| |
This means that when the process is shutting down, if we have calls to
foreign exports in progress, they get forcibly terminated as before,
but now they only shut down the calling thread rather than the whole
process (with -threaded).
This came up in a discussion started by Akio Takano on ghc-users.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Based on initial patches by Mikolaj Konarski <mikolaj@well-typed.com>
Use the new task tracing functions traceTaskCreate/Migrate/Delete.
There are two key places. One is for worker tasks which have a
relatively simple life cycle. Worker tasks are created and deleted by
the RTS. The other case is bound tasks which are either created by the
RTS, or appear as foreign C threads making calls into the RTS. For bound
threads we do the tracing in rts_lock/unlock, which actually covers both
threads coming in from outside, and also bound threads made by the RTS.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Consider this experimental for the time being. There are a lot of
things that could go wrong, but I've verified that at least it works
on the test cases we have.
I also did some API cleanups while I was here. Previously we had:
Capability * rts_eval (Capability *cap, HaskellObj p, /*out*/HaskellObj *ret);
but this API is particularly error-prone: if you forget to discard the
Capability * you passed in and use the return value instead, then
you're in for subtle bugs with +RTS -N later on. So I changed all
these functions to this form:
void rts_eval (/* inout */ Capability **cap,
/* in */ HaskellObj p,
/* out */ HaskellObj *ret)
It's much harder to use this version incorrectly, because you have to
pass the Capability in by reference.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch makes two changes to the way stacks are managed:
1. The stack is now stored in a separate object from the TSO.
This means that it is easier to replace the stack object for a thread
when the stack overflows or underflows; we don't have to leave behind
the old TSO as an indirection any more. Consequently, we can remove
ThreadRelocated and deRefTSO(), which were a pain.
This is obviously the right thing, but the last time I tried to do it
it made performance worse. This time I seem to have cracked it.
2. Stacks are now represented as a chain of chunks, rather than
a single monolithic object.
The big advantage here is that individual chunks are marked clean or
dirty according to whether they contain pointers to the young
generation, and the GC can avoid traversing clean stack chunks during
a young-generation collection. This means that programs with deep
stacks will see a big saving in GC overhead when using the default GC
settings.
A secondary advantage is that there is much less copying involved as
the stack grows. Programs that quickly grow a deep stack will see big
improvements.
In some ways the implementation is simpler, as nothing special needs
to be done to reclaim stack as the stack shrinks (the GC just recovers
the dead stack chunks). On the other hand, we have to manage stack
underflow between chunks, so there's a new stack frame
(UNDERFLOW_FRAME), and we now have separate TSO and STACK objects.
The total amount of code is probably about the same as before.
There are new RTS flags:
-ki<size> Sets the initial thread stack size (default 1k) Egs: -ki4k -ki2m
-kc<size> Sets the stack chunk size (default 32k)
-kb<size> Sets the stack chunk buffer size (default 1k)
-ki was previously called just -k, and the old name is still accepted
for backwards compatibility. These new options are documented.
|
| |
|
|
|
|
|
| |
Broken by "Split part of the Task struct into a separate struct
InCall".
|
|
|
|
|
| |
Fixes a bug reported by Lennart Augustsson, whereby we could get an
incorrect error from the RTS about re-entry from a finalizer,
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a batch of refactoring to remove some of the GC's global
state, as we move towards CPU-local GC.
- allocateLocal() now allocates large objects into the local
nursery, rather than taking a global lock and allocating
then in gen 0 step 0.
- allocatePinned() was still allocating from global storage and
taking a lock each time, now it uses local storage.
(mallocForeignPtrBytes should be faster with -threaded).
- We had a gen 0 step 0, distinct from the nurseries, which are
stored in a separate nurseries[] array. This is slightly strange.
I removed the g0s0 global that pointed to gen 0 step 0, and
removed all uses of it. I think now we don't use gen 0 step 0 at
all, except possibly when there is only one generation. Possibly
more tidying up is needed here.
- I removed the global allocate() function, and renamed
allocateLocal() to allocate().
- the alloc_blocks global is gone. MAYBE_GC() and
doYouWantToGC() now check the local nursery only.
|
|
|
|
|
| |
Otherwise, finalizer callbacks cause a deadlock in the threaded RTS
(including GHCi)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The first phase of this tidyup is focussed on the header files, and in
particular making sure we are exposinng publicly exactly what we need
to, and no more.
- Rts.h now includes everything that the RTS exposes publicly,
rather than a random subset of it.
- Most of the public header files have moved into subdirectories, and
many of them have been renamed. But clients should not need to
include any of the other headers directly, just #include the main
public headers: Rts.h, HsFFI.h, RtsAPI.h.
- All the headers needed for via-C compilation have moved into the
stg subdirectory, which is self-contained. Most of the headers for
the rest of the RTS APIs have moved into the rts subdirectory.
- I left MachDeps.h where it is, because it is so widely used in
Haskell code.
- I left a deprecated stub for RtsFlags.h in place. The flag
structures are now exposed by Rts.h.
- Various internal APIs are no longer exposed by public header files.
- Various bits of dead code and declarations have been removed
- More gcc warnings are turned on, and the RTS code is more
warning-clean.
- More source files #include "PosixSource.h", and hence only use
standard POSIX (1003.1c-1995) interfaces.
There is a lot more tidying up still to do, this is just the first
pass. I also intend to standardise the names for external RTS APIs
(e.g use the rts_ prefix consistently), and declare the internal APIs
as hidden for shared libraries.
|
| |
|
| |
|
| |
|
|
|
|
| |
alignment constraints
|
| |
|
| |
|
| |
|
|
|
|
| |
closure payloads.
|
|
|
|
|
|
|
| |
There were races between workerTaskStop() and freeTaskManager(): we
need to be sure that all Tasks have exited properly before we start
tearing things down. This isn't completely straighforward, see
comments for details.
|
|
|
|
|
|
| |
My guess is that this is left over from when we represented Int8 and
friends as zero-extended rather than sign-extended. It's amazing it hasn't
been noticed earlier.
|