| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An additional stat is tracked per gc: par_balanced_copied. This is the
number of bytes copied by each gc thread under the balanced limit,
which is simply (copied_bytes / num_gc_threads). The stat is added to
all the appropriate GC structures, so is visible in the eventlog and in
GHC.Stats.
A note is added explaining how work balance is computed.
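As a rough illustration of the description above, the following sketch sums, over all gc threads, the bytes each thread copied up to the balanced limit. The function name, its signature and the per-thread array are assumptions made for this example; the real RTS code may compute the stat differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the RTS implementation.
 * The balanced limit is copied_bytes / num_gc_threads; each thread
 * contributes at most that many bytes to the result. */
uint64_t par_balanced_copied(const uint64_t *copied_per_thread,
                             size_t num_gc_threads)
{
    assert(num_gc_threads > 0);

    /* Total bytes copied by all gc threads in this collection. */
    uint64_t copied_bytes = 0;
    for (size_t i = 0; i < num_gc_threads; i++) {
        copied_bytes += copied_per_thread[i];
    }

    uint64_t limit = copied_bytes / num_gc_threads;

    /* Count each thread's work only up to the balanced limit. */
    uint64_t balanced = 0;
    for (size_t i = 0; i < num_gc_threads; i++) {
        uint64_t c = copied_per_thread[i];
        balanced += (c < limit) ? c : limit;
    }
    return balanced;
}
```

Under this reading, a perfectly balanced collection yields a value equal to the total bytes copied, while a collection in which one thread did all the work yields only the limit.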
Remove some end of line whitespace
Test Plan:
./validate
experiment with the program attached to the ticket
examine code changes carefully
Reviewers: simonmar, austin, hvr, bgamari, erikd
Reviewed By: simonmar
Subscribers: Phyx, rwbarton, thomie
GHC Trac Issues: #13830
Differential Revision: https://phabricator.haskell.org/D3658
|
|
|
|
| |
Our new CPP linter enforces this.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This both says what we mean and silences a bunch of spurious CPP linting
warnings. This pragma is supported by all CPP implementations which we
support.
Reviewers: austin, erikd, simonmar, hvr
Reviewed By: simonmar
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3482
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently eventlog data is always written to a file `progname.eventlog`.
This patch introduces the `flushEventLog` field in `RtsConfig` which
allows customizing the writing of eventlog data.
One possible scenario is the ongoing live-profile-monitor effort by
@NCrashed which slurps all eventlog data through `flushEventLog`.
`flushEventLog` takes a buffer with eventlog data and its size and
returns `false` (0) in case eventlog data could not be processed.
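A minimal sketch of what such a callback could look like follows. The typedef `FlushEventLogFn`, the in-memory sink and `write_events` are all assumptions made for illustration; only the "buffer plus size in, false on failure" contract comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed callback shape: receives eventlog data and its size,
 * returns false if the data could not be processed. */
typedef bool (*FlushEventLogFn)(const void *buf, size_t len);

#define SINK_CAPACITY 64
static unsigned char sink[SINK_CAPACITY];
static size_t sink_used = 0;

/* Example callback: append to a fixed in-memory sink instead of a file. */
bool flush_to_memory(const void *buf, size_t len)
{
    if (len > SINK_CAPACITY - sink_used) {
        return false;               /* not enough room: report failure */
    }
    memcpy(sink + sink_used, buf, len);
    sink_used += len;
    return true;
}

/* Hypothetical stand-in for the eventlog writer inside the RTS. */
bool write_events(FlushEventLogFn flush, const void *buf, size_t len)
{
    assert(flush != NULL);
    return flush(buf, len);
}
```

A monitoring tool could supply its own function in place of `flush_to_memory`, for example one that forwards the buffer over a socket.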
Reviewers: simonmar, austin, erikd, bgamari
Reviewed By: simonmar, bgamari
Subscribers: qnikst, thomie, NCrashed
Differential Revision: https://phabricator.haskell.org/D2934
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Visible API changes:
* The C struct `GCDetails` gives the stats about a single GC. This is
passed to the `gcDone()` callback if one is set via the
RtsConfig. (previously we just passed a collection of values, so this
is more extensible, at the expense of breaking the existing API)
* `RTSStats` gives cumulative stats since the start of the program,
and includes the `GCDetails` for the most recent GC. This struct
can be obtained via `getRTSStats()` (the old `getGCStats()` has been
removed, and `getGCStatsEnabled()` has been renamed to
`getRTSStatsEnabled()`)
Improvements:
* The per-GC stats and cumulative stats are now cleanly separated.
* Inside the RTS we have a top-level `RTSStats` struct to keep all our
stats in; previously this was just a collection of strangely-named
variables. This struct is mostly just copied in `getRTSStats()`, so
the implementation of that function is a lot shorter.
* Types are more consistent. We use a uint64_t byte count for all
memory values, and Time for all time values.
* Names are more consistent. We use a suffix `_bytes` for all byte
counts and `_ns` for all time values.
* We now collect information about the amount of memory in large
objects and compact objects in `GCDetails`. (the latter was the reason
I started doing this patch but it seems to have ballooned a bit!)
* I fixed a bug in the calculation of the elapsed MUT time, and added
an ASSERT to stop the calculations going wrong in the future.
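The separation between per-GC and cumulative stats might be pictured roughly as below. The struct names carry a `Sketch` suffix because their fields are chosen for illustration only (following the `_bytes` and `_ns` naming rule described above); they are not the actual `GCDetails` and `RTSStats` definitions.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t Time;   /* stand-in for the RTS Time type, assumed to be ns */

/* Stats about one collection (illustrative fields only). */
typedef struct {
    uint64_t allocated_bytes;
    uint64_t copied_bytes;
    uint64_t large_objects_bytes;
    uint64_t compact_bytes;
    Time     elapsed_ns;
} GCDetailsSketch;

/* Cumulative stats since program start, plus the most recent GC. */
typedef struct {
    uint32_t        gcs;
    uint64_t        allocated_bytes;
    uint64_t        copied_bytes;
    Time            gc_elapsed_ns;
    GCDetailsSketch gc;
} RTSStatsSketch;

/* Fold one collection's details into the cumulative totals. */
void record_gc(RTSStatsSketch *stats, const GCDetailsSketch *details)
{
    assert(details->elapsed_ns >= 0);   /* guard against bad time arithmetic */
    stats->gcs             += 1;
    stats->allocated_bytes += details->allocated_bytes;
    stats->copied_bytes    += details->copied_bytes;
    stats->gc_elapsed_ns   += details->elapsed_ns;
    stats->gc               = *details; /* remember the most recent GC */
}
```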
For now I kept the Haskell API in `GHC.Stats` the same, by
impedance-matching with the new API. We could either break that API
and make it match the C API more closely, or we could add a new API
and deprecate the old one. Opinions welcome.
This stuff is very easy to get wrong, and it's hard to test. Reviews
welcome!
Test Plan:
manual testing
validate
Reviewers: bgamari, niteria, austin, ezyang, hvr, erikd, rwbarton, Phyx
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2756
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the RTS is forked it doesn't update the toplevel handler, so the
UserInterrupt exception is sent to Thread1, which doesn't exist in the
forked process. We now install the toplevel handler on fork so the signal
will be delivered to the new main thread.
Fixes #12903
Reviewers: simonmar, austin, erikd, bgamari
Reviewed By: bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2770
GHC Trac Issues: #12903
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`rts_setInCallCapability` sets the thread affinity as well as pins the
numa node. We should also have the ability to set the numa node without
setting the capability affinity. `rts_pinNumaNodeForCapability` function
is added and exported via `RtsAPI.h`.
Previous callers of `rts_setInCallCapability` should now also call
`rts_pinNumaNodeForCapability` to get the same effect as before.
Test Plan:
./validate
Reviewers: austin, simonmar, bgamari
Reviewed By: simonmar, bgamari
Subscribers: thomie, niteria
Differential Revision: https://phabricator.haskell.org/D2637
GHC Trac Issues: #12764
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The aim here is to reduce the number of remote memory accesses on
systems with a NUMA memory architecture, typically multi-socket servers.
Linux provides a NUMA API for doing two things:
* Allocating memory local to a particular node
* Binding a thread to a particular node
When given the +RTS --numa flag, the runtime will
* Determine the number of NUMA nodes (N) by querying the OS
* Assign capabilities to nodes, so cap C is on node C%N
* Bind worker threads on a capability to the correct node
* Keep separate free lists in the block layer for each node
* Allocate the nursery for a capability from node-local memory
* Allocate blocks in the GC from node-local memory
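The capability-to-node assignment in the list above (cap C on node C%N) can be sketched in a few lines. The function names are hypothetical and exist only for this example.

```c
#include <assert.h>

/* Hypothetical helper: which NUMA node a capability is assigned to,
 * following the "cap C is on node C % N" rule. */
unsigned cap_to_numa_node(unsigned cap_no, unsigned n_numa_nodes)
{
    assert(n_numa_nodes > 0);
    return cap_no % n_numa_nodes;
}

/* Hypothetical helper: how many of n_caps capabilities land on a node. */
unsigned caps_on_node(unsigned n_caps, unsigned n_numa_nodes, unsigned node)
{
    unsigned count = 0;
    for (unsigned c = 0; c < n_caps; c++) {
        if (cap_to_numa_node(c, n_numa_nodes) == node) {
            count++;
        }
    }
    return count;
}
```

With 24 capabilities and 2 nodes, this round-robin rule puts 12 capabilities on each node.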
For example, using nofib/parallel/queens on a 24-core 2-socket machine:
```
$ ./Main 15 +RTS -N24 -s -A64m
Total time 173.960s ( 7.467s elapsed)
$ ./Main 15 +RTS -N24 -s -A64m --numa
Total time 150.836s ( 6.423s elapsed)
```
The biggest win here is expected to be allocating from node-local
memory, so that means programs using a large -A value (as here).
According to perf, on this program the number of remote memory accesses
was reduced by more than 50% by using `--numa`.
Test Plan:
* validate
* There's a new flag --debug-numa=<n> that pretends to do NUMA without
actually making the OS calls, which is useful for testing the code
on non-NUMA systems.
* TODO: I need to add some unit tests
Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2199
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: Validate
Reviewers: hvr, austin, bgamari, simonmar
Reviewed By: bgamari, simonmar
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2193
|
|
|
|
|
|
|
|
| |
This allows an OS thread to specify which capability it should run on
when it makes a call into Haskell. It is intended for a fairly
specialised use case, when the client wants to have tighter control over
the mapping between OS threads and Capabilities - perhaps 1:1
correspondence, for example.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a situation where we have some statically-linked code and we want to
load and unload a series of objects, we need the CAFs in the
statically-linked code to be retained indefinitely, while the CAFs in
the dynamically-linked code should be GC'd as normal, so that we can
detect when the code is unloadable. This was wrong before - we GC'd
CAFs in the static code, leading to a crash in the rare case where we
use a CAF, GC it, and then load a new object that uses it again.
I also did some tidy up: RtsConfig now has a field keep_cafs to
indicate whether we want CAFs to be retained in static code.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Depends on D767
Setting this flag prevents RTS from giving RTS suggestions like "Use
`+RTS -Ksize -RTS' to increase it."
According to the comment @rwbarton made in #9579, sometimes "+RTS"
suggestions don't make sense (e.g. when the program is precompiled and
installed through package managers), so we can encourage people to
distribute binaries with either "-no-rtsopts-suggestions" or "-rtsopts".
Reviewed By: erikd, austin
Differential Revision: https://phabricator.haskell.org/D809
GHC Trac Issues: #9579
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Hooks rely on static linking semantics, and are broken by -Bsymbolic
which we need when using dynamic linking.
Test Plan: Built it
Reviewers: austin, hvr, tibbe
Differential Revision: https://phabricator.haskell.org/D8
|
|
|
|
|
|
|
| |
This reverts commit 35672072b4091d6f0031417bc160c568f22d0469.
Conflicts:
compiler/main/DriverPipeline.hs
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In preparation for indirecting all references to closures,
we rename _closure to _static_closure to ensure any old code
will get an undefined symbol error. In order to reference
a closure foobar_closure (which is now undefined), you should instead
use STATIC_CLOSURE(foobar). For convenience, a number of these
old identifiers are macro'd.
Across C-- and C (Windows and otherwise), there were differing
conventions on whether or not foobar_closure or &foobar_closure
was the address of the closure. Now, all foobar_closure references
are addresses, and no & is necessary.
CHARLIKE/INTLIKE were not changed, simply alpha-renamed.
Part of the remove-HEAP_ALLOCED patch set (#8199)
Depends on D265
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
Test Plan: validate
Reviewers: simonmar, austin
Subscribers: simonmar, ezyang, carter, thomie
Differential Revision: https://phabricator.haskell.org/D267
GHC Trac Issues: #8199
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
| |
See documentation for details.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the RTS part of a patch to base's topHandler to handle exiting
by a signal.
The intended behaviour is that on Unix, throwing ExitFailure (-sig)
results in the process terminating with that signal. Previously
shutdownHaskellAndSignal was only used for exiting with SIGINT due to
the UserInterrupt exception.
Improve shutdownHaskellAndSignal to do the signal part more carefully.
In particular, it (should) now reliably terminate the process one way
or another. Previously if the signal was blocked, ignored or handled then
shutdownHaskellAndSignal would actually return!
Also, the topHandler code has two paths: a careful shutdown and a "fast
exit" where it does not give finalisers a chance to run. We want to
support that mode also when we want to exit by signal. So rather than
the base code directly calling stg_exit as it did before, we have a
fastExit bool parameter for both shutdownHaskellAnd{Exit,Signal}.
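One way to make "exit by signal" reliable on POSIX systems is sketched below: restore the default disposition, unblock the signal, send it to ourselves, and fall back to a plain exit if we are somehow still running. This is an illustration of the general technique, not the body of shutdownHaskellAndSignal; `exit_by_signal` and `termsig_of_child` are names invented for this example, and the RTS shutdown and fastExit handling are not modelled.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Terminate the current process with `sig`, even if the signal is
 * currently blocked, ignored or handled. */
void exit_by_signal(int sig)
{
    struct sigaction dfl;
    sigset_t only_sig;

    memset(&dfl, 0, sizeof dfl);
    dfl.sa_handler = SIG_DFL;           /* undo "ignored" or "handled" */
    sigemptyset(&dfl.sa_mask);
    sigaction(sig, &dfl, NULL);

    sigemptyset(&only_sig);
    sigaddset(&only_sig, sig);
    sigprocmask(SIG_UNBLOCK, &only_sig, NULL);  /* undo "blocked" */

    kill(getpid(), sig);

    /* Only reached if the default action did not terminate us. */
    _exit(0xff);
}

/* Demo helper: fork a child that ignores and blocks `sig`, then calls
 * exit_by_signal. Returns the signal that killed the child, or -1. */
int termsig_of_child(int sig)
{
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        sigset_t set;
        signal(sig, SIG_IGN);
        sigemptyset(&set);
        sigaddset(&set, sig);
        sigprocmask(SIG_BLOCK, &set, NULL);
        exit_by_signal(sig);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

The parent sees the child as killed by the signal (via `WIFSIGNALED`), which is the information a shell or supervising process needs.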
|
| |
|
|
|
|
| |
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
|
| |
|
|
|
|
|
| |
We may need to do this differently once we get as far as building the
RTS in the dyn ways.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Consider this experimental for the time being. There are a lot of
things that could go wrong, but I've verified that at least it works
on the test cases we have.
I also did some API cleanups while I was here. Previously we had:
    Capability * rts_eval (Capability *cap, HaskellObj p, /*out*/HaskellObj *ret);
but this API is particularly error-prone: if you forget to discard the
Capability * you passed in and use the return value instead, then
you're in for subtle bugs with +RTS -N later on. So I changed all
these functions to this form:
    void rts_eval (/* inout */ Capability **cap,
                   /* in    */ HaskellObj p,
                   /* out   */ HaskellObj *ret)
It's much harder to use this version incorrectly, because you have to
pass the Capability in by reference.
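To see why the in-out form is harder to misuse, here is a small self-contained mock. `Capability`, `HaskellObj` and `mock_rts_eval` are simplified stand-ins defined only for this example; the mock "migrates" to another capability on every call so that the pointer update is visible.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int no; } Capability;  /* stand-in, not the RTS type */
typedef int HaskellObj;                 /* stand-in, not the RTS type */

static Capability caps[2] = { {0}, {1} };

/* Mock of the new-style API: the capability is passed by reference and
 * updated in place, so the caller cannot keep using a stale pointer. */
void mock_rts_eval(/* inout */ Capability **cap,
                   /* in    */ HaskellObj p,
                   /* out   */ HaskellObj *ret)
{
    assert(cap != NULL && *cap != NULL && ret != NULL);
    *ret = p + 1;                       /* pretend to evaluate p */
    *cap = &caps[1 - (*cap)->no];       /* pretend we moved capability */
}
```

After a call such as `mock_rts_eval(&cap, 41, &r)`, the caller's `cap` already points at the capability it now holds; with the old return-value form, forgetting to assign the result would have left `cap` stale.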
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This means that both time and heap profiling work for parallel
programs. Main internal changes:
- CCCS is no longer a global variable; it is now another
pseudo-register in the StgRegTable struct. Thus every
Capability has its own CCCS.
- There is a new built-in CCS called "IDLE", which records ticks for
Capabilities in the idle state. If you profile a single-threaded
program with +RTS -N2, you'll see about 50% of time in "IDLE".
- There is appropriate locking in rts/Profiling.c to protect the
shared cost-centre-stack data structures.
This patch does enough to get it working, but I have cut one big corner:
the cost-centre-stack data structure is still shared amongst all
Capabilities, which means that multiple Capabilities will race when
updating the "allocations" and "entries" fields of a CCS. Not only
does this give unpredictable results, but it runs very slowly due to
cache line bouncing.
It is strongly recommended that you use -fno-prof-count-entries to
disable the "entries" count when profiling parallel programs. (I shall
add a note to this effect to the docs).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rather than have main() be statically compiled as part of the RTS, we
now generate it into the tiny C file that we compile when linking a
binary.
The main motivation is that we want to pass the settings for the
-rtsopts and -with-rtsopts flags into the RTS, rather than relying on
fragile linking semantics to override the defaults, which don't work
with DLLs on Windows (#5373). In order to do this, we need to extend
the API for initialising the RTS, so now we have:
    void hs_init_ghc (int *argc, char **argv[], // program arguments
                      RtsConfig rts_config);    // RTS configuration
hs_init_ghc() can optionally be used instead of hs_init(), and allows
passing in configuration options for the RTS. RtsConfig is a struct,
which currently has two fields:
    typedef struct {
        RtsOptsEnabledEnum rts_opts_enabled;
        const char *rts_opts;
    } RtsConfig;
but might have more in the future. There is a default value for the
struct, defaultRtsConfig, the idea being that you start with this and
override individual fields as necessary.
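The "start from the default and override fields" pattern might look like the sketch below. To keep it self-contained, the types are redeclared locally and `mock_hs_init_ghc` stands in for the real entry point; the enum values, the default contents and the `"-K64m"` option string are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-ins mirroring the declarations quoted above.
 * The enumerator names are assumed, not taken from the RTS headers. */
typedef enum { RtsOptsNone, RtsOptsSafeOnly, RtsOptsAll } RtsOptsEnabledEnum;

typedef struct {
    RtsOptsEnabledEnum rts_opts_enabled;
    const char *rts_opts;
} RtsConfig;

/* Assumed default: safe options only, no baked-in option string. */
static const RtsConfig defaultRtsConfig = { RtsOptsSafeOnly, NULL };

static RtsConfig active_config;

/* Mock of the initialiser: just records the configuration it was given. */
void mock_hs_init_ghc(int *argc, char **argv[], RtsConfig rts_config)
{
    (void)argc;
    (void)argv;
    active_config = rts_config;
}

/* Start from the default and override only what is needed. */
RtsConfig make_config(void)
{
    RtsConfig conf = defaultRtsConfig;
    conf.rts_opts_enabled = RtsOptsAll;
    conf.rts_opts = "-K64m";
    return conf;
}
```

Because callers copy the default and then set individual fields, new fields can be added to the struct later without breaking code written this way.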
In fact, main() was in a separate static library, libHSrtsmain.a.
That's now gone.
|
|
|
|
| |
(#5402)
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
| |
I've updated the wiki page about the RTS headers
http://hackage.haskell.org/trac/ghc/wiki/Commentary/SourceTree/Includes
to reflect the new layout and explain some of the rationale. All the
header files now point to this page.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The first phase of this tidyup is focussed on the header files, and in
particular making sure we are exposing publicly exactly what we need
to, and no more.
- Rts.h now includes everything that the RTS exposes publicly,
rather than a random subset of it.
- Most of the public header files have moved into subdirectories, and
many of them have been renamed. But clients should not need to
include any of the other headers directly, just #include the main
public headers: Rts.h, HsFFI.h, RtsAPI.h.
- All the headers needed for via-C compilation have moved into the
stg subdirectory, which is self-contained. Most of the headers for
the rest of the RTS APIs have moved into the rts subdirectory.
- I left MachDeps.h where it is, because it is so widely used in
Haskell code.
- I left a deprecated stub for RtsFlags.h in place. The flag
structures are now exposed by Rts.h.
- Various internal APIs are no longer exposed by public header files.
- Various bits of dead code and declarations have been removed
- More gcc warnings are turned on, and the RTS code is more
warning-clean.
- More source files #include "PosixSource.h", and hence only use
standard POSIX (1003.1c-1995) interfaces.
There is a lot more tidying up still to do; this is just the first
pass. I also intend to standardise the names for external RTS APIs
(e.g. use the rts_ prefix consistently), and declare the internal APIs
as hidden for shared libraries.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need this, or something equivalent, to be able to implement
stgAllocForGMP outside of the rts. That's because we want to use
allocateLocal which allocates from the given capability without
having to take any locks. In the gmp primops we're basically in
an unsafe foreign call, that is a context where we hold a current
capability. So it's safe for us to use allocateLocal. We just
need a way to get the current capability. The method to get the
current capability varies depending on whether we're using the
threaded rts or not. When stgAllocForGMP is built inside the rts
that's ok because we can do it conditionally on THREADED_RTS.
Outside the rts we need a single api we can call without knowing
if we're talking to a threaded rts or not, hence this addition.
|
|
|
|
|
|
| |
Really we should be raising an exception in this case, but that's
tricky (see comments). At least now we shut down the runtime
correctly rather than just exiting.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
2301: Control-C now causes the new exception (AsyncException
UserInterrupt) to be raised in the main thread. The signal handler
is set up by GHC.TopHandler.runMainIO, and can be overridden in the
usual way by installing a new signal handler. The advantage is that
now all programs will get a chance to clean up on ^C.
When UserInterrupt is caught by the topmost handler, we now exit the
program via kill(getpid(),SIGINT), which tells the parent process that
we exited as a result of ^C, so the parent can take appropriate action
(it might want to exit too, for example).
One subtlety is that we have to use a weak reference to the ThreadId
for the main thread, so that the signal handler doesn't prevent the
main thread from being subject to deadlock detection.
1619: we now ignore SIGPIPE by default. Although POSIX says that a
SIGPIPE should terminate the process by default, I wonder if this
decision was made because many C applications failed to check the exit
code from write(). In Haskell a failed write due to a closed pipe
will generate an exception anyway, so the main difference is that we
now get a useful error message instead of silent program termination.
See #1619 for more discussion.
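The effect of ignoring SIGPIPE can be seen at the C level with the following sketch: once the signal is ignored, a write to a pipe with no reader fails with EPIPE instead of killing the process. The function name is invented for this example, and the code illustrates the POSIX behaviour rather than anything in the RTS itself.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Write one byte to a pipe whose read end has been closed, with SIGPIPE
 * ignored. Returns the errno of the failed write, or 0 if it succeeded. */
int write_to_closed_pipe(void)
{
    int fds[2];
    int rc;
    ssize_t n;
    int err;

    signal(SIGPIPE, SIG_IGN);   /* without this the process would be killed */

    rc = pipe(fds);
    assert(rc == 0);
    (void)rc;

    close(fds[0]);              /* no reader is left on the pipe */

    errno = 0;
    n = write(fds[1], "x", 1);
    err = (n < 0) ? errno : 0;

    close(fds[1]);
    return err;
}
```

A caller that checks the result of `write()` can then report the broken pipe, which is analogous to the Haskell exception mentioned above.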
|
| |
|
|
|
|
| |
See #753
|
| |
|
|
Most of the other users of the fptools build system have migrated to
Cabal, and with the move to darcs we can now flatten the source tree
without losing history, so here goes.
The main change is that the ghc/ subdir is gone, and most of what it
contained is now at the top level. The build system now makes no
pretense at being multi-project, it is just the GHC build system.
No doubt this will break many things, and there will be a period of
instability while we fix the dependencies. A straightforward build
should work, but I haven't yet fixed binary/source distributions.
Changes to the Building Guide will follow, too.
|