summaryrefslogtreecommitdiff
path: root/includes/rts/Flags.h
Commit message (Collapse)AuthorAgeFilesLines
* Prefer #if defined to #ifdefBen Gamari2017-04-281-1/+1
| | | | Our new CPP linter enforces this.
* cpp: Use #pragma once instead of #ifndef guardsBen Gamari2017-04-231-4/+1
| | | | | | | | | | | | | | This both says what we mean and silences a bunch of spurious CPP linting warnings. This pragma is supported by all CPP implementations which we support. Reviewers: austin, erikd, simonmar, hvr Reviewed By: simonmar Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3482
* rts: Fix buildBen Gamari2017-02-281-1/+1
| | | | | I evidently neglected to consider that validate doesn't build profiled ways. Arg.
* rts: Allow profile output path to be specified on RTS command lineBen Gamari2017-02-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | This introduces a RTS option, -po, which allows the user to override the stem used to form the output file names of the heap profile and cost center summary. It's a bit unclear to me whether this is really the interface we want. Alternatively we could just allow the user to specify the `.hp` and `.prof` file names separately. This would arguably be a bit more straightforward and would allow the user to name JSON output with an appropriate `.json` suffix if they so desired. However, this would come at the cost of taking more of the option space, which is a somewhat precious commodity. Test Plan: Validate, try using `-po` RTS option Reviewers: simonmar, austin, erikd Reviewed By: simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D3182
* JSON profiler reportsBen Gamari2017-02-231-2/+2
| | | | | | | | | | | | | | | | | | | This introduces a JSON output format for cost-centre profiler reports. It's not clear whether this is really something we want to introduce given that we may also move to a more Haskell-driven output pipeline in the future, but I nevertheless found this helpful, so I thought I would put it up. Test Plan: Compile a program with `-prof -fprof-auto`; run with `+RTS -pj` Reviewers: austin, erikd, simonmar Reviewed By: simonmar Subscribers: duncan, maoe, thomie, simonmar Differential Revision: https://phabricator.haskell.org/D3132
* Throw an exception on heap overflowDemi Obenour2017-01-101-0/+10
| | | | | | | | | | | | | | | | | This changes heap overflow to throw a HeapOverflow exception instead of killing the process. Test Plan: GHC CI Reviewers: simonmar, austin, hvr, erikd, bgamari Reviewed By: simonmar, bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2790 GHC Trac Issues: #1791
* Overhaul of Compact Regions (#12455)Simon Marlow2016-12-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This commit makes various improvements and addresses some issues with Compact Regions (aka Compact Normal Forms). This was the most important thing I wanted to fix. Compaction previously prevented GC from running until it was complete, which would be a problem in a multicore setting. Now, we compact using a hand-written Cmm routine that can be interrupted at any point. When a GC is triggered during a sharing-enabled compaction, the GC has to traverse and update the hash table, so this hash table is now stored in the StgCompactNFData object. Previously, compaction consisted of a deepseq using the NFData class, followed by a traversal in C code to copy the data. This is now done in a single pass with hand-written Cmm (see rts/Compact.cmm). We no longer use the NFData instances, instead the Cmm routine evaluates components directly as it compacts. The new compaction is about 50% faster than the old one with no sharing, and a little faster on average with sharing (the cost of the hash table dominates when we're doing sharing). Static objects that don't (transitively) refer to any CAFs don't need to be copied into the compact region. In particular this means we often avoid copying Char values and small Int values, because these are static closures in the runtime. Each Compact# object can support a single compactAdd# operation at any given time, so the Data.Compact library now enforces mutual exclusion using an MVar stored in the Compact object. We now get exceptions rather than killing everything with a barf() when we encounter an object that cannot be compacted (a function, or a mutable object). We now also detect pinned objects, which can't be compacted either. The Data.Compact API has been refactored and cleaned up. A new compactSize operation returns the size (in bytes) of the compact object. Most of the documentation is in the Haddock docs for the compact library, which I've expanded and improved here. Various comments in the code have been improved, especially the main Note [Compact Normal Forms] in rts/sm/CNF.c. I've added a few tests, and expanded a few of the tests that were there. We now also run the tests with GHCi, and in a new test way that enables sanity checking (+RTS -DS). There's a benchmark in libraries/compact/tests/compact_bench.hs for measuring compaction speed and comparing sharing vs. no sharing. The field totalDataW in StgCompactNFData was unnecessary. Test Plan: * new unit tests * validate * tested manually that we can compact Data.Aeson data Reviewers: gcampax, bgamari, ezyang, austin, niteria, hvr, erikd Subscribers: thomie, simonpj Differential Revision: https://phabricator.haskell.org/D2751 GHC Trac Issues: #12455
* Use C99's boolBen Gamari2016-11-291-38/+38
| | | | | | | | | | | | Test Plan: Validate on lots of platforms Reviewers: erikd, simonmar, austin Reviewed By: erikd, simonmar Subscribers: michalt, thomie Differential Revision: https://phabricator.haskell.org/D2699
* NUMA cleanupsSimon Marlow2016-06-171-3/+1
| | | | | - Move the numaMap and nNumaNodes out of RtsFlags to Capability.c - Add a test to tests/rts
* Rts flags cleanupSimon Marlow2016-06-101-7/+0
| | | | | | | | * Remove unused/old flags from the structs * Update old comments * Add missing flags to GHC.RTS * Simplify GHC.RTS, remove C code and use hsc2hs instead * Make ParFlags unconditional, and add support to GHC.RTS
* NUMA supportSimon Marlow2016-06-101-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The aim here is to reduce the number of remote memory accesses on systems with a NUMA memory architecture, typically multi-socket servers. Linux provides a NUMA API for doing two things: * Allocating memory local to a particular node * Binding a thread to a particular node When given the +RTS --numa flag, the runtime will * Determine the number of NUMA nodes (N) by querying the OS * Assign capabilities to nodes, so cap C is on node C%N * Bind worker threads on a capability to the correct node * Keep a separate free lists in the block layer for each node * Allocate the nursery for a capability from node-local memory * Allocate blocks in the GC from node-local memory For example, using nofib/parallel/queens on a 24-core 2-socket machine: ``` $ ./Main 15 +RTS -N24 -s -A64m Total time 173.960s ( 7.467s elapsed) $ ./Main 15 +RTS -N24 -s -A64m --numa Total time 150.836s ( 6.423s elapsed) ``` The biggest win here is expected to be allocating from node-local memory, so that means programs using a large -A value (as here). According to perf, on this program the number of remote memory accesses were reduced by more than 50% by using `--numa`. Test Plan: * validate * There's a new flag --debug-numa=<n> that pretends to do NUMA without actually making the OS calls, which is useful for testing the code on non-NUMA systems. * TODO: I need to add some unit tests Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2199
* rts: Replace `nat` with `uint32_t`Erik de Castro Lopo2016-05-051-29/+29
| | | | | | | | | | | | The `nat` type was an alias for `unsigned int` with a comment saying it was at least 32 bits. We keep the typedef in case client code is using it but mark it as deprecated. Test Plan: Validated on Linux, OS X and Windows Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20 Differential Revision: https://phabricator.haskell.org/D2166
* Add +RTS -AL<size>Simon Marlow2016-05-041-0/+1
| | | | | | | | | | | | | | | +RTS -AL<size> controls the total size of large objects that can be allocated before a GC is triggered. Previously this was always just the value of -A, and the limit mainly existed to prevent runaway allocation in pathalogical programs that allocate a lot of large objects. However, since the limit is shared between all cores, on a large multicore the default becomes more restrictive, and can end up triggering GC well before it would normally have been. Arguably a better default would be A*N, but this is probably excessive. Adding a flag lets you choose, and I've left the default as it was. See docs for usage.
* Allow limiting the number of GC threads (+RTS -qn<n>)Simon Marlow2016-05-041-0/+4
| | | | | | | | | | | | | | | | | | This allows the GC to use fewer threads than the number of capabilities. At each GC, we choose some of the capabilities to be "idle", which means that the thread running on that capability (if any) will sleep for the duration of the GC, and the other threads will do its work. We choose capabilities that are already idle (if any) to be the idle capabilities. The idea is that this helps in the following situation: * We want to use a large -N value so as to make use of hyperthreaded cores * We use a large heap size, so GC is infrequent * But we don't want to use all -N threads in the GC, because that thrashes the memory too much. See docs for usage.
* RtsFlags: Fix const warningBen Gamari2015-11-211-7/+7
| | | | | | | | | | Reviewers: austin Reviewed By: austin Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1509
* rts: Kill PAPI supportBen Gamari2015-11-181-26/+0
| | | | | | | | | | | | | | | This hasn't been used for a very long time and will soon be superceded by perf_events support. Test Plan: validate Reviewers: austin, simonmar Reviewed By: austin, simonmar Subscribers: thomie, erikd Differential Revision: https://phabricator.haskell.org/D1493
* Add +RTS -n<size>: divide the nursery into chunksSimon Marlow2014-11-251-0/+1
| | | | See the documentation for details.
* accessors to RTS flag values -- #5364Ömer Sinan Ağacan2014-11-241-30/+48
| | | | | | | | | | | | | | | | | | | | | | | | Summary: Implementation of #5364. Mostly boilerplate, reading FILE fields is missing. Test Plan: - Get some feedback on missing parts. (FILE fields) - Get some feedback on module name. - Get some feedback on other things. - Get code reviewed. - Make sure test suite is passing. (I haven't run it myself) Reviewers: hvr, austin, ezyang Reviewed By: ezyang Subscribers: ekmett, simonmar, ezyang, carter, thomie Differential Revision: https://phabricator.haskell.org/D306 GHC Trac Issues: #5364 Conflicts: includes/rts/Flags.h
* Per-thread allocation counters and limitsSimon Marlow2014-11-121-0/+8
| | | | | | | | This reverts commit f0fcc41d755876a1b02d1c7c79f57515059f6417. New changes: now works on 32-bit platforms too. I added some basic support for 64-bit subtraction and comparison operations to the x86 NCG.
* Revert "Per-thread allocation counters and limits"Simon Marlow2014-05-041-8/+0
| | | | | | | | Problems were found on 32-bit platforms, I'll commit again when I have a fix. This reverts the following commits: 54b31f744848da872c7c6366dea840748e01b5cf b0534f78a73f972e279eed4447a5687bd6a8308e
* Per-thread allocation counters and limitsSimon Marlow2014-05-021-0/+8
| | | | | | | | | | | | | | | | | | | | | | | This tracks the amount of memory allocation by each thread in a counter stored in the TSO. Optionally, when the counter drops below zero (it counts down), the thread can be sent an asynchronous exception: AllocationLimitExceeded. When this happens, given a small additional limit so that it can handle the exception. See documentation in GHC.Conc for more details. Allocation limits are similar to timeouts, but - timeouts use real time, not CPU time. Allocation limits do not count anything while the thread is blocked or in foreign code. - timeouts don't re-trigger if the thread catches the exception, allocation limits do. - timeouts can catch non-allocating loops, if you use -fno-omit-yields. This doesn't work for allocation limits. I couldn't measure any impact on benchmarks with these changes, even for nofib/smp.
* Globally replace "hackage.haskell.org" with "ghc.haskell.org"Simon Marlow2013-10-011-1/+1
|
* Another overhaul of the recent_activity / idle GC handling (#5991)Simon Marlow2012-09-241-0/+1
| | | | | | | | | | | | | | | Improvements: - we now turn off the timer signal in the non-threaded RTS after idleGCDelay. This should make the xmonad users on #5991 happy. - we now turn off the timer signal after idleGCDelay even if the idle GC is disabled with +RTS -I0. - we now do *not* turn off the timer when profiling. - more comments to explain the meaning of the various ACTIVITY_* values
* use idiomatic (GHC) typesGabor Greif2012-02-271-6/+6
|
* New flag +RTS -qi<n>, avoid waking up idle Capabilities to do parallel GCSimon Marlow2011-12-131-0/+8
| | | | | | | | | | | | | | | | | This is an experimental tweak to the parallel GC that avoids waking up a Capability to do parallel GC if we know that the capability has been idle for a (tunable) number of GC cycles. The idea is that if you're only using a few Capabilities, there's no point waking up the ones that aren't busy. e.g. +RTS -qi3 says "A Capability will participate in parallel GC if it was running at all since the last 3 GC cycles." Results are a bit hit and miss, and I don't completely understand why yet. Hence, for now it is turned off by default, and also not documented except in the +RTS -? output.
* Time handling overhaulSimon Marlow2011-11-251-6/+15
| | | | | | | | | | | | | | | | | | | | | Terminology cleanup: the type "Ticks" has been renamed "Time", which is an StgWord64 in units of TIME_RESOLUTION (currently nanoseconds). The terminology "tick" is now used consistently to mean the interval between timer signals. The ticker now always ticks in realtime (actually CLOCK_MONOTONIC if we have it). Before it used CPU time in the non-threaded RTS and realtime in the threaded RTS, but I've discovered that the CPU timer has terrible resolution (at least on Linux) and isn't much use for profiling. So now we always use realtime. This should also fix The default tick interval is now 10ms, except when profiling where we drop it to 1ms. This gives more accurate profiles without affecting runtime too much (<1%). Lots of cleanups - the resolution of Time is now in one place only (Rts.h) rather than having calculations that depend on the resolution scattered all over the RTS. I hope I found them all.
* Add an RTS eventlog tracing class for user messagesDuncan Coutts2011-10-271-0/+1
| | | | Enables people to turn them on/off. Defaults to on.
* Add new fully-accurate per-spark trace/eventlog eventsDuncan Coutts2011-07-181-1/+2
| | | | | | | | | | | | | | Replaces the existing EVENT_RUN/STEAL_SPARK events with 7 new events covering all stages of the spark lifcycle: create, dud, overflow, run, steal, fizzle, gc The sampled spark events are still available. There are now two event classes for sparks, the sampled and the fully accurate. They can be enabled/disabled independently. By default +RTS -l includes the sampled but not full detail spark events. Use +RTS -lf-p to enable the detailed 'f' and disable the sampled 'p' spark. Includes work by Mikolaj <mikolaj.konarski@gmail.com>
* Move GC tracing into a separate trace classDuncan Coutts2011-07-181-0/+1
| | | | | | | Previously GC was included in the scheduler trace class. It can be enabled specifically with +RTS -vg or -lg, though note that both -v and -l on their own now default to a sensible set of trace classes, currently: scheduler, gc and sparks.
* add a new trace class for spark eventsDuncan Coutts2011-07-181-1/+1
|
* prog_argv and rts_argv now contain *copies* of the args passed toSimon Marlow2011-05-251-2/+2
| | | | | | setupRtsFlags(), rather than sharing the memory. Previously if the caller of hs_init() passed in dynamically-allocated memory and then freed it, random crashes could happen later (#5177).
* Cleanup sweep and fix a bug in RTS flag processing.Simon Marlow2011-04-121-7/+0
| | | | | | | | | | | | | This code has accumulated a great deal of cruft over the years, this pass cleans up a lot of the surrounding cruft but leaves the actual argument processing alone - so there's still more that could be done. Bug fixed: - ghc_rts_opts should not be subject to the --rtsopts setting. If the programmer explicitly declares options with ghc_rts_opts, they shouldn't also have to accept command-line RTS options to make them work.
* +RTS -qw hasn't done anything since 7.0.1; remove the implementation & docsSimon Marlow2011-02-011-2/+1
| | | | It is still (silently) accepted for backwards compatibility.
* Implement stack chunks and separate TSO/STACK objectsSimon Marlow2010-12-151-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes two changes to the way stacks are managed: 1. The stack is now stored in a separate object from the TSO. This means that it is easier to replace the stack object for a thread when the stack overflows or underflows; we don't have to leave behind the old TSO as an indirection any more. Consequently, we can remove ThreadRelocated and deRefTSO(), which were a pain. This is obviously the right thing, but the last time I tried to do it it made performance worse. This time I seem to have cracked it. 2. Stacks are now represented as a chain of chunks, rather than a single monolithic object. The big advantage here is that individual chunks are marked clean or dirty according to whether they contain pointers to the young generation, and the GC can avoid traversing clean stack chunks during a young-generation collection. This means that programs with deep stacks will see a big saving in GC overhead when using the default GC settings. A secondary advantage is that there is much less copying involved as the stack grows. Programs that quickly grow a deep stack will see big improvements. In some ways the implementation is simpler, as nothing special needs to be done to reclaim stack as the stack shrinks (the GC just recovers the dead stack chunks). On the other hand, we have to manage stack underflow between chunks, so there's a new stack frame (UNDERFLOW_FRAME), and we now have separate TSO and STACK objects. The total amount of code is probably about the same as before. There are new RTS flags: -ki<size> Sets the initial thread stack size (default 1k) Egs: -ki4k -ki2m -kc<size> Sets the stack chunk size (default 32k) -kb<size> Sets the stack chunk buffer size (default 1k) -ki was previously called just -k, and the old name is still accepted for backwards compatibility. These new options are documented.
* Add support for collecting PAPI native eventsdmp@rice.edu2010-06-221-0/+4
| | | | | | | | | | | | | | | | | | | | | This patch extends the PAPI support in the RTS to allow collection of native events. PAPI can collect data for native events that are exposed by the hardware beyond the PAPI present events. The native events supported on your hardware can found by using the papi_native_avail tool. The RTS already allows users to specify PAPI preset events from the command line. This patch extends that support to allow users to specify native events. The changes needed are: 1) New option (#) for the RTS PAPI flag for native events. For example, to collect the native event 0x40000000, use ./a.out +RTS -a#0x40000000 -sstderr 2) Update the PAPI_FLAGS struct to store whether the user specified event is a papi preset or a native event 3) Update init_countable_events function to add the native events after parsing the event code and decoding the name using PAPI_event_code_to_name
* Implement a new heap-tuning option: -HSimon Marlow2009-11-301-0/+1
| | | | | | | | | | | | | | | | | -H alone causes the RTS to use a larger nursery, but without exceeding the amount of memory that the application is already using. It trades off GC time against locality: the default setting is to use a fixed-size 512k nursery, but this is sometimes worse than using a very large nursery despite the worse locality. Not all programs get faster, but some programs that use large heaps do much better with -H. e.g. this helps a lot with #3061 (binary-trees), though not as much as specifying -H<large>. Typically using -H<large> is better than plain -H, because the runtime doesn't know ahead of time how much memory you want to use. Should -H be on by default? I'm not sure, it makes some programs go slower, but others go faster.
* Add a way to generate tracing events programmaticallySimon Marlow2009-09-251-1/+5
| | | | | | | | | | | | | | added: primop TraceEventOp "traceEvent#" GenPrimOp Addr# -> State# s -> State# s { Emits an event via the RTS tracing framework. The contents of the event is the zero-terminated byte string passed as the first argument. The event will be emitted either to the .eventlog file, or to stderr, depending on the runtime RTS flags. } and added the required RTS functionality to support it. Also a bit of refactoring in the RTS tracing code.
* Improve the default parallel GC settings, and sanitise the flags (#3340)Simon Marlow2009-09-151-2/+7
| | | | | | | | | | | | | | | Flags (from +RTS -?): -qg[<n>] Use parallel GC only for generations >= <n> (default: 0, -qg alone turns off parallel GC) -qb[<n>] Use load-balancing in the parallel GC only for generations >= <n> (default: 1, -qb alone turns off load-balancing) these are good defaults for most parallel programs. Single-threaded programs that want to make use of parallel GC will probably want +RTS -qg1 (this is documented). I've also updated the docs.
* Unify event logging and debug tracing.Simon Marlow2009-08-291-9/+7
| | | | | | | | | | | | | | | | | | | - tracing facilities are now enabled with -DTRACING, and -DDEBUG additionally enables debug-tracing. -DEVENTLOG has been removed. - -debug now implies -eventlog - events can be printed to stderr instead of being sent to the binary .eventlog file by adding +RTS -v (which is implied by the +RTS -Dx options). - -Dx debug messages can be sent to the binary .eventlog file by adding +RTS -l. This should help debugging by reducing the impact of debug tracing on execution time. - Various debug messages that duplicated the information in events have been removed.
* Declare RTS-private prototypes with __attribute__((visibility("hidden")))Simon Marlow2009-08-051-3/+3
| | | | | | | | | | This has no effect with static libraries, but when the RTS is in a shared library it does two things: - it prevents the function from being exposed by the shared library - internal calls to the function can use the faster non-PLT calls, because the function cannot be overriden at link time.
* Tidy up file headers and copyrights; point to the wiki for docsSimon Marlow2009-08-251-1/+6
| | | | | | | I've updated the wiki page about the RTS headers http://hackage.haskell.org/trac/ghc/wiki/Commentary/SourceTree/Includes to reflect the new layout and explain some of the rationale. All the header files now point to this page.
* RTS tidyup sweep, first phaseSimon Marlow2009-08-021-0/+239
The first phase of this tidyup is focussed on the header files, and in particular making sure we are exposinng publicly exactly what we need to, and no more. - Rts.h now includes everything that the RTS exposes publicly, rather than a random subset of it. - Most of the public header files have moved into subdirectories, and many of them have been renamed. But clients should not need to include any of the other headers directly, just #include the main public headers: Rts.h, HsFFI.h, RtsAPI.h. - All the headers needed for via-C compilation have moved into the stg subdirectory, which is self-contained. Most of the headers for the rest of the RTS APIs have moved into the rts subdirectory. - I left MachDeps.h where it is, because it is so widely used in Haskell code. - I left a deprecated stub for RtsFlags.h in place. The flag structures are now exposed by Rts.h. - Various internal APIs are no longer exposed by public header files. - Various bits of dead code and declarations have been removed - More gcc warnings are turned on, and the RTS code is more warning-clean. - More source files #include "PosixSource.h", and hence only use standard POSIX (1003.1c-1995) interfaces. There is a lot more tidying up still to do, this is just the first pass. I also intend to standardise the names for external RTS APIs (e.g use the rts_ prefix consistently), and declare the internal APIs as hidden for shared libraries.