summaryrefslogtreecommitdiff
path: root/docs/users_guide/runtime_control.rst
diff options
context:
space:
mode:
Diffstat (limited to 'docs/users_guide/runtime_control.rst')
-rw-r--r--docs/users_guide/runtime_control.rst1070
1 files changed, 1070 insertions, 0 deletions
diff --git a/docs/users_guide/runtime_control.rst b/docs/users_guide/runtime_control.rst
new file mode 100644
index 0000000000..4a05291612
--- /dev/null
+++ b/docs/users_guide/runtime_control.rst
@@ -0,0 +1,1070 @@
+.. _runtime-control:
+
+Running a compiled program
+==========================
+
+.. index::
+ single: runtime control of Haskell programs
+ single: running, compiled program
+ single: RTS options
+
+To make an executable program, the GHC system compiles your code and
+then links it with a non-trivial runtime system (RTS), which handles
+storage management, thread scheduling, profiling, and so on.
+
+The RTS has a lot of options to control its behaviour. For example, you
+can change the context-switch interval, the default size of the heap,
+and enable heap profiling. These options can be passed to the runtime
+system in a variety of different ways; the next section
+(:ref:`setting-rts-options`) describes the various methods, and the
+following sections describe the RTS options themselves.
+
+.. _setting-rts-options:
+
+Setting RTS options
+-------------------
+
+.. index::
+ single: RTS options, setting
+
+There are four ways to set RTS options:
+
+- on the command line between ``+RTS ... -RTS``, when running the
+ program (:ref:`rts-opts-cmdline`)
+
+- at compile-time, using ``--with-rtsopts``
+ (:ref:`rts-opts-compile-time`)
+
+- with the environment variable ``GHCRTS``
+ (:ref:`rts-options-environment`)
+
+- by overriding "hooks" in the runtime system (:ref:`rts-hooks`)
+
+.. _rts-opts-cmdline:
+
+Setting RTS options on the command line
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. index::
+ single: +RTS
+ single: -RTS
+ single: --RTS
+
+If you set the ``-rtsopts`` flag appropriately when linking (see
+:ref:`options-linker`), you can give RTS options on the command line
+when running your program.
+
+When your Haskell program starts up, the RTS extracts command-line
+arguments bracketed between ``+RTS`` and ``-RTS`` as its own. For example:
+
+::
+
+ $ ghc prog.hs -rtsopts
+ [1 of 1] Compiling Main ( prog.hs, prog.o )
+ Linking prog ...
+ $ ./prog -f +RTS -H32m -S -RTS -h foo bar
+
+The RTS will snaffle ``-H32m -S`` for itself, and the remaining
+arguments ``-f -h foo bar`` will be available to your program if/when it
+calls ``System.Environment.getArgs``.
+
+No ``-RTS`` option is required if the runtime-system options extend to
+the end of the command line, as in this example:
+
+::
+
+ % hls -ltr /usr/etc +RTS -A5m
+
+If you absolutely positively want all the rest of the options in a
+command line to go to the program (and not the RTS), use a
+``--RTS``.
+
+As always, for RTS options that take ⟨size⟩s: If the last character of
+⟨size⟩ is a K or k, multiply by 1000; if an M or m, by 1,000,000; if a G
+or G, by 1,000,000,000. (And any wraparound in the counters is *your*
+fault!)
+
+Giving a ``+RTS -?`` ``-?``\ RTS option option will print out the RTS
+options actually available in your program (which vary, depending on how
+you compiled).
+
+.. note::
+ Since GHC is itself compiled by GHC, you can change RTS options in
+ the compiler using the normal ``+RTS ... -RTS`` combination. For instance, to set
+ the maximum heap size for a compilation to 128M, you would add
+ ``+RTS -M128m -RTS`` to the command line.
+
+.. _rts-opts-compile-time:
+
+Setting RTS options at compile time
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+GHC lets you change the default RTS options for a program at compile
+time, using the ``-with-rtsopts`` flag (:ref:`options-linker`). A common
+use for this is to give your program a default heap and/or stack size
+that is greater than the default. For example, to set ``-H128m -K64m``,
+link with ``-with-rtsopts="-H128m -K64m"``.
+
+.. _rts-options-environment:
+
+Setting RTS options with the ``GHCRTS`` environment variable
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. index::
+ single: RTS options; from the environment
+ single: environment variable; for setting RTS options
+ single: GHCRTS environment variable
+
+If the ``-rtsopts`` flag is set to something other than ``none`` when
+linking, RTS options are also taken from the environment variable
+``GHCRTS``. For example, to set the maximum heap size to 2G
+for all GHC-compiled programs (using an ``sh``\-like shell):
+
+::
+
+ GHCRTS='-M2G'
+ export GHCRTS
+
+RTS options taken from the ``GHCRTS`` environment variable can be
+overridden by options given on the command line.
+
+.. tip::
+ Setting something like ``GHCRTS=-M2G`` in your environment is a
+ handy way to avoid Haskell programs growing beyond the real memory in
+ your machine, which is easy to do by accident and can cause the machine
+ to slow to a crawl until the OS decides to kill the process (and you
+ hope it kills the right one).
+
+.. _rts-hooks:
+
+"Hooks" to change RTS behaviour
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. index::
+ single: hooks; RTS
+ single: RTS hooks
+ single: RTS behaviour, changing
+
+GHC lets you exercise rudimentary control over certain RTS settings for
+any given program, by compiling in a "hook" that is called by the
+run-time system. The RTS contains stub definitions for these hooks, but
+by writing your own version and linking it on the GHC command line, you
+can override the defaults.
+
+Owing to the vagaries of DLL linking, these hooks don't work under
+Windows when the program is built dynamically.
+
+You can change the messages printed when the runtime system "blows up,"
+e.g., on stack overflow. The hooks for these are as follows:
+
+``void OutOfHeapHook (unsigned long, unsigned long)``
+ .. index::
+ single: OutOfHeapHook
+
+ The heap-overflow message.
+
+``void StackOverflowHook (long int)``
+ .. index::
+ single: StackOverflowHook
+
+ The stack-overflow message.
+
+``void MallocFailHook (long int)``
+ .. index::
+ single: MallocFailHook
+
+ The message printed if ``malloc`` fails.
+
+.. _rts-options-misc:
+
+Miscellaneous RTS options
+-------------------------
+
+``-Vsecs``
+ .. index::
+ single: -V; RTS option
+
+ Sets the interval that the RTS clock ticks at. The runtime uses a
+ single timer signal to count ticks; this timer signal is used to
+ control the context switch timer (:ref:`using-concurrent`) and the
+ heap profiling timer :ref:`rts-options-heap-prof`. Also, the time
+ profiler uses the RTS timer signal directly to record time profiling
+ samples.
+
+ Normally, setting the ``-V`` option directly is not necessary: the
+ resolution of the RTS timer is adjusted automatically if a short
+ interval is requested with the ``-C`` or ``-i`` options. However,
+ setting ``-V`` is required in order to increase the resolution of
+ the time profiler.
+
+ Using a value of zero disables the RTS clock completely, and has the
+ effect of disabling timers that depend on it: the context switch
+ timer and the heap profiling timer. Context switches will still
+ happen, but deterministically and at a rate much faster than normal.
+ Disabling the interval timer is useful for debugging, because it
+ eliminates a source of non-determinism at runtime.
+
+``--install-signal-handlers=yes|no``
+ .. index::
+ single: --install-signal-handlers; RTS option
+
+ If yes (the default), the RTS installs signal handlers to catch
+ things like ctrl-C. This option is primarily useful for when you are
+ using the Haskell code as a DLL, and want to set your own signal
+ handlers.
+
+ Note that even with ``--install-signal-handlers=no``, the RTS
+ interval timer signal is still enabled. The timer signal is either
+ SIGVTALRM or SIGALRM, depending on the RTS configuration and OS
+ capabilities. To disable the timer signal, use the ``-V0`` RTS
+ option (see above).
+
+``-xmaddress``
+ .. index::
+ single: -xm; RTS option
+
+ WARNING: this option is for working around memory allocation
+ problems only. Do not use unless GHCi fails with a message like
+ “\ ``failed to mmap() memory below 2Gb``\ ”. If you need to use this
+ option to get GHCi working on your machine, please file a bug.
+
+ On 64-bit machines, the RTS needs to allocate memory in the low 2Gb
+ of the address space. Support for this across different operating
+ systems is patchy, and sometimes fails. This option is there to give
+ the RTS a hint about where it should be able to allocate memory in
+ the low 2Gb of the address space. For example,
+ ``+RTS -xm20000000 -RTS`` would hint that the RTS should allocate
+ starting at the 0.5Gb mark. The default is to use the OS's built-in
+ support for allocating memory in the low 2Gb if available (e.g.
+ ``mmap`` with ``MAP_32BIT`` on Linux), or otherwise ``-xm40000000``.
+
+``-xqsize``
+ .. index::
+ single: -xq; RTS option
+
+ [Default: 100k] This option relates to allocation limits; for more
+ about this see
+ :base-ref:`enableAllocationLimit <GHC-Conc.html#v%3AenableAllocationLimit>`.
+ When a thread hits its allocation limit, the RTS throws an exception
+ to the thread, and the thread gets an additional quota of allocation
+ before the exception is raised again, the idea being so that the
+ thread can execute its exception handlers. The ``-xq`` controls the
+ size of this additional quota.
+
+.. _rts-options-gc:
+
+RTS options to control the garbage collector
+--------------------------------------------
+
+.. index::
+ single: garbage collector; options
+ single: RTS options; garbage collection
+
+There are several options to give you precise control over garbage
+collection. Hopefully, you won't need any of these in normal operation,
+but there are several things that can be tweaked for maximum
+performance.
+
+``-A ⟨size⟩``
+ .. index::
+ single: -A; RTS option
+ single: allocation area, size
+
+ [Default: 512k] Set the allocation area size used by the garbage
+ collector. The allocation area (actually generation 0 step 0) is
+ fixed and is never resized (unless you use ``-H``, below).
+
+ Increasing the allocation area size may or may not give better
+ performance (a bigger allocation area means worse cache behaviour
+ but fewer garbage collections and less promotion).
+
+ With only 1 generation (``-G1``) the ``-A`` option specifies the
+ minimum allocation area, since the actual size of the allocation
+ area will be resized according to the amount of data in the heap
+ (see ``-F``, below).
+
+``-O ⟨size⟩``
+ .. index::
+ single: -O; RTS option
+ single: old generation, size
+
+ [Default: 1m] Set the minimum size of the old generation. The old
+ generation is collected whenever it grows to this size or the value
+ of the ``-F`` option multiplied by the size of the live data at the
+ previous major collection, whichever is larger.
+
+``-n ⟨size⟩``
+ .. index::
+ single: -n; RTS option
+
+ .. index::
+ single: allocation area, chunk size
+
+ [Default: 0, Example: ``-n4m``\ ] When set to a non-zero value, this
+ option divides the allocation area (``-A`` value) into chunks of the
+ specified size. During execution, when a processor exhausts its
+ current chunk, it is given another chunk from the pool until the
+ pool is exhausted, at which point a collection is triggered.
+
+ This option is only useful when running in parallel (``-N2`` or
+ greater). It allows the processor cores to make better use of the
+ available allocation area, even when cores are allocating at
+ different rates. Without ``-n``, each core gets a fixed-size
+ allocation area specified by the ``-A``, and the first core to
+ exhaust its allocation area triggers a GC across all the cores. This
+ can result in a collection happening when the allocation areas of
+ some cores are only partially full, so the purpose of the ``-n`` is
+ to allow cores that are allocating faster to get more of the
+ allocation area. This means less frequent GC, leading a lower GC
+ overhead for the same heap size.
+
+ This is particularly useful in conjunction with larger ``-A``
+ values, for example ``-A64m -n4m`` is a useful combination on larger core
+ counts (8+).
+
+``-c``
+ .. index::
+ single: -c; RTS option
+
+ .. index::
+ single: garbage collection; compacting
+
+ .. index::
+ single: compacting garbage collection
+
+ Use a compacting algorithm for collecting the oldest generation. By
+ default, the oldest generation is collected using a copying
+ algorithm; this option causes it to be compacted in-place instead.
+ The compaction algorithm is slower than the copying algorithm, but
+ the savings in memory use can be considerable.
+
+ For a given heap size (using the ``-H`` option), compaction can in
+ fact reduce the GC cost by allowing fewer GCs to be performed. This
+ is more likely when the ratio of live data to heap size is high, say
+ greater than 30%.
+
+ .. note::
+ Compaction doesn't currently work when a single generation is
+ requested using the ``-G1`` option.
+
+``-c ⟨n⟩``
+ [Default: 30] Automatically enable compacting collection when the
+ live data exceeds ⟨n⟩% of the maximum heap size (see the ``-M``
+ option). Note that the maximum heap size is unlimited by default, so
+ this option has no effect unless the maximum heap size is set with
+ ``-M ⟨size⟩.``
+
+``-F ⟨factor⟩``
+ .. index::
+ single: -F; RTS option
+ single: heap size, factor
+
+ [Default: 2] This option controls the amount of memory reserved for
+ the older generations (and in the case of a two space collector the
+ size of the allocation area) as a factor of the amount of live data.
+ For example, if there was 2M of live data in the oldest generation
+ when we last collected it, then by default we'll wait until it grows
+ to 4M before collecting it again.
+
+ The default seems to work well here. If you have plenty of memory,
+ it is usually better to use ``-H ⟨size⟩`` than to increase
+ ``-F ⟨factor⟩.``
+
+ The ``-F`` setting will be automatically reduced by the garbage
+ collector when the maximum heap size (the ``-M ⟨size⟩`` setting) is
+ approaching.
+
+``-G ⟨generations⟩``
+ .. index::
+ single: -G; RTS option
+ single: generations, number of
+
+ [Default: 2] Set the number of generations used by the garbage
+ collector. The default of 2 seems to be good, but the garbage
+ collector can support any number of generations. Anything larger
+ than about 4 is probably not a good idea unless your program runs
+ for a *long* time, because the oldest generation will hardly ever
+ get collected.
+
+ Specifying 1 generation with ``+RTS -G1`` gives you a simple 2-space
+ collector, as you would expect. In a 2-space collector, the ``-A``
+ option (see above) specifies the *minimum* allocation area size,
+ since the allocation area will grow with the amount of live data in
+ the heap. In a multi-generational collector the allocation area is a
+ fixed size (unless you use the ``-H`` option, see below).
+
+``-qggen``
+ .. index::
+ single: -qg; RTS option
+
+ [New in GHC 6.12.1] [Default: 0] Use parallel GC in generation ⟨gen⟩
+ and higher. Omitting ⟨gen⟩ turns off the parallel GC completely,
+ reverting to sequential GC.
+
+ The default parallel GC settings are usually suitable for parallel
+ programs (i.e. those using ``par``, Strategies, or with multiple
+ threads). However, it is sometimes beneficial to enable the parallel
+ GC for a single-threaded sequential program too, especially if the
+ program has a large amount of heap data and GC is a significant
+ fraction of runtime. To use the parallel GC in a sequential program,
+ enable the parallel runtime with a suitable ``-N`` option, and
+ additionally it might be beneficial to restrict parallel GC to the
+ old generation with ``-qg1``.
+
+``-qbgen``
+ .. index::
+ single: -qb; RTS option
+
+ [New in GHC 6.12.1] [Default: 1] Use load-balancing in the parallel
+ GC in generation ⟨gen⟩ and higher. Omitting ⟨gen⟩ disables
+ load-balancing entirely.
+
+ Load-balancing shares out the work of GC between the available
+ cores. This is a good idea when the heap is large and we need to
+ parallelise the GC work, however it is also pessimal for the short
+ young-generation collections in a parallel program, because it can
+ harm locality by moving data from the cache of the CPU where is it
+ being used to the cache of another CPU. Hence the default is to do
+ load-balancing only in the old-generation. In fact, for a parallel
+ program it is sometimes beneficial to disable load-balancing
+ entirely with ``-qb``.
+
+``-H [⟨size⟩]``
+ .. index::
+ single: -H; RTS option
+ single: heap size, suggested
+
+ [Default: 0] This option provides a “suggested heap size” for the
+ garbage collector. Think of ``-Hsize`` as a variable ``-A`` option.
+ It says: I want to use at least ⟨size⟩ bytes, so use whatever is
+ left over to increase the ``-A`` value.
+
+ This option does not put a *limit* on the heap size: the heap may
+ grow beyond the given size as usual.
+
+ If ⟨size⟩ is omitted, then the garbage collector will take the size
+ of the heap at the previous GC as the ⟨size⟩. This has the effect of
+ allowing for a larger ``-A`` value but without increasing the
+ overall memory requirements of the program. It can be useful when
+ the default small ``-A`` value is suboptimal, as it can be in
+ programs that create large amounts of long-lived data.
+
+``-I ⟨seconds⟩``
+ .. index::
+ single: -I; RTS option
+ single: idle GC
+
+ (default: 0.3) In the threaded and SMP versions of the RTS (see
+ ``-threaded``, :ref:`options-linker`), a major GC is automatically
+ performed if the runtime has been idle (no Haskell computation has
+ been running) for a period of time. The amount of idle time which
+ must pass before a GC is performed is set by the ``-I ⟨seconds⟩``
+ option. Specifying ``-I0`` disables the idle GC.
+
+ For an interactive application, it is probably a good idea to use
+ the idle GC, because this will allow finalizers to run and
+ deadlocked threads to be detected in the idle time when no Haskell
+ computation is happening. Also, it will mean that a GC is less
+ likely to happen when the application is busy, and so responsiveness
+ may be improved. However, if the amount of live data in the heap is
+ particularly large, then the idle GC can cause a significant delay,
+ and too small an interval could adversely affect interactive
+ responsiveness.
+
+ This is an experimental feature, please let us know if it causes
+ problems and/or could benefit from further tuning.
+
+``-ki ⟨size⟩``
+ .. index::
+ single: -k; RTS option
+ single: stack, initial size
+
+ [Default: 1k] Set the initial stack size for new threads.
+
+ Thread stacks (including the main thread's stack) live on the heap.
+ As the stack grows, new stack chunks are added as required; if the
+ stack shrinks again, these extra stack chunks are reclaimed by the
+ garbage collector. The default initial stack size is deliberately
+ small, in order to keep the time and space overhead for thread
+ creation to a minimum, and to make it practical to spawn threads for
+ even tiny pieces of work.
+
+ .. note::
+ This flag used to be simply ``-k``, but was renamed to ``-ki`` in
+ GHC 7.2.1. The old name is still accepted for backwards
+ compatibility, but that may be removed in a future version.
+
+``-kc ⟨size⟩``
+ .. index::
+ single: -kc; RTS option
+ single: stack; chunk size
+
+ [Default: 32k] Set the size of “stack chunks”. When a thread's
+ current stack overflows, a new stack chunk is created and added to
+ the thread's stack, until the limit set by ``-K`` is reached.
+
+ The advantage of smaller stack chunks is that the garbage collector
+ can avoid traversing stack chunks if they are known to be unmodified
+ since the last collection, so reducing the chunk size means that the
+ garbage collector can identify more stack as unmodified, and the GC
+ overhead might be reduced. On the other hand, making stack chunks
+ too small adds some overhead as there will be more
+ overflow/underflow between chunks. The default setting of 32k
+ appears to be a reasonable compromise in most cases.
+
+``-kb ⟨size⟩``
+ .. index::
+ single: -kc; RTS option
+ single: stack; chunk buffer size
+
+ [Default: 1k] Sets the stack chunk buffer size. When a stack chunk
+ overflows and a new stack chunk is created, some of the data from
+ the previous stack chunk is moved into the new chunk, to avoid an
+ immediate underflow and repeated overflow/underflow at the boundary.
+ The amount of stack moved is set by the ``-kb`` option.
+
+ Note that to avoid wasting space, this value should typically be
+ less than 10% of the size of a stack chunk (``-kc``), because in a
+ chain of stack chunks, each chunk will have a gap of unused space of
+ this size.
+
+``-K ⟨size⟩``
+ .. index::
+ single: -K; RTS option
+ single: stack, maximum size
+
+ [Default: 80% physical memory size] Set the maximum stack size for
+ an individual thread to ⟨size⟩ bytes. If the thread attempts to
+ exceed this limit, it will be sent the ``StackOverflow`` exception.
+ The limit can be disabled entirely by specifying a size of zero.
+
+ This option is there mainly to stop the program eating up all the
+ available memory in the machine if it gets into an infinite loop.
+
+``-m ⟨n⟩``
+ .. index::
+ single: -m; RTS option
+ single: heap, minimum free
+
+ Minimum % ⟨n⟩ of heap which must be available for allocation. The
+ default is 3%.
+
+``-M ⟨size⟩``
+ .. index::
+ single: -M; RTS option
+ single: heap size, maximum
+
+ [Default: unlimited] Set the maximum heap size to ⟨size⟩ bytes. The
+ heap normally grows and shrinks according to the memory requirements
+ of the program. The only reason for having this option is to stop
+ the heap growing without bound and filling up all the available swap
+ space, which at the least will result in the program being summarily
+ killed by the operating system.
+
+ The maximum heap size also affects other garbage collection
+ parameters: when the amount of live data in the heap exceeds a
+ certain fraction of the maximum heap size, compacting collection
+ will be automatically enabled for the oldest generation, and the
+ ``-F`` parameter will be reduced in order to avoid exceeding the
+ maximum heap size.
+
+.. _rts-options-statistics:
+
+RTS options to produce runtime statistics
+-----------------------------------------
+
+``-T``, ``-t [⟨file⟩]``, ``-s [⟨file⟩]``, ``-S [⟨file⟩]``, ``--machine-readable``
+ .. index::
+ single: -T; RTS option
+ single: -t; RTS option
+ single: -s; RTS option
+ single: -S; RTS option
+ single: --machine-readable; RTS option
+
+ These options produce runtime-system statistics, such as the amount
+ of time spent executing the program and in the garbage collector,
+ the amount of memory allocated, the maximum size of the heap, and so
+ on. The three variants give different levels of detail: ``-T``
+ collects the data but produces no output ``-t`` produces a single
+ line of output in the same format as GHC's ``-Rghc-timing`` option,
+ ``-s`` produces a more detailed summary at the end of the program,
+ and ``-S`` additionally produces information about each and every
+ garbage collection.
+
+ The output is placed in ⟨file⟩. If ⟨file⟩ is omitted, then the
+ output is sent to ``stderr``.
+
+ If you use the ``-T`` flag then, you should access the statistics
+ using :base-ref:`GHC.Stats <GHC-Stats.html>`.
+
+ If you use the ``-t`` flag then, when your program finishes, you
+ will see something like this:
+
+ ::
+
+ <<ghc: 36169392 bytes, 69 GCs, 603392/1065272 avg/max bytes residency (2 samples), 3M in use, 0.00 INIT (0.00 elapsed), 0.02 MUT (0.02 elapsed), 0.07 GC (0.07 elapsed) :ghc>>
+
+ This tells you:
+
+ - The total number of bytes allocated by the program over the whole
+ run.
+
+ - The total number of garbage collections performed.
+
+ - The average and maximum "residency", which is the amount of live
+ data in bytes. The runtime can only determine the amount of live
+ data during a major GC, which is why the number of samples
+ corresponds to the number of major GCs (and is usually relatively
+ small). To get a better picture of the heap profile of your
+ program, use the ``-hT`` RTS option (:ref:`rts-profiling`).
+
+ - The peak memory the RTS has allocated from the OS.
+
+ - The amount of CPU time and elapsed wall clock time while
+ initialising the runtime system (INIT), running the program
+ itself (MUT, the mutator), and garbage collecting (GC).
+
+ You can also get this in a more future-proof, machine readable
+ format, with ``-t --machine-readable``:
+
+ ::
+
+ [("bytes allocated", "36169392")
+ ,("num_GCs", "69")
+ ,("average_bytes_used", "603392")
+ ,("max_bytes_used", "1065272")
+ ,("num_byte_usage_samples", "2")
+ ,("peak_megabytes_allocated", "3")
+ ,("init_cpu_seconds", "0.00")
+ ,("init_wall_seconds", "0.00")
+ ,("mutator_cpu_seconds", "0.02")
+ ,("mutator_wall_seconds", "0.02")
+ ,("GC_cpu_seconds", "0.07")
+ ,("GC_wall_seconds", "0.07")
+ ]
+
+ If you use the ``-s`` flag then, when your program finishes, you
+ will see something like this (the exact details will vary depending
+ on what sort of RTS you have, e.g. you will only see profiling data
+ if your RTS is compiled for profiling):
+
+ ::
+
+ 36,169,392 bytes allocated in the heap
+ 4,057,632 bytes copied during GC
+ 1,065,272 bytes maximum residency (2 sample(s))
+ 54,312 bytes maximum slop
+ 3 MB total memory in use (0 MB lost due to fragmentation)
+
+ Generation 0: 67 collections, 0 parallel, 0.04s, 0.03s elapsed
+ Generation 1: 2 collections, 0 parallel, 0.03s, 0.04s elapsed
+
+ SPARKS: 359207 (557 converted, 149591 pruned)
+
+ INIT time 0.00s ( 0.00s elapsed)
+ MUT time 0.01s ( 0.02s elapsed)
+ GC time 0.07s ( 0.07s elapsed)
+ EXIT time 0.00s ( 0.00s elapsed)
+ Total time 0.08s ( 0.09s elapsed)
+
+ %GC time 89.5% (75.3% elapsed)
+
+ Alloc rate 4,520,608,923 bytes per MUT second
+
+ Productivity 10.5% of total user, 9.1% of total elapsed
+
+ - The "bytes allocated in the heap" is the total bytes allocated by
+ the program over the whole run.
+
+ - GHC uses a copying garbage collector by default. "bytes copied
+ during GC" tells you how many bytes it had to copy during garbage
+ collection.
+
+ - The maximum space actually used by your program is the "bytes
+ maximum residency" figure. This is only checked during major
+ garbage collections, so it is only an approximation; the number
+ of samples tells you how many times it is checked.
+
+ - The "bytes maximum slop" tells you the most space that is ever
+ wasted due to the way GHC allocates memory in blocks. Slop is
+ memory at the end of a block that was wasted. There's no way to
+ control this; we just like to see how much memory is being lost
+ this way.
+
+ - The "total memory in use" tells you the peak memory the RTS has
+ allocated from the OS.
+
+ - Next there is information about the garbage collections done. For
+ each generation it says how many garbage collections were done,
+ how many of those collections were done in parallel, the total
+ CPU time used for garbage collecting that generation, and the
+ total wall clock time elapsed while garbage collecting that
+ generation.
+
+ - The ``SPARKS`` statistic refers to the use of
+ ``Control.Parallel.par`` and related functionality in the
+ program. Each spark represents a call to ``par``; a spark is
+ "converted" when it is executed in parallel; and a spark is
+ "pruned" when it is found to be already evaluated and is
+ discarded from the pool by the garbage collector. Any remaining
+ sparks are discarded at the end of execution, so "converted" plus
+ "pruned" does not necessarily add up to the total.
+
+ - Next there is the CPU time and wall clock time elapsed broken
+ down by what the runtime system was doing at the time. INIT is
+ the runtime system initialisation. MUT is the mutator time, i.e.
+ the time spent actually running your code. GC is the time spent
+ doing garbage collection. RP is the time spent doing retainer
+ profiling. PROF is the time spent doing other profiling. EXIT is
+ the runtime system shutdown time. And finally, Total is, of
+ course, the total.
+
+ %GC time tells you what percentage GC is of Total. "Alloc rate"
+ tells you the "bytes allocated in the heap" divided by the MUT
+ CPU time. "Productivity" tells you what percentage of the Total
+ CPU and wall clock elapsed times are spent in the mutator (MUT).
+
+ The ``-S`` flag, as well as giving the same output as the ``-s``
+ flag, prints information about each GC as it happens:
+
+ ::
+
+ Alloc Copied Live GC GC TOT TOT Page Flts
+ bytes bytes bytes user elap user elap
+ 528496 47728 141512 0.01 0.02 0.02 0.02 0 0 (Gen: 1)
+ [...]
+ 524944 175944 1726384 0.00 0.00 0.08 0.11 0 0 (Gen: 0)
+
+ For each garbage collection, we print:
+
+ - How many bytes we allocated this garbage collection.
+
+ - How many bytes we copied this garbage collection.
+
+ - How many bytes are currently live.
+
+ - How long this garbage collection took (CPU time and elapsed wall
+ clock time).
+
+ - How long the program has been running (CPU time and elapsed wall
+ clock time).
+
+ - How many page faults occurred this garbage collection.
+
+ - How many page faults occurred since the end of the last garbage
+ collection.
+
+ - Which generation is being garbage collected.
+
+RTS options for concurrency and parallelism
+-------------------------------------------
+
+The RTS options related to concurrency are described in
+:ref:`using-concurrent`, and those for parallelism in
+:ref:`parallel-options`.
+
+.. _rts-profiling:
+
+RTS options for profiling
+-------------------------
+
+Most profiling runtime options are only available when you compile your
+program for profiling (see :ref:`prof-compiler-options`, and
+:ref:`rts-options-heap-prof` for the runtime options). However, there is
+one profiling option that is available for ordinary non-profiled
+executables:
+
+``-hT``
+ .. index::
+ single: -hT; RTS option
+
+ (can be shortened to ``-h``.) Generates a basic heap profile, in the
+ file ``prog.hp``. To produce the heap profile graph, use ``hp2ps``
+ (see :ref:`hp2ps`). The basic heap profile is broken down by data
+ constructor, with other types of closures (functions, thunks, etc.)
+ grouped into broad categories (e.g. ``FUN``, ``THUNK``). To get a
+ more detailed profile, use the full profiling support
+ (:ref:`profiling`).
+
+.. _rts-eventlog:
+
+Tracing
+-------
+
+.. index::
+ single: tracing
+ single: events
+ single: eventlog files
+
+When the program is linked with the ``-eventlog`` option
+(:ref:`options-linker`), runtime events can be logged in two ways:
+
+- In binary format to a file for later analysis by a variety of tools.
+ One such tool is
+ `ThreadScope <http://www.haskell.org/haskellwiki/ThreadScope>`__\ ThreadScope,
+ which interprets the event log to produce a visual parallel execution
+ profile of the program.
+
+- As text to standard output, for debugging purposes.
+
+``-lflags``
+ .. index::
+ single: -l; RTS option
+
+ Log events in binary format to the file ``program.eventlog``.
+ Without any ⟨flags⟩ specified, this logs a default set of events,
+ suitable for use with tools like ThreadScope.
+
+ For some special use cases you may want more control over which
+ events are included. The ⟨flags⟩ is a sequence of zero or more
+ characters indicating which classes of events to log. Currently
+ these the classes of events that can be enabled/disabled:
+
+ - ``s`` — scheduler events, including Haskell thread creation and start/stop
+ events. Enabled by default.
+
+ - ``g`` — GC events, including GC start/stop. Enabled by default.
+
+ - ``p`` — parallel sparks (sampled). Enabled by default.
+
+ - ``f`` — parallel sparks (fully accurate). Disabled by default.
+
+ - ``u`` — user events. These are events emitted from Haskell code using
+ functions such as ``Debug.Trace.traceEvent``. Enabled by default.
+
+ You can disable specific classes, or enable/disable all classes at
+ once:
+
+ - ``a`` — enable all event classes listed above
+ - ``-⟨x⟩`` — disable the given class of events, for any event class listed above
+ - ``-a`` — disable all classes
+
+ For example, ``-l-ag`` would disable all event classes (``-a``) except for
+ GC events (``g``).
+
+ For spark events there are two modes: sampled and fully accurate.
+ There are various events in the life cycle of each spark, usually
+ just creating and running, but there are some more exceptional
+ possibilities. In the sampled mode the number of occurrences of each
+ kind of spark event is sampled at frequent intervals. In the fully
+ accurate mode every spark event is logged individually. The latter
+ has a higher runtime overhead and is not enabled by default.
+
+ The format of the log file is described by the header
+ ``EventLogFormat.h`` that comes with GHC, and it can be parsed in
+ Haskell using the
+ `ghc-events <http://hackage.haskell.org/package/ghc-events>`__
+ library. To dump the contents of a ``.eventlog`` file as text, use
+ the tool ``ghc-events show`` that comes with the
+ `ghc-events <http://hackage.haskell.org/package/ghc-events>`__
+ package.
+
+``-v [⟨flags⟩]``
+ .. index::
+ single: -v; RTS option
+
+ Log events as text to standard output, instead of to the
+ ``.eventlog`` file. The ⟨flags⟩ are the same as for ``-l``, with the
+ additional option ``t`` which indicates that the each event printed
+ should be preceded by a timestamp value (in the binary ``.eventlog``
+ file, all events are automatically associated with a timestamp).
+
+The debugging options ``-Dx`` also generate events which are logged
+using the tracing framework. By default those events are dumped as text
+to stdout (``-Dx`` implies ``-v``), but they may instead be stored in
+the binary eventlog file by using the ``-l`` option.
+
+.. _rts-options-debugging:
+
+RTS options for hackers, debuggers, and over-interested souls
+-------------------------------------------------------------
+
+.. index::
+ single: RTS options, hacking/debugging
+
+These RTS options might be used (a) to avoid a GHC bug, (b) to see
+"what's really happening", or (c) because you feel like it. Not
+recommended for everyday use!
+
+``-B``
+ .. index::
+ single: -B; RTS option
+
+ Sound the bell at the start of each (major) garbage collection.
+
+ Oddly enough, people really do use this option! Our pal in Durham
+ (England), Paul Callaghan, writes: “Some people here use it for a
+ variety of purposes—honestly!—e.g., confirmation that the
+ code/machine is doing something, infinite loop detection, gauging
+ cost of recently added code. Certain people can even tell what stage
+ [the program] is in by the beep pattern. But the major use is for
+ annoying others in the same office…”
+
+``-D ⟨x⟩``
+ .. index::
+ single: -D; RTS option
+
+ An RTS debugging flag; only available if the program was linked with
+ the ``-debug`` option. Various values of ⟨x⟩ are provided to enable
+ debug messages and additional runtime sanity checks in different
+ subsystems in the RTS, for example ``+RTS -Ds -RTS`` enables debug
+ messages from the scheduler. Use ``+RTS -?`` to find out which debug
+ flags are supported.
+
+ Debug messages will be sent to the binary event log file instead of
+ stdout if the ``-l`` option is added. This might be useful for
+ reducing the overhead of debug tracing.
+
+``-r ⟨file⟩``
+ .. index::
+ single: -r; RTS option
+ single: ticky ticky profiling
+ single: profiling; ticky ticky
+
+ Produce "ticky-ticky" statistics at the end of the program run (only
+ available if the program was linked with ``-debug``). The ⟨file⟩
+ business works just like on the ``-S`` RTS option, above.
+
+ For more information on ticky-ticky profiling, see
+ :ref:`ticky-ticky`.
+
+``-xc``
+ .. index::
+ single: -xc; RTS option
+
+ (Only available when the program is compiled for profiling.) When an
+ exception is raised in the program, this option causes a stack trace
+ to be dumped to ``stderr``.
+
+ This can be particularly useful for debugging: if your program is
+ complaining about a ``head []`` error and you haven't got a clue
+ which bit of code is causing it, compiling with
+ ``-prof -fprof-auto`` and running with ``+RTS -xc -RTS`` will tell
+ you exactly the call stack at the point the error was raised.
+
+ The output contains one report for each exception raised in the
+ program (the program might raise and catch several exceptions during
+ its execution), where each report looks something like this:
+
+ ::
+
+ *** Exception raised (reporting due to +RTS -xc), stack trace:
+ GHC.List.CAF
+ --> evaluated by: Main.polynomial.table_search,
+ called from Main.polynomial.theta_index,
+ called from Main.polynomial,
+ called from Main.zonal_pressure,
+ called from Main.make_pressure.p,
+ called from Main.make_pressure,
+ called from Main.compute_initial_state.p,
+ called from Main.compute_initial_state,
+ called from Main.CAF
+ ...
+
+ The stack trace may often begin with something uninformative like
+ ``GHC.List.CAF``; this is an artifact of GHC's optimiser, which
+ lifts out exceptions to the top-level where the profiling system
+ assigns them to the cost centre "CAF". However, ``+RTS -xc`` doesn't
+ just print the current stack, it looks deeper and reports the stack
+ at the time the CAF was evaluated, and it may report further stacks
+ until a non-CAF stack is found. In the example above, the next stack
+ (after ``--> evaluated by``) contains plenty of information about
+ what the program was doing when it evaluated ``head []``.
+
+ Implementation details aside, the function names in the stack should
+ hopefully give you enough clues to track down the bug.
+
+ See also the function ``traceStack`` in the module ``Debug.Trace``
+ for another way to view call stacks.
+
+``-Z``
+ .. index::
+ single: -Z; RTS option
+
+ Turn *off* "update-frame squeezing" at garbage-collection time.
+ (There's no particularly good reason to turn it off, except to
+ ensure the accuracy of certain data collected regarding thunk entry
+ counts.)
+
+.. _ghc-info:
+
+Getting information about the RTS
+---------------------------------
+
+.. index::
+ single: RTS
+
+It is possible to ask the RTS to give some information about itself. To
+do this, use the ``--info`` flag, e.g.
+
+::
+
+ $ ./a.out +RTS --info
+ [("GHC RTS", "YES")
+ ,("GHC version", "6.7")
+ ,("RTS way", "rts_p")
+ ,("Host platform", "x86_64-unknown-linux")
+ ,("Host architecture", "x86_64")
+ ,("Host OS", "linux")
+ ,("Host vendor", "unknown")
+ ,("Build platform", "x86_64-unknown-linux")
+ ,("Build architecture", "x86_64")
+ ,("Build OS", "linux")
+ ,("Build vendor", "unknown")
+ ,("Target platform", "x86_64-unknown-linux")
+ ,("Target architecture", "x86_64")
+ ,("Target OS", "linux")
+ ,("Target vendor", "unknown")
+ ,("Word size", "64")
+ ,("Compiler unregisterised", "NO")
+ ,("Tables next to code", "YES")
+ ]
+
+The information is formatted such that it can be read as a of type
+``[(String, String)]``. Currently the following fields are present:
+
+``GHC RTS``
+ Is this program linked against the GHC RTS? (always "YES").
+
+``GHC version``
+ The version of GHC used to compile this program.
+
+``RTS way``
+ The variant (“way”) of the runtime. The most common values are
+ ``rts_v`` (vanilla), ``rts_thr`` (threaded runtime, i.e. linked
+ using the ``-threaded`` option) and ``rts_p`` (profiling runtime,
+ i.e. linked using the ``-prof`` option). Other variants include
+ ``debug`` (linked using ``-debug``), and ``dyn`` (the RTS is linked
+ in dynamically, i.e. a shared library, rather than statically linked
+ into the executable itself). These can be combined, e.g. you might
+ have ``rts_thr_debug_p``.
+
+``Target platform``\ ``Target architecture``\ ``Target OS``\ ``Target vendor``
+ These are the platform the program is compiled to run on.
+
+``Build platform``\ ``Build architecture``\ ``Build OS``\ ``Build vendor``
+ These are the platform where the program was built on. (That is, the
+ target platform of GHC itself.) Ordinarily this is identical to the
+ target platform. (It could potentially be different if
+ cross-compiling.)
+
+``Host platform``\ ``Host architecture``\ ``Host OS``\ ``Host vendor``
+ These are the platform where GHC itself was compiled. Again, this
+ would normally be identical to the build and target platforms.
+
+``Word size``
+ Either ``"32"`` or ``"64"``, reflecting the word size of the target
+ platform.
+
+``Compiler unregistered``
+ Was this program compiled with an :ref:`"unregistered" <unreg>`
+ version of GHC? (I.e., a version of GHC that has no
+ platform-specific optimisations compiled in, usually because this is
+ a currently unsupported platform.) This value will usually be no,
+ unless you're using an experimental build of GHC.
+
+``Tables next to code``
+ Putting info tables directly next to entry code is a useful
+ performance optimisation that is not available on all platforms.
+ This field tells you whether the program has been compiled with this
+ optimisation. (Usually yes, except on unusual platforms.)