path: root/docs/users_guide/profiling.rst
Diffstat (limited to 'docs/users_guide/profiling.rst')
-rw-r--r-- docs/users_guide/profiling.rst 1390
1 files changed, 1390 insertions, 0 deletions
diff --git a/docs/users_guide/profiling.rst b/docs/users_guide/profiling.rst
new file mode 100644
index 0000000000..34525d12b8
--- /dev/null
+++ b/docs/users_guide/profiling.rst
@@ -0,0 +1,1390 @@
+.. _profiling:
+
+Profiling
+=========
+
+.. index::
+ single: profiling
+ single: cost-centre profiling
+ single: -p; RTS option
+
+GHC comes with a time and space profiling system, so that you can answer
+questions like "why is my program so slow?", or "why is my program using
+so much memory?".
+
+Profiling a program is a three-step process:
+
+1. Re-compile your program for profiling with the ``-prof`` option, and
+ probably one of the options for adding automatic annotations:
+ ``-fprof-auto`` is the most common [1]_.
+
+ If you are using external packages with ``cabal``, you may need to
+ reinstall these packages with profiling support; typically this is
+ done with ``cabal install -p package --reinstall``.
+
+2. Having compiled the program for profiling, you now need to run it to
+ generate the profile. For example, a simple time profile can be
+ generated by running the program with ``+RTS -p``, which generates a file
+ named ``⟨prog⟩.prof`` where ⟨prog⟩ is the name of your program (without the
+ ``.exe`` extension, if you are on Windows).
+
+ There are many different kinds of profile that can be generated,
+ selected by different RTS options. We will be describing the various
+ kinds of profile throughout the rest of this chapter. Some profiles
+ require further processing using additional tools after running the
+ program.
+
+3. Examine the generated profiling information, use the information to
+ optimise your program, and repeat as necessary.
+
+.. _cost-centres:
+
+Cost centres and cost-centre stacks
+-----------------------------------
+
+GHC's profiling system assigns costs to cost centres. A cost is simply
+the time or space (memory) required to evaluate an expression. Cost
+centres are program annotations around expressions; all costs incurred
+by the annotated expression are assigned to the enclosing cost centre.
+Furthermore, GHC will remember the stack of enclosing cost centres for
+any given expression at run-time and generate a call-tree of cost
+attributions.
+
+Let's take a look at an example:
+
+::
+
+ main = print (fib 30)
+ fib n = if n < 2 then 1 else fib (n-1) + fib (n-2)
+
+Compile and run this program as follows:
+
+::
+
+ $ ghc -prof -fprof-auto -rtsopts Main.hs
+ $ ./Main +RTS -p
+ 121393
+ $
+
+When a GHC-compiled program is run with the ``-p`` RTS option, it
+generates a file called ``prog.prof``. In this case, the file will
+contain something like this:
+
+::
+
+ Wed Oct 12 16:14 2011 Time and Allocation Profiling Report (Final)
+
+ Main +RTS -p -RTS
+
+ total time = 0.68 secs (34 ticks @ 20 ms)
+ total alloc = 204,677,844 bytes (excludes profiling overheads)
+
+ COST CENTRE MODULE %time %alloc
+
+ fib Main 100.0 100.0
+
+
+ individual inherited
+ COST CENTRE MODULE no. entries %time %alloc %time %alloc
+
+ MAIN MAIN 102 0 0.0 0.0 100.0 100.0
+ CAF GHC.IO.Handle.FD 128 0 0.0 0.0 0.0 0.0
+ CAF GHC.IO.Encoding.Iconv 120 0 0.0 0.0 0.0 0.0
+ CAF GHC.Conc.Signal 110 0 0.0 0.0 0.0 0.0
+ CAF Main 108 0 0.0 0.0 100.0 100.0
+ main Main 204 1 0.0 0.0 100.0 100.0
+ fib Main 205 2692537 100.0 100.0 100.0 100.0
+
+The first part of the file gives the program name and options, and the
+total time and total memory allocation measured during the run of the
+program (note that the total memory allocation figure isn't the same as
+the amount of *live* memory needed by the program at any one time; the
+latter can be determined using heap profiling, which we will describe
+later in :ref:`prof-heap`).
+
+The second part of the file is a break-down by cost centre of the most
+costly functions in the program. In this case, there was only one
+significant function in the program, namely ``fib``, and it was
+responsible for 100% of both the time and allocation costs of the
+program.
+
+The third and final section of the file gives a profile break-down by
+cost-centre stack. This is roughly a call-tree profile of the program.
+In the example above, it is clear that the costly call to ``fib`` came
+from ``main``.
+
+The time and allocation incurred by a given part of the program is
+displayed in two ways: “individual”, which are the costs incurred by the
+code covered by this cost centre stack alone, and “inherited”, which
+includes the costs incurred by all the children of this node.
+
+The usefulness of cost-centre stacks is better demonstrated by modifying
+the example slightly:
+
+::
+
+ main = print (f 30 + g 30)
+ where
+ f n = fib n
+ g n = fib (n `div` 2)
+
+ fib n = if n < 2 then 1 else fib (n-1) + fib (n-2)
+
+Compile and run this program as before, and take a look at the new
+profiling results:
+
+::
+
+ COST CENTRE MODULE no. entries %time %alloc %time %alloc
+
+ MAIN MAIN 102 0 0.0 0.0 100.0 100.0
+ CAF GHC.IO.Handle.FD 128 0 0.0 0.0 0.0 0.0
+ CAF GHC.IO.Encoding.Iconv 120 0 0.0 0.0 0.0 0.0
+ CAF GHC.Conc.Signal 110 0 0.0 0.0 0.0 0.0
+ CAF Main 108 0 0.0 0.0 100.0 100.0
+ main Main 204 1 0.0 0.0 100.0 100.0
+ main.g Main 207 1 0.0 0.0 0.0 0.1
+ fib Main 208 1973 0.0 0.1 0.0 0.1
+ main.f Main 205 1 0.0 0.0 100.0 99.9
+ fib Main 206 2692537 100.0 99.9 100.0 99.9
+
+Now although we had two calls to ``fib`` in the program, it is
+immediately clear that it was the call from ``f`` which took all the
+time. The functions ``f`` and ``g`` which are defined in the ``where``
+clause in ``main`` are given their own cost centres, ``main.f`` and
+``main.g`` respectively.
+
+The actual meaning of the various columns in the output is:
+
+``entries``
+ The number of times this particular point in the call tree was
+ entered.
+
+``individual %time``
+ The percentage of the total run time of the program spent at this
+ point in the call tree.
+
+``individual %alloc``
+ The percentage of the total memory allocations (excluding profiling
+ overheads) of the program made by this call.
+
+``inherited %time``
+ The percentage of the total run time of the program spent below this
+ point in the call tree.
+
+``inherited %alloc``
+ The percentage of the total memory allocations (excluding profiling
+ overheads) of the program made by this call and all of its
+ sub-calls.
+
+In addition you can use the ``-P`` RTS option to get the
+following additional information:
+
+``ticks``
+ The raw number of time “ticks” which were attributed to this
+ cost-centre; from this, we get the ``%time`` figure mentioned above.
+
+``bytes``
+ Number of bytes allocated in the heap while in this cost-centre;
+ again, this is the raw number from which we get the ``%alloc``
+ figure mentioned above.
+
+What about recursive functions, and mutually recursive groups of
+functions? Where are the costs attributed? Well, although GHC does keep
+information about which groups of functions called each other
+recursively, this information isn't displayed in the basic time and
+allocation profile, instead the call-graph is flattened into a tree as
+follows: a call to a function that occurs elsewhere on the current stack
+does not push another entry on the stack, instead the costs for this
+call are aggregated into the caller [2]_.
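The flattening rule can be illustrated with a small sketch (the function names here are invented for illustration): compiled with ``-prof -fprof-auto``, every recursive re-entry to ``isEven`` or ``isOdd`` is aggregated into the first entry already on the stack, rather than deepening the call tree.

```haskell
-- Mutually recursive pair (hypothetical example): in the time and
-- allocation profile, recursive re-entries do not push new stack
-- entries; their entry counts and costs are aggregated into the
-- first call to the group.
isEven :: Int -> Bool
isEven 0 = True
isEven n = isOdd (n - 1)

isOdd :: Int -> Bool
isOdd 0 = False
isOdd n = isEven (n - 1)
```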
+
+.. _scc-pragma:
+
+Inserting cost centres by hand
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Cost centres are just program annotations. When you say ``-fprof-auto``
+to the compiler, it automatically inserts a cost centre annotation
+around every binding not marked INLINE in your program, but you are
+entirely free to add cost centre annotations yourself.
+
+The syntax of a cost centre annotation is
+
+::
+
+ {-# SCC "name" #-} <expression>
+
+where ``"name"`` is an arbitrary string, that will become the name of
+your cost centre as it appears in the profiling output, and
+``<expression>`` is any Haskell expression. An ``SCC`` annotation
+extends as far to the right as possible when parsing. (SCC stands for
+"Set Cost Centre"). The double quotes can be omitted if ``name`` is a
+Haskell identifier, for example:
+
+::
+
+ {-# SCC my_function #-} <expression>
+
+Here is an example of a program with a couple of SCCs:
+
+::
+
+ main :: IO ()
+ main = do let xs = [1..1000000]
+ let ys = [1..2000000]
+ print $ {-# SCC last_xs #-} last xs
+ print $ {-# SCC last_init_xs #-} last $ init xs
+ print $ {-# SCC last_ys #-} last ys
+ print $ {-# SCC last_init_ys #-} last $ init ys
+
+which gives this profile when run:
+
+::
+
+ COST CENTRE MODULE no. entries %time %alloc %time %alloc
+
+ MAIN MAIN 102 0 0.0 0.0 100.0 100.0
+ CAF GHC.IO.Handle.FD 130 0 0.0 0.0 0.0 0.0
+ CAF GHC.IO.Encoding.Iconv 122 0 0.0 0.0 0.0 0.0
+ CAF GHC.Conc.Signal 111 0 0.0 0.0 0.0 0.0
+ CAF Main 108 0 0.0 0.0 100.0 100.0
+ main Main 204 1 0.0 0.0 100.0 100.0
+ last_init_ys Main 210 1 25.0 27.4 25.0 27.4
+ main.ys Main 209 1 25.0 39.2 25.0 39.2
+ last_ys Main 208 1 12.5 0.0 12.5 0.0
+ last_init_xs Main 207 1 12.5 13.7 12.5 13.7
+ main.xs Main 206 1 18.8 19.6 18.8 19.6
+ last_xs Main 205 1 6.2 0.0 6.2 0.0
+
+.. _prof-rules:
+
+Rules for attributing costs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+While running a program with profiling turned on, GHC maintains a
+cost-centre stack behind the scenes, and attributes any costs (memory
+allocation and time) to whatever the current cost-centre stack is at the
+time the cost is incurred.
+
+The mechanism is simple: whenever the program evaluates an expression
+with an SCC annotation, ``{-# SCC c #-} E``, the cost centre ``c`` is
+pushed on the current stack, and the entry count for this stack is
+incremented by one. The stack also sometimes has to be saved and
+restored; in particular when the program creates a thunk (a lazy
+suspension), the current cost-centre stack is stored in the thunk, and
+restored when the thunk is evaluated. In this way, the cost-centre stack
+is independent of the actual evaluation order used by GHC at runtime.
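A small sketch of the thunk rule above (the cost-centre names are invented for this example): the thunk for the sum is built while ``build`` is on the stack, so its costs are attributed there even though the value is only demanded later, elsewhere.

```haskell
-- Sketch: the thunk `sum [1 .. n]` captures the cost-centre stack
-- containing "build" at creation time; when `consume` later demands
-- the value, its costs still land under "build", not "consume".
buildPair :: Int -> (Int, Int)
buildPair n = {-# SCC "build" #-} (n, sum [1 .. n])

consume :: (Int, Int) -> Int
consume p = {-# SCC "consume" #-} snd p + 1
```

Without ``-prof`` the ``SCC`` pragmas are simply ignored, so the same source compiles either way.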
+
+At a function call, GHC takes the stack stored in the function being
+called (which for a top-level function will be empty), and *appends* it
+to the current stack, ignoring any prefix that is identical to a prefix
+of the current stack.
+
+We mentioned earlier that lazy computations, i.e. thunks, capture the
+current stack when they are created, and restore this stack when they
+are evaluated. What about top-level thunks? They are "created" when the
+program is compiled, so what stack should we give them? The technical
+name for a top-level thunk is a CAF ("Constant Applicative Form"). GHC
+assigns every CAF in a module a stack consisting of the single cost
+centre ``M.CAF``, where ``M`` is the name of the module. It is also
+possible to give each CAF a different stack, using the option
+``-fprof-cafs``. This is especially useful when
+compiling with ``-ffull-laziness`` (on by default with ``-O`` and
+higher), as constants in function bodies will be lifted to the top-level
+and become CAFs. You will probably need to consult the Core
+(``-ddump-simpl``) in order to determine what these CAFs correspond to.
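As a minimal sketch of the kind of code affected (the names are invented): under ``-O``, full laziness may float the constant ``norm`` out of ``scale`` to the top level, turning it into a CAF whose costs ``-fprof-cafs`` can then separate out.

```haskell
-- Sketch: `norm` is a constant subexpression inside `scale`; with
-- -O (full laziness) it may be floated to the top level as a CAF,
-- so its costs show up under the module's CAF cost centre unless
-- -fprof-cafs gives it a cost centre of its own.
scale :: Double -> Double
scale x = x / norm
  where
    norm = sum [1 .. 1000]
```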
+
+.. index::
+ single: -fprof-cafs
+
+.. _prof-compiler-options:
+
+Compiler options for profiling
+------------------------------
+
+.. index::
+ single: profiling; options
+ single: options; for profiling
+
+``-prof``
+ .. index::
+ single: -prof
+
+ To make use of the profiling system *all* modules must be compiled
+ and linked with the ``-prof`` option. Any ``SCC`` annotations you've
+ put in your source will spring to life.
+
+ Without a ``-prof`` option, your ``SCC``\ s are ignored; so you can
+ compile ``SCC``-laden code without changing it.
+
+There are a few other profiling-related compilation options. Use them
+*in addition to* ``-prof``. These do not have to be used consistently
+for all modules in a program.
+
+``-fprof-auto``
+ .. index::
+ single: -fprof-auto
+
+ *All* bindings not marked INLINE, whether exported or not, top level
+ or nested, will be given automatic ``SCC`` annotations. Functions
+ marked INLINE must be given a cost centre manually.
+
+``-fprof-auto-top``
+ .. index::
+ single: -fprof-auto-top
+ single: cost centres; automatically inserting
+
+ GHC will automatically add ``SCC`` annotations for all top-level
+ bindings not marked INLINE. If you want a cost centre on an INLINE
+ function, you have to add it manually.
+
+``-fprof-auto-exported``
+ .. index::
+ single: -fprof-auto-exported
+
+ .. index::
+ single: cost centres; automatically inserting
+
+ GHC will automatically add ``SCC`` annotations for all exported
+ functions not marked INLINE. If you want a cost centre on an INLINE
+ function, you have to add it manually.
+
+``-fprof-auto-calls``
+ .. index::
+ single: -fprof-auto-calls
+
+ Adds an automatic ``SCC`` annotation to all *call sites*. This is
+ particularly useful when using profiling for the purposes of
+ generating stack traces; see the function ``traceStack`` in the
+ module ``Debug.Trace``, or the ``-xc`` RTS flag
+ (:ref:`rts-options-debugging`) for more details.
+
+``-fprof-cafs``
+ .. index::
+ single: -fprof-cafs
+
+ The costs of all CAFs in a module are usually attributed to one
+ “big” CAF cost-centre. With this option, all CAFs get their own
+ cost-centre. An “if all else fails” option…
+
+``-fno-prof-auto``
+ .. index::
+ single: -fno-prof-auto
+
+ Disables any previous ``-fprof-auto``, ``-fprof-auto-top``, or
+ ``-fprof-auto-exported`` options.
+
+``-fno-prof-cafs``
+ .. index::
+ single: -fno-prof-cafs
+
+ Disables any previous ``-fprof-cafs`` option.
+
+``-fno-prof-count-entries``
+ .. index::
+ single: -fno-prof-count-entries
+
+ Tells GHC not to collect information about how often functions are
+ entered at runtime (the "entries" column of the time profile), for
+ this module. This tends to make the profiled code run faster, and
+ hence closer to the speed of the unprofiled code, because GHC is
+ able to optimise more aggressively if it doesn't have to maintain
+ correct entry counts. This option can be useful if you aren't
+ interested in the entry counts (for example, if you only intend to
+ do heap profiling).
+
+.. _prof-time-options:
+
+Time and allocation profiling
+-----------------------------
+
+To generate a time and allocation profile, give one of the following RTS
+options to the compiled program when you run it (RTS options should be
+enclosed between ``+RTS ... -RTS`` as usual):
+
+``-p``, ``-P``, ``-pa``
+ .. index::
+ single: -p; RTS option
+ single: -P; RTS option
+ single: -pa; RTS option
+ single: time profile
+
+ The ``-p`` option produces a standard *time profile* report. It is
+ written into the file ``program.prof``.
+
+ The ``-P`` option produces a more detailed report containing the
+ actual time and allocation data as well. (Not used much.)
+
+ The ``-pa`` option produces the most detailed report containing all
+ cost centres in addition to the actual time and allocation data.
+
+``-V ⟨secs⟩``
+ .. index::
+ single: -V; RTS option
+
+ Sets the interval that the RTS clock ticks at, which is also the
+ sampling interval of the time and allocation profile. The default is
+ 0.02 seconds.
+
+``-xc``
+ .. index::
+ single: -xc; RTS option
+
+ This option causes the runtime to print out the current cost-centre
+ stack whenever an exception is raised. This can be particularly
+ useful for debugging the location of exceptions, such as the
+ notorious ``Prelude.head: empty list`` error. See
+ :ref:`rts-options-debugging`.
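As a hedged illustration (this wrapper is invented for the example), the classic failure mode looks like the following; a profiled build run with ``+RTS -xc`` would print the cost-centre stack live at the point where ``head`` raises.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Hypothetical example: `head []` raises "Prelude.head: empty list".
-- Under a profiled build with +RTS -xc, the RTS prints the
-- cost-centre stack at the raise site, locating the guilty caller.
firstOrZero :: [Int] -> IO Int
firstOrZero xs = do
  r <- try (evaluate (head xs)) :: IO (Either SomeException Int)
  pure (either (const 0) id r)
```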
+
+.. _prof-heap:
+
+Profiling memory usage
+----------------------
+
+In addition to profiling the time and allocation behaviour of your
+program, you can also generate a graph of its memory usage over time.
+This is useful for detecting the causes of space leaks, when your
+program holds on to more memory at run-time than it needs. Space
+leaks lead to slower execution due to heavy garbage collector activity,
+and may even cause the program to run out of memory altogether.
+
+To generate a heap profile from your program:
+
+1. Compile the program for profiling (:ref:`prof-compiler-options`).
+
+2. Run it with one of the heap profiling options described below (eg.
+ ``-h`` for a basic producer profile). This generates the file
+ ``prog.hp``.
+
+3. Run ``hp2ps`` to produce a Postscript file, ``prog.ps``. The
+ ``hp2ps`` utility is described in detail in :ref:`hp2ps`.
+
+4. Display the heap profile using a postscript viewer such as Ghostview,
+ or print it out on a Postscript-capable printer.
+
+For example, here is a heap profile produced for the ``sphere`` program
+from GHC's ``nofib`` benchmark suite,
+
+.. image:: images/prof_scc.*
+
+You might also want to take a look at
+`hp2any <http://www.haskell.org/haskellwiki/Hp2any>`__, a more advanced
+suite of tools (not distributed with GHC) for displaying heap profiles.
+
+.. _rts-options-heap-prof:
+
+RTS options for heap profiling
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There are several different kinds of heap profile that can be generated.
+All the different profile types yield a graph of live heap against time,
+but they differ in how the live heap is broken down into bands. The
+following RTS options select which break-down to use:
+
+``-hc``
+ .. index::
+ single: -hc; RTS option
+
+ (can be shortened to ``-h``). Breaks down the graph by the
+ cost-centre stack which produced the data.
+
+``-hm``
+ .. index::
+ single: -hm; RTS option
+
+ Break down the live heap by the module containing the code which
+ produced the data.
+
+``-hd``
+ .. index::
+ single: -hd; RTS option
+
+ Breaks down the graph by closure description. For actual data, the
+ description is just the constructor name, for other closures it is a
+ compiler-generated string identifying the closure.
+
+``-hy``
+ .. index::
+ single: -hy; RTS option
+
+ Breaks down the graph by type. For closures which have function type
+ or unknown/polymorphic type, the string will represent an
+ approximation to the actual type.
+
+``-hr``
+ .. index::
+ single: -hr; RTS option
+
+ Break down the graph by retainer set. Retainer profiling is
+ described in more detail below (:ref:`retainer-prof`).
+
+``-hb``
+ .. index::
+ single: -hb; RTS option
+
+ Break down the graph by biography. Biographical profiling is
+ described in more detail below (:ref:`biography-prof`).
+
+In addition, the profile can be restricted to heap data which satisfies
+certain criteria - for example, you might want to display a profile by
+type but only for data produced by a certain module, or a profile by
+retainer for a certain type of data. Restrictions are specified as
+follows:
+
+``-hc ⟨name⟩``
+ .. index::
+ single: -hc; RTS option
+
+ Restrict the profile to closures produced by cost-centre stacks with
+ one of the specified cost centres at the top.
+
+``-hC ⟨name⟩``
+ .. index::
+ single: -hC; RTS option
+
+ Restrict the profile to closures produced by cost-centre stacks with
+ one of the specified cost centres anywhere in the stack.
+
+``-hm ⟨module⟩``
+ .. index::
+ single: -hm; RTS option
+
+ Restrict the profile to closures produced by the specified modules.
+
+``-hd ⟨desc⟩``
+ .. index::
+ single: -hd; RTS option
+
+ Restrict the profile to closures with the specified description
+ strings.
+
+``-hy ⟨type⟩``
+ .. index::
+ single: -hy; RTS option
+
+ Restrict the profile to closures with the specified types.
+
+``-hr ⟨cc⟩``
+ .. index::
+ single: -hr; RTS option
+
+ Restrict the profile to closures with retainer sets containing
+ cost-centre stacks with one of the specified cost centres at the
+ top.
+
+``-hb ⟨bio⟩``
+ .. index::
+ single: -hb; RTS option
+
+ Restrict the profile to closures with one of the specified
+ biographies, where ⟨bio⟩ is one of ``lag``, ``drag``, ``void``, or
+ ``use``.
+
+For example, the following options will generate a retainer profile
+restricted to ``Branch`` and ``Leaf`` constructors:
+
+::
+
+ prog +RTS -hr -hdBranch,Leaf
+
+There can only be one "break-down" option (eg. ``-hr`` in the example
+above), but there is no limit on the number of further restrictions that
+may be applied. All the options may be combined, with one exception: GHC
+doesn't currently support mixing the ``-hr`` and ``-hb`` options.
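The ``Branch``/``Leaf`` restriction above presupposes a data type along these lines (this exact type is our invention, not from the text); running a profiled build of such a program with ``-hr -hdBranch,Leaf`` yields a retainer profile covering only those constructors' closures.

```haskell
-- Hypothetical tree type matching the Branch,Leaf restriction above.
data Tree = Leaf Int | Branch Tree Tree

build :: Int -> Tree
build 0 = Leaf 1
build n = Branch (build (n - 1)) (build (n - 1))

total :: Tree -> Int
total (Leaf n)     = n
total (Branch l r) = total l + total r
```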
+
+There are three more options which relate to heap profiling:
+
+``-i ⟨secs⟩``
+ .. index::
+ single: -i
+
+ Set the profiling (sampling) interval to ⟨secs⟩ seconds (the default
+ is 0.1 second). Fractions are allowed: for example ``-i0.2`` will
+ get 5 samples per second. This only affects heap profiling; time
+ profiles are always sampled with the frequency of the RTS clock. See
+ :ref:`prof-time-options` for changing that.
+
+``-xt``
+ .. index::
+ single: -xt; RTS option
+
+ Include the memory occupied by threads in a heap profile. Each
+ thread takes up a small area for its thread state in addition to the
+ space allocated for its stack (stacks normally start small and then
+ grow as necessary).
+
+ This includes the main thread, so using ``-xt`` is a good way to see
+ how much stack space the program is using.
+
+ Memory occupied by threads and their stacks is labelled as “TSO” and
+ “STACK” respectively when displaying the profile by closure
+ description or type description.
+
+``-L ⟨num⟩``
+ .. index::
+ single: -L; RTS option
+
+ Sets the maximum length of a cost-centre stack name in a heap
+ profile. Defaults to 25.
+
+.. _retainer-prof:
+
+Retainer Profiling
+~~~~~~~~~~~~~~~~~~
+
+Retainer profiling is designed to help answer questions like “why is
+this data being retained?”. We start by defining what we mean by a
+retainer:
+
+ A retainer is either the system stack, an unevaluated closure
+ (thunk), or an explicitly mutable object.
+
+In particular, constructors are *not* retainers.
+
+An object B retains object A if (i) B is a retainer object and (ii)
+object A can be reached by recursively following pointers starting from
+object B, but not meeting any other retainer objects on the way. Each
+live object is retained by one or more retainer objects, collectively
+called its retainer set, or its retainers.
+
+When retainer profiling is requested by giving the program the ``-hr``
+option, a graph is generated which is broken down by retainer set. A
+retainer set is displayed as a set of cost-centre stacks; because this
+is usually too large to fit on the profile graph, each retainer set is
+numbered and shown abbreviated on the graph along with its number, and
+the full list of retainer sets is dumped into the file ``prog.prof``.
+
+Retainer profiling requires multiple passes over the live heap in order
+to discover the full retainer set for each object, which can be quite
+slow. So we set a limit on the maximum size of a retainer set, where all
+retainer sets larger than the maximum retainer set size are replaced by
+the special set ``MANY``. The maximum set size defaults to 8 and can be
+altered with the ``-R`` RTS option:
+
+``-R ⟨size⟩``
+ Restrict the number of elements in a retainer set to ⟨size⟩ (default
+ 8).
+
+Hints for using retainer profiling
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The definition of retainers is designed to reflect a common cause of
+space leaks: a large structure is retained by an unevaluated
+computation, and will be released once the computation is forced. A good
+example is looking up a value in a finite map, where unless the lookup
+is forced in a timely manner the unevaluated lookup will cause the whole
+mapping to be retained. These kind of space leaks can often be
+eliminated by forcing the relevant computations to be performed eagerly,
+using ``seq`` or strictness annotations on data constructor fields.
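A minimal sketch of the finite-map leak just described, assuming ``Data.Map`` from the ``containers`` package that ships with GHC: the unforced lookup thunk retains the whole map, and ``seq`` releases it.

```haskell
import qualified Data.Map as M

-- Sketch: an unforced `M.lookup k m` thunk keeps all of `m` alive
-- (the thunk is the retainer); forcing the result to WHNF with
-- `seq` drops the reference to the map, curing this kind of leak.
lookupStrict :: Int -> M.Map Int String -> Maybe String
lookupStrict k m =
  let r = M.lookup k m
  in r `seq` r
```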
+
+Often a particular data structure is being retained by a chain of
+unevaluated closures, only the nearest of which will be reported by
+retainer profiling - for example A retains B, B retains C, and C retains
+a large structure. There might be a large number of Bs but only a single
+A, so A is really the one we're interested in eliminating. However,
+retainer profiling will in this case report B as the retainer of the
+large structure. To move further up the chain of retainers, we can ask
+for another retainer profile but this time restrict the profile to B
+objects, so we get a profile of the retainers of B:
+
+::
+
+ prog +RTS -hr -hcB
+
+This trick isn't foolproof, because there might be other B closures in
+the heap which aren't the retainers we are interested in, but we've
+found this to be a useful technique in most cases.
+
+.. _biography-prof:
+
+Biographical Profiling
+~~~~~~~~~~~~~~~~~~~~~~
+
+A typical heap object may be in one of the following four states at each
+point in its lifetime:
+
+- The lag stage, which is the time between creation and the first use
+ of the object;
+
+- the use stage, which lasts from the first use until the last use of
+ the object;
+
+- the drag stage, which lasts from the final use until the last
+ reference to the object is dropped; and
+
+- the void state: an object which is never used is said to be in the
+ void state for its whole lifetime.
+
+A biographical heap profile displays the portion of the live heap in
+each of the four states listed above. Usually the most interesting
+states are the void and drag states: live heap in these states is more
+likely to be wasted space than heap in the lag or use states.
+
+It is also possible to break down the heap in one or more of these
+states by a different criterion, by restricting a profile by biography.
+For example, to show the portion of the heap in the drag or void state
+by producer:
+
+::
+
+ prog +RTS -hc -hbdrag,void
+
+Once you know the producer or the type of the heap in the drag or void
+states, the next step is usually to find the retainer(s):
+
+::
+
+ prog +RTS -hr -hccc...
+
+.. note::
+ This two stage process is required because GHC cannot currently
+ profile using both biographical and retainer information simultaneously.
+
+.. _mem-residency:
+
+Actual memory residency
+~~~~~~~~~~~~~~~~~~~~~~~
+
+How does the heap residency reported by the heap profiler relate to the
+actual memory residency of your program when you run it? You might see a
+large discrepancy between the residency reported by the heap profiler,
+and the residency reported by tools on your system (eg. ``ps`` or
+``top`` on Unix, or the Task Manager on Windows). There are several
+reasons for this:
+
+- There is an overhead of profiling itself, which is subtracted from
+ the residency figures by the profiler. This overhead goes away when
+ compiling without profiling support, of course. The space overhead is
+ currently 2 extra words per heap object, which probably results in
+ about a 30% overhead.
+
+- Garbage collection requires more memory than the actual residency.
+ The factor depends on the kind of garbage collection algorithm in
+ use: a major GC in the standard generation copying collector will
+ usually require 3L bytes of memory, where L is the amount of live
+ data. This is because by default (see the ``+RTS -F`` option) we
+ allow the old generation to grow to twice its size (2L) before
+ collecting it, and we require additionally L bytes to copy the live
+ data into. When using compacting collection (see the ``+RTS -c``
+ option), this is reduced to 2L, and can further be reduced by
+ tweaking the ``-F`` option. Also add the size of the allocation area
+ (currently a fixed 512Kb).
+
+- The stack isn't counted in the heap profile by default. See the
+ ``+RTS -xt`` option.
+
+- The program text itself, the C stack, any non-heap data (e.g. data
+ allocated by foreign libraries, and data allocated by the RTS), and
+ ``mmap()``\'d memory are not counted in the heap profile.
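As a back-of-envelope sketch of the copying-collector figure above (a hypothetical helper, using only the numbers stated in the text): peak memory during a major GC is roughly three times the live data plus the fixed allocation area.

```haskell
-- Rough estimate from the text: 2L for the grown old generation,
-- plus L of copy space, plus the 512KB allocation area.
copyingGcPeak :: Integer -> Integer
copyingGcPeak liveBytes = 3 * liveBytes + 512 * 1024
```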
+
+.. _hp2ps:
+
+``hp2ps`` -- Rendering heap profiles to PostScript
+--------------------------------------------------
+
+.. index::
+ single: hp2ps
+ single: heap profiles
+ single: postscript, from heap profiles
+ single: -h⟨break-down⟩
+
+Usage:
+
+::
+
+ hp2ps [flags] [<file>[.hp]]
+
+The ``hp2ps`` program converts a heap profile as produced
+by the ``-h<break-down>`` runtime option into a PostScript graph of the
+heap profile. By convention, the file to be processed by ``hp2ps`` has a
+``.hp`` extension. The PostScript output is written to ``<file>.ps``.
+If ``<file>`` is omitted entirely, then the program behaves as a filter.
+
+``hp2ps`` is distributed in ``ghc/utils/hp2ps`` in a GHC source
+distribution. It was originally developed by Dave Wakeling as part of
+the HBC/LML heap profiler.
+
+The flags are:
+
+``-d``
+ In order to make graphs more readable, ``hp2ps`` sorts the shaded
+ bands for each identifier. The default sort ordering is for the
+ bands with the largest area to be stacked on top of the smaller
+ ones. The ``-d`` option causes rougher bands (those representing
+ series of values with the largest standard deviations) to be stacked
+ on top of smoother ones.
+
+``-b``
+ Normally, ``hp2ps`` puts the title of the graph in a small box at
+ the top of the page. However, if the JOB string is too long to fit
+ in a small box (more than 35 characters), then ``hp2ps`` will choose
+ to use a big box instead. The ``-b`` option forces ``hp2ps`` to use
+ a big box.
+
+``-e<float>[in|mm|pt]``
+ Generate encapsulated PostScript suitable for inclusion in LaTeX
+ documents. Usually, the PostScript graph is drawn in landscape mode
+ in an area 9 inches wide by 6 inches high, and ``hp2ps`` arranges
+ for this area to be approximately centred on a sheet of A4 paper.
+ This format is convenient for studying the graph in detail, but it is
+ unsuitable for inclusion in LaTeX documents. The ``-e`` option
+ causes the graph to be drawn in portrait mode, with float specifying
+ the width in inches, millimetres or points (the default). The
+ resulting PostScript file conforms to the Encapsulated PostScript
+ (EPS) convention, and it can be included in a LaTeX document using
+ Rokicki's dvi-to-PostScript converter ``dvips``.
+
+``-g``
+ Create output suitable for the ``gs`` PostScript previewer (or
+ similar). In this case the graph is printed in portrait mode without
+ scaling. The output is unsuitable for a laser printer.
+
+``-l``
+ Normally a profile is limited to 20 bands with additional
+ identifiers being grouped into an ``OTHER`` band. The ``-l`` flag
+ removes this 20 band limit, producing as many bands as
+ necessary. No key is produced as it won't fit! It is useful for
+ displaying creation time profiles with many bands.
+
+``-m<int>``
+ Normally a profile is limited to 20 bands with additional
+ identifiers being grouped into an ``OTHER`` band. The ``-m`` flag
+ specifies an alternative band limit (the maximum is 20).
+
+ ``-m0`` requests the band limit to be removed. As many bands as
+ necessary are produced. However no key is produced as it won't fit!
+ It is useful for displaying creation time profiles with many bands.
+
+``-p``
+ Use previous parameters. By default, the PostScript graph is
+ automatically scaled both horizontally and vertically so that it
+ fills the page. However, when preparing a series of graphs for use
+ in a presentation, it is often useful to draw a new graph using the
+ same scale, shading and ordering as a previous one. The ``-p`` flag
+ causes the graph to be drawn using the parameters determined by a
+ previous run of ``hp2ps`` on ``file``. These are extracted from
+ ``file.aux``.
+
+``-s``
+ Use a small box for the title.
+
+``-t<float>``
+ Normally trace elements which sum to a total of less than 1% of the
+ profile are removed from the profile. The ``-t`` option allows this
+ percentage to be modified (maximum 5%).
+
+ ``-t0`` requests no trace elements to be removed from the profile,
+ ensuring that all the data will be displayed.
+
+``-c``
+ Generate colour output.
+
+``-y``
+ Ignore marks.
+
+``-?``
+ Print out usage information.
+
+.. _manipulating-hp:
+
+Manipulating the hp file
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+(Notes kindly offered by Jan-Willem Maessen.)
+
+The ``FOO.hp`` file produced when you ask for the heap profile of a
+program ``FOO`` is a text file with a particularly simple structure.
+Here's a representative example, with much of the actual data omitted:
+
+::
+
+ JOB "FOO -hC"
+ DATE "Thu Dec 26 18:17 2002"
+ SAMPLE_UNIT "seconds"
+ VALUE_UNIT "bytes"
+ BEGIN_SAMPLE 0.00
+ END_SAMPLE 0.00
+ BEGIN_SAMPLE 15.07
+ ... sample data ...
+ END_SAMPLE 15.07
+ BEGIN_SAMPLE 30.23
+ ... sample data ...
+ END_SAMPLE 30.23
+ ... etc.
+ BEGIN_SAMPLE 11695.47
+ END_SAMPLE 11695.47
+
+The first four lines (``JOB``, ``DATE``, ``SAMPLE_UNIT``,
+``VALUE_UNIT``) form a header. Each block of lines starting with
+``BEGIN_SAMPLE`` and ending with ``END_SAMPLE`` forms a single sample
+(you can think of this as a vertical slice of your heap profile). The
+hp2ps utility should accept any input with a properly-formatted header
+followed by a series of *complete* samples.
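+
+Because the format is plain text, it is easy to sanity-check a profile
+mechanically. As a minimal sketch (the file ``FOO.hp`` below is
+recreated from the example above), a well-formed profile has one
+``END_SAMPLE`` line for every ``BEGIN_SAMPLE`` line:
+
+::
+
+    # Recreate the example profile from above.
+    cat > FOO.hp <<'EOF'
+    JOB "FOO -hC"
+    DATE "Thu Dec 26 18:17 2002"
+    SAMPLE_UNIT "seconds"
+    VALUE_UNIT "bytes"
+    BEGIN_SAMPLE 0.00
+    END_SAMPLE 0.00
+    BEGIN_SAMPLE 15.07
+    END_SAMPLE 15.07
+    EOF
+    # Count sample delimiters; the two counts should agree.
+    begins=$(grep -c '^BEGIN_SAMPLE' FOO.hp)
+    ends=$(grep -c '^END_SAMPLE' FOO.hp)
+    echo "$begins begins, $ends ends"
+
+If the two counts differ, the final sample is incomplete and ``hp2ps``
+may reject the file (see below for how to truncate it).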
+
+Zooming in on regions of your profile
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You can look at particular regions of your profile simply by loading a
+copy of the ``.hp`` file into a text editor and deleting the unwanted
+samples. The resulting ``.hp`` file can be run through ``hp2ps`` and
+viewed or printed.
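+
+If editing by hand is tedious, the same effect can be scripted. The
+following sketch (``ZOOM.hp`` is a made-up input file, and the time
+window bounds ``lo`` and ``hi`` are hypothetical parameters) keeps the
+four-line header plus only those samples whose timestamp falls inside
+the window:
+
+::
+
+    # A made-up three-sample profile to zoom into.
+    cat > ZOOM.hp <<'EOF'
+    JOB "FOO -hC"
+    DATE "Thu Dec 26 18:17 2002"
+    SAMPLE_UNIT "seconds"
+    VALUE_UNIT "bytes"
+    BEGIN_SAMPLE 0.00
+    END_SAMPLE 0.00
+    BEGIN_SAMPLE 15.07
+    MAIN 100
+    END_SAMPLE 15.07
+    BEGIN_SAMPLE 30.23
+    MAIN 200
+    END_SAMPLE 30.23
+    EOF
+    awk -v lo=10 -v hi=20 '
+      NR <= 4         { print; next }   # header passes through untouched
+      /^BEGIN_SAMPLE/ { keep = ($2 >= lo && $2 <= hi) }
+      keep            { print }
+    ' ZOOM.hp > ZOOM-zoomed.hp
+
+The resulting ``ZOOM-zoomed.hp`` contains only the 15.07 sample and can
+be run through ``hp2ps`` as usual.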
+
+Viewing the heap profile of a running program
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``.hp`` file is generated incrementally as your program runs. In
+principle, running ``hp2ps`` on the incomplete file should produce a
+snapshot of your program's heap usage. However, the last sample in the
+file may be incomplete, causing ``hp2ps`` to fail. If you are using a
+machine with UNIX utilities installed, it's not too hard to work around
+this problem (though the resulting command line looks rather Byzantine):
+
+::
+
+ head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
+ | hp2ps > FOO.ps
+
+The command ``fgrep -n END_SAMPLE FOO.hp`` finds the end of every
+complete sample in ``FOO.hp``, and labels each sample with its ending
+line number. We then select the line number of the last complete sample
+using ``tail`` and ``cut``. This is used as a parameter to ``head``; the
+result is as if we deleted the final incomplete sample from ``FOO.hp``.
+This results in a properly-formatted ``.hp`` file which we feed directly to
+``hp2ps``.
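+
+To see the pipeline in action without a running program, we can apply it
+to a hand-made profile whose final sample is still incomplete
+(``PARTIAL.hp`` below is a made-up input file):
+
+::
+
+    # A profile whose last sample has no closing END_SAMPLE yet.
+    cat > PARTIAL.hp <<'EOF'
+    JOB "FOO -hC"
+    DATE "Thu Dec 26 18:17 2002"
+    SAMPLE_UNIT "seconds"
+    VALUE_UNIT "bytes"
+    BEGIN_SAMPLE 0.00
+    END_SAMPLE 0.00
+    BEGIN_SAMPLE 15.07
+    MAIN 100
+    EOF
+    # Keep everything up to (and including) the last END_SAMPLE line.
+    head -`fgrep -n END_SAMPLE PARTIAL.hp | tail -1 | cut -d : -f 1` PARTIAL.hp \
+        > TRUNCATED.hp
+    tail -1 TRUNCATED.hp
+
+``TRUNCATED.hp`` now ends at the last complete sample (its final line is
+``END_SAMPLE 0.00``), so it is safe to feed to ``hp2ps``.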
+
+Viewing a heap profile in real time
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``gv`` and ``ghostview`` programs have a "watch file" option that can
+be used to view an up-to-date heap profile of your program as it runs.
+Simply generate an incremental heap profile as described in the previous
+section. Run ``gv`` on your profile:
+
+::
+
+ gv -watch -orientation=seascape FOO.ps
+
+If you forget the ``-watch`` flag you can still select "Watch file" from
+the "State" menu. Now each time you generate a new profile ``FOO.ps``
+the view will update automatically.
+
+This can all be encapsulated in a little script:
+
+::
+
+ #!/bin/sh
+ head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
+ | hp2ps > FOO.ps
+ gv -watch -orientation=seascape FOO.ps &
+ while [ 1 ] ; do
+ sleep 10 # We generate a new profile every 10 seconds.
+ head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
+ | hp2ps > FOO.ps
+ done
+
+Occasionally ``gv`` will choke as it tries to read an incomplete copy of
+``FOO.ps`` (because ``hp2ps`` is still running as an update occurs). A
+slightly more complicated script works around this problem, using the
+fact that sending a ``SIGHUP`` to ``gv`` causes it to re-read its input
+file:
+
+::
+
+ #!/bin/sh
+ head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
+ | hp2ps > FOO.ps
+ gv FOO.ps &
+ gvpsnum=$!
+ while [ 1 ] ; do
+ sleep 10
+ head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
+ | hp2ps > FOO.ps
+ kill -HUP $gvpsnum
+ done
+
+.. _prof-threaded:
+
+Profiling Parallel and Concurrent Programs
+------------------------------------------
+
+Combining ``-threaded`` and ``-prof`` is perfectly fine, and indeed it
+is possible to profile a program running on multiple processors with the
+``+RTS -N`` option. [3]_
+
+Some caveats apply, however. In the current implementation, a profiled
+program is likely to scale much less well than the unprofiled program,
+because the profiling implementation uses some shared data structures
+which require locking in the runtime system. Furthermore, the memory
+allocation statistics collected by the profiled program are stored in
+shared memory but *not* locked (for speed), which means that these
+figures might be inaccurate for parallel programs.
+
+We strongly recommend that you use ``-fno-prof-count-entries`` when
+compiling a program to be profiled on multiple cores, because the entry
+counts are also stored in shared memory, and continuously updating them
+on multiple cores is extremely slow.
+
+We also recommend using
+`ThreadScope <http://www.haskell.org/haskellwiki/ThreadScope>`__ for
+profiling parallel programs; it offers a GUI for visualising parallel
+execution, and is complementary to the time and space profiling features
+provided with GHC.
+
+.. _hpc:
+
+Observing Code Coverage
+-----------------------
+
+.. index::
+ single: code coverage
+ single: Haskell Program Coverage
+ single: hpc
+
+Code coverage tools allow a programmer to determine which parts of their
+code have actually been executed, and which parts have never been
+invoked. GHC has an option for generating instrumented code that
+records code coverage as part of the Haskell Program Coverage (HPC)
+toolkit, which is included with GHC. The HPC tools render the recorded
+coverage information into a human-understandable format.
+
+Correctly instrumented code provides coverage information of two kinds:
+source coverage and boolean-control coverage. Source coverage is the
+extent to which every part of the program was used, measured at three
+different levels: declarations (both top-level and local), alternatives
+(among several equations or case branches) and expressions (at every
+level). Boolean coverage is the extent to which each of the values True
+and False is obtained in every syntactic boolean context (i.e. guard,
+condition, qualifier).
+
+HPC displays both kinds of information in two primary ways: textual
+reports with summary statistics (``hpc report``) and sources with color
+mark-up (``hpc markup``). For boolean coverage, there are four possible
+outcomes for each guard, condition or qualifier: both True and False
+values occur; only True; only False; never evaluated. In hpc-markup
+output, highlighting with a yellow background indicates a part of the
+program that was never evaluated; a green background indicates an
+always-True expression and a red background indicates an always-False
+one.
+
+A small example: Reciprocation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As an example, we have a program called ``Recip.hs`` which computes
+exact decimal representations of reciprocals, with recurring parts
+indicated in brackets.
+
+::
+
+ reciprocal :: Int -> (String, Int)
+ reciprocal n | n > 1 = ('0' : '.' : digits, recur)
+ | otherwise = error
+ "attempting to compute reciprocal of number <= 1"
+ where
+ (digits, recur) = divide n 1 []
+ divide :: Int -> Int -> [Int] -> (String, Int)
+ divide n c cs | c `elem` cs = ([], position c cs)
+ | r == 0 = (show q, 0)
+ | r /= 0 = (show q ++ digits, recur)
+ where
+ (q, r) = (c*10) `quotRem` n
+ (digits, recur) = divide n r (c:cs)
+
+ position :: Int -> [Int] -> Int
+ position n (x:xs) | n==x = 1
+ | otherwise = 1 + position n xs
+
+ showRecip :: Int -> String
+ showRecip n =
+ "1/" ++ show n ++ " = " ++
+ if r==0 then d else take p d ++ "(" ++ drop p d ++ ")"
+ where
+ p = length d - r
+ (d, r) = reciprocal n
+
+ main = do
+ number <- readLn
+ putStrLn (showRecip number)
+ main
+
+HPC instrumentation is enabled with the ``-fhpc`` flag:
+
+::
+
+ $ ghc -fhpc Recip.hs
+
+GHC creates a subdirectory ``.hpc`` in the current directory, and puts
+HPC index (``.mix``) files in there, one for each module compiled. You
+don't need to worry about these files: they contain information needed
+by the ``hpc`` tool to generate the coverage data for compiled modules
+after the program is run.
+
+::
+
+ $ ./Recip
+ 1/3
+ = 0.(3)
+
+Running the program generates a file with the ``.tix`` suffix, in this
+case ``Recip.tix``, which contains the coverage data for this run of the
+program. The program may be run multiple times (e.g. with different test
+data), and the coverage data from the separate runs is accumulated in
+the ``.tix`` file. To reset the coverage data and start again, just
+remove the ``.tix`` file.
+
+Having run the program, we can generate a textual summary of coverage:
+
+::
+
+ $ hpc report Recip
+ 80% expressions used (81/101)
+ 12% boolean coverage (1/8)
+ 14% guards (1/7), 3 always True,
+ 1 always False,
+ 2 unevaluated
+ 0% 'if' conditions (0/1), 1 always False
+ 100% qualifiers (0/0)
+ 55% alternatives used (5/9)
+ 100% local declarations used (9/9)
+ 100% top-level declarations used (5/5)
+
+We can also generate a marked-up version of the source.
+
+::
+
+ $ hpc markup Recip
+ writing Recip.hs.html
+
+This generates one file per Haskell module, plus four index files:
+``hpc_index.html``, ``hpc_index_alt.html``, ``hpc_index_exp.html``, and
+``hpc_index_fun.html``.
+
+Options for instrumenting code for coverage
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``-fhpc``
+ Enable code coverage for the current module or modules being
+ compiled.
+
+ Modules compiled with this option can be freely mixed with modules
+ compiled without it; indeed, most libraries will typically be
+ compiled without ``-fhpc``. When the program is run, coverage data
+ will only be generated for those modules that were compiled with
+ ``-fhpc``, and the ``hpc`` tool will only show information about
+ those modules.
+
+The hpc toolkit
+~~~~~~~~~~~~~~~
+
+The ``hpc`` command has several sub-commands:
+
+::
+
+ $ hpc
+ Usage: hpc COMMAND ...
+
+ Commands:
+ help Display help for hpc or a single command
+ Reporting Coverage:
+ report Output textual report about program coverage
+ markup Markup Haskell source with program coverage
+ Processing Coverage files:
+ sum Sum multiple .tix files in a single .tix file
+ combine Combine two .tix files in a single .tix file
+ map Map a function over a single .tix file
+ Coverage Overlays:
+ overlay Generate a .tix file from an overlay file
+ draft Generate draft overlay that provides 100% coverage
+ Others:
+ show Show .tix file in readable, verbose format
+ version Display version for hpc
+
+In general, these options act on a ``.tix`` file after an instrumented
+binary has generated it.
+
+The ``hpc`` tool assumes you are in the top-level directory of the
+location where you built your application, and that the ``.tix`` file is
+in the same top-level directory. You can use the ``--srcdir`` flag to
+run ``hpc`` from any other directory, and use ``--srcdir`` multiple
+times to analyse programs compiled from different locations, as is
+typical for packages.
+
+We now explain the major modes of ``hpc`` in more detail.
+
+hpc report
+^^^^^^^^^^
+
+``hpc report`` gives a textual report of coverage. By default, all
+modules and packages are considered when generating the report, unless
+``--include`` or ``--exclude`` is used. The report is a summary unless
+the ``--per-module`` flag is used. The ``--xml-output`` option allows
+other tools to use ``hpc`` to glean coverage.
+
+::
+
+ $ hpc help report
+ Usage: hpc report [OPTION] .. <TIX_FILE> [<MODULE> [<MODULE> ..]]
+
+ Options:
+
+ --per-module show module level detail
+ --decl-list show unused decls
+ --exclude=[PACKAGE:][MODULE] exclude MODULE and/or PACKAGE
+ --include=[PACKAGE:][MODULE] include MODULE and/or PACKAGE
+ --srcdir=DIR path to source directory of .hs files
+ multi-use of srcdir possible
+ --hpcdir=DIR append sub-directory that contains .mix files
+ default .hpc [rarely used]
+ --reset-hpcdirs empty the list of hpcdir's
+ [rarely used]
+ --xml-output show output in XML
+
+hpc markup
+^^^^^^^^^^
+
+``hpc markup`` marks up source files into colour-highlighted HTML.
+
+::
+
+ $ hpc help markup
+ Usage: hpc markup [OPTION] .. <TIX_FILE> [<MODULE> [<MODULE> ..]]
+
+ Options:
+
+ --exclude=[PACKAGE:][MODULE] exclude MODULE and/or PACKAGE
+ --include=[PACKAGE:][MODULE] include MODULE and/or PACKAGE
+ --srcdir=DIR path to source directory of .hs files
+ multi-use of srcdir possible
+ --hpcdir=DIR append sub-directory that contains .mix files
+ default .hpc [rarely used]
+ --reset-hpcdirs empty the list of hpcdir's
+ [rarely used]
+ --fun-entry-count show top-level function entry counts
+ --highlight-covered highlight covered code, rather that code gaps
+ --destdir=DIR path to write output to
+
+hpc sum
+^^^^^^^
+
+``hpc sum`` adds together any number of ``.tix`` files into a single
+``.tix`` file. ``hpc sum`` does not change the original ``.tix`` file;
+it generates a new ``.tix`` file.
+
+::
+
+ $ hpc help sum
+ Usage: hpc sum [OPTION] .. <TIX_FILE> [<TIX_FILE> [<TIX_FILE> ..]]
+ Sum multiple .tix files in a single .tix file
+
+ Options:
+
+ --exclude=[PACKAGE:][MODULE] exclude MODULE and/or PACKAGE
+ --include=[PACKAGE:][MODULE] include MODULE and/or PACKAGE
+ --output=FILE output FILE
+ --union use the union of the module namespace (default is intersection)
+
+hpc combine
+^^^^^^^^^^^
+
+``hpc combine`` is the Swiss army knife of ``hpc``. It can be used to
+take the difference between ``.tix`` files, to subtract one ``.tix``
+file from another, or to add two ``.tix`` files. ``hpc combine`` does
+not change the original ``.tix`` files; it generates a new ``.tix``
+file.
+
+::
+
+ $ hpc help combine
+ Usage: hpc combine [OPTION] .. <TIX_FILE> <TIX_FILE>
+ Combine two .tix files in a single .tix file
+
+ Options:
+
+ --exclude=[PACKAGE:][MODULE] exclude MODULE and/or PACKAGE
+ --include=[PACKAGE:][MODULE] include MODULE and/or PACKAGE
+ --output=FILE output FILE
+ --function=FUNCTION combine .tix files with join function, default = ADD
+ FUNCTION = ADD | DIFF | SUB
+ --union use the union of the module namespace (default is intersection)
+
+hpc map
+^^^^^^^
+
+``hpc map`` inverts or zeros a ``.tix`` file. ``hpc map`` does not
+change the original ``.tix`` file; it generates a new ``.tix`` file.
+
+::
+
+ $ hpc help map
+ Usage: hpc map [OPTION] .. <TIX_FILE>
+ Map a function over a single .tix file
+
+ Options:
+
+ --exclude=[PACKAGE:][MODULE] exclude MODULE and/or PACKAGE
+ --include=[PACKAGE:][MODULE] include MODULE and/or PACKAGE
+ --output=FILE output FILE
+ --function=FUNCTION apply function to .tix files, default = ID
+ FUNCTION = ID | INV | ZERO
+ --union use the union of the module namespace (default is intersection)
+
+hpc overlay and hpc draft
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Overlays are an experimental feature of HPC: a textual description of
+coverage. ``hpc draft`` is used to generate a draft overlay from a
+``.tix`` file, and ``hpc overlay`` generates a ``.tix`` file from an
+overlay.
+
+::
+
+ % hpc help overlay
+ Usage: hpc overlay [OPTION] .. <OVERLAY_FILE> [<OVERLAY_FILE> [...]]
+
+ Options:
+
+ --srcdir=DIR path to source directory of .hs files
+ multi-use of srcdir possible
+ --hpcdir=DIR append sub-directory that contains .mix files
+ default .hpc [rarely used]
+ --reset-hpcdirs empty the list of hpcdir's
+ [rarely used]
+ --output=FILE output FILE
+ % hpc help draft
+ Usage: hpc draft [OPTION] .. <TIX_FILE>
+
+ Options:
+
+ --exclude=[PACKAGE:][MODULE] exclude MODULE and/or PACKAGE
+ --include=[PACKAGE:][MODULE] include MODULE and/or PACKAGE
+ --srcdir=DIR path to source directory of .hs files
+ multi-use of srcdir possible
+ --hpcdir=DIR append sub-directory that contains .mix files
+ default .hpc [rarely used]
+ --reset-hpcdirs empty the list of hpcdir's
+ [rarely used]
+ --output=FILE output FILE
+
+Caveats and Shortcomings of Haskell Program Coverage
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+HPC does not attempt to lock the ``.tix`` file, so multiple concurrently
+running binaries in the same directory will exhibit a race condition.
+There is no way to change the name of the ``.tix`` file generated, apart
+from renaming the binary. HPC does not work with GHCi.
+
+.. _ticky-ticky:
+
+Using “ticky-ticky” profiling (for implementors)
+------------------------------------------------
+
+.. index::
+ single: ticky-ticky profiling
+
+Because ticky-ticky profiling requires a certain familiarity with GHC
+internals, we have moved the documentation to the GHC developers wiki.
+Take a look at its
+:ghc-wiki:`overview of the profiling options <Commentary/Profiling>`,
+which includes a link to the ticky-ticky profiling page.
+
+.. [1]
+   ``-fprof-auto`` was known as ``-auto-all`` prior to GHC 7.4.1.
+
+.. [2]
+   Note that this policy has changed slightly in GHC 7.4.1 relative to
+   earlier versions, and may yet change further; feedback is welcome.
+
+.. [3]
+ This feature was added in GHC 7.4.1.