diff options
Diffstat (limited to 'docs/users_guide/profiling.rst')
-rw-r--r-- | docs/users_guide/profiling.rst | 148 |
1 files changed, 148 insertions, 0 deletions
diff --git a/docs/users_guide/profiling.rst b/docs/users_guide/profiling.rst index dcfd42b84e..9a905ba3de 100644 --- a/docs/users_guide/profiling.rst +++ b/docs/users_guide/profiling.rst @@ -1726,6 +1726,37 @@ Using “ticky-ticky” profiling (for implementors) *by* each closure type. See :ghc-flag:`-ticky-allocd` to keep track of allocations *of* each closure type as well. + +GHC's ticky-ticky profiler provides a low-level facility for tracking +entry and allocation counts of particular individual closures. +Ticky-ticky profiling requires a certain familiarity with GHC +internals, so it is best suited for expert users, but can provide an invaluable +precise insight into the allocation behaviour of your programs. + +Getting started with ticky profiling consists of three steps. + +1. Add the ``-ticky`` flag when compiling a Haskell module to enable "ticky-ticky" profiling of that module. This makes GHC emit performance-counting instructions in every STG function. + +2. Add ``-ticky`` to the command line when linking, so that you link against a version of the runtime system that allows you to display the results. In fact, in the link phase -ticky implies -debug, so you get the debug version of the runtime system too. + +3. Then when running your program you can collect the results of the profiling in two ways. + + * Using the eventlog, the :rts-flag:`-lT <-l ⟨flags⟩>` flag will emit ticky samples + to the eventlog periodically. + This has the advantage of being able to resolve dynamic behaviors over the program's + lifetime. See :ref:`ticky-event-format` for details on the event types + reported. The ticky information can be rendered into an interactive table + using eventlog2html. + * A legacy textual format is emitted using the :rts-flag:`-r ⟨file⟩` flag. This + produces a textual table containing information about how much each counter + ticked throughout the duration of the program. + +Additional Ticky Flags +---------------------- + +There are some additional flags which can be used to increase the number of +ticky counters and the quality of the profile. + .. ghc-flag:: -ticky-allocd :shortdesc: Track the number of times each closure type is allocated. :type: dynamic @@ -1765,6 +1796,14 @@ Using “ticky-ticky” profiling (for implementors) Note that these counters are currently not processed well be eventlog2html. So if you want to check them you will have to use the text based interface. +.. ghc-flag:: -ticky-ap-thunk + :shortdesc: Don't use standard AP thunks on order to get more reliable entry counters. + :type: dynamic + :category: + + This allows us to get accurate entry counters for code like `f x y` at the cost of code size. + We do this but not using the precomputed standard AP thunk code. + GHC's ticky-ticky profiler provides a low-level facility for tracking entry and allocation counts of particular individual closures. Because ticky-ticky profiling requires a certain familiarity with GHC @@ -1780,6 +1819,115 @@ advantage of being able to resolve dynamic behaviors over the program's lifetime. See :ref:`ticky-event-format` for details on the event types reported. +Understanding the Output of Ticky-Ticky profiles +------------------------------------------------ + +Once you have your rendered profile then you can begin to understand the allocation +behaviour of your program. There are two classes of ticky-ticky counters. + +Name-specific counters + + Each "name-specific counter" is associated with a name that is defined in the + result of the optimiser. For each such name, there are three possible counters: + entries, heap allocation by the named thing, and heap used to allocate that + named thing. + +Global counters + + Each "global counter" describes some aspect of the entire program execution. + For example, one global counter tracks total heap allocation; another tracks allocation for PAPs. + +In general you are probably interested mostly in the name-specific counters as these +can provided detailed information about where allocates how much in your program. + +Information about name-specific counters +---------------------------------------- + +Name-specific counters provide the following information about a closure. + +* Entries - How many times the closure was entered. +* Allocs - How much (in bytes) is allocated *by* that closure. +* Allod - How often the closure is allocated. +* FVs - The free variables captured by that closure. +* Args - The arguments that closure takes. + +The FVs and Args information is encoded using a small DSL. + ++------------------+---------------------------------------------------+ +|Classification |Description | ++==================+===================================================+ +|+ |dictionary | ++------------------+---------------------------------------------------+ +|\> |function | ++------------------+---------------------------------------------------+ +|{C,I,F,D,W} | char, int, float, double, word | ++------------------+---------------------------------------------------+ +|{c,i,f,d,w} | unboxed ditto | ++------------------+---------------------------------------------------+ +|T | tuple | ++------------------+---------------------------------------------------+ +|P | other primitive type | ++------------------+---------------------------------------------------+ +|p | unboxed primitive type | ++------------------+---------------------------------------------------+ +|L | list | ++------------------+---------------------------------------------------+ +|E | enumeration type | ++------------------+---------------------------------------------------+ +|S | Single constructor type | ++------------------+---------------------------------------------------+ +|M | Multi constructor type | ++------------------+---------------------------------------------------+ +|. |other type | ++------------------+---------------------------------------------------+ +|- |reserved for others to mark as "uninteresting" | ++------------------+---------------------------------------------------+ + +In particular note that you can use the ticky profiler to see any function +calls to dictionary arguments by searching the profile for the ``+`` classifier. +This indicates that the function has failed to specialise for one reason or another. + +Examples +-------- + +A typical use of ticky-ticky would be to generate a ticky report using the eventlog by evoking an +application with RTS arguments like this: + +``app <args> +RTS -l-augT`` + +This will produce an eventlog file which contains results from ticky counters. This file can +be manually inspected like any regular eventlog. However for ticky-ticky eventlog2html has +good support for producing tables from these logs. + +With an up to date version of eventlog2html this can be simply done by invoking eventlog2html +on the produced eventlog. In the example above the invocation would then be ``eventlog2html app.eventlog`` +Which will produce a searchable and sortable table containing all the ticky counters in the log. + +Notes about ticky profiling +--------------------------- + +* You can mix together modules compiled with and without ``-ticky`` but you will + miss out on allocations and counts from uninstrumented modules in the profile. + +* Linking with the ``-ticky`` has a quite severe performance impact on your program. + ``-ticky`` implies using the unoptimised ``-debug`` RTS. Therefore ``-ticky`` + shouldn't be used for production builds. + +* Building with ``-ticky`` doesn't affect core optimisations of your program as the + counters are inserted after the STG pipeline. At which point most optimizations have + already been run. + +* When using the eventlog it is possible to combine together ticky-ticky and IPE + based profiling as each ticky counter definition has an associated info table. + This address can be looked up in the IPE map so that further information (such + as source location) can be determined about that closure. + +* Global ticky counters are only available in the textual ticky output (``+RTS -r``). + But this mode has some limitations (e.g. on column widths) and will contain raw json output + in some columns. For this reason using an eventlog-based approach should be prefered if + possible. + + .. [1] :ghc-flag:`-fprof-auto` was known as ``-auto-all`` prior to GHC 7.4.1. |