author | trowbridge.jon <trowbridge.jon@6b5cf1ce-ec42-a296-1ba9-69fdba395a50> | 2006-12-28 22:39:33 +0000 |
---|---|---|
committer | trowbridge.jon <trowbridge.jon@6b5cf1ce-ec42-a296-1ba9-69fdba395a50> | 2006-12-28 22:39:33 +0000 |
commit | 66737d1c2519e4a1622f61139bfe2f683ea3696c (patch) | |
tree | d01660d44ccc1250b6db3516e733b488ae4fef71 /docs | |
parent | 55d679a05f0518ea73a4bca6e8b71b54fcecf68f (diff) | |
download | gperftools-66737d1c2519e4a1622f61139bfe2f683ea3696c.tar.gz |
Import of HTML documentation from SourceForge.
git-svn-id: http://gperftools.googlecode.com/svn/trunk@3 6b5cf1ce-ec42-a296-1ba9-69fdba395a50
Diffstat (limited to 'docs')
29 files changed, 1225 insertions, 0 deletions
diff --git a/docs/html/cpu_profiler.html b/docs/html/cpu_profiler.html new file mode 100644 index 0000000..f05d5ec --- /dev/null +++ b/docs/html/cpu_profiler.html @@ -0,0 +1,409 @@ +<html><head><title>Google CPU Profiler</title></head><body> + +This is the CPU profiler we use at Google. There are three parts to +using it: linking the library into an application, running the code, +and analyzing the output. + + +<h1>Linking in the Library</h1> + +<p>To install the CPU profiler into your executable, add -lprofiler to +the link-time step for your executable. (It's also probably possible +to add in the profiler at run-time using LD_PRELOAD, but this isn't +necessarily recommended.)</p> + +<p>This does <i>not</i> turn on CPU profiling; it just inserts the code. +For that reason, it's practical to just always link -lprofiler into a +binary while developing; that's what we do at Google. (However, since +any user can turn on the profiler by setting an environment variable, +it's not necessarily recommended to install profiler-linked binaries +into a production, running system.)</p> + + +<h1>Running the Code</h1> + +<p>There are two alternatives to actually turn on CPU profiling for a +given run of an executable:</p> + +<ol> +<li> Define the environment variable CPUPROFILE to the filename to dump the + profile to. For instance, to profile /usr/local/netscape: + <pre> $ CPUPROFILE=/tmp/profile /usr/local/netscape # sh + % setenv CPUPROFILE /tmp/profile; /usr/local/netscape # csh + </pre> + OR + +</li><li> In your code, bracket the code you want profiled in calls to + ProfilerStart() and ProfilerStop(). ProfilerStart() will take the + profile-filename as an argument. +</li></ol> + +<p>In Linux 2.6 and above, profiling works correctly with threads, +automatically profiling all threads. In Linux 2.4, profiling only +profiles the main thread (due to a kernel bug involving itimers and +threads). 
Profiling works correctly with sub-processes: each child +process gets its own profile with its own name (generated by combining +CPUPROFILE with the child's process id).</p> + +<p>For security reasons, CPU profiling will not write to a file -- and +is thus not usable -- for setuid programs.</p> + +<h2>Controlling Behavior via the Environment</h2> + +<p>In addition to the environment variable <code>CPUPROFILE</code>, +which determines where profiles are written, there are several +environment variables which control the performance of the CPU +profile.</p> + +<table cellpadding="5" frame="box" rules="sides" width="100%"> +<tbody><tr> +<td><code>PROFILEFREQUENCY=<i>x</i></code></td> + <td>How many interrupts/second the cpu-profiler samples. + </td> +</tr> +</tbody></table> + +<h1>Analyzing the Output</h1> + +<p>pprof is the script used to analyze a profile. It has many output +modes, both textual and graphical. Some give just raw numbers, much +like the -pg output of gcc, and others show the data in the form of a +dependency graph.</p> + +<p>pprof <b>requires</b> perl5 to be installed to run. It also +requires dot to be installed for any of the graphical output routines, +and gv to be installed for --gv mode (described below).</p> + +<p>Here are some ways to call pprof. These are described in more +detail below.</p> + +<pre>% pprof "program" "profile" + Generates one line per procedure + +% pprof --gv "program" "profile" + Generates annotated call-graph and displays via "gv" + +% pprof --gv --focus=Mutex "program" "profile" + Restrict to code paths that involve an entry that matches "Mutex" + +% pprof --gv --focus=Mutex --ignore=string "program" "profile" + Restrict to code paths that involve an entry that matches "Mutex" + and does not match "string" + +% pprof --list=IBF_CheckDocid "program" "profile" + Generates disassembly listing of all routines with at least one + sample that match the --list=<regexp> pattern. 
The listing is + annotated with the flat and cumulative sample counts at each line. + +% pprof --disasm=IBF_CheckDocid "program" "profile" + Generates disassembly listing of all routines with at least one + sample that match the --disasm=<regexp> pattern. The listing is + annotated with the flat and cumulative sample counts at each PC value. +</regexp></regexp></pre> + +<h3>Node Information</h3> + +<p>In the various graphical modes of pprof, the output is a call graph +annotated with timing information, like so:</p> + +<a href="http://goog-perftools.sourceforge.net/doc/pprof-test-big.gif"> +<center><table><tbody><tr><td> + <img src="../images/pprof-test.gif"> +</td></tr></tbody></table></center> +</a> + +<p>Each node represents a procedure. +The directed edges indicate caller to callee relations. Each node is +formatted as follows:</p> + +<center><pre>Class Name +Method Name +local (percentage) +<b>of</b> cumulative (percentage) +</pre></center> + +<p>The last one or two lines contain the timing information. (The +profiling is done via a sampling method, where by default we take 100 +samples a second. Therefore one unit of time in the output corresponds +to about 10 milliseconds of execution time.) The "local" time is the +time spent executing the instructions directly contained in the +procedure (and in any other procedures that were inlined into the +procedure). The "cumulative" time is the sum of the "local" time and +the time spent in any callees. If the cumulative time is the same as +the local time, it is not printed. + +</p><p>For instance, the timing information for test_main_thread() +indicates that 155 units (about 1.55 seconds) were spent executing the +code in test_main_thread() and 200 units were spent while executing +test_main_thread() and its callees such as snprintf().</p> + +<p>The size of the node is proportional to the local count.
The +percentage displayed in the node corresponds to the count divided by +the total run time of the program (that is, the cumulative count for +main()).</p> + +<h3>Edge Information</h3> + +<p>An edge from one node to another indicates a caller to callee +relationship. Each edge is labelled with the time spent by the callee +on behalf of the caller. E.g., the edge from test_main_thread() to +snprintf() indicates that of the 200 samples in +test_main_thread(), 37 are because of calls to snprintf().</p> + +<p>Note that test_main_thread() has an edge to vsnprintf(), even +though test_main_thread() doesn't call that function directly. This +is because the code was compiled with -O2; the profile reflects the +optimized control flow.</p> + +<h3>Meta Information</h3> + +The top of the display should contain some meta information like: +<pre> /tmp/profiler2_unittest + Total samples: 202 + Focusing on: 202 + Dropped nodes with <= 1 abs(samples) + Dropped edges with <= 0 samples +</pre> + +This section contains the name of the program, and the total samples +collected during the profiling run. If the --focus option is on (see +the <a href="#focus">Focus</a> section below), the legend also +contains the number of samples being shown in the focused display. +Furthermore, some unimportant nodes and edges are dropped to reduce +clutter. The characteristics of the dropped nodes and edges are also +displayed in the legend. + +<h3><a name="focus">Focus and Ignore</a></h3> + +<p>You can ask pprof to generate a display focused on a particular +piece of the program. You specify a regular expression. Any portion +of the call-graph that is on a path which contains at least one node +matching the regular expression is preserved. The rest of the +call-graph is dropped on the floor.
For example, you can focus on the +vsnprintf() libc call in profiler2_unittest as follows:</p> + +<pre>% pprof --gv --focus=vsnprintf /tmp/profiler2_unittest test.prof +</pre> +<a href="http://goog-perftools.sourceforge.net/doc/pprof-vsnprintf-big.gif"> +<center><table><tbody><tr><td> + <img src="../images/pprof-vsnprintf.gif"> +</td></tr></tbody></table></center> +</a> + +<p> +Similarly, you can supply the --ignore option to ignore +samples that match a specified regular expression. E.g., +if you are interested in everything except calls to snprintf(), +you can say: +</p><pre>% pprof --gv --ignore=snprintf /tmp/profiler2_unittest test.prof +</pre> + +<h3><a name="options">pprof Options</a></h3> + +<h4>Output Type</h4> + +<p> +</p><center> +<table cellpadding="5" frame="box" rules="sides" width="100%"> +<tbody><tr valign="top"> + <td><code>--text</code></td> + <td> + Produces a textual listing. This is currently the default + since it does not need access to an X display, or + dot or gv. However, if you + have these programs installed, you will probably be + happier with the --gv output. + </td> +</tr> +<tr valign="top"> + <td><code>--gv</code></td> + <td> + Generates annotated call-graph, converts to postscript, and + displays via gv. + </td> +</tr> +<tr valign="top"> + <td><code>--dot</code></td> + <td> + Generates the annotated call-graph in dot format and + emits to stdout. + </td> +</tr> +<tr valign="top"> + <td><code>--ps</code></td> + <td> + Generates the annotated call-graph in Postscript format and + emits to stdout. + </td> +</tr> +<tr valign="top"> + <td><code>--gif</code></td> + <td> + Generates the annotated call-graph in GIF format and + emits to stdout. + </td> +</tr> +<tr valign="top"> + <td><code>--list=<<i>regexp</i>></code></td> + <td> + <p>Outputs source-code listing of routines whose + name matches <regexp>.
Each line + in the listing is annotated with flat and cumulative + sample counts.</p> + + <p>In the presence of inlined calls, the samples + associated with inlined code tend to get assigned + to a line that follows the location of the + inlined call. A more precise accounting can be + obtained by disassembling the routine using the + --disasm flag.</p> + </td> +</tr> +<tr valign="top"> + <td><code>--disasm=<<i>regexp</i>></code></td> + <td> + Generates disassembly of routines that match + <regexp>, annotated with flat and + cumulative sample counts and emits to stdout. + </td> +</tr> +</tbody></table> +</center> + +<h4>Reporting Granularity</h4> + +<p>By default, pprof produces one entry per procedure. However you can +use one of the following options to change the granularity of the +output. The --files option seems to be particularly useless, and may +be removed eventually.</p> + +<center> +<table cellpadding="5" frame="box" rules="sides" width="100%"> +<tbody><tr valign="top"> + <td><code>--addresses</code></td> + <td> + Produce one node per program address. + </td> +</tr> + <tr><td><code>--lines</code></td> + <td> + Produce one node per source line. + </td> +</tr> + <tr><td><code>--functions</code></td> + <td> + Produce one node per function (this is the default). + </td> +</tr> + <tr><td><code>--files</code></td> + <td> + Produce one node per source file. + </td> +</tr> +</tbody></table> +</center> + +<h4>Controlling the Call Graph Display</h4> + +<p>Some nodes and edges are dropped to reduce clutter in the output +display. The following options control this effect:</p> + +<center> +<table cellpadding="5" frame="box" rules="sides" width="100%"> +<tbody><tr valign="top"> + <td><code>--nodecount=<n></code></td> + <td> + This option controls the number of displayed nodes. The nodes + are first sorted by decreasing cumulative count, and then only + the top N nodes are kept. The default value is 80. 
+ </td> +</tr> +<tr valign="top"> + <td><code>--nodefraction=<f></code></td> + <td> + This option provides another mechanism for discarding nodes + from the display. If the cumulative count for a node is + less than this option's value multiplied by the total count + for the profile, the node is dropped. The default value + is 0.005; i.e. nodes that account for less than + half a percent of the total time are dropped. A node + is dropped if either this condition is satisfied, or the + --nodecount condition is satisfied. + </td> +</tr> +<tr valign="top"> + <td><code>--edgefraction=<f></code></td> + <td> + This option controls the number of displayed edges. First of all, + an edge is dropped if either its source or destination node is + dropped. Otherwise, the edge is dropped if the sample + count along the edge is less than this option's value multiplied + by the total count for the profile. The default value is + 0.001; i.e., edges that account for less than + 0.1% of the total time are dropped. + </td> +</tr> +<tr valign="top"> + <td><code>--focus=<re></code></td> + <td> + This option controls what region of the graph is displayed + based on the regular expression supplied with the option. + For any path in the callgraph, we check all nodes in the path + against the supplied regular expression. If none of the nodes + match, the path is dropped from the output. + </td> +</tr> +<tr valign="top"> + <td><code>--ignore=<re></code></td> + <td> + This option controls what region of the graph is displayed + based on the regular expression supplied with the option. + For any path in the callgraph, we check all nodes in the path + against the supplied regular expression. If any of the nodes + match, the path is dropped from the output. + </td> +</tr> +</tbody></table> +</center> + +<p>The dropped edges and nodes account for some count mismatches in +the display. For example, the cumulative count for +snprintf() in the first diagram above was 41. 
However, the local +count (1) and the count along the outgoing edges (12+1+20+6) add up to +only 40.</p> + + +<h1>Caveats</h1> + +<ul> +<li> If the program exits because of a signal, the generated profile + will be <font color="red">incomplete, and may perhaps be + completely empty.</font> +</li><li> The displayed graph may have disconnected regions because + of the edge-dropping heuristics described above. +</li><li> If the program links in a library that was not compiled + with enough symbolic information, all samples associated + with the library may be charged to the last symbol found + in the program before the library. This will artificially + inflate the count for that symbol. +</li><li> If you run the program on one machine, and profile it on another, + and the shared libraries are different on the two machines, the + profiling output may be confusing: samples that fall within + the shared libraries may be assigned to arbitrary procedures. +</li><li> If your program forks, the children will also be profiled (since + they inherit the same CPUPROFILE setting). Each process is + profiled separately; to distinguish the child profiles from the + parent profile and from each other, all children will have their + process-id appended to the CPUPROFILE name. +</li><li> Due to a hack we make to work around a possible gcc bug, your + profiles may end up named strangely if the first character of + your CPUPROFILE variable has ASCII value greater than 127. This + should be exceedingly rare, but if you need to use such a name, + just prepend <code>./</code> to your filename: + <code>CPUPROFILE=./Ägypten</code>.
+</li></ul> + +<hr> +Last modified: Wed Apr 20 04:54:23 PDT 2005 + +</body></html> diff --git a/docs/html/heap_checker.html b/docs/html/heap_checker.html new file mode 100644 index 0000000..fd82f37 --- /dev/null +++ b/docs/html/heap_checker.html @@ -0,0 +1,133 @@ +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> +<html><head><title>Google Heap Checker</title></head><body> +<h1>Automatic Leak Checking Support</h1> + +This document describes how to check the heap usage of a C++ +program. This facility can be useful for automatically detecting +memory leaks. + +<h2>Linking in the Heap Checker</h2> + +<p> +You can heap-check any program that has the tcmalloc library linked +in. No recompilation is necessary to use the heap checker. +</p> + +<p> +In order to catch all heap leaks, tcmalloc must be linked <i>last</i> into +your executable. The heap checker may mischaracterize some memory +accesses in libraries listed after it on the link line. For instance, +it may report these libraries as leaking memory when they're not. +(See the source code for more details.) +</p> + +<p> +It's safe to link in tcmalloc even if you don't expect to +heap-check your program. Your programs will not run any slower +as long as you don't use any of the heap-checker features. +</p> + +<p> +You can run the heap checker on applications you didn't compile +yourself, by using LD_PRELOAD: +</p> +<pre> $ LD_PRELOAD="/usr/lib/libtcmalloc.so" HEAPCHECK=normal <binary> +</binary></pre> +<p> +We don't necessarily recommend this mode of usage. +</p> + +<h2>Turning On Heap Checking</h2> + +<p>There are two alternatives to actually turn on heap checking for a +given run of an executable.</p> + +<ul> +<li> For whole-program heap-checking, define the environment variable + HEAPCHECK to the type of heap + checking you want: normal, strict, or draconian.
For instance, + to heap-check <code>/bin/ls</code>: + <pre> $ HEAPCHECK=normal /bin/ls # sh + % setenv HEAPCHECK normal; /bin/ls # csh + </pre> + OR + +</li><li> For partial-code heap-checking, you need to modify your code. + For each piece of code you want heap-checked, bracket the code + by creating a <code>HeapLeakChecker</code> object + (which takes a descriptive label as an argument), and calling + <code>checker.NoLeaks()</code> at the end of the code you want + checked. This will verify that no more memory is allocated at the + end of the code segment than was allocated in the beginning. To + actually turn on the heap-checking, set the environment variable + HEAPCHECK to <code>local</code>. + + +<p> +Here is an example of the second usage. The following code will +die if <code>Foo()</code> leaks any memory +(i.e. it allocates memory that is not freed by the time it returns): +</p> +<pre> HeapProfileLeakChecker checker("foo"); + Foo(); + assert(checker.NoLeaks()); +</pre> + +<p> +When the <code>checker</code> object is allocated, it creates +one heap profile. When <code>checker.NoLeaks()</code> is invoked, +it creates another heap profile and compares it to the previously +created profile. If the new profile indicates memory growth +(or any memory allocation change if one +uses <code>checker.SameHeap()</code> instead), <code>NoLeaks()</code> +will return false and the program will abort. An error message will +also be printed out saying how the <code>pprof</code> command can be run +to get a detailed analysis of the actual leaks. +</p> + +<p> +See the comments for the <code>HeapProfileLeakChecker</code> class in +<code>heap-checker.h</code> and the code in +<code>heap-checker_unittest.cc</code> for more information and +examples. (TODO: document it all here instead!) +</p> + +<p> +<b>IMPORTANT NOTE</b>: pthreads handling is currently incomplete. +Heap leak checks will fail with bogus leaks if there are pthreads live +at construction or leak checking time.
One solution, for global +heap-checking, is to make sure all threads but the main thread have +exited at program-end time. We hope (as of March 2005) to have a fix +soon. +</p> + +<h2>Disabling Heap-checking of Known Leaks</h2> + +<p> +Sometimes your code has leaks that you know about and are willing to +accept. You would like the heap checker to ignore them when checking +your program. You can do this by bracketing the code in question with +an appropriate heap-checking object: +</p> +<pre> #include <google> + ... + void *mark = HeapLeakChecker::GetDisableChecksStart(); + <leaky code> + HeapLeakChecker::DisableChecksToHereFrom(mark); +</google></pre> + +<p> +Some libc routines allocate memory, and may need to be 'disabled' in +this way. As time goes on, we hope to encode proper handling of +these routines into the heap-checker library code, so applications +needn't worry about them, but that process is not yet complete. +</p> + +<hr> +<address><a href="mailto:opensource@google.com">Maxim Lifantsev</a></address> +<!-- Created: Tue Dec 19 10:43:14 PST 2000 --> +<!-- hhmts start --> +Last modified: Thu Mar 3 05:51:40 PST 2005 +<!-- hhmts end --> + +</li></ul></body></html>
\ No newline at end of file diff --git a/docs/html/heap_profiler.html b/docs/html/heap_profiler.html new file mode 100644 index 0000000..6936901 --- /dev/null +++ b/docs/html/heap_profiler.html @@ -0,0 +1,310 @@ +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> +<html><head><title>Google Heap Profiler</title></head><body> +<h1>Profiling heap usage</h1> + +This document describes how to profile the heap usage of a C++ +program. This facility can be useful for +<ul> +<li> Figuring out what is in the program heap at any given time +</li><li> Locating memory leaks +</li><li> Finding places that do a lot of allocation +</li></ul> + +<h2>Linking in the Heap Profiler</h2> + +<p> +You can profile any program that has the tcmalloc library linked +in. No recompilation is necessary to use the heap profiler. +</p> + +<p> +It's safe to link in tcmalloc even if you don't expect to +heap-profile your program. Your programs will not run any slower +as long as you don't use any of the heap-profiler features. +</p> + +<p> +You can run the heap profiler on applications you didn't compile +yourself, by using LD_PRELOAD: +</p> +<pre> $ LD_PRELOAD="/usr/lib/libtcmalloc.so" HEAPPROFILE=... <binary> +</binary></pre> +<p> +We don't necessarily recommend this mode of usage. +</p> + + +<h2>Turning On Heap Profiling</h2> + +<p> +Define the environment variable HEAPPROFILE to the filename to dump the +profile to.
For instance, to profile /usr/local/netscape: +</p> +<pre> $ HEAPPROFILE=/tmp/profile /usr/local/netscape # sh + % setenv HEAPPROFILE /tmp/profile; /usr/local/netscape # csh +</pre> + +<p>Profiling also works correctly with sub-processes: each child +process gets its own profile with its own name (generated by combining +HEAPPROFILE with the child's process id).</p> + +<p>For security reasons, heap profiling will not write to a file -- +and is thus not usable -- for setuid programs.</p> + + + +<h2>Extracting a profile</h2> + +<p> +If heap-profiling is turned on in a program, the program will periodically +write profiles to the filesystem. The sequence of profiles will be named: +</p> +<pre> <prefix>.0000.heap + <prefix>.0001.heap + <prefix>.0002.heap + ... +</pre> +<p> +where <code><prefix></code> is the value supplied in +<code>HEAPPROFILE</code>. Note that if the supplied prefix +does not start with a <code>/</code>, the profile files will be +written to the program's working directory. +</p> + +<p> +By default, a new profile file is written after every 1GB of +allocation. The profile-writing interval can be adjusted by calling +HeapProfilerSetAllocationInterval() from your program. This takes one +argument: a numeric value that indicates the number of bytes of allocation +between each profile dump. +</p> + +<p> +You can also generate profiles from specific points in the program +by inserting a call to <code>HeapProfile()</code>. Example: +</p> +<pre> extern const char* HeapProfile(); + const char* profile = HeapProfile(); + fputs(profile, stdout); + free(const_cast<char*>(profile)); +</pre> + +<h2>What is profiled</h2> + +The profiling system instruments all allocations and frees. It keeps +track of various pieces of information per allocation site. An +allocation site is defined as the active stack trace at the call to +<code>malloc</code>, <code>calloc</code>, <code>realloc</code>, or +<code>new</code>.
+ +<h2>Interpreting the profile</h2> + +The profile output can be viewed by passing it to the +<code>pprof</code> tool. The <code>pprof</code> tool can print both +CPU usage and heap usage information. It is documented in detail +on the <a href="http://goog-perftools.sourceforge.net/doc/cpu_profiler.html">CPU Profiling</a> page. +Heap-profile-specific flags and usage are explained below. + +<p> +Here are some examples. These examples assume the binary is named +<code>gfs_master</code>, and a sequence of heap profile files can be +found in files named: +</p> +<pre> profile.0001.heap + profile.0002.heap + ... + profile.0100.heap +</pre> + +<h3>Why is a process so big</h3> + +<pre> % pprof --gv gfs_master profile.0100.heap +</pre> + +This command will pop-up a <code>gv</code> window that displays +the profile information as a directed graph. Here is a portion +of the resulting output: + +<p> +</p><center> +<img src="../images/heap-example1.png"> +</center> +<p></p> + +A few explanations: +<ul> +<li> <code>GFS_MasterChunk::AddServer</code> accounts for 255.6 MB + of the live memory, which is 25% of the total live memory. +</li><li> <code>GFS_MasterChunkTable::UpdateState</code> is directly + accountable for 176.2 MB of the live memory (i.e., it directly + allocated 176.2 MB that has not been freed yet). Furthermore, + it and its callees are responsible for 729.9 MB. The + labels on the outgoing edges give a good indication of the + amount allocated by each callee. +</li></ul> + +<h3>Comparing Profiles</h3> + +<p> +You often want to skip allocations during the initialization phase of +a program so you can find gradual memory leaks. One simple way to do +this is to compare two profiles -- both collected after the program +has been running for a while. Specify the name of the first profile +using the <code>--base</code> option. 
Example: +</p> +<pre> % pprof --base=profile.0004.heap gfs_master profile.0100.heap +</pre> + +<p> +The memory-usage in <code>profile.0004.heap</code> will be subtracted from +the memory-usage in <code>profile.0100.heap</code> and the result will +be displayed. +</p> + +<h3>Text display</h3> + +<pre>% pprof gfs_master profile.0100.heap + 255.6 24.7% 24.7% 255.6 24.7% GFS_MasterChunk::AddServer + 184.6 17.8% 42.5% 298.8 28.8% GFS_MasterChunkTable::Create + 176.2 17.0% 59.5% 729.9 70.5% GFS_MasterChunkTable::UpdateState + 169.8 16.4% 75.9% 169.8 16.4% PendingClone::PendingClone + 76.3 7.4% 83.3% 76.3 7.4% __default_alloc_template::_S_chunk_alloc + 49.5 4.8% 88.0% 49.5 4.8% hashtable::resize + ... +</pre> + +<p> +</p><ul> +<li> The first column contains the direct memory use in MB. +</li><li> The fourth column contains memory use by the procedure + and all of its callees. +</li><li> The second and fifth columns are just percentage representations + of the numbers in the first and fourth columns. +</li><li> The third column is a cumulative sum of the second column + (i.e., the <code>k</code>th entry in the third column is the + sum of the first <code>k</code> entries in the second column.) +</li></ul> + +<h3>Ignoring or focusing on specific regions</h3> + +The following command will give a graphical display of a subset of +the call-graph. Only paths in the call-graph that match the +regular expression <code>DataBuffer</code> are included: +<pre>% pprof --gv --focus=DataBuffer gfs_master profile.0100.heap +</pre> + +Similarly, the following command will omit a subset of the +call-graph: all paths in the call-graph that match the regular +expression <code>DataBuffer</code> are discarded: +<pre>% pprof --gv --ignore=DataBuffer gfs_master profile.0100.heap +</pre> + +<h3>Total allocations + object-level information</h3> + +<p> +All of the previous examples have displayed the amount of in-use +space. I.e., the number of bytes that have been allocated but not +freed.
You can also get other types of information by supplying +a flag to <code>pprof</code>: +</p> + +<center> +<table cellpadding="5" frame="box" rules="sides" width="100%"> + +<tbody><tr valign="top"> + <td><code>--inuse_space</code></td> + <td> + Display the number of in-use megabytes (i.e. space that has + been allocated but not freed). This is the default. + </td> +</tr> + +<tr valign="top"> + <td><code>--inuse_objects</code></td> + <td> + Display the number of in-use objects (i.e. number of + objects that have been allocated but not freed). + </td> +</tr> + +<tr valign="top"> + <td><code>--alloc_space</code></td> + <td> + Display the number of allocated megabytes. This includes + the space that has since been de-allocated. Use this + if you want to find the main allocation sites in the + program. + </td> +</tr> + +<tr valign="top"> + <td><code>--alloc_objects</code></td> + <td> + Display the number of allocated objects. This includes + the objects that have since been de-allocated. Use this + if you want to find the main allocation sites in the + program. + </td> + +</tr></tbody></table> +</center> + +<h2>Caveats</h2> + +<ul> +<li> <p> + Heap profiling requires the use of libtcmalloc. This requirement + may be removed in a future version of the heap profiler, and the + heap profiler separated out into its own library. + </p> + +</li><li> <p> + If the program links in a library that was not compiled + with enough symbolic information, all samples associated + with the library may be charged to the last symbol found + in the program before the library. This will artificially + inflate the count for that symbol. + </p> + +</li><li> <p> + If you run the program on one machine, and profile it on another, + and the shared libraries are different on the two machines, the + profiling output may be confusing: samples that fall within + the shared libraries may be assigned to arbitrary procedures.
+ </p> + +</li><li> <p> + Several libraries, such as some STL implementations, do their own + memory management. This may cause strange profiling results. We + have code in libtcmalloc to cause STL to use tcmalloc for memory + management (which in our tests is better than STL's internal + management), though it only works for some STL implementations. + </p> + +</li><li> <p> + If your program forks, the children will also be profiled (since + they inherit the same HEAPPROFILE setting). Each process is + profiled separately; to distinguish the child profiles from the + parent profile and from each other, all children will have their + process-id attached to the HEAPPROFILE name. + </p> + +</li><li> <p> + Due to a hack we make to work around a possible gcc bug, your + profiles may end up named strangely if the first character of + your HEAPPROFILE variable has ASCII value greater than 127. This + should be exceedingly rare, but if you need to use such a name, + just prepend <code>./</code> to your filename: + <code>HEAPPROFILE=./Ägypten</code>.
+ </p> + +</li></ul> + +<hr> +<address><a href="mailto:opensource@google.com">Sanjay Ghemawat</a></address> +<!-- Created: Tue Dec 19 10:43:14 PST 2000 --> +<!-- hhmts start --> +Last modified: Wed Apr 20 05:46:16 PDT 2005 +<!-- hhmts end --> + +</body></html> diff --git a/docs/html/tcmalloc.html b/docs/html/tcmalloc.html new file mode 100644 index 0000000..aa2d3ee --- /dev/null +++ b/docs/html/tcmalloc.html @@ -0,0 +1,373 @@ +<!DOCTYPE html PUBLIC "-//w3c//dtd html 4.01 transitional//en"> +<html><head><!-- $Id: $ --><title>TCMalloc : Thread-Caching Malloc</title> + + + + +<style type="text/css"> + em { + color: red; + font-style: normal; + } +</style></head><body> + +<h1>TCMalloc : Thread-Caching Malloc</h1> + +<address>Sanjay Ghemawat, Paul Menage <opensource@google.com></address> + +<h2>Motivation</h2> + +TCMalloc is faster than the glibc 2.3 malloc (available as a separate +library called ptmalloc2) and other mallocs +that I have tested. ptmalloc2 takes approximately 300 nanoseconds to +execute a malloc/free pair on a 2.8 GHz P4 (for small objects). The +TCMalloc implementation takes approximately 50 nanoseconds for the +same operation pair. Speed is important for a malloc implementation +because if malloc is not fast enough, application writers are inclined +to write their own custom free lists on top of malloc. This can lead +to extra complexity, and more memory usage unless the application +writer is very careful to appropriately size the free lists and +scavenge idle objects out of the free list. + +<p> +TCMalloc also reduces lock contention for multi-threaded programs. +For small objects, there is virtually zero contention. For large +objects, TCMalloc tries to use fine grained and efficient spinlocks. +ptmalloc2 also reduces lock contention by using per-thread arenas, but +there is a big problem with ptmalloc2's use of per-thread arenas. In +ptmalloc2, memory can never move from one arena to another. This can +lead to huge amounts of wasted space.
For example, in one Google application, the first phase would +allocate approximately 300MB of memory for its data +structures. When the first phase finished, a second phase would be +started in the same address space. If this second phase was assigned a +different arena than the one used by the first phase, this phase would +not reuse any of the memory left after the first phase and would add +another 300MB to the address space. Similar memory blowup problems +were also noticed in other applications. + +</p><p> +Another benefit of TCMalloc is space-efficient representation of small +objects. For example, N 8-byte objects can be allocated while using +space approximately <code>8N * 1.01</code> bytes, i.e., a one-percent +space overhead. ptmalloc2 uses a four-byte header for each object and +(I think) rounds up the size to a multiple of 8 bytes and ends up +using <code>16N</code> bytes. + + +</p><h2>Usage</h2> + +<p>To use TCMalloc, just link tcmalloc into your application via the +"-ltcmalloc" linker flag.</p> + +<p> +You can use tcmalloc in applications you didn't compile yourself, by +using LD_PRELOAD: +</p> +<pre> $ LD_PRELOAD="/usr/lib/libtcmalloc.so" &lt;binary&gt; +</pre> +<p> +LD_PRELOAD is tricky, and we don't necessarily recommend this mode of +usage. +</p> + +<p>TCMalloc includes a <a href="http://goog-perftools.sourceforge.net/doc/heap_checker.html">heap checker</a> +and <a href="http://goog-perftools.sourceforge.net/doc/heap_profiler.html">heap profiler</a> as well.</p> + +<p>If you'd rather link in a version of TCMalloc that does not include +the heap profiler and checker (perhaps to reduce binary size for a +static binary), you can link in <code>libtcmalloc_minimal</code> +instead.</p> + + +<h2>Overview</h2> + +TCMalloc assigns each thread a thread-local cache. Small allocations +are satisfied from the thread-local cache.
Objects are moved from +central data structures into a thread-local cache as needed, and +periodic garbage collections are used to migrate memory back from a +thread-local cache into the central data structures. +<center><img src="../images/overview.gif"></center> + +<p> +TCMalloc treats objects with size <= 32K ("small" objects) +differently from larger objects. Large objects are allocated +directly from the central heap using a page-level allocator +(a page is a 4K-aligned region of memory). I.e., a large object +is always page-aligned and occupies an integral number of pages. + +</p><p> +A run of pages can be carved up into a sequence of small objects, each +equally sized. For example, a run of one page (4K) can be carved up +into 32 objects of size 128 bytes each. + +</p><h2>Small Object Allocation</h2> + +Each small object size maps to one of approximately 170 allocatable +size-classes. For example, all allocations in the range 961 to 1024 +bytes are rounded up to 1024. The size-classes are spaced so that +small sizes are separated by 8 bytes, larger sizes by 16 bytes, even +larger sizes by 32 bytes, and so forth. The maximal spacing (for sizes +>= ~2K) is 256 bytes. + +<p> +A thread cache contains a singly linked list of free objects per size-class. +</p><center><img src="../images/threadheap.gif"></center> + +When allocating a small object: (1) We map its size to the +corresponding size-class. (2) Look in the corresponding free list in +the thread cache for the current thread. (3) If the free list is not +empty, we remove the first object from the list and return it. When +following this fast path, TCMalloc acquires no locks at all. This +helps speed up allocation significantly because a lock/unlock pair +takes approximately 100 nanoseconds on a 2.8 GHz Xeon. + +<p> +If the free list is empty: (1) We fetch a bunch of objects from a +central free list for this size-class (the central free list is shared +by all threads).
(2) Place them in the thread-local free list. (3) +Return one of the newly fetched objects to the application. + +</p><p> +If the central free list is also empty: (1) We allocate a run of pages +from the central page allocator. (2) Split the run into a set of +objects of this size-class. (3) Place the new objects on the central +free list. (4) As before, move some of these objects to the +thread-local free list. + +</p><h2>Large Object Allocation</h2> + +A large object size (> 32K) is rounded up to a page size (4K) and +is handled by a central page heap. The central page heap is again an +array of free lists. For <code>k < 256</code>, the +<code>k</code>th entry is a free list of runs that consist of +<code>k</code> pages. The <code>256</code>th entry is a free list of +runs that have length <code>>= 256</code> pages: +<center><img src="../images/pageheap.gif"></center> + +<p> +An allocation for <code>k</code> pages is satisfied by looking in the +<code>k</code>th free list. If that free list is empty, we look in +the next free list, and so forth. Eventually, we look in the last +free list if necessary. If that fails, we fetch memory from the +system (using sbrk, mmap, or by mapping in portions of /dev/mem). + +</p><p> +If an allocation for <code>k</code> pages is satisfied by a run +of pages of length > <code>k</code>, the remainder of the +run is re-inserted into the appropriate free list in the +page heap. + +</p><h2>Spans</h2> + +The heap managed by TCMalloc consists of a set of pages. A run of +contiguous pages is represented by a <code>Span</code> object. A span +can be either <em>allocated</em> or <em>free</em>. If free, the span +is one of the entries in a page heap linked-list. If allocated, it is +either a large object that has been handed off to the application, or +a run of pages that have been split up into a sequence of small +objects. If split into small objects, the size-class of the objects +is recorded in the span.
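As a rough sketch (field and type names here are illustrative, not tcmalloc's actual definitions), the per-span bookkeeping just described might look like:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative sketch only -- not tcmalloc's real type.
// A Span tracks one run of contiguous pages and its current state.
struct Span {
  enum State { FREE, LARGE_ALLOCATED, SMALL_OBJECTS };
  uintptr_t start_page;  // page number of the first page in the run
  size_t    num_pages;   // length of the run in pages
  State     state;       // free, large object, or carved into small objects
  size_t    size_class;  // meaningful only when state == SMALL_OBJECTS
  Span*     next;        // doubly links spans on a page-heap free list
  Span*     prev;
};
```

A free span lives on one of the page heap's linked lists via `next`/`prev`; an allocated span records whether it holds a single large object or small objects of one size-class.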
+ +<p> +A central array indexed by page number can be used to find the span to +which a page belongs. For example, span <em>a</em> below occupies 2 +pages, span <em>b</em> occupies 1 page, span <em>c</em> occupies 5 +pages, and span <em>d</em> occupies 3 pages. +</p><center><img src="../images/spanmap.gif"></center> +A 32-bit address space can fit 2^20 4K pages, so this central array +takes 4MB of space, which seems acceptable. On 64-bit machines, we +use a 3-level radix tree instead of an array to map from a page number +to the corresponding span pointer. + +<h2>Deallocation</h2> + +When an object is deallocated, we compute its page number and look it up +in the central array to find the corresponding span object. The span tells +us whether or not the object is small, and its size-class if it is +small. If the object is small, we insert it into the appropriate free +list in the current thread's thread cache. If the thread cache now +exceeds a predetermined size (2MB by default), we run a garbage +collector that moves unused objects from the thread cache into central +free lists. + +<p> +If the object is large, the span tells us the range of pages covered +by the object. Suppose this range is <code>[p,q]</code>. We also +look up the spans for pages <code>p-1</code> and <code>q+1</code>. If +either of these neighboring spans is free, we coalesce them with the +<code>[p,q]</code> span. The resulting span is inserted into the +appropriate free list in the page heap. + +</p><h2>Central Free Lists for Small Objects</h2> + +As mentioned before, we keep a central free list for each size-class. +Each central free list is organized as a two-level data structure: +a set of spans, and a linked list of free objects per span. + +<p> +An object is allocated from a central free list by removing the +first entry from the linked list of some span. (If all spans +have empty linked lists, a suitably sized span is first allocated +from the central page heap.)
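A minimal sketch of this two-level central free list (illustrative names; the page-heap fallback is stubbed out here and would fetch a new span in the real allocator):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch -- not tcmalloc's real implementation.
// Each span owns a singly linked list of free objects carved from it;
// the central free list for one size-class is the set of such spans.
struct FreeObject { FreeObject* next; };

struct SpanEntry {
  FreeObject* objects = nullptr;  // free objects carved from this span
};

class CentralFreeList {
 public:
  // Pop one object from the first span with a nonempty list.
  // Returns nullptr when every span is empty; the real allocator
  // would instead allocate a suitably sized span from the page heap.
  FreeObject* Allocate() {
    for (SpanEntry& span : spans_) {
      if (span.objects != nullptr) {
        FreeObject* obj = span.objects;
        span.objects = obj->next;
        return obj;
      }
    }
    return nullptr;
  }

  void AddSpan(const SpanEntry& span) { spans_.push_back(span); }

 private:
  std::vector<SpanEntry> spans_;
};
```

Keeping the free objects grouped per span is what makes the next step cheap: when a span's list grows back to the span's full object count, the whole span can be handed back to the page heap.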
+ +</p><p> +An object is returned to a central free list by adding it to the +linked list of its containing span. If the linked list length now +equals the total number of small objects in the span, this span is now +completely free and is returned to the page heap. + +</p><h2>Garbage Collection of Thread Caches</h2> + +A thread cache is garbage collected when the combined size of all +objects in the cache exceeds 2MB. The garbage collection threshold +is automatically decreased as the number of threads increases so that +we don't waste an inordinate amount of memory in a program with lots +of threads. + +<p> +We walk over all free lists in the cache and move some number of +objects from the free list to the corresponding central list. + +</p><p> +The number of objects to be moved from a free list is determined using +a per-list low-water-mark <code>L</code>. <code>L</code> records the +minimum length of the list since the last garbage collection. Note +that we could have shortened the list by <code>L</code> objects at the +last garbage collection without requiring any extra accesses to the +central list. We use this past history as a predictor of future +accesses and move <code>L/2</code> objects from the thread cache free +list to the corresponding central free list. This algorithm has the +nice property that if a thread stops using a particular size, all +objects of that size will quickly move from the thread cache to the +central free list where they can be used by other threads. + +</p><h2>Performance Notes</h2> + +<h3>PTMalloc2 unittest</h3> +The PTMalloc2 package (now part of glibc) contains a unittest program +t-test1.c. This forks a number of threads and performs a series of +allocations and deallocations in each thread; the threads do not +communicate other than by synchronization in the memory allocator. 
+ +<p> t-test1 (included in google-perftools/tests/tcmalloc, and compiled +as ptmalloc_unittest1) was run with varying numbers of threads +(1-20) and maximum allocation sizes (64 bytes - 32Kbytes). These tests +were run on a 2.4GHz dual Xeon system with hyper-threading enabled, +using Linux glibc-2.3.2 from RedHat 9, with one million operations per +thread in each test. In each case, the test was run once normally, and +once with LD_PRELOAD=libtcmalloc.so. + +</p><p>The graphs below show the performance of TCMalloc vs PTMalloc2 for +several different metrics. Firstly, total operations (millions) per elapsed +second vs max allocation size, for varying numbers of threads. The raw +data used to generate these graphs (the output of the "time" utility) +is available in t-test1.times.txt. + +</p><p> +<table> +<tbody><tr> +<td><img src="../images/tcmalloc-opspersec_004.png"></td> +<td><img src="../images/tcmalloc-opspersec_009.png"></td> +<td><img src="../images/tcmalloc-opspersec_005.png"></td> +</tr> +<tr> +<td><img src="../images/tcmalloc-opspersec.png"></td> +<td><img src="../images/tcmalloc-opspersec_006.png"></td> +<td><img src="../images/tcmalloc-opspersec_008.png"></td> +</tr> +<tr> +<td><img src="../images/tcmalloc-opspersec_003.png"></td> +<td><img src="../images/tcmalloc-opspersec_002.png"></td> +<td><img src="../images/tcmalloc-opspersec_007.png"></td> +</tr> +</tbody></table> + + +</p><ul> + +<li> TCMalloc is much more consistently scalable than PTMalloc2 - for +all thread counts >1 it achieves ~7-9 million ops/sec for small +allocations, falling to ~2 million ops/sec for larger allocations. The +single-thread case is an obvious outlier, since it is only able to +keep a single processor busy and hence can achieve fewer +ops/sec. PTMalloc2 has a much higher variance on operations/sec - +peaking somewhere around 4 million ops/sec for small allocations and +falling to <1 million ops/sec for larger allocations.
+ +</li><li> TCMalloc is faster than PTMalloc2 in the vast majority of cases, +and particularly for small allocations. Contention between threads is +less of a problem in TCMalloc. + +</li><li> TCMalloc's performance drops off as the allocation size +increases. This is because the per-thread cache is garbage-collected +when it hits a threshold (defaulting to 2MB). With larger allocation +sizes, fewer objects can be stored in the cache before it is +garbage-collected. + +</li><li> There is a noticeable drop in TCMalloc performance at ~32K +maximum allocation size; at larger sizes performance drops less +quickly. This is due to the 32K maximum size of objects in the +per-thread caches; for objects larger than this TCMalloc allocates +from the central page heap. + +</li></ul> + +<p> Next, operations (millions) per second of CPU time vs number of threads, for +max allocation size 64 bytes - 128 Kbytes. + +</p><p> +<table> +<tbody><tr> +<td><img src="../images/tcmalloc-opspercpusec_005.png"></td> +<td><img src="../images/tcmalloc-opspercpusec_006.png"></td> +<td><img src="../images/tcmalloc-opspercpusec_009.png"></td> +</tr> +<tr> +<td><img src="../images/tcmalloc-opspercpusec_003.png"></td> +<td><img src="../images/tcmalloc-opspercpusec_002.png"></td> +<td><img src="../images/tcmalloc-opspercpusec_008.png"></td> +</tr> +<tr> +<td><img src="../images/tcmalloc-opspercpusec.png"></td> +<td><img src="../images/tcmalloc-opspercpusec_007.png"></td> +<td><img src="../images/tcmalloc-opspercpusec_004.png"></td> +</tr> +</tbody></table> + +</p><p> Here we see again that TCMalloc is both more consistent and more +efficient than PTMalloc2. For max allocation sizes <32K, TCMalloc +typically achieves ~2-2.5 million ops per second of CPU time with a +large number of threads, whereas PTMalloc generally achieves 0.5-1 +million ops per second of CPU time, with a lot of cases achieving much +less than this figure.
Above 32K max allocation size, TCMalloc drops +to 1-1.5 million ops per second of CPU time, and PTMalloc drops almost +to zero for large numbers of threads (i.e. with PTMalloc, lots of CPU +time is being burned spinning waiting for locks in the heavily +multi-threaded case). + +</p><h2>Caveats</h2> + +<p>For some systems, TCMalloc may not work correctly with +applications that aren't linked against libpthread.so (or the +equivalent on your OS). It should work on Linux using glibc 2.3, but +other OS/libc combinations have not been tested. + +</p><p>TCMalloc may be somewhat more memory-hungry than other mallocs, +though it tends not to have the huge blowups that can happen with +other mallocs. In particular, at startup TCMalloc allocates +approximately 6 MB of memory. It would be easy to roll a specialized +version that trades a little bit of speed for more space efficiency. + +</p><p> +TCMalloc currently does not return any memory to the system. + +</p><p> +Don't try to load TCMalloc into a running binary (e.g., using +JNI in Java programs). The binary will have allocated some +objects using the system malloc, and may try to pass them +to TCMalloc for deallocation. TCMalloc will not be able +to handle such objects.
+ + + +</p></body></html> diff --git a/docs/images/heap-example1.png b/docs/images/heap-example1.png Binary files differnew file mode 100644 index 0000000..9a14b6f --- /dev/null +++ b/docs/images/heap-example1.png diff --git a/docs/images/overview.gif b/docs/images/overview.gif Binary files differnew file mode 100644 index 0000000..43828da --- /dev/null +++ b/docs/images/overview.gif diff --git a/docs/images/pageheap.gif b/docs/images/pageheap.gif Binary files differnew file mode 100644 index 0000000..6632981 --- /dev/null +++ b/docs/images/pageheap.gif diff --git a/docs/images/pprof-test.gif b/docs/images/pprof-test.gif Binary files differnew file mode 100644 index 0000000..9eeab8a --- /dev/null +++ b/docs/images/pprof-test.gif diff --git a/docs/images/pprof-vsnprintf.gif b/docs/images/pprof-vsnprintf.gif Binary files differnew file mode 100644 index 0000000..42a8547 --- /dev/null +++ b/docs/images/pprof-vsnprintf.gif diff --git a/docs/images/spanmap.gif b/docs/images/spanmap.gif Binary files differnew file mode 100644 index 0000000..a0627f6 --- /dev/null +++ b/docs/images/spanmap.gif diff --git a/docs/images/tcmalloc-opspercpusec.png b/docs/images/tcmalloc-opspercpusec.png Binary files differnew file mode 100644 index 0000000..18715e3 --- /dev/null +++ b/docs/images/tcmalloc-opspercpusec.png diff --git a/docs/images/tcmalloc-opspercpusec_002.png b/docs/images/tcmalloc-opspercpusec_002.png Binary files differnew file mode 100644 index 0000000..3a99cbc --- /dev/null +++ b/docs/images/tcmalloc-opspercpusec_002.png diff --git a/docs/images/tcmalloc-opspercpusec_003.png b/docs/images/tcmalloc-opspercpusec_003.png Binary files differnew file mode 100644 index 0000000..642e245 --- /dev/null +++ b/docs/images/tcmalloc-opspercpusec_003.png diff --git a/docs/images/tcmalloc-opspercpusec_004.png b/docs/images/tcmalloc-opspercpusec_004.png Binary files differnew file mode 100644 index 0000000..183a77b --- /dev/null +++ b/docs/images/tcmalloc-opspercpusec_004.png diff --git 
a/docs/images/tcmalloc-opspercpusec_005.png b/docs/images/tcmalloc-opspercpusec_005.png Binary files differnew file mode 100644 index 0000000..3a080de --- /dev/null +++ b/docs/images/tcmalloc-opspercpusec_005.png diff --git a/docs/images/tcmalloc-opspercpusec_006.png b/docs/images/tcmalloc-opspercpusec_006.png Binary files differnew file mode 100644 index 0000000..6213021 --- /dev/null +++ b/docs/images/tcmalloc-opspercpusec_006.png diff --git a/docs/images/tcmalloc-opspercpusec_007.png b/docs/images/tcmalloc-opspercpusec_007.png Binary files differnew file mode 100644 index 0000000..48ebdb6 --- /dev/null +++ b/docs/images/tcmalloc-opspercpusec_007.png diff --git a/docs/images/tcmalloc-opspercpusec_008.png b/docs/images/tcmalloc-opspercpusec_008.png Binary files differnew file mode 100644 index 0000000..db59d61 --- /dev/null +++ b/docs/images/tcmalloc-opspercpusec_008.png diff --git a/docs/images/tcmalloc-opspercpusec_009.png b/docs/images/tcmalloc-opspercpusec_009.png Binary files differnew file mode 100644 index 0000000..8c0ae6b --- /dev/null +++ b/docs/images/tcmalloc-opspercpusec_009.png diff --git a/docs/images/tcmalloc-opspersec.png b/docs/images/tcmalloc-opspersec.png Binary files differnew file mode 100644 index 0000000..d7c79ef --- /dev/null +++ b/docs/images/tcmalloc-opspersec.png diff --git a/docs/images/tcmalloc-opspersec_002.png b/docs/images/tcmalloc-opspersec_002.png Binary files differnew file mode 100644 index 0000000..e8a3c9f --- /dev/null +++ b/docs/images/tcmalloc-opspersec_002.png diff --git a/docs/images/tcmalloc-opspersec_003.png b/docs/images/tcmalloc-opspersec_003.png Binary files differnew file mode 100644 index 0000000..d45458a --- /dev/null +++ b/docs/images/tcmalloc-opspersec_003.png diff --git a/docs/images/tcmalloc-opspersec_004.png b/docs/images/tcmalloc-opspersec_004.png Binary files differnew file mode 100644 index 0000000..37d406d --- /dev/null +++ b/docs/images/tcmalloc-opspersec_004.png diff --git 
a/docs/images/tcmalloc-opspersec_005.png b/docs/images/tcmalloc-opspersec_005.png Binary files differnew file mode 100644 index 0000000..1093e81 --- /dev/null +++ b/docs/images/tcmalloc-opspersec_005.png diff --git a/docs/images/tcmalloc-opspersec_006.png b/docs/images/tcmalloc-opspersec_006.png Binary files differnew file mode 100644 index 0000000..779eec6 --- /dev/null +++ b/docs/images/tcmalloc-opspersec_006.png diff --git a/docs/images/tcmalloc-opspersec_007.png b/docs/images/tcmalloc-opspersec_007.png Binary files differnew file mode 100644 index 0000000..da0328a --- /dev/null +++ b/docs/images/tcmalloc-opspersec_007.png diff --git a/docs/images/tcmalloc-opspersec_008.png b/docs/images/tcmalloc-opspersec_008.png Binary files differnew file mode 100644 index 0000000..76c125a --- /dev/null +++ b/docs/images/tcmalloc-opspersec_008.png diff --git a/docs/images/tcmalloc-opspersec_009.png b/docs/images/tcmalloc-opspersec_009.png Binary files differnew file mode 100644 index 0000000..52d7aee --- /dev/null +++ b/docs/images/tcmalloc-opspersec_009.png diff --git a/docs/images/threadheap.gif b/docs/images/threadheap.gif Binary files differnew file mode 100644 index 0000000..c43d0a3 --- /dev/null +++ b/docs/images/threadheap.gif |