Update the documentation to match the current GC implementation

* README.md: Update the documentation to match the current implementation
of the collector.
* doc/README.Mac: Likewise.
* doc/README.autoconf: Likewise.
* doc/README.darwin: Likewise.
* doc/README.hp: Likewise.
* doc/README.linux: Likewise.
* doc/README.macros: Likewise.
* doc/README.solaris2: Likewise.
* doc/README.win32: Likewise.
* doc/debugging.md: Likewise.
* doc/finalization.md: Likewise.
* doc/gc.man: Likewise.
* doc/gcdescr.md: Likewise.
* doc/gcinterface.md: Likewise.
* doc/leak.md: Likewise.
* doc/overview.md: Likewise.
* doc/porting.md: Likewise.
* doc/scale.md: Likewise.
author: Ivan Maidanski <ivmai@mail.ru> 2019-03-26 08:37:11 +0300
committer: Ivan Maidanski <ivmai@mail.ru> 2019-03-26 09:26:09 +0300
commit: 6f8f39af54d73711d1226bb109ef2107a5e5b04d
tree: d6f96a8df3bb5321ceae78c661696b4e6728e7e0 /doc
parent: 90840b249b78071840821edf6200692ce9cb0601
17 files changed, 225 insertions, 211 deletions
diff --git a/doc/README.Mac b/doc/README.Mac
index f2206d48..b317a2a5 100644
--- a/doc/README.Mac
+++ b/doc/README.Mac
@@ -199,9 +199,10 @@ Files to build the GC libraries:
     checksums.c
     dbg_mlc.c
     finalize.c
+    fnlz_mlc.c
     headers.c
     mach_dep.c
-    MacOS.c    -- contains MacOS code
+    extra/MacOS.c -- contains MacOS code
     malloc.c
     mallocx.c
     mark.c
@@ -213,9 +214,7 @@ Files to build the GC libraries:
     ptr_chck.c
     reclaim.c
     typd_mlc.c
-    gc++.cc    -- this is 'gc_cpp.cc' with less 'inline' and
-               -- throw std::bad_alloc when out of memory
-               -- gc_cpp.cc works just fine too
+    gc_cpp.cc
 
 == 2. Test that the library works with 'test.c' ==
 
diff --git a/doc/README.autoconf b/doc/README.autoconf
index 7f9de22c..dd373402 100644
--- a/doc/README.autoconf
+++ b/doc/README.autoconf
@@ -51,8 +51,9 @@ Important options to configure:
 
 
 Unless --prefix is set (or --exec-prefix or one of the more obscure options),
-make install will install libgc.a and libgc.so in /usr/local/bin, which
-would typically require the "make install" to be run as root.
+"make install" will install libgc.a and libgc.so in /usr/local/lib and
+/usr/local/bin, respectively, which would typically require the "make install"
+to be run as root.
 
 It is not recommended to turn off parallel marking for multiprocessors unless
 a poor support of the feature on the platform.
diff --git a/doc/README.darwin b/doc/README.darwin
index 2727d0b1..4bc6a538 100644
--- a/doc/README.darwin
+++ b/doc/README.darwin
@@ -11,18 +11,18 @@ CFLAGS="-arch ppc -arch i386 -arch x86_64" ./configure --disable-dependency-trac
 
 == Important Usage Notes ==
 
-GC_init() MUST be called before calling any other GC functions. This
+GC_INIT() MUST be called before calling any other GC functions. This
 is necessary to properly register segments in dynamic libraries. This
 call is required even if you code does not use dynamic libraries as the
 dyld code handles registering all data segments.
 
 When your use of the garbage collector is confined to dylibs and you
-cannot call GC_init() before your libraries' static initializers have
+cannot call GC_INIT() before your libraries' static initializers have
 run and perhaps called GC_malloc(), create an initialization routine
-for each library to call GC_init():
+for each library to call GC_INIT(), e.g.:
 
 #include "gc.h"
-extern "C" void my_library_init() { GC_init(); }
+extern "C" void my_library_init() { GC_INIT(); }
 
 Compile this code into a my_library_init.o, and link it into your
 dylib. When you link the dylib, pass the -init argument with
@@ -31,10 +31,10 @@ my_library_init.o -init _my_library_init). This causes
 my_library_init() to be called before any static initializers, and
 will initialize the garbage collector properly.
 
-Note: It doesn't hurt to call GC_init() more than once, so it's best,
+Note: It doesn't hurt to call GC_INIT() more than once, so it's best,
 if you have an application or set of libraries that all use the
 garbage collector, to create an initialization routine for each of
-them that calls GC_init(). Better safe than sorry.
+them that calls GC_INIT(). Better safe than sorry.
 
 The incremental collector is still a bit flaky on darwin. It seems to
 work reliably with workarounds for a few possible bugs in place however
diff --git a/doc/README.hp b/doc/README.hp
index 83708ea0..cc31b18d 100644
--- a/doc/README.hp
+++ b/doc/README.hp
@@ -15,4 +15,4 @@ Define GC_THREADS macro for the build.  Incremental collection still does not
 work in combination with it.
 
 The stack finding code can be confused by putenv calls before collector
-initialization.  Call GC_malloc or GC_init before any putenv calls.
+initialization.  Call GC_malloc() or GC_INIT() before any putenv() calls.
diff --git a/doc/README.linux b/doc/README.linux
index 7f23c2c0..29084fa0 100644
--- a/doc/README.linux
+++ b/doc/README.linux
@@ -1,7 +1,7 @@
 See README.alpha for Linux on DEC AXP info.
 
-This file applies mostly to Linux/Intel IA32.  Ports to Linux on an M68K,
-IA64, SPARC, MIPS, Alpha and PowerPC are integrated too.  They should behave
+This file applies mostly to Linux/Intel IA-32.  Ports to Linux on an M68K,
+IA-64, SPARC, MIPS, Alpha and PowerPC are integrated too.  They should behave
 similarly, except that the PowerPC port lacks incremental GC support, and
 it is unknown to what extent the Linux threads code is functional.
 See below for M68K specific notes.
@@ -29,8 +29,8 @@ To use threads, you need to abide by the following requirements:
    in the Makefile.
 
 3a) Every file that makes thread calls should define GC_THREADS, and then
-   include gc.h.  Gc.h redefines some of the pthread primitives as macros
-   which also provide the collector with information it requires.
+   include gc.h.  The latter redefines some of the pthread primitives as
+   macros which also provide the collector with information it requires.
 
 3b) A new alternative to (3a) is to build the collector and compile GC clients
    with -DGC_USE_LD_WRAP, and to link the final program with
diff --git a/doc/README.macros b/doc/README.macros
index 94596e98..886bb608 100644
--- a/doc/README.macros
+++ b/doc/README.macros
@@ -83,12 +83,13 @@ GC_NOT_DLL      User-settable macro that overrides _DLL, e.g. if runtime
                 dynamic libraries are used, but the collector is in a static
                 library.  Tested by gc_config_macros.h.
 
-GC_REQUIRE_WCSDUP       Force GC to export GC_wcsdup() (the Unicode version
-                of GC_strdup); could be useful in the leak-finding mode.
-
 
 These define arguments influence the collector configuration:
 
+GC_REQUIRE_WCSDUP       Force GC to export GC_wcsdup() (the Unicode version
+  of GC_strdup); could be useful in the leak-finding mode.  Clients should
+  define it before including gc.h if the function is needed.
+
 FIND_LEAK       Causes GC_find_leak to be initially set.  This causes the
   collector to assume that all inaccessible objects should have been
   explicitly deallocated, and reports exceptions.  Finalization and the test
@@ -111,13 +112,13 @@ SUNOS5SIGS      Solaris-like signal handling.  This is probably misnamed,
 PCR     Set if the collector is being built as part of the Xerox Portable
   Common Runtime.
 
-IMPORTANT: Any of the _THREADS options must normally also be defined in
-  the client before including gc.h.  This redefines thread primitives to
-  invoke the GC_ versions instead.  Alternatively, linker-based symbol
-  interception can be used on a few platforms.
-
 GC_THREADS      Should set the appropriate one of the below macros,
   except GC_WIN32_PTHREADS, which must be set explicitly.  Tested by gc.h.
+  IMPORTANT: GC_THREADS macro (or the relevant platform-specific deprecated
+  one) must normally also be defined by the client before including gc.h.
+  This redefines thread primitives to invoke the GC_ wrappers instead.
+  Alternatively, linker-based symbol interception can be used on a few
+  platforms.
 
 GC_SOLARIS_THREADS      Enables support for Solaris pthreads.
   Must also define _REENTRANT.  Deprecated, use GC_THREADS instead.
@@ -155,8 +156,7 @@ GC_DGUX386_THREADS      Enables support for DB/UX on I386 threads.
   See README.DGUX386.  (Probably has not been tested recently.)  Deprecated,
   use GC_THREADS instead.
 
-GC_WIN32_THREADS        Enables support for Win32 threads.  That makes sense
-  for Makefile (and Makefile.direct) only under Cygwin or MinGW.  Deprecated,
+GC_WIN32_THREADS        Enables support for Win32 threads.  Deprecated,
   use GC_THREADS instead.
 
 GC_WIN32_PTHREADS       Enables support for pthreads-win32 (or other
@@ -183,9 +183,9 @@ NO_CLOCK        Do not use system clock.  Disables some statistic printing.
 
 GC_DISABLE_INCREMENTAL  Turn off the incremental collection support.
 
-NO_INCREMENTAL  Causes the gctest program to not invoke the incremental
-  collector.  This has no impact on the generated library, only on the test
-  program.  (This is often useful for debugging failures unrelated to
+NO_INCREMENTAL  Causes the GC test programs to not invoke the incremental mode
+  of the collector.  This has no impact on the generated library, only on the
+  test programs.  (This is often useful for debugging failures unrelated to
   incremental GC.)
 
 LARGE_CONFIG    Tunes the collector for unusually large heaps.
@@ -211,7 +211,7 @@ NO_EXECUTE_PERMISSION   May cause some or all of the heap to not
   execute permission is required.
 
 GC_NO_OPERATOR_NEW_ARRAY        Declares that the C++ compiler does not
-  support the  new syntax "operator new[]" for allocating and deleting arrays.
+  support the new syntax "operator new[]" for allocating and deleting arrays.
   See gc_cpp.h for details.  No effect on the C part of the collector.
   This is defined implicitly in a few environments.  Must also be defined
   by clients that use gc_cpp.h.
@@ -327,9 +327,9 @@ KEEP_BACK_PTRS  Add code to save back pointers in debugging headers
   for debugging/profiling purposes.  The gc_backptr.h interface is
   implemented only if this is defined.
 
-GC_ASSERTIONS   Enable some internal GC assertion checking.  Currently
-  this facility is only used in a few places.  It is intended primarily
-  for debugging of the garbage collector itself, but could also...
+GC_ASSERTIONS   Enable some internal GC assertion checking.  It is intended
+  primarily for debugging of the garbage collector itself, but could also
+  help to identify cases of incorrect GC usage by a client.
 
 DBG_HDRS_ALL    Make sure that all objects have debug headers.  Increases
   the reliability (from 99.9999% to 100% mod. bugs) of some of the debugging
@@ -350,10 +350,10 @@ SHORT_DBG_HDRS  Assume that all objects have debug headers.  Shorten
 SAVE_CALL_COUNT=<n>     Set the number of call frames saved with objects
   allocated through the debugging interface.  Affects the amount of
   information generated in leak reports.  Only matters on platforms
-  on which we can quickly generate call stacks, currently Linux/(X86 & SPARC)
-  and Solaris/SPARC and platforms that provide execinfo.h.
-  Default is zero.  On X86, client
-  code should NOT be compiled with -fomit-frame-pointer.
+  on which we can quickly generate call stacks, currently Linux/X86,
+  Linux/SPARC, Solaris/SPARC, and platforms that provide execinfo.h.
+  Default is zero.  On X86, client code should NOT be compiled with
+  -fomit-frame-pointer.
 
 SAVE_CALL_NARGS=<n>     Set the number of functions arguments to be saved
   with each call frame.  Default is zero.  Ignored if we don't know how to
@@ -367,13 +367,11 @@ GC_GCJ_SUPPORT  Includes support for gcj (and possibly other systems
   that include a pointer to a type descriptor in each allocated object).
 
 USE_I686_PREFETCH       Causes the collector to issue Pentium III style
-  prefetch instructions.  No effect except on X86 Linux platforms.
-  Assumes a very recent gcc-compatible compiler and assembler.
-  (Gas prefetcht0 support was added around May 1999.)
+  prefetch instructions.  No effect except on Linux/X86 platforms.
   Empirically the code appears to still run correctly on Pentium II
   processors, though with no performance benefit.  May not run on other
-  X86 processors?  In some cases this improves performance by
-  15% or so.
+  X86 processors probably.  In some cases this improves performance by 15%
+  or so.
 
 USE_3DNOW_PREFETCH      Causes the collector to issue AMD 3DNow style
   prefetch instructions.  Same restrictions as USE_I686_PREFETCH.
@@ -396,8 +394,7 @@ GC_USE_DLOPEN_WRAP      Causes the collector to redefine malloc and
 THREAD_LOCAL_ALLOC      Defines GC_malloc(), GC_malloc_atomic() and
   GC_gcj_malloc() to use a per-thread set of free-lists. These then allocate
   in a way that usually does not involve acquisition of a global lock.
-  Recommended for multiprocessors.  Requires explicit GC_INIT() call, unless
-  REDIRECT_MALLOC is defined and GC_malloc is used first.
+  Recommended for multiprocessors.
 
 USE_COMPILER_TLS        Causes thread local allocation to use
   the compiler-supported "__thread" thread-local variables.  This is the
@@ -535,7 +532,7 @@ DONT_USE_USER32_DLL (Win32 only)        Don't use "user32" DLL import library
 GC_PREFER_MPROTECT_VDB  Choose MPROTECT_VDB manually in case of multiple
   virtual dirty bit strategies are implemented (at present useful on Win32 and
   Solaris to force MPROTECT_VDB strategy instead of the default GWW_VDB or
-  PROC_VDB ones).
+  PROC_VDB ones, respectively).
 
 GC_IGNORE_GCJ_INFO      Disable GCJ-style type information (useful for
   debugging on WinCE).
diff --git a/doc/README.solaris2 b/doc/README.solaris2
index 7d8814fe..85f0ea22 100644
--- a/doc/README.solaris2
+++ b/doc/README.solaris2
@@ -10,7 +10,7 @@ the collector normally obtains memory through sbrk.  There is some reason
 to expect that this is not safe if the client program also calls the system
 malloc, or especially realloc.  The sbrk man page strongly suggests this is
 not safe: "Many library routines use malloc() internally, so use brk()
-and sbrk() only when you know  that malloc() definitely will not be used by
+and sbrk() only when you know that malloc() definitely will not be used by
 any library routine."  This doesn't make a lot of sense to me, since there
 seems to be no documentation as to which routines can transitively call malloc.
 Nonetheless, under Solaris2, the collector now allocates
@@ -33,9 +33,9 @@ by configure --disable-parallel-mark option).
 
 It is also essential that gc.h be included in files that call pthread_create,
 pthread_join, pthread_detach, or dlopen.  gc.h macro defines these to also do
-GC bookkeeping, etc.  gc.h must be included with one or both of these macros
-defined, otherwise these replacements are not visible.  A collector built in
-this way way only be used by programs that are linked with the threads library.
+GC bookkeeping, etc.  gc.h must be included with GC_THREADS macro defined
+first, otherwise these replacements are not visible.  A collector built in
+this way may only be used by programs that are linked with the threads library.
 
 Unless USE_PROC_FOR_LIBRARIES is defined, dlopen disables collection
 temporarily.  In some unlikely cases, this can result in unpleasant heap
@@ -46,11 +46,11 @@ GC_malloc, it is necessary to call GC_INIT explicitly before forking the
 first thread.  (This avoids a deadlock arising from calling GC_thr_init
 with the allocation lock held.)
 
-It appears that there is a problem in using gc_cpp.h in conjunction with
-Solaris threads and Sun's C++ runtime.  Apparently the overloaded new operator
-is invoked by some iostream initialization code before threads are correctly
-initialized.  As a result, call to thr_self() in garbage collector
-initialization  SEGV faults.  Currently the only known workaround is to not
+There could be an issue when using gc_cpp.h in conjunction with Solaris
+threads and Sun's C++ runtime.  Apparently the overloaded new operator
+may be invoked by some iostream initialization code before threads are
+correctly initialized.  This may cause a SIGSEGV during initialization
+of the garbage collector.  Currently the only known workaround is to not
 invoke the garbage collector from a user defined global operator new, or to
 have it invoke the garbage-collector's allocators only after main has started.
 (Note that the latter requires a moderately expensive test in operator
diff --git a/doc/README.win32 b/doc/README.win32
index 19038c40..48a95fa2 100644
--- a/doc/README.win32
+++ b/doc/README.win32
@@ -1,8 +1,7 @@
-The collector has at various times been compiled under Windows 95 & later, NT,
-and XP, with the original Microsoft SDK, with Visual C++ 2.0, 4.0, and 6, with
-the GNU win32 tools, with Borland C++ Builder, with Watcom C, and
-with the Digital Mars compiler.  It is likely that some of these have been
-broken in the meantime.  Patches are appreciated.
+The collector has at various times been compiled under Windows 95 and later,
+NT, and XP, with the original Microsoft SDK, with Visual C++ 2.0, 4.0, and 6,
+with the GNU win32 tools, with Borland C++ Builder, with Watcom C, with EMX,
+and with the Digital Mars compiler (DMC).
 
 For historical reasons,
 the collector test program "gctest" is linked as a GUI application,
@@ -11,8 +10,8 @@ but does not open any windows.  Its output normally appears in the file
 cursor may appear as long as it's running.  If it is started from the
 command line, it will usually run in the background.  Wait a few
 minutes (a few seconds on a modern machine) before you check the output.
-You should see either a failure indication or a "Collector appears to
-work" message.
+You should see either a failure indication or a "Collector appears to work"
+message.
 
 A toy editor (cord/de.exe) based on cords (heavyweight
 strings represented as trees) has been ported and is included.
@@ -40,6 +39,7 @@ since we now separate heap sections with an unused page.)
 
 Microsoft Tools
 ---------------
+
 For Microsoft development tools, type
 "nmake -f NT_MAKEFILE cpu=i386 make_as_lib=1 nothreads=1 nodebug=1"
 to build the release variant of the collector as a static library without
@@ -61,6 +61,7 @@ collector was built as a static library.
 
 GNU Tools
 ---------
+
 The collector should be buildable under Cygwin with the
 "./configure; make check" machinery.
 
@@ -78,6 +79,7 @@ Memory unmapping could be turned off by "--disable-munmap" option.
 
 Borland Tools
 -------------
+
 [Rarely tested.]
 For Borland tools, use BCC_MAKEFILE.  Note that
 Borland's compiler defaults to 1 byte alignment in structures (-a1),
@@ -132,7 +134,6 @@ If the gc is compiled as dll, the macro "GC_DLL" should be defined before
 including "gc.h" (for example, with -DGC_DLL compiler option). It's
 important, otherwise resulting programs will not run.
 
-
 Special note for OpenWatcom users: the C (unlike the C++) compiler (of the
 latest stable release, not sure for older ones) doesn't force pointer global
 variables (i.e. not struct fields, not sure for locals) to be aligned unless
@@ -141,9 +142,9 @@ pragma) only controls alignment for structs; I don't know whether it's a bug or
 a feature (see an old report of same kind -
 http://bugzilla.openwatcom.org/show_bug.cgi?id=664), so You are warned.
 
-
 Incremental Collection
 ----------------------
+
 There is some support for incremental collection.  By default, the
 collector chooses between explicit page protection, and GetWriteWatch-based
 write tracking automatically, depending on the platform.
@@ -199,7 +200,7 @@ CMakeLists.txt).
 
 For the normal, non-dll-based thread tracking to work properly,
 threads should be created with GC_CreateThread or GC_beginthreadex,
-and exit normally or call GC_endthreadex or GC_ExitThread.  (For Cygwin, the
+and exit normally, or call GC_endthreadex or GC_ExitThread.  (For Cygwin, the
 standard pthread_create/exit calls could be used instead.)  As in the pthread
 case, including gc.h will redefine CreateThread, _beginthreadex,
 _endthreadex, and ExitThread to call the GC_ versions instead.
diff --git a/doc/debugging.md b/doc/debugging.md
index 10727f27..c43c355c 100644
--- a/doc/debugging.md
+++ b/doc/debugging.md
@@ -43,13 +43,8 @@ currently uses SIGPWR and SIGXCPU by default.
 The garbage collector generates warning messages of the form:
 
 
-    Needed to allocate blacklisted block at 0x...
-
-
-or
-
-
     Repeated allocation of very large block ...
+    May lead to memory leak and poor performance
 
 
 when it needs to allocate a block at a location that it knows to be referenced
diff --git a/doc/finalization.md b/doc/finalization.md
index 75428cd6..49aee982 100644
--- a/doc/finalization.md
+++ b/doc/finalization.md
@@ -37,19 +37,18 @@ In general the following guidelines should be followed:
   recently used logically open files. Any other needed files would be closed
   after saving their state. They would then be reopened on demand.
   Finalization would logically close the file, closing the real descriptor
-  only if it happened to be cached.) Note that most modern systems (e.g. Irix)
-  allow hundreds or thousands of open files, and this is typically not
-  an issue.
+  only if it happened to be cached.) Note that most modern systems allow
+  thousands of open files, and this is typically not an issue.
   * Finalization code may be run anyplace an allocation or other call to the
   collector takes place. In multi-threaded programs, finalizers have to obey
   the normal locking conventions to ensure safety. Code run directly from
   finalizers should not acquire locks that may be held during allocation.
-  This restriction can be easily circumvented by registering a finalizer which
-  enqueues the real action for execution in a separate thread.
+  This restriction can be easily circumvented by calling
+  `GC_set_finalize_on_demand(1)` at program start and creating a separate
+  thread dedicated to periodic invocation of `GC_invoke_finalizers()`.
 
-In single-threaded code, it is also often easiest to have finalizers queue
-actions, which are then explicitly run during an explicit call by the user's
-program.
+In single-threaded code, it is also often easiest to have finalizers queued
+and, then to have them explicitly executed by `GC_invoke_finalizers()`.
 
 ## Topologically ordered finalization
 
diff --git a/doc/gc.man b/doc/gc.man
index 590bf544..dd3da022 100644
--- a/doc/gc.man
+++ b/doc/gc.man
@@ -1,4 +1,4 @@
-.TH BDWGC 3 "15 Aug 2018"
+.TH BDWGC 3 "26 Mar 2019"
 .SH NAME
 GC_malloc, GC_malloc_atomic, GC_free, GC_realloc, GC_enable_incremental, GC_register_finalizer, GC_malloc_ignore_off_page, GC_malloc_atomic_ignore_off_page, GC_set_warn_proc \- Garbage collecting malloc replacement
 .SH SYNOPSIS
@@ -6,10 +6,20 @@ GC_malloc, GC_malloc_atomic, GC_free, GC_realloc, GC_enable_incremental, GC_regi
 .br
 void * GC_malloc(size_t size);
 .br
+void * GC_malloc_atomic(size_t size);
+.br
 void GC_free(void *ptr);
 .br
 void * GC_realloc(void *ptr, size_t size);
 .br
+void GC_enable_incremental();
+.br
+void * GC_malloc_ignore_off_page(size_t size);
+.br
+void * GC_malloc_atomic_ignore_off_page(size_t size);
+.br
+void GC_set_warn_proc(void (*proc)(char *, GC_word));
+.br
 .sp
 cc ... -lgc
 .LP
@@ -67,7 +77,7 @@ inform the collector that the client code will always maintain a pointer to near
 .LP
 It is also possible to use the collector to find storage leaks in programs destined to be run with standard malloc/free.  The collector can be compiled for thread-safe operation.  Unlike standard malloc, it is safe to call malloc after a previous malloc call was interrupted by a signal, provided the original malloc call is not resumed.
 .LP
-The collector may, on rare occasion produce warning messages.  On UNIX machines these appear on stderr.  Warning messages can be filtered, redirected, or ignored with
+The collector may, on rare occasion, produce warning messages.  On UNIX machines these appear on stderr.  Warning messages can be filtered, redirected, or ignored with
 .I
 GC_set_warn_proc
 This is recommended for production code.  See gc.h for details.
@@ -75,7 +85,7 @@ This is recommended for production code.  See gc.h for details.
 Fully portable code should call
 .I
 GC_INIT
-from the main program before making any other GC calls.
+from the primordial thread of the main program before making any other GC calls.
 On most platforms this does nothing and the collector is initialized on first use.
 On a few platforms explicit initialization is necessary.  And it can never hurt.
 .LP
diff --git a/doc/gcdescr.md b/doc/gcdescr.md
index 923ccf82..daaf5203 100644
--- a/doc/gcdescr.md
+++ b/doc/gcdescr.md
@@ -58,17 +58,16 @@ of the garbage collector is stored inside the `_GC_arrays` structure. This
 allows the garbage collector to easily ignore the collectors own data
 structures when it searches for root pointers. Other allocator and collector
 internal data structures are allocated dynamically with `GC_scratch_alloc`.
-`GC_scratch_alloc` does not allow for deallocation, and is therefore used only
-for permanent data structures.
+The latter does not allow for deallocation, and is therefore used only for
+permanent data structures.
 
-The allocator allocates objects of different _kinds_. Different kinds are
+The allocator returns objects of different _kinds_. Different _kinds_ are
 handled somewhat differently by certain parts of the garbage collector.
 Certain kinds are scanned for pointers, others are not. Some may have
 per-object type descriptors that determine pointer locations. Or a specific
 kind may correspond to one specific object layout. Two built-in kinds are
-uncollectible.
-In spite of that, it is very likely that most C clients of the collector
-currently use at most two kinds: `NORMAL` and `PTRFREE` objects. The
+uncollectible. In spite of that, it is very likely that most C clients of the
+collector currently use at most two kinds: `NORMAL` and `PTRFREE` objects. The
 [GCJ](https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcj/) runtime also makes heavy
 use of a kind (allocated with `GC_gcj_malloc`) that stores type information
 at a known offset in method tables.
@@ -181,7 +180,7 @@ variables are located, it scans the following _root segments_ for pointers:
   a length. (For other possibilities, see `gc_mark.h`.)
 
 At the beginning of the mark phase, all root segments (as described above) are
-pushed on the stack by `GC_push_roots`. (Registers and eagerly processed stack
+pushed on the stack by `GC_push_roots`. (Registers and eagerly scanned stack
 sections are processed by pushing the referenced objects instead of the stack
 section itself.) If `ALL_INTERIOR_POINTERS` is not defined, then stack roots
 require special treatment. In this case, the normal marking code ignores
@@ -233,9 +232,9 @@ forward progress, even in case of repeated mark stack overflows. Every mark
 attempt results in additional marked objects.
 
 Each mark stack entry is processed by examining all candidate pointers in the
-range described by the entry. If the region has no associated type
-information, then this typically requires that each 4-byte aligned quantity
-(8-byte aligned with 64-bit pointers) be considered a candidate pointer.
+range described by the entry. If the region has no associated type information
+then this typically requires that each 4-byte aligned quantity (8-byte aligned
+if 64-bit pointers) be considered a candidate pointer.
 
 We determine whether a candidate pointer is actually the address of a heap
 block. This is done in the following steps:
@@ -248,8 +247,10 @@ block. This is done in the following steps:
   * The candidate pointer is divided into two pieces; the most significant
   bits identify a `HBLKSIZE`-sized page in the address space, and the least
   significant bits specify an offset within that page. (A hardware page may
-  actually consist of multiple such pages. HBLKSIZE is usually the page size
-  divided by a small power of two.)
+  actually consist of multiple such pages. Normally, HBLKSIZE is usually the
+  page size divided by a small power of two. Alternatively, if the collector
+  is built with `-DLARGE_CONFIG`, such a page may consist of multiple hardware
+  pages.)
   * The page address part of the candidate pointer is looked up in
   a [table](tree.md). Each table entry contains either 0, indicating that
   the page is not part of the garbage collected heap, a small integer _n_,
@@ -268,8 +269,8 @@ block. This is done in the following steps:
   operation in computing the object start address.
   * The mark bit for the target object is checked and set. If the object was
   previously unmarked, the object is pushed on the mark stack. The descriptor
-  is read from the page descriptor. (This is computed from information
-  `GC_obj_kinds` when the page is first allocated.)
+  is read from the page descriptor. (This is computed from information stored
+  in `GC_obj_kinds` when the page is first allocated.)
 
 At the end of the mark phase, mark bits for left-over free lists are cleared,
 in case a free list was accidentally marked due to a stray pointer.
@@ -372,7 +373,7 @@ collector, and hence provoking unneeded heap growth.
 
 In incremental mode, the heap is always expanded when we encounter
 insufficient space for an allocation. Garbage collection is triggered whenever
-we notice that more than `GC_heap_size`/2 * `GC_free_space_divisor` bytes
+we notice that more than `GC_heap_size / 2 * GC_free_space_divisor` bytes
 of allocation have taken place. After `GC_full_freq` minor collections a major
 collection is started.
 
@@ -473,27 +474,27 @@ allocation in the next section.
 
 ## Thread-local allocation
 
-If thread-local allocation is enabled, the collector keeps separate arrays
-of free lists for each thread. Thread-local allocation is currently only
-supported on a few platforms.
+If thread-local allocation is enabled (which is true in the default
+configuration for most supported platforms), the collector keeps separate
+arrays of free lists for each thread.
 
 The free list arrays associated with each thread are only used to satisfy
 requests for objects that are both very small, and belong to one of a small
-number of well-known kinds. These currently include _normal_ and pointer-free
-objects. Depending on the configuration, _gcj_ objects may also be included.
+number of well-known kinds. These include _normal_, pointer-free, _gcj_ and
+_disclaim_ objects.
 
 Thread-local free list entries contain either a pointer to the first element
 of a free list, or they contain a counter of the number of allocation
 granules, corresponding to objects of this size, allocated so far. Initially
 they contain the value one, i.e. a small counter value.
 
-Thread-local allocation allocates directly through the global allocator,
-if the object is of a size or kind not covered by the local free lists.
+Thread-local allocation goes directly through the global allocator if the
+object is of a size or kind not covered by the local free lists.
 
 If there is an appropriate local free list, the allocator checks whether
 it contains a sufficiently small counter value. If so, the counter is simply
-incremented by the counter value, and the global allocator is used. In this
-way, the initial few allocations of a given size bypass the local allocator.
+incremented by a value, and the global allocator is used. In this way,
+the initial few allocations of a given size bypass the local allocator.
 A thread that only allocates a handful of objects of a given size will not
 build up its own free list for that size. This avoids wasting space for
 unpopular objects sizes or kinds.
diff --git a/doc/gcinterface.md b/doc/gcinterface.md
index 9fb1b849..cbce0983 100644
--- a/doc/gcinterface.md
+++ b/doc/gcinterface.md
@@ -15,7 +15,8 @@ on how the collector is built, this will be `gc.a` or `libgc.{a,so}`.
 The following describes the standard C interface to the garbage collector.
 It is not a complete definition of the interface. It describes only the most
 commonly used functionality, approximately in decreasing order of frequency
-of use. The full interface is described in `gc.h` file.
+of use. This somewhat duplicates the information in the `gc.man` file. The
+full interface is described in the `gc.h` file.
 
 Clients should include `gc.h` (i.e., not `gc_config_macros.h`,
 `gc_pthread_redirects.h`, `gc_version.h`). In the case of multi-threaded code,
@@ -27,11 +28,11 @@ to cooperate with the GC on many platforms.
 Thread users should also be aware that on many platforms objects reachable
 only from thread-local variables may be prematurely reclaimed. Thus objects
 pointed to by thread-local variables should also be pointed to by a globally
-visible data structure. (This is viewed as a bug, but as one that
-is exceedingly hard to fix without some `libc` hooks.)
+visible data area, e.g. the thread's stack. (This behavior is viewed as a bug,
+but as one that is exceedingly hard to fix without some `libc` hooks.)
 
-**void * `GC_MALLOC`(size_t _nbytes_)** - Allocates and clears _nbytes_
-of storage. Requires (amortized) time proportional to _nbytes_. The resulting
+**void * `GC_MALLOC`(size_t _bytes_)** - Allocates and clears _bytes_
+of storage. Requires (amortized) time proportional to _bytes_. The resulting
 object will be automatically deallocated when unreferenced. References from
 objects allocated with the system malloc are usually not considered by the
 collector. (See `GC_MALLOC_UNCOLLECTABLE`, however. Building the collector
@@ -40,33 +41,33 @@ with `-DREDIRECT_MALLOC=GC_malloc_uncollectable` is often a way around this.)
 is defined before `gc.h` is included, a debugging version that checks
 occasionally for overwrite errors, and the like.
 
-**void * `GC_MALLOC_ATOMIC`(size_t _nbytes_)** - Allocates _nbytes_
-of storage. Requires (amortized) time proportional to _nbytes_. The resulting
+**void * `GC_MALLOC_ATOMIC`(size_t _bytes_)** - Allocates _bytes_
+of storage. Requires (amortized) time proportional to _bytes_. The resulting
 object will be automatically deallocated when unreferenced. The client
 promises that the resulting object will never contain any pointers. The memory
 is not cleared. This is the preferred way to allocate strings, floating point
 arrays, bitmaps, etc. More precise information about pointer locations can be
 communicated to the collector using the interface in `gc_typed.h`.
 
-**void * `GC_MALLOC_UNCOLLECTABLE`(size_t _nbytes_)** - Identical
+**void * `GC_MALLOC_UNCOLLECTABLE`(size_t _bytes_)** - Identical
 to `GC_MALLOC`, except that the resulting object is not automatically
 deallocated. Unlike the system-provided `malloc`, the collector does scan the
 object for pointers to garbage-collectible memory, even if the block itself
 does not appear to be reachable. (Objects allocated in this way are
 effectively treated as roots by the collector.)
 
-**void * `GC_REALLOC`(void * _old_, size_t _new_size_)** - Allocates a new
-object of the indicated size and copy (a prefix of) the old object into the
+**void * `GC_REALLOC`(void * _old_object_, size_t _new_bytes_)** - Allocates
+a new object of the indicated size and copies the old object's content into the
 new object. The old object is reused in place if convenient. If the original
 object was allocated with `GC_MALLOC_ATOMIC`, the new object is subject to the
 same constraints. If it was allocated as an uncollectible object, then the new
 object is uncollectible, and the old object (if different) is deallocated.
 
-**void `GC_FREE`(void * _dead_)** - Explicitly deallocates an object.
+**void `GC_FREE`(void * _object_)** - Explicitly deallocates an _object_.
 Typically not useful for small collectible objects.
 
-**void * `GC_MALLOC_IGNORE_OFF_PAGE`(size_t _nbytes_)** and
-**void * `GC_MALLOC_ATOMIC_IGNORE_OFF_PAGE`(size_t _nbytes_)** - Analogous
+**void * `GC_MALLOC_IGNORE_OFF_PAGE`(size_t _bytes_)** and
+**void * `GC_MALLOC_ATOMIC_IGNORE_OFF_PAGE`(size_t _bytes_)** - Analogous
 to `GC_MALLOC` and `GC_MALLOC_ATOMIC`, respectively, except that the client
 guarantees that as long as the resulting object is of use, a pointer
 is maintained to someplace inside the first 512 bytes of the object. This
@@ -75,7 +76,9 @@ optimizations. (Other nonvolatile pointers to the object may exist as well.)
 This is the preferred way to allocate objects that are likely to be
 more than 100 KB in size. It greatly reduces the risk that such objects will
 be accidentally retained when they are no longer needed. Thus space usage may
-be significantly reduced.
+be significantly reduced. Another way is to call
+`GC_set_all_interior_pointers(0)` at program start (this, however, is generally
+not suitable for C++ code because of multiple inheritance).
 
 **void `GC_INIT()`** - On some platforms, it is necessary to invoke this _from
 the main executable_, _not from a dynamic library_, before the initial
@@ -89,9 +92,9 @@ as possible.
 to perform a small amount of work every few invocations of `GC_MALLOC` or the
 like, instead of performing an entire collection at once. This is likely
 to increase total running time. It will improve response on a platform that
-either has suitable support in the garbage collector (Linux and most Unix
-versions, Win32 if the collector was suitably built). On many platforms this
-interacts poorly with system calls that write to the garbage collected heap.
+has suitable support in the garbage collector (Linux and most Unix versions,
+Win32 if the collector was suitably built). On many platforms this interacts
+poorly with system calls that write to the garbage collected heap.
 
 **void `GC_set_warn_proc`(GC_warn_proc)** - Replaces the default procedure
 used by the collector to print warnings. The collector may otherwise
@@ -105,20 +108,17 @@ releasing system resources (e.g. closing files) when the object referencing
 them becomes inaccessible. It is not an acceptable method to perform actions
 that must be performed in a timely fashion. See `gc.h` for details of the
 interface. See also [here](finalization.md) for a more detailed discussion
-of the design.
-
-Note that an object may become inaccessible before client code is done
-operating on objects referenced by its fields. Suitable synchronization
-is usually required. See
-[here](http://portal.acm.org/citation.cfm?doid=604131.604153)
-or [here](http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html) for
-details.
+of the design. Note that an object may become inaccessible before client code
+is done operating on objects referenced by its fields. Suitable
+synchronization is usually required. See
+[here](http://portal.acm.org/citation.cfm?doid=604131.604153) or
+[here](http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html) for details.
 
 If you are concerned with multiprocessor performance and scalability, you
 should consider enabling and using thread local allocation.
 
-If your platform supports it, you should build the collector with parallel
-marking support (`-DPARALLEL_MARK`); configure has it on by default.
+If your platform supports it, you should also build the collector with
+parallel marking support (`-DPARALLEL_MARK`); configure has it on by default.
 
 If the collector is used in an environment in which pointer location
 information for heap objects is easily available, this can be passed on to the
diff --git a/doc/leak.md b/doc/leak.md
index cf5992c4..9d5f32ba 100644
--- a/doc/leak.md
+++ b/doc/leak.md
@@ -19,26 +19,26 @@ of paging.
 The garbage collector provides leak detection support. This includes the
 following features:
 
-  1. Leak detection mode can be initiated at run-time by setting
-  `GC_find_leak` instead of building the collector with `FIND_LEAK` defined.
-  This variable should be set to a nonzero value at program startup.
+  1. Leak detection mode can be initiated at run-time by a `GC_set_find_leak(1)`
+  call at program startup instead of building the collector with the `FIND_LEAK`
+  macro defined.
   2. Leaked objects should be reported and then correctly garbage collected.
 
-To use the collector as a leak detector, follow the following steps:
+To use the collector as a leak detector, take the following steps:
 
-  1. Build the collector with `-DFIND_LEAK`. Otherwise use default build
-  options.
+  1. Activate the leak detection mode as described above.
   2. Change the program so that all allocation and deallocation goes through
   the garbage collector.
-  3. Arrange to call `GC_gcollect` at appropriate points to check for leaks.
-  (For sufficiently long running programs, this will happen implicitly, but
-  probably not with sufficient frequency.)
+  3. Arrange to call `GC_gcollect` (or `CHECK_LEAKS()`) at appropriate points
+  to check for leaks. (For sufficiently long-running programs, this happens
+  implicitly, but probably not frequently enough.)
 
 The second step can usually be accomplished with the
 `-DREDIRECT_MALLOC=GC_malloc` option when the collector is built, or by
-defining `malloc`, `calloc`, `realloc` and `free` to call the corresponding
+defining `malloc`, `calloc`, `realloc`, `free` (as well as `strdup`,
+`strndup`, `wcsdup`, `memalign`, `posix_memalign`) to call the corresponding
 garbage collector functions. But this, by itself, will not yield very
-informative diagnostics, since the collector does not keep track of
+informative diagnostics, since the collector does not keep track of the
 information about how objects were allocated. The error reports will include
 only object addresses.
 
@@ -57,7 +57,7 @@ The same is generally true of thread support. However, the correct leak
 reports should be generated with linuxthreads, at least.
 
 On a few platforms (currently Solaris/SPARC, Irix, and, with
--DSAVE_CALL_CHAIN, Linux/X86), `GC_MALLOC` also causes some more information
+`-DSAVE_CALL_CHAIN`, Linux/X86), `GC_MALLOC` also causes some more information
 about its call stack to be saved in the object. Such information is reproduced
 in the error reports in very non-symbolic form, but it can be very useful with
 the aid of a debugger.
@@ -70,13 +70,14 @@ distribution.
 Assume the collector has been built with `-DFIND_LEAK` or
 `GC_set_find_leak(1)` exists as the first statement in `main`.
 
-The program to be tested for leaks can then look like "leak_test.c" file
-in the "tests" subdirectory of the distribution.
+The program to be tested for leaks could look like the `tests/leak_test.c`
+file of the distribution.
 
 On an Intel X86 Linux system this produces on the stderr stream:
 
 
-    Leaked composite object at 0x806dff0 (leak_test.c:8, sz=4)
+    Found 1 leaked objects:
+    0x806dff0 (tests/leak_test.c:19, sz=4, NORMAL)
 
 
 (On most unmentioned operating systems, the output is similar to this. If the
@@ -87,7 +88,8 @@ not be compiled with `-fomit_frame_pointer`.)
 On Irix it reports:
 
 
-    Leaked composite object at 0x10040fe0 (leak_test.c:8, sz=4)
+    Found 1 leaked objects:
+    0x10040fe0 (tests/leak_test.c:19, sz=4, NORMAL)
             Caller at allocation:
                     ##PC##= 0x10004910
 
@@ -95,7 +97,8 @@ On Irix it reports:
 and on Solaris the error report is:
 
 
-    Leaked composite object at 0xef621fc8 (leak_test.c:8, sz=4)
+    Found 1 leaked objects:
+    0xef621fc8 (tests/leak_test.c:19, sz=4, NORMAL)
             Call chain at allocation:
                     args: 4 (0x4), 200656 (0x30FD0)
                     ##PC##= 0x14ADC
@@ -106,14 +109,13 @@ and on Solaris the error report is:
 In the latter two cases some additional information is given about how malloc
 was called when the leaked object was allocated. For Solaris, the first line
 specifies the arguments to `GC_debug_malloc` (the actual allocation routine),
-The second the program counter inside main, the third the arguments to `main`,
-and finally the program counter inside the caller to main (i.e. in the
-C startup code).
-
-In the Irix case, only the address inside the caller to main is given.
+the second one specifies the program counter inside `main`, the third one
+specifies the arguments to `main`, and the last one specifies the program
+counter inside the caller to `main` (i.e. in the C startup code). In the Irix
+case, only the address inside the caller to `main` is given.
 
 In many cases, a debugger is needed to interpret the additional information.
-On systems supporting the "adb" debugger, the `tools/callprocs.sh` script can
+On systems supporting the `adb` debugger, the `tools/callprocs.sh` script can
 be used to replace program counter values with symbolic names. The collector
 tries to generate symbolic names for call stacks if it knows how to do so on
 the platform. This is true on Linux/X86, but not on most other platforms.
diff --git a/doc/overview.md b/doc/overview.md
index 2a28a0fa..3edccf1f 100644
--- a/doc/overview.md
+++ b/doc/overview.md
@@ -6,9 +6,9 @@
   * Platforms
   * Some collector details
   * Further reading
-  * Local Links for this collector
-  * Local Background Links
-  * Contacts and Mailing List
+  * Information provided on the BDWGC site
+  * More background information
+  * Contacts and new release announcements
 
 [ This is an updated version of the page formerly at
 `www.hpl.hp.com/personal/Hans_Boehm/gc/`, before that at
@@ -39,6 +39,8 @@ legacy. Usually you should use the one marked as the _latest stable_ release.
 Preview versions may contain additional features, platform support, but are
 likely to be less well tested. The list of changes for each version
 is specified on the [releases](https://github.com/ivmai/bdwgc/releases) page.
+The development version (snapshot) is available in the master branch of the
+[bdwgc git](https://github.com/ivmai/bdwgc) repository on GitHub.
 
 The arguments for and against conservative garbage collection in C and C++ are
 briefly discussed [here](http://www.hboehm.info/gc/issues.html). The
@@ -48,28 +50,30 @@ beginnings of a frequently-asked-questions list are
 The garbage collector code is copyrighted by
 [Hans-J. Boehm](http://www.hboehm.info), Alan J. Demers,
 [Xerox Corporation](http://www.xerox.com/),
-[Silicon Graphics](http://www.sgi.com/), and
-[Hewlett-Packard Company](http://www.hp.com/). It may be used and copied
-without payment of a fee under minimal restrictions. See the README.md file
-in the distribution or the [license](http://www.hboehm.info/gc/license.txt)
-for more details. **IT IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY
-EXPRESSED OR IMPLIED. ANY USE IS AT YOUR OWN RISK**.
+[Silicon Graphics](http://www.sgi.com/),
+[Hewlett-Packard Company](http://www.hp.com/),
+[Ivan Maidanski](https://github.com/ivmai), and, in part, by some others.
+It may be used and copied without payment of a fee under minimal restrictions.
+See the README.md file in the distribution or the
+[license](http://www.hboehm.info/gc/license.txt) for more details.
+**IT IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY EXPRESSED OR IMPLIED.
+ANY USE IS AT YOUR OWN RISK.**
 
 Empirically, this collector works with most unmodified C programs, simply
-by replacing `malloc` with `GC_malloc` calls, replacing `realloc` with
-`GC_realloc` calls, and removing free calls. Exceptions are discussed
+by replacing `malloc` and `calloc` with `GC_malloc` calls, replacing `realloc`
+with `GC_realloc` calls, and removing `free` calls. Exceptions are discussed
 [here](http://www.hboehm.info/gc/issues.html).
 
 ## Platforms
 
 The collector is not completely portable, but the distribution includes ports
 to most standard PC and UNIX/Linux platforms. The collector should work
-on Linux, *BSD, recent Windows versions, MacOS X, HP/UX, Solaris, Tru64, Irix
-and a few other operating systems. Some ports are more polished than others.
+on Linux, Android, BSD variants, OS/2, Windows (Win32 and Win64), MacOS X,
+iOS, HP/UX, Solaris, Tru64, Irix, Symbian and other operating systems. Some
+platforms are more polished (better supported) than others.
 
-Irix pthreads, Linux threads, Win32 threads, Solaris threads (pthreads only),
-HP/UX 11 pthreads, Tru64 pthreads, and MacOS X threads are supported in recent
-versions.
+Irix pthreads, Linux threads, Windows threads, Solaris threads (pthreads
+only), HP/UX 11 pthreads, Tru64 pthreads, and MacOS X threads are supported.
 
 ## Some Collector Details
 
@@ -77,7 +81,7 @@ The collector uses a [mark-sweep](http://www.hboehm.info/gc/complexity.html)
 algorithm. It provides incremental and generational collection under operating
 systems which provide the right kind of virtual memory support. (Currently
 this includes SunOS[45], IRIX, OSF/1, Linux, and Windows, with varying
-restrictions.) It allows [_finalization_](finalization.md) code to be invoked
+restrictions.) It allows [finalization](finalization.md) code to be invoked
 when an object is collected. It can take advantage of type information
 to locate pointers if such information is provided, but it is usually used
 without such information. See the README and `gc.h` files in the distribution
@@ -102,16 +106,18 @@ thread-local allocation, it may in some cases significantly outperform
 `malloc`/`free` allocation in time.
 
 We also expect that in many cases any additional overhead will be more than
-compensated for by decreased copying etc. if programs are written and tuned
+compensated for by, e.g., decreased copying if programs are written and tuned
 for garbage collection.
 
 ## Further reading
 
 **The beginnings of a frequently asked questions list for this collector are
-[here](http://www.hboehm.info/gc/faq.html)**.
+[here](http://www.hboehm.info/gc/faq.html).**
 
-**The following provide information on garbage collection in general**: Paul
-Wilson's [garbage collection ftp archive](ftp://ftp.cs.utexas.edu/pub/garbage)
+**The following provide information on garbage collection in general:**
+
+Paul Wilson's
+[garbage collection ftp archive](ftp://ftp.cs.utexas.edu/pub/garbage)
 and [GC survey](ftp://ftp.cs.utexas.edu/pub/garbage/gcsurvey.ps).
 
 The Ravenbrook
@@ -124,7 +130,7 @@ Richard Jones'
 and his [book](http://www.cs.kent.ac.uk/people/staff/rej/gcbook/gcbook.html).
 
 **The following papers describe the collector algorithms we use and the
-underlying design decisions at a higher level.**
+underlying design decisions at a higher level:**
 
 (Some of the lower level details can be found [here](gcdescr.md).)
 
@@ -181,7 +187,7 @@ version. Includes a discussion of a collector facility to much more reliably
 test for the potential of unbounded heap growth.
 
 **The following papers discuss language and compiler restrictions necessary
-to guaranteed safety of conservative garbage collection.**
+to guarantee the safety of conservative garbage collection:**
 
 We thank John Levine and JCLT for allowing us to make the second paper
 available electronically, and providing PostScript for the final version.
diff --git a/doc/porting.md b/doc/porting.md
index e71346d0..de7e5036 100644
--- a/doc/porting.md
+++ b/doc/porting.md
@@ -6,8 +6,7 @@ as scanning the stack(s), that are not possible in portable C code.
 
 All of the following assumes that the collector is being ported to
 a byte-addressable 32- or 64-bit machine. Currently all successful ports
-to 64-bit machines involve LP64 targets. The code base includes some
-provisions for P64 targets (notably Win64), but that has not been tested. You
+to 64-bit machines involve LP64 and LLP64 targets (notably Win64). You
 are hereby discouraged from attempting a port to non-byte-addressable,
 or 8-bit, or 16-bit machines.
 
@@ -96,7 +95,7 @@ operating system:
   is found. This often works on Posix-like platforms. It makes it harder
   to debug client programs, since startup involves generating and catching
   a segmentation fault, which tends to confuse users.
-  * `DATAEND` - Set to the end of the main data segment. Defaults to `end`,
+  * `DATAEND` - Set to the end of the main data segment. Defaults to `_end`,
   where that is declared as an array. This works in some cases, since the
   linker introduces a suitable symbol.
   * `DATASTART2`, `DATAEND2` - Some platforms have two discontiguous main data
@@ -131,11 +130,12 @@ operating system:
   plausible page boundary, and use that as the stack base.
   * `DYNAMIC_LOADING` - Should be defined if `dyn_load.c` has been updated for
   this platform and tracing of dynamic library roots is supported.
-  * `MPROTECT_VDB`, `PROC_VDB` - May be defined if the corresponding
-  _virtual dirty bit_ implementation in `os_dep.c` is usable on this platform.
-  This allows incremental/generational garbage collection. `MPROTECT_VDB`
-  identifies modified pages by write protecting the heap and catching faults.
-  `PROC_VDB` uses the /proc primitives to read dirty bits.
+  * `GWW_VDB`, `MPROTECT_VDB`, `PROC_VDB` - May be defined if the
+  corresponding _virtual dirty bit_ implementation in `os_dep.c` is usable on
+  this platform. This allows incremental/generational garbage collection.
+  (`GWW_VDB` uses the Win32 `GetWriteWatch` function to read dirty bits;
+  `MPROTECT_VDB` identifies modified pages by write protecting the heap and
+  catching faults; `PROC_VDB` uses the /proc primitives to read dirty bits.)
   * `PREFETCH`, `GC_PREFETCH_FOR_WRITE` - The collector uses `PREFETCH(x)`
   to preload the cache with the data at _x_ address. This defaults to a no-op.
   * `CLEAR_DOUBLE` - If `CLEAR_DOUBLE` is defined, then `CLEAR_DOUBLE(x)`
@@ -209,7 +209,7 @@ stopped with signals. In this case, the changes involve:
   workarounds are common.  Non-preemptive threads packages will probably
   require further work. Similarly thread-local allocation and parallel marking
   requires further work in `pthread_support.c`, and may require better
-  `atomic_ops` support.
+  `atomic_ops` support for the target platform.
 
 ## Dynamic library support
 
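The `MPROTECT_VDB` technique mentioned above (write protecting the heap and catching faults) can be illustrated with the following minimal POSIX sketch.

It assumes a Linux-like system where:

* write faults on protected pages arrive as `SIGSEGV` (macOS typically reports them as `SIGBUS`, so both are handled);
* `si_addr` holds the faulting address.

All names are hypothetical, and the code is not taken from `os_dep.c`. Among other things, it ignores threads and system calls that write into protected pages.

```c
#define _DEFAULT_SOURCE 1  /* MAP_ANONYMOUS, sigaction() under strict -std= modes */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON  /* older BSD/macOS spelling */
#endif

#define VDB_MAX_PAGES 64  /* hypothetical limit for this sketch */

static char *vdb_heap;  /* simulated heap */
static size_t vdb_page_size, vdb_npages;
static volatile sig_atomic_t vdb_dirty[VDB_MAX_PAGES];  /* one dirty bit per page */

/* Write-fault handler: mark the page dirty and unprotect it, so the faulting
   store is restarted and succeeds when the handler returns. */
static void vdb_handler(int sig, siginfo_t *si, void *ctx) {
  uintptr_t addr = (uintptr_t)si->si_addr, base = (uintptr_t)vdb_heap;
  (void)ctx;
  if (addr - base < vdb_npages * vdb_page_size) {  /* wraps if addr < base */
    size_t page = (size_t)(addr - base) / vdb_page_size;
    vdb_dirty[page] = 1;
    mprotect(vdb_heap + page * vdb_page_size, vdb_page_size,
             PROT_READ | PROT_WRITE);
    return;
  }
  /* Not a heap page: a genuine crash.  Restore the default action; the
     access is re-executed on return and then terminates the process. */
  signal(sig, SIG_DFL);
}

/* Map npages of "heap" and install the fault handler. */
static int vdb_init(size_t npages) {
  struct sigaction sa;
  if (npages == 0 || npages > VDB_MAX_PAGES) return -1;
  vdb_page_size = (size_t)sysconf(_SC_PAGESIZE);
  vdb_npages = npages;
  vdb_heap = mmap(NULL, npages * vdb_page_size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (vdb_heap == MAP_FAILED) return -1;
  memset(&sa, 0, sizeof sa);
  sigemptyset(&sa.sa_mask);
  sa.sa_sigaction = vdb_handler;
  sa.sa_flags = SA_SIGINFO;
  if (sigaction(SIGSEGV, &sa, NULL) != 0 || sigaction(SIGBUS, &sa, NULL) != 0)
    return -1;
  return 0;
}

/* Start a new tracking interval: clear the dirty bits, write-protect the heap. */
static int vdb_reset(void) {
  for (size_t i = 0; i < vdb_npages; ++i) vdb_dirty[i] = 0;
  return mprotect(vdb_heap, vdb_npages * vdb_page_size, PROT_READ);
}

static int vdb_is_dirty(size_t page) {
  assert(page < vdb_npages);
  return vdb_dirty[page] != 0;
}
```

After `vdb_reset()`, a collector would only need to rescan the pages for which `vdb_is_dirty()` returns nonzero. Each such page costs one fault per interval.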
diff --git a/doc/scale.md b/doc/scale.md
index c98edd9d..855e04eb 100644
--- a/doc/scale.md
+++ b/doc/scale.md
@@ -1,12 +1,14 @@
 # Garbage collector scalability
 
-In its default configuration, the Boehm-Demers-Weiser garbage collector is not
-thread-safe. It can be made thread-safe for a number of environments
-by building the collector with `-DGC_THREADS` compilation flag. This has
-primarily two effects:
+When built with Makefile.direct in its default configuration, the
+Boehm-Demers-Weiser garbage collector is not thread-safe. Generally, it can be
+made thread-safe by building the collector with the `-DGC_THREADS` compilation
+flag. This primarily has the following effects:
 
   1. It causes the garbage collector to stop all other threads when it needs
-  to see a consistent memory state.
+  to see a consistent memory state. It intercepts thread creation and
+  termination events to maintain a list of client threads to be stopped when
+  needed.
   2. It causes the collector to acquire a lock around essentially all
   allocation and garbage collection activity.  Since a single lock is used for
   all allocation-related activity, only one thread can be allocating
@@ -16,9 +18,9 @@ primarily two effects:
 On most platforms, the allocator/collector lock is implemented as a spin lock
 with exponential back-off. Longer wait times are implemented by yielding
 and/or sleeping. If a collection is in progress, the pure spinning stage
-is skipped. This has the advantage that uncontested and thus most uniprocessor
-lock acquisitions are very cheap. It has the disadvantage that the application
-may sleep for small periods of time even when there is work to be done. And
+is skipped. This has the advantage that uncontested (and thus most uniprocessor)
+lock acquisitions are very cheap. It has the disadvantage that the application may
+sleep for small periods of time even when there is work to be done. And
 threads may be unnecessarily woken up for short periods. Nonetheless, this
 scheme empirically outperforms native queue-based mutual exclusion
 implementations in most cases, sometimes drastically so.
@@ -31,18 +33,18 @@ to Makefile.direct again.)
 
   * Building the collector with `-DPARALLEL_MARK` allows the collector to run
   the mark phase in parallel in multiple threads, and thus on multiple
-  processors. The mark phase typically consumes the large majority of the
-  collection time. Thus this largely parallelizes the garbage collector
-  itself, though not the allocation process. Currently the marking
+  processors (or processor cores). The mark phase typically consumes the large
+  majority of the collection time. Thus, this largely parallelizes the garbage
+  collector itself, though not the allocation process. Currently the marking
   is performed by the thread that triggered the collection, together with
-  _N_ - 1 dedicated threads, where _N_ is the number of processors detected
-  by the collector. The dedicated threads are created once at initialization
-  time. A second effect of this flag is to switch to a more concurrent
-  implementation of `GC_malloc_many`, so that free lists can be built, and
-  memory can be cleared, by more than one thread concurrently.
+  _N_ - 1 dedicated threads, where _N_ is the number of processors (cores)
+  detected by the collector. The dedicated marker threads are created once at
+  initialization time. Another effect of this flag is to switch to a more
+  concurrent implementation of `GC_malloc_many`, so that free lists can be
+  built and memory can be cleared by more than one thread concurrently.
   * Building the collector with `-DTHREAD_LOCAL_ALLOC` adds support for
-  thread-local allocation. This causes `GC_malloc`, `GC_malloc_atomic`, and
-  `GC_gcj_malloc` to be redefined to perform thread-local allocation.
+  thread-local allocation. This causes `GC_malloc` (actually `GC_malloc_kind`)
+  and `GC_gcj_malloc` to be redefined to perform thread-local allocation.
 
 Memory returned from thread-local allocators is completely interchangeable
 with that returned by the standard allocators. It may be used by other
@@ -55,7 +57,7 @@ An important side effect of this flag is to replace the default
 spin-then-sleep lock to be replaced by a spin-then-queue based implementation.
 This _reduces performance_ for the standard allocation functions, though
 it usually improves performance when thread-local allocation is used heavily,
-and thus the number of short-duration lock acquisitions is greatly reduced.
+and, thus, the number of short-duration lock acquisitions is greatly reduced.
 
 ## The Parallel Marking Algorithm
 
@@ -93,8 +95,9 @@ allocation and incremental collection. They should work correctly with one or
 the other, but not both.
 
 The number of marker threads is set on startup to the number of available
-processors (or to the value of the `GC_NPROCS` environment variable). If only
-a single processor is detected, parallel marking is disabled.
+processor cores (or to the value of the `GC_MARKERS` or `GC_NPROCS`
+environment variable, if set). If only a single processor is detected,
+parallel marking is disabled.
 
 Note that setting `GC_NPROCS` to 1 also causes some lock acquisitions inside
 the collector to immediately yield the processor instead of busy waiting
@@ -117,7 +120,7 @@ the simple thread-safe collector, built with `-DGC_THREADS`, the execution
 time increased to 10.3 seconds, or 23.5 elapsed seconds with two clients. (The
 times for the `malloc`/`free` version with glibc `malloc` are 10.51 (standard
 library, pthreads not linked), 20.90 (one thread, pthreads linked), and 24.55
-seconds respectively. The benchmark favors a garbage collector, since most
+seconds, respectively. The benchmark favors a garbage collector, since most
 objects are small.)
 
 The following table gives execution times for the collector built with
@@ -161,7 +164,7 @@ processor as 2 clients on 2 processors) is probably not achievable on this
 kind of hardware even with such a small number of processors, since the memory
 system is a major constraint for the garbage collector, the processors usually
 share a single memory bus, and thus the aggregate memory bandwidth does not
-increase in proportion to the number of processors.
+increase in proportion to the number of processors (cores).
 
 These results are likely to be very sensitive to both hardware and OS issues.
 Preliminary experiments with an older Pentium Pro machine running an older
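The allocator lock described above (a spin lock with exponential back-off, then yielding and/or sleeping, with pure spinning skipped while a collection is in progress) might look roughly like this in C11.

This is a sketch under stated assumptions:

* The type `backoff_lock` and its functions are hypothetical.
* The limits `SPIN_LIMIT` and `YIELD_LIMIT` and the 0.1 ms sleep are hypothetical.
* The collector's actual lock code differs in detail.

```c
#define _DEFAULT_SOURCE 1  /* sched_yield()/nanosleep() under strict -std= modes */
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#define SPIN_LIMIT 64   /* assumed longest busy-wait round */
#define YIELD_LIMIT 16  /* assumed sched_yield() calls before sleeping */

typedef struct { atomic_flag held; } backoff_lock;  /* hypothetical type */
#define BACKOFF_LOCK_INIT { ATOMIC_FLAG_INIT }

static bool backoff_try_lock(backoff_lock *l) {
  return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void backoff_lock_acquire(backoff_lock *l, bool collection_in_progress) {
  /* While a collection runs, the lock is likely held for a long time,
     so the pure spinning stage is skipped. */
  unsigned spins = collection_in_progress ? SPIN_LIMIT + 1 : 1;
  unsigned yields = 0;
  assert(l != NULL);
  while (!backoff_try_lock(l)) {
    if (spins <= SPIN_LIMIT) {
      for (volatile unsigned i = 0; i < spins; ++i) { /* busy-wait */ }
      spins *= 2;  /* exponential back-off */
    } else if (yields < YIELD_LIMIT) {
      sched_yield();
      ++yields;
    } else {
      struct timespec ts = { 0, 100000 };  /* sleep for 0.1 ms */
      nanosleep(&ts, NULL);
    }
  }
}

static void backoff_lock_release(backoff_lock *l) {
  atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

An uncontested acquisition costs a single atomic test-and-set, which is why such locks are cheap on uniprocessors. The yielding and sleeping stages are where an application may sleep briefly even though work is available.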