Commit message | Author | Age | Files | Lines

Most of the other users of the fptools build system have migrated to
Cabal, and with the move to darcs we can now flatten the source tree
without losing history, so here goes.
The main change is that the ghc/ subdir is gone, and most of what it
contained is now at the top level. The build system now makes no
pretense at being multi-project, it is just the GHC build system.
No doubt this will break many things, and there will be a period of
instability while we fix the dependencies. A straightforward build
should work, but I haven't yet fixed binary/source distributions.
Changes to the Building Guide will follow, too.
We had to bite the bullet here and add an extra word to every thunk,
to enable running ordinary libraries on SMP. Otherwise, we would have
needed to ship an extra set of libraries with GHC 6.6 in addition to
the two sets we already ship (normal + profiled), and all Cabal
packages would have to be compiled for SMP too. We decided it best
just to take the hit now, making SMP easily accessible to everyone in
GHC 6.6.
Incidentally, although this increases allocation by around 12% on
average, the performance hit is around 5%, and much less if your inner
loop doesn't use any laziness.
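A toy C sketch of the cost described above (illustrative struct and function names, not GHC's real definitions): the change amounts to one extra header word on every thunk, reserved so a thunk can be entered and updated safely under SMP.

```c
#include <stddef.h>

/* Illustrative only: before the change a thunk header was essentially
   just the info pointer; the SMP-safe layout reserves one extra word
   so thunks can be locked and updated atomically, without shipping a
   separate SMP build of every library. */
typedef struct { void *info; } ThunkHeaderBefore;
typedef struct {
    void *info;
    void *smp_pad;   /* the extra word added for safe thunk entry */
} ThunkHeaderAfter;

/* Overhead in bytes per thunk: exactly one word. */
size_t thunk_header_overhead(void) {
    return sizeof(ThunkHeaderAfter) - sizeof(ThunkHeaderBefore);
}
```

One machine word on every thunk allocated is consistent with the quoted ~12% average increase in allocation.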
printSample() was attempting to round the fractional part of the time,
but the carry was not propagated to the integral part. It's probably
better not to attempt to round the time at all.
Remove dead code
Improve the GC behaviour of IORefs (see Ticket #650).
This is a small change to the way IORefs interact with the GC, which
should improve GC performance for programs with plenty of IORefs.
Previously we had a single closure type for mutable variables,
MUT_VAR. Mutable variables were *always* on the mutable list in older
generations, and always traversed on every GC.
Now, we have two closure types: MUT_VAR_CLEAN and MUT_VAR_DIRTY. The
latter is on the mutable list, but the former is not. (NB. this
differs from MUT_ARR_PTRS_CLEAN and MUT_ARR_PTRS_DIRTY, both of which
are on the mutable list). writeMutVar# now implements a write
barrier by calling dirty_MUT_VAR() in the runtime, which does the
necessary modification of MUT_VAR_CLEAN into MUT_VAR_DIRTY, adding
the variable to the mutable list if necessary.
This results in some pretty dramatic speedups for GHC itself. I've
just measured a 30% overall speedup compiling a 31-module program
(anna) with the default heap settings :-D
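A toy sketch of the write barrier described above (simplified structures; only the names MUT_VAR_CLEAN, MUT_VAR_DIRTY and dirty_MUT_VAR come from the commit, the rest is illustrative):

```c
/* Toy model, not GHC's real closure layout. */
enum ClosureType { MUT_VAR_CLEAN, MUT_VAR_DIRTY };

typedef struct MutVar {
    enum ClosureType type;
    void *var;                  /* the mutable cell's contents */
    struct MutVar *mut_link;    /* toy mutable-list link */
} MutVar;

static MutVar *mut_list = 0;    /* toy old-generation mutable list */

/* dirty_MUT_VAR: the write barrier. A clean mutable variable is
   marked dirty and put on the mutable list; a variable that is never
   written again can later be dropped from the list, so it is not
   traversed on every GC. */
void dirty_MUT_VAR(MutVar *mv) {
    if (mv->type == MUT_VAR_CLEAN) {
        mv->type = MUT_VAR_DIRTY;
        mv->mut_link = mut_list;
        mut_list = mv;
    }
}

/* What writeMutVar# does in this model: barrier first, then write. */
void write_mut_var(MutVar *mv, void *value) {
    dirty_MUT_VAR(mv);
    mv->var = value;
}

/* Demonstration: the first write dirties and enqueues the variable;
   a second write finds it already dirty and does not re-add it. */
int demo_mutvar(void) {
    static MutVar m = { MUT_VAR_CLEAN, 0, 0 };
    static int x = 1;
    write_mut_var(&m, &x);
    if (m.type != MUT_VAR_DIRTY || mut_list != &m) return 0;
    write_mut_var(&m, &x);
    return m.mut_link == 0 && mut_list == &m;
}
```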
Improve the GC behaviour of IOArrays/STArrays
See Ticket #650
This is a small change to the way mutable arrays interact with the GC,
that can have a dramatic effect on performance, and make tricks with
unsafeThaw/unsafeFreeze redundant. Data.HashTable should be faster
now (I haven't measured it yet).
We now have two mutable array closure types, MUT_ARR_PTRS_CLEAN and
MUT_ARR_PTRS_DIRTY. Both are on the mutable list if the array is in
an old generation. writeArray# sets the type to MUT_ARR_PTRS_DIRTY.
The garbage collector can set the type to MUT_ARR_PTRS_CLEAN if it
finds that no element of the array points into a younger generation
(discovering this required a small addition to evacuate(), but rough
tests indicate that it doesn't measurably affect performance).
NOTE: none of this affects unboxed arrays (IOUArray/STUArray), only
boxed arrays (IOArray/STArray).
We could go further and extend the DIRTY bit to be per-block rather
than for the whole array, but for now this is an easy improvement.
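A toy sketch of the GC-side decision (illustrative types; the real evacuate()/scavenge code is far more involved): an array is marked clean only when no element points into a younger generation.

```c
/* Toy model, not GHC's real closure layout. */
enum { MUT_ARR_PTRS_CLEAN, MUT_ARR_PTRS_DIRTY };

typedef struct {
    int type;            /* MUT_ARR_PTRS_CLEAN or _DIRTY */
    int gen;             /* generation the array lives in */
    int n;
    int elem_gen[8];     /* generation of each element's target */
} MutArrPtrs;

/* After scavenging an old-generation array, the GC may downgrade it
   to CLEAN if no element points into a younger generation; writeArray#
   would set it back to DIRTY. */
void scavenge_mut_arr(MutArrPtrs *a) {
    int clean = 1;
    for (int i = 0; i < a->n; i++)
        if (a->elem_gen[i] < a->gen)   /* points into a younger gen */
            clean = 0;
    a->type = clean ? MUT_ARR_PTRS_CLEAN : MUT_ARR_PTRS_DIRTY;
}

/* Demonstration: all elements old -> CLEAN; one young element -> DIRTY. */
int demo_scavenge(void) {
    MutArrPtrs a = { MUT_ARR_PTRS_DIRTY, 1, 2, {1, 1} };
    scavenge_mut_arr(&a);
    if (a.type != MUT_ARR_PTRS_CLEAN) return 0;
    a.elem_gen[1] = 0;
    scavenge_mut_arr(&a);
    return a.type == MUT_ARR_PTRS_DIRTY;
}
```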
Big re-hash of the threaded/SMP runtime
This is a significant reworking of the threaded and SMP parts of
the runtime. There are two overall goals here:
- To push down the scheduler lock, reducing contention and allowing
more parts of the system to run without locks. In particular,
the scheduler does not require a lock any more in the common case.
- To improve affinity, so that running Haskell threads stick to the
same OS threads as much as possible.
At this point we have the basic structure working, but there are some
pieces missing. I believe it's reasonably stable - the important
parts of the testsuite pass in all the (normal,threaded,SMP) ways.
In more detail:
- Each capability now has a run queue, instead of one global run
queue. The Capability and Task APIs have been completely
rewritten; see Capability.h and Task.h for the details.
- Each capability has its own pool of worker Tasks. Hence, Haskell
threads on a Capability's run queue will run on the same worker
Task(s). As long as the OS is doing something reasonable, this
should mean they usually stick to the same CPU. Another way to
look at this is that we're assuming each Capability is associated
with a fixed CPU.
- What used to be StgMainThread is now part of the Task structure.
Every OS thread in the runtime has an associated Task, and it
can ask for its current Task at any time with myTask().
- removed RTS_SUPPORTS_THREADS symbol, use THREADED_RTS instead
(it is now defined for SMP too).
- The RtsAPI has had to change; we must explicitly pass a Capability
around now. The previous interface assumed some global state.
SchedAPI has also changed a lot.
- The OSThreads API now supports thread-local storage, used to
implement myTask(), although it could be done more efficiently
using gcc's __thread extension when available.
- I've moved some POSIX-specific stuff into the posix subdirectory,
moving in the direction of separating out platform-specific
implementations.
- lots of lock-debugging and assertions in the runtime. In particular,
when DEBUG is on, we catch multiple ACQUIRE_LOCK()s, and there is
also an ASSERT_LOCK_HELD() call.
What's missing so far:
- I have almost certainly broken the Win32 build, will fix soon.
- any kind of thread migration or load balancing. This is high up
the agenda, though.
- various performance tweaks to do
- throwTo and forkProcess still do not work in SMP mode
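A toy sketch of the per-capability run queue idea (illustrative C; the real Capability and Task structures in Capability.h and Task.h carry much more state):

```c
/* Toy model: each Capability owns its own FIFO run queue, so no
   global scheduler lock is needed to schedule on it, and threads
   tend to stay on the same worker Task (and hence the same CPU). */
typedef struct Task { struct Task *next; } Task;
typedef struct TSO  { struct TSO  *link; } TSO;

typedef struct Capability {
    TSO  *run_queue_hd;   /* this Capability's private run queue */
    TSO  *run_queue_tl;
    Task *spare_workers;  /* pool of worker Tasks bound to this cap */
} Capability;

void appendToRunQueue(Capability *cap, TSO *tso) {
    tso->link = 0;
    if (cap->run_queue_hd == 0) cap->run_queue_hd = tso;
    else cap->run_queue_tl->link = tso;
    cap->run_queue_tl = tso;
}

TSO *popRunQueue(Capability *cap) {
    TSO *t = cap->run_queue_hd;
    if (t) {
        cap->run_queue_hd = t->link;
        if (cap->run_queue_hd == 0) cap->run_queue_tl = 0;
    }
    return t;
}

/* Demonstration: FIFO order within one Capability. */
int demo_runqueue(void) {
    Capability cap = { 0, 0, 0 };
    TSO t1, t2;
    appendToRunQueue(&cap, &t1);
    appendToRunQueue(&cap, &t2);
    return popRunQueue(&cap) == &t1
        && popRunQueue(&cap) == &t2
        && popRunQueue(&cap) == 0;
}
```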
StrHash doesn't appear to be used; remove it. I think it was an
earlier version of the string hashing code in Hash.c.
Add some missing cases to heapCensus(); should fix heap profiling of
code that uses STM.
add missing case for MUT_ARR_PTRS_FROZEN0 (might fix some cases of
"internal error: heapCensus" in the HEAD)
Remove the ForeignObj# type, and all its PrimOps. The new efficient
representation of ForeignPtr doesn't use ForeignObj# underneath, and
there seems no need to keep it.
catching up with GC tweaks.
Warning police (format strings, unused variables)
SMP: the rest of the changes to support safe thunk entry & updates. I
thought the compiler changes were independent, but I ended up breaking
the HEAD, so I'll have to commit the rest. non-SMP compilation should
not be affected.
Some multi-processor hackery, including
- Don't hang blocked threads off BLACKHOLEs any more, instead keep
them all on a separate queue which is checked periodically for
threads to wake up.
This is good because (a) we don't have to worry about locking the
closure in SMP mode when we want to block on it, and (b) it means
the standard update code doesn't need to wake up any threads or
check for a BLACKHOLE_BQ, simplifying the update code.
The downside is that if there are lots of threads blocked on
BLACKHOLEs, we might have to do a lot of repeated list traversal.
We don't expect this to be common, though. conc023 goes slower
with this change, but we expect most programs to benefit from the
shorter update code.
- Fixing up the Capability code to handle multiple capabilities (SMP
mode), and related changes to get the SMP mode at least building.
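A toy model of the blocked-thread scheme described above (names and types are illustrative): blocked threads go on one global queue, and the scheduler periodically sweeps it for threads whose blackhole has since been updated.

```c
/* Toy model: a thunk is represented by an int cell, 0 meaning "still
   a BLACKHOLE", non-zero meaning "updated with a value". */
typedef struct BTSO {
    struct BTSO *link;
    int *blocked_on;      /* the thunk this thread is blocked on */
} BTSO;

static BTSO *blackhole_queue = 0;

/* Blocking no longer touches the closure itself (no BLACKHOLE_BQ, no
   locking in SMP mode): just push onto the global queue. */
void blockOnBlackHole(BTSO *tso, int *bh) {
    tso->blocked_on = bh;
    tso->link = blackhole_queue;
    blackhole_queue = tso;
}

/* checkBlackHoles: called periodically by the scheduler; unlinks and
   counts every thread whose blackhole has been updated. This is the
   repeated list traversal the commit mentions as the downside. */
int checkBlackHoles(void) {
    int woken = 0;
    BTSO **prev = &blackhole_queue;
    while (*prev) {
        BTSO *t = *prev;
        if (*t->blocked_on != 0) {   /* updated: thread can run again */
            *prev = t->link;
            woken++;
        } else {
            prev = &t->link;
        }
    }
    return woken;
}

/* Demonstration: two blocked threads, one blackhole updated. */
int demo_blackholes(void) {
    static BTSO t1, t2;
    static int bh1 = 0, bh2 = 0;
    blockOnBlackHole(&t1, &bh1);
    blockOnBlackHole(&t2, &bh2);
    bh1 = 7;                         /* thunk updated with a value */
    return checkBlackHoles() == 1    /* t1 woken */
        && blackhole_queue == &t2;   /* t2 still blocked */
}
```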
printf format type fixup
* Some preprocessors don't like the C99/C++ '//' comments after a
directive, so use '/* */' instead. For consistency, a lot of '//' in
the include files were converted, too.
* UnDOSified libraries/base/cbits/runProcess.c.
* My favourite sport: Killed $Id$s.
GC changes: instead of threading old-generation mutable lists
through objects in the heap, keep them in a separate flat array.
This has some advantages:
- the IND_OLDGEN object is now only 2 words, so the minimum
size of a THUNK is now 2 words instead of 3. This saves
some amount of allocation (about 2% on average according to
my measurements), and is more friendly to the cache by
squashing objects together more.
- keeping the mutable list separate from the IND object
will be necessary for our multiprocessor implementation.
- removing the mut_link field makes the layout of some objects
more uniform, leading to less complexity and special cases.
- I also unified the two mutable lists (mut_once_list and mut_list)
into a single mutable list, which led to more simplifications
in the GC.
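A toy sketch of the layout change (illustrative structs, not GHC's real ones): with the mutable list held in a separate array, old-generation closures no longer need a mut_link field, which is what shrinks IND_OLDGEN from 3 words to 2.

```c
/* Illustrative layouts: the mut_link word disappears. */
typedef struct { void *info; void *indirectee; void *mut_link; } IndOldGenBefore;
typedef struct { void *info; void *indirectee; } IndOldGenAfter;

/* Toy per-generation mutable list as a flat array of closure
   pointers, instead of a linked list threaded through the heap. */
#define MUT_LIST_CAP 64
typedef struct {
    void *entries[MUT_LIST_CAP];
    int   n;
} MutList;

void recordMutableGen(MutList *ml, void *closure) {
    if (ml->n < MUT_LIST_CAP)
        ml->entries[ml->n++] = closure;   /* no mut_link word needed */
}

/* Demonstration: the new IND_OLDGEN is 2 words, the old was 3, and
   mutable closures are tracked in the flat array. */
int demo_mutlist(void) {
    static MutList ml;
    static int dummy;
    recordMutableGen(&ml, &dummy);
    return sizeof(IndOldGenAfter)  == 2 * sizeof(void *)
        && sizeof(IndOldGenBefore) == 3 * sizeof(void *)
        && ml.n == 1 && ml.entries[0] == &dummy;
}
```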
64-bit fixes.
Don't assume that sizeof(int) == sizeof(StgInt).
This assumption crept into many places since 6.2.
Removed the annoying "Id" CVS keywords, they're a real PITA when it
comes to merging...
Cleanup: all (well, most) messages from the RTS now go through the
functions in RtsUtils: barf(), debugBelch() and errorBelch(). The
latter two were previously called belch() and prog_belch()
respectively. See the comments for the right usage of these message
functions.
One reason for doing this is so that we can avoid spurious uses of
stdout/stderr by Haskell apps on platforms where we shouldn't be using
them (eg. non-console apps on Windows).
Merge backend-hacking-branch onto HEAD. Yay!
Make the printing of samples really locale-independent
Fixed the JOB line in heap profiles, it contained superfluous spaces
and an evil line break.
Merge to STABLE
(This fix looks quite right, but again, I leave this to the Master of
Releases (tm), because there might already be tools depending on the
slightly wrong old format.)
Tweaks to have RTS (C) sources compile with MSVC. Apart from wibbles
related to the handling of 'inline', changed Schedule.h:POP_RUN_QUEUE()
not to use expression-level statement blocks.
Initialize hp_file for heap profiling (code stolen from Profiling.c).
This bug might suggest some general reviewing of this code-path...
Closes: SF bug [ 827485 ] Heap profile w/ debugging RTS dumps core
http://sourceforge.net/tracker/index.php?func=detail&aid=827485&group_id=8032&atid=108032
Add a BF_PINNED block flag, and attach it to blocks containing pinned
objects (in addition to the usual BF_LARGE).
In heapCensus, we now ignore blocks containing pinned objects, because
they might contain gaps, and in any case it isn't clear that we want
to include the whole block in a heap census, because much of it might
well be dead. Ignoring it isn't right either, though, so this patch
just fixes the crash and leaves a ToDo.
setupRtsFlags(): don't overwrite argv[0] with its basename:
- argv[] may not point to writeable memory
- System.Environment.getProgName strips off the 'dirname' portion
anyway.
- Not possible to get at the untransformed argv[0] from
Haskell code, should such a need arise.
Uses of prog_argv[0] within the RTS have now been replaced with
prog_name, which is the basename of prog_argv[0].
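A minimal sketch of the replacement behaviour (rts_prog_name is a hypothetical helper, not the RTS's actual function): compute the basename without ever writing into argv[].

```c
#include <string.h>

/* Return the basename of argv[0] without modifying it (argv[] may not
   point to writeable memory). POSIX-style separators only, for
   brevity; the real RTS would also have to handle '\\' on Windows. */
const char *rts_prog_name(const char *argv0) {
    const char *last = strrchr(argv0, '/');
    return last ? last + 1 : argv0;
}
```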
heapCensus should grok IND_OLDGEN objects, because compacting GC
doesn't always eliminate them (perhaps it should).
Fix some bugs in compacting GC.
Bug 1: When threading the fields of an AP or PAP, we were grabbing the
info table of the function without unthreading it first.
Bug 2: eval_thunk_selector() might accidentally find itself in
to-space when going through indirections in a compacted generation.
We must check for this case and bale out if necessary.
Bug 3: This is somewhat more nasty. When we have an AP or PAP that
points to a BCO, the layout info for the AP/PAP is in the BCO's
instruction array, which is two objects deep from the AP/PAP itself.
The trouble is, during compacting GC, we can only safely look one
object deep from the current object, because pointers from objects any
deeper might have been already updated to point to their final
destinations.
The solution is to put the arity and bitmap info for a BCO into the
BCO object itself. This means BCOs become variable-length, which is a
slight annoyance, but it also means that looking up the arity/bitmap
is quicker. There is a slight reduction in complexity in the byte
code generator due to not having to stuff the bitmap at the front of
the instruction stream.
Fix a profiling crash on Windows.
fprint_ccs used snprintf() to avoid overflowing a buffer; on mingw32,
where snprintf() doesn't exist, we were just using the straight
sprintf(), which inevitably led to a crash. Rewritten to use a
homegrown non-overflowing string copying function - it actually looks
nicer now, anyway.
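A minimal sketch of such a non-overflowing copy (the function name is made up; the commit doesn't show the real one): it never writes more than size bytes, including the terminating NUL.

```c
#include <stddef.h>

/* Copy src into dst, writing at most size bytes including the
   terminating NUL; returns the number of characters copied.
   Truncates silently, which is fine for profile labels. */
size_t copyStringBounded(char *dst, const char *src, size_t size) {
    size_t i = 0;
    if (size == 0) return 0;
    for (; src[i] != '\0' && i + 1 < size; i++)
        dst[i] = src[i];
    dst[i] = '\0';
    return i;
}

/* Demonstration: a 4-byte buffer holds at most 3 characters plus NUL. */
int demo_copy(void) {
    char buf[4];
    return copyStringBounded(buf, "CAF:main", sizeof buf) == 3
        && buf[0] == 'C' && buf[1] == 'A' && buf[2] == 'F'
        && buf[3] == '\0';
}
```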
closureSatisfiesConstraints: check whether the retainer set is valid
before attempting to match it against a constraint. It might not be
valid if the object is an ex-weak-pointer which was finalized after
the last GC.
MERGE TO STABLE
Fix compilation with DEBUG
- Add a new flag, -xt, which enables inclusion of TSOs in a heap profile.
- Include large objects in heap profiles (except TSOs unless the -xt flag
is given).
- In order to make this work, I had to set the bd->free field of the
block descriptor for a large object to the correct value. Previously,
it pointed to the start of the block (i.e. the same as bd->start).
I hope this doesn't have any other consequences; it looks more
correct this way in any case.
Merge the eval-apply-branch on to the HEAD
------------------------------------------
This is a change to GHC's evaluation model in order to ultimately make
GHC more portable and to reduce complexity in some areas.
At some point we'll update the commentary to describe the new state of
the RTS. Pending that, the highlights of this change are:
- No more Su. The Su register is gone, update frames are one
word smaller.
- Slow-entry points and arg checks are gone. Unknown function calls
are handled by automatically-generated RTS entry points (AutoApply.hc,
generated by the program in utils/genapply).
- The stack layout is stricter: there are no "pending arguments" on
the stack any more, the stack is always strictly a sequence of
stack frames.
This means that there's no need for LOOKS_LIKE_GHC_INFO() or
LOOKS_LIKE_STATIC_CLOSURE() any more, and GHC doesn't need to know
how to find the boundary between the text and data segments (BIG WIN!).
- A couple of nasty hacks in the mangler caused by the need to
identify closure ptrs vs. info tables have gone away.
- Info tables are a bit more complicated. See InfoTables.h for the
details.
- As a side effect, GHCi can now deal with polymorphic seq. Some bugs
in GHCi which affected primitives and unboxed tuples are now
fixed.
- Binary sizes are reduced by about 7% on x86. Performance is roughly
similar, some programs get faster while some get slower. I've seen
GHCi perform worse on some examples, but haven't investigated
further yet (GHCi performance *should* be about the same or better
in theory).
- Internally the code generator is rather better organised. I've moved
info-table generation from the NCG into the main codeGen where it is
shared with the C back-end; info tables are now emitted as arrays
of words in both back-ends. The NCG is one step closer to being able
to support profiling.
This has all been fairly thoroughly tested, but no doubt I've messed
up the commit in some way.
Fix the heapCensus crash.
It turned out that after a GC, the small_alloc_list might be non-empty
if a new finalizer thread had been started. The last block on
small_alloc_list doesn't have the free pointer set correctly (as a
small optimisation, we don't normally set the free pointer after each
allocation, only when the block is full). The result was that the
free pointer contains the wrong value, and the heap census traverses
garbage. The fix is to set the free pointer correctly before
traversing small_alloc_list.
The bug doesn't show up when DEBUG is on, because extra DEBUG checks
cause the free pointer to be initialised to a sensible(-ish) value.
Hence my difficulty in reproducing the bug.
To reproduce: compile ghc-regress/lib/should_run/memo002 with
profiling and run it with a sufficiently small sample interval (-i0.02
did it for me).
Thanks to the kind folks at ARM for helping out with the debugging of
this one.
MERGE TO STABLE
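A toy model of the fix (illustrative types; GHC's real block descriptors are more elaborate): the census must not walk past the block's free pointer, so the fix is to bring free up to date from the real allocation pointer before traversing.

```c
/* Toy allocation block: 'free' is the next unallocated slot, but as
   an optimisation it is maintained lazily and may be stale. */
typedef struct {
    int  data[16];
    int *free;
} AllocBlock;

/* Census over one block. Without the first line, a stale free pointer
   makes the walk cover garbage (or miss live data); syncing it from
   the real allocation pointer first is the fix described above. */
int heapCensusBlock(AllocBlock *b, int *alloc_Hp) {
    b->free = alloc_Hp;   /* set free correctly before traversing */
    int count = 0;
    for (int *p = b->data; p < b->free; p++)
        count++;
    return count;
}

/* Demonstration: free is stale (says "empty") while 3 slots are
   actually allocated; the census still counts exactly 3. */
int demo_census(void) {
    static AllocBlock b;
    b.free = b.data;
    return heapCensusBlock(&b, b.data + 3) == 3;
}
```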
Global and common variable sweep: staticize many variables that don't
need to be globally visible.
#include wibbles
fprintf_ccs(): use snprintf() only if available
Include the CCS ID in the heap profile, so you can find the full CCS
description in <foo>.prof or the XML profile output.
Fix a couple of assertions.
- Add a new type of restriction: -hC, which restricts to closures
whose CCS contains the specified CCs *anywhere* (not just at the
top).
- Complain if the user tries to request both retainer and biographical
profiling. We don't support both simultaneously, because they use
the same header word in the closure.
- Allow for the fact that the heap might contain some closures which
don't have a valid retainer set during the heap census. The only
known closures of this kind so far are DEAD_WEAK closures.
- Some cruft-removal and renaming of functions to follow conventions.
Fix for heap profiling when selecting by lag/drag/void/use: I forgot
to make the final LdvCensusKillAll() call just before outputting the
census info.
Having tested this stuff on the compiler itself, I now declare it to
be working (famous last words!).
oops, I broke standard -hb profiles. Unbreak them again.
Make it work in a DEBUG world again (when DEBUG is on we have ancient
support for doing a heap profile based on info-tables - it is still
there, but I haven't tested it).
As promised: allow selecting by lag, drag, void or use. Currently
this involves keeping around all the information about previous
censuses, so memory use could get quite large. If this turns out to
be a problem, then we have a plan to throw away some of the info after
each census.
Don't forget about the "MANY" retainer set when dumping a retainer profile.
Profiling cleanup.
This commit eliminates some duplication in the various heap profiling
subsystems, and generally centralises much of the machinery. The key
concept is the separation of a heap *census* (which is now done in one
place only instead of three) from the calculation of retainer sets.
Previously the retainer profiling code also did a heap census on the
fly, and lag-drag-void profiling had its own census machinery.
Value-adds:
- you can now restrict a heap profile to certain retainer sets,
but still display by cost centre (or type, or closure or
whatever).
- I've added an option to restrict the maximum retainer set size
(+RTS -R<size>, defaulting to 8).
- I've cleaned up the heap profiling options at the request of
Simon PJ. See the help text for details. The new scheme
is backwards compatible with the old.
- I've removed some odd bits of LDV or retainer profiling-specific
code from various parts of the system.
- the time taken doing heap censuses (and retainer set calculation)
is now accurately reported by the RTS when you say +RTS -Sstderr.
Still to come:
- restricting a profile to a particular biography
(lag/drag/void/use). This requires keeping old heap censuses
around, but the infrastructure is now in place to do this.
Retainer Profiling / Lag-drag-void profiling.
This is mostly work by Sungwoo Park, who spent a summer internship at
MSR Cambridge this year implementing these two types of heap profiling
in GHC.
Relative to Sungwoo's original work, I've made some improvements to
the code:
- it's now possible to apply constraints to retainer and LDV profiles
in the same way as we do for other types of heap profile (eg.
+RTS -hc{foo,bar} -hR -RTS gives you a retainer profile considering
only closures with cost centres 'foo' and 'bar').
- the heap-profile timer implementation is cleaned up.
- heap profiling no longer has to be run in a two-space heap.
- general cleanup of the code and application of the SDM C coding
style guidelines.
Profiling will be a little slower and require more space than before,
mainly because closures have an extra header word to support either
retainer profiling or LDV profiling (you can't do both at the same
time).
We've used the new profiling tools on GHC itself, with moderate
success. Fixes for some space leaks in GHC to follow...