delta/haskell.git - gitlab.haskell.org: ghc/ghc.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	Reorganisation of the source tree	Simon Marlow	2006-04-07	1	-1137/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Most of the other users of the fptools build system have migrated to Cabal, and with the move to darcs we can now flatten the source tree without losing history, so here goes. The main change is that the ghc/ subdir is gone, and most of what it contained is now at the top level. The build system now makes no pretense at being multi-project, it is just the GHC build system. No doubt this will break many things, and there will be a period of instability while we fix the dependencies. A straightforward build should work, but I haven't yet fixed binary/source distributions. Changes to the Building Guide will follow, too.
*	fix profiling on Win32	Simon Marlow	2006-03-30	1	-0/+5
\| \| \| \| \|	The recent patch to free memory in hs_exit() on Win32 unfortunately broke profiling, because it freed the memory slightly too early.
*	Free all memory when shutting down. XXX not implemented for Posix.	lennart.augustsson@credit-suisse.com	2006-03-02	1	-0/+1
*	oops, initialize atomic_modify_mutvar_mutex	Simon Marlow	2006-02-22	1	-0/+1
*	fix a deadlock in atomicModifyMutVar#	Simon Marlow	2006-02-21	1	-1/+6
\| \| \| \| \| \| \| \| \|	atomicModifyMutVar# was re-using the storage manager mutex (sm_mutex) to get its atomicity guarantee in SMP mode. But recently the addition of a call to dirty_MUT_VAR() to implement the write barrier led to a rare deadlock case, because dirty_MUT_VAR() very occasionally needs to allocate a new block to chain on the mutable list, which requires sm_mutex.
*	warning fix	Simon Marlow	2006-02-21	1	-1/+1
*	fix for dirty_MUT_VAR: don't try to recordMutableCap in gen 0	Simon Marlow	2006-02-10	1	-1/+3
*	Merge the smp and threaded RTS ways	Simon Marlow	2006-02-09	1	-14/+14
\| \| \| \| \| \| \|	Now, the threaded RTS also includes SMP support. The -smp flag is a synonym for -threaded. The performance implications of this are small to negligible, and it results in a code cleanup and reduces the number of combinations we have to test.
*	change dirty_MUT_VAR() to use recordMutableCap()	Simon Marlow	2006-02-09	1	-2/+3
\| \| \| \|	rather than recordMutableGen(), the former works better in SMP
*	[project @ 2006-01-17 16:13:18 by simonmar]	simonmar	2006-01-17	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Improve the GC behaviour of IORefs (see Ticket #650). This is a small change to the way IORefs interact with the GC, which should improve GC performance for programs with plenty of IORefs. Previously we had a single closure type for mutable variables, MUT_VAR. Mutable variables were always on the mutable list in older generations, and always traversed on every GC. Now, we have two closure types: MUT_VAR_CLEAN and MUT_VAR_DIRTY. The latter is on the mutable list, but the former is not. (NB. this differs from MUT_ARR_PTRS_CLEAN and MUT_ARR_PTRS_DIRTY, both of which are on the mutable list). writeMutVar# now implements a write barrier, by calling dirty_MUT_VAR() in the runtime, which does the necessary modification of MUT_VAR_CLEAN into MUT_VAR_DIRTY and adds the variable to the mutable list if necessary. This results in some pretty dramatic speedups for GHC itself. I've just measured a 30% overall speedup compiling a 31-module program (anna) with the default heap settings :-D
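The clean/dirty scheme described in this message can be illustrated with a small C sketch. It is not the RTS code: `MutVar`, `MutList`, `dirty_mut_var` and `write_mut_var` are hypothetical, simplified stand-ins, and the state field plays the role that the info pointer plays in the real closures. The generation check reflects the later fix that variables in generation 0 are not recorded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the RTS closure types. */
typedef enum { MUT_VAR_CLEAN, MUT_VAR_DIRTY } MutVarState;

typedef struct MutVar {
    MutVarState state;  /* plays the role of the closure's info pointer */
    void       *value;  /* current contents of the variable */
    int         gen;    /* generation the variable lives in (0 = youngest) */
} MutVar;

#define MUT_LIST_MAX 64

typedef struct {
    MutVar *entries[MUT_LIST_MAX];
    size_t  n;
} MutList;

/* Write barrier: the first write to a clean variable marks it dirty and,
 * if it lives in an old generation, records it on the mutable list.
 * Later writes to an already dirty variable do no extra work. */
static void dirty_mut_var(MutList *ml, MutVar *mv)
{
    if (mv->state == MUT_VAR_CLEAN) {
        mv->state = MUT_VAR_DIRTY;
        if (mv->gen > 0) {
            assert(ml->n < MUT_LIST_MAX);
            ml->entries[ml->n++] = mv;
        }
    }
}

/* Counterpart of writeMutVar#: run the barrier, then store the value. */
static void write_mut_var(MutList *ml, MutVar *mv, void *new_value)
{
    dirty_mut_var(ml, mv);
    mv->value = new_value;
}
```

With this layout the collector only needs to visit the variables recorded in `MutList`, rather than every mutable variable in the old generations.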
*	[project @ 2005-11-04 12:02:04 by simonmar]	simonmar	2005-11-04	1	-1/+1
\| \| \| \|	Win32: Use CriticalSections instead of Mutexes, they are much faster.
*	[project @ 2005-10-27 15:26:06 by simonmar]	simonmar	2005-10-27	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Very simple work-sharing amongst Capabilities: whenever a Capability detects that it has more than 1 thread in its run queue, it runs around looking for empty Capabilities, and shares the threads on its run queue equally with the free Capabilities it finds.
	- Unlock the garbage collector's mutable lists, by having private mutable lists per capability (and per generation). The private mutable lists are moved onto the main mutable lists at each GC. This pulls the old-generation update code out of the storage manager mutex, which is one of the last remaining causes of (alleged) contention.
	- Fix some problems with synchronising when a GC is required. We should synchronise quicker now.
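The work-sharing rule in the first item can be sketched as follows. This is only an illustration under assumptions: `Cap`, `share_work`, the fixed-size arrays and the round-robin dealing order are all hypothetical, and the real scheduler operates on linked run queues of thread objects rather than arrays of ids.

```c
#include <stddef.h>

#define MAX_THREADS 32
#define MAX_CAPS    16

/* Hypothetical capability with an array-based run queue of thread ids. */
typedef struct {
    int    run_queue[MAX_THREADS];
    size_t n_threads;
} Cap;

/* If caps[me] has more than one runnable thread, deal its threads out
 * round-robin between itself and every capability whose run queue is
 * empty. Returns the number of threads that were moved. */
static size_t share_work(Cap *caps, size_t n_caps, size_t me)
{
    Cap *self = &caps[me];
    if (self->n_threads <= 1)
        return 0;

    /* Collect the idle capabilities. */
    Cap   *idle[MAX_CAPS];
    size_t n_idle = 0;
    for (size_t i = 0; i < n_caps && n_idle < MAX_CAPS; i++)
        if (i != me && caps[i].n_threads == 0)
            idle[n_idle++] = &caps[i];
    if (n_idle == 0)
        return 0;

    /* Slot 0 is ourselves, slots 1..n_idle are the idle capabilities. */
    size_t total = self->n_threads, kept = 0, moved = 0;
    for (size_t t = 0; t < total; t++) {
        size_t slot = t % (n_idle + 1);
        int    tid  = self->run_queue[t];
        if (slot == 0) {
            self->run_queue[kept++] = tid;   /* compact in place */
        } else {
            Cap *dst = idle[slot - 1];
            dst->run_queue[dst->n_threads++] = tid;
            moved++;
        }
    }
    self->n_threads = kept;
    return moved;
}
```

A capability with five threads and one idle neighbour keeps three and hands over two; a capability with a single thread never gives it away.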
*	[project @ 2005-10-25 15:27:22 by simonmar]	simonmar	2005-10-25	1	-1/+1
\| \| \| \|	Fix bug in allocateLocal, we weren't assigning bd->step properly
*	[project @ 2005-10-21 14:02:17 by simonmar]	simonmar	2005-10-21	1	-13/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Big re-hash of the threaded/SMP runtime
	This is a significant reworking of the threaded and SMP parts of the runtime. There are two overall goals here:
	- To push down the scheduler lock, reducing contention and allowing more parts of the system to run without locks. In particular, the scheduler does not require a lock any more in the common case.
	- To improve affinity, so that running Haskell threads stick to the same OS threads as much as possible.
	At this point we have the basic structure working, but there are some pieces missing. I believe it's reasonably stable - the important parts of the testsuite pass in all the (normal,threaded,SMP) ways.
	In more detail:
	- Each capability now has a run queue, instead of one global run queue. The Capability and Task APIs have been completely rewritten; see Capability.h and Task.h for the details.
	- Each capability has its own pool of worker Tasks. Hence, Haskell threads on a Capability's run queue will run on the same worker Task(s). As long as the OS is doing something reasonable, this should mean they usually stick to the same CPU. Another way to look at this is that we're assuming each Capability is associated with a fixed CPU.
	- What used to be StgMainThread is now part of the Task structure. Every OS thread in the runtime has an associated Task, and it can ask for its current Task at any time with myTask().
	- removed RTS_SUPPORTS_THREADS symbol, use THREADED_RTS instead (it is now defined for SMP too).
	- The RtsAPI has had to change; we must explicitly pass a Capability around now. The previous interface assumed some global state. SchedAPI has also changed a lot.
	- The OSThreads API now supports thread-local storage, used to implement myTask(), although it could be done more efficiently using gcc's __thread extension when available.
	- I've moved some POSIX-specific stuff into the posix subdirectory, moving in the direction of separating out platform-specific implementations.
	- lots of lock-debugging and assertions in the runtime. In particular, when DEBUG is on, we catch multiple ACQUIRE_LOCK()s, and there is also an ASSERT_LOCK_HELD() call.
	What's missing so far:
	- I have almost certainly broken the Win32 build, will fix soon.
	- any kind of thread migration or load balancing. This is high up the agenda, though.
	- various performance tweaks to do
	- throwTo and forkProcess still do not work in SMP mode
*	[project @ 2005-10-12 12:56:30 by simonmar]	simonmar	2005-10-12	1	-0/+4
\| \| \| \|	Fix assertion failure in memInventory() with SMP
*	[project @ 2005-08-02 14:59:39 by simonmar]	simonmar	2005-08-02	1	-5/+1
\| \| \| \|	small tidyup for memInventory
*	[project @ 2005-07-25 13:59:09 by simonmar]	simonmar	2005-07-25	1	-6/+12
\| \| \| \| \|	Tweaks to the GC to improve performance. Might be as much as 10% on some programs.
*	[project @ 2005-05-12 11:36:50 by stolz]	stolz	2005-05-12	1	-0/+2
\| \| \| \|	C89ify recent change
*	[project @ 2005-05-12 10:03:42 by simonmar]	simonmar	2005-05-12	1	-0/+20
\| \| \| \|	Fix more bugginess in allocateLocal().
*	[project @ 2005-05-11 12:44:26 by simonmar]	simonmar	2005-05-11	1	-1/+1
\| \| \| \| \|	allocateLocal(): bump the block count on the step, not the global alloc_blocks count.
*	[project @ 2005-05-11 09:09:03 by simonmar]	simonmar	2005-05-11	1	-4/+2
\| \| \| \|	Fix double-linking bug in new allocateLocal(), and fix one warning
*	[project @ 2005-05-10 13:25:41 by simonmar]	simonmar	2005-05-10	1	-43/+126
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Two SMP-related changes:
	- New storage manager interface: bdescr allocateLocal(StgRegTable reg, nat words) which allocates from the current thread's nursery (being careful not to clash with the heap pointer). It can do this without taking any locks; the lock only has to be taken if a block needs to be allocated. allocateLocal() is now used instead of allocate() in a few PrimOps. This removes locks from most Integer operations, cutting down the overhead for SMP a bit more. To make this work, we have to be able to grab the current thread's Capability out of thin air (i.e. when called from GMP), so the Capability subsystem needs to keep a hash from thread IDs to Capabilities.
	- Small MVar optimisation: instead of taking the global storage-manager lock, do our own locking of MVars with a bit of inline assembly (x86 only for now).
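The idea behind allocateLocal() is that the common case is a lock-free pointer bump in a thread-private nursery, with the lock needed only when a fresh block has to be obtained. A minimal sketch under assumptions: `Block`, `Nursery`, `allocate_local` and `get_block_locked` are hypothetical names, the block size is arbitrary, and the slow path merely counts its calls where a real runtime would take the storage manager lock.

```c
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_WORDS 1024

typedef struct Block {
    struct Block *link;              /* previously filled blocks */
    size_t        free;              /* index of the next free word */
    long          words[BLOCK_WORDS];
} Block;

/* A nursery is private to one thread, so the fast path needs no lock. */
typedef struct {
    Block *current;
    size_t n_blocks;                 /* block count kept per nursery */
} Nursery;

static size_t slow_path_calls = 0;

/* Slow path: stands in for taking the global lock and asking the block
 * allocator for a fresh block. Here it only counts how often it runs. */
static Block *get_block_locked(void)
{
    slow_path_calls++;
    return calloc(1, sizeof(Block));
}

/* Bump-allocate `words` words; returns NULL for oversized requests or
 * when no memory is available. */
static long *allocate_local(Nursery *n, size_t words)
{
    if (words > BLOCK_WORDS)
        return NULL;                 /* large objects are out of scope */

    Block *b = n->current;
    if (b == NULL || b->free + words > BLOCK_WORDS) {
        b = get_block_locked();
        if (b == NULL)
            return NULL;
        b->link = n->current;
        n->current = b;
        n->n_blocks++;
    }
    long *p = &b->words[b->free];
    b->free += words;
    return p;
}
```

Two small allocations in a row touch the slow path once; only a request that does not fit in the current block triggers it again.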
*	[project @ 2005-04-28 15:44:16 by simonmar]	simonmar	2005-04-28	1	-1/+1
\| \| \| \|	calcAllocated: fix small mis-calculation in the SMP case
*	[project @ 2005-04-27 14:37:26 by simonmar]	simonmar	2005-04-27	1	-6/+12
\| \| \| \| \| \|	When using -H<size> in SMP mode, divide the total nursery size amongst the various nurseries. -H<size> now does something reasonable with SMP.
*	[project @ 2005-04-27 14:25:17 by simonmar]	simonmar	2005-04-27	1	-1/+0
\| \| \| \| \| \| \|	Hold the sm_mutex around access to the mutable list. The SMP RTS now seems quite stable, I've run my simple test program with 64 threads without crashes.
*	[project @ 2005-04-22 13:12:41 by simonmar]	simonmar	2005-04-22	1	-4/+4
\| \| \| \|	checkSanity: fix bug in nursery checking
*	[project @ 2005-04-22 12:28:00 by simonmar]	simonmar	2005-04-22	1	-12/+1
\| \| \| \| \| \| \| \| \| \| \| \|	- Now that labels are always prefixed with '&' in .hc code, we have to fix some sloppiness in the RTS .cmm code. Fortunately it's not too painful. - SMP: acquire/release the storage manager lock around atomicModifyMutVar#. This is a hack: atomicModifyMutVar# isn't atomic under SMP otherwise, but the SM lock is a large sledgehammer. I think I'll apply the sledgehammer to the MVar primitives too, for the time being.
*	[project @ 2005-04-12 09:04:23 by simonmar]	simonmar	2005-04-12	1	-152/+230
\| \| \| \| \| \| \| \| \|	Per-task nurseries for SMP. This was kind-of implemented before, but it's much cleaner now. There is now one step per capability, so we have somewhere to hang the block count. So for SMP, there are simply multiple instances of generation 0 step 0. The rNursery entry in the register table now points to the step rather than the head block of the nursery.
*	[project @ 2005-04-10 21:44:10 by simonmar]	simonmar	2005-04-10	1	-0/+3
\| \| \| \|	Fix for Storage.c assertion failure
*	[project @ 2005-04-07 15:53:01 by simonmar]	simonmar	2005-04-07	1	-14/+7
\| \| \| \|	resetNurseries: tidy up
*	[project @ 2005-04-05 12:19:54 by simonmar]	simonmar	2005-04-05	1	-12/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some multi-processor hackery, including
	- Don't hang blocked threads off BLACKHOLEs any more, instead keep them all on a separate queue which is checked periodically for threads to wake up. This is good because (a) we don't have to worry about locking the closure in SMP mode when we want to block on it, and (b) it means the standard update code doesn't need to wake up any threads or check for a BLACKHOLE_BQ, simplifying the update code. The downside is that if there are lots of threads blocked on BLACKHOLEs, we might have to do a lot of repeated list traversal. We don't expect this to be common, though. conc023 goes slower with this change, but we expect most programs to benefit from the shorter update code.
	- Fixing up the Capability code to handle multiple capabilities (SMP mode), and related changes to get the SMP mode at least building.
*	[project @ 2005-03-09 08:51:31 by wolfgang]	wolfgang	2005-03-09	1	-13/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Retain all CAFs when dynamic Haskell libraries are used from GHCi. The Linker usually replaces references to newCAF with references to newDynCAF, but the system dynamic linker won't do that for us. Also, the situation is slightly different - we never want CAFs from dylibs to be reverted, because the dylibs might be used both by the interpreted program and by GHCi itself. So instead of just caf_list, there's now both caf_list and revertible_caf_list. newDynCAF adds a CAF to revertible_caf_list, and newCAF either adds the CAF to caf_list or to the mutable list, depending on whether we are in GHCi. This hack is only active when Linker.c has loaded libHSbase_dyn.[so\|dylib], but for now, it applies to all CAFs, not just dynamically-linked ones. If that is worth fixing, we could do that by checking whether the CAF closure or its info pointer is in the main executable's address range. MERGE TO STABLE
*	[project @ 2005-02-10 13:01:52 by simonmar]	simonmar	2005-02-10	1	-22/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	GC changes: instead of threading old-generation mutable lists through objects in the heap, keep it in a separate flat array. This has some advantages:
	- the IND_OLDGEN object is now only 2 words, so the minimum size of a THUNK is now 2 words instead of 3. This saves some amount of allocation (about 2% on average according to my measurements), and is more friendly to the cache by squashing objects together more.
	- keeping the mutable list separate from the IND object will be necessary for our multiprocessor implementation.
	- removing the mut_link field makes the layout of some objects more uniform, leading to less complexity and special cases.
	- I also unified the two mutable lists (mut_once_list and mut_list) into a single mutable list, which led to more simplifications in the GC.
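The difference between an intrusive list (a link field inside every mutable object) and a separate flat list can be sketched in C. This is an assumption-laden illustration: `MutBlock`, `MutList`, `record_mutable` and `mut_list_length` are hypothetical names, and the list is modelled as a chain of small fixed-size blocks of pointers, which is one plausible reading of "flat array" here.

```c
#include <stddef.h>
#include <stdlib.h>

#define MUT_BLOCK_ENTRIES 4   /* deliberately tiny for illustration */

/* One chunk of the mutable list: just an array of object pointers.
 * The objects themselves carry no link field. */
typedef struct MutBlock {
    struct MutBlock *link;
    size_t           n;
    void            *entries[MUT_BLOCK_ENTRIES];
} MutBlock;

typedef struct {
    MutBlock *head;
    size_t    n_blocks;
} MutList;

/* Record an object on the mutable list, chaining on a new block when
 * the current one is full. Returns 0 on success, -1 if out of memory. */
static int record_mutable(MutList *ml, void *obj)
{
    MutBlock *b = ml->head;
    if (b == NULL || b->n == MUT_BLOCK_ENTRIES) {
        b = calloc(1, sizeof *b);
        if (b == NULL)
            return -1;
        b->link = ml->head;
        ml->head = b;
        ml->n_blocks++;
    }
    b->entries[b->n++] = obj;
    return 0;
}

/* Total number of recorded objects across all chained blocks. */
static size_t mut_list_length(const MutList *ml)
{
    size_t total = 0;
    for (const MutBlock *b = ml->head; b != NULL; b = b->link)
        total += b->n;
    return total;
}
```

Because the list lives outside the heap objects, any object can be recorded without reserving a word in its layout, which is what allows the smaller IND_OLDGEN and THUNK sizes mentioned above.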
*	[project @ 2004-09-03 15:28:18 by simonmar]	simonmar	2004-09-03	1	-8/+7
\| \| \| \| \| \| \| \| \| \| \| \|	Cleanup: all (well, most) messages from the RTS now go through the functions in RtsUtils: barf(), debugBelch() and errorBelch(). The latter two were previously called belch() and prog_belch() respectively. See the comments for the right usage of these message functions. One reason for doing this is so that we can avoid spurious uses of stdout/stderr by Haskell apps on platforms where we shouldn't be using them (eg. non-console apps on Windows).
*	[project @ 2004-08-13 13:04:50 by simonmar]	simonmar	2004-08-13	1	-3/+2
\| \| \| \|	Merge backend-hacking-branch onto HEAD. Yay!
*	[project @ 2004-07-21 10:47:28 by simonmar]	simonmar	2004-07-21	1	-2/+2
\| \| \| \| \| \| \|	Make total_allocated be an ullong, to accommodate programs that do a lot of allocation. MERGE TO STABLE
*	[project @ 2003-10-24 09:56:45 by simonmar]	simonmar	2003-10-24	1	-1/+2
\| \| \| \| \| \| \|	When allocating a large object in gen 0, update the n_large_blocks count. I think this is just an accounting issue, and doesn't actually cause a space leak, but it does result in an assertion failure when running with sanity checking on.
*	[project @ 2003-10-24 09:52:51 by simonmar]	simonmar	2003-10-24	1	-6/+1
\| \| \| \|	Remove a comment that appears to contradict the code.
*	[project @ 2003-09-23 15:38:35 by simonmar]	simonmar	2003-09-23	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	Add a BF_PINNED block flag, and attach it to blocks containing pinned objects (in addition to the usual BF_LARGE). In heapCensus, we now ignore blocks containing pinned objects, because they might contain gaps, and in any case it isn't clear that we want to include the whole block in a heap census, because much of it might well be dead. Ignoring it isn't right either, though, so this patch just fixes the crash and leaves a ToDo.
*	[project @ 2003-03-26 18:59:34 by sof]	sof	2003-03-26	1	-20/+17
\| \| \| \|	allocatePinned(): move large-object check to the front
*	[project @ 2003-03-26 17:33:49 by sof]	sof	2003-03-26	1	-13/+9
\| \| \| \|	initStorage(): sm_mutex not correctly initialised (guess no one's done an SMP build for the past 12 months :)
*	[project @ 2003-03-21 16:18:37 by sof]	sof	2003-03-21	1	-3/+2
\| \| \| \| \| \|	Friday morning code-wibbling: - made RetainerProfile.c:firstStack a 'static' - added RetainerProfile.c:retainerStackBlocks()
*	[project @ 2003-02-01 09:10:16 by mthomas]	mthomas	2003-02-01	1	-1/+2
\| \| \| \|	Initialize stp->n_to_blocks as 0. Add function for MinGW32 in Signals.c.
*	[project @ 2003-01-29 10:28:56 by simonmar]	simonmar	2003-01-29	1	-5/+10
\| \| \| \| \| \| \| \|	Multi-init protection. Multiple inits now don't crash, but they still don't do anything sensible because the finalizers have been run during the first hs_exit().
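One common way to make repeated initialisation harmless is a nesting counter, sketched below. This is an assumption about the general technique, not a description of what this commit actually changed; `rts_init`, `rts_exit` and the counters are hypothetical names.

```c
/* Hypothetical counters: how many times real work was performed. */
static int init_count    = 0;
static int startup_runs  = 0;
static int shutdown_runs = 0;

/* Only the first of several nested init calls does real start-up work. */
static void rts_init(void)
{
    if (init_count++ > 0)
        return;
    startup_runs++;
}

/* Only the exit call matching the first init shuts down; unmatched
 * extra exit calls are ignored rather than crashing. */
static void rts_exit(void)
{
    if (init_count <= 0)
        return;
    if (--init_count > 0)
        return;
    shutdown_runs++;
}
```

As the message notes, protection of this kind only avoids the crash: once shutdown work such as running finalizers has happened, a later init cannot bring that state back.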
*	[project @ 2003-01-23 12:13:10 by simonmar]	simonmar	2003-01-23	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	- Add a new flag, -xt, which enables inclusion of TSOs in a heap profile. - Include large objects in heap profiles (except TSOs unless the -xt flag is given). - In order to make this work, I had to set the bd->free field of the block descriptor for a large object to the correct value. Previously, it pointed to the start of the block (i.e. the same as bd->start). I hope this doesn't have any other consequences; it looks more correct this way in any case.
*	[project @ 2002-12-19 14:33:22 by simonmar]	simonmar	2002-12-19	1	-10/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Terrible hack to restore CAF handling behaviour in GHCi (it's currently broken). The story used to be this: in newCAF(), if the CAF is in dynamically loaded code, then we save the CAF's info ptr in a spare slot in the closure, and add the CAF to the caf_list. The GC will retain everything on the caf_list. At any point the CAFs can all be reverted by replacing their info pointers from the saved copies. CAFs need to be retained for GHCi because they might be required in a future execution; an optimisation would be to avoid retaining the CAFs if we're in "revert mode"; i.e. the CAFs are all going to be reverted after execution anyway. Also, this only applies to CAFs in compiled code; CAFs in interpreted code are currently always retained. Anyway, the old story is harder now that I removed the code that checks whether a pointer is dynamically loaded or not (:-)). Rather than re-instate that code, I created a new version of newCAF (newDynCAF), and arranged that the dynamic linker redirects any references to newCAF to point to newDynCAF instead. The result is more efficient than before, and takes less code.
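The save-and-revert mechanism described above can be sketched in C. The names `Caf`, `new_dyn_caf`, `revert_cafs` and the `InfoTag` values are hypothetical simplifications: in the RTS the info pointer is a real pointer to an info table and the saved copy lives in a spare slot of the closure.

```c
#include <stddef.h>

/* Hypothetical tags standing in for info pointers. */
typedef enum { INFO_CAF_UNEVALUATED, INFO_CAF_EVALUATED } InfoTag;

typedef struct Caf {
    InfoTag     info;        /* current state of the closure */
    InfoTag     saved_info;  /* spare slot holding the original info */
    struct Caf *link;        /* next entry on the CAF list */
} Caf;

static Caf *caf_list = NULL;

/* Called when a CAF is first entered: remember its original info and
 * put it on the list that the GC retains. */
static void new_dyn_caf(Caf *caf)
{
    caf->saved_info = caf->info;
    caf->link = caf_list;
    caf_list = caf;
}

/* Revert every recorded CAF to its unevaluated state by restoring the
 * saved info, then empty the list. */
static void revert_cafs(void)
{
    for (Caf *c = caf_list; c != NULL; ) {
        Caf *next = c->link;
        c->info = c->saved_info;
        c->link = NULL;
        c = next;
    }
    caf_list = NULL;
}
```

After a revert, the next evaluation of each CAF starts from scratch, which is what lets an interactive session rerun a program with fresh top-level state.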
*	[project @ 2002-12-13 19:17:02 by wolfgang]	wolfgang	2002-12-13	1	-21/+1
\| \| \| \|	Remove Mac OS X-specific code for determining memory layout (no longer needed).
*	[project @ 2002-12-11 15:36:20 by simonmar]	simonmar	2002-12-11	1	-20/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Merge the eval-apply-branch on to the HEAD
	------------------------------------------
	This is a change to GHC's evaluation model in order to ultimately make GHC more portable and to reduce complexity in some areas. At some point we'll update the commentary to describe the new state of the RTS. Pending that, the highlights of this change are:
	- No more Su. The Su register is gone, update frames are one word smaller.
	- Slow-entry points and arg checks are gone. Unknown function calls are handled by automatically-generated RTS entry points (AutoApply.hc, generated by the program in utils/genapply).
	- The stack layout is stricter: there are no "pending arguments" on the stack any more, the stack is always strictly a sequence of stack frames. This means that there's no need for LOOKS_LIKE_GHC_INFO() or LOOKS_LIKE_STATIC_CLOSURE() any more, and GHC doesn't need to know how to find the boundary between the text and data segments (BIG WIN!).
	- A couple of nasty hacks in the mangler caused by the need to identify closure ptrs vs. info tables have gone away.
	- Info tables are a bit more complicated. See InfoTables.h for the details.
	- As a side effect, GHCi can now deal with polymorphic seq. Some bugs in GHCi which affected primitives and unboxed tuples are now fixed.
	- Binary sizes are reduced by about 7% on x86. Performance is roughly similar, some programs get faster while some get slower. I've seen GHCi perform worse on some examples, but haven't investigated further yet (GHCi performance should be about the same or better in theory).
	- Internally the code generator is rather better organised. I've moved info-table generation from the NCG into the main codeGen where it is shared with the C back-end; info tables are now emitted as arrays of words in both back-ends. The NCG is one step closer to being able to support profiling.
	This has all been fairly thoroughly tested, but no doubt I've messed up the commit in some way.
*	[project @ 2002-11-01 11:05:46 by simonmar]	simonmar	2002-11-01	1	-1/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix the heapCensus crash. It turned out that after a GC, the small_alloc_list might be non-empty if a new finalizer thread had been started. The last block on small_alloc_list doesn't have the free pointer set correctly (as a small optimisation, we don't normally set the free pointer after each allocation, only when the block is full). The result was that the free pointer contains the wrong value, and the heap census traverses garbage. The fix is to set the free pointer correctly before traversing small_alloc_list. The bug doesn't show up when DEBUG is on, because extra DEBUG checks cause the free pointer to be initialised to a sensible(-ish) value. Hence my difficulty in reproducing the bug. To reproduce: compile ghc-regress/lib/should_run/memo002 with profiling and run it with a sufficiently small sample interval (-i0.02 did it for me). Thanks to the kind folks at ARM for helping out with the debugging of this one. MERGE TO STABLE
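The bug described here comes from a lazily maintained free pointer: allocation advances a separate cursor and the block's own `free` field is only brought up to date occasionally. A minimal sketch, assuming hypothetical names (`Block`, `Allocator`, `alloc_words`, `sync_free_pointer`, `census_words`) and a tiny block size:

```c
#include <stddef.h>

#define BLOCK_WORDS 8

typedef struct {
    long  words[BLOCK_WORDS];
    long *free;               /* first unused word, as seen by a census */
} Block;

/* The allocator keeps its own cursor and, as an optimisation, does not
 * write it back to block->free on every allocation. */
typedef struct {
    Block *block;
    long  *hp;
} Allocator;

static void alloc_init(Allocator *a, Block *b)
{
    b->free  = b->words;
    a->block = b;
    a->hp    = b->words;
}

static long *alloc_words(Allocator *a, size_t n)
{
    size_t remaining = (size_t)(a->block->words + BLOCK_WORDS - a->hp);
    if (n > remaining)
        return NULL;
    long *p = a->hp;
    a->hp += n;               /* block->free is now stale */
    return p;
}

/* The fix in spirit: publish the cursor before anyone walks the block. */
static void sync_free_pointer(Allocator *a)
{
    a->block->free = a->hp;
}

/* A census trusts block->free to know how much of the block is in use. */
static size_t census_words(const Block *b)
{
    return (size_t)(b->free - b->words);
}
```

Without the call to `sync_free_pointer`, a census taken after a few allocations sees a stale `free` field and misjudges how much of the block holds live data.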
*	[project @ 2002-10-15 11:02:32 by simonmar]	simonmar	2002-10-15	1	-2/+10
\| \| \| \|	Slight fix to the allocated memory calculation