path: root/ghc/includes
* Reorganisation of the source tree
  Simon Marlow, 2006-04-07 (49 files, -10937/+0)

  Most of the other users of the fptools build system have migrated to Cabal, and with the move to darcs we can now flatten the source tree without losing history, so here goes.

  The main change is that the ghc/ subdir is gone, and most of what it contained is now at the top level.  The build system now makes no pretense at being multi-project, it is just the GHC build system.

  No doubt this will break many things, and there will be a period of instability while we fix the dependencies.  A straightforward build should work, but I haven't yet fixed binary/source distributions.  Changes to the Building Guide will follow, too.
* add freeStorage() prototype
  Simon Marlow, 2006-04-05 (1 file, -0/+1)
* Add a new primitive forkOn#, for forking a thread on a specific Capability
  Simon Marlow, 2006-03-27 (2 files, -1/+8)

  This gives some control over affinity, while we figure out the best way to automatically schedule threads to make best use of the available parallelism.

  In addition to the primitive, there is also:

    GHC.Conc.forkOnIO :: Int -> IO () -> IO ThreadId

  where 'forkOnIO i m' creates a thread on Capability (i `rem` N), where N is the number of available Capabilities set by +RTS -N.  Threads forked by forkOnIO do not automatically migrate when there are free Capabilities, like normal threads do.  Still, if you're using forkOnIO exclusively, it's a good idea to use +RTS -qm to disable work pushing anyway (work pushing takes too much time when the run queues are large; this is something we need to fix).
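  The (i `rem` N) mapping above can be sketched in C; the function name and types here are illustrative assumptions, not the RTS source:

  ```c
  /* A minimal sketch (not the RTS source) of how a forkOn# target
   * Capability could be chosen: the requested index is taken modulo the
   * number of Capabilities set by +RTS -N, matching (i `rem` N) above.
   * The function name is a hypothetical stand-in. */
  #include <assert.h>

  static unsigned int capability_for(unsigned int i, unsigned int n_capabilities)
  {
      return i % n_capabilities;  /* wrap around the available Capabilities */
  }

  int main(void)
  {
      /* with +RTS -N4, affinity requests wrap modulo 4 */
      assert(capability_for(0, 4) == 0);
      assert(capability_for(5, 4) == 1);
      assert(capability_for(7, 4) == 3);
      return 0;
  }
  ```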
* eliminate a warning
  Simon Marlow, 2006-03-27 (1 file, -1/+1)
* Add some more flexibility to the multiproc scheduler
  Simon Marlow, 2006-03-24 (2 files, -26/+31)

  There are two new options in the -threaded RTS:

    -qm    Don't automatically migrate threads between CPUs
    -qw    Migrate a thread to the current CPU when it is woken up

  Previously both of these were effectively off, i.e. threads were migrated between CPUs willy-nilly, and threads were always migrated to the current CPU when woken up.  This is the first step in tweaking the scheduling for more effective work balancing; there will no doubt be more to come.
* mkDerivedConstants.c depends on ghcplatform.h
  Duncan Coutts, 2006-03-24 (1 file, -1/+1)

  I think this missing dep is what broke my parallel build.  I used make -j2 with ghc-6.4.2.20060323 and got:

    ------------------------------------------------------------------------
    ==fptools== make boot -wr --jobserver-fds=3,11 -j;
     in /var/tmp/portage/ghc-6.4.2_pre20060323/work/ghc-6.4.2.20060323/ghc/includes
    ------------------------------------------------------------------------
    Creating ghcplatform.h... Done.
    gcc -O -O2 -march=k8 -pipe -Wa,--noexecstack -c mkDerivedConstants.c -o mkDerivedConstants.o
    In file included from ghcconfig.h:5,
                     from Stg.h:42,
                     from Rts.h:19,
                     from mkDerivedConstants.c:20:
    ghcplatform.h:1:1: unterminated #ifndef
    Done.

  With this patch applied I can no longer reproduce this build bug.  So I think this patch should be applied to the cvs ghc-6-4-branch too.
* ENTER(): avoid re-reading the info pointer of the closure when entering it
  Simon Marlow, 2006-03-14 (1 file, -4/+6)

  This fixes another instance of a subtle SMP bug (see patch "really nasty bug in SMP").
* Make it a fatal error to try to enter a PAP
  Simon Marlow, 2006-03-14 (1 file, -0/+1)

  This is just an assertion, in effect: we should never enter a PAP, but for convenience we previously attached the PAP apply code to the PAP info table.  The problem with this was that it made it harder to track down bugs that result in entering a PAP...
* Use Darwin-compatible x86 assembly syntax in SMP.h (lock/cmpxchg with a slash)
  wolfgang.thaller@gmx.net, 2006-03-06 (1 file, -1/+1)
* A better x86_64 register mapping, with more argument registers.
  Simon Marlow, 2006-02-28 (1 file, -32/+31)

  Now that we can handle using C argument registers as global registers, extend the x86_64 register mapping.  We now have 5 integer argument registers, 4 float, and 2 double (all caller-saves).  This results in a reasonable speedup on x86_64.
* pass arguments to unknown function calls in registers
  Simon Marlow, 2006-02-28 (1 file, -2/+16)

  We now have more stg_ap entry points: stg_ap_*_fast, which take arguments in registers according to the platform calling convention.  This is faster if the function being called is evaluated and has the right arity, which is the common case (see the eval/apply paper for measurements).

  We still need the stg_ap_*_info entry points for stack-based application, such as when a function is applied to too many arguments.  The stg_ap_*_fast functions actually just check for an evaluated function, and if they don't find one, push the args on the stack and invoke stg_ap_*_info.  (This might be slightly slower in some cases, but not the common case.)
* support LOCK_DEBUG for Windows
  kr.angelov@gmail.com, 2006-02-22 (1 file, -0/+15)
* bugfix for LDV profiling on 64-bit platforms
  Simon Marlow, 2006-02-23 (1 file, -1/+1)

  There was an integer overflow in the definition of LDV_RECORD_CREATE when StgWord is 64 bits.
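  The overflow described above belongs to a classic class of bug: shifting a narrow integer before widening it.  A hedged illustration (this is not the actual LDV_RECORD_CREATE definition, and the names here are assumptions):

  ```c
  /* Illustrative sketch of the 64-bit overflow class of bug described
   * above; not the actual LDV_RECORD_CREATE code.  If a 32-bit value is
   * shifted left while still 32 bits wide, the high bits are silently
   * lost before the result is widened to a 64-bit StgWord. */
  #include <assert.h>
  #include <stdint.h>

  typedef uint64_t StgWord;   /* StgWord is 64 bits on 64-bit platforms */

  int main(void)
  {
      uint32_t era = 1u << 20;            /* some hypothetical era value */

      /* buggy: the shift happens at 32 bits, wrapping to 0 */
      StgWord buggy = era << 16;          /* 1 << 36 truncated mod 2^32 */

      /* fixed: widen to StgWord *before* shifting */
      StgWord fixed = (StgWord)era << 16;

      assert(buggy == 0);
      assert(fixed == (StgWord)1 << 36);
      return 0;
  }
  ```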
* fix for ASSIGN_BaseReg() in the unregisterised way
  Simon Marlow, 2006-02-22 (1 file, -1/+1)
* fix a deadlock in atomicModifyMutVar#
  Simon Marlow, 2006-02-21 (1 file, -0/+1)

  atomicModifyMutVar# was re-using the storage manager mutex (sm_mutex) to get its atomicity guarantee in SMP mode.  But recently the addition of a call to dirty_MUT_VAR() to implement the read barrier led to a rare deadlock case, because dirty_MUT_VAR() very occasionally needs to allocate a new block to chain on the mutable list, which requires sm_mutex.
* SMP support (xchg(), cas() and mb()) for PowerPC
  wolfgang.thaller@gmx.net, 2006-02-12 (1 file, -2/+33)
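  For reference, the semantics that cas() provides can be sketched portably with GCC's __sync builtin.  The real RTS versions are hand-written per-architecture inline assembly; this is only a behavioural sketch, and the StgWord typedef below is an assumption:

  ```c
  /* Behavioural sketch of the RTS cas() primitive: atomically replace
   * *p with n only if it currently holds o, returning the value that
   * was seen.  The real implementations are per-architecture inline
   * assembly; this sketch uses GCC's __sync builtin instead. */
  #include <assert.h>
  #include <stdint.h>

  typedef uintptr_t StgWord;

  static StgWord cas(volatile StgWord *p, StgWord o, StgWord n)
  {
      return __sync_val_compare_and_swap(p, o, n);
  }

  int main(void)
  {
      volatile StgWord lock = 0;

      assert(cas(&lock, 0, 1) == 0);  /* success: returns old value 0 */
      assert(lock == 1);
      assert(cas(&lock, 0, 2) == 1);  /* failure: lock was 1, unchanged */
      assert(lock == 1);
      return 0;
  }
  ```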
* Merge the smp and threaded RTS ways
  Simon Marlow, 2006-02-09 (8 files, -39/+32)

  Now, the threaded RTS also includes SMP support.  The -smp flag is a synonym for -threaded.  The performance implications of this are small to negligible, and it results in a code cleanup and reduces the number of combinations we have to test.
* change dirty_MUT_VAR() to use recordMutableCap()
  Simon Marlow, 2006-02-09 (2 files, -2/+2)

  Rather than recordMutableGen(); the former works better in SMP.
* fix for the unregisterised way
  Simon Marlow, 2006-02-09 (1 file, -0/+10)

  We always assign to BaseReg on return from resumeThread(), but in cases where BaseReg is not an lvalue (e.g. unreg) we need to disable this assignment.  See comments for more details.
* fix a bug in closure_sizeW_()
  Simon Marlow, 2006-02-08 (1 file, -0/+3)
* make the smp way RTS-only, normal libraries now work with -smp
  Simon Marlow, 2006-02-08 (6 files, -100/+111)

  We had to bite the bullet here and add an extra word to every thunk, to enable running ordinary libraries on SMP.  Otherwise, we would have needed to ship an extra set of libraries with GHC 6.6 in addition to the two sets we already ship (normal + profiled), and all Cabal packages would have to be compiled for SMP too.  We decided it best just to take the hit now, making SMP easily accessible to everyone in GHC 6.6.

  Incidentally, although this increases allocation by around 12% on average, the performance hit is around 5%, and much less if your inner loop doesn't use any laziness.
* implement clean/dirty TSOs
  Simon Marlow, 2006-01-23 (1 file, -2/+19)

  Along the lines of the clean/dirty arrays and IORefs implemented recently, now threads are marked clean or dirty depending on whether they need to be scanned during a minor GC or not.  This should speed up GC when there are lots of threads, especially if most of them are idle.
* remove old comment
  Simon Marlow, 2006-01-23 (1 file, -4/+0)
* [project @ 2006-01-18 10:31:50 by simonmar]
  simonmar, 2006-01-18 (1 file, -0/+24)

  - fix a mixup in Capability.c regarding signals: signals_pending() is not used in THREADED_RTS
  - some cleanups and warning removal while I'm here
* [project @ 2006-01-17 16:13:18 by simonmar]
  simonmar, 2006-01-17 (4 files, -20/+33)

  Improve the GC behaviour of IORefs (see Ticket #650).

  This is a small change to the way IORefs interact with the GC, which should improve GC performance for programs with plenty of IORefs.

  Previously we had a single closure type for mutable variables, MUT_VAR.  Mutable variables were *always* on the mutable list in older generations, and always traversed on every GC.

  Now, we have two closure types: MUT_VAR_CLEAN and MUT_VAR_DIRTY.  The latter is on the mutable list, but the former is not.  (NB. this differs from MUT_ARR_PTRS_CLEAN and MUT_ARR_PTRS_DIRTY, both of which are on the mutable list.)

  writeMutVar# now implements a write barrier, by calling dirty_MUT_VAR() in the runtime, that does the necessary modification of MUT_VAR_CLEAN into MUT_VAR_DIRTY, adding to the mutable list if necessary.

  This results in some pretty dramatic speedups for GHC itself.  I've just measured a 30% overall speedup compiling a 31-module program (anna) with the default heap settings :-D
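  The MUT_VAR_CLEAN/MUT_VAR_DIRTY write barrier described above can be sketched as follows.  The struct layout and all names here are simplified assumptions, not the RTS source:

  ```c
  /* Simplified sketch of the writeMutVar# write barrier described
   * above: a clean mutable variable is marked dirty and chained onto
   * the mutable list on its first write since the last GC; subsequent
   * writes skip the bookkeeping.  All names are illustrative. */
  #include <assert.h>
  #include <stddef.h>

  enum closure_type { MUT_VAR_CLEAN, MUT_VAR_DIRTY };

  struct mut_var {
      enum closure_type type;
      void             *var;        /* the mutable variable's payload */
      struct mut_var   *mut_link;   /* link on the mutable list */
  };

  static struct mut_var *mutable_list = NULL;

  /* the barrier writeMutVar# calls before performing the store */
  static void dirty_mut_var(struct mut_var *mv)
  {
      if (mv->type == MUT_VAR_CLEAN) {     /* first write since GC? */
          mv->type = MUT_VAR_DIRTY;
          mv->mut_link = mutable_list;     /* push onto the mutable list */
          mutable_list = mv;
      }
  }

  static void write_mut_var(struct mut_var *mv, void *value)
  {
      dirty_mut_var(mv);
      mv->var = value;
  }

  int main(void)
  {
      struct mut_var mv = { MUT_VAR_CLEAN, NULL, NULL };
      int x = 1, y = 2;

      write_mut_var(&mv, &x);
      write_mut_var(&mv, &y);              /* second write: no re-linking */

      assert(mv.type == MUT_VAR_DIRTY);
      assert(mutable_list == &mv);         /* on the list exactly once */
      assert(mv.mut_link == NULL);
      assert(mv.var == &y);
      return 0;
  }
  ```

  The GC would then traverse mutable_list, reset surviving variables to MUT_VAR_CLEAN, and empty the list, so the next write re-adds them.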
* [project @ 2006-01-17 16:03:47 by simonmar]
  simonmar, 2006-01-17 (2 files, -23/+26)

  Improve the GC behaviour of IOArrays/STArrays.  See Ticket #650.

  This is a small change to the way mutable arrays interact with the GC, that can have a dramatic effect on performance, and make tricks with unsafeThaw/unsafeFreeze redundant.  Data.HashTable should be faster now (I haven't measured it yet).

  We now have two mutable array closure types, MUT_ARR_PTRS_CLEAN and MUT_ARR_PTRS_DIRTY.  Both are on the mutable list if the array is in an old generation.  writeArray# sets the type to MUT_ARR_PTRS_DIRTY.  The garbage collector can set the type to MUT_ARR_PTRS_CLEAN if it finds that no element of the array points into a younger generation (discovering this required a small addition to evacuate(), but rough tests indicate that it doesn't measurably affect performance).

  NOTE: none of this affects unboxed arrays (IOUArray/STUArray), only boxed arrays (IOArray/STArray).

  We could go further and extend the DIRTY bit to be per-block rather than for the whole array, but for now this is an easy improvement.
* [project @ 2006-01-16 16:38:24 by simonmar]
  simonmar, 2006-01-16 (1 file, -0/+8)

  Default signal handlers weren't being installed; amazing that this has been broken ever since I rearranged the signal handling code.
* [project @ 2006-01-11 16:58:53 by simonmar]
  simonmar, 2006-01-11 (1 file, -1/+1)

  MAYBE_GC: we should check alloc_blocks in addition to CurrentNursery, since some allocateLocal calls don't allocate from the nursery.
* [project @ 2006-01-04 12:51:59 by simonmar]
  simonmar, 2006-01-04 (1 file, -1/+0)

  remove duplicate definition
* [project @ 2006-01-03 12:53:40 by simonmar]
  simonmar, 2006-01-03 (1 file, -10/+16)

  For TSO fields, define a Cmm macro TSO_OFFSET_xxx to get the actual offset including the header and variable parts (we were misusing the headerless OFFSET_xxx macros in a couple of places).
* [project @ 2005-12-02 14:09:21 by simonmar]
  simonmar, 2005-12-02 (1 file, -12/+2)

  revert rev. 1.22 again, just in case this is the cause of the segfaults reported on OpenBSD and SuSE.
* [project @ 2005-11-28 14:39:47 by simonmar]
  simonmar, 2005-11-28 (3 files, -2/+2)

  Small performance improvement to STM: reduce the size of an atomically frame from 3 words to 2 words by combining the "waiting" boolean field with the info pointer, i.e. having two separate info tables/return addresses for an atomically frame, one for the normal case and one for the waiting case.
* [project @ 2005-11-25 13:56:16 by simonmar]
  simonmar, 2005-11-25 (1 file, -3/+0)

  oops, undo previous (SMP.h is already included)
* [project @ 2005-11-25 13:10:04 by simonmar]
  simonmar, 2005-11-25 (1 file, -0/+3)

  #include SMP.h
* [project @ 2005-11-25 13:06:25 by simonmar]
  simonmar, 2005-11-25 (1 file, -1/+13)

  define wb() and xchg() for non-SMP versions of the RTS
* [project @ 2005-11-24 16:23:48 by simonmar]
  simonmar, 2005-11-24 (1 file, -5/+5)

  fix some (thankfully harmless) typos
* [project @ 2005-11-24 14:21:33 by simonmar]
  simonmar, 2005-11-24 (1 file, -4/+34)

  unlockClosure() requires a write barrier for the compiler.  Write barriers aren't required for the CPU, but gcc re-orders non-aliasing writes unless we use an explicit barrier.  This only just showed up when we started compiling the RTS with -O2.
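  The compiler-only barrier described above is typically an empty asm statement with a "memory" clobber.  A hedged sketch; the closure layout and function names below are illustrative assumptions, not the RTS's actual unlockClosure():

  ```c
  /* Sketch of a compiler-level write barrier: an empty asm statement
   * with a "memory" clobber stops gcc from reordering stores across
   * it, while emitting no instructions.  The closure/lock layout here
   * is an illustrative assumption, not the real unlockClosure(). */
  #include <assert.h>

  #define write_barrier() __asm__ __volatile__ ("" : : : "memory")

  struct closure {
      const void *info;     /* info pointer doubles as the lock word */
      void       *payload;
  };

  static const int unlocked_info = 0;  /* stand-in for an info table */

  static void unlock_closure(struct closure *c, void *payload)
  {
      c->payload = payload;
      /* ensure the payload store is emitted before the unlocking
       * store; without this, gcc -O2 may reorder the two non-aliasing
       * writes, publishing the closure before its payload is valid */
      write_barrier();
      c->info = &unlocked_info;
  }

  int main(void)
  {
      struct closure c = { NULL, NULL };
      int v = 42;
      unlock_closure(&c, &v);
      assert(c.info == (const void *)&unlocked_info);
      assert(c.payload == &v);
      return 0;
  }
  ```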
* [project @ 2005-11-23 14:28:52 by simonmar]
  simonmar, 2005-11-23 (1 file, -2/+12)

  un-revert rev. 1.22, it wasn't the cause of last weekend's breakage
* [project @ 2005-11-21 20:00:55 by tharris]
  tharris, 2005-11-21 (2 files, -12/+11)

  Files missed from STM implementation changes
* [project @ 2005-11-19 11:44:32 by simonmar]
  simonmar, 2005-11-19 (1 file, -12/+2)

  something has gone wrong; I don't have time right now to find out exactly what, so revert rev. 1.22 in an attempt to fix it.
* [project @ 2005-11-18 15:24:12 by simonmar]
  simonmar, 2005-11-18 (5 files, -9/+33)

  Two improvements to the SMP runtime:

  - support for 'par', aka sparks.  Load balancing is very primitive right now, but I have seen programs that go faster using par.

  - support for backing off when a thread is found to be duplicating a computation currently underway in another thread.  This also fixes some instability in SMP, because it turned out that when an update frame points to an indirection, which can happen if a thunk is under evaluation in multiple threads, then after GC has shorted out the indirection the update will trash the value.  Now we suspend the duplicate computation to the heap before this can happen.

  Additionally:

  - stack squeezing is separate from lazy blackholing, and now only happens if there's a reasonable amount of squeezing to be done in relation to the number of words of stack that have to be moved.  This means we won't try to shift 10Mb of stack just to save 2 words at the bottom (it probably never happened, but still).

  - update frames are now marked when they have been visited by lazy blackholing, as per the SMP paper.

  - cleaned up raiseAsync() a bit.
* [project @ 2005-11-18 15:23:09 by simonmar]
  simonmar, 2005-11-18 (1 file, -3/+1)

  cosmetic
* [project @ 2005-11-18 15:13:46 by simonmar]
  simonmar, 2005-11-18 (1 file, -0/+18)

  Add wcStore(), a write-combining store if supported (I tried using it in the update code and only succeeded in making things slower, but it might come in handy in the future)
* [project @ 2005-11-18 14:24:47 by simonmar]
  simonmar, 2005-11-18 (1 file, -2/+12)

  Omit the __DISCARD__() call in FB_ if __GNUC__ >= 3.  It doesn't appear to be necessary now, and it prevents some gcc optimisations.
* [project @ 2005-11-10 16:14:01 by simonmar]
  simonmar, 2005-11-10 (1 file, -0/+1)

  Fix a crash in STM; we were releasing ownership of the transaction too early in stmWait(), so a TSO could be woken up before we had finished putting it to sleep properly.
* [project @ 2005-11-04 12:02:04 by simonmar]
  simonmar, 2005-11-04 (1 file, -5/+25)

  Win32: Use CriticalSections instead of Mutexes, they are *much* faster.
* [project @ 2005-11-03 14:35:20 by simonmar]
  simonmar, 2005-11-03 (1 file, -2/+4)

  Modify ACQUIRE_LOCK/RELEASE_LOCK for use in .cmm files
* [project @ 2005-10-27 15:26:06 by simonmar]
  simonmar, 2005-10-27 (1 file, -360/+0)

  - Very simple work-sharing amongst Capabilities: whenever a Capability detects that it has more than 1 thread in its run queue, it runs around looking for empty Capabilities, and shares the threads on its run queue equally with the free Capabilities it finds.

  - Unlock the garbage collector's mutable lists, by having private mutable lists per capability (and per generation).  The private mutable lists are moved onto the main mutable lists at each GC.  This pulls the old-generation update code out of the storage manager mutex, which is one of the last remaining causes of (alleged) contention.

  - Fix some problems with synchronising when a GC is required.  We should synchronise quicker now.
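  The first point, sharing a run queue equally with the free Capabilities found, can be sketched with a simplified model in which plain arrays stand in for the RTS's run queues; all names below are assumptions, not the scheduler source:

  ```c
  /* Simplified model of the work-sharing described above: a Capability
   * with more than one runnable thread deals its threads out round-
   * robin to the free Capabilities it finds, keeping an equal share
   * for itself.  Arrays stand in for run queues; names are made up. */
  #include <assert.h>

  #define MAX_THREADS 16

  struct capability {
      int threads[MAX_THREADS];   /* thread ids on this run queue */
      int n_threads;
  };

  /* Deal the source queue out across dst[0..n_dst-1] plus the source
   * itself, one thread at a time, as evenly as possible. */
  static void share_work(struct capability *src,
                         struct capability **dst, int n_dst)
  {
      if (src->n_threads <= 1 || n_dst == 0) return;

      int pending[MAX_THREADS], n = src->n_threads;
      for (int i = 0; i < n; i++) pending[i] = src->threads[i];

      src->n_threads = 0;
      for (int i = 0; i < n; i++) {
          int slot = i % (n_dst + 1);      /* 0 = keep, 1.. = give away */
          struct capability *cap = (slot == 0) ? src : dst[slot - 1];
          cap->threads[cap->n_threads++] = pending[i];
      }
  }

  int main(void)
  {
      struct capability busy  = { {1, 2, 3, 4, 5, 6}, 6 };
      struct capability idle1 = { {0}, 0 }, idle2 = { {0}, 0 };
      struct capability *free_caps[] = { &idle1, &idle2 };

      share_work(&busy, free_caps, 2);

      assert(busy.n_threads == 2);    /* kept an equal share */
      assert(idle1.n_threads == 2);
      assert(idle2.n_threads == 2);
      return 0;
  }
  ```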
* [project @ 2005-10-26 10:42:54 by simonmar]
  simonmar, 2005-10-26 (2 files, -2/+2)

  - change the type of StgRun(): now we return the Capability that the thread currently holds.  The return status of the thread is now stored in cap->r.rRet (a new slot in the reg table).  This was necessary because on return from StgRun(), the current TSO may be blocked, so it no longer belongs to us.  If it is a bound thread, then the Task may have been already woken up on another Capability, so the scheduler can't use task->cap to find the capability it currently owns.

  - when shutting down, allow a bound thread to remove its TSO from the run queue when exiting (eliminates an error condition in releaseCapability()).
* [project @ 2005-10-24 09:28:38 by simonmar]
  simonmar, 2005-10-24 (1 file, -1/+1)

  Fix build for way "u"