summaryrefslogtreecommitdiff
path: root/rts/Capability.c
Commit message (Collapse)AuthorAgeFilesLines
...
* Instead of a separate context-switch flag, set HpLim to zeroSimon Marlow2009-03-131-8/+11
| | | | | | | | | | | | This reduces the latency between a context-switch being triggered and the thread returning to the scheduler, which in turn should reduce the cost of the GC barrier when there are many cores. We still retain the old context_switch flag which is checked at the end of each block of allocation. The idea is that setting HpLim may fail if the the target thread is modifying HpLim at the same time; the context_switch flag is a fallback. It also allows us to "context switch soon" without forcing an immediate switch, which can be costly.
* Keep the remembered sets local to each thread during parallel GCSimon Marlow2009-01-121-0/+3
| | | | | | | | | | | | | | | | | | | | | This turns out to be quite vital for parallel programs: - The way we discover which threads to traverse is by finding dirty threads via the remembered sets (aka mutable lists). - A dirty thread will be on the remembered set of the capability that was running it, and we really want to traverse that thread's stack using the GC thread for the capability, because it is in that CPU's cache. If we get this wrong, we get penalised badly by the memory system. Previously we had per-capability mutable lists but they were aggregated before GC and traversed by just one of the GC threads. This resulted in very poor performance particularly for parallel programs with deep stacks. Now we keep per-capability remembered sets throughout GC, which also removes a lock (recordMutableGen_sync).
* Use mutator threads to do GC, instead of having a separate pool of GC threadsSimon Marlow2008-11-211-55/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, the GC had its own pool of threads to use as workers when doing parallel GC. There was a "leader", which was the mutator thread that initiated the GC, and the other threads were taken from the pool. This was simple and worked fine for sequential programs, where we did most of the benchmarking for the parallel GC, but falls down for parallel programs. When we have N mutator threads and N cores, at GC time we would have to stop N-1 mutator threads and start up N-1 GC threads, and hope that the OS schedules them all onto separate cores. It practice it doesn't, as you might expect. Now we use the mutator threads to do GC. This works quite nicely, particularly for parallel programs, where each mutator thread scans its own spark pool, which is probably in its cache anyway. There are some flag changes: -g<n> is removed (-g1 is still accepted for backwards compat). There's no way to have a different number of GC threads than mutator threads now. -q1 Use one OS thread for GC (turns off parallel GC) -qg<n> Use parallel GC for generations >= <n> (default: 1) Using parallel GC only for generations >=1 works well for sequential programs. Compiling an ordinary sequential program with -threaded and running it with -N2 or more should help if you do a lot of GC. I've found that adding -qg0 (do parallel GC for generation 0 too) speeds up some parallel programs, but slows down some sequential programs. Being conservative, I left the threshold at 1. ToDo: document the new options.
* don't run sparks if there are other threads on this CapabilitySimon Marlow2008-11-141-4/+7
|
* Add optional eager black-holing, with new flag -feager-blackholingSimon Marlow2008-11-181-0/+1
| | | | | | | | | | | | | | | Eager blackholing can improve parallel performance by reducing the chances that two threads perform the same computation. However, it has a cost: one extra memory write per thunk entry. To get the best results, any code which may be executed in parallel should be compiled with eager blackholing turned on. But since there's a cost for sequential code, we make it optional and turn it on for the parallel package only. It might be a good idea to compile applications (or modules) with parallel code in with -feager-blackholing. ToDo: document -feager-blackholing.
* move an assertionSimon Marlow2008-11-131-2/+2
|
* re-instate counting of sparks convertedSimon Marlow2008-11-061-3/+16
| | | | lost in patch "Run sparks in batches"
* Run sparks in batches, instead of creating a new thread for each oneSimon Marlow2008-11-061-7/+22
| | | | | Signficantly reduces the overhead for par, which means that we can make use of paralellism at a much finer granularity.
* Move the freeing of Capabilities later in the shutdown sequenceSimon Marlow2008-10-241-3/+16
| | | | | Fixes a bug whereby the Capability has been freed but other Capabilities are still trying to steal sparks from its pool.
* fix a warningSimon Marlow2008-10-231-1/+0
|
* traverse the spark pools only once during GC rather than twiceSimon Marlow2008-10-221-17/+8
|
* Refactoring and reorganisation of the schedulerSimon Marlow2008-10-221-72/+53
| | | | | | | | | | | | | | | | | Change the way we look for work in the scheduler. Previously, checking to see whether there was anything to do was a non-side-effecting operation, but this has changed now that we do work-stealing. This lead to a refactoring of the inner loop of the scheduler. Also, lots of cleanup in the new work-stealing code, but no functional changes. One new statistic is added to the +RTS -s output: SPARKS: 1430 (2 converted, 1427 pruned) lets you know something about the use of `par` in the program.
* Work stealing for sparksberthold@mathematik.uni-marburg.de2008-09-151-3/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | Spark stealing support for PARALLEL_HASKELL and THREADED_RTS versions of the RTS. Spark pools are per capability, separately allocated and held in the Capability structure. The implementation uses Double-Ended Queues (deque) and cas-protected access. The write end of the queue (position bottom) can only be used with mutual exclusion, i.e. by exactly one caller at a time. Multiple readers can steal()/findSpark() from the read end (position top), and are synchronised without a lock, based on a cas of the top position. One reader wins, the others return NULL for a failure. Work stealing is called when Capabilities find no other work (inside yieldCapability), and tries all capabilities 0..n-1 twice, unless a theft succeeds. Inside schedulePushWork, all considered cap.s (those which were idle and could be grabbed) are woken up. Future versions should wake up capabilities immediately when putting a new spark in the local pool, from newSpark(). Patch has been re-recorded due to conflicting bugfixes in the sparks.c, also fixing a (strange) conflict in the scheduler.
* Move the context_switch flag into the CapabilitySimon Marlow2008-09-191-0/+15
| | | | | Fixes a long-standing bug that could in some cases cause sub-optimal scheduling behaviour.
* Fix race condition in wakeupThreadOnCapability() (#2574)Simon Marlow2008-09-091-40/+23
| | | | | | | | | | | wakeupThreadOnCapbility() is used to signal another capability that there is a thread waiting to be added to its run queue. It adds the thread to the (locked) wakeup queue on the remote capability. In order to do this, it has to modify the TSO's link field, which has a write barrier. The write barrier might put the TSO on the mutable list, and the bug was that it was using the mutable list of the *target* capability, which we do not have exclusive access to. We should be using the current Capabilty's mutable list in this case.
* Capability stopping when waiting for GCberthold@mathematik.uni-marburg.de2008-08-191-1/+26
|
* Undo fix for #2185: sparks really should be treated as rootsSimon Marlow2008-07-231-15/+5
| | | | | Unless sparks are roots, strategies don't work at all: all the sparks get GC'd. We need to think about this some more.
* FIX #2185: sparks should not be treated as roots by the GCSimon Marlow2008-04-241-4/+28
|
* Reorganisation to fix problems related to the gct register variableSimon Marlow2008-04-161-0/+50
| | | | | | | | | - GCAux.c contains code not compiled with the gct register enabled, it is callable from outside the GC - marking functions are moved to their relevant subsystems, outside the GC - mark_root needs to save the gct register, as it is called from outside the GC
* hs_exit()/shutdownHaskell(): wait for outstanding foreign calls to complete ↵Simon Marlow2007-07-241-2/+18
| | | | | | | | | | | | | | | | before returning This is pertinent to #1177. When shutting down a DLL, we need to be sure that there are no OS threads that can return to the code that we are about to unload, and previously the threaded RTS was unsafe in this respect. When exiting a standalone program we don't have to be quite so paranoid: all the code will disappear at the same time as any running threads. Happily exiting a program happens via a different path: shutdownHaskellAndExit(). If we're about to exit(), then there's no need to wait for foreign calls to complete.
* Warning police: "%p" format expects a void*sven.panne@aedion.de2007-02-031-1/+1
|
* Partial fix for #926Simon Marlow2007-02-011-0/+24
| | | | | | | | | | | | | | It seems that when a program exits with open DLLs on Windows, the system attempts to shut down the DLLs, but it also terminates (some of?) the running threads. The RTS isn't prepared for threads to die unexpectedly, so it sits around waiting for its workers to finish. This bites in two places: ShutdownIOManager() in the the unthreaded RTS, and shutdownCapability() in the threaded RTS. So far I've modified the latter to notice when worker threads have died unexpectedly and continue shutting down. It seems a bit trickier to fix the unthreaded RTS, so for now the workaround for #926 is to use the threaded RTS.
* Free more things that we allocate2006-12-16Ian Lynagh2006-12-151-2/+8
|
* Free various things we allocateIan Lynagh2006-12-151-0/+2
|
* remove unused includes, now that Storage.h & Stable.h are included by Rts.hSimon Marlow2006-11-151-1/+0
|
* Split GC.c, and move storage manager into sm/ directorySimon Marlow2006-10-241-0/+1
| | | | | | | | | | | | | | | | | In preparation for parallel GC, split up the monolithic GC.c file into smaller parts. Also in this patch (and difficult to separate, unfortunatley): - Don't include Stable.h in Rts.h, instead just include it where necessary. - consistently use STATIC_INLINE in source files, and INLINE_HEADER in header files. STATIC_INLINE is now turned off when DEBUG is on, to make debugging easier. - The GC no longer takes the get_roots function as an argument. We weren't making use of this generalisation.
* STM invariantstharris@microsoft.com2006-10-071-1/+2
|
* don't closeMutex() the Capability lockSimon Marlow2006-08-311-2/+4
| | | | | | | | | | | | | There might be threads in foreign calls that will attempt to return via resumeThread() and grab this lock, so we can't safely destroy it. Fixes one cause of internal error: ASSERTION FAILED: file Capability.c, line 90 although I haven't repeated that assertion failure in the wild, only with a specially crafted test case, so I can't be sure I really got it.
* shutdownCapability(): don't bail out after 50 iterationsSimon Marlow2006-08-251-1/+7
| | | | | | See comments for details. Fixes assertion failures in stage 3 build which appeared after recent closeMutex() addidion. May fix other shutdown issues.
* Add closeMutex and use it on clean upEsa Ilari Vuokko2006-08-231-0/+3
|
* Asynchronous exception support for SMPSimon Marlow2006-06-161-1/+30
| | | | | | | | | | | | | | | | | This patch makes throwTo work with -threaded, and also refactors large parts of the concurrency support in the RTS to clean things up. We have some new files: RaiseAsync.{c,h} asynchronous exception support Threads.{c,h} general threading-related utils Some of the contents of these new files used to be in Schedule.c, which is smaller and cleaner as a result of the split. Asynchronous exception support in the presence of multiple running Haskell threads is rather tricky. In fact, to my annoyance there are still one or two bugs to track down, but the majority of the tests run now.
* New tracing interfaceSimon Marlow2006-06-081-21/+25
| | | | | | | | A simple interface for generating trace messages with timestamps and thread IDs attached to them. Most debugging output goes through this interface now, so it is straightforward to get timestamped debugging traces with +RTS -vt. Also, we plan to use this to generate parallelism profiles from the trace output.
* Reorganisation of the source treeSimon Marlow2006-04-071-0/+668
Most of the other users of the fptools build system have migrated to Cabal, and with the move to darcs we can now flatten the source tree without losing history, so here goes. The main change is that the ghc/ subdir is gone, and most of what it contained is now at the top level. The build system now makes no pretense at being multi-project, it is just the GHC build system. No doubt this will break many things, and there will be a period of instability while we fix the dependencies. A straightforward build should work, but I haven't yet fixed binary/source distributions. Changes to the Building Guide will follow, too.