path: root/rts/Task.c
Commit message | Author | Age | Files | Lines
* rts/Task: Move debugTrace to avoid data race | Ben Gamari | 2020-11-24 | 1 | -2/+2
| | | | Specifically, we need to hold all_tasks_mutex to read taskCount.
* rts: Eliminate shutdown data race on task counters | Ben Gamari | 2020-11-24 | 1 | -0/+1
|
* rts: Use relaxed operations for cap->running_task (TODO) | Ben Gamari | 2020-11-24 | 1 | -1/+1
| | | | | This shouldn't be necessary since only the owning thread of the capability should be touching this.
* rts: Ensure that task->id is initialized | Ben Gamari | 2018-12-07 | 1 | -0/+1
| | | | | | | | | | Reviewers: erikd, simonmar Reviewed By: simonmar Subscribers: rwbarton, carter Differential Revision: https://phabricator.haskell.org/D5325
* Fix a few GCC warnings | Michal Terepeta | 2018-05-13 | 1 | -1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | GCC 8 now generates warnings for incompatible function pointer casts [-Werror=cast-function-type]. Apparently there are a few of those in rts code, which makes `./validate` unhappy (since we compile with `-Werror`). This commit tries to fix these issues by changing the functions to have the correct type (and, if necessary, moving the casts into those functions). For instance, hash/comparison functions are declared (`Hash.h`) to take `StgWord` but we want to use `StgWord64[2]` in `StaticPtrTable.c`. Instead of casting the function pointers, we can cast the `StgWord` parameter to `StgWord*`. I think this should be ok since `StgWord` should be the same size as a pointer. Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: ./validate Reviewers: bgamari, erikd, simonmar Reviewed By: bgamari Subscribers: rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4673
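The approach described in this commit message (give the callback the declared type and cast the parameter inside it, rather than casting the function pointer) can be sketched roughly as follows. This is not the code from `Hash.h` or `StaticPtrTable.c`: `Word`, `CompareFn`, `compare_key128` and `keys_equal` are hypothetical names, and `Word` is assumed to be pointer-sized, as the message assumes for `StgWord`.

```c
#include <stdint.h>

/* Hypothetical stand-in for StgWord: assumed here to be pointer-sized. */
typedef uintptr_t Word;

/* The comparison signature the (hypothetical) table declares. */
typedef int (*CompareFn)(Word, Word);

/* Correctly typed callback: the Word-to-pointer cast happens inside it,
 * so no function-pointer cast is needed at the call site. */
static int compare_key128(Word a, Word b)
{
    const uint64_t *ka = (const uint64_t *)a;
    const uint64_t *kb = (const uint64_t *)b;
    return ka[0] == kb[0] && ka[1] == kb[1];
}

/* The caller passes the addresses of two 128-bit keys as Words. */
static int keys_equal(CompareFn cmp, const uint64_t key1[2],
                      const uint64_t key2[2])
{
    return cmp((Word)key1, (Word)key2);
}
```

Because `compare_key128` already has type `CompareFn`, it can be passed directly, which is what avoids the `-Wcast-function-type` warning.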
* rts: Note functions which must take all_tasks_mutex. | Ben Gamari | 2018-03-02 | 1 | -0/+2
|
* Include original process name in worker thread name (#14153) | Echo Nolan | 2017-09-25 | 1 | -1/+22
| | | | | | | | | | | | | | | | | | | | Prior to this commit, worker OS threads were renamed to "ghc_worker" when spawned. This was annoying when reading debugging messages that print the process name because it doesn't tell you *which* Haskell program is generating the message. This commit changes it to "original_process_name:w", truncating the original name to fit in the kernel buffer if necessary. Test Plan: ./validate Reviewers: austin, bgamari, erikd, simonmar Reviewed By: bgamari Subscribers: Phyx, rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D4001
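A minimal sketch of the naming scheme described above, assuming a 16-byte name buffer (the usual Linux limit, including the terminating NUL). The helper `worker_thread_name` and the constant `THREAD_NAME_MAX` are hypothetical and are not taken from the RTS source.

```c
#include <stdio.h>
#include <string.h>

/* Assumed buffer size: 16 bytes including the NUL terminator. */
#define THREAD_NAME_MAX 16

/* Build "<program>:w" in out, truncating the program name so that the
 * ":w" suffix and the terminating NUL always fit. */
static void worker_thread_name(const char *prog, char *out, size_t out_len)
{
    if (out_len < 3) {              /* no room for ":w" plus NUL */
        if (out_len > 0) out[0] = '\0';
        return;
    }
    size_t max_prog = out_len - 3;  /* bytes left for the program name */
    snprintf(out, out_len, "%.*s:w", (int)max_prog, prog);
}
```

With a 16-byte buffer, a short name such as `ghc` becomes `ghc:w`, while a longer name is cut to its first 13 characters before the suffix is appended.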
* Prefer #if defined to #ifdef | Ben Gamari | 2017-04-28 | 1 | -4/+4
| | | | Our new CPP linter enforces this.
* Use C99's bool | Ben Gamari | 2016-11-29 | 1 | -10/+10
| | | | | | | | | | | | Test Plan: Validate on lots of platforms Reviewers: erikd, simonmar, austin Reviewed By: erikd, simonmar Subscribers: michalt, thomie Differential Revision: https://phabricator.haskell.org/D2699
* rts: Add api to pin a thread to a numa node but without fixing a capability | Darshan Kapashi | 2016-11-10 | 1 | -5/+13
| | | | | | | | | | | | | | | | | | | | | | | `rts_setInCallCapability` sets the thread affinity as well as pins the numa node. We should also have the ability to set the numa node without setting the capability affinity. `rts_pinNumaNodeForCapability` function is added and exported via `RtsAPI.h`. Previous callers of `rts_setInCallCapability` should now also call `rts_pinNumaNodeForCapability` to get the same effect as before. Test Plan: ./validate Reviewers: austin, simonmar, bgamari Reviewed By: simonmar, bgamari Subscribers: thomie, niteria Differential Revision: https://phabricator.haskell.org/D2637 GHC Trac Issues: #12764
* Add hs_try_putmvar() | Simon Marlow | 2016-09-12 | 1 | -6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is a fast, non-blocking, asynchronous interface to tryPutMVar that can be called from C/C++. It's useful for callback-based C/C++ APIs: the idea is that the callback invokes hs_try_putmvar(), and the Haskell code waits for the callback to run by blocking in takeMVar. The callback doesn't block - this is often a requirement of callback-based APIs. The callback wakes up the Haskell thread with minimal overhead and no unnecessary context-switches. There are a couple of benchmarks in testsuite/tests/concurrent/should_run. Some example results comparing hs_try_putmvar() with using a standard foreign export: ./hs_try_putmvar003 1 64 16 100 +RTS -s -N4 0.49s ./hs_try_putmvar003 2 64 16 100 +RTS -s -N4 2.30s hs_try_putmvar() is 4x faster for this workload (see the source for hs_try_putmvar003.hs for details of the workload). An alternative solution is to use the IO Manager for this. We've tried it, but there are problems with that approach: * Need to create a new file descriptor for each callback * The IO Manager thread(s) become a bottleneck * More potential for things to go wrong, e.g. throwing an exception in an IO Manager callback kills the IO Manager thread. Test Plan: validate; new unit tests Reviewers: niteria, erikd, ezyang, bgamari, austin, hvr Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2501
* NUMA cleanups | Simon Marlow | 2016-06-17 | 1 | -2/+2
| | | | | - Move the numaMap and nNumaNodes out of RtsFlags to Capability.c - Add a test to tests/rts
* NUMA support | Simon Marlow | 2016-06-10 | 1 | -4/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The aim here is to reduce the number of remote memory accesses on systems with a NUMA memory architecture, typically multi-socket servers. Linux provides a NUMA API for doing two things: * Allocating memory local to a particular node * Binding a thread to a particular node When given the +RTS --numa flag, the runtime will * Determine the number of NUMA nodes (N) by querying the OS * Assign capabilities to nodes, so cap C is on node C%N * Bind worker threads on a capability to the correct node * Keep separate free lists in the block layer for each node * Allocate the nursery for a capability from node-local memory * Allocate blocks in the GC from node-local memory For example, using nofib/parallel/queens on a 24-core 2-socket machine: ``` $ ./Main 15 +RTS -N24 -s -A64m Total time 173.960s ( 7.467s elapsed) $ ./Main 15 +RTS -N24 -s -A64m --numa Total time 150.836s ( 6.423s elapsed) ``` The biggest win here is expected to be allocating from node-local memory, so that means programs using a large -A value (as here). According to perf, on this program the number of remote memory accesses was reduced by more than 50% by using `--numa`. Test Plan: * validate * There's a new flag --debug-numa=<n> that pretends to do NUMA without actually making the OS calls, which is useful for testing the code on non-NUMA systems. * TODO: I need to add some unit tests Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2199
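The capability-to-node assignment mentioned above ("cap C is on node C%N") is plain round-robin. A small sketch, with the hypothetical helper name `numa_node_for_capability` and a guard for a zero node count that is an added assumption rather than something the commit describes:

```c
#include <stdint.h>

/* Round-robin mapping: capability cap_no lives on node cap_no % n_nodes.
 * Treating n_nodes == 0 as "everything on node 0" is an assumption made
 * here only to avoid dividing by zero. */
static uint32_t numa_node_for_capability(uint32_t cap_no, uint32_t n_nodes)
{
    return n_nodes == 0 ? 0 : cap_no % n_nodes;
}
```

On a 2-node machine running with 24 capabilities, even-numbered capabilities would land on node 0 and odd-numbered ones on node 1.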
* rts: Replace `nat` with `uint32_t` | Erik de Castro Lopo | 2016-05-05 | 1 | -6/+7
| | | | | | | | | | | | The `nat` type was an alias for `unsigned int` with a comment saying it was at least 32 bits. We keep the typedef in case client code is using it but mark it as deprecated. Test Plan: Validated on Linux, OS X and Windows Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20 Differential Revision: https://phabricator.haskell.org/D2166
* RTS: Add setInCallCapability() | Simon Marlow | 2016-04-26 | 1 | -0/+9
| | | | | | | | This allows an OS thread to specify which capability it should run on when it makes a call into Haskell. It is intended for a fairly specialised use case, when the client wants to have tighter control over the mapping between OS threads and Capabilities - perhaps 1:1 correspondence, for example.
* RTS: Rename InCall.stat struct field to .rstat | Herbert Valerio Riedel | 2015-12-04 | 1 | -1/+1
| | | | | | | | | | | On AIX, C system headers can redirect the token `stat` via #define stat stat64 to provide large-file support. Simply avoiding the use of `stat` as an identifier eschews macro-replacement. Differential Revision: https://phabricator.haskell.org/D1566
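To illustrate why a field called `stat` is fragile under such a macro, the sketch below defines the macro locally (simulating what the commit says AIX headers may do) and uses stringification to show what the preprocessor makes of each identifier. The helper names are hypothetical.

```c
#include <string.h>

/* Simulates the redirection the commit attributes to AIX system headers. */
#define stat stat64

#define STRINGIFY_RAW(x) #x
#define STRINGIFY(x) STRINGIFY_RAW(x)  /* macro-expands x, then stringifies */

/* What the preprocessor turns each identifier into. */
static const char *name_after_cpp_stat(void)  { return STRINGIFY(stat); }
static const char *name_after_cpp_rstat(void) { return STRINGIFY(rstat); }
```

Any use of the token `stat`, including a struct field, is rewritten to `stat64`, whereas `rstat` is a different token and passes through untouched.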
* Clarify meaning of the RTS `taskCount` variable | Thomas Miedema | 2015-03-22 | 1 | -1/+1
| | | | | | | | | | | | | | | | | | | In #9261, there was some confusion about the meaning of the taskCount stats variable in the rts. It turns out that taskCount is not decremented when a worker task is stopped (i.e. from workerTaskStop), but only when freeMyTask is called, which frees the task bound to the current thread. So taskCount is the current number of bound tasks + the total number of worker tasks. This makes the calculation of the current number of bound tasks in rts/Stats.c correct _as is_. [skip ci] Reviewed By: austin Differential Revision: https://phabricator.haskell.org/D746
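Under the definition given above (taskCount = current bound tasks + total worker tasks ever created), the number of bound tasks falls out by subtraction. A sketch, assuming a separate counter of total workers; the struct and the field name `workerCount` are illustrative assumptions, not quoted from `rts/Stats.c`.

```c
#include <stdint.h>

/* Illustrative counters; the field names are assumptions. */
struct TaskCounters {
    uint32_t taskCount;    /* current bound tasks + total worker tasks */
    uint32_t workerCount;  /* total worker tasks ever created */
};

/* Current number of bound tasks, derived from the two counters. */
static uint32_t bound_task_count(const struct TaskCounters *c)
{
    return c->taskCount - c->workerCount;
}
```

For instance, 2001 tasks of which 2000 were workers leaves 1 bound task.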
* Name worker threads using pthread_setname_np | Simon Marlow | 2014-10-10 | 1 | -1/+1
| | | | | This helps identify threads in gdb particularly in processes with a lot of threads.
* Revert "rts: add Emacs 'Local Variables' to every .c file" | Simon Marlow | 2014-09-29 | 1 | -8/+0
| | | | This reverts commit 39b5c1cbd8950755de400933cecca7b8deb4ffcd.
* rts: detabify/dewhitespace Task.c | Austin Seipp | 2014-08-20 | 1 | -19/+19
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* rts: add Emacs 'Local Variables' to every .c file | Austin Seipp | 2014-07-28 | 1 | -0/+8
| | | | | | | This will hopefully help ensure some basic consistency going forward by overriding buffer variables. In particular, it sets the wrap length, the offset to 4, and turns off tabs. Signed-off-by: Austin Seipp <austin@well-typed.com>
* Acquire all_tasks_mutex in forkProcess | Edsko de Vries | 2014-07-13 | 1 | -1/+1
| | | | | | | | | | | | | | Summary: (for the same reason that we acquire all the other mutexes) Test Plan: validate Reviewers: simonmar, austin, duncan Reviewed By: simonmar, austin, duncan Subscribers: simonmar, relrod, carter Differential Revision: https://phabricator.haskell.org/D60
* Avoid deadlock in freeTask (called by forkProcess) | Edsko de Vries | 2014-07-13 | 1 | -0/+14
| | | | | | | | | | | | | | Summary: Documented in more detail inline with the change. Test Plan: validate Reviewers: austin, simonmar, duncan Reviewed By: austin, simonmar, duncan Subscribers: simonmar, relrod, carter Differential Revision: https://phabricator.haskell.org/D59
* Add hs_thread_done() (#8124) | Simon Marlow | 2014-02-27 | 1 | -1/+39
| | | | See documentation for details.
* Don't move Capabilities in setNumCapabilities (#8209) | Simon Marlow | 2013-09-04 | 1 | -28/+0
| | | | | | | | | | | | | We have various problems with reallocating the array of Capabilities, due to threads in waitForReturnCapability that are already holding a pointer to a Capability. Rather than add more locking to make this safer, I decided it would be easier to ensure that we never move the Capabilities at all. The capabilities array is now an array of pointers to Capability. There are extra indirections, but it rarely matters - we don't often access Capabilities via the array, normally we already have a pointer to one. I ran the parallel benchmarks and didn't see any difference.
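The idea of keeping an array of pointers, so that growing the array never moves the objects themselves, can be sketched as below. This is a simplified model, not the RTS code: `Cap`, `caps`, `n_caps` and `grow_caps` are hypothetical names, and locking is omitted.

```c
#include <stdlib.h>

/* Simplified stand-in for a Capability. */
typedef struct { unsigned no; } Cap;

static Cap    **caps;    /* array of pointers; only this array is resized */
static unsigned n_caps;

/* Grow to new_n capabilities. Existing Cap objects are never moved, so
 * pointers already held by other threads stay valid. Returns 0 on success. */
static int grow_caps(unsigned new_n)
{
    if (new_n <= n_caps) return 0;

    Cap **new_arr = realloc(caps, new_n * sizeof *new_arr);
    if (new_arr == NULL) return -1;
    caps = new_arr;

    for (unsigned i = n_caps; i < new_n; i++) {
        caps[i] = malloc(sizeof **caps);
        if (caps[i] == NULL) { n_caps = i; return -1; }
        caps[i]->no = i;
    }
    n_caps = new_n;
    return 0;
}
```

The `realloc` may relocate the pointer array, but each `Cap` stays at the address it was originally allocated at, at the cost of one extra indirection on array access.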
* Hopefully fix breakage on OS X w/ LLVM | Simon Marlow | 2013-01-17 | 1 | -9/+0
| | | | | | | Reordering of includes in GC.c broke on OS X because gctKey is declared in Task.h and is needed in the storage manager. This is really the wrong place for it anyway, so I've moved the gctKey pieces to where they should be.
* Merge taskId and serialisableTaskId | Mikolaj Konarski | 2012-07-25 | 1 | -18/+4
| | | | A companion ghc-events package commit displays task ids in the same format.
* Fix typo | Ian Lynagh | 2012-07-14 | 1 | -1/+1
|
* Emit the task-tracking events | Duncan Coutts | 2012-07-10 | 1 | -0/+12
| | | | | | | | | | | | Based on initial patches by Mikolaj Konarski <mikolaj@well-typed.com> Use the new task tracing functions traceTaskCreate/Migrate/Delete. There are two key places. One is for worker tasks which have a relatively simple life cycle. Worker tasks are created and deleted by the RTS. The other case is bound tasks which are either created by the RTS, or appear as foreign C threads making calls into the RTS. For bound threads we do the tracing in rts_lock/unlock, which actually covers both threads coming in from outside, and also bound threads made by the RTS.
* New functions to get kernel thread Id + serialisable task Id | Duncan Coutts | 2012-07-07 | 1 | -1/+4
| | | | | | | | | | | | | | | | | | | | On most platforms the userspace thread type (e.g. pthread_t) and kernel thread id are different. Normally we don't care about kernel thread Ids, but some system tools for tracing/profiling etc report kernel ids. For example Solaris and OSX's DTrace and Linux's perf tool report kernel thread ids. To be able to match these up with RTS's OSThread we need a way to get at the kernel thread, so we add a new function to do just that (the implementation is system-dependent). Additionally, strictly speaking the OSThreadId type, used as task ids, is not a serialisable representation. On unix OSThreadId is a typedef for pthread_t, but pthread_t is not guaranteed to be a numeric type. Indeed on some systems pthread_t is a pointer and in principle it could be a structure type. So we add another new function to get a serialisable representation of an OSThreadId. This is only for use in log files. We use the function to serialise an id of a task, with the extra feature that it works in non-threaded builds by always returning 1.
* Fix warnings on Win64 | Ian Lynagh | 2012-04-26 | 1 | -2/+2
| | | | | | Mostly this meant getting pointer<->int conversions to use the right sizes. lnat is now size_t, rather than unsigned long, as that seems a better match for how it's used.
* Drop the per-task timing stats, give a summary only (#5897) | Simon Marlow | 2012-03-02 | 1 | -55/+45
| | | | | | | | | | | | | | | | | | | | | | We were keeping around the Task struct (216 bytes) for every worker we ever created, even though we only keep a maximum of 6 workers per Capability. These Task structs accumulate and cause a space leak in programs that do lots of safe FFI calls; this patch frees the Task struct as soon as a worker exits. One reason we were keeping the Task structs around is because we print out per-Task timing stats in +RTS -s, but that isn't terribly useful. What is sometimes useful is knowing how *many* Tasks there were. So now I'm printing a single-line summary, this is for the program in TASKS: 2001 (1 bound, 31 peak workers (2000 total), using -N1) So although we created 2k tasks overall, there were only 31 workers active at any one time (which is exactly what we expect: the program makes 30 safe FFI calls concurrently). This also gives an indication of how many capabilities were being used, which is handy if you use +RTS -N without an explicit number.
* Allow the number of capabilities to be increased at runtime (#3729) | Simon Marlow | 2011-12-06 | 1 | -0/+28
| | | | | At present the number of capabilities can only be *increased*, not decreased. The latter presents a few more challenges!
* Time handling overhaul | Simon Marlow | 2011-11-25 | 1 | -3/+3
| | | | | | | | | | | | | | | | | | | | | Terminology cleanup: the type "Ticks" has been renamed "Time", which is an StgWord64 in units of TIME_RESOLUTION (currently nanoseconds). The terminology "tick" is now used consistently to mean the interval between timer signals. The ticker now always ticks in realtime (actually CLOCK_MONOTONIC if we have it). Before it used CPU time in the non-threaded RTS and realtime in the threaded RTS, but I've discovered that the CPU timer has terrible resolution (at least on Linux) and isn't much use for profiling. So now we always use realtime. This should also fix The default tick interval is now 10ms, except when profiling where we drop it to 1ms. This gives more accurate profiles without affecting runtime too much (<1%). Lots of cleanups - the resolution of Time is now in one place only (Rts.h) rather than having calculations that depend on the resolution scattered all over the RTS. I hope I found them all.
* Enable pthread_getspecific() tls for LLVM compiler | David M Peixotto | 2011-10-07 | 1 | -1/+12
| | | | | | | | | | | | | LLVM does not support the __thread attribute for thread local storage and may generate incorrect code for global register variables. We want to allow building the runtime with LLVM-based compilers such as llvm-gcc and clang, particularly for MacOS. This patch changes the gct variable used by the garbage collector to use pthread_getspecific() for thread local storage when an llvm based compiler is used to build the runtime.
* Fix gcc 4.6 warnings; fixes #5176 | Ian Lynagh | 2011-06-25 | 1 | -2/+2
| | | | | | | | | | | Based on a patch from David Terei. Some parts are a little ugly (e.g. defining things that only ASSERTs use only when DEBUG is defined), so we might want to tweak things a little. I've also turned off -Werror for didn't-inline warnings, as we now get a few such warnings.
* Refactoring and tidy up | Simon Marlow | 2011-04-11 | 1 | -7/+12
| | | | | | | | | | | | This is a port of some of the changes from my private local-GC branch (which is still in darcs, I haven't converted it to git yet). There are a couple of small functional differences in the GC stats: first, per-thread GC timings should now be more accurate, and secondly we now report average and maximum pause times. e.g. from minimax +RTS -N8 -s: Tot time (elapsed) Avg pause Max pause Gen 0 2755 colls, 2754 par 13.16s 0.93s 0.0003s 0.0150s Gen 1 769 colls, 769 par 3.71s 0.26s 0.0003s 0.0059s
* boundTaskExiting: don't set task->stopped unless this is the last call (#4850) | Simon Marlow | 2010-12-21 | 1 | -2/+8
| | | | | | | | | | | | | | | | | The bug in this case was that we had a worker thread making a foreign call which invoked a callback (in this case it was performGC, I think). When the callback ended, boundTaskExiting() was setting task->stopped, but the Task is now per-OS-thread, so it is shared by the worker that made the original foreign call. When the foreign call returned, because task->stopped was set, the worker was not placed on the queue of spare workers. Somehow the worker woke up again, and found the spare_workers queue empty, which led to a crash. Two bugs here: task->stopped should not have been set by boundTaskExiting (this broke when I split the Task and InCall structs, in 6.12.2), and releaseCapabilityAndQueueWorker() should not be testing task->stopped anyway, because it should only ever be called when task->stopped is false (this is now an assertion).
* Fix up the ifdefs in Task.c | Ian Lynagh | 2010-11-13 | 1 | -0/+4
|
* Use standard task ID print style (hexadecimal). | Edward Z. Yang | 2010-11-11 | 1 | -12/+12
|
* Interruptible FFI calls with pthread_kill and CancelSynchronousIO. v4 | Edward Z. Yang | 2010-09-19 | 1 | -0/+9
| | | | | | | | | | | | | | | | | | | | | | | This patch adds support for interruptible FFI calls in the form of a new foreign import keyword 'interruptible', which can be used instead of 'safe' or 'unsafe'. Interruptible FFI calls act like safe FFI calls, except that the worker thread they run on may be interrupted. Internally, it replaces BlockedOnCCall_NoUnblockEx with BlockedOnCCall_Interruptible, and changes the behavior of the RTS to not modify the TSO_ flags on the event of an FFI call from a thread that was interruptible. It also modifies the bytecode format for foreign call, adding an extra Word16 to indicate interruptibility. The semantics of interruption vary from platform to platform, but the intent is that any blocking system calls are aborted with an error code. This is most useful for making function calls to system library functions that support interrupting. There is no support for pre-Vista Windows. There is a partner testsuite patch which adds several tests for this functionality.
* Windows: use a thread-local variable for myTask() | Simon Marlow | 2010-09-15 | 1 | -1/+3
| | | | Which entailed fixing an incorrect #ifdef in Task.c
* Use a separate mutex to protect all_tasks, avoiding a lock-order-reversal | Simon Marlow | 2010-07-16 | 1 | -6/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | In GHC 6.12.x I found a rare deadlock caused by this lock-order-reversal: AQ cap->lock startWorkerTask newTask AQ sched_mutex scheduleCheckBlackHoles AQ sched_mutex unblockOne_ wakeupThreadOnCapability AQ cap->lock so sched_mutex and cap->lock are taken in a different order in two places. This doesn't happen in the HEAD because we don't have scheduleCheckBlackHoles, but I thought it would be prudent to make this less likely to happen in the future by using a different mutex in newTask. We can clearly see that the all_tasks mutex cannot be involved in a deadlock, because we never call anything else while holding it.
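The pattern described above (a dedicated mutex that guards only the task list, with no other calls made while it is held) might look roughly like this. It is a sketch using POSIX threads, not the RTS implementation: `TaskNode`, `register_task` and `read_task_count` are hypothetical, and `task_count` merely stands in for a counter guarded by the same lock.

```c
#include <pthread.h>
#include <stddef.h>

/* Simplified stand-in for a Task's link in the global list. */
struct TaskNode { struct TaskNode *all_next; };

static pthread_mutex_t  all_tasks_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct TaskNode *all_tasks;   /* guarded by all_tasks_mutex */
static unsigned         task_count;  /* guarded by all_tasks_mutex */

/* Link a new task into the list. Nothing else is called while the mutex
 * is held, so this lock cannot take part in a lock-order cycle. */
static void register_task(struct TaskNode *t)
{
    pthread_mutex_lock(&all_tasks_mutex);
    t->all_next = all_tasks;
    all_tasks = t;
    task_count++;
    pthread_mutex_unlock(&all_tasks_mutex);
}

/* Reading the counter also takes the mutex, avoiding a data race. */
static unsigned read_task_count(void)
{
    pthread_mutex_lock(&all_tasks_mutex);
    unsigned n = task_count;
    pthread_mutex_unlock(&all_tasks_mutex);
    return n;
}
```

Keeping the critical sections this small is what makes the "cannot be involved in a deadlock" argument easy to check by inspection.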
* Fix crash in nested callbacks (#4038) | Simon Marlow | 2010-05-07 | 1 | -2/+2
| | | | | Broken by "Split part of the Task struct into a separate struct InCall".
* Make the running_finalizers flag task-local | Simon Marlow | 2010-05-05 | 1 | -0/+1
| | | | | Fixes a bug reported by Lennart Augustsson, whereby we could get an incorrect error from the RTS about re-entry from a finalizer,
* tidy up the end of the all_tasks list after forking | Simon Marlow | 2010-03-29 | 1 | -0/+1
|
* fix bug in discardTasksExcept() that broke forkProcess | Simon Marlow | 2010-03-11 | 1 | -2/+3
|
* Split part of the Task struct into a separate struct InCall | Simon Marlow | 2010-03-09 | 1 | -92/+163
| | | | | | | | | | | | | | | The idea is that this leaves Tasks and OSThread in one-to-one correspondence. The part of a Task that represents a call into Haskell from C is split into a separate struct InCall, pointed to by the Task and the TSO bound to it. A given OSThread/Task thus always uses the same mutex and condition variable, rather than getting a new one for each callback. Conceptually it is simpler, although there are more types and indirections in a few places now. This improves callback performance by removing some of the locks that we had to take when making in-calls. Now we also keep the current Task in a thread-local variable if supported by the OS and gcc (currently only Linux).
* RTS tidyup sweep, first phase | Simon Marlow | 2009-08-02 | 1 | -11/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The first phase of this tidyup is focussed on the header files, and in particular making sure we are exposing publicly exactly what we need to, and no more. - Rts.h now includes everything that the RTS exposes publicly, rather than a random subset of it. - Most of the public header files have moved into subdirectories, and many of them have been renamed. But clients should not need to include any of the other headers directly, just #include the main public headers: Rts.h, HsFFI.h, RtsAPI.h. - All the headers needed for via-C compilation have moved into the stg subdirectory, which is self-contained. Most of the headers for the rest of the RTS APIs have moved into the rts subdirectory. - I left MachDeps.h where it is, because it is so widely used in Haskell code. - I left a deprecated stub for RtsFlags.h in place. The flag structures are now exposed by Rts.h. - Various internal APIs are no longer exposed by public header files. - Various bits of dead code and declarations have been removed - More gcc warnings are turned on, and the RTS code is more warning-clean. - More source files #include "PosixSource.h", and hence only use standard POSIX (1003.1c-1995) interfaces. There is a lot more tidying up still to do, this is just the first pass. I also intend to standardise the names for external RTS APIs (e.g. use the rts_ prefix consistently), and declare the internal APIs as hidden for shared libraries.
* Fix #3236: emit a helpful error message when the RTS has not been initialised | Simon Marlow | 2009-05-18 | 1 | -5/+18
|