path: root/rts/win32/OSThreads.c
Commit message | Author | Age | Files | Lines
* Fix 32 bit windows build | Tamar Christina | 2018-05-28 | 1 | -1/+1
  Summary: Fix a number of issues that have broken the 32 bit build. This makes it build again.
  Test Plan: ./validate
  Reviewers: hvr, goldfire, bgamari, erikd, simonmar
  Reviewed By: bgamari
  Subscribers: rwbarton, thomie, carter
  Differential Revision: https://phabricator.haskell.org/D4691
* Fix NUMA support on Windows (#15049) | David Kraeutmann | 2018-05-03 | 1 | -2/+1
  * osNumaNodes now returns the right number of nodes
  * thread affinity is now correctly set
  TODO: no noticeable performance improvement. Does Windows already distribute threads in a NUMA-aware fashion?
  Test Plan:
  * validate
  * local tests on a NUMA machine
  Reviewers: bgamari, erikd, simonmar
  Reviewed By: bgamari, simonmar
  Subscribers: thomie, carter
  Differential Revision: https://phabricator.haskell.org/D4607
* rts: Ensure that forkOS releases Task on termination | Ben Gamari | 2018-01-31 | 1 | -0/+1
  Test Plan: validate
  Reviewers: simonmar, erikd
  Reviewed By: simonmar
  Subscribers: rwbarton, thomie, carter
  GHC Trac Issues: #14725
  Differential Revision: https://phabricator.haskell.org/D4346
* We define the `<XXX>_HOST_ARCH` to `1`, but never to `0` | Moritz Angermann | 2017-05-11 | 1 | -8/+8
  In compiler/ghc.mk:
    @echo "#define $(HostArch_CPP)_HOST_ARCH 1" >> $@
    @echo "#define $(TargetArch_CPP)_HOST_ARCH 1" >> $@
  This leads to warnings like:
  > warning: 'x86_64_HOST_ARCH' is not defined, evaluates to 0 [-Wundef]
  Reviewers: austin, bgamari, erikd, simonmar
  Reviewed By: simonmar
  Subscribers: rwbarton, thomie
  Differential Revision: https://phabricator.haskell.org/D3555
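The warning quoted above arises when a macro that is only ever defined to `1` is tested with a plain `#if`. The following is a minimal sketch of the safer `#if defined(...)` form, not code taken from the commit. The `#define` line stands in for what the build system would generate, and the helper names `host_is_x86_64` and `host_is_aarch64` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: stands in for the line the build system echoes into a
 * generated header. Only the host architecture is ever defined, to 1. */
#define x86_64_HOST_ARCH 1

/* Hypothetical helper: `#if defined(...)` never evaluates an undefined
 * macro, so it cannot trigger the -Wundef warning. */
static bool host_is_x86_64(void)
{
#if defined(x86_64_HOST_ARCH)
    return true;
#else
    return false;
#endif
}

/* Hypothetical helper: this macro is not defined in this file. A plain
 * `#if aarch64_HOST_ARCH` would silently evaluate to 0 and warn under
 * -Wundef; the defined() test simply takes the #else branch. */
static bool host_is_aarch64(void)
{
#if defined(aarch64_HOST_ARCH)
    return true;
#else
    return false;
#endif
}
```

With the definition above, `host_is_x86_64()` yields `true` and `host_is_aarch64()` yields `false`, and neither test depends on a macro being defined to `0`.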
* Prefer #if defined to #ifdef | Ben Gamari | 2017-04-28 | 1 | -3/+3
  Our new CPP linter enforces this.
* Fix x86 Windows build and testsuite | Tamar Christina | 2016-12-06 | 1 | -9/+9
  Summary: Fix issues preventing x86 GHC from building on Windows and fix a segfault in the testsuite.
  Test Plan: ./validate
  Reviewers: austin, erikd, simonmar, bgamari
  Reviewed By: bgamari
  Subscribers: #ghc_windows_task_force, thomie
  Differential Revision: https://phabricator.haskell.org/D2789
* Use C99's bool | Ben Gamari | 2016-11-29 | 1 | -8/+8
  Test Plan: Validate on lots of platforms
  Reviewers: erikd, simonmar, austin
  Reviewed By: erikd, simonmar
  Subscribers: michalt, thomie
  Differential Revision: https://phabricator.haskell.org/D2699
* Add NUMA support for Windows | Tamar Christina | 2016-10-01 | 1 | -2/+43
  Summary: NOTE: I have been able to do simple testing on emulated NUMA nodes. Real hardware would be needed for a proper test.
  D2199 added NUMA support for Linux; I have just filled in the missing pieces following the description of the Linux APIs.
  Test Plan: Use `bcdedit.exe /set groupsize 2` to modify the kernel again (similar to D2533). This generates some NUMA nodes:
  ```
  Logical Processor to NUMA Node Map:
  NUMA Node 0:
  ** --
  NUMA Node 1:
  -- **

  Approximate Cross-NUMA Node Access Cost (relative to fastest):
       00  01
  00: 1.1 1.1
  01: 1.0 1.0
  ```
  Run `../test-numa.exe +RTS --numa -RTS` and check PerfMon for NUMA allocations.
  Reviewers: simonmar, erikd, bgamari, austin
  Reviewed By: simonmar
  Subscribers: thomie, #ghc_windows_task_force
  Differential Revision: https://phabricator.haskell.org/D2534
  GHC Trac Issues: #12602
* Support more than 64 logical processors on Windows | Tamar Christina | 2016-10-01 | 1 | -12/+286
  Windows support for more than 64 logical processors is implemented using processor groups. Essentially, it keeps the existing maximum of 64 processors and keeps the affinity mask a 64 bit value, but adds a hierarchy above that.
  This support was added in Windows 7, so we need to detect at runtime whether the APIs are present, because our minimum supported version is Windows Vista.
  The maximum number of groups supported at this time is 4, so 256 logical cores. The group indices are 0 based. One thread can have affinity with multiple groups.
  See https://msdn.microsoft.com/en-us/library/windows/desktop/ms684251.aspx; particularly helpful is the whitepaper 'Supporting Systems that have more than 64 processors' at https://msdn.microsoft.com/en-us/library/windows/hardware/dn653313.aspx
  Processor groups are not guaranteed to be uniformly distributed, nor guaranteed to be filled before a next group is needed. The OS will assign processors to groups based on physical proximity and will never partially assign cores from one physical CPU to more than one group. If one has two 48 core CPUs then you'd end up with two groups of 48 logical CPUs. Now add a 3rd CPU with 10 cores, and the group it is assigned to depends on where the socket is on the board.
  Test Plan: ./validate or make test -c . in the rts test folder. This tests for regressions. To test this particular functionality itself: <program> +RTS -N -qa -RTS
  Test is detailed in description.
  Reviewers: bgamari, simonmar, austin, erikd
  Reviewed By: simonmar
  Subscribers: thomie, #ghc_windows_task_force
  Differential Revision: https://phabricator.haskell.org/D2533
  GHC Trac Issues: #11054
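The group hierarchy described above can be illustrated with a small, portable sketch that maps a flat logical processor number onto a (group, index) pair and derives the 64 bit in-group affinity mask. This is not the RTS implementation and makes no Windows API calls. The type `GroupSlot`, the functions `cpu_to_group_slot` and `slot_affinity_mask`, and the constant `MAX_GROUPS` are hypothetical names chosen for illustration, and the group sizes are passed in by the caller rather than queried from the OS.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_GROUPS 4   /* group limit quoted in the commit message */

/* Hypothetical type: a processor's position in the group hierarchy. */
typedef struct {
    uint32_t group;   /* 0-based processor group index */
    uint32_t index;   /* 0-based index inside that group, below 64 */
} GroupSlot;

/* Map a flat logical processor number onto (group, index), given the
 * number of active processors in each group. Groups need not be the
 * same size. Returns 0 on success, -1 if `cpu` is past the last group. */
static int cpu_to_group_slot(const uint32_t *group_sizes, uint32_t n_groups,
                             uint32_t cpu, GroupSlot *out)
{
    assert(n_groups <= MAX_GROUPS);
    for (uint32_t g = 0; g < n_groups; g++) {
        assert(group_sizes[g] <= 64);
        if (cpu < group_sizes[g]) {
            out->group = g;
            out->index = cpu;
            return 0;
        }
        cpu -= group_sizes[g];   /* skip past this group's processors */
    }
    return -1;
}

/* 64 bit affinity mask selecting a single processor within its group. */
static uint64_t slot_affinity_mask(GroupSlot slot)
{
    return (uint64_t)1 << slot.index;
}
```

For the two 48 core CPUs mentioned in the message, group sizes `{48, 48}` place flat processor 50 in group 1 at index 2. The per-group mask stays a 64 bit value, and only the group index distinguishes processors beyond the first 64.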
* NUMA support | Simon Marlow | 2016-06-10 | 1 | -0/+3
  Summary: The aim here is to reduce the number of remote memory accesses on systems with a NUMA memory architecture, typically multi-socket servers.
  Linux provides a NUMA API for doing two things:
  * Allocating memory local to a particular node
  * Binding a thread to a particular node
  When given the +RTS --numa flag, the runtime will
  * Determine the number of NUMA nodes (N) by querying the OS
  * Assign capabilities to nodes, so cap C is on node C%N
  * Bind worker threads on a capability to the correct node
  * Keep separate free lists in the block layer for each node
  * Allocate the nursery for a capability from node-local memory
  * Allocate blocks in the GC from node-local memory
  For example, using nofib/parallel/queens on a 24-core 2-socket machine:
  ```
  $ ./Main 15 +RTS -N24 -s -A64m
  Total time 173.960s ( 7.467s elapsed)
  $ ./Main 15 +RTS -N24 -s -A64m --numa
  Total time 150.836s ( 6.423s elapsed)
  ```
  The biggest win here is expected to be allocating from node-local memory, so that means programs using a large -A value (as here).
  According to perf, on this program the number of remote memory accesses was reduced by more than 50% by using `--numa`.
  Test Plan:
  * validate
  * There's a new flag --debug-numa=<n> that pretends to do NUMA without actually making the OS calls, which is useful for testing the code on non-NUMA systems.
  * TODO: I need to add some unit tests
  Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria
  Subscribers: thomie
  Differential Revision: https://phabricator.haskell.org/D2199
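The capability placement rule in the list above (cap C on node C%N) is simple enough to sketch directly. The following is an illustration under stated assumptions, not the runtime's code. The function names `cap_to_numa_node` and `caps_on_node` are hypothetical, and the node count is passed in rather than queried from the OS.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: capability C is placed on node C % N. */
static uint32_t cap_to_numa_node(uint32_t cap, uint32_t n_nodes)
{
    assert(n_nodes > 0);
    return cap % n_nodes;
}

/* Hypothetical helper: count how many of the first `n_caps`
 * capabilities land on `node` under the modulo rule. */
static uint32_t caps_on_node(uint32_t n_caps, uint32_t n_nodes, uint32_t node)
{
    uint32_t count = 0;
    for (uint32_t c = 0; c < n_caps; c++) {
        if (cap_to_numa_node(c, n_nodes) == node) {
            count++;
        }
    }
    return count;
}
```

With 24 capabilities on 2 nodes, as in the `-N24` run on the 2-socket machine, the modulo rule puts 12 capabilities on each node. When the capability count is not a multiple of N, the lower-numbered nodes receive one capability more.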
* rts: Replace `nat` with `uint32_t` | Erik de Castro Lopo | 2016-05-05 | 1 | -5/+5
  The `nat` type was an alias for `unsigned int` with a comment saying it was at least 32 bits. We keep the typedef in case client code is using it but mark it as deprecated.
  Test Plan: Validated on Linux, OS X and Windows
  Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20
  Differential Revision: https://phabricator.haskell.org/D2166
* Replaced SEH handlers with VEH handlers which should work uniformly across x86 and x64 | Tamar Christina | 2015-03-03 | 1 | -2/+0
  Summary: On Windows, the default action for things like division by zero and segfaults is to pop up a Dr. Watson error reporting dialog if the exception is unhandled by the user code.
  This is a pain when we are SSHed into a Windows machine, or when we want to debug a problem with gdb (gdb will get a first and second chance to handle the exception, but if it doesn't the pop-up will show).
  veh_excn provides two macros, `BEGIN_CATCH` and `END_CATCH`, which will catch such exceptions in the entire process and die by printing a message and calling `stg_exit(1)`.
  Previously this code was handled using SEH (Structured Exception Handlers); however, each compiler and platform has a different way of dealing with SEH.
  `MSVC` compilers have the keywords `__try`, `__catch` and `__except` to have the compiler generate the appropriate SEH handler code for you.
  `MinGW` compilers have no such keywords and require you to manually set the SEH handlers; however, because SEH is implemented differently in x86 and x64, the methods to use them in GCC differ.
  `x86`: SEH is based on the stack; the SEH handlers are available at `FS[0]`. On startup one would only need to add a new handler there. This has a number of issues: handlers are hard to share and the mechanism can be exploited.
  `x64`: In order to fix the issues with the way SEH worked in x86, on x64 SEH handlers are statically compiled and added to the .pdata section by the compiler. Instead of being thread global they can now be image global, since you have to specify the `RVA` of the region of code that the handlers govern.
  On x64 you can dynamically allocate SEH handlers, but it seems (based on experimentation, as this is very under-documented) that the dynamic calls cannot override static SEH handlers in the .pdata section.
  Because of this, and because GHC no longer needs to support Windows versions older than XP, the better alternative for handling errors is VEH, which was introduced in XP.
  The bonus is that because VEH (Vectored Exception Handlers) are a runtime construct, the API is the same for both x86 and x64 (note that the Context object does contain CPU specific structures) and the calls are the same across compilers, which means this file can be simplified quite a bit. Using VEH also means we don't have to worry about the dynamic code generated by GHCi.
  Test Plan: Prior to this diff the tests for `derefnull` and `divbyzero` seem to have been disabled for Windows. To reproduce the issue on x64:
  1) open ghci
  2) import GHC.Base
  3) run: 1 `divInt` 0
  which should lead to ghci crashing and a Watson error box being displayed.
  After applying the patch, run:
  make TEST="derefnull divbyzero"
  on both x64 and x86 builds of ghc to verify the fix.
  Reviewers: simonmar, austin
  Reviewed By: austin
  Subscribers: thomie
  Differential Revision: https://phabricator.haskell.org/D691
  GHC Trac Issues: #6079
* Name worker threads using pthread_setname_np | Simon Marlow | 2014-10-10 | 1 | -1/+2
  This helps identify threads in gdb, particularly in processes with a lot of threads.
* Revert "rts: add Emacs 'Local Variables' to every .c file" | Simon Marlow | 2014-09-29 | 1 | -8/+0
  This reverts commit 39b5c1cbd8950755de400933cecca7b8deb4ffcd.
* rts: Detab OSThreads.c | Austin Seipp | 2014-07-28 | 1 | -1/+1
  Signed-off-by: Austin Seipp <austin@well-typed.com>
* rts: add Emacs 'Local Variables' to every .c file | Austin Seipp | 2014-07-28 | 1 | -0/+8
  This will hopefully help ensure some basic consistency going forward by overriding buffer variables. In particular, it sets the wrap length, the offset to 4, and turns off tabs.
  Signed-off-by: Austin Seipp <austin@well-typed.com>
* rts: delint/detab/dewhitespace win32/OSThreads.c | Austin Seipp | 2014-07-28 | 1 | -1/+2
  Signed-off-by: Austin Seipp <austin@well-typed.com>
* Untabify and delete trailing whitespace. | Austin Seipp | 2013-10-26 | 1 | -23/+23
  Signed-off-by: Austin Seipp <austin@well-typed.com>
* Fix Windows build. | Austin Seipp | 2013-10-26 | 1 | -1/+1
  GlobalMemoryStatusEx actually requires _WIN32_WINNT to be defined as 0x0501 (Windows XP) for availability. For completeness, I bumped WIN32_WINNT in Ticker and OSThreads as well.
  Signed-off-by: Austin Seipp <austin@well-typed.com>
* New functions to get kernel thread Id + serialisable task Id | Duncan Coutts | 2012-07-07 | 1 | -1/+7
  On most platforms the userspace thread type (e.g. pthread_t) and kernel thread id are different. Normally we don't care about kernel thread Ids, but some system tools for tracing/profiling etc. report kernel ids. For example, Solaris and OSX's DTrace and Linux's perf tool report kernel thread ids. To be able to match these up with the RTS's OSThread we need a way to get at the kernel thread, so we add a new function to do just that (the implementation is system-dependent).
  Additionally, strictly speaking the OSThreadId type, used as task ids, is not a serialisable representation. On unix OSThreadId is a typedef for pthread_t, but pthread_t is not guaranteed to be a numeric type. Indeed on some systems pthread_t is a pointer and in principle it could be a structure type. So we add another new function to get a serialisable representation of an OSThreadId. This is only for use in log files. We use the function to serialise an id of a task, with the extra feature that it works in non-threaded builds by always returning 1.
* Fix Windows build | Simon Marlow | 2011-12-09 | 1 | -1/+1
* Define getNumberOfProcessors() even when !THREADED_RTS | Simon Marlow | 2011-12-07 | 1 | -0/+5
* Close some handle leaks (#5604) | Simon Marlow | 2011-11-09 | 1 | -9/+21
  Also, use the Win32 API (CreateThread) instead of the CRT API (_beginthreadex) for thread creation.
* Interruptible FFI calls with pthread_kill and CancelSynchronousIO. v4 | Edward Z. Yang | 2010-09-19 | 1 | -0/+19
  This is a patch that adds support for interruptible FFI calls in the form of a new foreign import keyword 'interruptible', which can be used instead of 'safe' or 'unsafe'. Interruptible FFI calls act like safe FFI calls, except that the worker thread they run on may be interrupted.
  Internally, it replaces BlockedOnCCall_NoUnblockEx with BlockedOnCCall_Interruptible, and changes the behavior of the RTS to not modify the TSO_ flags on the event of an FFI call from a thread that was interruptible. It also modifies the bytecode format for foreign calls, adding an extra Word16 to indicate interruptibility.
  The semantics of interruption vary from platform to platform, but the intent is that any blocking system calls are aborted with an error code. This is most useful for making function calls to system library functions that support interrupting. There is no support for pre-Vista Windows.
  There is a partner testsuite patch which adds several tests for this functionality.
* implement setThreadAffinity on Windows (#1741) | Simon Marlow | 2010-09-14 | 1 | -2/+19
* Win32 yieldThread(): use SwitchToThread() instead of Sleep(0) | Simon Marlow | 2010-01-27 | 1 | -1/+1
* Windows build fixes | Simon Marlow | 2009-08-03 | 1 | -1/+1
* Set thread affinity with +RTS -qa (only on Linux so far) | Simon Marlow | 2009-03-18 | 1 | -0/+6
* Add getNumberOfProcessors(), FIX MacOS X build problem (hopefully) | Simon Marlow | 2009-03-17 | 1 | -0/+14
  Somebody needs to implement getNumberOfProcessors() for MacOS X; currently it will return 1.
* Windows: remove the {Enter,Leave}CriticalSection wrappers | Simon Marlow | 2007-08-29 | 1 | -8/+0
  The C-- parser was missing the "stdcall" calling convention for foreign calls, but once added we can call {Enter,Leave}CriticalSection directly.
* Fix the threaded RTS on Windows | Ian Lynagh | 2007-08-16 | 1 | -0/+9
  When calling EnterCriticalSection and LeaveCriticalSection from C-- code, we go via wrappers which use ccall (rather than stdcall).
* Free thread local storage on shutdown | Ian Lynagh | 2007-02-22 | 1 | -0/+11
* Partial fix for #926 | Simon Marlow | 2007-02-01 | 1 | -2/+22
  It seems that when a program exits with open DLLs on Windows, the system attempts to shut down the DLLs, but it also terminates (some of?) the running threads. The RTS isn't prepared for threads to die unexpectedly, so it sits around waiting for its workers to finish. This bites in two places: ShutdownIOManager() in the unthreaded RTS, and shutdownCapability() in the threaded RTS. So far I've modified the latter to notice when worker threads have died unexpectedly and continue shutting down. It seems a bit trickier to fix the unthreaded RTS, so for now the workaround for #926 is to use the threaded RTS.
* add sysErrorBelch() for reporting system call errors | Simon Marlow | 2006-08-30 | 1 | -3/+5
* Add closeMutex and use it on clean up | Esa Ilari Vuokko | 2006-08-23 | 1 | -0/+10
* Reorganisation of the source tree | Simon Marlow | 2006-04-07 | 1 | -0/+199
  Most of the other users of the fptools build system have migrated to Cabal, and with the move to darcs we can now flatten the source tree without losing history, so here goes.
  The main change is that the ghc/ subdir is gone, and most of what it contained is now at the top level. The build system now makes no pretense at being multi-project; it is just the GHC build system.
  No doubt this will break many things, and there will be a period of instability while we fix the dependencies. A straightforward build should work, but I haven't yet fixed binary/source distributions. Changes to the Building Guide will follow, too.