path: root/rts/win32
Commit message (Author, Age; Files, Lines)
* NUMA support (Simon Marlow, 2016-06-10; 2 files, -0/+25)
| Summary:
| The aim here is to reduce the number of remote memory accesses on systems with a NUMA memory architecture, typically multi-socket servers.
|
| Linux provides a NUMA API for doing two things:
| * Allocating memory local to a particular node
| * Binding a thread to a particular node
|
| When given the +RTS --numa flag, the runtime will
| * Determine the number of NUMA nodes (N) by querying the OS
| * Assign capabilities to nodes, so cap C is on node C%N
| * Bind worker threads on a capability to the correct node
| * Keep separate free lists in the block layer for each node
| * Allocate the nursery for a capability from node-local memory
| * Allocate blocks in the GC from node-local memory
|
| For example, using nofib/parallel/queens on a 24-core 2-socket machine:
|
| ```
| $ ./Main 15 +RTS -N24 -s -A64m
| Total time 173.960s ( 7.467s elapsed)
|
| $ ./Main 15 +RTS -N24 -s -A64m --numa
| Total time 150.836s ( 6.423s elapsed)
| ```
|
| The biggest win here is expected to be allocating from node-local memory, so that means programs using a large -A value (as here).
|
| According to perf, on this program the number of remote memory accesses was reduced by more than 50% by using `--numa`.
|
| Test Plan:
| * validate
| * There's a new flag --debug-numa=<n> that pretends to do NUMA without actually making the OS calls, which is useful for testing the code on non-NUMA systems.
| * TODO: I need to add some unit tests
|
| Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria
| Subscribers: thomie
| Differential Revision: https://phabricator.haskell.org/D2199
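The capability-to-node rule in the NUMA commit ("cap C is on node C%N") can be modelled in a few lines. This is only an illustrative Python sketch, not the RTS code: the function names `node_of_capability` and `capabilities_by_node` are hypothetical, and the real runtime obtains N from the OS rather than taking it as an argument.

```python
from collections import defaultdict


def node_of_capability(cap: int, num_nodes: int) -> int:
    """Round-robin rule from the commit message: capability C lives on node C % N."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    return cap % num_nodes


def capabilities_by_node(num_caps: int, num_nodes: int) -> dict:
    """Group capability numbers by the node they are assigned to."""
    groups = defaultdict(list)
    for cap in range(num_caps):
        groups[node_of_capability(cap, num_nodes)].append(cap)
    return dict(groups)
```

With `-N24` on a two-node machine, as in the benchmark above, this rule places the even-numbered capabilities on node 0 and the odd-numbered ones on node 1, twelve per node.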
* Runtime linker: Break m32 allocator out into its own file (Erik de Castro Lopo, 2016-05-25; 1 file, -6/+6)
| This makes the code a little more modular and allows the removal of some CPP hackery. By providing dummy implementations of the `m32_*` functions (which simply call `errorBelch`), the call sites for these functions are syntax checked even when `RTS_LINKER_USE_MMAP` is `0`.
|
| Also changes some size parameter types from `unsigned int` to `size_t`.
|
| Test Plan: Validate on Linux, OS X and Windows
| Reviewers: Phyx, hsyl20, bgamari, simonmar, austin
| Reviewed By: simonmar, austin
| Subscribers: thomie
| Differential Revision: https://phabricator.haskell.org/D2237
* Get types in osFreeMBlocks in sync with osGetMBlocks (Tomas Carnecky, 2016-05-19; 1 file, -1/+1)
| The first argument of 'osFreeMBlocks' ought to have the same type as the return value from 'osGetMBlocks'. Make it so.
|
| Reviewers: austin, simonmar, bgamari
| Reviewed By: bgamari
| Subscribers: erikd, rwbarton, thomie
| Differential Revision: https://phabricator.haskell.org/D2235
* rts: Replace `nat` with `uint32_t` (Erik de Castro Lopo, 2016-05-05; 4 files, -11/+11)
| The `nat` type was an alias for `unsigned int` with a comment saying it was at least 32 bits. We keep the typedef in case client code is using it but mark it as deprecated.
|
| Test Plan: Validated on Linux, OS X and Windows
| Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20
| Differential Revision: https://phabricator.haskell.org/D2166
* Remove now pointless INLINE_ME macro (Herbert Valerio Riedel, 2016-03-27; 1 file, -1/+1)
| At some point there may have been a reason for the `INLINE_ME` macro, but not anymore...
|
| Reviewed By: austin
| Differential Revision: https://phabricator.haskell.org/D2041
* rts: drop unused getThreadCPUTime (Sergei Trofimovich, 2016-02-07; 1 file, -13/+0)
| Use of this helper function was removed in:
|
|   commit 3c9fc104337a142fe4f375d30d7a6b81d55a70c1
|   Author: Brian Brooks <brooks.brian@gmail.com>
|   Date: Thu Jul 10 02:55:33 2014 -0500
|
|   Avoid unnecessary clock_gettime() syscalls in GC stats.
|
| Noticed by uselex.rb:
|   getThreadCPUTime: [R]: exported from: ./rts/dist/build/posix/GetTime.p_o
|
| Signed-off-by: Sergei Trofimovich <siarheit@google.com>
* T11300: Fix test on windows (Tamar Christina, 2016-01-14; 2 files, -7/+7)
| Summary: Fix exit code for Windows to match expected for out-of-memory test
|
| Test Plan: ./validate
| Reviewers: simonmar, austin, thomie, bgamari
| Reviewed By: thomie, bgamari
| Differential Revision: https://phabricator.haskell.org/D1753
| GHC Trac Issues: #11422
* rts/posix: Reduce heap allocation amount on mmap failure (Ben Gamari, 2015-11-01; 1 file, -2/+2)
| Since the introduction of the two-step allocator, the RTS asks the kernel for a large upfront mmap'd region of memory (on the order of terabytes). While we have no expectation that this entire region will be backed by physical memory, this scheme nevertheless fails on some systems with resource limits. Here we use a back-off scheme to reduce our allocation request until we find a size agreeable to the kernel. Fixes #10877.
|
| This also fixes a latent bug wherein the heap reservation retry logic would fail to free the previously reserved address space, which would likely result in a heap allocation failure.
|
| Test Plan: set address space limit with `ulimit -v 67108864` and try running a compiled program
| Reviewers: simonmar, austin
| Reviewed By: simonmar
| Subscribers: thomie, RyanGlScott
| Differential Revision: https://phabricator.haskell.org/D1405
| GHC Trac Issues: #10877
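The back-off scheme described in this commit can be sketched as a loop that shrinks the request until the reservation succeeds, releasing any region it decides not to keep before retrying. This Python model is an assumption-laden illustration rather than the RTS implementation: `reserve_with_backoff`, its callback parameters and the halving step are all hypothetical (the commit message only says the request is reduced).

```python
def reserve_with_backoff(try_reserve, release, is_acceptable, initial_size, min_size):
    """Shrink the requested size until a reservation succeeds.

    try_reserve(size) returns a region handle, or None if the kernel refuses.
    is_acceptable(region) lets the caller reject a region it cannot use.
    release(region) must be called on rejected regions so that the retry
    does not leak the previously reserved address space.
    Halving on each failure is an assumption made for this sketch.
    """
    size = initial_size
    while size >= min_size:
        region = try_reserve(size)
        if region is not None:
            if is_acceptable(region):
                return region, size
            release(region)  # free the old reservation before retrying
        size //= 2
    raise MemoryError("could not reserve address space")
```

Passing the reserve and release steps as callbacks keeps the sketch independent of any particular OS call; under a 64 GiB address-space limit, a 1 TiB request would succeed on the fifth attempt, at 64 GiB.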
* rts: Make MBLOCK_SPACE_SIZE dynamic (Ben Gamari, 2015-10-30; 1 file, -3/+3)
| Previously this was introduced in D524 as a compile-time constant. Sadly, this isn't flexible enough to allow for environments where ulimits restrict the maximum address space size (see, for instance, Consequently, we are forced to make this dynamic.
|
| In principle this shouldn't be so terrible as we can place both the beginning and end addresses within the same cache line, likely incurring only one or so additional instruction in HEAP_ALLOCED.
|
| Test Plan: validate
| Reviewers: austin, simonmar
| Reviewed By: simonmar
| Subscribers: thomie
| Differential Revision: https://phabricator.haskell.org/D1353
| GHC Trac Issues: #10877
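With a dynamic size, the HEAP_ALLOCED check becomes a comparison against two bounds that are only known at run time. A minimal Python model of that idea follows; the class name `MBlockSpace` and its fields are hypothetical, and the real check is a C macro whose exact form is not shown in this log.

```python
from dataclasses import dataclass


@dataclass
class MBlockSpace:
    """Runtime-determined bounds of the reserved address range (hypothetical model)."""
    begin: int  # first address of the reserved range
    end: int    # one past the last address of the reserved range

    def heap_alloced(self, addr: int) -> bool:
        """An address belongs to the heap exactly when it lies inside the range."""
        return self.begin <= addr < self.end
```

Keeping both bounds together in one small structure mirrors the point in the commit message about placing the beginning and end addresses within the same cache line.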
* Typos in comments (Gabor Greif, 2015-08-01; 1 file, -1/+1)
|
* Two step allocator for 64-bit systems (Giovanni Campagna, 2015-07-22; 1 file, -5/+72)
| Summary:
| The current OS memory allocator conflates the concepts of allocating address space and allocating memory, which makes the HEAP_ALLOCED() implementation excessively complicated (as the only thing it cares about is address space layout) and slow. Instead, what we want is to allocate a single insanely large contiguous block of address space (to make HEAP_ALLOCED() checks fast), and then commit subportions of that in 1MB blocks as we did before.
|
| This is currently behind a flag, USE_LARGE_ADDRESS_SPACE, that is only enabled for certain OSes.
|
| Test Plan: validate
| Reviewers: simonmar, ezyang, austin
| Subscribers: thomie, carter
| Differential Revision: https://phabricator.haskell.org/D524
| GHC Trac Issues: #9706
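The separation between reserving address space and committing memory can be illustrated with a small bookkeeping model. This Python sketch makes no OS calls; the class `TwoStepAllocator`, its bump-pointer policy and its method names are assumptions made for illustration, and only the 1MB block size comes from the commit message.

```python
MBLOCK_SIZE = 1 << 20  # 1MB blocks, as in the commit message


class TwoStepAllocator:
    """Step 1: reserve one large contiguous range. Step 2: commit 1MB blocks from it."""

    def __init__(self, base: int, reserved_size: int):
        self.base = base
        self.end = base + reserved_size
        self.next_uncommitted = base  # simple bump pointer (assumption of this sketch)

    def commit_mblocks(self, n: int) -> int:
        """Commit n contiguous 1MB blocks and return the address of the first one."""
        size = n * MBLOCK_SIZE
        if self.next_uncommitted + size > self.end:
            raise MemoryError("reserved address space exhausted")
        addr = self.next_uncommitted
        self.next_uncommitted += size
        return addr

    def is_committed(self, addr: int) -> bool:
        """True when addr falls in the part of the range that has been committed."""
        return self.base <= addr < self.next_uncommitted
```

Because every committed block lies inside the single reserved range, a membership test on that range needs no per-block table, which is the speed-up the commit is after.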
* Replaced SEH handlers with VEH handlers which should work uniformly across x86 and x64 (Tamar Christina, 2015-03-03; 7 files, -142/+173)
| Summary:
| On Windows, the default action for things like division by zero and segfaults is to pop up a Dr. Watson error reporting dialog if the exception is unhandled by the user code.
|
| This is a pain when we are SSHed into a Windows machine, or when we want to debug a problem with gdb (gdb will get a first and second chance to handle the exception, but if it doesn't the pop-up will show).
|
| veh_excn provides two macros, `BEGIN_CATCH` and `END_CATCH`, which will catch such exceptions in the entire process and die by printing a message and calling `stg_exit(1)`.
|
| Previously this code was handled using SEH (Structured Exception Handlers); however, each compiler and platform has a different way of dealing with SEH.
|
| `MSVC` compilers have the keywords `__try`, `__catch` and `__except` to have the compiler generate the appropriate SEH handler code for you.
|
| `MinGW` compilers have no such keywords and require you to manually set the SEH handlers; however, because SEH is implemented differently in x86 and x64, the methods to use them in GCC differ.
|
| `x86`: SEH is based on the stack, and the SEH handlers are available at `FS[0]`. On startup one would only need to add a new handler there. This has a number of issues: handlers are hard to share, and the mechanism can be exploited.
|
| `x64`: In order to fix the issues with the way SEH worked in x86, on x64 SEH handlers are statically compiled and added to the .pdata section by the compiler. Instead of being thread global they can now be image global, since you have to specify the `RVA` of the region of code that the handlers govern.
|
| On x64 you can dynamically allocate SEH handlers, but it seems (based on experimentation; this is very under-documented) that the dynamic calls cannot override static SEH handlers in the .pdata section.
|
| Because of this, and because GHC no longer needs to support versions of Windows older than XP, the better alternative for handling errors is VEH, which was introduced in XP.
|
| The bonus is that, because VEH (Vectored Exception Handlers) are a runtime construct, the API is the same for both x86 and x64 (note that the Context object does contain CPU-specific structures) and the calls are the same across compilers, which means this file can be simplified quite a bit. Using VEH also means we don't have to worry about the dynamic code generated by GHCi.
|
| Test Plan:
| Prior to this diff the tests for `derefnull` and `divbyzero` seem to have been disabled for Windows.
|
| To reproduce the issue on x64:
| 1) open ghci
| 2) import GHC.Base
| 3) run: 1 `divInt` 0
|
| which should lead to ghci crashing and a Watson error box being displayed.
|
| After applying the patch, run:
|
| make TEST="derefnull divbyzero"
|
| on both x64 and x86 builds of ghc to verify the fix.
|
| Reviewers: simonmar, austin
| Reviewed By: austin
| Subscribers: thomie
| Differential Revision: https://phabricator.haskell.org/D691
| GHC Trac Issues: #6079
* Per-thread allocation counters and limits (Simon Marlow, 2014-11-12; 1 file, -2/+3)
| This reverts commit f0fcc41d755876a1b02d1c7c79f57515059f6417.
|
| New changes: now works on 32-bit platforms too. I added some basic support for 64-bit subtraction and comparison operations to the x86 NCG.
* Revert "Rename _closure to _static_closure, apply naming consistently." (Edward Z. Yang, 2014-10-20; 1 file, -17/+17)
| This reverts commit 35672072b4091d6f0031417bc160c568f22d0469.
|
| Conflicts:
|   compiler/main/DriverPipeline.hs
* Name worker threads using pthread_setname_np (Simon Marlow, 2014-10-10; 1 file, -1/+2)
| This helps identify threads in gdb particularly in processes with a lot of threads.
* Rename _closure to _static_closure, apply naming consistently. (Edward Z. Yang, 2014-10-01; 1 file, -17/+17)
| Summary:
| In preparation for indirecting all references to closures, we rename _closure to _static_closure to ensure any old code will get an undefined symbol error. In order to reference a closure foobar_closure (which is now undefined), you should instead use STATIC_CLOSURE(foobar). For convenience, a number of these old identifiers are macro'd.
|
| Across C-- and C (Windows and otherwise), there were differing conventions on whether or not foobar_closure or &foobar_closure was the address of the closure. Now, all foobar_closure references are addresses, and no & is necessary.
|
| CHARLIKE/INTLIKE were not changed, simply alpha-renamed.
|
| Part of remove HEAP_ALLOCED patch set (#8199)
|
| Depends on D265
|
| Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
|
| Test Plan: validate
| Reviewers: simonmar, austin
| Subscribers: simonmar, ezyang, carter, thomie
| Differential Revision: https://phabricator.haskell.org/D267
| GHC Trac Issues: #8199
* Revert "rts: add Emacs 'Local Variables' to every .c file" (Simon Marlow, 2014-09-29; 17 files, -136/+0)
| This reverts commit 39b5c1cbd8950755de400933cecca7b8deb4ffcd.
* Fix cppcheck warnings (Boris Egorov, 2014-09-16; 1 file, -3/+6)
| Summary:
| Cppcheck found a few defects in the win32 IOManager and a typo in the rts testsuite. This commit fixes them.
|
| Cppcheck 1.54 finds three possible null pointer dereferences of the ioMan pointer: it is dereferenced and only checked for NULL after that.
|
| testheapalloced.c contains a typo in a printf statement, which should print a percent sign but is treated as a parameter placeholder by the compiler. To print a percent sign properly, one needs to use the "%%" string.
|
| FYI: Cppcheck 1.66 cannot find the possible null pointer dereferences in the mentioned places, mistakenly thinking that some memory is leaking instead. I will probably file a regression bug against Cppcheck.
|
| Test Plan:
| Build project, run 'make fulltest'. It finished with 28 unexpected failures. I don't know if they are related to my fix.
|
| Unexpected results from:
| TEST="T3500b T7891 tc124 T7653 T5321FD T5030 T4801 T6048 T5631 T5837 T5642 T9020 T3064 parsing001 T1969 T5321Fun T783 T3294"
|
| OVERALL SUMMARY for test run started at Tue Sep 9 16:46:27 2014 NOVT
| 4:23:24 spent to go through
|     4101 total tests, which gave rise to
|    16075 test cases, of which
|     3430 were skipped
|
|      315 had missing libraries
|    12154 expected passes
|      145 expected failures
|
|        3 caused framework failures
|        0 unexpected passes
|       28 unexpected failures
|
| Unexpected failures:
|    ../../libraries/base/tests T7653 [bad exit code] (ghci,threaded1,threaded2)
|    perf/compiler T1969 [stat not good enough] (normal)
|    perf/compiler T3064 [stat not good enough] (normal)
|    perf/compiler T3294 [stat not good enough] (normal)
|    perf/compiler T4801 [stat not good enough] (normal)
|    perf/compiler T5030 [stat not good enough] (normal)
|    perf/compiler T5321FD [stat not good enough] (normal)
|    perf/compiler T5321Fun [stat not good enough] (normal)
|    perf/compiler T5631 [stat not good enough] (normal)
|    perf/compiler T5642 [stat not good enough] (normal)
|    perf/compiler T5837 [stat not good enough] (normal)
|    perf/compiler T6048 [stat not good enough] (optasm)
|    perf/compiler T783 [stat not good enough] (normal)
|    perf/compiler T9020 [stat not good enough] (optasm)
|    perf/compiler parsing001 [stat not good enough] (normal)
|    typecheck/should_compile T7891 [exit code non-0] (hpc,optasm,optllvm)
|    typecheck/should_compile tc124 [exit code non-0] (hpc,optasm,optllvm)
|    typecheck/should_run T3500b [exit code non-0] (hpc,optasm,threaded2,dyn,optllvm)
|
| Reviewers: austin
| Reviewed By: austin
| Subscribers: simonmar, ezyang, carter
| Differential Revision: https://phabricator.haskell.org/D203
* Fix variable name typo from commit 3021fb (Niklas Larsson, 2014-07-30; 1 file, -1/+1)
| Signed-off-by: Austin Seipp <austin@well-typed.com>
* rts: Detab OSThreads.c (Austin Seipp, 2014-07-28; 1 file, -1/+1)
| Signed-off-by: Austin Seipp <austin@well-typed.com>
* rts: add Emacs 'Local Variables' to every .c file (Austin Seipp, 2014-07-28; 17 files, -0/+136)
| This will hopefully help ensure some basic consistency going forward by overriding buffer variables. In particular, it sets the wrap length, the offset to 4, and turns off tabs.
|
| Signed-off-by: Austin Seipp <austin@well-typed.com>
* rts: delint/detab/dewhitespace win32/WorkQueue.c (Austin Seipp, 2014-07-28; 1 file, -26/+27)
| Signed-off-by: Austin Seipp <austin@well-typed.com>
* rts: delint/detab/dewhitespace win32/WorkQueue.h (Austin Seipp, 2014-07-28; 1 file, -1/+1)
| Signed-off-by: Austin Seipp <austin@well-typed.com>
* rts: delint/detab/dewhitespace win32/ThrIOManager.c (Austin Seipp, 2014-07-28; 1 file, -7/+7)
| Signed-off-by: Austin Seipp <austin@well-typed.com>
* rts: delint/detab/dewhitespace win32/OSThreads.c (Austin Seipp, 2014-07-28; 1 file, -1/+2)
| Signed-off-by: Austin Seipp <austin@well-typed.com>
* rts: delint/detab/dewhitespace win32/OSMem.c (Austin Seipp, 2014-07-28; 1 file, -15/+21)
| Signed-off-by: Austin Seipp <austin@well-typed.com>
* rts: delint/detab/dewhitespace win32/IOManager.c (Austin Seipp, 2014-07-28; 1 file, -259/+286)
| Signed-off-by: Austin Seipp <austin@well-typed.com>
* rts: delint/detab/dewhitespace win32/IOManager.h (Austin Seipp, 2014-07-28; 1 file, -26/+26)
| Signed-off-by: Austin Seipp <austin@well-typed.com>
* rts: delint/detab/dewhitespace win32/GetTime.c (Austin Seipp, 2014-07-28; 1 file, -8/+8)
| Signed-off-by: Austin Seipp <austin@well-typed.com>
* rts: delint/detab/dewhitespace win32/GetEnv.c (Austin Seipp, 2014-07-28; 1 file, -1/+1)
| Signed-off-by: Austin Seipp <austin@well-typed.com>
* rts: delint/detab/dewhitespace win32/ConsoleHandler.c (Austin Seipp, 2014-07-28; 1 file, -87/+88)
| Signed-off-by: Austin Seipp <austin@well-typed.com>
* rts: delint/detab/dewhitespace win32/AwaitEvent.c (Austin Seipp, 2014-07-28; 1 file, -3/+3)
| Signed-off-by: Austin Seipp <austin@well-typed.com>
* rts: delint/detab/dewhitespace win32/AsyncIO.h (Austin Seipp, 2014-07-28; 1 file, -4/+4)
| Signed-off-by: Austin Seipp <austin@well-typed.com>
* rts: delint/detab/dewhitespace win32/AsyncIO.c (Austin Seipp, 2014-07-28; 1 file, -150/+166)
| Signed-off-by: Austin Seipp <austin@well-typed.com>
* Raise exceptions when blocked in bad FDs (fixes Trac #4934) (Sergei Trofimovich, 2014-06-08; 1 file, -2/+1)
| Before the patch any call to 'select()' with 'bad_fd' led to:
| - unblocking of all threads
| - hiding exception for 'threadWaitRead bad_fd'
|
| The patch fixes both cases in this way: after a 'select()' failure we iterate over each blocked descriptor and poll it individually to see its actual status, which is one of:
| - READY (move to run queue)
| - BLOCKED (leave in blocked queue)
| - INVALID (send an IOError exception)
|
| Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
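The per-descriptor check described above can be sketched with Python's standard `select` module on a POSIX system. This is an analogy rather than the RTS code: the function names `classify_read_fd` and `classify_all` are hypothetical, and the three status strings simply reuse the labels from the commit message.

```python
import select

READY, BLOCKED, INVALID = "READY", "BLOCKED", "INVALID"


def classify_read_fd(fd: int) -> str:
    """Poll a single descriptor with a zero timeout to learn its actual status."""
    try:
        readable, _, _ = select.select([fd], [], [], 0)
    except (OSError, ValueError):
        # A closed or otherwise bad descriptor makes select() fail for this fd alone.
        return INVALID
    return READY if readable else BLOCKED


def classify_all(fds):
    """After a failed multi-descriptor select(), inspect each descriptor separately."""
    return {fd: classify_read_fd(fd) for fd in fds}
```

Checking one descriptor at a time means a single bad descriptor no longer affects the others: only the thread waiting on it would receive an exception, while the rest stay blocked or become runnable as usual.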
* Revert "Per-thread allocation counters and limits" (Simon Marlow, 2014-05-04; 1 file, -1/+0)
| Problems were found on 32-bit platforms, I'll commit again when I have a fix.
|
| This reverts the following commits:
|   54b31f744848da872c7c6366dea840748e01b5cf
|   b0534f78a73f972e279eed4447a5687bd6a8308e
* fix rts exported symbols base_GHCziIOziException_allocationLimitExceeded_closure (Sergei Trofimovich, 2014-05-03; 1 file, -0/+1)
| Commit b0534f78a73f972e279eed4447a5687bd6a8308e added new exported rts symbols, but slightly misspelled them.
|
| Observed on first compiled program:
|
| > Linking dist/build/haskell-updater/haskell-updater ...
| > /usr/lib64/ghc-7.9.20140503/rts-1.0/libHSrts.a(Schedule.o): In function `scheduleWaitThread':
| > (.text+0xc4c): undefined reference to `base_GHCziIOziException_allocationLimitExceeded_closure'
| > /usr/lib64/ghc-7.9.20140503/rts-1.0/libHSrts.a(RtsStartup.o): In function `hs_init_ghc':
| > (.text+0x2fa): undefined reference to `base_GHCziIOziException_allocationLimitExceeded_closure'
| > collect2: error: ld returned 1 exit status
|
| CC: Simon Marlow <marlowsd@gmail.com>
| Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
* Untabify and delete trailing whitespace. (Austin Seipp, 2013-10-26; 1 file, -23/+23)
| Signed-off-by: Austin Seipp <austin@well-typed.com>
* Fix Windows build. (Austin Seipp, 2013-10-26; 3 files, -3/+5)
| GlobalMemoryStatusEx actually requires _WIN32_WINNT to be defined as 0x0501 (Windows XP) for availability. For completeness, I bumped _WIN32_WINNT in Ticker and OSThreads as well.
|
| Signed-off-by: Austin Seipp <austin@well-typed.com>
* rts: Add getPhysicalMemorySize (Ben Gamari, 2013-10-25; 1 file, -0/+18)
|
* Simplify some code; patch from Bill Tutt (Ian Lynagh, 2013-02-17; 1 file, -1/+1)
|
* Fix line endings in rts/win32/ThrIOManager.c (Ian Lynagh, 2013-02-17; 1 file, -159/+159)
|
* Small refactoring; patch from nus (Ian Lynagh, 2013-02-16; 1 file, -9/+1)
|
* Build fix for dyn way on Windows; patch from nus (Ian Lynagh, 2013-02-16; 1 file, -0/+1)
|
* comments (Simon Marlow, 2013-02-07; 1 file, -1/+11)
|
* Fix threadDelay on Windows; fixes ThreadDelay001 failures (Ian Lynagh, 2013-02-06; 1 file, -2/+24)
| MSDN says of Sleep:
|     If dwMilliseconds is greater than one tick but less than two, the wait can be anywhere between one and two ticks, and so on.
| so we need to add (milliseconds-per-tick - 1) to the amount of time we sleep for.
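The arithmetic behind this fix can be checked with a small model. Reading the MSDN quote as "a request that spans k whole ticks may return after as little as k ticks", padding the request by (milliseconds-per-tick - 1) guarantees that the shortest possible wait is at least the requested time. The function names below are hypothetical, and the 16 ms tick used in the example is an assumed value, not one taken from the commit.

```python
def padded_sleep_ms(requested_ms: int, ms_per_tick: int) -> int:
    """Sleep argument after the fix: the request plus (ms_per_tick - 1)."""
    return requested_ms + (ms_per_tick - 1)


def shortest_possible_wait_ms(sleep_ms: int, ms_per_tick: int) -> int:
    """Lower bound on the wait: only the whole ticks contained in the request."""
    return (sleep_ms // ms_per_tick) * ms_per_tick
```

For example, with an assumed 16 ms tick, an unpadded 25 ms request could return after 16 ms; padded to 40 ms, the shortest possible wait becomes 32 ms, which covers the 25 ms that were asked for.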
* Use usecs rather than msecs for microseconds (Ian Lynagh, 2013-02-05; 4 files, -10/+10)
| We were using "us" elsewhere, so this was inconsistent.
* More OS X build fixes (Ian Lynagh, 2012-09-14; 1 file, -1/+1)
|
* Deprecate lnat, and use StgWord instead (Simon Marlow, 2012-09-07; 1 file, -15/+15)
| lnat was originally "long unsigned int" but we were using it when we wanted a 64-bit type on a 64-bit machine. This broke on Windows x64, where long == int == 32 bits. Using types of unspecified size is bad, but what we really wanted was a type with N bits on an N-bit machine. StgWord is exactly that.
|
| lnat was mentioned in some APIs that clients might be using (e.g. StackOverflowHook()), so we leave it defined but with a comment to say that it's deprecated.
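The mismatch between `long` and the machine word can be observed from Python through the standard `ctypes` module, which reports the sizes of the platform's C types. This is only a way of inspecting the host platform: `size_t` stands in here for "a type with N bits on an N-bit machine" as an assumption of the sketch, since StgWord itself is a GHC type, and both function names are hypothetical.

```python
import ctypes


def type_widths_in_bits() -> dict:
    """Widths of a few C types on the platform running this script."""
    return {
        "unsigned long": 8 * ctypes.sizeof(ctypes.c_ulong),
        "pointer": 8 * ctypes.sizeof(ctypes.c_void_p),
        "size_t": 8 * ctypes.sizeof(ctypes.c_size_t),
    }


def long_is_word_sized() -> bool:
    """False on platforms such as Windows x64, where long is narrower than a pointer."""
    widths = type_widths_in_bits()
    return widths["unsigned long"] == widths["pointer"]
```

On 64-bit Linux and OS X, `long_is_word_sized()` is expected to be true, which is why code written against `long unsigned int` only broke once it was built for Windows x64.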
* New functions to get kernel thread Id + serialisable task Id (Duncan Coutts, 2012-07-07; 1 file, -1/+7)
| On most platforms the userspace thread type (e.g. pthread_t) and kernel thread id are different. Normally we don't care about kernel thread Ids, but some system tools for tracing/profiling etc report kernel ids. For example Solaris and OSX's DTrace and Linux's perf tool report kernel thread ids. To be able to match these up with RTS's OSThread we need a way to get at the kernel thread, so we add a new function to do just that (the implementation is system-dependent).
|
| Additionally, strictly speaking the OSThreadId type, used as task ids, is not a serialisable representation. On unix OSThreadId is a typedef for pthread_t, but pthread_t is not guaranteed to be a numeric type. Indeed on some systems pthread_t is a pointer and in principle it could be a structure type. So we add another new function to get a serialisable representation of an OSThreadId. This is only for use in log files. We use the function to serialise an id of a task, with the extra feature that it works in non-threaded builds by always returning 1.
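The distinction between a userspace thread handle and a kernel thread id has a direct analogue in Python's `threading` module (Python 3.8 or later for `get_native_id`). The sketch below is an analogy and not the RTS API: `kernel_thread_id` and `serialisable_task_id` are hypothetical names, and the `threaded` flag merely imitates the non-threaded-build behaviour mentioned in the commit message.

```python
import threading


def kernel_thread_id() -> int:
    """Id assigned by the kernel, the kind reported by tools such as perf or DTrace."""
    return threading.get_native_id()


def serialisable_task_id(threaded: bool = True) -> int:
    """Plain integer suitable for log files; always 1 when there is no threading."""
    if not threaded:
        return 1
    return threading.get_ident()
```

A log line would record `serialisable_task_id()` next to `kernel_thread_id()`, so that events written by the program can later be matched against the thread ids shown by an external profiler.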