| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The aim here is to reduce the number of remote memory accesses on
systems with a NUMA memory architecture, typically multi-socket servers.
Linux provides a NUMA API for doing two things:
* Allocating memory local to a particular node
* Binding a thread to a particular node
When given the +RTS --numa flag, the runtime will
* Determine the number of NUMA nodes (N) by querying the OS
* Assign capabilities to nodes, so cap C is on node C%N
* Bind worker threads on a capability to the correct node
* Keep a separate free lists in the block layer for each node
* Allocate the nursery for a capability from node-local memory
* Allocate blocks in the GC from node-local memory
For example, using nofib/parallel/queens on a 24-core 2-socket machine:
```
$ ./Main 15 +RTS -N24 -s -A64m
Total time 173.960s ( 7.467s elapsed)
$ ./Main 15 +RTS -N24 -s -A64m --numa
Total time 150.836s ( 6.423s elapsed)
```
The biggest win here is expected to be allocating from node-local
memory, so that means programs using a large -A value (as here).
According to perf, on this program the number of remote memory accesses
were reduced by more than 50% by using `--numa`.
Test Plan:
* validate
* There's a new flag --debug-numa=<n> that pretends to do NUMA without
actually making the OS calls, which is useful for testing the code
on non-NUMA systems.
* TODO: I need to add some unit tests
Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2199
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes the code a little more modular and allows the removal of some
CPP hackery. By providing dummy implementations of of the `m32_*`
functions (which simply call `errorBelch`) it means that the call sites
for these functions are syntax checked even when `RTS_LINKER_USE_MMAP`
is `0`.
Also changes some size parameter types from `unsigned int` to `size_t`.
Test Plan: Validate on Linux, OS X and Windows
Reviewers: Phyx, hsyl20, bgamari, simonmar, austin
Reviewed By: simonmar, austin
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2237
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The first argument of 'osFreeMBlocks' ought to have the same type as the
return value from 'osGetMBlocks'. Make it so.
Reviewers: austin, simonmar, bgamari
Reviewed By: bgamari
Subscribers: erikd, rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D2235
|
|
|
|
|
|
|
|
|
|
|
|
| |
The `nat` type was an alias for `unsigned int` with a comment saying
it was at least 32 bits. We keep the typedef in case client code is
using it but mark it as deprecated.
Test Plan: Validated on Linux, OS X and Windows
Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20
Differential Revision: https://phabricator.haskell.org/D2166
|
|
|
|
|
|
|
|
|
| |
At some point there may have been a reason for the
`INLINE_ME` macro, but not anymore...
Reviewed By: austin
Differential Revision: https://phabricator.haskell.org/D2041
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use of this helper function was removed in:
commit 3c9fc104337a142fe4f375d30d7a6b81d55a70c1
Author: Brian Brooks <brooks.brian@gmail.com>
Date: Thu Jul 10 02:55:33 2014 -0500
Avoid unnecessary clock_gettime() syscalls in GC stats.
Noticed by uselex.rb:
getThreadCPUTime: [R]: exported from:
./rts/dist/build/posix/GetTime.p_o
Signed-off-by: Sergei Trofimovich <siarheit@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Fix exit code for Windows to match expected for out-of-memory test
Test Plan: ./validate
Reviewers: simonmar, austin, thomie, bgamari
Reviewed By: thomie, bgamari
Differential Revision: https://phabricator.haskell.org/D1753
GHC Trac Issues: #11422
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since the two-step allocator the RTS asks the kernel for a large upfront
mmap'd region of memory (on the order of terabytes). While we have no
expectation that this entire region will be backed by physical memory,
this scheme nevertheless fails on some systems with resource limits.
Here we use a back-off scheme to reduce our allocation request until we
find a size agreeable to the kernel. Fixes #10877.
This also fixes a latent bug wherein the heap reservation retry logic
would fail to free the previously reserved address space, which would
likely result in a heap allocation failure.
Test Plan:
set address space limit with `ulimit -v 67108864` and try running
a compiled program
Reviewers: simonmar, austin
Reviewed By: simonmar
Subscribers: thomie, RyanGlScott
Differential Revision: https://phabricator.haskell.org/D1405
GHC Trac Issues: #10877
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously this was introduced in D524 as a compile-time constant.
Sadly, this isn't flexible enough to allow for environments where
ulimits restrict the maximum address space size (see, for instance,
Consequently, we are forced to make this dynamic. In principle this
shouldn't be so terrible as we can place both the beginning and end
addresses within the same cache line, likely incurring only one or so
additional instruction in HEAP_ALLOCED.
Test Plan: validate
Reviewers: austin, simonmar
Reviewed By: simonmar
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1353
GHC Trac Issues: #10877
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The current OS memory allocator conflates the concepts of allocating
address space and allocating memory, which makes the HEAP_ALLOCED()
implementation excessively complicated (as the only thing it cares
about is address space layout) and slow. Instead, what we want
is to allocate a single insanely large contiguous block of address
space (to make HEAP_ALLOCED() checks fast), and then commit subportions
of that in 1MB blocks as we did before.
This is currently behind a flag, USE_LARGE_ADDRESS_SPACE, that is only enabled for
certain OSes.
Test Plan: validate
Reviewers: simonmar, ezyang, austin
Subscribers: thomie, carter
Differential Revision: https://phabricator.haskell.org/D524
GHC Trac Issues: #9706
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
x86 and x64
Summary:
On Windows, the default action for things like division by zero and
segfaults is to pop up a Dr. Watson error reporting dialog if the exception
is unhandled by the user code.
This is a pain when we are SSHed into a Windows machine, or when we
want to debug a problem with gdb (gdb will get a first and second chance to
handle the exception, but if it doesn't the pop-up will show).
veh_excn provides two macros, `BEGIN_CATCH` and `END_CATCH`, which
will catch such exceptions in the entire process and die by
printing a message and calling `stg_exit(1)`.
Previously this code was handled using SEH (Structured Exception Handlers)
however each compiler and platform have different ways of dealing with SEH.
`MSVC` compilers have the keywords `__try`, `__catch` and `__except` to have the
compiler generate the appropriate SEH handler code for you.
`MinGW` compilers have no such keywords and require you to manually set the
SEH Handlers, however because SEH is implemented differently in x86 and x64
the methods to use them in GCC differs.
`x86`: SEH is based on the stack, the SEH handlers are available at `FS[0]`.
On startup one would only need to add a new handler there. This has
a number of issues such as hard to share handlers and it can be exploited.
`x64`: In order to fix the issues with the way SEH worked in x86, on x64 SEH handlers
are statically compiled and added to the .pdata section by the compiler.
Instead of being thread global they can now be Image global since you have to
specify the `RVA` of the region of code that the handlers govern.
You can on x64 Dynamically allocate SEH handlers, but it seems that (based on
experimentation and it's very under-documented) that the dynamic calls cannot override
static SEH handlers in the .pdata section.
Because of this and because GHC no longer needs to support < windows XP, the better
alternative for handling errors would be using the in XP introduced VEH.
The bonus is because VEH (Vectored Exception Handler) are a runtime construct the API
is the same for both x86 and x64 (note that the Context object does contain CPU specific
structures) and the calls are the same cross compilers. Which means this file can be
simplified quite a bit.
Using VEH also means we don't have to worry about the dynamic code generated by GHCi.
Test Plan:
Prior to this diff the tests for `derefnull` and `divbyzero` seem to have been disabled for windows.
To reproduce the issue on x64:
1) open ghci
2) import GHC.Base
3) run: 1 `divInt` 0
which should lead to ghci crashing an a watson error box displaying.
After applying the patch, run:
make TEST="derefnull divbyzero"
on both x64 and x86 builds of ghc to verify fix.
Reviewers: simonmar, austin
Reviewed By: austin
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D691
GHC Trac Issues: #6079
|
|
|
|
|
|
|
|
| |
This reverts commit f0fcc41d755876a1b02d1c7c79f57515059f6417.
New changes: now works on 32-bit platforms too. I added some basic
support for 64-bit subtraction and comparison operations to the x86
NCG.
|
|
|
|
|
|
|
| |
This reverts commit 35672072b4091d6f0031417bc160c568f22d0469.
Conflicts:
compiler/main/DriverPipeline.hs
|
|
|
|
|
| |
This helps identify threads in gdb particularly in processes with a
lot of threads.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In preparation for indirecting all references to closures,
we rename _closure to _static_closure to ensure any old code
will get an undefined symbol error. In order to reference
a closure foobar_closure (which is now undefined), you should instead
use STATIC_CLOSURE(foobar). For convenience, a number of these
old identifiers are macro'd.
Across C-- and C (Windows and otherwise), there were differing
conventions on whether or not foobar_closure or &foobar_closure
was the address of the closure. Now, all foobar_closure references
are addresses, and no & is necessary.
CHARLIKE/INTLIKE were not changed, simply alpha-renamed.
Part of remove HEAP_ALLOCED patch set (#8199)
Depends on D265
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
Test Plan: validate
Reviewers: simonmar, austin
Subscribers: simonmar, ezyang, carter, thomie
Differential Revision: https://phabricator.haskell.org/D267
GHC Trac Issues: #8199
|
|
|
|
| |
This reverts commit 39b5c1cbd8950755de400933cecca7b8deb4ffcd.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Cppcheck found a few defects in win32 IOManager and a typo in rts
testsuite. This commit fixes them.
Cppcheck 1.54 founds three possible null pointer dereferences of ioMan
pointer. It is dereferenced and checked for NULL after that.
testheapalloced.c contains typo in printf statement, which should print
percent sign but treated as parameter placement by compiler. To properly
print percent sign one need to use "%%" string.
FYI: Cppcheck 1.66 cannot find possible null pointer dereferences in
mentioned places, mistakenly thinking that some memory leaking instead.
I probably fill a regression bug to Cppcheck.
Test Plan:
Build project, run 'make fulltest'. It finished with 28 unexpected
failures. I don't know if they are related to my fix.
Unexpected results from:
TEST="T3500b T7891 tc124 T7653 T5321FD T5030 T4801 T6048 T5631 T5837 T5642 T9020 T3064 parsing001 T1969 T5321Fun T783 T3294"
OVERALL SUMMARY for test run started at Tue Sep 9 16:46:27 2014 NOVT
4:23:24 spent to go through
4101 total tests, which gave rise to
16075 test cases, of which
3430 were skipped
315 had missing libraries
12154 expected passes
145 expected failures
3 caused framework failures
0 unexpected passes
28 unexpected failures
Unexpected failures:
../../libraries/base/tests T7653 [bad exit code] (ghci,threaded1,threaded2)
perf/compiler T1969 [stat not good enough] (normal)
perf/compiler T3064 [stat not good enough] (normal)
perf/compiler T3294 [stat not good enough] (normal)
perf/compiler T4801 [stat not good enough] (normal)
perf/compiler T5030 [stat not good enough] (normal)
perf/compiler T5321FD [stat not good enough] (normal)
perf/compiler T5321Fun [stat not good enough] (normal)
perf/compiler T5631 [stat not good enough] (normal)
perf/compiler T5642 [stat not good enough] (normal)
perf/compiler T5837 [stat not good enough] (normal)
perf/compiler T6048 [stat not good enough] (optasm)
perf/compiler T783 [stat not good enough] (normal)
perf/compiler T9020 [stat not good enough] (optasm)
perf/compiler parsing001 [stat not good enough] (normal)
typecheck/should_compile T7891 [exit code non-0] (hpc,optasm,optllvm)
typecheck/should_compile tc124 [exit code non-0] (hpc,optasm,optllvm)
typecheck/should_run T3500b [exit code non-0] (hpc,optasm,threaded2,dyn,optllvm)
Reviewers: austin
Reviewed By: austin
Subscribers: simonmar, ezyang, carter
Differential Revision: https://phabricator.haskell.org/D203
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
|
|
|
|
| |
This will hopefully help ensure some basic consistency in the forward by
overriding buffer variables. In particular, it sets the wrap length, the
offset to 4, and turns off tabs.
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before the patch any call to 'select()' with 'bad_fd' led to:
- unblocking of all threads
- hiding exception for 'threadWaitRead bad_fd'
The patch fixes both cases in this way:
after 'select()' failure we iterate over each blocked descriptor
and poll individually to see it's actual status, which is:
- READY (move to run queue)
- BLOCKED (leave in blocked queue)
- INVALID (send an IOErrror exception)
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
|
|
|
|
|
|
|
|
| |
Problems were found on 32-bit platforms, I'll commit again when I have a fix.
This reverts the following commits:
54b31f744848da872c7c6366dea840748e01b5cf
b0534f78a73f972e279eed4447a5687bd6a8308e
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit b0534f78a73f972e279eed4447a5687bd6a8308e added new exported rts symbols,
but slightly misspelled them.
Observer on first compiled program:
> Linking dist/build/haskell-updater/haskell-updater ...
> /usr/lib64/ghc-7.9.20140503/rts-1.0/libHSrts.a(Schedule.o): In function `scheduleWaitThread':
> (.text+0xc4c): undefined reference to `base_GHCziIOziException_allocationLimitExceeded_closure'
> /usr/lib64/ghc-7.9.20140503/rts-1.0/libHSrts.a(RtsStartup.o): In function `hs_init_ghc':
> (.text+0x2fa): undefined reference to `base_GHCziIOziException_allocationLimitExceeded_closure'
> collect2: error: ld returned 1 exit status
CC: Simon Marlow <marlowsd@gmail.com>
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
|
|
|
|
|
| |
GlobalMemoryStatusEx actually requires _WIN32_WINNT to be defined as
0x0501 (Windows XP) for availability.
For completeness, I bumped WIN32_WINNT in Ticker and OSThreads as well.
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
MSDN says of Sleep:
If dwMilliseconds is greater than one tick but less than two, the
wait can be anywhere between one and two ticks, and so on.
so we need to add (milliseconds-per-tick - 1) to the amount of time we
sleep for.
|
|
|
|
| |
We were using "us" elsewhere, so this was inconsistent.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
lnat was originally "long unsigned int" but we were using it when we
wanted a 64-bit type on a 64-bit machine. This broke on Windows x64,
where long == int == 32 bits. Using types of unspecified size is bad,
but what we really wanted was a type with N bits on an N-bit machine.
StgWord is exactly that.
lnat was mentioned in some APIs that clients might be using
(e.g. StackOverflowHook()), so we leave it defined but with a comment
to say that it's deprecated.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On most platforms the userspace thread type (e.g. pthread_t) and kernel
thread id are different. Normally we don't care about kernel thread Ids,
but some system tools for tracing/profiling etc report kernel ids.
For example Solaris and OSX's DTrace and Linux's perf tool report kernel
thread ids. To be able to match these up with RTS's OSThread we need a
way to get at the kernel thread, so we add a new function for to do just
that (the implementation is system-dependent).
Additionally, strictly speaking the OSThreadId type, used as task ids,
is not a serialisable representation. On unix OSThreadId is a typedef for
pthread_t, but pthread_t is not guaranteed to be a numeric type.
Indeed on some systems pthread_t is a pointer and in principle it
could be a structure type. So we add another new function to get a
serialisable representation of an OSThreadId. This is only for use
in log files. We use the function to serialise an id of a task,
with the extra feature that it works in non-threaded builds
by always returning 1.
|