| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
| |
As noted in #18991, we would previously allocate heap in low memory.
Due to this the linker, which typically *needs* low memory, would end up
competing with the heap. In longer builds we end up running out of
low memory entirely, leading to linking failures.
|
|
|
|
|
|
|
|
| |
Since switching to the two-step allocator, the `outofmem` test fails via
`osCommitMemory` failing to commit. However, this was previously exiting
with `EXIT_FAILURE`, rather than `EXIT_HEAPOVERFLOW`. I think the latter
is a more reasonable exit code for this case and matches the behavior on
POSIX platforms.
|
|
|
|
| |
instead of emulated ones
|
|
|
|
|
|
|
|
|
|
|
| |
Starting with Windows 8.1/Server 2012, Windows no longer eagerly
preallocates page tables for reserved memory; that eager preallocation
is what prevented us from using this approach in the past.
We also try to allocate the heap high in the address space.
Hopefully this makes it easier to allocate things that need to live
in the low 4GB of memory, such as jump islands for the linker.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The megablock allocator does not currently check whether, after aligning
the free region, there is still enough space left to actually perform the
allocation. This causes it to return a memory region which it didn't fully
allocate itself. Even worse, it can return a block containing a region
that is present in two allocation pools.
If you're lucky, this results in an error from the OS that you're committing
memory that has never been reserved; otherwise it causes random heap corruption.
This change makes the allocator take the alignment into account as well.
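The check described above can be sketched as follows. This is a minimal illustration, not the RTS code: `FreeRegion`, `align_up` and `alloc_aligned` are hypothetical names, and the region is assumed to be a simple base/size pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free-list node: a reserved but unused address range. */
typedef struct {
    uintptr_t base;
    size_t    size;
} FreeRegion;

/* Round addr up to the next multiple of align (align must be a power of two). */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Return the aligned start address for `want` bytes taken from `r`, or 0 if
 * the region is too small once the alignment padding has been subtracted. */
uintptr_t alloc_aligned(const FreeRegion *r, size_t want, uintptr_t align)
{
    uintptr_t start   = align_up(r->base, align);
    size_t    padding = (size_t)(start - r->base);

    /* The size check must use what is left *after* aligning. */
    if (padding > r->size || r->size - padding < want) {
        return 0;
    }
    return start;
}
```

A region of 1MB starting at 0x180000 looks large enough for a 1MB request, but aligning to 1MB skips the first 0x80000 bytes, so the request has to be refused.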
Test Plan: ./validate , testcase testmblockalloc
Reviewers: bgamari, erikd, simonmar
Reviewed By: simonmar
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5363
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: Validate, run program with `+RTS --numa` without libnuma
support compiled in
Reviewers: erikd, simonmar
Subscribers: thomie, carter
GHC Trac Issues: #14956
Differential Revision: https://phabricator.haskell.org/D4556
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* osNumaNodes now returns the right number of nodes
* thread affinity is now correctly set
TODO: no noticeable performance improvement.
Does Windows already distribute threads in a NUMA-aware fashion?
Test Plan:
* validate
* local tests on a NUMA machine
Reviewers: bgamari, erikd, simonmar
Reviewed By: bgamari, simonmar
Subscribers: thomie, carter
Differential Revision: https://phabricator.haskell.org/D4607
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Adds quick-cross-ncg flavour.
- Fix windows wchar with `_s` for mingw
- Lookup windres, dllwrap and objdump
- Fix type.
Reviewers: bgamari, hvr, Phyx, erikd, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, thomie, erikd, carter
Differential Revision: https://phabricator.haskell.org/D4430
|
|
|
|
| |
Our new CPP linter enforces this.
|
|
|
|
| |
This will make it a bit easier to maintain consistent output in the testsuite.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Fix issues preventing x86 GHC from building on Windows and
fix a segfault in the testsuite.
Test Plan: ./validate
Reviewers: austin, erikd, simonmar, bgamari
Reviewed By: bgamari
Subscribers: #ghc_windows_task_force, thomie
Differential Revision: https://phabricator.haskell.org/D2789
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: Validate on lots of platforms
Reviewers: erikd, simonmar, austin
Reviewed By: erikd, simonmar
Subscribers: michalt, thomie
Differential Revision: https://phabricator.haskell.org/D2699
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
NOTE: I have been able to do simple testing on emulated NUMA nodes.
Real hardware would be needed for a proper test.
D2199 added NUMA support for Linux; I have just filled in the missing pieces following
the description of the Linux APIs.
Test Plan:
Use `bcdedit.exe /set groupsize 2` to modify the kernel again (Similar to D2533).
This generates some NUMA nodes:
```
Logical Processor to NUMA Node Map:
NUMA Node 0:
**
--
NUMA Node 1:
--
**
Approximate Cross-NUMA Node Access Cost (relative to fastest):
00 01
00: 1.1 1.1
01: 1.0 1.0
```
Run `../test-numa.exe +RTS --numa -RTS`
and check PerfMon for NUMA allocations.
Reviewers: simonmar, erikd, bgamari, austin
Reviewed By: simonmar
Subscribers: thomie, #ghc_windows_task_force
Differential Revision: https://phabricator.haskell.org/D2534
GHC Trac Issues: #12602
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We stumbled upon a case where an external library (OpenCL) does not work
if a specific address (0x200000000) is taken.
It so happens that `osReserveHeapMemory` starts trying to mmap at 0x200000000:
```
void *hint = (void*)((W_)8 * (1 << 30) + attempt * BLOCK_SIZE);
at = osTryReserveHeapMemory(*len, hint);
```
This makes it impossible to use Haskell programs compiled with GHC 8
with C functions that use OpenCL.
See this example https://github.com/chpatrick/oclwtf for a repro.
This patch allows the user to work around this kind of behavior outside
our control by letting the user override the starting address through an
RTS command line flag.
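A minimal sketch of how such an override could feed into the hint computation quoted above. The names `heap_hint`, `DEFAULT_HEAP_BASE` and `HINT_STEP` are hypothetical, the step value merely stands in for `BLOCK_SIZE`, and a 64-bit platform is assumed:

```c
#include <stdint.h>

/* 8 * 2^30 = 0x200000000, the default starting address from the snippet above.
 * Assumes a 64-bit uintptr_t. */
#define DEFAULT_HEAP_BASE ((uintptr_t)8 << 30)

/* Stand-in for BLOCK_SIZE; the concrete value is an assumption. */
#define HINT_STEP ((uintptr_t)4096)

/* Compute the address hint for a given reservation attempt.
 * user_base == 0 means "no override given, use the default". */
uintptr_t heap_hint(uintptr_t user_base, unsigned attempt)
{
    uintptr_t base = (user_base != 0) ? user_base : DEFAULT_HEAP_BASE;
    return base + (uintptr_t)attempt * HINT_STEP;
}
```

With no override the first hint is 0x200000000, the address OpenCL needs; passing any other base moves every attempt away from it.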
Reviewers: bgamari, Phyx, simonmar, erikd, austin
Reviewed By: Phyx, simonmar
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D2513
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The aim here is to reduce the number of remote memory accesses on
systems with a NUMA memory architecture, typically multi-socket servers.
Linux provides a NUMA API for doing two things:
* Allocating memory local to a particular node
* Binding a thread to a particular node
When given the +RTS --numa flag, the runtime will
* Determine the number of NUMA nodes (N) by querying the OS
* Assign capabilities to nodes, so cap C is on node C%N
* Bind worker threads on a capability to the correct node
* Keep separate free lists in the block layer for each node
* Allocate the nursery for a capability from node-local memory
* Allocate blocks in the GC from node-local memory
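The capability-to-node assignment in the list above (cap C is on node C%N) can be illustrated with a small sketch. `cap_to_numa_node` is a hypothetical helper name, and treating a node count of zero as "NUMA disabled" is an assumption:

```c
#include <stdint.h>

/* Map capability number `cap` to a NUMA node, round-robin over n_nodes.
 * If NUMA is not in use (n_nodes == 0) everything lives on node 0. */
uint32_t cap_to_numa_node(uint32_t cap, uint32_t n_nodes)
{
    return (n_nodes == 0) ? 0 : cap % n_nodes;
}
```

On a 2-node machine with `-N24`, even-numbered capabilities land on node 0 and odd-numbered ones on node 1, so each node serves 12 capabilities.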
For example, using nofib/parallel/queens on a 24-core 2-socket machine:
```
$ ./Main 15 +RTS -N24 -s -A64m
Total time 173.960s ( 7.467s elapsed)
$ ./Main 15 +RTS -N24 -s -A64m --numa
Total time 150.836s ( 6.423s elapsed)
```
The biggest win here is expected to be allocating from node-local
memory, so that means programs using a large -A value (as here).
According to perf, on this program the number of remote memory accesses
was reduced by more than 50% by using `--numa`.
Test Plan:
* validate
* There's a new flag --debug-numa=<n> that pretends to do NUMA without
actually making the OS calls, which is useful for testing the code
on non-NUMA systems.
* TODO: I need to add some unit tests
Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2199
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes the code a little more modular and allows the removal of some
CPP hackery. Providing dummy implementations of the `m32_*`
functions (which simply call `errorBelch`) means that the call sites
for these functions are syntax checked even when `RTS_LINKER_USE_MMAP`
is `0`.
Also changes some size parameter types from `unsigned int` to `size_t`.
Test Plan: Validate on Linux, OS X and Windows
Reviewers: Phyx, hsyl20, bgamari, simonmar, austin
Reviewed By: simonmar, austin
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2237
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The first argument of 'osFreeMBlocks' ought to have the same type as the
return value from 'osGetMBlocks'. Make it so.
Reviewers: austin, simonmar, bgamari
Reviewed By: bgamari
Subscribers: erikd, rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D2235
|
|
|
|
|
|
|
|
|
|
|
|
| |
The `nat` type was an alias for `unsigned int` with a comment saying
it was at least 32 bits. We keep the typedef in case client code is
using it but mark it as deprecated.
Test Plan: Validated on Linux, OS X and Windows
Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20
Differential Revision: https://phabricator.haskell.org/D2166
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Fix exit code for Windows to match expected for out-of-memory test
Test Plan: ./validate
Reviewers: simonmar, austin, thomie, bgamari
Reviewed By: thomie, bgamari
Differential Revision: https://phabricator.haskell.org/D1753
GHC Trac Issues: #11422
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since the introduction of the two-step allocator, the RTS asks the kernel
for a large upfront mmap'd region of memory (on the order of terabytes). While we have no
expectation that this entire region will be backed by physical memory,
this scheme nevertheless fails on some systems with resource limits.
Here we use a back-off scheme to reduce our allocation request until we
find a size agreeable to the kernel. Fixes #10877.
This also fixes a latent bug wherein the heap reservation retry logic
would fail to free the previously reserved address space, which would
likely result in a heap allocation failure.
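A back-off scheme of this kind might look roughly like the sketch below. It is not the RTS implementation: the reservation is abstracted behind a hypothetical `ReserveFn` callback, halving is an assumed back-off step, and `fake_reserve` is only a stand-in for a kernel that refuses requests above some limit (with deliberately small sizes instead of terabytes).

```c
#include <stddef.h>

/* Hypothetical callback: try to reserve `len` bytes of address space,
 * returning NULL on failure. A failed attempt reserves nothing. */
typedef void *(*ReserveFn)(size_t len);

/* Halve the request until a reservation succeeds or the request would drop
 * below min_len. On success *len is updated to the size actually reserved;
 * on failure *len is left untouched and NULL is returned. */
void *reserve_with_backoff(ReserveFn try_reserve, size_t *len, size_t min_len)
{
    size_t want = *len;
    while (want > 0 && want >= min_len) {
        void *p = try_reserve(want);
        if (p != NULL) {
            *len = want;
            return p;
        }
        want /= 2;
    }
    return NULL;
}

/* Simulated kernel for illustration: only requests up to fake_limit succeed. */
size_t fake_limit = (size_t)1 << 24;
char fake_arena;

void *fake_reserve(size_t len)
{
    return (len <= fake_limit) ? (void *)&fake_arena : NULL;
}
```

Calling `reserve_with_backoff(fake_reserve, &len, min)` with `len` at 256MB and the limit at 16MB settles on 16MB after four failed attempts. A real reserver that can partially succeed would also have to release the earlier reservation before retrying, which is the latent bug mentioned above.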
Test Plan:
set address space limit with `ulimit -v 67108864` and try running
a compiled program
Reviewers: simonmar, austin
Reviewed By: simonmar
Subscribers: thomie, RyanGlScott
Differential Revision: https://phabricator.haskell.org/D1405
GHC Trac Issues: #10877
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously this was introduced in D524 as a compile-time constant.
Sadly, this isn't flexible enough to allow for environments where
ulimits restrict the maximum address space size (see, for instance,
Consequently, we are forced to make this dynamic. In principle this
shouldn't be so terrible as we can place both the beginning and end
addresses within the same cache line, likely incurring only one or so
additional instruction in HEAP_ALLOCED.
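To illustrate the idea, here is a minimal sketch of a bounds check against a dynamic address range. The struct and function names are hypothetical and this is not the RTS definition of `HEAP_ALLOCED`; it only shows why keeping both bounds in one small struct keeps them adjacent in memory.

```c
#include <stdbool.h>
#include <stdint.h>

/* Both bounds sit next to each other, so they can share a cache line. */
struct heap_bounds {
    uintptr_t begin;  /* first address of the reserved heap range */
    uintptr_t end;    /* one past the last address of the range */
};

/* Filled in at startup once the actual reservation size is known. */
struct heap_bounds heap_space = { 0, 0 };

/* True if p points into the reserved heap address range. */
bool heap_alloced(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    return a >= heap_space.begin && a < heap_space.end;
}
```

Compared with a compile-time range, the only extra work is loading `end` instead of using a constant.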
Test Plan: validate
Reviewers: austin, simonmar
Reviewed By: simonmar
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1353
GHC Trac Issues: #10877
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The current OS memory allocator conflates the concepts of allocating
address space and allocating memory, which makes the HEAP_ALLOCED()
implementation excessively complicated (as the only thing it cares
about is address space layout) and slow. Instead, what we want
is to allocate a single insanely large contiguous block of address
space (to make HEAP_ALLOCED() checks fast), and then commit subportions
of that in 1MB blocks as we did before.
This is currently behind a flag, USE_LARGE_ADDRESS_SPACE, that is only enabled for
certain OSes.
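The split between reserving address space and committing memory can be sketched on POSIX systems as below. This is an illustration under assumptions, not the RTS code: the function names are hypothetical, and `mmap` with `PROT_NONE` followed by `mprotect` is just one way to express the two steps.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

#define MBLOCK_SIZE ((size_t)1 << 20)  /* 1MB commit granularity */

/* Step 1: reserve address space only. PROT_NONE pages cannot be touched. */
void *reserve_address_space(size_t len)
{
    void *p = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return (p == MAP_FAILED) ? NULL : p;
}

/* Step 2: commit a sub-range by making it readable and writable.
 * offset and len should be multiples of the page size.
 * Returns 0 on success, -1 on failure. */
int commit_range(void *base, size_t offset, size_t len)
{
    return mprotect((char *)base + offset, len, PROT_READ | PROT_WRITE);
}

/* Give the whole reservation back to the OS. */
int release_address_space(void *base, size_t len)
{
    return munmap(base, len);
}
```

Because the reservation is one contiguous range, checking whether a pointer belongs to the heap reduces to comparing it against the range's two ends.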
Test Plan: validate
Reviewers: simonmar, ezyang, austin
Subscribers: thomie, carter
Differential Revision: https://phabricator.haskell.org/D524
GHC Trac Issues: #9706
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
x86 and x64
Summary:
On Windows, the default action for things like division by zero and
segfaults is to pop up a Dr. Watson error reporting dialog if the exception
is unhandled by the user code.
This is a pain when we are SSHed into a Windows machine, or when we
want to debug a problem with gdb (gdb will get a first and second chance to
handle the exception, but if it doesn't the pop-up will show).
veh_excn provides two macros, `BEGIN_CATCH` and `END_CATCH`, which
will catch such exceptions in the entire process and die by
printing a message and calling `stg_exit(1)`.
Previously this was handled using SEH (Structured Exception Handling);
however, each compiler and platform has a different way of dealing with SEH.
`MSVC` compilers have the keywords `__try`, `__except` and `__finally`, which make the
compiler generate the appropriate SEH handler code for you.
`MinGW` compilers have no such keywords and require you to manually set the
SEH handlers; however, because SEH is implemented differently in x86 and x64,
the methods for using them in GCC differ.
`x86`: SEH is based on the stack; the SEH handlers are available at `FS[0]`.
On startup one would only need to add a new handler there. This has
a number of issues: handlers are hard to share, and the mechanism can be exploited.
`x64`: In order to fix the issues with the way SEH worked in x86, on x64 SEH handlers
are statically compiled and added to the .pdata section by the compiler.
Instead of being thread global they can now be image global, since you have to
specify the `RVA` of the region of code that the handlers govern.
On x64 you can dynamically allocate SEH handlers, but it seems (based on
experimentation, as this is very under-documented) that the dynamic calls cannot override
static SEH handlers in the .pdata section.
Because of this, and because GHC no longer needs to support Windows versions older than XP,
the better alternative for handling errors is VEH, which was introduced in XP.
The bonus is that, because VEH (Vectored Exception Handling) is a runtime construct, the API
is the same for both x86 and x64 (note that the Context object does contain CPU-specific
structures) and the calls are the same across compilers, which means this file can be
simplified quite a bit.
Using VEH also means we don't have to worry about the dynamic code generated by GHCi.
Test Plan:
Prior to this diff the tests for `derefnull` and `divbyzero` seem to have been disabled for windows.
To reproduce the issue on x64:
1) open ghci
2) import GHC.Base
3) run: 1 `divInt` 0
which should lead to ghci crashing and a Watson error box being displayed.
After applying the patch, run:
make TEST="derefnull divbyzero"
on both x64 and x86 builds of ghc to verify fix.
Reviewers: simonmar, austin
Reviewed By: austin
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D691
GHC Trac Issues: #6079
|
|
|
|
| |
This reverts commit 39b5c1cbd8950755de400933cecca7b8deb4ffcd.
|
|
|
|
|
|
|
|
| |
This will hopefully help ensure some basic consistency going forward by
overriding buffer variables. In particular, it sets the wrap length, the
offset to 4, and turns off tabs.
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
|
|
|
|
|
| |
GlobalMemoryStatusEx actually requires _WIN32_WINNT to be defined as
0x0501 (Windows XP) for availability.
For completeness, I bumped _WIN32_WINNT in Ticker and OSThreads as well.
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
lnat was originally "long unsigned int" but we were using it when we
wanted a 64-bit type on a 64-bit machine. This broke on Windows x64,
where long == int == 32 bits. Using types of unspecified size is bad,
but what we really wanted was a type with N bits on an N-bit machine.
StgWord is exactly that.
lnat was mentioned in some APIs that clients might be using
(e.g. StackOverflowHook()), so we leave it defined but with a comment
to say that it's deprecated.
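The portability issue can be seen with a short sketch. Here `word_t` is a hypothetical stand-in for a pointer-sized word type such as `StgWord` (it is not how the RTS defines it), and the helper functions exist only for illustration:

```c
#include <limits.h>
#include <stdint.h>

/* A type with N bits on an N-bit machine: as wide as a pointer. */
typedef uintptr_t word_t;

_Static_assert(sizeof(word_t) == sizeof(void *),
               "word_t must be pointer-sized");

/* Width in bits of the pointer-sized word type. */
unsigned word_bits(void)
{
    return (unsigned)(sizeof(word_t) * CHAR_BIT);
}

/* Width in bits of unsigned long, which the C standard leaves
 * platform-defined (32 bits on Windows x64, 64 bits on Linux x86-64). */
unsigned long_bits(void)
{
    return (unsigned)(sizeof(unsigned long) * CHAR_BIT);
}
```

On Windows x64 the two functions disagree, which is the mismatch described above.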
|
|
|
|
| |
Stops outofmem segfaulting on Win64
|
|
|
|
|
|
| |
Mostly this meant getting pointer<->int conversions to use the right
sizes. lnat is now size_t, rather than unsigned long, as that seems a
better match for how it's used.
|
|
|
|
| |
Also added a few comments, and a load of code got indented 1 level deeper.
|
|
|
|
| |
as well as decommitting it.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
Also common-up some duplicate bits in the platform-specific code
|
| |
|
| |
|
|
See bug #738
Allocating executable memory is getting more difficult these days. In
particular, the default SELinux policy on Fedora Core 5 disallows
making the heap (i.e. malloc()'d memory) executable, although it does
apparently allow mmap()'ing anonymous executable memory by default.
Previously, stgMallocBytesRWX() used malloc() underneath, and then
tried to make the page holding the memory executable. This was rather
hacky and fails with Fedora Core 5.
This patch adds a mini-allocator for executable memory, based on the
block allocator. We grab page-sized blocks and make them executable,
then allocate small objects from the page. There's a simple free
function, that will free whole pages back to the system when they are
empty.
|