| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Our new CPP linter enforces this.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The C code in the RTS now gets built with `-Wundef` and the Haskell code
(stages 1 and 2 only) with `-Wcpp-undef`. We now get warnings whereever
`#if` is used on undefined identifiers.
Test Plan: Validate on Linux and Windows
Reviewers: austin, angerman, simonmar, bgamari, Phyx
Reviewed By: bgamari
Subscribers: thomie, snowleopard
Differential Revision: https://phabricator.haskell.org/D3278
|
|
|
|
|
|
|
|
| |
This is causing too much platform dependent breakage at the moment. We
will need a more rigorous testing strategy before this can be
merged again.
This reverts commit 7e340c2bbf4a56959bd1e95cdd1cfdb2b7e537c2.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The C code in the RTS now gets built with `-Wundef` and the Haskell code
(stages 1 and 2 only) with `-Wcpp-undef`. We now get warnings whereever
`#if` is used on undefined identifiers.
Test Plan: Validate on Linux and Windows
Reviewers: austin, angerman, simonmar, bgamari, Phyx
Reviewed By: bgamari
Subscribers: thomie, snowleopard
Differential Revision: https://phabricator.haskell.org/D3278
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We stumbled upon a case where an external library (OpenCL) does not work
if a specific address (0x200000000) is taken.
It so happens that `osReserveHeapMemory` starts trying to mmap at 0x200000000:
```
void *hint = (void*)((W_)8 * (1 << 30) + attempt * BLOCK_SIZE);
at = osTryReserveHeapMemory(*len, hint);
```
This makes it impossible to use Haskell programs compiled with GHC 8
with C functions that use OpenCL.
See this example ​https://github.com/chpatrick/oclwtf for a repro.
This patch allows the user to work around this kind of behavior outside
our control by letting the user override the starting address through an
RTS command line flag.
Reviewers: bgamari, Phyx, simonmar, erikd, austin
Reviewed By: Phyx, simonmar
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D2513
|
|
|
|
|
| |
- Move the numaMap and nNumaNodes out of RtsFlags to Capability.c
- Add a test to tests/rts
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The aim here is to reduce the number of remote memory accesses on
systems with a NUMA memory architecture, typically multi-socket servers.
Linux provides a NUMA API for doing two things:
* Allocating memory local to a particular node
* Binding a thread to a particular node
When given the +RTS --numa flag, the runtime will
* Determine the number of NUMA nodes (N) by querying the OS
* Assign capabilities to nodes, so cap C is on node C%N
* Bind worker threads on a capability to the correct node
* Keep a separate free lists in the block layer for each node
* Allocate the nursery for a capability from node-local memory
* Allocate blocks in the GC from node-local memory
For example, using nofib/parallel/queens on a 24-core 2-socket machine:
```
$ ./Main 15 +RTS -N24 -s -A64m
Total time 173.960s ( 7.467s elapsed)
$ ./Main 15 +RTS -N24 -s -A64m --numa
Total time 150.836s ( 6.423s elapsed)
```
The biggest win here is expected to be allocating from node-local
memory, so that means programs using a large -A value (as here).
According to perf, on this program the number of remote memory accesses
were reduced by more than 50% by using `--numa`.
Test Plan:
* validate
* There's a new flag --debug-numa=<n> that pretends to do NUMA without
actually making the OS calls, which is useful for testing the code
on non-NUMA systems.
* TODO: I need to add some unit tests
Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2199
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a function takes a pointer parameter and doesn't update what
the pointer points to, we can add `const` to the parameter
declaration to document that no updates occur.
Test Plan: Validate on Linux, OS X and Windows
Reviewers: austin, Phyx, bgamari, simonmar, hsyl20
Reviewed By: bgamari, simonmar, hsyl20
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2200
|
|
|
|
|
|
|
|
|
|
|
|
| |
The `nat` type was an alias for `unsigned int` with a comment saying
it was at least 32 bits. We keep the typedef in case client code is
using it but mark it as deprecated.
Test Plan: Validated on Linux, OS X and Windows
Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20
Differential Revision: https://phabricator.haskell.org/D2166
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since the two-step allocator the RTS asks the kernel for a large upfront
mmap'd region of memory (on the order of terabytes). While we have no
expectation that this entire region will be backed by physical memory,
this scheme nevertheless fails on some systems with resource limits.
Here we use a back-off scheme to reduce our allocation request until we
find a size agreeable to the kernel. Fixes #10877.
This also fixes a latent bug wherein the heap reservation retry logic
would fail to free the previously reserved address space, which would
likely result in a heap allocation failure.
Test Plan:
set address space limit with `ulimit -v 67108864` and try running
a compiled program
Reviewers: simonmar, austin
Reviewed By: simonmar
Subscribers: thomie, RyanGlScott
Differential Revision: https://phabricator.haskell.org/D1405
GHC Trac Issues: #10877
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously this was introduced in D524 as a compile-time constant.
Sadly, this isn't flexible enough to allow for environments where
ulimits restrict the maximum address space size (see, for instance,
Consequently, we are forced to make this dynamic. In principle this
shouldn't be so terrible as we can place both the beginning and end
addresses within the same cache line, likely incurring only one or so
additional instruction in HEAP_ALLOCED.
Test Plan: validate
Reviewers: austin, simonmar
Reviewed By: simonmar
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1353
GHC Trac Issues: #10877
|
|
|
|
|
|
| |
C99 does not allow unnamed parameters in definition argument lists [1].
[1] http://stackoverflow.com/questions/8776810/parameter-name-omitted-c-vs-c
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The current OS memory allocator conflates the concepts of allocating
address space and allocating memory, which makes the HEAP_ALLOCED()
implementation excessively complicated (as the only thing it cares
about is address space layout) and slow. Instead, what we want
is to allocate a single insanely large contiguous block of address
space (to make HEAP_ALLOCED() checks fast), and then commit subportions
of that in 1MB blocks as we did before.
This is currently behind a flag, USE_LARGE_ADDRESS_SPACE, that is only enabled for
certain OSes.
Test Plan: validate
Reviewers: simonmar, ezyang, austin
Subscribers: thomie, carter
Differential Revision: https://phabricator.haskell.org/D524
GHC Trac Issues: #9706
|
|
|
|
| |
This reverts commit 39b5c1cbd8950755de400933cecca7b8deb4ffcd.
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
|
|
|
|
| |
This will hopefully help ensure some basic consistency in the forward by
overriding buffer variables. In particular, it sets the wrap length, the
offset to 4, and turns off tabs.
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
lnat was originally "long unsigned int" but we were using it when we
wanted a 64-bit type on a 64-bit machine. This broke on Windows x64,
where long == int == 32 bits. Using types of unspecified size is bad,
but what we really wanted was a type with N bits on an N-bit machine.
StgWord is exactly that.
lnat was mentioned in some APIs that clients might be using
(e.g. StackOverflowHook()), so we leave it defined but with a comment
to say that it's deprecated.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The first phase of this tidyup is focussed on the header files, and in
particular making sure we are exposinng publicly exactly what we need
to, and no more.
- Rts.h now includes everything that the RTS exposes publicly,
rather than a random subset of it.
- Most of the public header files have moved into subdirectories, and
many of them have been renamed. But clients should not need to
include any of the other headers directly, just #include the main
public headers: Rts.h, HsFFI.h, RtsAPI.h.
- All the headers needed for via-C compilation have moved into the
stg subdirectory, which is self-contained. Most of the headers for
the rest of the RTS APIs have moved into the rts subdirectory.
- I left MachDeps.h where it is, because it is so widely used in
Haskell code.
- I left a deprecated stub for RtsFlags.h in place. The flag
structures are now exposed by Rts.h.
- Various internal APIs are no longer exposed by public header files.
- Various bits of dead code and declarations have been removed
- More gcc warnings are turned on, and the RTS code is more
warning-clean.
- More source files #include "PosixSource.h", and hence only use
standard POSIX (1003.1c-1995) interfaces.
There is a lot more tidying up still to do, this is just the first
pass. I also intend to standardise the names for external RTS APIs
(e.g use the rts_ prefix consistently), and declare the internal APIs
as hidden for shared libraries.
|
|
|
|
|
|
|
|
| |
After much experimentation, I've found a formulation for HEAP_ALLOCED
that (a) improves performance, and (b) doesn't have any race
conditions when used concurrently. GC performance on x86_64 should be
improved slightly. See extensive comments in MBlock.h for the
details.
|
| |
|
| |
|
|
|
|
| |
Also common-up some duplicate bits in the platform-specific code
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I think this fixes #1209
Previously:
outofmem.exe: getMBlocks: VirtualAlloc MEM_RESERVE 1025 blocks failed: Not enoug
h storage is available to process this command.
Now:
outofmem.exe: out of memory
|
|
|
|
|
|
|
|
|
|
| |
When debugging the GC and storage manager we often want repeated runs
of the program to allocate memory at the same addresses, so that we
can set watch points. Unfortunately the OS doesn't always give us
memory at predictable addresses. This flag gives the OS a hint as to
where we would like our memory allocated. Previously I did this by
changing the HEAP_BASE setting in MBlock.h and recompiling, this patch
just adds a flag so I don't have to recompile.
|
|
In preparation for parallel GC, split up the monolithic GC.c file into
smaller parts. Also in this patch (and difficult to separate,
unfortunatley):
- Don't include Stable.h in Rts.h, instead just include it where
necessary.
- consistently use STATIC_INLINE in source files, and INLINE_HEADER
in header files. STATIC_INLINE is now turned off when DEBUG is on,
to make debugging easier.
- The GC no longer takes the get_roots function as an argument.
We weren't making use of this generalisation.
|