| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When we use nursery chunks with +RTS -n<size>, when the current nursery
runs out we have to check whether there's another chunk available with
getNewNursery(). There was one place we weren't doing this: the ad-hoc
heap check in scheduleProcessInbox().
The impact of the bug was that we would GC too early when using nursery
chunks, especially in programs that used messages (throwTo between
capabilities could do this, also hs_try_putmvar()).
Test Plan: validate, also local testing in our application
Reviewers: bgamari, niteria, austin, erikd
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3749
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An additional stat is tracked per gc: par_balanced_copied This is the
the number of bytes copied by each gc thread under the balanced lmit,
which is simply (copied_bytes / num_gc_threads). The stat is added to
all the appropriate GC structures, so is visible in the eventlog and in
GHC.Stats.
A note is added explaining how work balance is computed.
Remove some end of line whitespace
Test Plan:
./validate
experiment with the program attached to the ticket
examine code changes carefully
Reviewers: simonmar, austin, hvr, bgamari, erikd
Reviewed By: simonmar
Subscribers: Phyx, rwbarton, thomie
GHC Trac Issues: #13830
Differential Revision: https://phabricator.haskell.org/D3658
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
clang defines '__clear_cache' slightly differently from gcc:
rts/sm/Storage.c:1349:13: error:
error: conflicting types for '__clear_cache'
|
1349 | extern void __clear_cache(char * begin, char * end);
| ^
extern void __clear_cache(char * begin, char * end);
^
note: '__clear_cache' is a builtin with type 'void (void *, void *)'
Reported by Moritz Angermann.
While at it used '__builtin___clear_cache' if advertised by clang.
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
|
|
|
|
|
|
| |
This reverts commit 9492703a5862ee8623455209e50344cf8c4de077.
Incomplete patch (missing begin, end assignments).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
clang defines '__clear_cache' slightly differently from gcc:
rts/sm/Storage.c:1349:13: error:
error: conflicting types for '__clear_cache'
|
1349 | extern void __clear_cache(char * begin, char * end);
| ^
extern void __clear_cache(char * begin, char * end);
^
note: '__clear_cache' is a builtin with type 'void (void *, void *)'
Reported by Moritz Angermann.
While at it used '__builtin___clear_cache' if advertised by clang.
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Strangely gcc 5.4 compiling on amd64 (nixos) complained about these.
Both warnings look correct, so I'm not sure why we haven't been seeing
these up until now.
Test Plan: Validate
Reviewers: simonmar, austin, erikd
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3693
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Noticed when was building UNREG ghc with -optc{-Wall,-Werror}:
rts/sm/Storage.c:1359:3: error:
error: implicit declaration of function '__clear_cache'
[-Werror=implicit-function-declaration]
__clear_cache((void*)begin, (void*)end);
^~~~~~~~~~~~~
|
1359 | __clear_cache((void*)begin, (void*)end);
| ^
Left direct '__clear_cache' usage gcc toolchain before 4.4.
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
|
|
|
|
| |
This reverts commit d1d3e98443cf263ef09253e2478e3e638e174e0d.
|
| |
|
|
|
|
|
|
|
|
|
| |
This reverts commit 6dd1257fdd4d18e84d32e89bf0ec664b3c8f7b93.
Change fails vaildation:
rts/sm/Storage.c:1351:20: error:
error: ‘gcc_clear_cache’ defined but not used [-Werror=unused-function]
STATIC_INLINE void gcc_clear_cache(void * begin, void * end)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Noticed when was building UNREG ghc with -optc{-Wall,-Werror}:
rts/sm/Storage.c:1359:3: error:
error: implicit declaration of function '__clear_cache'
[-Werror=implicit-function-declaration]
__clear_cache((void*)begin, (void*)end);
^~~~~~~~~~~~~
|
1359 | __clear_cache((void*)begin, (void*)end);
| ^
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The problem occurred when
* Threads A & B evaluate the same thunk
* Thread A context-switches, so the thunk gets blackholed
* Thread C enters the blackhole, creates a BLOCKING_QUEUE attached to
the blackhole and thread A's `tso->bq` queue
* Thread B updates the blackhole with a value, overwriting the BLOCKING_QUEUE
* We GC, replacing A's update frame with stg_enter_checkbh
* Throw an exception in A, which ignores the stg_enter_checkbh frame
Now we have C blocked on A's tso->bq queue, but we forgot to check the
queue because the stg_enter_checkbh frame has been thrown away by the
exception.
The solution and alternative designs are discussed in Note [upd-black-hole].
This also exposed a bug in the interpreter, whereby we were sometimes
context-switching without calling `threadPaused()`. I've fixed this
and added some Notes.
Test Plan:
* `cd testsuite/tests/concurrent && make slow`
* validate
Reviewers: niteria, bgamari, austin, erikd
Reviewed By: erikd
Subscribers: rwbarton, thomie
GHC Trac Issues: #13751
Differential Revision: https://phabricator.haskell.org/D3630
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes gcc-7.1.0 warnings of form:
rts/sm/Scav.c:559:9: error:
error: this statement may fall through [-Werror=implicit-fallthrough=]
scavenge_fun_srt(info);
^~~~~~~~~~~~~~~~~~~~~~
Many of places are indeed unobvious and some are
already annotated by comments.
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
compiler/ghc.mk
@echo "#define $(HostArch_CPP)_HOST_ARCH 1" >> $@
@echo "#define $(TargetArch_CPP)_HOST_ARCH 1" >> $@
this leads to warnigns like:
> warning: 'x86_64_HOST_ARCH' is not defined, evaluates to 0 [-Wundef]
Reviewers: austin, bgamari, erikd, simonmar
Reviewed By: simonmar
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3555
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There were old module names:
* Data.Compact -> GHC.Compact
* Data.Compact.Internal -> GHC.Compact
This commit is for ghc-8.2 branch.
Test Plan: build
Reviewers: austin, bgamari, hvr, erikd, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3522
|
|
|
|
| |
Our new CPP linter enforces this.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The C code in the RTS now gets built with `-Wundef` and the Haskell code
(stages 1 and 2 only) with `-Wcpp-undef`. We now get warnings whereever
`#if` is used on undefined identifiers.
Test Plan: Validate on Linux and Windows
Reviewers: austin, angerman, simonmar, bgamari, Phyx
Reviewed By: bgamari
Subscribers: thomie, snowleopard
Differential Revision: https://phabricator.haskell.org/D3278
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: austin, erikd, simonmar
Reviewed By: erikd
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3486
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This both says what we mean and silences a bunch of spurious CPP linting
warnings. This pragma is supported by all CPP implementations which we
support.
Reviewers: austin, erikd, simonmar, hvr
Reviewed By: simonmar
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3482
|
|
|
|
|
|
|
| |
This helps ensure that system includes on some more fragile
platforms (like e.g. AIX) see a more consistent set of CPP defines,
and consequently reduce the risk of conflicting typdefs/prototypes
being exposed.
|
|
|
|
|
|
|
|
| |
This is causing too much platform dependent breakage at the moment. We
will need a more rigorous testing strategy before this can be
merged again.
This reverts commit 7e340c2bbf4a56959bd1e95cdd1cfdb2b7e537c2.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The C code in the RTS now gets built with `-Wundef` and the Haskell code
(stages 1 and 2 only) with `-Wcpp-undef`. We now get warnings whereever
`#if` is used on undefined identifiers.
Test Plan: Validate on Linux and Windows
Reviewers: austin, angerman, simonmar, bgamari, Phyx
Reviewed By: bgamari
Subscribers: thomie, snowleopard
Differential Revision: https://phabricator.haskell.org/D3278
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we throw an exception for heap overflow, we should only print
the heap overflow message in the main thread when the HeapOverflow
exception is caught, rather than as a side effect in the GC.
Stack overflows were already done this way, I just made heap overflow
consistent with stack overflow, and did some related cleanup.
Fixes broken T2592(profasm) which was reporting the heap overflow
message twice (you would only notice when building with profiling
libs enabled).
Test Plan: validate
Reviewers: bgamari, niteria, austin, DemiMarie, hvr, erikd
Reviewed By: bgamari
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3394
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Recently I've used a different build system for building the
rts (Xcode). And in doing so, I looked through the rts/ghc.mk
to figure out how to build the rts.
In general it's quite straight forward to just compile all the
c files with the proper flags.
However there is one rather awkward copy step that copies some
files for special handling for the rts way.
I'm wondering if the proposed solution in this diff is better
or worse than the current situation?
The idea is to keep the files, but use #includes to produce
identical files with just an additional define. It does however
produce empty objects for non threaded ways.
Reviewers: ezyang, bgamari, austin, erikd, simonmar, rwbarton
Reviewed By: bgamari, simonmar, rwbarton
Subscribers: rwbarton, thomie, snowleopard
Differential Revision: https://phabricator.haskell.org/D3237
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes trac issue #13288.
Reviewers: austin, bgamari, erikd, simonmar
Reviewed By: simonmar
Subscribers: mutjida, rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3143
|
|
|
|
| |
See comments.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This commit makes various improvements and addresses some issues with
Compact Regions (aka Compact Normal Forms).
This was the most important thing I wanted to fix. Compaction
previously prevented GC from running until it was complete, which
would be a problem in a multicore setting. Now, we compact using a
hand-written Cmm routine that can be interrupted at any point. When a
GC is triggered during a sharing-enabled compaction, the GC has to
traverse and update the hash table, so this hash table is now stored
in the StgCompactNFData object.
Previously, compaction consisted of a deepseq using the NFData class,
followed by a traversal in C code to copy the data. This is now done
in a single pass with hand-written Cmm (see rts/Compact.cmm). We no
longer use the NFData instances, instead the Cmm routine evaluates
components directly as it compacts.
The new compaction is about 50% faster than the old one with no
sharing, and a little faster on average with sharing (the cost of the
hash table dominates when we're doing sharing).
Static objects that don't (transitively) refer to any CAFs don't need
to be copied into the compact region. In particular this means we
often avoid copying Char values and small Int values, because these
are static closures in the runtime.
Each Compact# object can support a single compactAdd# operation at any
given time, so the Data.Compact library now enforces mutual exclusion
using an MVar stored in the Compact object.
We now get exceptions rather than killing everything with a barf()
when we encounter an object that cannot be compacted (a function, or a
mutable object). We now also detect pinned objects, which can't be
compacted either.
The Data.Compact API has been refactored and cleaned up. A new
compactSize operation returns the size (in bytes) of the compact
object.
Most of the documentation is in the Haddock docs for the compact
library, which I've expanded and improved here.
Various comments in the code have been improved, especially the main
Note [Compact Normal Forms] in rts/sm/CNF.c.
I've added a few tests, and expanded a few of the tests that were
there. We now also run the tests with GHCi, and in a new test way
that enables sanity checking (+RTS -DS).
There's a benchmark in libraries/compact/tests/compact_bench.hs for
measuring compaction speed and comparing sharing vs. no sharing.
The field totalDataW in StgCompactNFData was unnecessary.
Test Plan:
* new unit tests
* validate
* tested manually that we can compact Data.Aeson data
Reviewers: gcampax, bgamari, ezyang, austin, niteria, hvr, erikd
Subscribers: thomie, simonpj
Differential Revision: https://phabricator.haskell.org/D2751
GHC Trac Issues: #12455
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Visible API changes:
* The C struct `GCDetails` gives the stats about a single GC. This is
passed to the `gcDone()` callback if one is set via the
RtsConfig. (previously we just passed a collection of values, so this
is more extensible, at the expense of breaking the existing API)
* `RTSStats` gives cumulative stats since the start of the program,
and includes the `GCDetails` for the most recent GC. This struct
can be obtained via `getRTSStats()` (the old `getGCStats()` has been
removed, and `getGCStatsEnabled()` has been renamed to
`getRTSStatsEnabled()`)
Improvements:
* The per-GC stats and cumulative stats are now cleanly separated.
* Inside the RTS we have a top-level `RTSStats` struct to keep all our
stats in, previously this was just a collection of strangely-named
variables. This struct is mostly just copied in `getRTSStats()`, so
the implementation of that function is a lot shorter.
* Types are more consistent. We use a uint64_t byte count for all
memory values, and Time for all time values.
* Names are more consistent. We use a suffix `_bytes` for all byte
counts and `_ns` for all time values.
* We now collect information about the amount of memory in large
objects and compact objects in `GCDetails`. (the latter was the reason
I started doing this patch but it seems to have ballooned a bit!)
* I fixed a bug in the calculation of the elapsed MUT time, and added
an ASSERT to stop the calculations going wrong in the future.
For now I kept the Haskell API in `GHC.Stats` the same, by
impedence-matching with the new API. We could either break that API
and make it match the C API more closely, or we could add a new API
and deprecate the old one. Opinions welcome.
This stuff is very easy to get wrong, and it's hard to test. Reviews
welcome!
Test Plan:
manual testing
validate
Reviewers: bgamari, niteria, austin, ezyang, hvr, erikd, rwbarton, Phyx
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2756
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Fix issues preventing x86 GHC to build on Windows and
fix segfault in the testsuite.
Test Plan: ./validate
Reviewers: austin, erikd, simonmar, bgamari
Reviewed By: bgamari
Subscribers: #ghc_windows_task_force, thomie
Differential Revision: https://phabricator.haskell.org/D2789
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: Validate on lots of platforms
Reviewers: erikd, simonmar, austin
Reviewed By: erikd, simonmar
Subscribers: michalt, thomie
Differential Revision: https://phabricator.haskell.org/D2699
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: Validate
Reviewers: simonmar, austin, erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2764
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous code passed an end pointer, but the interface takes a size
instead.
Fixes #12838.
Reviewers: austin, erikd, simonmar, bgamari
Reviewed By: simonmar, bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2711
GHC Trac Issues: #12838
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We currently have two info tables for a constructor
* XXX_con_info: the info table for a heap-resident instance of the
constructor, It has type CONSTR, or one of the specialised types like
CONSTR_1_0
* XXX_static_info: the info table for a static instance of this
constructor, which has type CONSTR_STATIC or CONSTR_STATIC_NOCAF.
I'm getting rid of the latter, and using the `con_info` info table for
both static and dynamic constructors. For rationale and more details
see Note [static constructors] in SMRep.hs.
I also removed these macros: `isSTATIC()`, `ip_STATIC()`,
`closure_STATIC()`, since they relied on the CONSTR/CONSTR_STATIC
distinction, and anyway HEAP_ALLOCED() does the same job.
Test Plan: validate
Reviewers: bgamari, simonpj, austin, gcampax, hvr, niteria, erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2690
GHC Trac Issues: #12455
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The problem boils down to global variables: in particular gc_threads[],
which was being modified by a subsequent GC before the previous GC had
finished with it. The fix is to not use global variables.
This was causing setnumcapabilities001 to fail (again!). It's an old
bug though.
Test Plan:
Ran setnumcapabilities001 in a loop for a couple of hours. Before this
patch it had been failing after a few minutes. Not a very scientific
test, but it's the best I have.
Reviewers: bgamari, austin, fryguybob, niteria, erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2654
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Nursery chunks help reduce the cost of GC when capabilities are unevenly
loaded, by ensuring that we use more of the available nursery.
The rationale for enabling this at -A16m is that any negative effects
due to loss of cache locality are less likely to be an issue at -A16m
and above. It's a conservative guess. If we had a lot of benchmark
data we could probably do better.
Results for nofib/parallel at -N4 -A32m with and without -n4m:
```
------------------------------------------------------------------------
Program Size Allocs Runtime Elapsed TotalMem
------------------------------------------------------------------------
blackscholes 0.0% -9.5% -9.0% -15.0% -2.2%
coins 0.0% -4.7% -3.6% -0.6% -13.6%
mandel 0.0% -0.3% +7.7% +13.1% +0.1%
matmult 0.0% +1.5% +10.0% +7.7% +0.1%
nbody 0.0% -4.1% -2.9% 0.085 0.0%
parfib 0.0% -1.4% +1.0% +1.5% +0.2%
partree 0.0% -0.3% +0.8% +2.9% -0.8%
prsa 0.0% -0.5% -2.1% -7.6% 0.0%
queens 0.0% -3.2% -1.4% +2.2% +1.3%
ray 0.0% -5.6% -14.5% -7.6% +0.8%
sumeuler 0.0% -0.4% +2.4% +1.1% 0.0%
------------------------------------------------------------------------
Min 0.0% -9.5% -14.5% -15.0% -13.6%
Max 0.0% +1.5% +10.0% +13.1% +1.3%
Geometric Mean +0.0% -2.6% -1.3% -0.5% -1.4%
```
Not conclusive, but slightly better. This matters a lot more when you
have more cores.
Test Plan: validate, nofib/paralel
Reviewers: niteria, ezyang, nh2, trofi, austin, erikd, bgamari
Reviewed By: bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2581
GHC Trac Issues: #9221
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We stumbled upon a case where an external library (OpenCL) does not work
if a specific address (0x200000000) is taken.
It so happens that `osReserveHeapMemory` starts trying to mmap at 0x200000000:
```
void *hint = (void*)((W_)8 * (1 << 30) + attempt * BLOCK_SIZE);
at = osTryReserveHeapMemory(*len, hint);
```
This makes it impossible to use Haskell programs compiled with GHC 8
with C functions that use OpenCL.
See this example https://github.com/chpatrick/oclwtf for a repro.
This patch allows the user to work around this kind of behavior outside
our control by letting the user override the starting address through an
RTS command line flag.
Reviewers: bgamari, Phyx, simonmar, erikd, austin
Reviewed By: Phyx, simonmar
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D2513
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
malloc'd memory is not guaranteed to be zeroed. On Linux, however,
it is often zeroed, leading to latent bugs. In fact, with this
patch I fix two uninitialized memory bugs stemming from this.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
Test Plan: validate
Reviewers: simonmar, austin, Phyx, bgamari, erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2455
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch refactors GNU C version test (for 4.5 and more modern)
due to usage of __builtin_unreachable done in the CNF.c code directly
into the new RTS_UNREACHABLE macro placed into Rts.h
Reviewers: bgamari, austin, simonmar, erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2457
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch fixes compilation failure on OpenBSD. The OpenBSD's
GNU C compiler is of 4.2.1 version and problematic __builtin_unreachable
was added in GNU C 4.5 release. Let's use pure abort() call
on OpenBSD instead of __builtin_unreachable
Reviewers: bgamari, austin, erikd, simonmar
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2453
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Knowing the length of the run queue in O(1) time is useful: for example
we don't have to traverse the run queue to know how many threads we have
to migrate in schedulePushWork().
Test Plan: validate
Reviewers: ezyang, erikd, bgamari, austin
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2437
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
@simonmar told me that it makes more sense this way.
Test Plan: it still builds
Reviewers: bgamari, austin, simonmar, erikd
Reviewed By: simonmar, erikd
Subscribers: thomie, simonmar
Differential Revision: https://phabricator.haskell.org/D2428
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The recent Compact Regions commit (cf989ffe49) builds fine on Linux
but doesn't build on OS X r Windows.
* rts/sm/CNF.c: Drop un-needed #includes.
* Fix parenthesis usage with CPP ASSERT macro.
* Fix format string in debugBelch messages.
* Use stg_max() instead hand rolled inline max() function.
Test Plan: Build on Linux, OS X and Windows
Reviewers: gcampax, simonmar, austin, bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2421
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This brings in initial support for compact regions, as described in the
ICFP 2015 paper "Efficient Communication and Collection with Compact
Normal Forms" (Edward Z. Yang et.al.) and implemented by Giovanni
Campagna.
Some things may change before the 8.2 release, but I (Simon M.) wanted
to get the main patch committed so that we can iterate.
What documentation there is is in the Data.Compact module in the new
compact package. We'll need to extend and polish the documentation
before the release.
Test Plan:
validate
(new test cases included)
Reviewers: ezyang, simonmar, hvr, bgamari, austin
Subscribers: vikraman, Yuras, RyanGlScott, qnikst, mboes, facundominguez, rrnewton, thomie, erikd
Differential Revision: https://phabricator.haskell.org/D1264
GHC Trac Issues: #11493
|
|
|
|
|
| |
- Move the numaMap and nNumaNodes out of RtsFlags to Capability.c
- Add a test to tests/rts
|
|
|
|
|
|
|
|
| |
* Remove unused/old flags from the structs
* Update old comments
* Add missing flags to GHC.RTS
* Simplify GHC.RTS, remove C code and use hsc2hs instead
* Make ParFlags unconditional, and add support to GHC.RTS
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The aim here is to reduce the number of remote memory accesses on
systems with a NUMA memory architecture, typically multi-socket servers.
Linux provides a NUMA API for doing two things:
* Allocating memory local to a particular node
* Binding a thread to a particular node
When given the +RTS --numa flag, the runtime will
* Determine the number of NUMA nodes (N) by querying the OS
* Assign capabilities to nodes, so cap C is on node C%N
* Bind worker threads on a capability to the correct node
* Keep a separate free lists in the block layer for each node
* Allocate the nursery for a capability from node-local memory
* Allocate blocks in the GC from node-local memory
For example, using nofib/parallel/queens on a 24-core 2-socket machine:
```
$ ./Main 15 +RTS -N24 -s -A64m
Total time 173.960s ( 7.467s elapsed)
$ ./Main 15 +RTS -N24 -s -A64m --numa
Total time 150.836s ( 6.423s elapsed)
```
The biggest win here is expected to be allocating from node-local
memory, so that means programs using a large -A value (as here).
According to perf, on this program the number of remote memory accesses
were reduced by more than 50% by using `--numa`.
Test Plan:
* validate
* There's a new flag --debug-numa=<n> that pretends to do NUMA without
actually making the OS calls, which is useful for testing the code
on non-NUMA systems.
* TODO: I need to add some unit tests
Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2199
|