path: root/rts/sm
Commit message | Author | Age | Files | Lines
* Fix a missing getNewNursery(), and related cleanupSimon Marlow2017-07-181-11/+11
| | | | | | | | | | | | | | | | | | | | Summary: When we use nursery chunks with +RTS -n<size>, when the current nursery runs out we have to check whether there's another chunk available with getNewNursery(). There was one place we weren't doing this: the ad-hoc heap check in scheduleProcessInbox(). The impact of the bug was that we would GC too early when using nursery chunks, especially in programs that used messages (throwTo between capabilities could do this, also hs_try_putmvar()). Test Plan: validate, also local testing in our application Reviewers: bgamari, niteria, austin, erikd Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3749
* Fix Work Balance computation in RTS statsDouglas Wilson2017-07-111-3/+18
| An additional stat is tracked per GC: par_balanced_copied. This is the number of bytes copied by each GC thread under the balanced limit, which is simply (copied_bytes / num_gc_threads). The stat is added to all the appropriate GC structures, so it is visible in the eventlog and in GHC.Stats.
|
| A note is added explaining how work balance is computed.
|
| Remove some end-of-line whitespace.
|
| Test Plan:
| ./validate
| experiment with the program attached to the ticket
| examine code changes carefully
|
| Reviewers: simonmar, austin, hvr, bgamari, erikd
|
| Reviewed By: simonmar
|
| Subscribers: Phyx, rwbarton, thomie
|
| GHC Trac Issues: #13830
|
| Differential Revision: https://phabricator.haskell.org/D3658
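The balanced-limit idea in this commit message can be sketched as follows. This is a minimal illustration, not the RTS code: the function name `sketch_balanced_copied_bytes` and the array argument are hypothetical, and the real computation (described in the Note the commit adds) may normalise the result differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an RTS function): sums, over all GC threads, the
 * bytes each thread copied, capped at the balanced limit
 * (total copied bytes / number of GC threads). */
uint64_t sketch_balanced_copied_bytes(const uint64_t *copied_per_thread,
                                      size_t n_threads)
{
    if (n_threads == 0) {
        return 0;
    }

    /* Total bytes copied by all threads in this GC. */
    uint64_t total = 0;
    for (size_t i = 0; i < n_threads; i++) {
        total += copied_per_thread[i];
    }

    /* The share each thread would copy if work were perfectly balanced. */
    uint64_t limit = total / n_threads;

    /* Count each thread's contribution only up to that share. */
    uint64_t balanced = 0;
    for (size_t i = 0; i < n_threads; i++) {
        uint64_t c = copied_per_thread[i];
        balanced += (c < limit) ? c : limit;
    }
    return balanced;
}
```

With perfectly even work the result equals the total copied; when a single thread does all the copying it drops to roughly total / n_threads, so the ratio of the two gives a rough measure of balance.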
* lowercase clangMoritz Angermann2017-07-061-2/+2
|
* rts/sm/Storage.c: tweak __clear_cache proto for clangSergei Trofimovich2017-07-051-2/+15
| clang defines '__clear_cache' slightly differently from gcc:
|
|     rts/sm/Storage.c:1349:13: error:
|          error: conflicting types for '__clear_cache'
|          |
|     1349 | extern void __clear_cache(char * begin, char * end);
|          |             ^
|     extern void __clear_cache(char * begin, char * end);
|                 ^
|          note: '__clear_cache' is a builtin with type 'void (void *, void *)'
|
| Reported by Moritz Angermann.
|
| While at it, use '__builtin___clear_cache' if advertised by clang.
|
| Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
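A minimal sketch of the approach this message describes: prefer `__builtin___clear_cache` when the compiler advertises it, and avoid re-declaring `__clear_cache` with a prototype that conflicts with the compiler's own. The macro `SKETCH_HAVE_BUILTIN_CLEAR_CACHE` and the function `sketch_clear_cache` are hypothetical names, the GCC 4.4 cut-off is an assumption borrowed from a related commit in this log, and the no-op fallback is a simplification; the actual code in rts/sm/Storage.c may look different.

```c
#include <stddef.h>

/* Ask the compiler whether the builtin exists. The nested #if keeps
 * preprocessors without __has_builtin from choking on the call syntax. */
#if defined(__has_builtin)
# if __has_builtin(__builtin___clear_cache)
#  define SKETCH_HAVE_BUILTIN_CLEAR_CACHE 1
# endif
#endif

/* Assumption: GCC >= 4.4 provides the builtin even when __has_builtin
 * itself is not available. */
#if !defined(SKETCH_HAVE_BUILTIN_CLEAR_CACHE) && defined(__GNUC__) && !defined(__clang__)
# if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4)
#  define SKETCH_HAVE_BUILTIN_CLEAR_CACHE 1
# endif
#endif

/* Hypothetical wrapper: returns 1 if the builtin was used, 0 otherwise. */
int sketch_clear_cache(void *begin, void *end)
{
#if defined(SKETCH_HAVE_BUILTIN_CLEAR_CACHE)
    /* The casts are accepted by both the (char *, char *) and the
     * (void *, void *) flavours of the builtin's prototype. */
    __builtin___clear_cache((char *)begin, (char *)end);
    return 1;
#else
    /* Simplified fallback for this sketch: do nothing. */
    (void)begin;
    (void)end;
    return 0;
#endif
}
```

Going through the builtin sidesteps the prototype conflict entirely, because no `extern` declaration of `__clear_cache` is needed.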
* Revert "rts/sm/Storage.c: tweak __clear_cache proto for clang"Sergei Trofimovich2017-07-051-13/+2
| | | | | | This reverts commit 9492703a5862ee8623455209e50344cf8c4de077. Incomplete patch (missing begin, end assignments).
* rts/sm/Storage.c: tweak __clear_cache proto for clangSergei Trofimovich2017-07-051-2/+13
| clang defines '__clear_cache' slightly differently from gcc:
|
|     rts/sm/Storage.c:1349:13: error:
|          error: conflicting types for '__clear_cache'
|          |
|     1349 | extern void __clear_cache(char * begin, char * end);
|          |             ^
|     extern void __clear_cache(char * begin, char * end);
|                 ^
|          note: '__clear_cache' is a builtin with type 'void (void *, void *)'
|
| Reported by Moritz Angermann.
|
| While at it, use '__builtin___clear_cache' if advertised by clang.
|
| Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
* rts: Fix uninitialised variable usesBen Gamari2017-07-031-1/+1
| | | | | | | | | | | | | | Strangely gcc 5.4 compiling on amd64 (nixos) complained about these. Both warnings look correct, so I'm not sure why we haven't been seeing these up until now. Test Plan: Validate Reviewers: simonmar, austin, erikd Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3693
* UNREG: use __builtin___clear_cache where availableSergei Trofimovich2017-06-221-0/+16
| Noticed when building UNREG ghc with -optc{-Wall,-Werror}:
|
|     rts/sm/Storage.c:1359:3: error:
|          error: implicit declaration of function '__clear_cache' [-Werror=implicit-function-declaration]
|            __clear_cache((void*)begin, (void*)end);
|            ^~~~~~~~~~~~~
|          |
|     1359 |   __clear_cache((void*)begin, (void*)end);
|          |   ^
|
| Direct '__clear_cache' usage is left in place for gcc toolchains before 4.4.
|
| Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
* Revert "rts: Suppress unused gcc_clear_cache warning"Ben Gamari2017-06-211-2/+0
| | | | This reverts commit d1d3e98443cf263ef09253e2478e3e638e174e0d.
* rts: Suppress unused gcc_clear_cache warningBen Gamari2017-06-211-0/+2
|
* Revert "UNREG: use __builtin___clear_cache where available"Sergei Trofimovich2017-06-211-21/+1
| This reverts commit 6dd1257fdd4d18e84d32e89bf0ec664b3c8f7b93.
|
| Change fails validation:
|
|     rts/sm/Storage.c:1351:20: error:
|          error: ‘gcc_clear_cache’ defined but not used [-Werror=unused-function]
|          STATIC_INLINE void gcc_clear_cache(void * begin, void * end)
* UNREG: use __builtin___clear_cache where availableSergei Trofimovich2017-06-211-1/+21
| Noticed when building UNREG ghc with -optc{-Wall,-Werror}:
|
|     rts/sm/Storage.c:1359:3: error:
|          error: implicit declaration of function '__clear_cache' [-Werror=implicit-function-declaration]
|            __clear_cache((void*)begin, (void*)end);
|            ^~~~~~~~~~~~~
|          |
|     1359 |   __clear_cache((void*)begin, (void*)end);
|          |   ^
|
| Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
* Typos [ci skip]Gabor Greif2017-06-131-1/+1
|
* Fix a lost-wakeup bug in BLACKHOLE handling (#13751)Simon Marlow2017-06-083-24/+113
| Summary:
| The problem occurred when
| * Threads A & B evaluate the same thunk
| * Thread A context-switches, so the thunk gets blackholed
| * Thread C enters the blackhole, creates a BLOCKING_QUEUE attached to the blackhole and thread A's `tso->bq` queue
| * Thread B updates the blackhole with a value, overwriting the BLOCKING_QUEUE
| * We GC, replacing A's update frame with stg_enter_checkbh
| * Throw an exception in A, which ignores the stg_enter_checkbh frame
|
| Now we have C blocked on A's tso->bq queue, but we forgot to check the queue because the stg_enter_checkbh frame has been thrown away by the exception.
|
| The solution and alternative designs are discussed in Note [upd-black-hole].
|
| This also exposed a bug in the interpreter, whereby we were sometimes context-switching without calling `threadPaused()`. I've fixed this and added some Notes.
|
| Test Plan:
| * `cd testsuite/tests/concurrent && make slow`
| * validate
|
| Reviewers: niteria, bgamari, austin, erikd
|
| Reviewed By: erikd
|
| Subscribers: rwbarton, thomie
|
| GHC Trac Issues: #13751
|
| Differential Revision: https://phabricator.haskell.org/D3630
* rts: Make compact debugging output depend upon compact debug flagBen Gamari2017-05-231-1/+1
|
* CNF: Silence pointer fix-up message unless gc debugging is enabledBen Gamari2017-05-201-2/+2
|
* rts: annotate switch/case with '/* fallthrough */'Sergei Trofimovich2017-05-144-0/+12
| Fixes gcc-7.1.0 warnings of the form:
|
|     rts/sm/Scav.c:559:9: error:
|          error: this statement may fall through [-Werror=implicit-fallthrough=]
|              scavenge_fun_srt(info);
|              ^~~~~~~~~~~~~~~~~~~~~~
|
| Many of these places are indeed non-obvious, and some are already annotated by comments.
|
| Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
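The annotation style this commit applies can be illustrated with a small sketch. The function `sketch_count_steps` is hypothetical and unrelated to the scavenging code; it only shows where the comment goes so that gcc's `-Wimplicit-fallthrough` accepts an intentional fall-through.

```c
/* Hypothetical example: each case deliberately falls into the next one, and
 * the comment directly before the following label tells gcc (and readers)
 * that this is intentional. */
int sketch_count_steps(int start)
{
    int steps = 0;
    switch (start) {
    case 2:
        steps += 1;
        /* fallthrough */
    case 1:
        steps += 1;
        /* fallthrough */
    case 0:
        steps += 1;
        break;
    default:
        steps = -1;
        break;
    }
    return steps;
}
```

The comment has to sit immediately before the next `case` label, with nothing but whitespace or other comments in between, for the warning to be suppressed.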
* We define the `<XXX>_HOST_ARCH` to `1`, but never to `0`inMoritz Angermann2017-05-111-1/+1
| compiler/ghc.mk
|
|     @echo "#define $(HostArch_CPP)_HOST_ARCH 1" >> $@
|     @echo "#define $(TargetArch_CPP)_HOST_ARCH 1" >> $@
|
| this leads to warnings like:
|
| > warning: 'x86_64_HOST_ARCH' is not defined, evaluates to 0 [-Wundef]
|
| Reviewers: austin, bgamari, erikd, simonmar
|
| Reviewed By: simonmar
|
| Subscribers: rwbarton, thomie
|
| Differential Revision: https://phabricator.haskell.org/D3555
* Fix comment for compact regionTakenobu Tani2017-05-041-2/+2
| | | | | | | | | | | | | | | | | | | There were old module names: * Data.Compact -> GHC.Compact * Data.Compact.Internal -> GHC.Compact This commit is for ghc-8.2 branch. Test Plan: build Reviewers: austin, bgamari, hvr, erikd, simonmar Reviewed By: bgamari Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3522
* Prefer #if defined to #ifdefBen Gamari2017-04-2823-68/+68
| | | | Our new CPP linter enforces this.
* Enable new warning for fragile/incorrect CPP #if usageErik de Castro Lopo2017-04-284-4/+4
| The C code in the RTS now gets built with `-Wundef` and the Haskell code (stages 1 and 2 only) with `-Wcpp-undef`. We now get warnings wherever `#if` is used on undefined identifiers.
|
| Test Plan: Validate on Linux and Windows
|
| Reviewers: austin, angerman, simonmar, bgamari, Phyx
|
| Reviewed By: bgamari
|
| Subscribers: thomie, snowleopard
|
| Differential Revision: https://phabricator.haskell.org/D3278
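To make the warning concrete, here is a minimal sketch of the pattern that stays quiet under `-Wundef`. The macro `SKETCH_FEATURE_X` and the function `sketch_feature_x_enabled` are hypothetical; the macro is deliberately never defined.

```c
/* SKETCH_FEATURE_X is a hypothetical macro that is intentionally left
 * undefined. Writing `#if SKETCH_FEATURE_X` would silently evaluate it as 0
 * and trigger a -Wundef warning; `#if defined(...)` tests for existence and
 * is warning-free. */
int sketch_feature_x_enabled(void)
{
#if defined(SKETCH_FEATURE_X)
    return 1;
#else
    return 0;
#endif
}
```

This is also the form that another commit in this log, "Prefer #if defined to #ifdef", standardises on.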
* rts: Fix "ASSERT ("sBen Gamari2017-04-233-18/+18
| | | | | | | | | | Reviewers: austin, erikd, simonmar Reviewed By: erikd Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3486
* cpp: Use #pragma once instead of #ifndef guardsBen Gamari2017-04-2317-70/+17
| | | | | | | | | | | | | | This both says what we mean and silences a bunch of spurious CPP linting warnings. This pragma is supported by all CPP implementations which we support. Reviewers: austin, erikd, simonmar, hvr Reviewed By: simonmar Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3482
* Minor reordering of `#include`s fixing compilation on AIXHerbert Valerio Riedel2017-04-231-1/+2
| This helps ensure that system includes on some more fragile platforms (like e.g. AIX) see a more consistent set of CPP defines, and consequently reduces the risk of conflicting typedefs/prototypes being exposed.
* Revert "Enable new warning for fragile/incorrect CPP #if usage"Ben Gamari2017-04-054-4/+4
| | | | | | | | This is causing too much platform dependent breakage at the moment. We will need a more rigorous testing strategy before this can be merged again. This reverts commit 7e340c2bbf4a56959bd1e95cdd1cfdb2b7e537c2.
* Enable new warning for fragile/incorrect CPP #if usageErik de Castro Lopo2017-04-054-4/+4
| The C code in the RTS now gets built with `-Wundef` and the Haskell code (stages 1 and 2 only) with `-Wcpp-undef`. We now get warnings wherever `#if` is used on undefined identifiers.
|
| Test Plan: Validate on Linux and Windows
|
| Reviewers: austin, angerman, simonmar, bgamari, Phyx
|
| Reviewed By: bgamari
|
| Subscribers: thomie, snowleopard
|
| Differential Revision: https://phabricator.haskell.org/D3278
* Report heap overflow in the same way as stack overflowSimon Marlow2017-04-023-3/+14
| | | | | | | | | | | | | | | | | | | | | | | Now that we throw an exception for heap overflow, we should only print the heap overflow message in the main thread when the HeapOverflow exception is caught, rather than as a side effect in the GC. Stack overflows were already done this way, I just made heap overflow consistent with stack overflow, and did some related cleanup. Fixes broken T2592(profasm) which was reporting the heap overflow message twice (you would only notice when building with profiling libs enabled). Test Plan: validate Reviewers: bgamari, niteria, austin, DemiMarie, hvr, erikd Reviewed By: bgamari Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3394
* Drop copy step from the rts/ghc.mkMoritz Angermann2017-02-282-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | Recently I've used a different build system for building the rts (Xcode). And in doing so, I looked through the rts/ghc.mk to figure out how to build the rts. In general it's quite straight forward to just compile all the c files with the proper flags. However there is one rather awkward copy step that copies some files for special handling for the rts way. I'm wondering if the proposed solution in this diff is better or worse than the current situation? The idea is to keep the files, but use #includes to produce identical files with just an additional define. It does however produce empty objects for non threaded ways. Reviewers: ezyang, bgamari, austin, erikd, simonmar, rwbarton Reviewed By: bgamari, simonmar, rwbarton Subscribers: rwbarton, thomie, snowleopard Differential Revision: https://phabricator.haskell.org/D3237
* rts: Correct the nursery size in the gen 1 growth computationJohn C. Carey2017-02-231-1/+13
| | | | | | | | | | | | Fixes trac issue #13288. Reviewers: austin, bgamari, erikd, simonmar Reviewed By: simonmar Subscribers: mutjida, rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3143
* Fix crashes in hash table scanning with THREADED_RTSSimon Marlow2016-12-071-3/+19
| | | | See comments.
* Overhaul of Compact Regions (#12455)Simon Marlow2016-12-077-546/+478
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This commit makes various improvements and addresses some issues with Compact Regions (aka Compact Normal Forms). This was the most important thing I wanted to fix. Compaction previously prevented GC from running until it was complete, which would be a problem in a multicore setting. Now, we compact using a hand-written Cmm routine that can be interrupted at any point. When a GC is triggered during a sharing-enabled compaction, the GC has to traverse and update the hash table, so this hash table is now stored in the StgCompactNFData object. Previously, compaction consisted of a deepseq using the NFData class, followed by a traversal in C code to copy the data. This is now done in a single pass with hand-written Cmm (see rts/Compact.cmm). We no longer use the NFData instances, instead the Cmm routine evaluates components directly as it compacts. The new compaction is about 50% faster than the old one with no sharing, and a little faster on average with sharing (the cost of the hash table dominates when we're doing sharing). Static objects that don't (transitively) refer to any CAFs don't need to be copied into the compact region. In particular this means we often avoid copying Char values and small Int values, because these are static closures in the runtime. Each Compact# object can support a single compactAdd# operation at any given time, so the Data.Compact library now enforces mutual exclusion using an MVar stored in the Compact object. We now get exceptions rather than killing everything with a barf() when we encounter an object that cannot be compacted (a function, or a mutable object). We now also detect pinned objects, which can't be compacted either. The Data.Compact API has been refactored and cleaned up. A new compactSize operation returns the size (in bytes) of the compact object. 
Most of the documentation is in the Haddock docs for the compact library, which I've expanded and improved here. Various comments in the code have been improved, especially the main Note [Compact Normal Forms] in rts/sm/CNF.c. I've added a few tests, and expanded a few of the tests that were there. We now also run the tests with GHCi, and in a new test way that enables sanity checking (+RTS -DS). There's a benchmark in libraries/compact/tests/compact_bench.hs for measuring compaction speed and comparing sharing vs. no sharing. The field totalDataW in StgCompactNFData was unnecessary. Test Plan: * new unit tests * validate * tested manually that we can compact Data.Aeson data Reviewers: gcampax, bgamari, ezyang, austin, niteria, hvr, erikd Subscribers: thomie, simonpj Differential Revision: https://phabricator.haskell.org/D2751 GHC Trac Issues: #12455
* Overhaul GC statsSimon Marlow2016-12-063-6/+27
| Summary:
| Visible API changes:
|
| * The C struct `GCDetails` gives the stats about a single GC. This is passed to the `gcDone()` callback if one is set via the RtsConfig. (Previously we just passed a collection of values, so this is more extensible, at the expense of breaking the existing API.)
|
| * `RTSStats` gives cumulative stats since the start of the program, and includes the `GCDetails` for the most recent GC. This struct can be obtained via `getRTSStats()`. (The old `getGCStats()` has been removed, and `getGCStatsEnabled()` has been renamed to `getRTSStatsEnabled()`.)
|
| Improvements:
|
| * The per-GC stats and cumulative stats are now cleanly separated.
|
| * Inside the RTS we have a top-level `RTSStats` struct to keep all our stats in; previously this was just a collection of strangely-named variables. This struct is mostly just copied in `getRTSStats()`, so the implementation of that function is a lot shorter.
|
| * Types are more consistent. We use a uint64_t byte count for all memory values, and Time for all time values.
|
| * Names are more consistent. We use a suffix `_bytes` for all byte counts and `_ns` for all time values.
|
| * We now collect information about the amount of memory in large objects and compact objects in `GCDetails`. (The latter was the reason I started doing this patch but it seems to have ballooned a bit!)
|
| * I fixed a bug in the calculation of the elapsed MUT time, and added an ASSERT to stop the calculations going wrong in the future.
|
| For now I kept the Haskell API in `GHC.Stats` the same, by impedance-matching with the new API. We could either break that API and make it match the C API more closely, or we could add a new API and deprecate the old one. Opinions welcome.
|
| This stuff is very easy to get wrong, and it's hard to test. Reviews welcome!
Test Plan: manual testing validate Reviewers: bgamari, niteria, austin, ezyang, hvr, erikd, rwbarton, Phyx Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2756
* Fix x86 Windows build and testsuiteTamar Christina2016-12-061-1/+1
| | | | | | | | | | | | | | | | Summary: Fix issues preventing x86 GHC to build on Windows and fix segfault in the testsuite. Test Plan: ./validate Reviewers: austin, erikd, simonmar, bgamari Reviewed By: bgamari Subscribers: #ghc_windows_task_force, thomie Differential Revision: https://phabricator.haskell.org/D2789
* Use C99's boolBen Gamari2016-11-2916-246/+246
| | | | | | | | | | | | Test Plan: Validate on lots of platforms Reviewers: erikd, simonmar, austin Reviewed By: erikd, simonmar Subscribers: michalt, thomie Differential Revision: https://phabricator.haskell.org/D2699
* Fix type of GarbageCollect declarationBen Gamari2016-11-291-1/+1
| | | | | | | | | | Test Plan: Validate Reviewers: simonmar, austin, erikd Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2764
* Storage.c: Pass a size to sys_icache_invalidateShea Levy2016-11-151-2/+2
| | | | | | | | | | | | | | | | | The previous code passed an end pointer, but the interface takes a size instead. Fixes #12838. Reviewers: austin, erikd, simonmar, bgamari Reviewed By: simonmar, bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2711 GHC Trac Issues: #12838
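The mismatch this commit fixes (an end pointer passed where a size is expected) can be sketched like this. The helper names are hypothetical; `sys_icache_invalidate` is the macOS call declared in `<libkern/OSCacheControl.h>`, and on other platforms this sketch simply does nothing.

```c
#include <stddef.h>

#if defined(__APPLE__)
#include <libkern/OSCacheControl.h>
#endif

/* Hypothetical helper: converts a [begin, end) pointer pair into a length. */
size_t sketch_range_size(const void *begin, const void *end)
{
    return (size_t)((const char *)end - (const char *)begin);
}

/* Hypothetical wrapper: callers think in begin/end pointers, but the OS
 * interface wants a start address plus a size. */
void sketch_invalidate_icache(void *begin, void *end)
{
#if defined(__APPLE__)
    sys_icache_invalidate(begin, sketch_range_size(begin, end));
#else
    /* No-op outside macOS in this sketch. */
    (void)begin;
    (void)end;
#endif
}
```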
* Remove CONSTR_STATICSimon Marlow2016-11-146-31/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: We currently have two info tables for a constructor * XXX_con_info: the info table for a heap-resident instance of the constructor, It has type CONSTR, or one of the specialised types like CONSTR_1_0 * XXX_static_info: the info table for a static instance of this constructor, which has type CONSTR_STATIC or CONSTR_STATIC_NOCAF. I'm getting rid of the latter, and using the `con_info` info table for both static and dynamic constructors. For rationale and more details see Note [static constructors] in SMRep.hs. I also removed these macros: `isSTATIC()`, `ip_STATIC()`, `closure_STATIC()`, since they relied on the CONSTR/CONSTR_STATIC distinction, and anyway HEAP_ALLOCED() does the same job. Test Plan: validate Reviewers: bgamari, simonpj, austin, gcampax, hvr, niteria, erikd Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2690 GHC Trac Issues: #12455
* Fix a bug in parallel GC synchronisationSimon Marlow2016-10-293-28/+29
| | | | | | | | | | | | | | | | | | | | | Summary: The problem boils down to global variables: in particular gc_threads[], which was being modified by a subsequent GC before the previous GC had finished with it. The fix is to not use global variables. This was causing setnumcapabilities001 to fail (again!). It's an old bug though. Test Plan: Ran setnumcapabilities001 in a loop for a couple of hours. Before this patch it had been failing after a few minutes. Not a very scientific test, but it's the best I have. Reviewers: bgamari, austin, fryguybob, niteria, erikd Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2654
* Turn on -n4m with -A16m or greaterSimon Marlow2016-10-091-13/+0
| Nursery chunks help reduce the cost of GC when capabilities are unevenly loaded, by ensuring that we use more of the available nursery. The rationale for enabling this at -A16m is that any negative effects due to loss of cache locality are less likely to be an issue at -A16m and above. It's a conservative guess. If we had a lot of benchmark data we could probably do better.
|
| Results for nofib/parallel at -N4 -A32m with and without -n4m:
|
| ```
| ------------------------------------------------------------------------
|         Program       Size    Allocs   Runtime   Elapsed  TotalMem
| ------------------------------------------------------------------------
|    blackscholes       0.0%     -9.5%     -9.0%    -15.0%     -2.2%
|           coins       0.0%     -4.7%     -3.6%     -0.6%    -13.6%
|          mandel       0.0%     -0.3%     +7.7%    +13.1%     +0.1%
|         matmult       0.0%     +1.5%    +10.0%     +7.7%     +0.1%
|           nbody       0.0%     -4.1%     -2.9%     0.085      0.0%
|          parfib       0.0%     -1.4%     +1.0%     +1.5%     +0.2%
|         partree       0.0%     -0.3%     +0.8%     +2.9%     -0.8%
|            prsa       0.0%     -0.5%     -2.1%     -7.6%      0.0%
|          queens       0.0%     -3.2%     -1.4%     +2.2%     +1.3%
|             ray       0.0%     -5.6%    -14.5%     -7.6%     +0.8%
|        sumeuler       0.0%     -0.4%     +2.4%     +1.1%      0.0%
| ------------------------------------------------------------------------
|             Min       0.0%     -9.5%    -14.5%    -15.0%    -13.6%
|             Max       0.0%     +1.5%    +10.0%    +13.1%     +1.3%
|  Geometric Mean      +0.0%     -2.6%     -1.3%     -0.5%     -1.4%
| ```
|
| Not conclusive, but slightly better. This matters a lot more when you have more cores.
|
| Test Plan: validate, nofib/parallel
|
| Reviewers: niteria, ezyang, nh2, trofi, austin, erikd, bgamari
|
| Reviewed By: bgamari
|
| Subscribers: thomie
|
| Differential Revision: https://phabricator.haskell.org/D2581
|
| GHC Trac Issues: #9221
* Make start address of `osReserveHeapMemory` tunable via command line -xbFrancesco Mazzoli2016-09-092-2/+9
| Summary:
| We stumbled upon a case where an external library (OpenCL) does not work if a specific address (0x200000000) is taken.
|
| It so happens that `osReserveHeapMemory` starts trying to mmap at 0x200000000:
|
| ```
| void *hint = (void*)((W_)8 * (1 << 30) + attempt * BLOCK_SIZE);
| at = osTryReserveHeapMemory(*len, hint);
| ```
|
| This makes it impossible to use Haskell programs compiled with GHC 8 with C functions that use OpenCL.
|
| See this example https://github.com/chpatrick/oclwtf for a repro.
|
| This patch allows the user to work around this kind of behavior outside our control by letting the user override the starting address through an RTS command line flag.
|
| Reviewers: bgamari, Phyx, simonmar, erikd, austin
|
| Reviewed By: Phyx, simonmar
|
| Subscribers: rwbarton, thomie
|
| Differential Revision: https://phabricator.haskell.org/D2513
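The arithmetic behind the 0x200000000 address can be checked with a small sketch. The names are hypothetical, and the 4096-byte block size is an assumption made for illustration; only the `8 * (1 << 30)` base and the "base plus attempt times block size" shape come from the snippet quoted in the message.

```c
#include <stdint.h>

/* Assumed block size for this sketch; the RTS defines its own BLOCK_SIZE. */
#define SKETCH_BLOCK_SIZE ((uint64_t)4096)

/* The default starting point quoted in the message: 8 * 2^30 bytes (8 GiB). */
uint64_t sketch_default_heap_base(void)
{
    return (uint64_t)8 * ((uint64_t)1 << 30);
}

/* Hypothetical helper: the address hint for a given attempt. A tunable base
 * is what lets the user steer the reservation away from an address some
 * other library needs. */
uint64_t sketch_reserve_hint(uint64_t base, unsigned attempt)
{
    return base + (uint64_t)attempt * SKETCH_BLOCK_SIZE;
}
```

Since 8 * 2^30 = 2^33 = 0x200000000, the first attempt lands exactly on the address the message reports as a problem for OpenCL.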
* When in sanity mode, un-zero malloc'd memory; fix uninitialized memory bugs.Edward Z. Yang2016-08-151-0/+2
| | | | | | | | | | | | | | | | malloc'd memory is not guaranteed to be zeroed. On Linux, however, it is often zeroed, leading to latent bugs. In fact, with this patch I fix two uninitialized memory bugs stemming from this. Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu> Test Plan: validate Reviewers: simonmar, austin, Phyx, bgamari, erikd Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2455
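The technique described here, filling freshly malloc'd memory with a non-zero pattern so that code relying on accidental zeroing fails early, can be sketched as below. The wrapper name, the `sanity_enabled` parameter and the 0xbb fill byte are assumptions for illustration, not a description of the actual RTS allocator.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed fill pattern: any recognisable non-zero byte works. */
#define SKETCH_FILL_BYTE 0xbb

/* Hypothetical malloc wrapper: when sanity checking is on, overwrite the new
 * block so that reads of uninitialised fields see garbage rather than the
 * zeros malloc often happens to return. */
void *sketch_malloc(size_t n, int sanity_enabled)
{
    void *p = malloc(n);
    if (p != NULL && sanity_enabled) {
        memset(p, SKETCH_FILL_BYTE, n);
    }
    return p;
}
```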
* refactor test for __builtin_unreachable into Rts.h macro RTS_UNREACHABLEKarel Gardas2016-08-151-4/+1
| Summary:
| This patch moves the GNU C version test (for 4.5 and newer) that guards the use of __builtin_unreachable out of the CNF.c code and into a new RTS_UNREACHABLE macro in Rts.h.
|
| Reviewers: bgamari, austin, simonmar, erikd
|
| Subscribers: thomie
|
| Differential Revision: https://phabricator.haskell.org/D2457
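A macro of this kind might look roughly like the sketch below. `SKETCH_UNREACHABLE` and the enum are hypothetical names, and the real RTS_UNREACHABLE definition in Rts.h may differ; the version test (GNU C 4.5 or newer) and the `abort()` fallback follow what this commit and the OpenBSD fix before it describe.

```c
#include <stdlib.h>

/* Use the builtin on GNU C 4.5 or newer, and fall back to abort() on older
 * compilers such as the GNU C 4.2.1 mentioned for OpenBSD. */
#if defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5))
# define SKETCH_UNREACHABLE __builtin_unreachable()
#else
# define SKETCH_UNREACHABLE abort()
#endif

typedef enum { SKETCH_RED, SKETCH_GREEN } sketch_colour;

/* Hypothetical user of the macro: every enumerator returns inside the
 * switch, so the code after it should never run. */
int sketch_colour_code(sketch_colour c)
{
    switch (c) {
    case SKETCH_RED:
        return 1;
    case SKETCH_GREEN:
        return 2;
    }
    SKETCH_UNREACHABLE;
}
```

Keeping the version test in one macro means call sites no longer need their own compiler checks.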
* fix compilation failure on OpenBSD with system supplied GNU C 4.2.1Karel Gardas2016-08-141-1/+4
| | | | | | | | | | | | | | Summary: This patch fixes compilation failure on OpenBSD. The OpenBSD's GNU C compiler is of 4.2.1 version and problematic __builtin_unreachable was added in GNU C 4.5 release. Let's use pure abort() call on OpenBSD instead of __builtin_unreachable Reviewers: bgamari, austin, erikd, simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2453
* Track the lengths of the thread queuesSimon Marlow2016-08-031-2/+4
| | | | | | | | | | | | | | | Summary: Knowing the length of the run queue in O(1) time is useful: for example we don't have to traverse the run queue to know how many threads we have to migrate in schedulePushWork(). Test Plan: validate Reviewers: ezyang, erikd, bgamari, austin Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2437
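The idea of keeping the queue length alongside the queue can be sketched with a small linked queue. All type and function names are hypothetical and much simpler than the scheduler's real structures.

```c
#include <stddef.h>

/* Hypothetical thread node. */
typedef struct sketch_thread {
    struct sketch_thread *next;
    int id;
} sketch_thread;

/* Hypothetical run queue that tracks its own length. */
typedef struct {
    sketch_thread *head;
    sketch_thread *tail;
    size_t length; /* updated on every push and pop, so reading it is O(1) */
} sketch_run_queue;

void sketch_queue_init(sketch_run_queue *q)
{
    q->head = NULL;
    q->tail = NULL;
    q->length = 0;
}

/* Append a thread at the tail and bump the counter. */
void sketch_queue_push(sketch_run_queue *q, sketch_thread *t)
{
    t->next = NULL;
    if (q->tail != NULL) {
        q->tail->next = t;
    } else {
        q->head = t;
    }
    q->tail = t;
    q->length++;
}

/* Remove the thread at the head (or return NULL) and adjust the counter. */
sketch_thread *sketch_queue_pop(sketch_run_queue *q)
{
    sketch_thread *t = q->head;
    if (t == NULL) {
        return NULL;
    }
    q->head = t->next;
    if (q->head == NULL) {
        q->tail = NULL;
    }
    t->next = NULL;
    q->length--;
    return t;
}
```

A load balancer can then read `length` directly to decide how many threads to migrate, instead of walking the list first.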
* Move stat_startGCSyncBartosz Nitka2016-07-271-2/+0
| | | | | | | | | | | | | | @simonmar told me that it makes more sense this way. Test Plan: it still builds Reviewers: bgamari, austin, simonmar, erikd Reviewed By: simonmar, erikd Subscribers: thomie, simonmar Differential Revision: https://phabricator.haskell.org/D2428
* Fix the non-Linux buildErik de Castro Lopo2016-07-221-14/+5
| Summary:
| The recent Compact Regions commit (cf989ffe49) builds fine on Linux but doesn't build on OS X or Windows.
|
| * rts/sm/CNF.c: Drop unneeded #includes.
| * Fix parenthesis usage with CPP ASSERT macro.
| * Fix format string in debugBelch messages.
| * Use stg_max() instead of a hand-rolled inline max() function.
|
| Test Plan: Build on Linux, OS X and Windows
|
| Reviewers: gcampax, simonmar, austin, bgamari
|
| Subscribers: thomie
|
| Differential Revision: https://phabricator.haskell.org/D2421
* Compact RegionsGiovanni Campagna2016-07-209-12/+1653
| This brings in initial support for compact regions, as described in the ICFP 2015 paper "Efficient Communication and Collection with Compact Normal Forms" (Edward Z. Yang et al.) and implemented by Giovanni Campagna.
|
| Some things may change before the 8.2 release, but I (Simon M.) wanted to get the main patch committed so that we can iterate.
|
| What documentation there is can be found in the Data.Compact module in the new compact package. We'll need to extend and polish the documentation before the release.
|
| Test Plan: validate (new test cases included)
|
| Reviewers: ezyang, simonmar, hvr, bgamari, austin
|
| Subscribers: vikraman, Yuras, RyanGlScott, qnikst, mboes, facundominguez, rrnewton, thomie, erikd
|
| Differential Revision: https://phabricator.haskell.org/D1264
|
| GHC Trac Issues: #11493
* NUMA cleanupsSimon Marlow2016-06-173-16/+14
| | | | | - Move the numaMap and nNumaNodes out of RtsFlags to Capability.c - Add a test to tests/rts
* Rts flags cleanupSimon Marlow2016-06-102-5/+5
| | | | | | | | * Remove unused/old flags from the structs * Update old comments * Add missing flags to GHC.RTS * Simplify GHC.RTS, remove C code and use hsc2hs instead * Make ParFlags unconditional, and add support to GHC.RTS
* NUMA supportSimon Marlow2016-06-109-246/+398
| Summary:
| The aim here is to reduce the number of remote memory accesses on systems with a NUMA memory architecture, typically multi-socket servers.
|
| Linux provides a NUMA API for doing two things:
| * Allocating memory local to a particular node
| * Binding a thread to a particular node
|
| When given the +RTS --numa flag, the runtime will
| * Determine the number of NUMA nodes (N) by querying the OS
| * Assign capabilities to nodes, so cap C is on node C%N
| * Bind worker threads on a capability to the correct node
| * Keep separate free lists in the block layer for each node
| * Allocate the nursery for a capability from node-local memory
| * Allocate blocks in the GC from node-local memory
|
| For example, using nofib/parallel/queens on a 24-core 2-socket machine:
|
| ```
| $ ./Main 15 +RTS -N24 -s -A64m
| Total time 173.960s ( 7.467s elapsed)
|
| $ ./Main 15 +RTS -N24 -s -A64m --numa
| Total time 150.836s ( 6.423s elapsed)
| ```
|
| The biggest win here is expected to be allocating from node-local memory, so that means programs using a large -A value (as here).
|
| According to perf, on this program the number of remote memory accesses was reduced by more than 50% by using `--numa`.
|
| Test Plan:
| * validate
| * There's a new flag --debug-numa=<n> that pretends to do NUMA without actually making the OS calls, which is useful for testing the code on non-NUMA systems.
| * TODO: I need to add some unit tests
|
| Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria
|
| Subscribers: thomie
|
| Differential Revision: https://phabricator.haskell.org/D2199
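The capability-to-node assignment mentioned above ("cap C is on node C%N") is simple enough to sketch directly. The function name is hypothetical, and treating a node count of zero as "everything on node 0" is an assumption added so the sketch never divides by zero.

```c
/* Hypothetical helper: round-robin assignment of capabilities to NUMA nodes,
 * following the "cap C is on node C%N" rule from the commit message. */
unsigned sketch_cap_to_node(unsigned cap, unsigned n_nodes)
{
    if (n_nodes == 0) {
        /* Assumption for this sketch: no NUMA information means node 0. */
        return 0;
    }
    return cap % n_nodes;
}
```

On the 24-core, 2-socket machine from the example, this places even-numbered capabilities on node 0 and odd-numbered ones on node 1.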