| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to make the packages in this repo "reinstallable", we need to
associate source code with a specific package. Having a top-level
`/includes` dir that mixes concerns (which package's includes?) gets in
the way of this.
To start, I have moved everything to `rts/`, which is mostly correct.
There are a few things, however, that really don't belong in the RTS (like
the generated constants Haskell type and `CodeGen.Platform.h`). Those
needed to be adjusted manually.
Things of note:
- No symlinking, for the sake of Windows, so we hard-link at configure time.
- `CodeGen.Platform.h` no longer has a `.hs` extension (in addition to
being moved to `compiler/`) so as not to confuse anyone, since it is
next to Haskell files.
- Blanket `-Iincludes` is gone in both build systems; include paths now
respect per-package dependencies more strictly.
- `deriveConstants` has been taught to not require a `--target-os` flag
when generating the platform-agnostic Haskell type. Make takes
advantage of this, but Hadrian has yet to.
|
|
|
|
| |
I need to do this now, or the linter will be mad when I move these files.
|
|
|
|
|
| |
Metric Increase:
T11276
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Related to #19381 #19359 #14702
After a spike in memory usage we have been conservative about returning
allocated blocks to the OS, in case we are still allocating a lot and would
end up just reallocating them. The result of this was that up to 4 * live_bytes
of blocks would be retained once they were allocated, even if memory usage ended
up a lot lower.
For a heap of size ~1.5G, this would result in OS memory reporting 6G, which is
both misleading and worrying for users.
In long-lived server applications this results in consistently high memory
usage even when the live data size is much smaller (for example, ghcide).
Therefore we have a new (2021) strategy which starts by retaining up to 4 * live_bytes
of blocks before gradually returning unneeded memory back to the OS on subsequent
major GCs which are NOT caused by a heap overflow.
Each major GC which is NOT caused by a heap overflow increases the consec_idle_gcs
counter, and the amount of memory which is retained decreases exponentially with
this number. By default the excess memory retained is
  oldGenFactor (controlled by -F) / 2 ^ (consec_idle_gcs / returnDecayFactor)
On a major GC caused by a heap overflow, the `consec_idle_gcs` variable is reset to 0
(as we could continue to allocate more, so retaining all the memory might make sense).
Therefore setting bigger values for `-Fd` makes the rate at which memory is
returned slower, and smaller values make it get returned faster. Setting `-Fd0`
disables the memory return completely, which is the behaviour of older GHC versions.
The default is `-Fd4`, which results in the following scaling:
> mapM print [(x, 1/ (2**(x / 4))) | x <- [1 :: Double ..20]]
(1.0,0.8408964152537146)
(2.0,0.7071067811865475)
(3.0,0.5946035575013605)
(4.0,0.5)
(5.0,0.4204482076268573)
(6.0,0.35355339059327373)
(7.0,0.29730177875068026)
(8.0,0.25)
(9.0,0.21022410381342865)
(10.0,0.17677669529663687)
(11.0,0.14865088937534013)
(12.0,0.125)
(13.0,0.10511205190671433)
(14.0,8.838834764831843e-2)
(15.0,7.432544468767006e-2)
(16.0,6.25e-2)
(17.0,5.255602595335716e-2)
(18.0,4.4194173824159216e-2)
(19.0,3.716272234383503e-2)
(20.0,3.125e-2)
So after 13 consecutive idle GCs only about 10% of the maximum memory used will be retained.
Further to this decay factor, the amount of memory we attempt to retain is
also influenced by the GC strategy for the oldest generation. If we are using
a copying strategy then we will need at least 2 * live_bytes for copying to take
place, so we always keep that much. If we are using compacting or nonmoving then
we need less, so we just retain at least `1.2 * live_bytes` for some protection.
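As an illustration only (not the RTS code; the helper names are hypothetical), a
minimal Haskell sketch of the retention arithmetic described above:
```
-- Hypothetical model of the retention policy: oldGenFactor is the -F value
-- (default 2), returnDecayFactor the -Fd value (default 4), consecIdleGcs the
-- number of consecutive major GCs not caused by heap overflow.
retentionFactor :: Double -> Double -> Int -> Double
retentionFactor oldGenFactor returnDecayFactor consecIdleGcs =
  oldGenFactor / 2 ** (fromIntegral consecIdleGcs / returnDecayFactor)

-- Bytes of blocks we aim to keep after an idle major GC, never dropping below
-- the floor needed by the oldest-generation strategy (2x live for copying,
-- roughly 1.2x live for compacting or nonmoving).
retainedBytes :: Bool -> Double -> Double -> Int -> Double -> Double
retainedBytes copyingOldGen oldGenFactor returnDecayFactor consecIdleGcs liveBytes =
  max floorBytes (liveBytes * retentionFactor oldGenFactor returnDecayFactor consecIdleGcs)
  where
    floorBytes
      | copyingOldGen = 2.0 * liveBytes
      | otherwise     = 1.2 * liveBytes

-- Print how the retention factor decays under the default -F2 -Fd4 settings.
main :: IO ()
main = mapM_ print [ (n, retentionFactor 2 4 n) | n <- [1 .. 20] ]
```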
In the future we might want to make this behaviour more aggressive; some
relevant literature is
> Ulan Degenbaev, Jochen Eisinger, Manfred Ernst, Ross McIlroy, and Hannes Payer. 2016. Idle time garbage collection scheduling. SIGPLAN Not. 51, 6 (June 2016), 570–583. DOI:https://doi.org/10.1145/2980983.2908106
which describes the "memory reducer" in the V8 javascript engine which
on an idle collection immediately returns as much memory as possible.
|
|
|
|
|
|
|
|
| |
This profiling mode creates bands keyed by the address of the info table of
each closure. This provides a much more fine-grained profiling output
than any of the other profiling modes.
The `-hi` profiling mode does not require a profiling build.
|
|
|
|
|
|
|
|
|
|
| |
This patch exposes three new functions in `GHC.Profiling` which allow
heap profiling to be enabled and disabled dynamically (a short usage
sketch follows the list).
1. startHeapProfTimer - Starts heap profiling with the given RTS options
2. stopHeapProfTimer - Stops heap profiling
3. requestHeapCensus - Performs a heap census on the next context
switch, regardless of whether the timer is enabled or not.
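As an illustration (not part of this patch), a minimal usage sketch, assuming
the three functions above have type `IO ()` and that the program is run with a
heap-profile option such as `+RTS -hT -l`; `warmUp` and `interestingWork` are
placeholders:
```
import GHC.Profiling (requestHeapCensus, startHeapProfTimer, stopHeapProfTimer)

warmUp, interestingWork :: IO ()
warmUp          = pure ()   -- placeholder for an uninteresting start-up phase
interestingWork = pure ()   -- placeholder for the phase we want to profile

main :: IO ()
main = do
  stopHeapProfTimer     -- don't profile start-up
  warmUp
  startHeapProfTimer    -- profile only the interesting phase
  interestingWork
  stopHeapProfTimer
  requestHeapCensus     -- one final census at the next context switch
```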
|
|
|
|
|
|
|
| |
It should be left to tooling to perform the filtering to remove these
specific closure types from the profile if desired.
Fixes #16795
|
|
|
|
|
|
|
| |
This introduces a flag, --eventlog-flush-interval, which can be used to
set an upper bound on the amount of time for which an eventlog event
will remain enqueued. This can be useful in real-time monitoring
settings.
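As an illustration (a hypothetical pairing, not part of this patch), a program
emitting user events for live monitoring could be run with
`+RTS -l --eventlog-flush-interval=1` so that events written with
`Debug.Trace.traceEventIO` reach the log promptly rather than only when an RTS
buffer fills:
```
import Control.Concurrent (threadDelay)
import Control.Monad (forM_)
import Debug.Trace (traceEventIO)

-- Emit a heartbeat user event roughly every second; with a flush interval
-- set, a monitor tailing the eventlog sees each one shortly after it is
-- posted.
main :: IO ()
main = forM_ [1 :: Int ..] $ \i -> do
  traceEventIO ("heartbeat " ++ show i)
  threadDelay 1000000
```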
|
|
|
|
|
|
|
|
| |
We currently only post the entry counters, not the other global
counters, as in my experience the former are more useful. We use the heap
profiler's census period to decide when to dump.
Also spruces up the documentation surrounding ticky-ticky a bit.
|
|
|
|
|
|
|
|
| |
This adds the necessary logic to support AArch64 on ELF, as well
as AArch64 on Mach-O, which Apple calls arm64.
We change the architecture name to AArch64, which is the official Arm
naming scheme.
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
Sets `MiscFlags.disableDelayedOsMemoryReturn`.
See the added `Note [MADV_FREE and MADV_DONTNEED]` for details.
|
| |
|
| |
|
|
|
|
| |
This flag will enable the use of a non-moving oldest generation.
|
|
|
|
|
|
|
| |
Namely ensure that block descriptors are initialized with valid
generation numbers.
Co-Authored-By: Ben Gamari <ben@well-typed.com>
|
|
|
|
| |
Zeros heap memory after the GC has freed it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This moves all URL references to Trac Wiki to their corresponding
GitLab counterparts.
This substitution is classified as follows:
1. Automated substitution using sed with Ben's mapping rule [1]
Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...
New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...
2. Manual substitution for URLs containing `#` index
Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...#Zzz
New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...#zzz
3. Manual substitution for strings starting with `Commentary`
Old: Commentary/XxxYyy...
New: commentary/xxx-yyy...
See also !539
[1]: https://gitlab.haskell.org/bgamari/gitlab-migration/blob/master/wiki-mapping.json
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This re-applies {D5195} with fixes for i386:
* Fix unused label warnings, see {D5230} or {D5273}
* Fix a silly bug introduced by moving `#if`
{P190}
Add an RTS option, -xp, to load PIC objects anywhere in the address space. We
do this by relaxing the requirement that the result of `mmapForLinker` be
below 0x80000000, and by implying USE_CONTIGUOUS_MMAP.
We also need to change calls to `ocInit` and `ocGetNames` to avoid
dangling pointers when the address of `oc->image` is changed by
`ocAllocateSymbolExtra`.
Test Plan:
See {D5195}, also test under i386:
```
$ uname -a
Linux watashi-arch32 4.18.5-arch1-1.0-ARCH #1 SMP PREEMPT Tue Aug 28
20:45:30 CEST 2018 i686 GNU/Linux
$ cd testsuite/tests/th/ && make test
...
```
will run `./validate` on stacked diff.
Reviewers: simonmar, bgamari, alpmestan, trommler, hvr, erikd
Reviewed By: simonmar
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5289
|
|
|
|
| |
This reverts commit 76c8fd674435a652c75a96c85abbf26f1f221876.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This introduces the `+RTS -ol` flag, which allows the user to specify the
destination file for eventlog output.
Test Plan: Validate with included test
Reviewers: simonmar, erikd
Reviewed By: simonmar
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5293
|
|
|
|
| |
This reverts commit 5403a8636fe82f971234873564f3a05393b89b7a.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add an RTS option, -xp, to load PIC objects anywhere in the address space. We
do this by relaxing the requirement that the result of `mmapForLinker` be
below 0x80000000, and by implying USE_CONTIGUOUS_MMAP.
We also need to change calls to `ocInit` and `ocGetNames` to avoid
dangling pointers when the address of `oc->image` is changed by
`ocAllocateSymbolExtra`.
Test Plan:
```
$ uname -a
Linux localhost 4.18.8-arch1-1-ARCH #1 SMP PREEMPT Sat Sep 15 20:34:48
UTC 2018 x86_64 GNU/Linux
$ cat mk/build.mk
DYNAMIC_GHC_PROGRAMS = NO
DYNAMIC_BY_DEFAULT = NO
GhcRTSWays += thr_debug
EXTRA_HC_OPTS += -debug
WAY_p_HC_OPTS += -fPIC -fexternal-dynamic-refs
$ inplace/bin/ghc-stage2 --interactive -prof +RTS -xp
GHCi, version 8.7.20180928: http://www.haskell.org/ghc/ :? for help
ghc-stage2: R_X86_64_32 relocation out of range:
ghczmprim_GHCziTypes_ZMZN_closure = 7f690bffab59
Recompile
/data/users/watashi/ghc/libraries/ghc-prim/dist-install/build/HSghc-prim
-0.5.3.o with -fPIC -fexternal-dynamic-refs.
ghc-stage2: unable to load package `ghc-prim-0.5.3'
$ strace -f -e open,mmap inplace/bin/ghc-stage2 --interactive -prof
-fexternal-interpreter -opti+RTS -opti-xp
...
[pid 1355283]
open("/data/users/watashi/ghc/libraries/base/dist-install/build/libHSbas
e-4.12.0.0_p.a", O_RDONLY) = 14
[pid 1355283] mmap(NULL, 8192, PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6a84842000
[pid 1355283]
open("/data/users/watashi/ghc/libraries/base/dist-install/build/libHSbas
e-4.12.0.0_p.a", O_RDONLY) = 14
[pid 1355283] mmap(NULL, 8192, PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6a84676000
...
Prelude> System.Posix.Process.getProcessID
...
[pid 1355283]
open("/data/users/watashi/ghc/libraries/unix/dist-install/build/libHSuni
x-2.7.2.2_p.a", O_RDONLY) = 14
[pid 1355283] mmap(NULL, 45056, PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6a67d60000
[pid 1355283]
open("/data/users/watashi/ghc/libraries/unix/dist-install/build/libHSuni
x-2.7.2.2_p.a", O_RDONLY) = 14
[pid 1355283] mmap(NULL, 57344, PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6a67d52000
...
```
```
$ uname -a
Darwin watashis-iMac.local 18.0.0 Darwin Kernel Version 18.0.0: Wed Aug
22 20:13:40 PDT 2018; root:xnu-4903.201.2~1/RELEASE_X86_64 x86_64
$ mv
/Users/watashi/gao/ghc/libraries/integer-gmp/dist-install/build/HSintege
r-gmp-1.0.2.0.o{,._DISABLE_GHC_ISSUE_15105}
$ inplace/bin/ghc-stage2 --interactive +RTS -xp
GHCi, version 8.7.20181003: http://www.haskell.org/ghc/ :? for help
Prelude> System.Posix.Process.getProcessID
42791
Prelude> Data.Set.fromList [1 .. 10]
fromList [1,2,3,4,5,6,7,8,9,10]
Prelude>
Leaving GHCi.
$ inplace/bin/ghc-stage2 --interactive -prof -fexternal-interpreter
GHCi, version 8.7.20181003: http://www.haskell.org/ghc/ :? for help
Prelude> System.Posix.Process.getProcessID
42806
Prelude> Data.Set.fromList [1 .. 10]
fromList [1,2,3,4,5,6,7,8,9,10]
Prelude>
Leaving GHCi.
```
Also tested with something that used to hit the 2GB limit; it loads
and runs without problems.
Reviewers: simonmar, bgamari, angerman, Phyx, hvr, erikd
Reviewed By: simonmar
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5195
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The existing internal counters:
* gc_alloc_block_sync
* whitehole_spin
* gen[g].sync
* gen[1].sync
are now not shown in the -s report unless --internal-counters is also passed.
If --internal-counters is passed we now show the counters above, reformatted, as
well as several other counters. In particular, we now count the yieldThread()
calls that SpinLocks do as well as their spins.
The added counters are:
* gc_spin (spin and yield)
* mut_spin (spin and yield)
* whitehole_threadPaused (spin only)
* whitehole_executeMessage (spin only)
* whitehole_lockClosure (spin only)
* waitForGcThreads (spin and yield)
As well as the following, which are not SpinLock-like things:
* any_work
* no_work
* scav_find_work
See the Note for descriptions of what these counters are.
We add busy_wait_nops in these loops along with the counter increment where it
was absent.
Old internal counters output:
```
gc_alloc_block_sync: 0
whitehole_gc_spin: 0
gen[0].sync: 0
gen[1].sync: 0
```
New internal counters output:
```
Internal Counters:
Spins Yields
gc_alloc_block_sync 323 0
gc_spin 9016713 752
mut_spin 57360944 47716
whitehole_gc 0 n/a
whitehole_threadPaused 0 n/a
whitehole_executeMessage 0 n/a
whitehole_lockClosure 0 0
waitForGcThreads 2 415
gen[0].sync 6 0
gen[1].sync 1 0
any_work 2017
no_work 2014
scav_find_work 1004
```
Test Plan:
./validate
Check it builds with #define PROF_SPIN removed from includes/rts/Config.h
Reviewers: bgamari, erikd, simonmar, hvr
Reviewed By: simonmar
Subscribers: rwbarton, thomie, carter
GHC Trac Issues: #3553, #9221
Differential Revision: https://phabricator.haskell.org/D4302
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
GC sync is the time between a GC being initiated and all the mutator
threads finally stopping so that the GC can start. Problems that cause
the GC sync to be delayed are hard to find and can cause dramatic
slowdowns for heavily parallel programs.
The new flag --long-gc-sync=<time> helps by emitting a warning and
calling a user-overridable hook when the GC sync time exceeds the
specified threshold. A debugger can be used to set a breakpoint when
this happens and inspect the stacks of threads to find the culprit.
Test Plan:
```
$ ./inplace/bin/ghc-stage2 +RTS --long-gc-sync=0.0000001 -S
Alloc Copied Live GC GC TOT TOT Page Flts
bytes bytes bytes user elap user elap
1135856 51144 153736 0.000 0.000 0.002 0.002 0 0 (Gen: 0)
1034760 94704 188752 0.000 0.000 0.002 0.002 0 0 (Gen: 0)
1038888 134832 228888 0.009 0.009 0.011 0.011 0 0 (Gen: 1)
1025288 90128 235184 0.000 0.000 0.012 0.012 0 0 (Gen: 0)
1049088 130080 333984 0.000 0.000 0.013 0.013 0 0 (Gen: 0)
Warning: waited 0us for GC sync
1034424 73360 331976 0.000 0.000 0.013 0.013 0 0 (Gen: 0)
```
Also tested on a real production problem.
Reviewers: niteria, bgamari, erikd
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D4193
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch adds the ability to generate stack traces on crashes for Windows.
When running in the interpreter this attempts to use symbol information from
the interpreter and information we know about the loaded object files to
resolve addresses to symbols.
When running compiled code it doesn't have this information and defaults
to using symbol information from PDB files, which for now means only
files compiled with ICC or MSVC will show symbols in the traces.
But I have a future patch that may address this shortcoming.
Also, since I don't know how to walk a pure Haskell stack, I can for now
only show the last entry. I'm hoping to figure out how Apply.cmm works to
be able to walk the stack and give more entries for pure Haskell code.
In GHCi
```
$ echo main | inplace/bin/ghc-stage2.exe --interactive ./testsuite/tests/rts/derefnull.hs
GHCi, version 8.3.20170830: http://www.haskell.org/ghc/ :? for help
Ok, 1 module loaded.
Prelude Main>
Access violation in generated code when reading 0x0
Attempting to reconstruct a stack trace...
Frame Code address
* 0x77cde10 0xc370229 E:\..\base\dist-install\build\HSbase-4.10.0.0.o+0x190031
(base_ForeignziStorable_zdfStorableInt4_info+0x3f)
```
and compiled
```
Access violation in generated code when reading 0x0
Attempting to reconstruct a stack trace...
Frame Code address
* 0xf0dbd0 0x40bb01 E:\..\rts\derefnull.run\derefnull.exe+0xbb01
```
Test Plan: ./validate
Reviewers: austin, hvr, bgamari, erikd, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3913
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's often hard to debug things like segfaults on Windows,
mostly because gdb isn't always of use and users don't know
how to use it effectively.
This patch provides a way to create a crash dump by passing
`+RTS --generate-crash-dumps` as an option. If any unhandled
exception is triggered, a dump is made that contains enough
information to be able to diagnose things successfully.
Currently the created dumps are a bit big because I include
all registers, code and thread information.
This looks like
```
$ testsuite/tests/rts/derefnull.run/derefnull.exe +RTS
--generate-crash-dumps
Access violation in generated code when reading 0000000000000000
Crash dump created. Dump written to:
E:\msys64\tmp\ghc-20170901-220250-11216-16628.dmp
```
Test Plan: ./validate
Reviewers: austin, hvr, bgamari, erikd, simonmar
Reviewed By: bgamari, simonmar
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3912
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Exception handling on Windows is unfortunately a bit complicated,
but essentially the VEH handlers we currently have are running too
early.
This was a problem because they ran so early that they also swallowed C++
exceptions and other software exceptions which the system could very well
have recovered from.
So instead we use a sequence of handler chains so that the exception handlers
run as late as possible. You really can't get any later than this.
Please read the comment in the patch for more details.
I'm also providing a switch to allow people to turn off the exception
handling entirely, in case it does present a problem with their code.
(Reverted and recommitted to fix authorship information)
Test Plan: ./validate
Reviewers: austin, hvr, bgamari, erikd, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, thomie
GHC Trac Issues: #13911, #12110
Differential Revision: https://phabricator.haskell.org/D3911
|
|
|
|
|
|
| |
Reverting to fix authorship of commit.
This reverts commit 1825cbdbdf08ed4bd6fd6794852596078953298a.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Exception handling on Windows is unfortunately a bit complicated,
but essentially the VEH handlers we currently have are running too
early.
This was a problem because they ran so early that they also swallowed C++
exceptions and other software exceptions which the system could very well
have recovered from.
So instead we use a sequence of handler chains so that the exception handlers
run as late as possible. You really can't get any later than this.
Please read the comment in the patch for more details.
I'm also providing a switch to allow people to turn off the exception
handling entirely, in case it does present a problem with their code.
Test Plan: ./validate
Reviewers: austin, hvr, bgamari, erikd, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, thomie
GHC Trac Issues: #13911, #12110
Differential Revision: https://phabricator.haskell.org/D3911
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This sprintf is safe thanks to the guarantees on the format strings that
we pass to it. Well, almost. The GR_FILENAME_FMT_GUM format would not
have satisfied them if it were still used.
If someone makes a mistake, that's a potential privilege escalation,
so I think it's reasonable to switch to snprintf to protect against
that remote possibility.
Test Plan: it builds, CI
Reviewers: simonmar, bgamari, austin, erikd
Reviewed By: bgamari
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3944
|
|
|
|
| |
Our new CPP linter enforces this.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This both says what we mean and silences a bunch of spurious CPP linting
warnings. This pragma is supported by all CPP implementations which we
support.
Reviewers: austin, erikd, simonmar, hvr
Reviewed By: simonmar
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3482
|
|
|
|
|
| |
I evidently neglected to consider that validate doesn't build profiled
ways. Arg.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This introduces an RTS option, -po, which allows the user to override the stem
used to form the output file names of the heap profile and cost-centre summary.
It's a bit unclear to me whether this is really the interface we want.
Alternatively we could just allow the user to specify the `.hp` and `.prof` file
names separately. This would arguably be a bit more straightforward and would
allow the user to name JSON output with an appropriate `.json` suffix if they so
desired. However, this would come at the cost of taking more of the option
space, which is a somewhat precious commodity.
Test Plan: Validate, try using `-po` RTS option
Reviewers: simonmar, austin, erikd
Reviewed By: simonmar
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D3182
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This introduces a JSON output format for cost-centre profiler reports.
It's not clear whether this is really something we want to introduce
given that we may also move to a more Haskell-driven output pipeline in
the future, but I nevertheless found this helpful, so I thought I would
put it up.
Test Plan: Compile a program with `-prof -fprof-auto`; run with `+RTS
-pj`
Reviewers: austin, erikd, simonmar
Reviewed By: simonmar
Subscribers: duncan, maoe, thomie, simonmar
Differential Revision: https://phabricator.haskell.org/D3132
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This changes heap overflow to throw a HeapOverflow exception instead of
killing the process.
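As an illustration (assumed usage, not from the patch): when a maximum heap
size is set, e.g. `+RTS -M256m`, the exception can now be caught like any other
`AsyncException`:
```
import Control.Exception (AsyncException (HeapOverflow), catch, throwIO)

-- Retains the whole list (it is demanded twice), so the heap keeps growing
-- until the -M limit is hit.
hungry :: IO ()
hungry = print (length xs + sum xs)
  where xs = [1 :: Int ..]

handler :: AsyncException -> IO ()
handler HeapOverflow = putStrLn "heap limit reached; shutting down cleanly"
handler other        = throwIO other

main :: IO ()
main = hungry `catch` handler
```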
Test Plan: GHC CI
Reviewers: simonmar, austin, hvr, erikd, bgamari
Reviewed By: simonmar, bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2790
GHC Trac Issues: #1791
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This commit makes various improvements and addresses some issues with
Compact Regions (aka Compact Normal Forms).
This was the most important thing I wanted to fix. Compaction
previously prevented GC from running until it was complete, which
would be a problem in a multicore setting. Now, we compact using a
hand-written Cmm routine that can be interrupted at any point. When a
GC is triggered during a sharing-enabled compaction, the GC has to
traverse and update the hash table, so this hash table is now stored
in the StgCompactNFData object.
Previously, compaction consisted of a deepseq using the NFData class,
followed by a traversal in C code to copy the data. This is now done
in a single pass with hand-written Cmm (see rts/Compact.cmm). We no
longer use the NFData instances; instead the Cmm routine evaluates
components directly as it compacts.
The new compaction is about 50% faster than the old one with no
sharing, and a little faster on average with sharing (the cost of the
hash table dominates when we're doing sharing).
Static objects that don't (transitively) refer to any CAFs don't need
to be copied into the compact region. In particular this means we
often avoid copying Char values and small Int values, because these
are static closures in the runtime.
Each Compact# object can support a single compactAdd# operation at any
given time, so the Data.Compact library now enforces mutual exclusion
using an MVar stored in the Compact object.
We now get exceptions rather than killing everything with a barf()
when we encounter an object that cannot be compacted (a function, or a
mutable object). We now also detect pinned objects, which can't be
compacted either.
The Data.Compact API has been refactored and cleaned up. A new
compactSize operation returns the size (in bytes) of the compact
object.
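As an illustration, a minimal usage sketch of the refactored interface (shown
here with the module name used by today's ghc-compact package, `GHC.Compact`;
at the time of this commit the library exposed it as `Data.Compact`):
```
import GHC.Compact (compact, compactAdd, compactSize, getCompact)

main :: IO ()
main = do
  c  <- compact ([1 .. 10 :: Int], "hello")  -- deep-copy the value into a region
  print (getCompact c)                       -- read it back
  c' <- compactAdd c [True, False]           -- append another value to the region
  print (getCompact c')
  size <- compactSize c'
  putStrLn ("region size in bytes: " ++ show size)
```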
Most of the documentation is in the Haddock docs for the compact
library, which I've expanded and improved here.
Various comments in the code have been improved, especially the main
Note [Compact Normal Forms] in rts/sm/CNF.c.
I've added a few tests, and expanded a few of the tests that were
there. We now also run the tests with GHCi, and in a new test way
that enables sanity checking (+RTS -DS).
There's a benchmark in libraries/compact/tests/compact_bench.hs for
measuring compaction speed and comparing sharing vs. no sharing.
The field totalDataW in StgCompactNFData was unnecessary.
Test Plan:
* new unit tests
* validate
* tested manually that we can compact Data.Aeson data
Reviewers: gcampax, bgamari, ezyang, austin, niteria, hvr, erikd
Subscribers: thomie, simonpj
Differential Revision: https://phabricator.haskell.org/D2751
GHC Trac Issues: #12455
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: Validate on lots of platforms
Reviewers: erikd, simonmar, austin
Reviewed By: erikd, simonmar
Subscribers: michalt, thomie
Differential Revision: https://phabricator.haskell.org/D2699
|
|
|
|
|
| |
- Move the numaMap and nNumaNodes out of RtsFlags to Capability.c
- Add a test to tests/rts
|
|
|
|
|
|
|
|
| |
* Remove unused/old flags from the structs
* Update old comments
* Add missing flags to GHC.RTS
* Simplify GHC.RTS, remove C code and use hsc2hs instead
* Make ParFlags unconditional, and add support to GHC.RTS
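As an illustration of the Haskell side this touches (assuming the module as it
ships in base, `GHC.RTS.Flags`, with derived `Show` instances on the flag
records):
```
import GHC.RTS.Flags (getGCFlags, getRTSFlags)

main :: IO ()
main = do
  gc <- getGCFlags      -- just the GC_FLAGS struct mirrored into Haskell
  print gc
  flags <- getRTSFlags  -- the full structure, including the now-unconditional ParFlags
  print flags
```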
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The aim here is to reduce the number of remote memory accesses on
systems with a NUMA memory architecture, typically multi-socket servers.
Linux provides a NUMA API for doing two things:
* Allocating memory local to a particular node
* Binding a thread to a particular node
When given the +RTS --numa flag, the runtime will
* Determine the number of NUMA nodes (N) by querying the OS
* Assign capabilities to nodes, so cap C is on node C%N
* Bind worker threads on a capability to the correct node
* Keep separate free lists in the block layer for each node
* Allocate the nursery for a capability from node-local memory
* Allocate blocks in the GC from node-local memory
For example, using nofib/parallel/queens on a 24-core 2-socket machine:
```
$ ./Main 15 +RTS -N24 -s -A64m
Total time 173.960s ( 7.467s elapsed)
$ ./Main 15 +RTS -N24 -s -A64m --numa
Total time 150.836s ( 6.423s elapsed)
```
The biggest win here is expected to be allocating from node-local
memory, so that means programs using a large -A value (as here).
According to perf, on this program the number of remote memory accesses
were reduced by more than 50% by using `--numa`.
Test Plan:
* validate
* There's a new flag --debug-numa=<n> that pretends to do NUMA without
actually making the OS calls, which is useful for testing the code
on non-NUMA systems.
* TODO: I need to add some unit tests
Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2199
|
|
|
|
|
|
|
|
|
|
|
|
| |
The `nat` type was an alias for `unsigned int` with a comment saying
it was at least 32 bits. We keep the typedef in case client code is
using it but mark it as deprecated.
Test Plan: Validated on Linux, OS X and Windows
Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20
Differential Revision: https://phabricator.haskell.org/D2166
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
+RTS -AL<size> controls the total size of large objects that can be
allocated before a GC is triggered. Previously this was always just the
value of -A, and the limit mainly existed to prevent runaway allocation
in pathological programs that allocate a lot of large objects. However,
since the limit is shared between all cores, on a large multicore the
default becomes more restrictive, and can end up triggering GC well
before it otherwise would have been triggered.
Arguably a better default would be A*N, but this is probably excessive.
Adding a flag lets you choose, and I've left the default as it was.
See docs for usage.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows the GC to use fewer threads than the number of capabilities.
At each GC, we choose some of the capabilities to be "idle", which means
that the thread running on that capability (if any) will sleep for the
duration of the GC, and the other threads will do its work. We choose
capabilities that are already idle (if any) to be the idle capabilities.
The idea is that this helps in the following situation:
* We want to use a large -N value so as to make use of hyperthreaded
cores
* We use a large heap size, so GC is infrequent
* But we don't want to use all -N threads in the GC, because that
thrashes the memory too much.
See docs for usage.
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: austin
Reviewed By: austin
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1509
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This hasn't been used for a very long time and will soon be superseded
by perf_events support.
Test Plan: validate
Reviewers: austin, simonmar
Reviewed By: austin, simonmar
Subscribers: thomie, erikd
Differential Revision: https://phabricator.haskell.org/D1493
|