| Commit message (Collapse) | Author | Age | Files | Lines |
| | |
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously it was unclear whether req_shared_libs should require:
* that the platform supports dynamic library loading,
* that GHC supports dynamic linking of Haskell code, or
* that the dyn way libraries were built
Clarify by splitting the predicate into two:
* `req_dynamic_lib_support` demands that the platform support dynamic
linking
* `req_dynamic_hs` demands that the GHC support dynamic linking of
Haskell code on the target platform
Naturally `req_dynamic_hs` cannot be true unless
`req_dynamic_lib_support` is also true.
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables worker/wrapper for nested constructed products, as described
in `Note [Nested CPR]`. The machinery for expressing Nested CPR was already
there, since !5054. Worker/wrapper is equipped to exploit Nested CPR annotations
since !5338. CPR analysis already handles applications in batches since !5753.
This patch just needs to flip a few more switches:
1. In `cprTransformDataConWork`, we need to look at the field expressions
and their `CprType`s to see whether the evaluation of the expressions
terminates quickly (= is in HNF) or if they are put in strict fields.
If that is the case, then we retain their CPR info and may unbox nestedly
later on. More details in `Note [Nested CPR]`.
2. Enable nested `ConCPR` signatures in `GHC.Types.Cpr`.
3. In the `asConCpr` call in `GHC.Core.Opt.WorkWrap.Utils`, pass CPR info of
fields to the `Unbox`.
4. Instead of giving CPR signatures to DataCon workers and wrappers, we now have
`cprTransformDataConWork` for workers and treat wrappers by analysing their
unfolding. As a result, the code from GHC.Types.Id.Make went away completely.
5. I deactivated worker/wrappering for recursive DataCons and wrote a function
`isRecDataCon` to detect them. We really don't want to give `repeat` or
`replicate` the Nested CPR property.
See Note [CPR for recursive data structures] for which kind of recursive
DataCons we target.
6. Fix a couple of tests and their outputs.
I also documented that CPR can destroy sharing and lead to asymptotic increase
in allocations (which is tracked by #13331/#19326) in
`Note [CPR for data structures can destroy sharing]`.
Nofib results:
```
--------------------------------------------------------------------------------
Program Allocs Instrs
--------------------------------------------------------------------------------
ben-raytrace -3.1% -0.4%
binary-trees +0.8% -2.9%
digits-of-e2 +5.8% +1.2%
event +0.8% -2.1%
fannkuch-redux +0.0% -1.4%
fish 0.0% -1.5%
gamteb -1.4% -0.3%
mkhprog +1.4% +0.8%
multiplier +0.0% -1.9%
pic -0.6% -0.1%
reptile -20.9% -17.8%
wave4main +4.8% +0.4%
x2n1 -100.0% -7.6%
--------------------------------------------------------------------------------
Min -95.0% -17.8%
Max +5.8% +1.2%
Geometric Mean -2.9% -0.4%
```
The huge wins in x2n1 (loopy list) and reptile (see #19970) are due to
refraining from unboxing (:). Other benchmarks like digits-of-e2 or wave4main
regress because of that. Ultimately there are no great improvements due to
Nested CPR alone, but at least it's a win.
Binary sizes decrease by 0.6%.
There are a significant number of metric decreases. The most notable ones (>1%):
```
ManyAlternatives(normal) ghc/alloc 771656002.7 762187472.0 -1.2%
ManyConstructors(normal) ghc/alloc 4191073418.7 4114369216.0 -1.8%
MultiLayerModules(normal) ghc/alloc 3095678333.3 3128720704.0 +1.1%
PmSeriesG(normal) ghc/alloc 50096429.3 51495664.0 +2.8%
PmSeriesS(normal) ghc/alloc 63512989.3 64681600.0 +1.8%
PmSeriesV(normal) ghc/alloc 62575424.0 63767208.0 +1.9%
T10547(normal) ghc/alloc 29347469.3 29944240.0 +2.0%
T11303b(normal) ghc/alloc 46018752.0 47367576.0 +2.9%
T12150(optasm) ghc/alloc 81660890.7 82547696.0 +1.1%
T12234(optasm) ghc/alloc 59451253.3 60357952.0 +1.5%
T12545(normal) ghc/alloc 1705216250.7 1751278952.0 +2.7%
T12707(normal) ghc/alloc 981000472.0 968489800.0 -1.3% GOOD
T13056(optasm) ghc/alloc 389322664.0 372495160.0 -4.3% GOOD
T13253(normal) ghc/alloc 337174229.3 341954576.0 +1.4%
T13701(normal) ghc/alloc 2381455173.3 2439790328.0 +2.4% BAD
T14052(ghci) ghc/alloc 2162530642.7 2139108784.0 -1.1%
T14683(normal) ghc/alloc 3049744728.0 2977535064.0 -2.4% GOOD
T14697(normal) ghc/alloc 362980213.3 369304512.0 +1.7%
T15164(normal) ghc/alloc 1323102752.0 1307480600.0 -1.2%
T15304(normal) ghc/alloc 1304607429.3 1291024568.0 -1.0%
T16190(normal) ghc/alloc 281450410.7 284878048.0 +1.2%
T16577(normal) ghc/alloc 7984960789.3 7811668768.0 -2.2% GOOD
T17516(normal) ghc/alloc 1171051192.0 1153649664.0 -1.5%
T17836(normal) ghc/alloc 1115569746.7 1098197592.0 -1.6%
T17836b(normal) ghc/alloc 54322597.3 55518216.0 +2.2%
T17977(normal) ghc/alloc 47071754.7 48403408.0 +2.8%
T17977b(normal) ghc/alloc 42579133.3 43977392.0 +3.3%
T18923(normal) ghc/alloc 71764237.3 72566240.0 +1.1%
T1969(normal) ghc/alloc 784821002.7 773971776.0 -1.4% GOOD
T3294(normal) ghc/alloc 1634913973.3 1614323584.0 -1.3% GOOD
T4801(normal) ghc/alloc 295619648.0 292776440.0 -1.0%
T5321FD(normal) ghc/alloc 278827858.7 276067280.0 -1.0%
T5631(normal) ghc/alloc 586618202.7 577579960.0 -1.5%
T5642(normal) ghc/alloc 494923048.0 487927208.0 -1.4%
T5837(normal) ghc/alloc 37758061.3 39261608.0 +4.0%
T9020(optasm) ghc/alloc 257362077.3 254672416.0 -1.0%
T9198(normal) ghc/alloc 49313365.3 50603936.0 +2.6% BAD
T9233(normal) ghc/alloc 704944258.7 685692712.0 -2.7% GOOD
T9630(normal) ghc/alloc 1476621560.0 1455192784.0 -1.5%
T9675(optasm) ghc/alloc 443183173.3 433859696.0 -2.1% GOOD
T9872a(normal) ghc/alloc 1720926653.3 1693190072.0 -1.6% GOOD
T9872b(normal) ghc/alloc 2185618061.3 2162277568.0 -1.1% GOOD
T9872c(normal) ghc/alloc 1765842405.3 1733618088.0 -1.8% GOOD
TcPlugin_RewritePerf(normal) ghc/alloc 2388882730.7 2365504696.0 -1.0%
WWRec(normal) ghc/alloc 607073186.7 597512216.0 -1.6%
T9203(normal) run/alloc 107284064.0 102881832.0 -4.1%
haddock.Cabal(normal) run/alloc 24025329589.3 23768382560.0 -1.1%
haddock.base(normal) run/alloc 25660521653.3 25370321824.0 -1.1%
haddock.compiler(normal) run/alloc 74064171706.7 73358712280.0 -1.0%
```
The biggest exception to the rule is T13701 which seems to fluctuate as usual
(not unlike T12545). T14697 has a similar quality, being a generated
multi-module test. T5837 is small enough that it similarly doesn't measure
anything significant besides module loading overhead.
T13253 simply does one additional round of Simplification due to Nested CPR.
There are also some apparent regressions in T9198, T12234 and PmSeriesG that we
(@mpickering and I) were simply unable to reproduce locally. @mpickering tried
to run the CI script in a local Docker container and actually found that T9198
and PmSeriesG *improved*. In MRs that were rebased on top this one, like !4229,
I did not experience such increases. Let's not get hung up on these regression
tests, they were meant to test for asymptotic regressions.
The build-cabal test improves by 1.2% in -O0.
Metric Increase:
T10421
T12234
T12545
T13035
T13056
T13701
T14697
T18923
T5837
T9198
Metric Decrease:
ManyConstructors
T12545
T12707
T13056
T14683
T16577
T18223
T1969
T3294
T9203
T9233
T9675
T9872a
T9872b
T9872c
T9961
TcPlugin_RewritePerf
|
| |
|
|
| |
Otherwise we are dependent upon the C++ compiler's default language.
|
| | |
|
| | |
|
| |
|
|
| |
ipeMap.c failed to #include <string.h>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Emit an Info Table Provenance Entry (IPE) for every stack represeted info table
if -finfo-table-map is turned on.
To decode a cloned stack, lookupIPE() is used. It provides a mapping between
info tables and their source location.
Please see these notes for details:
- [Stacktraces from Info Table Provenance Entries (IPE based stack unwinding)]
- [Mapping Info Tables to Source Positions]
Metric Increase:
T12545
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add `StackSnapshot#` primitive type that represents a cloned stack (StgStack).
The cloning interface consists of two functions, that clone either the treads
own stack (cloneMyStack) or another threads stack (cloneThreadStack).
The stack snapshot is offline/cold, i.e. it isn't evaluated any further. This is
useful for analyses as it prevents concurrent modifications.
For technical details, please see Note [Stack Cloning].
Co-authored-by: Ben Gamari <bgamari.foss@gmail.com>
Co-authored-by: Matthew Pickering <matthewtpickering@gmail.com>
|
| |
|
|
| |
Closes #20377
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using a hash map reduces the complexity of lookupIPE(), making it non linear.
On registration each IPE list is added to a temporary IPE lists buffer, reducing
registration time. The hash map is built lazily on first lookup.
IPE event output to stderr is added with tests.
For details, please see
Note [The Info Table Provenance Entry (IPE) Map].
A performance test for IPE registration and lookup can be found here:
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/5724#note_370806
|
| |
|
|
| |
Ensures that Rts.h can be parsed as C++.
|
| |
|
|
| |
Fixes #20006
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In which we add a new code generator to the Glasgow Haskell
Compiler. This codegen supports ELF and Mach-O targets, thus covering
Linux, macOS, and BSDs in principle. It was tested only on macOS and
Linux. The NCG follows a similar structure as the other native code
generators we already have, and should therfore be realtively easy to
follow.
It supports most of the features required for a proper native code
generator, but does not claim to be perfect or fully optimised. There
are still opportunities for optimisations.
Metric Decrease:
ManyAlternatives
ManyConstructors
MultiLayerModules
PmSeriesG
PmSeriesS
PmSeriesT
PmSeriesV
T10421
T10421a
T10858
T11195
T11276
T11303b
T11374
T11822
T12227
T12545
T12707
T13035
T13253
T13253-spj
T13379
T13701
T13719
T14683
T14697
T15164
T15630
T16577
T17096
T17516
T17836
T17836b
T17977
T17977b
T18140
T18282
T18304
T18478
T18698a
T18698b
T18923
T1969
T3064
T5030
T5321FD
T5321Fun
T5631
T5642
T5837
T783
T9198
T9233
T9630
T9872d
T9961
WWRec
Metric Increase:
T4801
|
| |
|
|
| |
(cherry picked from commit f7062e1b0c91e8aa78e245a3dab9571206fce16d)
|
| |
|
|
| |
(cherry picked from commit 3592d1104c47b006fd9f4127d93916f477a6e010)
|
| | |
|
| |
|
|
|
|
|
| |
In commit f3c23939 T18623 is disabled for aarch64.
The limit seems to be too low for powerpc64le, too. This could be
because tables next to code is not supported and our code generator
produces larger code on PowerPC.
|
| |
|
|
| |
This reverts commit 0cbdba2768d84a0f6832ae5cf9ea1e98efd739da.
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The loader state was stored into HscEnv. As we need to have two
interpreters and one loader state per interpreter in #14335, it's
natural to make the loader state a field of the Interp type.
As a side effect, many functions now only require a Interp parameter
instead of HscEnv. Sadly we can't fully free GHC.Linker.Loader of HscEnv
yet because the loader is initialised lazily from the HscEnv the first
time it is used. This is left as future work.
HscEnv may not contain an Interp value (i.e. hsc_interp :: Maybe Interp).
So a side effect of the previous side effect is that callers of the
modified functions now have to provide an Interp. It is satisfying as it
pushes upstream the handling of the case where HscEnv doesn't contain an
Interpreter. It is better than raising a panic (less partial functions,
"parse, don't validate", etc.).
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a set of forward ports (cherry-picks) from 8.10
- a7d22795ed [ci] Add support for building on aarch64-darwin
- 5109e87e13 [testlib/driver] denoise
- 307d34945b [ci] default value for CONFIGURE_ARGS
- 10a18cb4e0 [testsuite] mark ghci056 as fragile
- 16c13d5acf [ci] Default value for MAKE_ARGS
- ab571457b9 [ci/build] Copy config.sub around
- 251892b98f [ci/darwin] bump nixpkgs rev
- 5a6c36ecb4 [testsuite/darwin] fix conc059
- aae95ef0c9 [ci] add timing info
- 3592d1104c [Aarch64] No div-by-zero; disable test.
- 57671071ad [Darwin] mark stdc++ tests as broken
- 33c4d49754 [testsuite] filter out superfluous dylib warnings
- 4bea83afec [ci/nix-shell] Add Foundation and Security
- 6345530062 [testsuite/json2] Fix failure with LLVM backends
- c3944bc89d [ci/nix-shell] [Darwin] Stop the ld warnings about libiconv.
- b821fcc714 [testsuite] static001 is not broken anymore.
- f7062e1b0c [testsuite/arm64] fix section_alignment
- 820b076698 [darwin] stop the DYLD_LIBRARY_PATH madness
- 07b1af0362 [ci/nix-shell] uniquify NIX_LDFLAGS{_FOR_TARGET}
As well as a few additional fixups needed to make this block compile:
- Fixup all.T
- Set CROSS_TARGET, BROKEN_TESTS, XZ, RUNTEST_ARGS, default value.
- [ci] shell.nix bump happy
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Related to #19381 #19359 #14702
After a spike in memory usage we have been conservative about returning
allocated blocks to the OS in case we are still allocating a lot and would
end up just reallocating them. The result of this was that up to 4 * live_bytes
of blocks would be retained once they were allocated even if memory usage ended up
a lot lower.
For a heap of size ~1.5G, this would result in OS memory reporting 6G which is
both misleading and worrying for users.
In long-lived server applications this results in consistent high memory
usage when the live data size is much more reasonable (for example ghcide)
Therefore we have a new (2021) strategy which starts by retaining up to 4 * live_bytes
of blocks before gradually returning uneeded memory back to the OS on subsequent
major GCs which are NOT caused by a heap overflow.
Each major GC which is NOT caused by heap overflow increases the consec_idle_gcs
counter and the amount of memory which is retained is inversely proportional to this number.
By default the excess memory retained is
oldGenFactor (controlled by -F) / 2 ^ (consec_idle_gcs * returnDecayFactor)
On a major GC caused by a heap overflow, the `consec_idle_gcs` variable is reset to 0
(as we could continue to allocate more, so retaining all the memory might make sense).
Therefore setting bigger values for `-Fd` makes the rate at which memory is returned slower.
Smaller values make it get returned faster. Setting `-Fd0` disables the
memory return completely, which is the behaviour of older GHC versions.
The default is `-Fd4` which results in the following scaling:
> mapM print [(x, 1/ (2**(x / 4))) | x <- [1 :: Double ..20]]
(1.0,0.8408964152537146)
(2.0,0.7071067811865475)
(3.0,0.5946035575013605)
(4.0,0.5)
(5.0,0.4204482076268573)
(6.0,0.35355339059327373)
(7.0,0.29730177875068026)
(8.0,0.25)
(9.0,0.21022410381342865)
(10.0,0.17677669529663687)
(11.0,0.14865088937534013)
(12.0,0.125)
(13.0,0.10511205190671433)
(14.0,8.838834764831843e-2)
(15.0,7.432544468767006e-2)
(16.0,6.25e-2)
(17.0,5.255602595335716e-2)
(18.0,4.4194173824159216e-2)
(19.0,3.716272234383503e-2)
(20.0,3.125e-2)
So after 13 consecutive GCs only 0.1 of the maximum memory used will be retained.
Further to this decay factor, the amount of memory we attempt to retain is
also influenced by the GC strategy for the oldest generation. If we are using
a copying strategy then we will need at least 2 * live_bytes for copying to take
place, so we always keep that much. If using compacting or nonmoving then we need a lower number,
so we just retain at least `1.2 * live_bytes` for some protection.
In future we might want to make this behaviour more aggressive, some
relevant literature is
> Ulan Degenbaev, Jochen Eisinger, Manfred Ernst, Ross McIlroy, and Hannes Payer. 2016. Idle time garbage collection scheduling. SIGPLAN Not. 51, 6 (June 2016), 570–583. DOI:https://doi.org/10.1145/2980983.2908106
which describes the "memory reducer" in the V8 javascript engine which
on an idle collection immediately returns as much memory as possible.
|
| |
|
|
|
|
|
|
|
|
|
| |
The way in which allocatePinned took blocks out of the nursery was
leading to horrible fragmentation in some workloads.
The strategy now is that a separate free block list is reserved for each
capability and blocks are taken from there. When it's empty the global
SM lock is taken and a fresh block of size PINNED_EMPTY_SIZE is allocated.
Fixes #19481
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The first change makes the array ones use the proper fixed-size types,
which also means that just like before, they can be used without
explicit conversions with the boxed sized types. (Before, it was Int# /
Word# on both sides, now it is fixed sized on both sides).
For the second change, don't use "extend" or "narrow" in some of the
user-facing primops names for conversions.
- Names like `narrowInt32#` are misleading when `Int` is 32-bits.
- Names like `extendInt64#` are flat-out wrong when `Int is
32-bits.
- `narrow{Int,Word}<N>#` however map a type to itself, and so don't
suffer from this problem. They are left as-is.
These changes are batched together because Alex happend to use the array
ops. We can only use released versions of Alex at this time, sadly, and
I don't want to have to have a release thatwon't work for the final GHC
9.2. So by combining these we get all the changes for Alex done at once.
Bump hackage state in a few places, and also make that workflow slightly
easier for the future.
Bump minimum Alex version
Bump Cabal, array, bytestring, containers, text, and binary submodules
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the eventlog infrastructure had a couple of races that could
pop up when using the startEventLog/endEventLog interfaces. In
particular, stopping and then later restarting logging could result in
data preceding the eventlog header, breaking the integrity of the
stream.
To fix this we rework the invariants regarding the eventlog and
generally tighten up the concurrency control surrounding starting and
stopping of logging.
We also fix an unrelated bug, wherein log events from disabled
capabilities could end up never flushed.
|
| |
|
|
|
|
|
| |
It should be left to tooling to perform the filtering to remove these
specific closure types from the profile if desired.
Fixes #16795
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Previously lookupSymbol_PEi386 would call lookupSymbol while
holding linker_mutex. Fix this by rather
calling `lookupDependentSymbol`. This is safe
because lookupSymbol_PEi386 unconditionally holds linker_mutex.
Happily, this un-breaks `T12771`, `T13082_good`, and `T14611`, which
were previously marked as broken due to #18718.
Closes #19155.
|
| |
|
|
|
|
|
|
|
|
|
| |
This gives a small increase in performance under most circumstances.
For single threaded GC the improvement is on the order of 1-2%.
For multi threaded GC the results are quite noisy but seem to
fall into the same ballpark.
Fixes #16499
|
| |
|
| |
Due to #18548.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
The unit database cache, the home unit and the unit state were stored in
DynFlags while they ought to be stored in the compiler session state
(HscEnv). This patch fixes this.
It introduces a new UnitEnv type that should be used in the future to
handle separate unit environments (especially host vs target units).
Related to #17957
Bump haddock submodule
|
| |
|
|
| |
Due to #18953.
|
| |
|
|
|
|
|
|
| |
These are used to find the current roots of the garbage collector.
Co-authored-by: Sven Tennie's avatarSven Tennie <sven.tennie@gmail.com>
Co-authored-by: Matthew Pickering's avatarMatthew Pickering <matthewtpickering@gmail.com>
Co-authored-by: default avatarBen Gamari <bgamari.foss@gmail.com>
|
| |
|
|
|
| |
This uses the highMemDynamic flag introduced earlier to verify that
dynamic objects are properly unloaded.
|
| |
|
|
|
|
|
| |
Fixes #16525 by tracking dependencies between object file symbols and
marking symbol liveness during garbage collection
See Note [Object unloading] in CheckUnload.c for details.
|
| |\ |
|
| | |
| |
| |
| | |
ThreadSanitizer changes the output of these tests.
|
| | |
| |
| |
| |
| |
| |
| | |
Move linker related code into GHC.Linker. Previously it was scattered
into GHC.Unit.State, GHC.Driver.Pipeline, GHC.Runtime.Linker, etc.
Add documentation in GHC.Linker
|
| | |
| |
| |
| |
| |
| |
| |
| |
| | |
The `rts_pause` and `rts_resume` functions have been added to `RtsAPI.h` and
allow an external process to completely pause and resume the RTS.
Co-authored-by: Sven Tennie <sven.tennie@gmail.com>
Co-authored-by: Matthew Pickering <matthewtpickering@gmail.com>
Co-authored-by: Ben Gamari <bgamari.foss@gmail.com>
|
| |/
|
|
|
|
|
|
|
|
|
| |
We do not support foreign "C" imports of varargs functions. While this
works on amd64, in general the platform's calling convention may need
more type information that our Cmm representation can currently provide.
For instance, this is the case with Darwin's AArch64 calling convention.
Document this fact in the users guide and fix T5423 which makes use of a
disallowed foreign import.
Closes #18854.
|
| |
|
|
|
|
| |
_all_ of it, leaving nothing for, e.g., thread stacks.
Fix will only allocate 2/3rds and check whether remainder is at least large
enough for minimum amount of thread stacks.
|
| |
|
|
| |
See #18718.
|
| |
|
|
| |
The error originates from osCommitMemory rather than getMBlocks.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In b592bd98ff25730bbe3c13d6f62a427df8c78e28 we started using
-dead_strip_dylib on macOS when lining dynamic libraries and binaries.
The underlying reason being the Load Command Size Limit in macOS
Sierra (10.14) and later.
GHC will produce @rpath/libHS... dependency entries together with a
corresponding RPATH entry pointing to the location of the libHS...
library. Thus for every library we produce two Load Commands. One to
specify the dependent library, and one with the path where to find it.
This makes relocating libraries and binaries easier, as we just need to
update the RPATH entry with the install_name_tool. The dynamic linker
will then subsitute each @rpath with the RPATH entries it finds in the
libraries load commands or the environement, when looking up @rpath
relative libraries.
-dead_strip_dylibs intructs the linker to drop unused libraries. This in
turn help us reduce the number of referenced libraries, and subsequently
the size of the load commands. This however does not remove the RPATH
entries. Subsequently we can end up (in extreme cases) with only a
single @rpath/libHS... entry, but 100s or more RPATH entries in the Load
Commands.
This patch rectifies this (slighly unorthodox) by passing *no* -rpath
arguments to the linker at link time, but -headerpad 8000. The
headerpad argument is in hexadecimal and the maxium 32k of the load
command size. This tells the linker to pad the load command section
enough for us to inject the RPATHs later. We then proceed to link the
library or binary with -dead_strip_dylibs, and *after* the linking
inspect the library to find the left over (non-dead-stripped)
dependencies (using otool). We find the corresponding RPATHs for each
@rpath relative dependency, and inject them into the library or binary
using the install_name_tool. Thus achieving a deadstripped dylib (and
rpaths) build product.
We can not do this in GHC, without starting to reimplement a dynamic
linker as we do not know which symbols and subsequently libraries are
necessary.
Commissioned-by: Mercury Technologies, Inc. (mercury.com)
|
| |
|
|
|
|
|
|
|
| |
They both have the same role and Backend name is more explicit.
Metric Decrease:
T3064
Update Haddock submodule
|
| |
|
|
| |
and atexit functions #7072
|