| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
It was previously disabled because of:
- a confusion about "SRT inlining" (see removed comment in this commit)
- a linker bug (overflow) in the handling of ARM64_RELOC_SUBTRACTOR
relocation: fixed by a previous commit.
|
|
|
|
|
|
|
|
|
| |
- reordered the 3 SRT implementation cases from the most general to the
most specific one:
USE_SRT_POINTER -> USE_SRT_OFFSET -> USE_INLINE_SRT_FIELD
- added requirements for each
- found and documented a confusion about "SRT inlining" not supported
with MachO. (It is fixed in the following commit)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this change, `Backend` becomes an abstract type
(there are no more exposed value constructors).
Decisions that were formerly made by asking "is the
current back end equal to (or different from) this named value
constructor?" are now made by interrogating the back end about
its properties, which are functions exported by `GHC.Driver.Backend`.
There is a description of how to migrate code using `Backend` in the
user guide.
Clients using the GHC API can find a backdoor to access the Backend
datatype in GHC.Driver.Backend.Internal.
Bumps haddock submodule.
Fixes #20927
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit adds module `GHC.Cmm.Dominators`, which provides a wrapper
around two existing algorithms in GHC: the Lengauer-Tarjan dominator
analysis from the X86 back end and the reverse postorder ordering from
the Cmm Dataflow framework. Issue #20726 proposes that we evaluate
some alternatives for dominator analysis, but for the time being, the
best path forward is simply to use the existing analysis on
`CmmGraph`s.
This commit addresses a bullet in #21200.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
LlvmConfig contains information read from llvm-passes and llvm-targets
files in GHC's top directory. Reading these files is done only when
needed (i.e. when the LLVM backend is used) and cached for the whole
compiler session. This patch changes the way this is done:
- Split LlvmConfig into LlvmConfig and LlvmConfigCache
- Store LlvmConfigCache in HscEnv instead of DynFlags: there is no
good reason to store it in DynFlags. As it is fixed per session, we
store it in the session state instead (HscEnv).
- Initializing LlvmConfigCache required some changes to driver functions
such as newHscEnv. I've used the opportunity to untangle initHscEnv
from initGhcMonad (in top-level GHC module) and to move it to
GHC.Driver.Main, close to newHscEnv.
- I've also made `cmmPipeline` independent of HscEnv in order to remove
the call to newHscEnv in regalloc_unit_tests.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When generating an SRT for a recursive group, GHC.Cmm.Info.Build.oneSRT
filters out recursive references, as described in Note [recursive SRTs].
However, doing so for static functions would be unsound, for the reason
described in Note [Invalid optimisation: shortcutting].
However, the same argument applies to static data constructor
applications, as we discovered in #20959. Fix this by ensuring that
static data constructor applications are included in recursive SRTs.
The approach here is not entirely satisfactory, but it is a starting
point.
Fixes #20959.
|
|
|
|
|
|
|
|
|
|
| |
Here we implement a few measures to improve the clarity of the CAF
analysis implementation. Specifically:
* Use CafInfo instead of Bool since the former is more descriptive
* Rename CAFLabel to CAFfyLabel, since not all CAFfyLabels are in fact
CAFs
* Add numerous comments
|
| |
|
| |
|
|
|
|
| |
See #21068.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GHC uses global initializers for a number of things including
cost-center registration, info-table provenance registration, and setup
of foreign exports. Previously, the global initializer arrays which
referenced these initializers would live in the object file of the C
stub, which would then be merged into the main object file of the
module.
Unfortunately, this approach is no longer tenable with the move to
Clang/LLVM on Windows (see #21019). Specifically, lld's PE backend does
not support object merging (that is, the -r flag). Instead we are now
rather packaging a module's object files into a static library. However,
this is problematic in the case of initializers as there are no
references to the C stub object in the archive, meaning that the linker
may drop the object from the final link.
This patch refactors our handling of global initializers to instead
place initializer arrays within the object file of the module to which
they belong. We do this by introducing a Cmm data declaration containing
the initializer array in the module's Cmm stream. While the initializer
functions themselves remain in separate C stub objects, the reference
from the module's object ensures that they are not dropped from the
final link.
In service of #21068.
|
|
|
|
| |
Fixes #20935 and #20924
|
|
|
|
|
|
| |
Remove GHC.Driver.Session imports that weren't considered as redundant
because of the reexport of PlatformConstants. Also remove this reexport
as modules using this datatype should import GHC.Platform instead.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a number of changes to ticky-ticky profiling.
When an executable is profiled with IPE profiling it's now possible to
associate id-related ticky counters to their source location.
This works by emitting the info table address as part of the counter
which can be looked up in the IPE table.
Add a `-ticky-ap-thunk` flag. This flag prevents the use of some standard thunks
which are precompiled into the RTS. This means reduced cache locality
and increased code size. But it allows better attribution of execution
cost to specific source locations instead of simple attributing it to
the standard thunk.
ticky-ticky now uses the `arg` field to emit additional information
about counters in json format. When ticky-ticky is used in combination
with the eventlog eventlog2html can be used to generate a html table
from the eventlog similar to the old text output for ticky-ticky.
|
|
|
|
|
|
|
|
| |
Remove these smart constructors for these reasons:
* mkLocalClosureTableLabel : Does the same as the non-local variant.
* mkLocalClosureLabel : Does the same as the non-local variant.
* mkLocalInfoTableLabel : Decide if we make a local label based on the name
and just use mkInfoTableLabel everywhere.
|
|
|
|
|
| |
It turns out this job hasn't been running for quite a while (perhaps
ever) so there are quite a few failures when running the linter locally.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This does three major things:
* Enforce the invariant that all strict fields must contain tagged
pointers.
* Try to predict the tag on bindings in order to omit tag checks.
* Allows functions to pass arguments unlifted (call-by-value).
The former is "simply" achieved by wrapping any constructor allocations with
a case which will evaluate the respective strict bindings.
The prediction is done by a new data flow analysis based on the STG
representation of a program. This also helps us to avoid generating
redudant cases for the above invariant.
StrictWorkers are created by W/W directly and SpecConstr indirectly.
See the Note [Strict Worker Ids]
Other minor changes:
* Add StgUtil module containing a few functions needed by, but
not specific to the tag analysis.
-------------------------
Metric Decrease:
T12545
T18698b
T18140
T18923
LargeRecord
Metric Increase:
LargeRecord
ManyAlternatives
ManyConstructors
T10421
T12425
T12707
T13035
T13056
T13253
T13253-spj
T13379
T15164
T18282
T18304
T18698a
T1969
T20049
T3294
T4801
T5321FD
T5321Fun
T783
T9233
T9675
T9961
T19695
WWRec
-------------------------
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
Compare expressions and types when comparing `CmmLoad`s.
Fixes #21016
|
| |
|
|
|
|
|
|
| |
This was achieved with
git ls-tree --name-only HEAD -r | xargs sed -i -e 's/note \[/Note \[/g'
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
StgToCmm: add Config, remove CgInfoDownwards
StgToCmm: runC api change to take StgToCmmConfig
StgToCmm: CgInfoDownad -> StgToCmmConfig
StgToCmm.Monad: update getters/setters/withers
StgToCmm: remove CallOpts in StgToCmm.Closure
StgToCmm: remove dynflag references
StgToCmm: PtrOpts removed
StgToCmm: add TMap to config, Prof - dynflags
StgToCmm: add omit yields to config
StgToCmm.ExtCode: remove redundant import
StgToCmm.Heap: remove references to dynflags
StgToCmm: codeGen api change, DynFlags -> Config
StgToCmm: remove dynflags in Env and StgToCmm
StgToCmm.DataCon: remove dynflags references
StgToCmm: remove dynflag references in DataCon
StgToCmm: add backend avx flags to config
StgToCmm.Prim: remove dynflag references
StgToCmm.Expr: remove dynflag references
StgToCmm.Bind: remove references to dynflags
StgToCmm: move DoAlignSanitisation to Cmm.Type
StgToCmm: remove PtrOpts in Cmm.Parser.y
DynFlags: update ipInitCode api
StgToCmm: Config Module is single source of truth
StgToCmm: Lazy config breaks IORef deadlock
testsuite: bump countdeps threshold
StgToCmm.Config: strictify fields except UpdFrame
Strictifying UpdFrameOffset causes the RTS build with stage1 to
deadlock. Additionally, before the deadlock performance of the RTS
is noticeably slower.
StgToCmm.Config: add field descriptions
StgToCmm: revert strictify on Module in config
testsuite: update CountDeps tests
StgToCmm: update comment, fix exports
Specifically update comment about loopification passed into dynflags
then stored into stgToCmmConfig. And remove getDynFlags from
Monad.hs exports
Types.Name: add pprFullName function
StgToCmm.Ticky: use pprFullname, fixup ExtCode imports
Cmm.Info: revert cmmGetClosureType removal
StgToCmm.Bind: use pprFullName, Config update comments
StgToCmm: update closureDescription api
StgToCmm: SAT altHeapCheck
StgToCmm: default render for Info table, ticky
Use default rendering contexts for info table and ticky ticky, which should be independent of command line input.
testsuite: bump count deps
pprFullName: flag for ticky vs normal style output
convertInfoProvMap: remove unused parameter
StgToCmm.Config: add backend flags to config
StgToCmm.Config: remove Backend from Config
StgToCmm.Prim: refactor Backend call sites
StgToCmm.Prim: remove redundant imports
StgToCmm.Config: refactor vec compatibility check
StgToCmm.Config: add allowQuotRem2 flag
StgToCmm.Ticky: print internal names with parens
StgToCmm.Bind: dispatch ppr based on externality
StgToCmm: Add pprTickyname, Fix ticky naming
Accidently removed the ctx for ticky SDoc output. The only relevant flag
is sdocPprDebug which was accidental set to False due to using
defaultSDocContext without altering the flag.
StgToCmm: remove stateful fields in config
fixup: config: remove redundant imports
StgToCmm: move Sequel type to its own module
StgToCmm: proliferate getCallMethod updated api
StgToCmm.Monad: add FCodeState to Monad Api
StgToCmm: add second reader monad to FCode
fixup: Prim.hs: missed a merge conflict
fixup: Match countDeps tests to HEAD
StgToCmm.Monad: withState -> withCgState
To disambiguate it from mtl withState. This withState shouldn't be
returning the new state as a value. However, fixing this means tackling
the knot tying in CgState and so is very difficult since it changes when
the thunk of the knot is forced which either leads to deadlock or to
compiler panic.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
add files GHC.Cmm.Config, GHC.Driver.Config.Cmm
Cmm: DynFlag references --> CmmConfig
Cmm.Pipeline: reorder imports, add handshake
Cmm: DynFlag references --> CmmConfig
Cmm.Pipeline: DynFlag references --> CmmConfig
Cmm.LayoutStack: DynFlag references -> CmmConfig
Cmm.Info.Build: DynFlag references -> CmmConfig
Cmm.Config: use profile to retrieve platform
Cmm.CLabel: unpack NCGConfig in labelDynamic
Cmm.Config: reduce CmmConfig surface area
Cmm.Config: add cmmDoCmmSwitchPlans field
Cmm.Config: correct cmmDoCmmSwitchPlans flag
The original implementation dispatches work in cmmImplementSwitchPlans
in an `otherwise` branch, hence we must add a not to correctly dispatch
Cmm.Config: add cmmSplitProcPoints simplify Config
remove cmmBackend, and cmmPosInd
Cmm.CmmToAsm: move ncgLabelDynamic to CmmToAsm
Cmm.CLabel: remove cmmLabelDynamic function
Cmm.Config: rename cmmOptDoLinting -> cmmDoLinting
testsuite: update CountDepsAst CountDepsParser
|
|
|
|
|
|
|
| |
Here we introduce code generator support for instrument array primops
with bounds checking, enabled with the `-fcheck-prim-bounds` flag.
Introduced to debug #20769.
|
|
|
|
|
|
| |
See #20725.
The commit includes source-code changes and a test case.
|
|
|
|
| |
comparison (#20088)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Here we rework the handling of sub-word operations in the AArch64
backend, fixing a number of bugs and inconsistencies. In short,
we now impose the invariant that all subword values are represented in
registers in zero-extended form. Signed arithmetic operations are then
responsible for sign-extending as necessary.
Possible future work:
* Use `CMP`s extended register form to avoid burning an instruction
in sign-extending the second operand.
* Track sign-extension state of registers to elide redundant sign
extensions in blocks with frequent sub-word signed arithmetic.
|
|
|
|
|
| |
This is necessary for lint-correctness since we no longer allow such
shifts in Cmm.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously primops.txt.pp stipulated that the word-size shift primops
were only defined for shift offsets in [0, word_size). However, there
was no further guidance for the definition of Cmm's sub-word size shift
MachOps.
Here we fix this by explicitly disallowing (checked in many cases by
CmmLint) shift operations where the shift offset is larger than the
shiftee. This is consistent with LLVM's shift operations, avoiding the
miscompilation noted in #20637.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the constant-folding behavior for MO_S_Quot and MO_S_Rem
failed to narrow its arguments, meaning that a program like:
%zx64(%quot(%lobits8(0x00e1::bits16), 3::bits8))
would be miscompiled. Specifically, this program should reduce as
%lobits8(0x00e1::bits16) == -31
%quot(%lobits8(0x00e1::bits16), 3::bits8) == -10
%zx64(%quot(%lobits8(0x00e1::bits16), 3::bits8)) == 246
However, with this bug the `%lobits8(0x00e1::bits16)` would instead
be treated as `+31`, resulting in the incorrect result of `75`.
(cherry picked from commit 94e197e3dbb9a48991eb90a03b51ea13d39ba4cc)
|
|
|
|
|
|
| |
The goal here is to make the SRT selection logic a bit clearer and allow
configurations which we currently don't support (e.g. using a full word
in the info table even when TNTC is used).
|
| |
|
|
|
|
|
|
|
|
|
|
| |
No-op assignments like R1 = R1 are not only wasteful. They can also
inhibit other optimizations like inlining assignments that read from
R1.
We now check for assignments being a no-op before and after we
simplify the RHS in Cmm sink which should eliminate most of these
no-ops.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Emit an Info Table Provenance Entry (IPE) for every stack represeted info table
if -finfo-table-map is turned on.
To decode a cloned stack, lookupIPE() is used. It provides a mapping between
info tables and their source location.
Please see these notes for details:
- [Stacktraces from Info Table Provenance Entries (IPE based stack unwinding)]
- [Mapping Info Tables to Source Positions]
Metric Increase:
T12545
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to make the packages in this repo "reinstallable", we need to
associate source code with a specific packages. Having a top level
`/includes` dir that mixes concerns (which packages' includes?) gets in
the way of this.
To start, I have moved everything to `rts/`, which is mostly correct.
There are a few things however that really don't belong in the rts (like
the generated constants haskell type, `CodeGen.Platform.h`). Those
needed to be manually adjusted.
Things of note:
- No symlinking for sake of windows, so we hard-link at configure time.
- `CodeGen.Platform.h` no longer as `.hs` extension (in addition to
being moved to `compiler/`) so as not to confuse anyone, since it is
next to Haskell files.
- Blanket `-Iincludes` is gone in both build systems, include paths now
more strictly respect per-package dependencies.
- `deriveConstants` has been taught to not require a `--target-os` flag
when generating the platform-agnostic Haskell type. Make takes
advantage of this, but Hadrian has yet to.
|
|
|
|
|
| |
On big-endian systems a narrow after a load cannot be replaced with
a narrow load.
|
|
|
|
|
|
|
|
| |
Previously the `MO_S_Quot` constant folding rule would incorrectly pass
the shift amount of the same width as the shifted value. However, the
machop's type expects the shift amount to be a Word.
Fixes #20142.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Word64#/Int64# are only used on 32-bit architectures. Before this patch,
operations on these types were directly using the FFI. Now we use real
primops that are then lowered into ccalls.
The advantage of doing this is that we can now perform constant folding on
Word64#/Int64# (#19024).
Most of this work was done by John Ericson in !3658. However this patch
doesn't go as far as e.g. changing Word64 to always be using Word64#.
Noticeable performance improvements
T9203(normal) run/alloc 89870808.0 66662456.0 -25.8% GOOD
haddock.Cabal(normal) run/alloc 14215777340.8 12780374172.0 -10.1% GOOD
haddock.base(normal) run/alloc 15420020877.6 13643834480.0 -11.5% GOOD
Metric Decrease:
T9203
haddock.Cabal
haddock.base
|
|
|
|
|
|
|
|
|
|
|
| |
This commit renames the `getErrorMessages` and
`getMessages` function in the parser code to `getPsErrorMessages` and
`getPsMessages`, to avoid import conflicts, as we have already
`getErrorMessages` and `getMessages` defined in `GHC.Types.Error`.
Fixes #19920.
Update haddock submodule
|
|
|
|
|
|
|
|
| |
Now that Outputable is independent of DynFlags, we can put tracing
functions using SDocs into their own module that doesn't transitively
depend on any GHC.Driver.* module.
A few modules needed to be moved to avoid loops in DEBUG mode.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce LogFlags as a independent subset of DynFlags used for logging.
As a consequence in many places we don't have to pass both Logger and
DynFlags anymore.
The main reason for this refactoring is that I want to refactor the
systools interfaces: for now many systools functions use DynFlags both
to use the Logger and to fetch their parameters (e.g. ldInputs for the
linker). I'm interested in refactoring the way they fetch their
parameters (i.e. use dedicated XxxOpts data types instead of DynFlags)
for #19877. But if I did this refactoring before refactoring the Logger,
we would have duplicate parameters (e.g. ldInputs from DynFlags and
linkerInputs from LinkerOpts). Hence this patch first.
Some flags don't really belong to LogFlags because they are subsystem
specific (e.g. most DumpFlags). For example -ddump-asm should better be
passed in NCGConfig somehow. This patch doesn't fix this tight coupling:
the dump flags are part of the UI but they are passed all the way down
for example to infer the file name for the dumps.
Because LogFlags are a subset of the DynFlags, we must update the former
when the latter changes (not so often). As a consequence we now use
accessors to read/write DynFlags in HscEnv instead of using `hsc_dflags`
directly.
In the process I've also made some subsystems less dependent on DynFlags:
- CmmToAsm: by passing some missing flags via NCGConfig (see new fields
in GHC.CmmToAsm.Config)
- Core.Opt.*:
- by passing -dinline-check value into UnfoldingOpts
- by fixing some Core passes interfaces (e.g. CallArity, FloatIn)
that took DynFlags argument for no good reason.
- as a side-effect GHC.Core.Opt.Pipeline.doCorePass is much less
convoluted.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In which we add a new code generator to the Glasgow Haskell
Compiler. This codegen supports ELF and Mach-O targets, thus covering
Linux, macOS, and BSDs in principle. It was tested only on macOS and
Linux. The NCG follows a similar structure as the other native code
generators we already have, and should therfore be realtively easy to
follow.
It supports most of the features required for a proper native code
generator, but does not claim to be perfect or fully optimised. There
are still opportunities for optimisations.
Metric Decrease:
ManyAlternatives
ManyConstructors
MultiLayerModules
PmSeriesG
PmSeriesS
PmSeriesT
PmSeriesV
T10421
T10421a
T10858
T11195
T11276
T11303b
T11374
T11822
T12227
T12545
T12707
T13035
T13253
T13253-spj
T13379
T13701
T13719
T14683
T14697
T15164
T15630
T16577
T17096
T17516
T17836
T17836b
T17977
T17977b
T18140
T18282
T18304
T18478
T18698a
T18698b
T18923
T1969
T3064
T5030
T5321FD
T5321Fun
T5631
T5642
T5837
T783
T9198
T9233
T9630
T9872d
T9961
WWRec
Metric Increase:
T4801
|