| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds eight new primops that fuse a multiplication and an
addition or subtraction:
- `{fmadd,fmsub,fnmadd,fnmsub}{Float,Double}#`
fmadd x y z is x * y + z, computed with a single rounding step.
This patch implements code generation for these primops in the following
backends:
- X86, AArch64 and PowerPC NCG,
- LLVM
- C
WASM uses the C implementation. The primops are unsupported in the
JavaScript backend.
The following constant folding rules are also provided:
- compute a * b + c when a, b, c are all literals,
- x * y + 0 ==> x * y,
- ±1 * y + z ==> z ± y and x * ±1 + z ==> z ± x.
NB: the constant folding rules incorrectly handle signed zero.
This is a known limitation with GHC's floating-point constant folding
rules (#21227), which we hope to resolve in the future.
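To make the signed-zero caveat concrete, here is a small C sketch (not GHC code; the helper names `mul_add` and `sign_bit` are hypothetical, and it assumes a 64-bit IEEE 754 `double` with the default round-to-nearest mode). When the product is negative zero, `x * y + 0` yields positive zero, so rewriting it to `x * y` changes the sign of the result:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: x * y + z. Whether the compiler fuses this or not,
   the sign of a zero result is the same under round-to-nearest. */
static double mul_add(double x, double y, double z) {
    return x * y + z;
}

/* Hypothetical helper: returns 1 if the IEEE 754 sign bit of v is set.
   Assumes double is a 64-bit IEEE 754 binary64 value. */
static int sign_bit(double v) {
    uint64_t bits;
    memcpy(&bits, &v, sizeof bits);
    return (int)(bits >> 63);
}
```

With these helpers, `sign_bit(-0.0 * 1.0)` is 1 while `sign_bit(mul_add(-0.0, 1.0, 0.0))` is 0, because `(-0.0) + (+0.0)` is `+0.0`.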
|
|
|
|
|
|
| |
Also remove the MetaStmt constructor from LlvmStatement and place the annotations into the Store statement.
Includes “Implement a workaround for -no-asm-shortcutting bug” (https://gitlab.haskell.org/ghc/ghc/-/commit/2fda9e0df886cc551e2cd6b9c2a384192bdc3045)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch tracks the type of Cmm global registers. This is needed
in order to lint uses of polymorphic registers, such as SIMD vector
registers that can be used both for floating-point and integer values.
This change allows us to refactor VanillaReg to not store VGcPtr,
as that information is instead stored in the type of the usage of the
register.
Fixes #22297
|
|
|
|
|
|
|
| |
Previously I used LLVM's `unordered` ordering for the C11 `relaxed`
ordering. However, this is wrong: C11 `relaxed` should instead map to
the LLVM `monotonic` ordering.
Fixes #22640
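For context, this is what the C11 `relaxed` ordering looks like at the source level. The C sketch below is only an illustration (the counter and function names are hypothetical): a relaxed operation is still atomic but imposes no ordering on surrounding memory accesses. The LLVM language reference describes `monotonic` as the ordering corresponding to `memory_order_relaxed`, while `unordered` is weaker.

```c
#include <stdatomic.h>

/* Hypothetical shared counter; static storage, so it starts at zero. */
static atomic_int hits;

/* Atomically increment the counter with relaxed ordering and
   return the value it held before the increment. */
static int record_hit(void) {
    return atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
}

/* Read the counter with relaxed ordering. */
static int read_hits(void) {
    return atomic_load_explicit(&hits, memory_order_relaxed);
}
```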
|
| |
|
| |
|
|
|
|
| |
We actually only emit MO_S_MulMayOflo and never emit MO_U_MulMayOflo anywhere.
|
|
|
|
|
|
|
|
|
|
|
| |
* Rename pprCLabel to pprCLabelStyle, and use the name pprCLabel
for a function using CStyle (analogous to pprAsmLabel)
* Move LabelStyle to the CLabel module; it no longer needs to be in Outputable.
* Move calls to 'text' right next to literals, to make sure the text/str
rule is triggered.
* Remove FastString/String roundtrip in Tc.Deriv.Generate
* Introduce showSDocForUser', which abstracts over a pattern in
GHCi.UI
|
|
|
|
|
| |
This avoids further partiality: e.g., `map head . group` is
replaced by `map NE.head . NE.group`, and there are fewer panic calls.
|
|
|
|
|
|
|
|
|
|
| |
• Delete some dead code, largely under `GHC.Utils`.
• Clean up a few definitions in `GHC.Utils.(Misc, Monad)`.
• Clean up `GHC.Types.SrcLoc`.
• Derive stock `Functor, Foldable, Traversable` for more types.
• Derive more instances for newtypes.
Bump haddock submodule.
|
|
|
|
|
|
|
| |
This fixes various typos and spelling mistakes
in the compiler.
Fixes #21891
|
|
|
|
|
|
|
| |
Replace calls to renderWithContext with showSDocOneLine; it's more
efficient and explanatory.
Remove polyPatSig (unused)
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, the SDocContext used for code generation contained
information about whether the labels should use Asm or C style.
However, at every individual call site, this is known statically.
This removes the parameter to 'PprCode' and replaces every 'pdoc'
used to print a label in code style with 'pprCLabel' or 'pprAsmLabel'.
The OutputableP instance is now used only for dumps.
The output of T15155 changes: it now uses the Asm style
(which is faithful to what actually happens).
|
|
|
|
|
|
|
|
|
| |
Our aliasification logic would previously turn builtin LLVM variables
into aliases, which apparently confuses LLVM. This manifested in
initializers failing to be emitted, resulting in many profiling failures
with the LLVM backend.
Fixes #22019.
|
|
|
|
| |
We don't actually emit rodata16 sections anywhere.
|
|
|
|
|
|
|
| |
Here we reorganize `GHC.Cmm` to eliminate the orphan `Outputable` and
`OutputableP` instances for the Cmm AST. This makes it significantly
easier to use the Cmm pretty-printers in tracing output without
incurring module import cycles.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this change, `Backend` becomes an abstract type
(there are no more exposed value constructors).
Decisions that were formerly made by asking "is the
current back end equal to (or different from) this named value
constructor?" are now made by interrogating the back end about
its properties, which are functions exported by `GHC.Driver.Backend`.
There is a description of how to migrate code using `Backend` in the
user guide.
Clients using the GHC API can find a backdoor to access the Backend
datatype in GHC.Driver.Backend.Internal.
Bumps haddock submodule.
Fixes #20927
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
LlvmConfig contains information read from llvm-passes and llvm-targets
files in GHC's top directory. Reading these files is done only when
needed (i.e. when the LLVM backend is used) and cached for the whole
compiler session. This patch changes the way this is done:
- Split LlvmConfig into LlvmConfig and LlvmConfigCache
- Store LlvmConfigCache in HscEnv instead of DynFlags: there is no
good reason to store it in DynFlags. As it is fixed per session, we
store it in the session state instead (HscEnv).
- Initializing LlvmConfigCache required some changes to driver functions
such as newHscEnv. I've used the opportunity to untangle initHscEnv
from initGhcMonad (in top-level GHC module) and to move it to
GHC.Driver.Main, close to newHscEnv.
- I've also made `cmmPipeline` independent of HscEnv in order to remove
the call to newHscEnv in regalloc_unit_tests.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GHC uses global initializers for a number of things including
cost-center registration, info-table provenance registration, and setup
of foreign exports. Previously, the global initializer arrays which
referenced these initializers would live in the object file of the C
stub, which would then be merged into the main object file of the
module.
Unfortunately, this approach is no longer tenable with the move to
Clang/LLVM on Windows (see #21019). Specifically, lld's PE backend does
not support object merging (that is, the -r flag). Instead we now
package a module's object files into a static library. However,
this is problematic in the case of initializers as there are no
references to the C stub object in the archive, meaning that the linker
may drop the object from the final link.
This patch refactors our handling of global initializers to instead
place initializer arrays within the object file of the module to which
they belong. We do this by introducing a Cmm data declaration containing
the initializer array in the module's Cmm stream. While the initializer
functions themselves remain in separate C stub objects, the reference
from the module's object ensures that they are not dropped from the
final link.
In service of #21068.
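As a rough illustration of what a global initializer is at the C level, the sketch below uses the GCC/Clang `constructor` attribute, which asks the toolchain to record a pointer to the function in the object's initializer array so that it runs before `main`. This is not the code GHC generates; the names `cost_centres_registered` and `register_cost_centres` are hypothetical.

```c
/* Set by the initializer below; hypothetical stand-in for real
   registration work such as cost-centre registration. */
static int cost_centres_registered = 0;

/* GCC/Clang extension: a pointer to this function is placed in the
   object's initializer array, and the function runs before main(). */
__attribute__((constructor))
static void register_cost_centres(void) {
    cost_centres_registered = 1;
}
```

If the object containing such an initializer array is dropped by the linker, the function never runs, which is the failure mode the message above describes for unreferenced archive members.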
|
|
|
|
| |
Fixes #20935 and #20924
|
| |
|
|
|
|
|
| |
This allows us to produce valid code for indexWord8ArrayAs*# on
platforms that lack unaligned memory access.
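For readers unfamiliar with the problem, here is one common way to perform such an access safely in C. It is only an analogy and not a description of the code GHC emits, and the helper name `read_u32_unaligned` is hypothetical. Copying the bytes with `memcpy` makes no alignment assumption, whereas dereferencing a misaligned `uint32_t *` can fault on strict-alignment targets.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit word (in host byte order) starting
   at an arbitrary byte offset. memcpy imposes no alignment requirement,
   so this is valid even when base + byte_offset is not 4-byte aligned. */
static uint32_t read_u32_unaligned(const uint8_t *base, size_t byte_offset) {
    uint32_t value;
    memcpy(&value, base + byte_offset, sizeof value);
    return value;
}
```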
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
This was achieved with
git ls-tree --name-only HEAD -r | xargs sed -i -e 's/note \[/Note \[/g'
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CmmToLlvm: rename lcgPlatform -> llvmCgPlatform
CmmToLlvm: rename lcgContext -> llvmCgContext
CmmToLlvm: rename lcgFillUndefWithGarbage
CmmToLlvm: rename lcgSplitSections
CmmToLlvm: lcgBmiVersion -> llvmCgBmiVersion
CmmToLlvm: lcgLlvmVersion -> llvmCgLlvmVersion
CmmToLlvm: lcgDoWarn -> llvmCgDoWarn
CmmToLlvm: lcgLlvmConfig -> llvmCgLlvmConfig
CmmToLlvm: llvmCgPlatformMisc --> llvmCgLlvmTarget
|
| |
|
|
|
|
|
|
|
| |
That is, remove the factorization of common strings and the string-building
code for the LLVM code gen ops. Replace these with string literals
to obey the FastString rewrite rule in GHC.Data.FastString and compute
the string length at compile time.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CodeOutput: LCGConfig, add handshake initLCGConfig
Add two modules:
GHC.CmmToLlvm.Config -- to hold the Llvm code gen config
GHC.Driver.Config.CmmToLlvm -- for initialization, other utils
CmmToLlvm: remove HasDynFlags, add LlvmConfig
CmmToLlvm: add lcgContext to LCGConfig
CmmToLlvm.Base: DynFlags --> LCGConfig
Llvm: absorb LlvmOpts into LCGConfig
CmmToLlvm.Ppr: swap DynFlags --> LCGConfig
CmmToLlvm.CodeGen: swap DynFlags --> LCGConfig
CmmToLlvm.CodeGen: swap DynFlags --> LCGConfig
CmmToLlvm.Data: swap LlvmOpts --> LCGConfig
CmmToLlvm: swap DynFlags --> LCGConfig
CmmToLlvm: move LlvmVersion to CmmToLlvm.Config
Additionally:
- refactor Config and initConfig to hold LlvmVersion
- push IO needed to get LlvmVersion to boundary between Cmm and LLVM
code generation
- remove redundant imports, this is much cleaner!
CmmToLlvm.Config: store platformMisc_llvmTarget
instead of all of platformMisc
|
|
|
|
|
|
|
|
|
| |
We should strive to make our includes in terms of the RTS as much as
possible. For the one place where that is not possible, the LLVM version,
we make a new tiny header.
Stage numbers are somewhat arbitrary; if we simply need a newer RTS, we
should say so.
|
|
|
|
|
|
| |
NCG needs to call slow FFI functions where we "borrow" the C compiler's
implementation, but there is no reason why we need to do that for LLVM,
or the unregisterized backend where everything is via C anyways!
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Word64#/Int64# are only used on 32-bit architectures. Before this patch,
operations on these types were directly using the FFI. Now we use real
primops that are then lowered into ccalls.
The advantage of doing this is that we can now perform constant folding on
Word64#/Int64# (#19024).
Most of this work was done by John Ericson in !3658. However this patch
doesn't go as far as e.g. changing Word64 to always be using Word64#.
Noticeable performance improvements:
    T9203(normal)          run/alloc     89870808.0     66662456.0  -25.8% GOOD
    haddock.Cabal(normal)  run/alloc  14215777340.8  12780374172.0  -10.1% GOOD
    haddock.base(normal)   run/alloc  15420020877.6  13643834480.0  -11.5% GOOD
Metric Decrease:
T9203
haddock.Cabal
haddock.base
|
|
|
|
|
|
|
| |
bound.
We use a non-inclusive upper bound so that setting the upper bound to 13 for
example means that all 12.x versions are accepted.
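The half-open range described above can be sketched as follows in C. The function name and the lower bound used in the usage note are assumptions for illustration, not GHC's actual check.

```c
#include <stdbool.h>

/* Hypothetical check on the LLVM major version: the lower bound is
   inclusive and the upper bound is exclusive. */
static bool llvm_version_supported(int major, int lower_inclusive,
                                   int upper_exclusive) {
    return major >= lower_inclusive && major < upper_exclusive;
}
```

With an assumed lower bound of 10 and an upper bound of 13, `llvm_version_supported(12, 10, 13)` holds for every 12.x release, while `llvm_version_supported(13, 10, 13)` does not.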
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce LogFlags as an independent subset of DynFlags used for logging.
As a consequence in many places we don't have to pass both Logger and
DynFlags anymore.
The main reason for this refactoring is that I want to refactor the
systools interfaces: for now many systools functions use DynFlags both
to use the Logger and to fetch their parameters (e.g. ldInputs for the
linker). I'm interested in refactoring the way they fetch their
parameters (i.e. use dedicated XxxOpts data types instead of DynFlags)
for #19877. But if I did this refactoring before refactoring the Logger,
we would have duplicate parameters (e.g. ldInputs from DynFlags and
linkerInputs from LinkerOpts). Hence this patch first.
Some flags don't really belong to LogFlags because they are subsystem
specific (e.g. most DumpFlags). For example, -ddump-asm would better be
passed in NCGConfig somehow. This patch doesn't fix this tight coupling:
the dump flags are part of the UI but they are passed all the way down
for example to infer the file name for the dumps.
Because LogFlags are a subset of the DynFlags, we must update the former
when the latter changes (not so often). As a consequence we now use
accessors to read/write DynFlags in HscEnv instead of using `hsc_dflags`
directly.
In the process I've also made some subsystems less dependent on DynFlags:
- CmmToAsm: by passing some missing flags via NCGConfig (see new fields
in GHC.CmmToAsm.Config)
- Core.Opt.*:
- by passing -dinline-check value into UnfoldingOpts
- by fixing some Core passes interfaces (e.g. CallArity, FloatIn)
that took DynFlags argument for no good reason.
- as a side-effect GHC.Core.Opt.Pipeline.doCorePass is much less
convoluted.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Suppose a safe call: myCall(x,y,z)
It is lowered into three unsafe calls in Cmm:
r = suspendThread(...);
myCall(x,y,z);
resumeThread(r);
Consider the following situation for myCall arguments:
x = Sp[..] -- stack
y = Hp[..] -- heap
z = R1 -- global register
r = suspendThread(...);
myCall(x,y,z);
resumeThread(r);
The sink pass assumes that unsafe calls clobber memory (heap and stack),
hence x and y assignments are not sunk after `suspendThread`. The sink
pass also correctly handles global register clobbering for all unsafe
calls, except `suspendThread`!
`suspendThread` is special because it releases the capability the thread
is running on. Hence the sink pass must also take into account global
registers that are mapped into memory (in the capability).
In the example above, we could get:
r = suspendThread(...);
z = R1
myCall(x,y,z);
resumeThread(r);
But this transformation isn't valid if R1 is (BaseReg->rR1) as BaseReg
is invalid between suspendThread and resumeThread. This caused argument
corruption at least with the C backend ("unregisterised") in #19237.
Fix #19237
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Replace uses of WARN macro with calls to:
warnPprTrace :: Bool -> SDoc -> a -> a
Remove the now unused HsVersions.h
Bump haddock submodule
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is no reason to use CPP. __LINE__ and __FILE__ macros are now
better replaced with GHC's CallStack. As a bonus, assert error messages
now contain more information (function name, column).
Here is the mapping table (HasCallStack omitted):
* ASSERT: assert :: Bool -> a -> a
* MASSERT: massert :: Bool -> m ()
* ASSERTM: assertM :: m Bool -> m ()
* ASSERT2: assertPpr :: Bool -> SDoc -> a -> a
* MASSERT2: massertPpr :: Bool -> SDoc -> m ()
* ASSERTM2: assertPprM :: m Bool -> SDoc -> m ()
|
| |
|
|
|
|
|
| |
This requires adding another rewrite to the mangler, to avoid generating
PLT entries.
|
|
|
|
|
|
|
| |
Previously we would support only one LLVM major version. Here we
generalize this to accept a range, taking this range to be LLVM 10 to 11,
as 11 is necessary for Apple M1 support. We also accept 12, as that is
what Apple ships with Big Sur on the M1.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
'Stream' is implemented in the "yoneda" style for efficiency. By
representing a stream in this manner 'fmap' and '>>=' operations are
accumulated in the function parameters before being applied once when
the stream is destroyed. In the old implementation each usage of 'mapM'
and '>>=' would traverse the entire stream in order to apply the
substitution at the leaves. It is well-known for free monads that this
representation can improve performance, and the test results
demonstrate this for GHC as well.
The operation mapAccumL is not used in the compiler and can't be
implemented efficiently because it requires destroying and rebuilding
the stream.
I removed one use of mapAccumL_ which has similar problems but the other
use was difficult to remove. In the future it may be worth exploring
whether the 'Stream' encoding could be modified further to capture the
mapAccumL pattern, and likewise defer the passing of accumulation
parameter until the stream is finally consumed.
The >>= operation for 'Stream' was a hot-spot in the ticky profile for
the "ManyConstructors" test which called the 'cg' function many times in
"StgToCmm.hs"
Metric Decrease:
ManyConstructors
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this patch, the only way to override GHC's default logging
behavior was to set `log_action`, `dump_action` and `trace_action`
fields in DynFlags. This patch introduces a new Logger abstraction and
stores it in HscEnv instead.
This is part of #17957 (avoid storing state in DynFlags). DynFlags are
duplicated and updated per-module (because of OPTIONS_GHC pragma), so
we shouldn't store global state in them.
This patch also fixes a race in parallel "--make" mode which updated
the `generatedDumps` IORef concurrently.
Bump haddock submodule
The increase in MultiLayerModules is tracked in #19293.
Metric Increase:
MultiLayerModules
|
|
|
|
|
| |
Ensure that the shift amount parameter has the same type as the value
being shifted.
|
|
|
|
|
|
|
|
| |
For some architectures the C calling convention is that any integer
shorter than 64 bits is replaced by its 64-bit representation using
sign or zero extension.
Fixes #19023.
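The difference between the two kinds of extension can be seen in a short C sketch (the helper names are hypothetical, and this only illustrates the arithmetic, not the code GHC emits):

```c
#include <stdint.h>

/* Sign extension: the top bit of the 8-bit value is replicated into
   the upper 56 bits. */
static int64_t sign_extend_8(int8_t v) {
    return (int64_t)v;
}

/* Zero extension: the upper 56 bits are filled with zeros. */
static uint64_t zero_extend_8(uint8_t v) {
    return (uint64_t)v;
}
```

The 8-bit pattern `0xFF` therefore becomes `0xFFFFFFFFFFFFFFFF` (that is, -1) when sign-extended and `0x00000000000000FF` (that is, 255) when zero-extended, so caller and callee must agree on which one the calling convention requires.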
|
|
|
|
|
|
| |
Otherwise `opt` fails with:
error: use of undefined value '@memcmp$def'
|
| |
|