path: root/compiler/GHC/CmmToAsm
Commit message | Author | Age | Files | Lines
* Add fused multiply-add instructions | sheaf | 2023-05-11 | 10 | -18/+254
  This patch adds eight new primops that fuse a multiplication and an addition or subtraction:

    - `{fmadd,fmsub,fnmadd,fnmsub}{Float,Double}#`

  fmadd x y z is x * y + z, computed with a single rounding step.

  This patch implements code generation for these primops in the following backends:

    - X86, AArch64 and PowerPC NCG,
    - LLVM
    - C

  WASM uses the C implementation. The primops are unsupported in the JavaScript backend.

  The following constant folding rules are also provided:

    - compute a * b + c when a, b, c are all literals,
    - x * y + 0 ==> x * y,
    - ±1 * y + z ==> z ± y and x * ±1 + z ==> z ± x.

  NB: the constant folding rules incorrectly handle signed zero. This is a known limitation with GHC's floating-point constant folding rules (#21227), which we hope to resolve in the future.
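The "single rounding step" can be illustrated without the new primops. The following is a minimal sketch, not the implementation from the patch: `fmaddViaRational` and `mulThenAdd` are hypothetical helpers, and the fused result is emulated by evaluating x * y + z exactly in `Rational` and converting back once. It assumes finite inputs only.

```haskell
-- Sketch only: emulates a fused multiply-add for finite Doubles.
-- Neither function is part of GHC; both names are made up for illustration.

-- | x * y + z evaluated exactly, then rounded once when converting back.
fmaddViaRational :: Double -> Double -> Double -> Double
fmaddViaRational x y z =
  fromRational (toRational x * toRational y + toRational z)

-- | The unfused form: rounds after the multiplication and again after the addition.
mulThenAdd :: Double -> Double -> Double -> Double
mulThenAdd x y z = x * y + z

main :: IO ()
main = do
  let eps = 2 ^^ (-52) :: Double
      x = 1 + eps
      y = 1 - eps
      z = -1
  -- The exact product is 1 - 2^-104, which is not representable as a Double.
  print (mulThenAdd x y z)
  print (fmaddViaRational x y z)
```

Here the unfused version rounds the product to 1 and therefore yields 0, while the single-rounding version keeps the residual -2^-104.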
* Adjust AArch64 stackFrameHeaderSize | Sven Tennie | 2023-05-09 | 1 | -7/+6
  The prologue of each stack frame consists of the saved LR and FP registers, 8 bytes each, i.e. the size of the stack frame header is 2 * 8 bytes.
* Misc cleanup | Krzysztof Gogolewski | 2023-04-17 | 1 | -1/+1
    - Use dedicated list functions
    - Make cloneBndrs and cloneRecIdBndrs monadic
    - Fix invalid haddock comments in libraries/base
* compiler: apply cmm node-splitting for wasm backend | Cheng Shao | 2023-04-11 | 1 | -0/+2
  This patch applies cmm node-splitting for wasm32 NCG, which is required when handling irreducible CFGs. Fixes #23237.
* compiler: make WasmCodeGenM an instance of MonadUnique | Cheng Shao | 2023-04-11 | 2 | -6/+14
* driver: Unit State Data.Map -> GHC.Unique.UniqMap | doyougnu | 2023-04-01 | 1 | -1/+1
  In pursuit of #22426. The driver and unit state are major contributors.

  This commit also bumps the haddock submodule to reflect the API changes in UniqMap.

  -------------------------
  Metric Decrease:
      MultiComponentModules
      MultiComponentModulesRecomp
      T10421
      T10547
      T12150
      T12234
      T12425
      T13035
      T16875
      T18140
      T18304
      T18698a
      T18698b
      T18923
      T20049
      T5837
      T6048
      T9198
  -------------------------
* nativeGen/AArch64: Fix bitmask immediate predicate | Ben Gamari | 2023-03-24 | 1 | -15/+35
  Previously the predicate for determining whether a logical instruction operand could be encoded as a bitmask immediate was far too conservative. This meant that, e.g., pointer untagging required five instructions whereas it should only require one.

  Fixes #23030.
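For orientation, here is a minimal sketch of such a predicate for 64-bit operands. It is not the code from the patch. It assumes the usual description of AArch64 logical immediates: an element of 2, 4, 8, 16, 32 or 64 bits, replicated across the register, whose set bits form a rotated contiguous run that is neither empty nor full. All names are hypothetical.

```haskell
import Data.Bits (complement, countTrailingZeros, shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word64)

-- | Mask selecting the low `size` bits (size is one of 2, 4, ..., 64).
lowMask :: Int -> Word64
lowMask size
  | size >= 64 = maxBound
  | otherwise  = (1 `shiftL` size) - 1

-- | Non-zero value whose set bits are contiguous and do not wrap around.
isPlainRun :: Word64 -> Bool
isPlainRun x = x /= 0 && (y .&. (y + 1)) == 0
  where y = x `shiftR` countTrailingZeros x

-- | A `size`-bit element qualifies if it is a rotation of a run of ones
-- that is neither empty nor full. A wrapped run of ones is the same thing
-- as a plain run of zeros, hence the check on the complement.
isRotatedRun :: Int -> Word64 -> Bool
isRotatedRun size el =
  el /= 0 && el /= m && (isPlainRun el || isPlainRun (complement el .&. m))
  where m = lowMask size

-- | Repeat a `size`-bit element across all 64 bits.
replicateElem :: Int -> Word64 -> Word64
replicateElem size el =
  foldr (\i acc -> acc .|. (el `shiftL` i)) 0 [0, size .. 63]

-- | Assumed rule: encodable if some element size reproduces the whole value.
isBitmaskImmediate64 :: Word64 -> Bool
isBitmaskImmediate64 w = any fits [2, 4, 8, 16, 32, 64]
  where
    fits size =
      let el = w .&. lowMask size
      in replicateElem size el == w && isRotatedRun size el

main :: IO ()
main = mapM_ (print . isBitmaskImmediate64)
  [ complement 7         -- a pointer-untagging mask
  , 0x5555555555555555   -- 2-bit element, replicated
  , 0x1234               -- not a replicated run
  , 0                    -- all zeros is never encodable
  ]
```

Under this rule a pointer-untagging mask such as `complement 7` qualifies, which is consistent with the single-instruction claim in the commit message.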
* ncg/aarch64: Handle MULTILINE_COMMENT identically as COMMENTs | Zubin Duggal | 2023-03-02 | 1 | -5/+7
  Commit 7566fd9de38c67360c090f828923d41587af519c with the fix for #22798 was incomplete as it failed to handle MULTILINE_COMMENT pseudo-instructions, and didn't completely fix the compiler panics when compiling with `-fregs-graph`.

  Fixes #23002
* compiler: fix cost centre profiling breakage in wasm NCG due to incorrect register mapping | Cheng Shao | 2023-02-20 | 1 | -2/+1
  The wasm NCG used to map CCCS to a wasm global, based on the observation that CCCS is a transient register that's already handled by thread state load/store logic, so it doesn't need to be backed by the rCCCS field in the register table.

  Unfortunately, this is wrong, since even when Cmm execution hasn't yielded back to the scheduler, the Cmm code may call enterFunCCS, which does use rCCCS.

  This breaks cost centre profiling in a subtle way, resulting in inaccurate stack traces in some test cases. The fix is simple though: just remove the CCCS mapping.
* nativeGen/AArch64: Emit Atomic{Read,Write} inline | Ben Gamari | 2023-02-14 | 3 | -2/+37
  Previously the AtomicRead and AtomicWrite operations were emitted as out-of-line calls. However, these tend to be very important for performance, especially the RELAXED case (which only exists for ThreadSanitizer checking).

  Fixes #22115.
* nativeGen/AArch64: Fix graph-colouring allocator | Ben Gamari | 2023-01-31 | 1 | -1/+10
  Previously various `Instr` queries used by the graph-colouring allocator failed to handle a few pseudo-instructions. This manifested in compiler panics while compiling `SHA`, which uses `-fregs-graph`.

  Fixes #22798.
* nativeGen: Teach graph-colouring allocator that x18 is unusable | Ben Gamari | 2023-01-31 | 1 | -4/+2
  Previously trivColourable for AArch64 claimed that 18 registers were trivially-colourable. This is incorrect as x18 is reserved by the platform on AArch64/Darwin.

  See #22798.
* nativeGen/AArch64: Fix debugging output | Ben Gamari | 2023-01-31 | 1 | -10/+68
  Previously various panics would rely on a half-written Show instance, leading to very unhelpful errors. Fix this.

  See #22798.
* Cmm: track the type of global registers | sheaf | 2023-01-31 | 7 | -83/+82
  This patch tracks the type of Cmm global registers. This is needed in order to lint uses of polymorphic registers, such as SIMD vector registers that can be used both for floating-point and integer values.

  This change allows us to refactor VanillaReg to not store VGcPtr, as that information is instead stored in the type of the usage of the register.

  Fixes #22297
* compiler: fix data section alignment in the wasm NCG | Cheng Shao | 2023-01-30 | 1 | -10/+10
  Previously we tried to lower the alignment requirement as far as possible, based on the section kind inferred from the CLabel. For info tables, .p2align 1 was applied given the GC should only need the lowest bit to tag forwarding pointers. But this would lead to unaligned loads/stores, which have a performance penalty even if the wasm spec permits them. Furthermore, the test suite has shown memory corruption in a few cases when compacting gc is used.

  This patch takes a more conservative approach: all data sections except C strings align to word size.
* compiler: properly handle ForeignHints in the wasm NCG | Cheng Shao | 2023-01-28 | 1 | -13/+52
  Properly handle ForeignHints of ccall arguments/return value, insert sign extends and truncations when handling signed subwords. Fixes #22852.
* Assorted changes to avoid Data.List.{head,tail} | Bodigrim | 2023-01-28 | 1 | -1/+1
* compiler: fix lowering of CmmBlock in the wasm NCG | Cheng Shao | 2023-01-28 | 1 | -0/+2
  The CmmBlock datacon was not handled in lower_CmmLit, since I thought it would have been eliminated after proc-point splitting. Turns out it still occurs on very rare occasions, and this patch is needed to fix T9329 for wasm.
* compiler: fix subword literal narrowing logic in the wasm NCG | Cheng Shao | 2023-01-28 | 4 | -20/+15
  This patch fixes the W8/W16 literal narrowing logic in the wasm NCG, which used to lower it to something like i32.const -1, without properly zeroing-out the unused higher bits. Fixes #22608.
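The effect of zeroing the unused higher bits can be sketched in plain Haskell. This is an illustration only, not the code from the patch: `bitsOf` and `narrowLit` are hypothetical helpers that model a subword literal stored in a 32-bit slot.

```haskell
import Data.Bits ((.&.))
import Data.Int (Int32)
import Data.Word (Word32)
import Numeric (showHex)

-- | Bit pattern of a 32-bit constant as it would sit in an i32 slot.
bitsOf :: Int32 -> Word32
bitsOf = fromIntegral

-- | Hypothetical helper: keep only the low `width` bits (8 or 16) of a literal.
narrowLit :: Int -> Int32 -> Word32
narrowLit width x = bitsOf x .&. (2 ^ width - 1)

main :: IO ()
main = do
  putStrLn (showHex (bitsOf (-1)) "")        -- ffffffff: higher bits still set
  putStrLn (showHex (narrowLit 8 (-1)) "")   -- ff
  putStrLn (showHex (narrowLit 16 (-1)) "")  -- ffff
```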
* compiler: fix handling of MO_F_Neg in wasm NCG | Cheng Shao | 2023-01-25 | 3 | -4/+29
  In the wasm NCG, we used to compile MO_F_Neg to 0.0-x. This was an oversight: the wasm spec actually has f32.neg/f64.neg opcodes, and those should be used instead! The old behavior almost works, except when GHC compiles the -0.0 literal, which will incorrectly become 0.0.
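The difference between the two lowerings shows up only for zero. A small sketch in ordinary Haskell, where `negViaSub` is a hypothetical stand-in for the old 0.0-x lowering:

```haskell
-- | Stand-in for the old lowering: negate by subtracting from positive zero.
negViaSub :: Double -> Double
negViaSub x = 0.0 - x

main :: IO ()
main = do
  -- 0.0 - 0.0 is positive zero under round-to-nearest, so the sign is lost.
  print (isNegativeZero (negViaSub 0.0))
  -- A true negation flips the sign bit and yields negative zero.
  print (isNegativeZero (negate 0.0 :: Double))
  -- For non-zero inputs both forms agree.
  print (negViaSub 1.5 == negate 1.5)
```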
* nativeGen/X86: MFENCE is unnecessary for release semantics | Ben Gamari | 2023-01-18 | 1 | -1/+1
  In #22764 a user noticed that a program implementing a simple atomic counter via an STRef regressed significantly due to the introduction of necessary atomic operations in the MutVar# primops (#22468). This regression was caused by a bug in the NCG, which emitted an unnecessary MFENCE instruction for a release-ordered atomic write. MFENCE is only needed to achieve sequentially consistent ordering.

  Fixes #22764.
* Misc cleanup | Krzysztof Gogolewski | 2023-01-11 | 1 | -2/+1
    - Remove unused mkWildEvBinder
    - Use typeTypeOrConstraint - more symmetric and asserts that the type is Type or Constraint
    - Fix escape sequences in Python; they raise a deprecation warning with -Wdefault
* Revert "NCG(x86): Compile add+shift as lea if possible." | Matthew Pickering | 2023-01-09 | 1 | -36/+0
  This reverts commit 20457d775885d6c3df020d204da9a7acfb3c2e5a.

  See #22666 and #21777
* compiler: add optional tail-call support in wasm NCG | Cheng Shao | 2022-12-16 | 3 | -20/+79
  When the `-mtail-call` clang flag is passed at configure time, wasm tail-call extension is enabled, and the wasm NCG will emit `return_call`/`return_call_indirect` instructions to take advantage of it and avoid the `StgRun` trampoline overhead.

  Closes #22461.
* compiler: change fallback function signature to Cmm function signature in wasm NCG | Cheng Shao | 2022-12-16 | 1 | -2/+4
  In the wasm NCG, when handling a `CLabel` of an undefined function without knowing its function signature, we used to fall back to `() -> ()`, which is accepted by `wasm-ld`. This patch changes it to the signature of Cmm functions, which works equally well, but would be required when we emit tail call instructions.
* compiler: add missing export list of GHC.CmmToAsm.Wasm.FromCmm | Cheng Shao | 2022-12-16 | 1 | -63/+7
  Also removes some unreachable code here.
* compiler: remove obsolete commented code in wasm NCG | Cheng Shao | 2022-12-16 | 1 | -1/+0
  It was just a temporary hack to work around a bug in the relooper; that bug was fixed long before the wasm backend was merged.
* Codegen/x86: Eliminate barrier for relaxed accesses | Ben Gamari | 2022-12-15 | 1 | -7/+12
* cmm: Introduce MemoryOrderings | Ben Gamari | 2022-12-15 | 4 | -10/+10
* Add initial support for LoongArch Architecture. | lrzlin | 2022-12-08 | 4 | -0/+10
* compiler: remove unused MO_U_MulMayOflo | Cheng Shao | 2022-11-28 | 2 | -5/+0
  We actually only emit MO_S_MulMayOflo and never emit MO_U_MulMayOflo anywhere.
* compiler: generate ccalls for clz/ctz/popcnt in wasm NCG | Cheng Shao | 2022-11-28 | 3 | -10/+21
  We used to generate a single wasm clz/ctz/popcnt opcode, but it's wrong when it comes to subwords, so might as well generate ccalls for them. See #22470 for details.
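One way a full-width bit-counting operation can go wrong on a subword is sketched below. This is only an illustration of the width mismatch, assuming an 8-bit value held in a 32-bit word; the exact failure discussed in #22470 is not described here, and the `...ViaWord32` helpers are hypothetical.

```haskell
import Data.Bits (countLeadingZeros, countTrailingZeros)
import Data.Word (Word32, Word8)

-- | Count leading zeros of an 8-bit value using a 32-bit operation.
clz8ViaWord32 :: Word8 -> Int
clz8ViaWord32 x = countLeadingZeros (fromIntegral x :: Word32)

-- | Count trailing zeros of an 8-bit value using a 32-bit operation.
ctz8ViaWord32 :: Word8 -> Int
ctz8ViaWord32 x = countTrailingZeros (fromIntegral x :: Word32)

main :: IO ()
main = do
  print (countLeadingZeros (1 :: Word8), clz8ViaWord32 1)   -- (7,31)
  print (countTrailingZeros (0 :: Word8), ctz8ViaWord32 0)  -- (8,32)
```

The 32-bit operation counts the 24 extra high bits as leading zeros, and reports 32 rather than 8 for the trailing zeros of 0.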
* Move hs_mulIntMayOflo cbits to ghc-prim | Cheng Shao | 2022-11-28 | 1 | -11/+6
  It's only used by wasm NCG at the moment, but ghc-prim is a more reasonable place for hosting out-of-line primops. Also, we only need a single version of hs_mulIntMayOflo.
* PPC NCG: Fix generating assembler code | Peter Trommler | 2022-11-19 | 1 | -6/+4
  Fixes #22479
* Misc cleanup | Krzysztof Gogolewski | 2022-11-16 | 5 | -10/+9
    * Replace catMaybes . map f with mapMaybe f
    * Use concatFS to concatenate multiple FastStrings
    * Fix documentation of -exclude-module
    * Cleanup getIgnoreCount in GHCi.UI
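The first item is a behaviour-preserving rewrite. A minimal sketch with made-up helper names (`parseOld` and `parseNew` are not from the GHC tree):

```haskell
import Data.Maybe (catMaybes, mapMaybe)
import Text.Read (readMaybe)

-- | Old style: build a list of Maybe values, then filter out the Nothings.
parseOld :: [String] -> [Int]
parseOld = catMaybes . map readMaybe

-- | New style: map and filter in a single function.
parseNew :: [String] -> [Int]
parseNew = mapMaybe readMaybe

main :: IO ()
main = do
  let inputs = ["1", "x", "3"]
  print (parseOld inputs)
  print (parseNew inputs)
```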
* Use a more efficient printer for code generation (#21853) | Krzysztof Gogolewski | 2022-11-11 | 17 | -464/+572
  The changes in `GHC.Utils.Outputable` are the bulk of the patch and drive the rest. The types `HLine` and `HDoc` in Outputable can be used instead of `SDoc` and support printing directly to a handle with `bPutHDoc`. See Note [SDoc versus HDoc] and Note [HLine versus HDoc].

  The classes `IsLine` and `IsDoc` are used to make the existing code polymorphic over `HLine`/`HDoc` and `SDoc`. This is done for X86, PPC, AArch64, DWARF and dependencies (printing module names, labels etc.).

  Co-authored-by: Alexis King <lexi.lambda@gmail.com>

  Metric Decrease:
      CoOpt_Read
      ManyAlternatives
      ManyConstructors
      T10421
      T12425
      T12707
      T13035
      T13056
      T13253
      T13379
      T18140
      T18282
      T18698a
      T18698b
      T1969
      T20049
      T21839c
      T21839r
      T3064
      T3294
      T4801
      T5321FD
      T5321Fun
      T5631
      T6048
      T783
      T9198
      T9233
* compiler: wasm32 NCG | Cheng Shao | 2022-11-11 | 5 | -0/+2715
  This patch adds the wasm32 NCG.
* compiler: annotate CmmFileEmbed with blob length | Cheng Shao | 2022-11-11 | 3 | -3/+3
  This patch adds the blob length field to CmmFileEmbed. The wasm32 NCG needs to know the precise size of each data segment.
* Add support for the wasm32-wasi target tuple | Cheng Shao | 2022-11-11 | 4 | -0/+10
  This patch adds the wasm32-wasi tuple support to various places in the tree: autoconf, hadrian, ghc-boot and also the compiler. The codegen logic will come in subsequent commits.
* Minor refactor around FastStrings | Krzysztof Gogolewski | 2022-11-05 | 8 | -12/+16
  Pass FastStrings to functions directly, to make sure the rule for fsLit "literal" fires.

  Remove SDoc indirection in GHCi.UI.Tags and GHC.Unit.Module.Graph.
* Export pprTrace and friends from GHC.Prelude. | Andreas Klebinger | 2022-11-03 | 1 | -1/+0
  Introduces GHC.Prelude.Basic which can be used in modules which are a dependency of the ppr code.
* Drop a kludge for binutils<2.17, which is now over 10 years old. | M Farkas-Dyck | 2022-11-01 | 2 | -39/+2
* Minor SDoc-related cleanup | Krzysztof Gogolewski | 2022-10-28 | 1 | -2/+2
    * Rename pprCLabel to pprCLabelStyle, and use the name pprCLabel for a function using CStyle (analogous to pprAsmLabel)
    * Move LabelStyle to the CLabel module, it no longer needs to be in Outputable.
    * Move calls to 'text' right next to literals, to make sure the text/str rule is triggered.
    * Remove FastString/String roundtrip in Tc.Deriv.Generate
    * Introduce showSDocForUser', which abstracts over a pattern in GHCi.UI
* Scrub various partiality involving lists (again). | M Farkas-Dyck | 2022-10-19 | 3 | -21/+27
  Lets us avoid some use of `head` and `tail`, and some panics.
* ncg/aarch64: Fix sub-word sign extension yet again | Ben Gamari | 2022-10-14 | 1 | -12/+20
  In adc7f108141a973b6dcb02a7836eed65d61230e8 we fixed a number of issues to do with sign extension in the AArch64 NCG found by ghc/test-primops>. However, this patch made a critical error, assuming that getSomeReg would allocate a fresh register for the result of its evaluation. However, this is not the case as `getSomeReg (CmmReg r) == r`. Consequently, any mutation of the register returned by `getSomeReg` may have unwanted side-effects on other expressions also mentioning `r`. In the fix listed above, this manifested as the registers containing the operands of binary arithmetic operations being incorrectly sign-extended. This resulted in #22282.

  Sadly, the rather simple structure of the tests generated by `test-primops` meant that this particular case was not exercised. Even more surprisingly, none of our testsuite caught this case.

  Here we fix this by ensuring that intermediate sign extension is performed in a fresh register.

  Fixes #22282.
* CLabel: fix isInfoTableLabel | Cheng Shao | 2022-10-11 | 2 | -2/+2
  isInfoTableLabel does not take Cmm info table into account. This patch is required for data section layout of wasm32 NCG to work.
* Avoid Data.List.group; prefer Data.List.NonEmpty.group | Bodigrim | 2022-09-28 | 1 | -8/+6
  This avoids further partiality, e.g. map head . group is replaced by map NE.head . NE.group, and there are fewer panic calls.
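The replacement pattern can be sketched as follows; `dedupAdjacent` is a made-up name used only for illustration. Because `NE.group` returns non-empty groups, `NE.head` is total and no partial `head` is needed.

```haskell
import qualified Data.List.NonEmpty as NE

-- | Collapse runs of equal adjacent elements, keeping one representative each.
dedupAdjacent :: Eq a => [a] -> [a]
dedupAdjacent = map NE.head . NE.group

main :: IO ()
main = do
  putStrLn (dedupAdjacent "aabccc")
  print (dedupAdjacent [1, 1, 2, 1 :: Int])
```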
* Minor refactor around Outputable | Krzysztof Gogolewski | 2022-09-22 | 1 | -84/+92
    * Replace 'text . show' and 'ppr' with 'int'.
    * Remove Outputable.hs-boot, no longer needed
    * Use pprWithCommas
    * Factor out instructions in AArch64 codegen
* Clean up some. In particular: | M Farkas-Dyck | 2022-09-17 | 2 | -41/+16
    • Delete some dead code, largely under `GHC.Utils`.
    • Clean up a few definitions in `GHC.Utils.(Misc, Monad)`.
    • Clean up `GHC.Types.SrcLoc`.
    • Derive stock `Functor, Foldable, Traversable` for more types.
    • Derive more instances for newtypes.

  Bump haddock submodule.
* Fix typos | Krzysztof Gogolewski | 2022-09-14 | 2 | -2/+2