delta/haskell.git - gitlab.haskell.org: ghc/ghc.git

	Commit message	Author	Age	Files	Lines
*	PPC NCG: Promote integers to word size in C calls	Peter Trommler	2019-01-31	1	-13/+23
\| \| \| \|	Fixes #16222
*	PPC NCG: Rename constructors	Peter Trommler	2019-01-17	1	-28/+29
\| \| \| \| \|	Rename constructors in calling convention data type to reflect the fact that they represent an ELF ABI not only a Linux ABI.
*	Fix tab and improve whitespace	Peter Trommler	2019-01-17	1	-7/+8
\|
*	PPC NCG: Make calling convention more general	Peter Trommler	2019-01-17	1	-6/+5
\| \| \| \| \|	All operating systems except AIX and Darwin follow the ELF specification.
*	PPC NCG: Remove Darwin support	Peter Trommler	2019-01-01	1	-65/+29
\| \| \| \| \| \| \|	Support for Mac OS X on PowerPC was dropped by Apple years ago. We follow suit and remove PowerPC support for Darwin. Fixes #16106.
*	PPC NCG: Simple 64-bit condition code on 32-bit	Peter Trommler	2018-12-30	1	-3/+48
\|
*	PPC NCG: Generate MO_?_QuotRem for subword sizes	Peter Trommler	2018-12-11	1	-21/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Handle IntQuotRemOp and WordQuotRemOp in PPC NCG. Refactor common code with remainder operation. Test Plan: validate (I validated on Linux powerpc64le and x86_64) Reviewers: erikd, hvr, bgamari, simonmar Reviewed By: bgamari Subscribers: rwbarton, carter Differential Revision: https://phabricator.haskell.org/D5323
*	PPC NCG: Implement MachOps for smaller sizes	Peter Trommler	2018-12-11	1	-161/+146
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Generate code for MachOps with smaller than wordsize data. Refactor conversion MachOps. Fixes #15854 Test Plan: validate (I validated on powerpc64le and x86_64 Linux) Reviewers: bgamari, hvr, erikd, simonmar Subscribers: rwbarton, carter GHC Trac Issues: #15854 Differential Revision: https://phabricator.haskell.org/D5300
*	NCG: New code layout algorithm.	Andreas Klebinger	2018-11-17	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch implements a new code layout algorithm. It has been tested for x86 and is disabled on other platforms. Performance varies slightly by CPU/machine but in general seems to be better by around 2%. Nofib shows only small differences of about +/- ~0.5% overall depending on flags/machine. Performance in other benchmarks improved significantly. Other benchmarks include at least the benchmarks of: aeson, vector, megaparsec, attoparsec, containers, text and xeno. While the magnitude of gains differed, three different CPUs were tested, with all getting faster, although to differing degrees. I tested: Sandy Bridge (Xeon), Haswell, Skylake. Library benchmark results summarized: * containers: ~1.5% faster * aeson: ~2% faster * megaparsec: ~2-5% faster * xml library benchmarks: 0.2%-1.1% faster * vector-benchmarks: 1-4% faster * text: 5.5% faster. On average GHC compile times go down, as GHC compiled with the new layout is faster than the overhead introduced by using the new layout algorithm. Things this patch does: * Move code responsible for block layout into its own module. * Move the NcgImpl class into the NCGMonad module. * Extract a control flow graph from the input cmm. * Update this cfg to keep it in sync with changes during asm codegen. This has been tested on x64 but should work on x86. Other platforms still use the old code layout. * Assign weights to the edges in the CFG based on type and limited static analysis, which are then used for block layout. * Once we have the final code layout, eliminate some redundant jumps. In particular turn a sequence of `jne .foo; jmp .bar; foo:` into `je bar; foo: ..`. Test Plan: ci Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott Reviewed By: RyanGlScott Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton GHC Trac Issues: #15124 Differential Revision: https://phabricator.haskell.org/D4726
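The redundant-jump elimination described in the last bullet of that message can be sketched as a small peephole pass. This is only an illustration under stated assumptions: the `Instr` and `Cond` types and the `elimJumps` function are hypothetical toy definitions, not GHC's actual instruction types or the code added by the patch.

```haskell
-- Toy instruction set; these constructors are illustrative, not GHC's.
data Cond = CondEq | CondNe
  deriving (Eq, Show)

data Instr
  = JCond Cond String   -- conditional jump to a label
  | Jmp String          -- unconditional jump to a label
  | Label String        -- block label
  | Other String        -- any other instruction
  deriving (Eq, Show)

-- Invert a branch condition.
invert :: Cond -> Cond
invert CondEq = CondNe
invert CondNe = CondEq

-- Rewrite "jcc L1; jmp L2; L1:" into "j(not cc) L2; L1:".
-- The conditional jump only skipped over the unconditional one,
-- so a single inverted jump to L2 has the same effect.
elimJumps :: [Instr] -> [Instr]
elimJumps (JCond c l1 : Jmp l2 : Label l3 : rest)
  | l1 == l3 = JCond (invert c) l2 : elimJumps (Label l3 : rest)
elimJumps (i : rest) = i : elimJumps rest
elimJumps []         = []

main :: IO ()
main =
  mapM_ print
    (elimJumps [JCond CondNe "foo", Jmp "bar", Label "foo", Other "ret"])
```

The rewrite is only valid once block order is final, which is presumably why the message places it after the layout step.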
*	Fix precision of asinh/acosh/atanh by making them primops	Artem Pelenitsyn	2018-08-21	1	-0/+8
\| \| \| \| \| \| \| \| \| \|	Reviewers: hvr, bgamari, simonmar, jrtc27 Reviewed By: bgamari Subscribers: alpmestan, rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D5034
*	Allow CmmLabelDiffOff with different widths	Simon Marlow	2018-05-16	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This change makes it possible to generate a static 32-bit relative label offset on x86_64. Currently we can only generate word-sized label offsets. This will be used in D4634 to shrink info tables. See D4632 for more details. Test Plan: See D4632 Reviewers: bgamari, niteria, michalt, erikd, jrtc27, osa1 Subscribers: thomie, carter Differential Revision: https://phabricator.haskell.org/D4633
*	Add 'addWordC#' PrimOp	Sebastian Graf	2018-05-05	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is mostly for congruence with 'subWordC#' and '{add,sub}IntC#'. I found 'plusWord2#' while implementing this, which both lacks documentation and has a slightly different specification than 'addWordC#', which means the generic implementation is unnecessarily complex. While I was at it, I also added lacking meta-information on PrimOps and refactored 'subWordC#'s generic implementation to be branchless. Reviewers: bgamari, simonmar, jrtc27, dfeuer Reviewed By: bgamari, dfeuer Subscribers: dfeuer, thomie, carter Differential Revision: https://phabricator.haskell.org/D4592
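As a rough model of what addition and subtraction with carry reporting compute, here is a portable sketch on boxed `Word` values. The names `addWordCarry` and `subWordCarry` are hypothetical; the real primops work on unboxed values and are not reproduced here.

```haskell
import Data.Word (Word)

-- Add two machine words; the Bool is True when the addition wrapped around.
-- A wrapped sum is always smaller than either operand.
addWordCarry :: Word -> Word -> (Word, Bool)
addWordCarry x y = (r, r < x)
  where
    r = x + y

-- Subtract two machine words; the Bool is True when a borrow occurred.
subWordCarry :: Word -> Word -> (Word, Bool)
subWordCarry x y = (x - y, x < y)

main :: IO ()
main = do
  print (addWordCarry 1 2)
  print (addWordCarry maxBound 1)
  print (subWordCarry 0 1)
```

Both functions are a single arithmetic operation plus one comparison, with no branches.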
*	PPC nativeGen: Add support for MO_SS_Conv_W32_W64	Peter Trommler	2018-03-19	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is required by D4363. D4362 has the implementation for i386; this commit adds PowerPC. Test Plan: validate Reviewers: erikd, hvr, simonmar, bgamari Reviewed By: bgamari Subscribers: rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4468
*	Add new mbmi and mbmi2 compiler flags	John Ky	2018-01-21	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds support for the bit deposit and extraction operations provided by the BMI and BMI2 instruction set extensions on modern amd64 machines. Implement x86 code generator for pdep and pext. Properly initialise bmiVersion field. pdep and pext test cases. Fix pattern match for pdep and pext instructions. Fix build of pdep and pext code for 32-bit architectures. Test Plan: Validate Reviewers: austin, simonmar, bgamari, angerman Reviewed By: bgamari Subscribers: trommler, carter, angerman, thomie, rwbarton, newhoggy GHC Trac Issues: #14206 Differential Revision: https://phabricator.haskell.org/D4236
*	Revert "Add new mbmi and mbmi2 compiler flags"	Ben Gamari	2017-11-22	1	-2/+0
\| \| \| \| \| \|	This broke the 32-bit build. This reverts commit f5dc8ccc29429d0a1d011f62b6b430f6ae50290c.
*	Add new mbmi and mbmi2 compiler flags	John Ky	2017-11-15	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds support for the bit deposit and extraction operations provided by the BMI and BMI2 instruction set extensions on modern amd64 machines. Test Plan: Validate Reviewers: austin, simonmar, bgamari, hvr, goldfire, erikd Reviewed By: bgamari Subscribers: goldfire, erikd, trommler, newhoggy, rwbarton, thomie GHC Trac Issues: #14206 Differential Revision: https://phabricator.haskell.org/D4063
*	Fix PPC NCG after blockID patch	Peter Trommler	2017-11-09	1	-2/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit rGHC8b007ab assigns the same label to the first basic block of a proc and to the proc entry point. This violates the PPC 64-bit ELF v. 1.9 and v. 2.0 ABIs and leads to duplicate symbols. This patch fixes duplicate symbols caused by block labels. In commit rGHCd7b8da1 an info table label is generated from a block id. Getting the entry label from that info label leads to an undefined symbol because of a suffix "_entry" that is not present in the block label. To fix that issue, add a new info table label flavour for labels derived from block ids. Converting such a label with toEntryLabel produces the original block label. Fixes #14311 Test Plan: ./validate Reviewers: austin, bgamari, simonmar, erikd, hvr, angerman Reviewed By: bgamari Subscribers: rwbarton, thomie GHC Trac Issues: #14311 Differential Revision: https://phabricator.haskell.org/D4149
*	PPC NCG: Impl branch prediction, atomic ops.	Peter Trommler	2017-11-02	1	-32/+117
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Implement AtomicRMW ops, atomic read, atomic write in PowerPC native code generator. Also implement branch prediction because we need it in atomic ops anyway. This patch improves the issue in #12537 a bit but does not fix it entirely. The fallback operations for atomicread and atomicwrite in libraries/ghc-prim/cbits/atomic.c are incorrect. This patch avoids those functions by implementing the operations directly in the native code generator. This is also what the x86/amd64 NCG and the LLVM backend do. Test Plan: validate on AIX and PowerPC (32-bit) Linux Reviewers: erikd, hvr, austin, bgamari, simonmar Reviewed By: hvr, bgamari Subscribers: rwbarton, thomie GHC Trac Issues: #12537 Differential Revision: https://phabricator.haskell.org/D3984
*	Turn `compareByteArrays#` out-of-line primop into inline primop	alexbiehl	2017-10-29	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Depends on D4090 Reviewers: austin, bgamari, erikd, simonmar, alexbiehl Reviewed By: bgamari Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D4091
*	A bunch of typofixes	Gabor Greif	2017-09-26	1	-1/+1
\|
*	compiler: introduce custom "GhcPrelude" Prelude	Herbert Valerio Riedel	2017-09-19	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This switches the compiler/ component to get compiled with -XNoImplicitPrelude and an `import GhcPrelude` is inserted in all modules. This is motivated by the upcoming "Prelude" re-export of `Semigroup((<>))` which would cause lots of name clashes in every module which also imports `Outputable`. Reviewers: austin, goldfire, bgamari, alanz, simonmar Reviewed By: bgamari Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari Differential Revision: https://phabricator.haskell.org/D3989
*	nativeGen: Consistently use blockLbl to generate CLabels from BlockIds	Ben Gamari	2017-09-19	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes #14221, where the NCG and the DWARF code were apparently giving two different names to the same block. Test Plan: Validate with DWARF support enabled. Reviewers: simonmar, austin Subscribers: rwbarton, thomie GHC Trac Issues: #14221 Differential Revision: https://phabricator.haskell.org/D3977
*	Add support for producing position-independent executables	Ben Gamari	2017-08-22	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously due to #12759 we disabled PIE support entirely. However, this breaks the user's ability to produce PIEs. Add an explicit flag, -fPIE, allowing the user to build PIEs. Test Plan: Validate Reviewers: rwbarton, austin, simonmar Subscribers: trommler, simonmar, trofi, jrtc27, thomie GHC Trac Issues: #12759, #13702 Differential Revision: https://phabricator.haskell.org/D3589
*	Hoopl: remove dependency on Hoopl package	Michal Terepeta	2017-06-23	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This copies the subset of Hoopl's functionality needed by GHC to `cmm/Hoopl` and removes the dependency on the Hoopl package. The main motivation for this change is the confusing/noisy interface between GHC and Hoopl: - Hoopl has `Label` which is GHC's `BlockId` but different than GHC's `CLabel` - Hoopl has `Unique` which is different than GHC's `Unique` - Hoopl has `Unique{Map,Set}` which are different than GHC's `Uniq{FM,Set}` - GHC has its own specialized copy of `Dataflow`, so `cmm/Hoopl` is needed just to filter the exposed functions (filter out some of the Hoopl's and add the GHC ones) With this change, we'll be able to simplify this significantly. It'll also be much easier to do invasive changes (Hoopl is a public package on Hackage with users that depend on the current behavior) This should introduce no changes in functionality - it merely copies the relevant code. Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: ./validate Reviewers: austin, bgamari, simonmar Reviewed By: bgamari, simonmar Subscribers: simonpj, kavon, rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3616
*	PPC NCG: Lower MO_*_Fabs as PowerPC fabs instruction	Peter Trommler	2017-05-01	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In Phab:D3265 we introduced MO_F32_Fabs and MO_F64_Fabs. This patch improves code generation by generating PowerPC fabs instructions. Test Plan: run numeric/should_run/numrun015 or validate Reviewers: austin, bgamari, hvr, simonmar, erikd Reviewed By: erikd Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3512
*	PPC NCG: Implement callish prim ops	Peter Trommler	2017-04-25	1	-64/+400
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Provide PowerPC optimised implementations of callish prim ops. MO_?_QuotRem: The generic implementation of quotient remainder prim ops uses a division and a remainder operation. There is no remainder on PowerPC and so we need to implement remainder "by hand" which results in a duplication of the divide operation when using the generic code. Avoid this duplication by implementing the prim op in the native code generator. MO_U_Mul2: Use PowerPC's instructions for long multiplication. Addition and subtraction: Use PowerPC add/subtract with carry/overflow instructions. MO_Clz and MO_Ctz: Use PowerPC's CNTLZ instruction and implement count trailing zeros using count leading zeros. MO_QuotRem2: Implement an algorithm given by Henry Warren in "Hacker's Delight" using PowerPC divide instruction. TODO: Use long division instructions when available (POWER7 and later). Test Plan: validate on AIX and 32-bit Linux Reviewers: simonmar, erikd, hvr, austin, bgamari Reviewed By: erikd, hvr, bgamari Subscribers: trofi, kgardas, thomie Differential Revision: https://phabricator.haskell.org/D2973
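Two of the ideas in that message can be modelled in plain Haskell: getting the remainder from a single division by multiplying back and subtracting, and counting trailing zeros using only a count-leading-zeros operation. This is a sketch under assumptions: `quotRemOneDiv` and `ctzViaClz` are hypothetical names, and the bit identity used for trailing zeros is one well-known formulation, not necessarily the instruction sequence the PPC NCG emits.

```haskell
import Data.Bits (complement, countLeadingZeros, finiteBitSize, (.&.))
import Data.Word (Word64)

-- Quotient and remainder from one division: r = n - (n `quot` d) * d.
quotRemOneDiv :: Int -> Int -> (Int, Int)
quotRemOneDiv n d = (q, n - q * d)
  where
    q = n `quot` d

-- Count trailing zeros using only count-leading-zeros.
-- (complement x .&. (x - 1)) has ones exactly in the positions below the
-- lowest set bit of x, so its leading-zero count gives the answer.
ctzViaClz :: Word64 -> Int
ctzViaClz x = finiteBitSize x - countLeadingZeros (complement x .&. (x - 1))

main :: IO ()
main = do
  print (quotRemOneDiv 17 5)
  print (quotRemOneDiv (-17) 5)
  print (map ctzViaClz [1, 8, 0])
```

For a zero input the mask is all ones, so `ctzViaClz` returns the full word size.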
*	Generate better fp abs for X86 and llvm with default cmm otherwise	Dominic Steinitz	2017-03-07	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently we have this in libraries/base/GHC/Float.hs: ``` abs x \| x == 0 = 0 -- handles (-0.0) \| x > 0 = x \| otherwise = negateFloat x ``` But 3-4 years ago it was noted that this was inefficient: https://mail.haskell.org/pipermail/libraries/2013-April/019690.html We can generate better code for X86 and llvm and for others generate some custom cmm code which is similar to what the compiler generates now. Reviewers: austin, simonmar, hvr, bgamari Reviewed By: bgamari Subscribers: dfeuer, thomie Differential Revision: https://phabricator.haskell.org/D3265
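To make the inefficiency concrete, the guard-based definition quoted above can be compared with a branch-free version that clears the IEEE 754 sign bit. This is a sketch only: `absGuards` and `absMask` are hypothetical names, the example uses `Double` with `negate` rather than `negateFloat`, and the masking approach is one common technique rather than a description of what the patch emits.

```haskell
import Data.Bits ((.&.))
import GHC.Float (castDoubleToWord64, castWord64ToDouble)

-- Guard-based definition in the style quoted in the commit message.
absGuards :: Double -> Double
absGuards x
  | x == 0    = 0          -- handles (-0.0)
  | x > 0     = x
  | otherwise = negate x

-- Branch-free alternative: clear the sign bit of the 64-bit representation.
absMask :: Double -> Double
absMask x = castWord64ToDouble (castDoubleToWord64 x .&. 0x7FFFFFFFFFFFFFFF)

main :: IO ()
main = do
  print (map absGuards [-1.5, -0.0, 2.0])
  print (map absMask [-1.5, -0.0, 2.0])
```

The guard version needs up to two comparisons per call, while the mask version is a single bitwise operation and also maps negative zero to positive zero.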
*	Typos in comments	Gabor Greif	2016-10-17	1	-1/+1
\|
*	PPC/CodeGen: fix lwa instruction generation	Peter Trommler	2016-10-01	1	-4/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Opcode lwa is a 64-bit opcode and allows a DS-form only. This patch generates lwa opcodes only when the offset is a multiple of 4. Fixes #12621 Test Plan: validate Reviewers: erikd, hvr, simonmar, austin, bgamari Reviewed By: bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2547 GHC Trac Issues: #12621
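The DS-form restriction can be expressed as a simple predicate on the displacement. The sketch below assumes a signed 16-bit displacement whose low two bits must be zero; the `Load` type, `fitsDSForm` and `selectLoad` are hypothetical names, and the fallback to an indexed load is an assumption made for illustration, not a statement of what the patch generates.

```haskell
import Data.Bits ((.&.))

-- Hypothetical, simplified choices for a sign-extending 32-bit load.
data Load
  = LwaDS Int        -- lwa with a DS-form displacement
  | LoadIndexed Int  -- assumed fallback: offset placed in a register first
  deriving (Eq, Show)

-- A DS-form displacement must be a multiple of 4 and fit in 16 signed bits.
fitsDSForm :: Int -> Bool
fitsDSForm off = off .&. 3 == 0 && off >= -32768 && off <= 32767

-- Pick the DS-form instruction only when the offset is encodable.
selectLoad :: Int -> Load
selectLoad off
  | fitsDSForm off = LwaDS off
  | otherwise      = LoadIndexed off

main :: IO ()
main = mapM_ (print . selectLoad) [0, 8, 6, -4, 40000]
```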
*	PPC NCG: Implement minimal stack frame header.	Peter Trommler	2016-08-31	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	According to the ABI specifications a minimal stack frame consists of a header and a minimum size parameter save area. We reserve the minimal size for each ABI. On PowerPC 64-bit Linux and AIX the parameter save area can accommodate up to eight parameters. So calls with eight parameters and fewer can be done without allocating a new stack frame and deallocating that stack frame after the call. On AIX one additional spill slot is available on the stack. Code size for all nofib benchmarks is 0.3 % smaller on powerpc64. Test Plan: validate on AIX Reviewers: hvr!, erikd, austin, simonmar, bgamari Reviewed By: bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2445
*	PPC NCG: Fix and refactor TOC handling.	Peter Trommler	2016-06-19	1	-28/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In a call to a fixed function the TOC does not need to be saved. The linker handles TOC saving. Refactor TOC handling by folding the two functions toc_before and toc_after into the code generating the call sequence. This saves repeating the case distinction in those two functions. Test Plan: validate on PowerPC 32-bit Linux and AIX Reviewers: hvr, simonmar, austin, erikd, bgamari Reviewed By: bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2328
*	PPC NCG: Fix float parameter passing on 64-bit.	Peter Trommler	2016-06-19	1	-6/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On Linux 64-bit PowerPC the first 13 floating point parameters are passed in registers. We only passed the first 8 floating point params. The alignment of a floating point single precision value in ELF v1.9 is the second word of a doubleword. For ELF v2 we support only little endian and the least significant word of a doubleword is the first word, so no special handling is required. Add a regression test. Test Plan: validate on powerpc Linux and AIX Reviewers: erikd, hvr, austin, simonmar, bgamari Reviewed By: simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2327 GHC Trac Issues: #12134
*	PPC NCG: Improve pointer de-tagging code	Peter Trommler	2016-04-29	1	-5/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Generate a clrr[wd]i instruction to clear the tag bits in a pointer. This saves one instruction and one temporary register. Optimize signed comparison with zero after andi. operation. This saves one instruction when comparing a pointer tag with zero. This reduces code size by 0.6 % in all nofib benchmarks. Test Plan: validate on AIX and 32-bit Linux Reviewed By: erikd, hvr Differential Revision: https://phabricator.haskell.org/D2093
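The effect of clearing the rightmost bits of a tagged pointer can be modelled on plain 64-bit words. This is an illustrative sketch: `clearRight`, `untag` and `getTag` are hypothetical names, and the tag width of 3 bits is an assumption corresponding to 64-bit pointers.

```haskell
import Data.Bits (complement, shiftL, (.&.))
import Data.Word (Word64)

-- Clear the rightmost n bits of a word, which is what a
-- "clear right immediate" instruction does to a register.
clearRight :: Int -> Word64 -> Word64
clearRight n w = w .&. complement ((1 `shiftL` n) - 1)

-- Assumed tag width for 64-bit pointers.
tagBits :: Int
tagBits = 3

-- Strip the tag from a pointer.
untag :: Word64 -> Word64
untag = clearRight tagBits

-- Extract the tag from a pointer.
getTag :: Word64 -> Word64
getTag w = w .&. ((1 `shiftL` tagBits) - 1)

main :: IO ()
main = do
  print (untag 0x1003)
  print (getTag 0x1003)
```

A single clear instruction avoids loading the mask into a temporary register first, which matches the saving the message reports.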
*	Remove code-duplication in the PPC NCG	Herbert Valerio Riedel	2016-03-24	1	-26/+19
\| \| \| \| \| \|	Reviewed By: bgamari, trommler Differential Revision: https://phabricator.haskell.org/D2020
*	Add NCG support for AIX/ppc32	Herbert Valerio Riedel	2016-03-24	1	-5/+99
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This extends the previous work to revive the unregisterised GHC build for AIX/ppc32. Strictly speaking, AIX runs on POWER4 (and later) hardware, but the PPC32 instructions implemented in the PPC NCG represent a compatible subset of the POWER4 ISA. IBM AIX follows the PowerOpen ABI (and shares many similarities with the Linux PPC64 ELF V1 NCG backend) but uses the rather limited XCOFF format (compared to ELF). This doesn't support dynamic libraries yet. A major limiting factor is that the AIX assembler does not support the `@ha`/`@l` relocation types nor the ha16()/lo16() functions Darwin's assembler supports. Therefore we need to avoid emitting those. In case of numeric literals we simply compute the functions ourselves, while for labels we have to use local TOCs and hope everything fits into a 16-bit offset (for ppc32 this gives us at most 16384 entries per TOC section, which is enough to compile GHC). Another issue is that XCOFF doesn't seem to have a relocation type for label-differences, and therefore the label-differences placed into tables-next-to-code can't be relocated, but the linker may rearrange different sections, so we need to place all read-only sections into the same `.text[PR]` section to work around this. Finally, the PowerOpen ABI distinguishes between function-descriptors and actual entry-point addresses. For AIX we need to be specific when emitting assembler code whether we want the address of the function descriptor (`printf`) or for the entry-point (`.printf`). So we let the asm pretty-printer prefix a dot to all emitted subroutine calls (i.e. `BL`) on AIX only. For now, STG routines' entry-point labels are not prefixed by a label and don't have any associated function-descriptor. Reviewers: austin, trommler, erikd, bgamari Reviewed By: trommler, erikd, bgamari Differential Revision: https://phabricator.haskell.org/D2019
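Computing the high-adjusted and low halves of a numeric literal by hand can be sketched as follows. The function names mirror the assembler functions mentioned above, but the definitions are this example's own and are not taken from the patch; `rebuild` is a hypothetical helper that models loading the high half shifted left by 16 and then adding the sign-extended low half.

```haskell
import Data.Bits (shiftL, shiftR, (.&.))
import Data.Int (Int16)
import Data.Word (Word16, Word32)

-- Low 16 bits of a 32-bit literal.
lo16 :: Word32 -> Word16
lo16 x = fromIntegral (x .&. 0xFFFF)

-- High 16 bits, adjusted by one when the low half will be sign-extended
-- as a negative number by the following add-immediate.
ha16 :: Word32 -> Word16
ha16 x = fromIntegral ((x + 0x8000) `shiftR` 16)

-- Recombine the halves the way a "load shifted, then add immediate"
-- instruction pair would, with wrap-around 32-bit arithmetic.
rebuild :: Word32 -> Word32
rebuild x = (fromIntegral (ha16 x) `shiftL` 16) + signExt (lo16 x)
  where
    signExt :: Word16 -> Word32
    signExt w = fromIntegral (fromIntegral w :: Int16)

main :: IO ()
main =
  print (all (\x -> rebuild x == x) [0, 0x7FFF, 0x8000, 0x12348000, 0xFFFFFFFF])
```

The 16384 figure in the message is consistent with a 16-bit offset range of 65536 bytes divided into 4-byte entries.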
*	Implement function-sections for Haskell code, #8405	Simon Brenner	2015-11-12	1	-6/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds a flag -split-sections that does similar things to -split-objs, but using sections in single object files instead of relying on the Satanic Splitter and other abominations. This is very similar to the GCC flags -ffunction-sections and -fdata-sections. The --gc-sections linker flag, which allows unused sections to actually be removed, is added to all link commands (if the linker supports it) so that space savings from having base compiled with sections can be realized. Supported both in LLVM and the native code-gen, in theory for all architectures, but really tested on x86 only. In the GHC build, a new SplitSections variable enables -split-sections for relevant parts of the build. Test Plan: validate with both settings of SplitSections Reviewers: dterei, Phyx, austin, simonmar, thomie, bgamari Reviewed By: simonmar, thomie, bgamari Subscribers: hsyl20, erikd, kgardas, thomie Differential Revision: https://phabricator.haskell.org/D1242 GHC Trac Issues: #8405
*	Add subWordC# on x86ish	Nikita Karetnikov	2015-10-31	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds a subWordC# primop which implements subtraction with overflow reporting. Reviewers: tibbe, goldfire, rwbarton, bgamari, austin, hvr Reviewed By: bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1334 GHC Trac Issues: #10962
*	Annotate CmmBranch with an optional likely target	Simon Marlow	2015-09-23	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This allows the code generator to give hints to later code generation steps about which branch is most likely to be taken. Right now it is only taken into account in one place: a special case in CmmContFlowOpt that swapped branches over to maximise the chance of fallthrough, which is now disabled when there is a likelihood setting. Test Plan: validate Reviewers: austin, simonpj, bgamari, ezyang, tibbe Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1273
*	Fix todo in compiler/nativeGen: Rename Size to Format	markus	2015-07-07	1	-94/+94
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit renames the Size module in the native code generator to Format, as proposed by a todo, as well as adjusting parameter names in other modules that use it. Test Plan: validate Reviewers: austin, simonmar, bgamari Reviewed By: simonmar, bgamari Subscribers: bgamari, simonmar, thomie Projects: #ghc Differential Revision: https://phabricator.haskell.org/D865
*	Implement PowerPC 64-bit native code backend for Linux	Peter Trommler	2015-07-03	1	-154/+519
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Extend the PowerPC 32-bit native code generator for "64-bit PowerPC ELF Application Binary Interface Supplement 1.9" by Ian Lance Taylor and "Power Architecture 64-Bit ELF V2 ABI Specification -- OpenPOWER ABI for Linux Supplement" by IBM. The latter ABI is mainly used on POWER7/7+ and POWER8 Linux systems running in little-endian mode. The code generator supports both static and dynamic linking. PowerPC 64-bit code for ELF ABI 1.9 and 2 is mostly position independent anyway, and thus so is all the code emitted by the code generator. In other words, -fPIC does not make a difference. rts/stg/SMP.h support is implemented. Following the spirit of the introductory comment in PPC/CodeGen.hs, the rest of the code is a straightforward extension of the 32-bit implementation. Limitations: * Code is generated only in the medium code model, which is also gcc's default * Local symbols are not accessed directly, which seems to also be the case for 32-bit * LLVM does not work, but this does not work on 32-bit either * Must use the system runtime linker in GHCi, because the GHC linker for "static" object files (rts/Linker.c) for PPC 64-bit is not implemented. The system runtime (dynamic) linker works. * The handling of the system stack (register 1) is not ELF-compliant so stack traces break. Instead of allocating a new stack frame, spill code should use the "official" spill area in the current stack frame and deallocation code should restore the back chain * DWARF support is missing Fixes #9863 Test Plan: validate (on powerpc, too) Reviewers: simonmar, trofi, erikd, austin Reviewed By: trofi Subscribers: bgamari, arnons1, kgardas, thomie Differential Revision: https://phabricator.haskell.org/D629 GHC Trac Issues: #9863
*	Encode alignment in MO_Memcpy and friends	Ben Gamari	2015-06-16	1	-15/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Alignment needs to be a compile-time constant. Previously the code generators had to jump through hoops to ensure this was the case as the alignment was passed as a CmmExpr in the arguments list. Now we take care of this up front. This fixes #8131. Authored-by: Reid Barton <rwbarton@gmail.com> Dusted-off-by: Ben Gamari <ben@smart-cactus.org> Tests for T8131 Test Plan: Validate Reviewers: rwbarton, austin Reviewed By: rwbarton, austin Subscribers: bgamari, carter, thomie Differential Revision: https://phabricator.haskell.org/D624 GHC Trac Issues: #8131
*	Refactor the story around switches (#10137)	Joachim Breitner	2015-03-30	1	-6/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This re-implements the code generation for case expressions at the Stg → Cmm level, both for data type cases as well as for integral literal cases. (Cases on float are still treated as before). The goal is to allow for fancier strategies in implementing them, for a cleaner separation of the strategy from the gritty details of Cmm, and to run this later than the Common Block Optimization, allowing for one way to attack #10124. The new module CmmSwitch contains a number of notes explaining these changes. For example, it creates larger consecutive jump tables than the previous code, if possible. nofib shows little significant overall improvement of runtime. The rather large wobbling comes from changes in the code block order (see #8082, not much we can do about it). But the decrease in code size alone makes this worthwhile. ``` Program Size Allocs Runtime Elapsed TotalMem Min -1.8% 0.0% -6.1% -6.1% -2.9% Max -0.7% +0.0% +5.6% +5.7% +7.8% Geometric Mean -1.4% -0.0% -0.3% -0.3% +0.0% ``` Compilation time increases slightly: ``` -1 s.d. ----- -2.0% +1 s.d. ----- +2.5% Average ----- +0.3% ``` The test case T783 regresses a lot, but it is the only one exhibiting any regression. The cause is the changed order of branches in an if-then-else tree, which makes the Hoopl data flow analysis traverse the blocks in a suboptimal order. Reverting that gets rid of this regression, but has a consistent, if only very small (+0.2%), negative effect on runtime. So I conclude that this test is an extreme outlier and no reason to change the code. Differential Revision: https://phabricator.haskell.org/D720
*	Replace .lhs with .hs in compiler comments	Yuri de Wit	2015-02-09	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: It looks like during .lhs -> .hs switch the comments were not updated. So doing exactly that. Reviewers: austin, jstolarek, hvr, goldfire Reviewed By: austin, jstolarek Subscribers: thomie, goldfire Differential Revision: https://phabricator.haskell.org/D621 GHC Trac Issues: #9986
*	Add unwind information to Cmm	Peter Wortmann	2014-12-16	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Unwind information allows the debugger to discover more information about a program state, by allowing it to "reconstruct" other states of the program. In practice, this means that we explain to the debugger how to unravel stack frames, which comes down mostly to explaining how to find their Sp and Ip register values. * We declare yet another new constructor for CmmNode - and this time there's actually little choice, as unwind information can and will change mid-block. We don't actually make use of these capabilities, and back-end support would be tricky (generate new labels?), but it feels like the right way to do it. * Even though we only use it for Sp so far, we allow CmmUnwind to specify unwind information for any register. This is pretty cheap and could come in useful in future. * We allow full CmmExpr expressions for specifying unwind values. The advantage here is that we don't have to make up new syntax, and can e.g. use the WDS macro directly. On the other hand, the back-end will now have to simplify the expression until it can sensibly be converted into DWARF byte code - a process which might fail, yielding NCG panics. On the other hand, when you're writing Cmm by hand you really ought to know what you're doing. (From Phabricator D169)
*	Tick scopes	Peter Wortmann	2014-12-16	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch solves the scoping problem of CmmTick nodes: If we just put CmmTicks into blocks we have no idea what exactly they are meant to cover. Here we introduce tick scopes, which allow us to create sub-scopes and merged scopes easily. Notes: * Given that the code often passes Cmm around "head-less", we have to make sure that its intended scope does not get lost. To keep the amount of passing-around to a minimum we define a CmmAGraphScoped type synonym here that just bundles the scope with a portion of Cmm to be assembled later. * We introduce new scopes at somewhat random places, aligning with getCode calls. This works surprisingly well, but we might have to add new scopes into the mix later on if we find things to be too coarse-grained. (From Phabricator D169)
*	Source notes (Cmm support)	Peter Wortmann	2014-12-16	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds CmmTick nodes to Cmm code. This is relatively straight-forward, but also not very useful, as many blocks will simply end up with no annotations whatsoever. Notes: * We use this design over, say, putting ticks into the entry node of all blocks, as it seems to work better alongside existing optimisations. Now granted, the reason for this is that currently GHC's main Cmm optimisations seem to mainly reorganize and merge code, so this might change in the future. * We have the Cmm parser generate a few source notes as well. This is relatively easy to do - worst part is that it complicates the CmmParse implementation a bit. (From Phabricator D169)
*	powerpc: fix and enable shared libraries by default on linux	Sergei Trofimovich	2014-12-14	1	-3/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: And fix things all the way down to it. Namely: - remove 'r30' from free registers, it's an .LCTOC1 register for gcc. generated .plt stubs expect it to be initialised. - fix PicBase computation, which originally forgot to use 'tmp' reg in 'initializePicBase_ppc.fetchPC' - mark 'ForeignTarget's as implicitly using 'PicBase' register (see comment for details) - add 64-bit MO_Sub and test on alloclimit3/4 regtests - fix dynamic label offsets to match with .LCTOC1 offset Signed-off-by: Sergei Trofimovich <siarheit@google.com> Test Plan: validate passes equal amount of vanilla/dyn tests Reviewers: simonmar, erikd, austin Reviewed By: erikd, austin Subscribers: carter, thomie Differential Revision: https://phabricator.haskell.org/D560 GHC Trac Issues: #8024, #9831
*	Add MO_AddIntC, MO_SubIntC MachOps and implement in X86 backend	Reid Barton	2014-08-23	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: These MachOps are used by addIntC# and subIntC#, which in turn are used in integer-gmp when adding or subtracting small Integers. The following benchmark shows a ~6% speedup after this commit on x86_64 (building GHC with BuildFlavour=perf). {-# LANGUAGE MagicHash #-} import GHC.Exts import Criterion.Main count :: Int -> Integer count (I# n#) = go n# 0 where go :: Int# -> Integer -> Integer go 0# acc = acc go n# acc = go (n# -# 1#) $! acc + 1 main = defaultMain [bgroup "count" [bench "100" $ whnf count 100]] Differential Revision: https://phabricator.haskell.org/D140
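What signed addition and subtraction with overflow reporting compute can be modelled on boxed `Int` values. This is a sketch under assumptions: `addIntOverflow` and `subIntOverflow` are hypothetical names, the sign-bit test is a standard bit trick rather than the code the X86 backend emits, and it relies on GHC's `Int` arithmetic wrapping on overflow.

```haskell
import Data.Bits (xor, (.&.))

-- Signed addition; the Bool is True when the result overflowed.
-- Overflow happened if the result's sign differs from both operands' signs.
addIntOverflow :: Int -> Int -> (Int, Bool)
addIntOverflow x y = (r, ((x `xor` r) .&. (y `xor` r)) < 0)
  where
    r = x + y

-- Signed subtraction; the Bool is True when the result overflowed.
-- Overflow happened if the operands' signs differ and the result's sign
-- differs from the first operand's.
subIntOverflow :: Int -> Int -> (Int, Bool)
subIntOverflow x y = (r, ((x `xor` y) .&. (x `xor` r)) < 0)
  where
    r = x - y

main :: IO ()
main = do
  print (addIntOverflow 1 2)
  print (addIntOverflow maxBound 1)
  print (subIntOverflow minBound 1)
```

When the overflow flag is False the small-integer result can be used directly, which is the fast path the benchmark in the message exercises.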
*	Implement new CLZ and CTZ primops (re #9340)	Herbert Valerio Riedel	2014-08-14	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This implements the new primops clz#, clz32#, clz64#, ctz#, ctz32#, ctz64# which provide efficient implementations of the popular count-leading-zero and count-trailing-zero respectively (see testcase for a pure Haskell reference implementation). On x86, NCG as well as LLVM generates code based on the BSF/BSR instructions (which need extra logic to make the 0-case well-defined). Test Plan: validate and successful tests on i686 and amd64 Reviewers: rwbarton, simonmar, ezyang, austin Subscribers: simonmar, relrod, ezyang, carter Differential Revision: https://phabricator.haskell.org/D144 GHC Trac Issues: #9340
*	remove SPARC related comment in PPC code generator	Peter Trommler	2014-07-10	1	-9/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: PowerPC does not do delay slots and there is also no requirement to put extra instructions between FP operations and branches. Test Plan: None. Comment change only. Reviewers: austin, simonmar Reviewed By: austin, simonmar Subscribers: simonmar, relrod, carter Differential Revision: https://phabricator.haskell.org/D40