summaryrefslogtreecommitdiff
path: root/compiler/nativeGen/X86/Ppr.hs
Commit message (Collapse)AuthorAgeFilesLines
* Modules: CmmToAsm (#13009)Sylvain Henry2020-02-241-1014/+0
|
* Modules: Driver (#13009)Sylvain Henry2020-02-211-1/+1
| | | | submodule updates: nofib, haddock
* Disentangle DynFlags and SDocSylvain Henry2020-02-201-7/+12
| | | | | | | | | | | | | Remove several uses of `sdocWithDynFlags`. The remaining ones are mostly CodeGen related (e.g. depend on target platform constants) and will be fixed separately. Metric Decrease: T12425 T9961 WWRec T1969 T14683
* Do CafInfo/SRT analysis in CmmÖmer Sinan Ağacan2020-01-311-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes all CafInfo predictions and various hacks to preserve predicted CafInfos from the compiler and assigns final CafInfos to interface Ids after code generation. SRT analysis is extended to support static data, and Cmm generator is modified to allow generating static_link fields after SRT analysis. This also fixes `-fcatch-bottoms`, which introduces error calls in case expressions in CorePrep, which runs *after* CoreTidy (which is where we decide on CafInfos) and turns previously non-CAFFY things into CAFFY. Fixes #17648 Fixes #9718 Evaluation ========== NoFib ----- Boot with: `make boot mode=fast` Run: `make mode=fast EXTRA_RUNTEST_OPTS="-cachegrind" NoFibRuns=1` -------------------------------------------------------------------------------- Program Size Allocs Instrs Reads Writes -------------------------------------------------------------------------------- CS -0.0% 0.0% -0.0% -0.0% -0.0% CSD -0.0% 0.0% -0.0% -0.0% -0.0% FS -0.0% 0.0% -0.0% -0.0% -0.0% S -0.0% 0.0% -0.0% -0.0% -0.0% VS -0.0% 0.0% -0.0% -0.0% -0.0% VSD -0.0% 0.0% -0.0% -0.0% -0.5% VSM -0.0% 0.0% -0.0% -0.0% -0.0% anna -0.1% 0.0% -0.0% -0.0% -0.0% ansi -0.0% 0.0% -0.0% -0.0% -0.0% atom -0.0% 0.0% -0.0% -0.0% -0.0% awards -0.0% 0.0% -0.0% -0.0% -0.0% banner -0.0% 0.0% -0.0% -0.0% -0.0% bernouilli -0.0% 0.0% -0.0% -0.0% -0.0% binary-trees -0.0% 0.0% -0.0% -0.0% -0.0% boyer -0.0% 0.0% -0.0% -0.0% -0.0% boyer2 -0.0% 0.0% -0.0% -0.0% -0.0% bspt -0.0% 0.0% -0.0% -0.0% -0.0% cacheprof -0.0% 0.0% -0.0% -0.0% -0.0% calendar -0.0% 0.0% -0.0% -0.0% -0.0% cichelli -0.0% 0.0% -0.0% -0.0% -0.0% circsim -0.0% 0.0% -0.0% -0.0% -0.0% clausify -0.0% 0.0% -0.0% -0.0% -0.0% comp_lab_zift -0.0% 0.0% -0.0% -0.0% -0.0% compress -0.0% 0.0% -0.0% -0.0% -0.0% compress2 -0.0% 0.0% -0.0% -0.0% -0.0% constraints -0.0% 0.0% -0.0% -0.0% -0.0% cryptarithm1 -0.0% 0.0% -0.0% -0.0% -0.0% cryptarithm2 -0.0% 0.0% -0.0% -0.0% -0.0% cse -0.0% 0.0% -0.0% -0.0% -0.0% digits-of-e1 -0.0% 0.0% -0.0% -0.0% -0.0% digits-of-e2 -0.0% 0.0% -0.0% -0.0% -0.0% dom-lt -0.0% 0.0% -0.0% -0.0% -0.0% eliza -0.0% 0.0% -0.0% -0.0% -0.0% event -0.0% 0.0% -0.0% -0.0% -0.0% exact-reals -0.0% 0.0% -0.0% -0.0% -0.0% exp3_8 -0.0% 0.0% -0.0% -0.0% -0.0% expert -0.0% 0.0% -0.0% -0.0% -0.0% fannkuch-redux -0.0% 0.0% -0.0% -0.0% -0.0% fasta -0.0% 0.0% -0.0% -0.0% -0.0% fem -0.0% 0.0% -0.0% -0.0% -0.0% fft -0.0% 0.0% -0.0% -0.0% -0.0% fft2 -0.0% 0.0% -0.0% -0.0% -0.0% fibheaps -0.0% 0.0% -0.0% -0.0% -0.0% fish -0.0% 0.0% -0.0% -0.0% -0.0% fluid -0.1% 0.0% -0.0% -0.0% -0.0% fulsom -0.0% 0.0% -0.0% -0.0% -0.0% gamteb -0.0% 0.0% -0.0% -0.0% -0.0% gcd -0.0% 0.0% -0.0% -0.0% -0.0% gen_regexps -0.0% 0.0% -0.0% -0.0% -0.0% genfft -0.0% 0.0% -0.0% -0.0% -0.0% gg -0.0% 0.0% -0.0% -0.0% -0.0% grep -0.0% 0.0% -0.0% -0.0% -0.0% hidden -0.0% 0.0% -0.0% -0.0% -0.0% hpg -0.1% 0.0% -0.0% -0.0% -0.0% ida -0.0% 0.0% -0.0% -0.0% -0.0% infer -0.0% 0.0% -0.0% -0.0% -0.0% integer -0.0% 0.0% -0.0% -0.0% -0.0% integrate -0.0% 0.0% -0.0% -0.0% -0.0% k-nucleotide -0.0% 0.0% -0.0% -0.0% -0.0% kahan -0.0% 0.0% -0.0% -0.0% -0.0% knights -0.0% 0.0% -0.0% -0.0% -0.0% lambda -0.0% 0.0% -0.0% -0.0% -0.0% last-piece -0.0% 0.0% -0.0% -0.0% -0.0% lcss -0.0% 0.0% -0.0% -0.0% -0.0% life -0.0% 0.0% -0.0% -0.0% -0.0% lift -0.0% 0.0% -0.0% -0.0% -0.0% linear -0.1% 0.0% -0.0% -0.0% -0.0% listcompr -0.0% 0.0% -0.0% -0.0% -0.0% listcopy -0.0% 0.0% -0.0% -0.0% -0.0% maillist -0.0% 0.0% -0.0% -0.0% -0.0% mandel -0.0% 0.0% -0.0% -0.0% -0.0% mandel2 -0.0% 0.0% -0.0% -0.0% -0.0% mate -0.0% 0.0% -0.0% -0.0% -0.0% minimax -0.0% 0.0% -0.0% -0.0% -0.0% mkhprog -0.0% 0.0% -0.0% -0.0% -0.0% multiplier -0.0% 0.0% -0.0% -0.0% -0.0% n-body -0.0% 0.0% -0.0% -0.0% -0.0% nucleic2 -0.0% 0.0% -0.0% -0.0% -0.0% para -0.0% 0.0% -0.0% -0.0% -0.0% paraffins -0.0% 0.0% -0.0% -0.0% -0.0% parser -0.1% 0.0% -0.0% -0.0% -0.0% parstof -0.1% 0.0% -0.0% -0.0% -0.0% pic -0.0% 0.0% -0.0% -0.0% -0.0% pidigits -0.0% 0.0% -0.0% -0.0% -0.0% power -0.0% 0.0% -0.0% -0.0% -0.0% pretty -0.0% 0.0% -0.3% -0.4% -0.4% primes -0.0% 0.0% -0.0% -0.0% -0.0% primetest -0.0% 0.0% -0.0% -0.0% -0.0% prolog -0.0% 0.0% -0.0% -0.0% -0.0% puzzle -0.0% 0.0% -0.0% -0.0% -0.0% queens -0.0% 0.0% -0.0% -0.0% -0.0% reptile -0.0% 0.0% -0.0% -0.0% -0.0% reverse-complem -0.0% 0.0% -0.0% -0.0% -0.0% rewrite -0.0% 0.0% -0.0% -0.0% -0.0% rfib -0.0% 0.0% -0.0% -0.0% -0.0% rsa -0.0% 0.0% -0.0% -0.0% -0.0% scc -0.0% 0.0% -0.3% -0.5% -0.4% sched -0.0% 0.0% -0.0% -0.0% -0.0% scs -0.0% 0.0% -0.0% -0.0% -0.0% simple -0.1% 0.0% -0.0% -0.0% -0.0% solid -0.0% 0.0% -0.0% -0.0% -0.0% sorting -0.0% 0.0% -0.0% -0.0% -0.0% spectral-norm -0.0% 0.0% -0.0% -0.0% -0.0% sphere -0.0% 0.0% -0.0% -0.0% -0.0% symalg -0.0% 0.0% -0.0% -0.0% -0.0% tak -0.0% 0.0% -0.0% -0.0% -0.0% transform -0.0% 0.0% -0.0% -0.0% -0.0% treejoin -0.0% 0.0% -0.0% -0.0% -0.0% typecheck -0.0% 0.0% -0.0% -0.0% -0.0% veritas -0.0% 0.0% -0.0% -0.0% -0.0% wang -0.0% 0.0% -0.0% -0.0% -0.0% wave4main -0.0% 0.0% -0.0% -0.0% -0.0% wheel-sieve1 -0.0% 0.0% -0.0% -0.0% -0.0% wheel-sieve2 -0.0% 0.0% -0.0% -0.0% -0.0% x2n1 -0.0% 0.0% -0.0% -0.0% -0.0% -------------------------------------------------------------------------------- Min -0.1% 0.0% -0.3% -0.5% -0.5% Max -0.0% 0.0% -0.0% -0.0% -0.0% Geometric Mean -0.0% -0.0% -0.0% -0.0% -0.0% -------------------------------------------------------------------------------- Program Size Allocs Instrs Reads Writes -------------------------------------------------------------------------------- circsim -0.1% 0.0% -0.0% -0.0% -0.0% constraints -0.0% 0.0% -0.0% -0.0% -0.0% fibheaps -0.0% 0.0% -0.0% -0.0% -0.0% gc_bench -0.0% 0.0% -0.0% -0.0% -0.0% hash -0.0% 0.0% -0.0% -0.0% -0.0% lcss -0.0% 0.0% -0.0% -0.0% -0.0% power -0.0% 0.0% -0.0% -0.0% -0.0% spellcheck -0.0% 0.0% -0.0% -0.0% -0.0% -------------------------------------------------------------------------------- Min -0.1% 0.0% -0.0% -0.0% -0.0% Max -0.0% 0.0% -0.0% -0.0% -0.0% Geometric Mean -0.0% +0.0% -0.0% -0.0% -0.0% Manual inspection of programs in testsuite/tests/programs --------------------------------------------------------- I built these programs with a bunch of dump flags and `-O` and compared STG, Cmm, and Asm dumps and file sizes. (Below the numbers in parenthesis show number of modules in the program) These programs have identical compiler (same .hi and .o sizes, STG, and Cmm and Asm dumps): - Queens (1), andre_monad (1), cholewo-eval (2), cvh_unboxing (3), andy_cherry (7), fun_insts (1), hs-boot (4), fast2haskell (2), jl_defaults (1), jq_readsPrec (1), jules_xref (1), jtod_circint (4), jules_xref2 (1), lennart_range (1), lex (1), life_space_leak (1), bargon-mangler-bug (7), record_upd (1), rittri (1), sanders_array (1), strict_anns (1), thurston-module-arith (2), okeefe_neural (1), joao-circular (6), 10queens (1) Programs with different compiler outputs: - jl_defaults (1): For some reason GHC HEAD marks a lot of top-level `[Int]` closures as CAFFY for no reason. With this patch we no longer make them CAFFY and generate less SRT entries. For some reason Main.o is slightly larger with this patch (1.3%) and the executable sizes are the same. (I'd expect both to be smaller) - launchbury (1): Same as jl_defaults: top-level `[Int]` closures marked as CAFFY for no reason. Similarly `Main.o` is 1.4% larger but the executable sizes are the same. - galois_raytrace (13): Differences are in the Parse module. There are a lot, but some of the changes are caused by the fact that for some reason (I think a bug) GHC HEAD marks the dictionary for `Functor Identity` as CAFFY. Parse.o is 0.4% larger, the executable size is the same. - north_array: We now generate less SRT entries because some of array primops used in this program like `NewArrayOp` get eliminated during Stg-to-Cmm and turn some CAFFY things into non-CAFFY. Main.o gets 24% larger (9224 bytes from 9000 bytes), executable sizes are the same. - seward-space-leak: Difference in this program is better shown by this smaller example: module Lib where data CDS = Case [CDS] [(Int, CDS)] | Call CDS CDS instance Eq CDS where Case sels1 rets1 == Case sels2 rets2 = sels1 == sels2 && rets1 == rets2 Call a1 b1 == Call a2 b2 = a1 == a2 && b1 == b2 _ == _ = False In this program GHC HEAD builds a new SRT for the recursive group of `(==)`, `(/=)` and the dictionary closure. Then `/=` points to `==` in its SRT field, and `==` uses the SRT object as its SRT. With this patch we use the closure for `/=` as the SRT and add `==` there. Then `/=` gets an empty SRT field and `==` points to `/=` in its SRT field. This change looks fine to me. Main.o gets 0.07% larger, executable sizes are identical. head.hackage ------------ head.hackage's CI script builds 428 packages from Hackage using this patch with no failures. Compiler performance -------------------- The compiler perf tests report that the compiler allocates slightly more (worst case observed so far is 4%). However most programs in the test suite are small, single file programs. To benchmark compiler performance on something more realistic I build Cabal (the library, 236 modules) with different optimisation levels. For the "max residency" row I run GHC with `+RTS -s -A100k -i0 -h` for more accurate numbers. Other rows are generated with just `-s`. (This is because `-i0` causes running GC much more frequently and as a result "bytes copied" gets inflated by more than 25x in some cases) * -O0 | | GHC HEAD | This MR | Diff | | --------------- | -------------- | -------------- | ------ | | Bytes allocated | 54,413,350,872 | 54,701,099,464 | +0.52% | | Bytes copied | 4,926,037,184 | 4,990,638,760 | +1.31% | | Max residency | 421,225,624 | 424,324,264 | +0.73% | * -O1 | | GHC HEAD | This MR | Diff | | --------------- | --------------- | --------------- | ------ | | Bytes allocated | 245,849,209,992 | 246,562,088,672 | +0.28% | | Bytes copied | 26,943,452,560 | 27,089,972,296 | +0.54% | | Max residency | 982,643,440 | 991,663,432 | +0.91% | * -O2 | | GHC HEAD | This MR | Diff | | --------------- | --------------- | --------------- | ------ | | Bytes allocated | 291,044,511,408 | 291,863,910,912 | +0.28% | | Bytes copied | 37,044,237,616 | 36,121,690,472 | -2.49% | | Max residency | 1,071,600,328 | 1,086,396,256 | +1.38% | Extra compiler allocations -------------------------- Runtime allocations of programs are as reported above (NoFib section). The compiler now allocates more than before. Main source of allocation in this patch compared to base commit is the new SRT algorithm (GHC.Cmm.Info.Build). Below is some of the extra work we do with this patch, numbers generated by profiled stage 2 compiler when building a pathological case (the test 'ManyConstructors') with '-O2': - We now sort the final STG for a module, which means traversing the entire program, generating free variable set for each top-level binding, doing SCC analysis, and re-ordering the program. In ManyConstructors this step allocates 97,889,952 bytes. - We now do SRT analysis on static data, which in a program like ManyConstructors causes analysing 10,000 bindings that we would previously just skip. This step allocates 70,898,352 bytes. - We now maintain an SRT map for the entire module as we compile Cmm groups: data ModuleSRTInfo = ModuleSRTInfo { ... , moduleSRTMap :: SRTMap } (SRTMap is just a strict Map from the 'containers' library) This map gets an entry for most bindings in a module (exceptions are THUNKs and CAFFY static functions). For ManyConstructors this map gets 50015 entries. - Once we're done with code generation we generate a NameSet from SRTMap for the non-CAFFY names in the current module. This set gets the same number of entries as the SRTMap. - Finally we update CafInfos in ModDetails for the non-CAFFY Ids, using the NameSet generated in the previous step. This usually does the least amount of allocation among the work listed here. Only place with this patch where we do less work in the CAF analysis in the tidying pass (CoreTidy). However that doesn't save us much, as the pass still needs to traverse the whole program and update IdInfos for other reasons. Only thing we don't here do is the `hasCafRefs` pass over the RHS of bindings, which is a stateless pass that returns a boolean value, so it doesn't allocate much. (Metric changes blow are all increased allocations) Metric changes -------------- Metric Increase: ManyAlternatives ManyConstructors T13035 T14683 T1969 T9961
* Module hierarchy: Cmm (cf #13009)Sylvain Henry2020-01-251-5/+5
|
* Remove empty NCG.hJohn Ericson2019-09-131-1/+0
|
* Revert "Add support for SIMD operations in the NCG"Ben Gamari2019-07-161-144/+6
| | | | | | | Unfortunately this will require more work; register allocation is quite broken. This reverts commit acd795583625401c5554f8e04ec7efca18814011.
* Add support for SIMD operations in the NCGAbhiroop Sarkar2019-07-031-6/+144
| | | | | | | This adds support for constructing vector types from Float#, Double# etc and performing arithmetic operations on them Cleaned-Up-By: Ben Gamari <ben@well-typed.com>
* Move 'Platform' to ghc-bootJohn Ericson2019-06-191-1/+1
| | | | | | | ghc-pkg needs to be aware of platforms so it can figure out which subdire within the user package db to use. This is admittedly roundabout, but maybe Cabal could use the same notion of a platform as GHC to good affect too.
* asm-emit-time IND_STATIC eliminationGabor Greif2019-04-151-0/+12
| | | | | | | | | | | | When a new closure identifier is being established to a local or exported closure already emitted into the same module, refrain from adding an IND_STATIC closure, and instead emit an assembly-language alias. Inter-module IND_STATIC objects still remain, and need to be addressed by other measures. Binary-size savings on nofib are around 0.1%.
* removing x87 register support from native code genCarter Schonwald2019-04-101-390/+22
| | | | | | | | | | | | | | | | * simplifies registers to have GPR, Float and Double, by removing the SSE2 and X87 Constructors * makes -msse2 assumed/default for x86 platforms, fixing a long standing nondeterminism in rounding behavior in 32bit haskell code * removes the 80bit floating point representation from the supported float sizes * theres still 1 tiny bit of x87 support needed, for handling float and double return values in FFI calls wrt the C ABI on x86_32, but this one piece does not leak into the rest of NCG. * Lots of code thats not been touched in a long time got deleted as a consequence of all of this all in all, this change paves the way towards a lot of future further improvements in how GHC handles floating point computations, along with making the native code gen more accessible to a larger pool of contributors.
* codegen: use newtype for Alignment in BasicTypesArtem Pyanykh2019-04-091-8/+9
|
* Update Wiki URLs to point to GitLabTakenobu Tani2019-03-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | This moves all URL references to Trac Wiki to their corresponding GitLab counterparts. This substitution is classified as follows: 1. Automated substitution using sed with Ben's mapping rule [1] Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy... New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy... 2. Manual substitution for URLs containing `#` index Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...#Zzz New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...#zzz 3. Manual substitution for strings starting with `Commentary` Old: Commentary/XxxYyy... New: commentary/xxx-yyy... See also !539 [1]: https://gitlab.haskell.org/bgamari/gitlab-migration/blob/master/wiki-mapping.json
* Update Trac ticket URLs to point to GitLabRyan Scott2019-03-151-1/+1
| | | | | This moves all URL references to Trac tickets to their corresponding GitLab counterparts.
* Rip out object splittingBen Gamari2019-03-051-11/+8
| | | | | | | | | | | | | | | The splitter is an evil Perl script that processes assembler code. Its job can be done better by the linker's --gc-sections flag. GHC passes this flag to the linker whenever -split-sections is passed on the command line. This is based on @DemiMarie's D2768. Fixes Trac #11315 Fixes Trac #9832 Fixes Trac #8964 Fixes Trac #8685 Fixes Trac #8629
* NCG: fast compilation of very large strings (#16190)Sylvain Henry2019-02-141-3/+1
| | | | | | | | | | This patch adds an optimization into the NCG: for large strings (threshold configurable via -fbinary-blob-threshold=NNN flag), instead of printing `.asciz "..."` in the generated ASM source, we print `.incbin "tmpXXX.dat"` and we dump the contents of the string into a temporary "tmpXXX.dat" file. See the note for more details.
* Compile count{Leading,Trailing}Zeros to corresponding x86_64 instructions ↵Dmitry Ivanov2019-01-301-0/+2
| | | | | | | under -mbmi2 This works similarly to existing implementation for popCount. Trac ticket: #16086.
* Revert "Batch merge"Ben Gamari2019-01-301-2/+0
| | | | This reverts commit 76c8fd674435a652c75a96c85abbf26f1f221876.
* Batch mergeBen Gamari2019-01-301-0/+2
|
* PPC NCG: Reduce memory consumption emitting string literalsPeter Trommler2019-01-131-41/+0
|
* Rename literal constructorsSylvain Henry2018-11-221-23/+23
| | | | | | | | | | | | | | | | | | | | | | | | In a previous patch we replaced some built-in literal constructors (MachInt, MachWord, etc.) with a single LitNumber constructor. In this patch we replace the `Mach` prefix of the remaining constructors with `Lit` for consistency (e.g., LitChar, LitLabel, etc.). Sadly the name `LitString` was already taken for a kind of FastString and it would become misleading to have both `LitStr` (literal constructor renamed after `MachStr`) and `LitString` (FastString variant). Hence this patch renames the FastString variant `PtrString` (which is more accurate) and the literal string constructor now uses the least surprising `LitString` name. Both `Literal` and `LitString/PtrString` have recently seen breaking changes so doing this kind of renaming now shouldn't harm much. Reviewers: hvr, goldfire, bgamari, simonmar, jrtc27, tdammers Subscribers: tdammers, rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4881
* Add Int8# and Word8#Michal Terepeta2018-11-021-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the first step of implementing: https://github.com/ghc-proposals/ghc-proposals/pull/74 The main highlights/changes: primops.txt.pp gets two new sections for two new primitive types for signed and unsigned 8-bit integers (Int8# and Word8 respectively) along with basic arithmetic and comparison operations. PrimRep/RuntimeRep get two new constructors for them. All of the primops translate into the existing MachOPs. For CmmCalls the codegen will now zero-extend the values at call site (so that they can be moved to the right register) and then truncate them back their original width. x86 native codegen needed some updates, since it wasn't able to deal with the new widths, but all the changes are quite localized. LLVM backend seems to just work. This is the second attempt at merging this, after the first attempt in D4475 had to be backed out due to regressions on i386. Bumps binary submodule. Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: ./validate (on both x86-{32,64}) Reviewers: bgamari, hvr, goldfire, simonmar Subscribers: rwbarton, carter Differential Revision: https://phabricator.haskell.org/D5258
* Fix rare undefined asm temp end label error in x86Zejun Wu2018-10-281-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Encountered assembly error due to undefined label `.LcaDcU_info_end` for following code generated by `pprFrameProc`: ``` .Lsat_sa8fp{v}_info_fde_end: .long .Lblock{v caDcU}_info_fde_end-.Lblock{v caDcU}_info_fde .Lblock{v caDcU}_info_fde: .long _nbHlD-.Lsection_frame .quad block{v caDcU}_info-1 .quad .Lblock{v caDcU}_info_end-block{v caDcU}_info+1 .byte 1 ``` This diff fixed the error. Test Plan: ./validate Also the case where we used to have assembly error is now fixed. Unfortunately, I have limited insight here and cannot get a small enough repro or test case for this. Ben says: > I think I see: Previously we only produced end symbols for the info > tables of top-level procedures. However, blocks within a procedure may > also have info tables, we will dutifully generate debug information for > and consequently we get undefined symbols. Reviewers: simonmar, scpmw, last_g, bgamari Reviewed By: bgamari Subscribers: rwbarton, carter Differential Revision: https://phabricator.haskell.org/D5246
* Revert "Add Int8# and Word8#"Ben Gamari2018-10-091-5/+5
| | | | | | | | | This unfortunately broke i386 support since it introduced references to byte-sized registers that don't exist on that architecture. Reverts binary submodule This reverts commit 5d5307f943d7581d7013ffe20af22233273fba06.
* Add Int8# and Word8#Michal Terepeta2018-10-071-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the first step of implementing: https://github.com/ghc-proposals/ghc-proposals/pull/74 The main highlights/changes: - `primops.txt.pp` gets two new sections for two new primitive types for signed and unsigned 8-bit integers (`Int8#` and `Word8` respectively) along with basic arithmetic and comparison operations. `PrimRep`/`RuntimeRep` get two new constructors for them. All of the primops translate into the existing `MachOP`s. - For `CmmCall`s the codegen will now zero-extend the values at call site (so that they can be moved to the right register) and then truncate them back their original width. - x86 native codegen needed some updates, since it wasn't able to deal with the new widths, but all the changes are quite localized. LLVM backend seems to just work. Bumps binary submodule. Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: ./validate with new tests Reviewers: hvr, goldfire, bgamari, simonmar Subscribers: Abhiroop, dfeuer, rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4475
* Mark code related symbols as @function not @objectSergei Azovskov2018-09-141-5/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This diff is a part of the bigger project which goal is to improve common profiling tools support (perf) for GHC binaries. A similar job was already done and reverted in the past: * https://phabricator.haskell.org/rGHCb1f453e16f0ce11a2ab18cc4c350bdcbd36299a6 * https://phabricator.haskell.org/rGHCf1f3c4f50650110ad0f700d6566a44c515b0548f Reasoning: `Perf` and similar tools build in memory symbol table from the .symtab section of the ELF file to display human-readable function names instead of the addresses in the output. `Perf` uses only two types of symbols: `@function` and `@notype` but GHC is not capable to produce any `@function` symbols so the `perf` output is pretty useless (All the haskell symbols that you can see in `perf` now are `@notype` internal symbols extracted by mistake/hack). The changes: * mark code related symbols as @function * small hack to mark InfoTable symbols as code if TABLES_NEXT_TO_CODE is true Limitations: * The perf symbolization support is not complete after this patch but I'm working on the second patch. * Constructor symbols are not supported. To fix that we can issue extra local symbols which mark code sections as code and will be only used for debug. Test Plan: tests any additional ideas? Perf output on stock ghc 8.4.1: ``` 9.78% FibbSlow FibbSlow [.] ckY_info 9.59% FibbSlow FibbSlow [.] cjqd_info 7.17% FibbSlow FibbSlow [.] c3sg_info 6.62% FibbSlow FibbSlow [.] c1X_info 5.32% FibbSlow FibbSlow [.] cjsX_info 4.18% FibbSlow FibbSlow [.] s3rN_info 3.82% FibbSlow FibbSlow [.] c2m_info 3.68% FibbSlow FibbSlow [.] cjlJ_info 3.26% FibbSlow FibbSlow [.] c3sb_info 3.19% FibbSlow FibbSlow [.] cjPQ_info 3.05% FibbSlow FibbSlow [.] cjQd_info 2.97% FibbSlow FibbSlow [.] cjAB_info 2.78% FibbSlow FibbSlow [.] cjzP_info 2.40% FibbSlow FibbSlow [.] cjOS_info 2.38% FibbSlow FibbSlow [.] s3rK_info 2.27% FibbSlow FibbSlow [.] cjq0_info 2.18% FibbSlow FibbSlow [.] cKQ_info 2.13% FibbSlow FibbSlow [.] cjSl_info 1.99% FibbSlow FibbSlow [.] s3rL_info 1.98% FibbSlow FibbSlow [.] c2cC_info 1.80% FibbSlow FibbSlow [.] s3rO_info 1.37% FibbSlow FibbSlow [.] c2f2_info ... ``` Perf output on patched ghc: ``` 7.97% FibbSlow FibbSlow [.] c3rM_info 6.75% FibbSlow FibbSlow [.] 0x000000000032cfa8 6.63% FibbSlow FibbSlow [.] cifA_info 4.98% FibbSlow FibbSlow [.] integerzmgmp_GHCziIntegerziType_eqIntegerzh_info 4.55% FibbSlow FibbSlow [.] chXn_info 4.52% FibbSlow FibbSlow [.] c3rH_info 4.45% FibbSlow FibbSlow [.] chZB_info 4.04% FibbSlow FibbSlow [.] Main_fibbzuslow_info 4.03% FibbSlow FibbSlow [.] stg_ap_0_fast 3.76% FibbSlow FibbSlow [.] chXA_info 3.67% FibbSlow FibbSlow [.] cifu_info 3.25% FibbSlow FibbSlow [.] ci4r_info 2.64% FibbSlow FibbSlow [.] s3rf_info 2.42% FibbSlow FibbSlow [.] s3rg_info 2.39% FibbSlow FibbSlow [.] integerzmgmp_GHCziIntegerziType_eqInteger_info 2.25% FibbSlow FibbSlow [.] integerzmgmp_GHCziIntegerziType_minusInteger_info 2.17% FibbSlow FibbSlow [.] ghczmprim_GHCziClasses_zeze_info 2.09% FibbSlow FibbSlow [.] cicc_info 2.03% FibbSlow FibbSlow [.] 0x0000000000331e15 2.02% FibbSlow FibbSlow [.] s3ri_info 1.91% FibbSlow FibbSlow [.] 0x0000000000331bb8 1.89% FibbSlow FibbSlow [.] ci4N_info ... ``` Reviewers: simonmar, niteria, bgamari, goldfire Reviewed By: simonmar, bgamari Subscribers: lelf, rwbarton, thomie, carter GHC Trac Issues: #15501 Differential Revision: https://phabricator.haskell.org/D4713
* Allow aligning of cmm procs at specific boundryklebinger.andreas@gmx.at2018-06-031-0/+7
| | | | | | | | | | | | | | | | | | | | Allows to align CmmProcs at the given boundries. It makes performance usually worse but can be helpful to limit the effect of a unrelated function B becoming faster/slower after changing function A. Test Plan: ci, using it. Reviewers: bgamari Reviewed By: bgamari Subscribers: rwbarton, thomie, carter GHC Trac Issues: #15148 Differential Revision: https://phabricator.haskell.org/D4706
* Allow CmmLabelDiffOff with different widthsSimon Marlow2018-05-161-1/+1
| | | | | | | | | | | | | | | | | | Summary: This change makes it possible to generate a static 32-bit relative label offset on x86_64. Currently we can only generate word-sized label offsets. This will be used in D4634 to shrink info tables. See D4632 for more details. Test Plan: See D4632 Reviewers: bgamari, niteria, michalt, erikd, jrtc27, osa1 Subscribers: thomie, carter Differential Revision: https://phabricator.haskell.org/D4633
* Improve X86CodeGen's pprASCII.HE, Tao2018-02-061-13/+34
| | | | | | | | | | | | | | | | | | | | | The original implementation generates a list of SDoc then concatenates them using `hcat`. For memory optimization, we can transform the given literal string into escaped string the construct SDoc directly. This optimization will decreate the memory allocation when there's big literal strings in haskell code, see Trac #14741. Signed-off-by: HE, Tao <sighingnow@gmail.com> Reviewers: bgamari, mpickering, simonpj Reviewed By: simonpj Subscribers: simonpj, rwbarton, thomie, carter GHC Trac Issues: #14741 Differential Revision: https://phabricator.haskell.org/D4384
* Add new mbmi and mbmi2 compiler flagsJohn Ky2018-01-211-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | This adds support for the bit deposit and extraction operations provided by the BMI and BMI2 instruction set extensions on modern amd64 machines. Implement x86 code generator for pdep and pext. Properly initialise bmiVersion field. pdep and pext test cases Fix pattern match for pdep and pext instructions Fix build of pdep and pext code for 32-bit architectures Test Plan: Validate Reviewers: austin, simonmar, bgamari, angerman Reviewed By: bgamari Subscribers: trommler, carter, angerman, thomie, rwbarton, newhoggy GHC Trac Issues: #14206 Differential Revision: https://phabricator.haskell.org/D4236
* Revert "Add new mbmi and mbmi2 compiler flags"Ben Gamari2017-11-221-13/+0
| | | | | | This broke the 32-bit build. This reverts commit f5dc8ccc29429d0a1d011f62b6b430f6ae50290c.
* Add new mbmi and mbmi2 compiler flagsJohn Ky2017-11-151-0/+13
| | | | | | | | | | | | | | | | | This adds support for the bit deposit and extraction operations provided by the BMI and BMI2 instruction set extensions on modern amd64 machines. Test Plan: Validate Reviewers: austin, simonmar, bgamari, hvr, goldfire, erikd Reviewed By: bgamari Subscribers: goldfire, erikd, trommler, newhoggy, rwbarton, thomie GHC Trac Issues: #14206 Differential Revision: https://phabricator.haskell.org/D4063
* compiler: introduce custom "GhcPrelude" PreludeHerbert Valerio Riedel2017-09-191-0/+2
| | | | | | | | | | | | | | | | | | This switches the compiler/ component to get compiled with -XNoImplicitPrelude and a `import GhcPrelude` is inserted in all modules. This is motivated by the upcoming "Prelude" re-export of `Semigroup((<>))` which would cause lots of name clashes in every modulewhich imports also `Outputable` Reviewers: austin, goldfire, bgamari, alanz, simonmar Reviewed By: bgamari Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari Differential Revision: https://phabricator.haskell.org/D3989
* nativeGen: Consistently use blockLbl to generate CLabels from BlockIdsBen Gamari2017-09-191-3/+4
| | | | | | | | | | | | | | | This fixes #14221, where the NCG and the DWARF code were apparently giving two different names to the same block. Test Plan: Validate with DWARF support enabled. Reviewers: simonmar, austin Subscribers: rwbarton, thomie GHC Trac Issues: #14221 Differential Revision: https://phabricator.haskell.org/D3977
* Add debugPprTypeSimon Peyton Jones2017-08-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | We pretty-print a type by converting it to an IfaceType and pretty-printing that. But (a) that's a bit indirect, and (b) delibrately loses information about (e.g.) the kind on the /occurrences/ of a type variable So this patch implements debugPprType, which pretty prints the type directly, with no fancy formatting. It's just used for debugging. I took the opportunity to refactor the debug-pretty-printing machinery a little. In particular, define these functions and use them: ifPprDeubug :: SDoc -> SDOc -> SDoc -- Says what to do with and without -dppr-debug whenPprDebug :: SDoc -> SDoc -- Says what to do with -dppr-debug; without is empty getPprDebug :: (Bool -> SDoc) -> SDoc getPprDebug used to be called sdocPprDebugWith whenPprDebug used to be called ifPprDebug So a lot of files get touched in a very mechanical way
* Hoopl: remove dependency on Hoopl packageMichal Terepeta2017-06-231-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This copies the subset of Hoopl's functionality needed by GHC to `cmm/Hoopl` and removes the dependency on the Hoopl package. The main motivation for this change is the confusing/noisy interface between GHC and Hoopl: - Hoopl has `Label` which is GHC's `BlockId` but different than GHC's `CLabel` - Hoopl has `Unique` which is different than GHC's `Unique` - Hoopl has `Unique{Map,Set}` which are different than GHC's `Uniq{FM,Set}` - GHC has its own specialized copy of `Dataflow`, so `cmm/Hoopl` is needed just to filter the exposed functions (filter out some of the Hoopl's and add the GHC ones) With this change, we'll be able to simplify this significantly. It'll also be much easier to do invasive changes (Hoopl is a public package on Hackage with users that depend on the current behavior) This should introduce no changes in functionality - it merely copies the relevant code. Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: ./validate Reviewers: austin, bgamari, simonmar Reviewed By: bgamari, simonmar Subscribers: simonpj, kavon, rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3616
* nativeGen: Use SSE2 SQRT instructionBen Gamari2017-04-281-0/+1
| | | | | | | | | | Reviewers: austin, dfeuer Subscribers: dfeuer, rwbarton, thomie GHC Trac Issues: #13629 Differential Revision: https://phabricator.haskell.org/D3508
* x86 nativeGen: Fix test with mask in range [128,255] (#13425)Reid Barton2017-03-241-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | My commit bdb0c43c7 optimized the encoding of instructions to test tag bits, but it did not always set exactly the same condition codes since the testb instruction does a single-byte comparison, rather than a full-word comparison. It would be correct to optimize the expression `x .&. 128 > 0` to the sequence testb $128, %al seta %al ; note: 'a' for unsigned comparison, ; not 'g' for signed comparison but the pretty-printer is not the right place to make this kind of context-sensitive optimization. Test Plan: harbormaster Reviewers: trofi, austin, bgamari, dfeuer Reviewed By: trofi, dfeuer Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D3359
* Generate better fp abs for X86 and llvm with default cmm otherwiseDominic Steinitz2017-03-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | Currently we have this in libraries/base/GHC/Float.hs: ``` abs x | x == 0 = 0 -- handles (-0.0) | x > 0 = x | otherwise = negateFloat x ``` But 3-4 years ago it was noted that this was inefficient: https://mail.haskell.org/pipermail/libraries/2013-April/019690.html We can generate better code for X86 and llvm and for others generate some custom cmm code which is similar to what the compiler generates now. Reviewers: austin, simonmar, hvr, bgamari Reviewed By: bgamari Subscribers: dfeuer, thomie Differential Revision: https://phabricator.haskell.org/D3265
* Honour -dsuppress-uniques more thoroughlySimon Peyton Jones2017-02-171-6/+6
| | | | | | | | | | | I found that tests parser/should_compile/DumpRenamedAst and friends were printing uniques, which makes the test fragile. But -dsuppress-uniques made no difference! It turned out that pprName wasn't properly consulting Opt_SuppressUniques. This patch fixes the problem, and updates those three tests to use -dsuppress-uniques
* Generalize CmmUnwind and pass unwind information through NCGBen Gamari2017-02-081-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As discussed in D1532, Trac Trac #11337, and Trac Trac #11338, the stack unwinding information produced by GHC is currently quite approximate. Essentially we assume that register values do not change at all within a basic block. While this is somewhat true in normal Haskell code, blocks containing foreign calls often break this assumption. This results in unreliable call stacks, especially in the code containing foreign calls. This is worse than it sounds as unreliable unwinding information can at times result in segmentation faults. This patch set attempts to improve this situation by tracking unwinding information with finer granularity. By dispensing with the assumption of one unwinding table per block, we allow the compiler to accurately represent the areas surrounding foreign calls. Towards this end we generalize the representation of unwind information in the backend in three ways, * Multiple CmmUnwind nodes can occur per block * CmmUnwind nodes can now carry unwind information for multiple registers (while not strictly necessary; this makes emitting unwinding information a bit more convenient in the compiler) * The NCG backend is given an opportunity to modify the unwinding records since it may need to make adjustments due to, for instance, native calling convention requirements for foreign calls (see #11353). This sets the stage for resolving #11337 and #11338. Test Plan: Validate Reviewers: scpmw, simonmar, austin, erikd Subscribers: qnikst, thomie Differential Revision: https://phabricator.haskell.org/D2741
* BlockId: remove BlockMap and BlockSet synonymsMichal Terepeta2016-12-081-2/+2
| | | | | | | | | | | | | | | | | | | | This continues removal of `BlockId` module in favor of Hoopl's `Label`. Most of the changes here are mechanical, apart from the orphan `Outputable` instances for `LabelMap` and `LabelSet`. For now I've moved them to `cmm/Hoopl`, since it's already trying to manage all imports from Hoopl (to avoid any collisions). Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: validate Reviewers: bgamari, austin, simonmar Reviewed By: simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2800
* Reduce the size of string literals in binaries.Thijs Alkemade2016-12-061-5/+21
| | | | | | | | | | | | | | | Removed the alignment for strings and mark then as cstring sections in the generated asm so the linker can merge duplicate sections. Reviewers: rwbarton, trofi, austin, trommler, simonmar, hvr, bgamari Reviewed By: hvr, bgamari Subscribers: simonpj, hvr, thomie Differential Revision: https://phabricator.haskell.org/D1290 GHC Trac Issues: #9577
* Correct spelling of command-line option in commentGabor Greif2016-11-171-1/+1
|
* Inline compiler/NOTES into X86/Ppr.hsMatthew Pickering2016-11-161-0/+18
| | | | | | | | Reviewers: austin, bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2721
* Revert "codeGen: Remove binutils<2.17 hack, fixes T11758"Simon Peyton Jones2016-08-191-2/+21
| | | | | | | This reverts commit e3e2e49a8f6952e1c8a19321c729c17b294d8c92. I'm reverting because it makes ghc-stage2 seg-fault on 64-bit Windows machines. Even ghc-stage2 --version seg-faults.
* codeGen: Remove binutils<2.17 hack, fixes T11758Alex Dzyoba2016-08-051-21/+2
| | | | | | | | | | | | | | | | | | | There was a complication on the x86_64 platform, where pointers were 64 bits, but the tools didn't support 64-bit relative relocations. This was true before binutils 2.17, which nowadays is quite standart (even CentOs 5 is shipped with 2.17). Hacks were removed from x86 genSwitch and asm pretty printer. Also [x86-64-relative] note was dropped from includes/rts/storage/InfoTables.h as it's not referenced anywhere now. Reviewers: austin, simonmar, rwbarton, erikd, bgamari Reviewed By: simonmar, erikd, bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2426
* Restore original alignment for info tablesSimon Brenner2016-01-271-0/+9
| | | | | | | | | | | | | | | | | | | | This was broken in 4a32bf925b8aba7885d9c745769fe84a10979a53, meaning that info tables and subsequent code are no longer guaranteed to have the recommended alignment. Split up the section header and section alignment printers, and print an appropriate alignment directive before each info table. Fixes Trac #11486 Reviewers: austin, bgamari, rwbarton Reviewed By: bgamari, rwbarton Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1847 GHC Trac Issues: #11486
* Replace calls to `ptext . sLit` with `text`Jan Stolarek2016-01-181-49/+49
| | | | | | | | | | | | | | | | | | | | Summary: In the past the canonical way for constructing an SDoc string literal was the composition `ptext . sLit`. But for some time now we have function `text` that does the same. Plus it has some rules that optimize its runtime behaviour. This patch takes all uses of `ptext . sLit` in the compiler and replaces them with calls to `text`. The main benefits of this patch are clener (shorter) code and less dependencies between module, because many modules now do not need to import `FastString`. I don't expect any performance benefits - we mostly use SDocs to report errors and it seems there is little to be gained here. Test Plan: ./validate Reviewers: bgamari, austin, goldfire, hvr, alanz Subscribers: goldfire, thomie, mpickering Differential Revision: https://phabricator.haskell.org/D1784
* Support multiple debug output levelsBen Gamari2015-11-231-3/+3
| | | | | | | | | We now only strip block information from DebugBlocks when compiling with `-g1`, intended to be used when only minimal debug information is desired. `-g2` is assumed when `-g` is passed without any integer argument. Differential Revision: https://phabricator.haskell.org/D1281