summaryrefslogtreecommitdiff
path: root/compiler/llvmGen/LlvmCodeGen/Ppr.hs
Commit message (Collapse)AuthorAgeFilesLines
* Modules: Llvm (#13009)Sylvain Henry2020-02-181-100/+0
|
* Do CafInfo/SRT analysis in CmmÖmer Sinan Ağacan2020-01-311-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes all CafInfo predictions and various hacks to preserve predicted CafInfos from the compiler and assigns final CafInfos to interface Ids after code generation. SRT analysis is extended to support static data, and Cmm generator is modified to allow generating static_link fields after SRT analysis. This also fixes `-fcatch-bottoms`, which introduces error calls in case expressions in CorePrep, which runs *after* CoreTidy (which is where we decide on CafInfos) and turns previously non-CAFFY things into CAFFY. Fixes #17648 Fixes #9718 Evaluation ========== NoFib ----- Boot with: `make boot mode=fast` Run: `make mode=fast EXTRA_RUNTEST_OPTS="-cachegrind" NoFibRuns=1` -------------------------------------------------------------------------------- Program Size Allocs Instrs Reads Writes -------------------------------------------------------------------------------- CS -0.0% 0.0% -0.0% -0.0% -0.0% CSD -0.0% 0.0% -0.0% -0.0% -0.0% FS -0.0% 0.0% -0.0% -0.0% -0.0% S -0.0% 0.0% -0.0% -0.0% -0.0% VS -0.0% 0.0% -0.0% -0.0% -0.0% VSD -0.0% 0.0% -0.0% -0.0% -0.5% VSM -0.0% 0.0% -0.0% -0.0% -0.0% anna -0.1% 0.0% -0.0% -0.0% -0.0% ansi -0.0% 0.0% -0.0% -0.0% -0.0% atom -0.0% 0.0% -0.0% -0.0% -0.0% awards -0.0% 0.0% -0.0% -0.0% -0.0% banner -0.0% 0.0% -0.0% -0.0% -0.0% bernouilli -0.0% 0.0% -0.0% -0.0% -0.0% binary-trees -0.0% 0.0% -0.0% -0.0% -0.0% boyer -0.0% 0.0% -0.0% -0.0% -0.0% boyer2 -0.0% 0.0% -0.0% -0.0% -0.0% bspt -0.0% 0.0% -0.0% -0.0% -0.0% cacheprof -0.0% 0.0% -0.0% -0.0% -0.0% calendar -0.0% 0.0% -0.0% -0.0% -0.0% cichelli -0.0% 0.0% -0.0% -0.0% -0.0% circsim -0.0% 0.0% -0.0% -0.0% -0.0% clausify -0.0% 0.0% -0.0% -0.0% -0.0% comp_lab_zift -0.0% 0.0% -0.0% -0.0% -0.0% compress -0.0% 0.0% -0.0% -0.0% -0.0% compress2 -0.0% 0.0% -0.0% -0.0% -0.0% constraints -0.0% 0.0% -0.0% -0.0% -0.0% cryptarithm1 -0.0% 0.0% -0.0% -0.0% -0.0% cryptarithm2 -0.0% 0.0% -0.0% -0.0% -0.0% cse -0.0% 0.0% -0.0% -0.0% -0.0% digits-of-e1 -0.0% 0.0% -0.0% -0.0% -0.0% digits-of-e2 -0.0% 0.0% -0.0% -0.0% -0.0% dom-lt -0.0% 0.0% -0.0% -0.0% -0.0% eliza -0.0% 0.0% -0.0% -0.0% -0.0% event -0.0% 0.0% -0.0% -0.0% -0.0% exact-reals -0.0% 0.0% -0.0% -0.0% -0.0% exp3_8 -0.0% 0.0% -0.0% -0.0% -0.0% expert -0.0% 0.0% -0.0% -0.0% -0.0% fannkuch-redux -0.0% 0.0% -0.0% -0.0% -0.0% fasta -0.0% 0.0% -0.0% -0.0% -0.0% fem -0.0% 0.0% -0.0% -0.0% -0.0% fft -0.0% 0.0% -0.0% -0.0% -0.0% fft2 -0.0% 0.0% -0.0% -0.0% -0.0% fibheaps -0.0% 0.0% -0.0% -0.0% -0.0% fish -0.0% 0.0% -0.0% -0.0% -0.0% fluid -0.1% 0.0% -0.0% -0.0% -0.0% fulsom -0.0% 0.0% -0.0% -0.0% -0.0% gamteb -0.0% 0.0% -0.0% -0.0% -0.0% gcd -0.0% 0.0% -0.0% -0.0% -0.0% gen_regexps -0.0% 0.0% -0.0% -0.0% -0.0% genfft -0.0% 0.0% -0.0% -0.0% -0.0% gg -0.0% 0.0% -0.0% -0.0% -0.0% grep -0.0% 0.0% -0.0% -0.0% -0.0% hidden -0.0% 0.0% -0.0% -0.0% -0.0% hpg -0.1% 0.0% -0.0% -0.0% -0.0% ida -0.0% 0.0% -0.0% -0.0% -0.0% infer -0.0% 0.0% -0.0% -0.0% -0.0% integer -0.0% 0.0% -0.0% -0.0% -0.0% integrate -0.0% 0.0% -0.0% -0.0% -0.0% k-nucleotide -0.0% 0.0% -0.0% -0.0% -0.0% kahan -0.0% 0.0% -0.0% -0.0% -0.0% knights -0.0% 0.0% -0.0% -0.0% -0.0% lambda -0.0% 0.0% -0.0% -0.0% -0.0% last-piece -0.0% 0.0% -0.0% -0.0% -0.0% lcss -0.0% 0.0% -0.0% -0.0% -0.0% life -0.0% 0.0% -0.0% -0.0% -0.0% lift -0.0% 0.0% -0.0% -0.0% -0.0% linear -0.1% 0.0% -0.0% -0.0% -0.0% listcompr -0.0% 0.0% -0.0% -0.0% -0.0% listcopy -0.0% 0.0% -0.0% -0.0% -0.0% maillist -0.0% 0.0% -0.0% -0.0% -0.0% mandel -0.0% 0.0% -0.0% -0.0% -0.0% mandel2 -0.0% 0.0% -0.0% -0.0% -0.0% mate -0.0% 0.0% -0.0% -0.0% -0.0% minimax -0.0% 0.0% -0.0% -0.0% -0.0% mkhprog -0.0% 0.0% -0.0% -0.0% -0.0% multiplier -0.0% 0.0% -0.0% -0.0% -0.0% n-body -0.0% 0.0% -0.0% -0.0% -0.0% nucleic2 -0.0% 0.0% -0.0% -0.0% -0.0% para -0.0% 0.0% -0.0% -0.0% -0.0% paraffins -0.0% 0.0% -0.0% -0.0% -0.0% parser -0.1% 0.0% -0.0% -0.0% -0.0% parstof -0.1% 0.0% -0.0% -0.0% -0.0% pic -0.0% 0.0% -0.0% -0.0% -0.0% pidigits -0.0% 0.0% -0.0% -0.0% -0.0% power -0.0% 0.0% -0.0% -0.0% -0.0% pretty -0.0% 0.0% -0.3% -0.4% -0.4% primes -0.0% 0.0% -0.0% -0.0% -0.0% primetest -0.0% 0.0% -0.0% -0.0% -0.0% prolog -0.0% 0.0% -0.0% -0.0% -0.0% puzzle -0.0% 0.0% -0.0% -0.0% -0.0% queens -0.0% 0.0% -0.0% -0.0% -0.0% reptile -0.0% 0.0% -0.0% -0.0% -0.0% reverse-complem -0.0% 0.0% -0.0% -0.0% -0.0% rewrite -0.0% 0.0% -0.0% -0.0% -0.0% rfib -0.0% 0.0% -0.0% -0.0% -0.0% rsa -0.0% 0.0% -0.0% -0.0% -0.0% scc -0.0% 0.0% -0.3% -0.5% -0.4% sched -0.0% 0.0% -0.0% -0.0% -0.0% scs -0.0% 0.0% -0.0% -0.0% -0.0% simple -0.1% 0.0% -0.0% -0.0% -0.0% solid -0.0% 0.0% -0.0% -0.0% -0.0% sorting -0.0% 0.0% -0.0% -0.0% -0.0% spectral-norm -0.0% 0.0% -0.0% -0.0% -0.0% sphere -0.0% 0.0% -0.0% -0.0% -0.0% symalg -0.0% 0.0% -0.0% -0.0% -0.0% tak -0.0% 0.0% -0.0% -0.0% -0.0% transform -0.0% 0.0% -0.0% -0.0% -0.0% treejoin -0.0% 0.0% -0.0% -0.0% -0.0% typecheck -0.0% 0.0% -0.0% -0.0% -0.0% veritas -0.0% 0.0% -0.0% -0.0% -0.0% wang -0.0% 0.0% -0.0% -0.0% -0.0% wave4main -0.0% 0.0% -0.0% -0.0% -0.0% wheel-sieve1 -0.0% 0.0% -0.0% -0.0% -0.0% wheel-sieve2 -0.0% 0.0% -0.0% -0.0% -0.0% x2n1 -0.0% 0.0% -0.0% -0.0% -0.0% -------------------------------------------------------------------------------- Min -0.1% 0.0% -0.3% -0.5% -0.5% Max -0.0% 0.0% -0.0% -0.0% -0.0% Geometric Mean -0.0% -0.0% -0.0% -0.0% -0.0% -------------------------------------------------------------------------------- Program Size Allocs Instrs Reads Writes -------------------------------------------------------------------------------- circsim -0.1% 0.0% -0.0% -0.0% -0.0% constraints -0.0% 0.0% -0.0% -0.0% -0.0% fibheaps -0.0% 0.0% -0.0% -0.0% -0.0% gc_bench -0.0% 0.0% -0.0% -0.0% -0.0% hash -0.0% 0.0% -0.0% -0.0% -0.0% lcss -0.0% 0.0% -0.0% -0.0% -0.0% power -0.0% 0.0% -0.0% -0.0% -0.0% spellcheck -0.0% 0.0% -0.0% -0.0% -0.0% -------------------------------------------------------------------------------- Min -0.1% 0.0% -0.0% -0.0% -0.0% Max -0.0% 0.0% -0.0% -0.0% -0.0% Geometric Mean -0.0% +0.0% -0.0% -0.0% -0.0% Manual inspection of programs in testsuite/tests/programs --------------------------------------------------------- I built these programs with a bunch of dump flags and `-O` and compared STG, Cmm, and Asm dumps and file sizes. (Below the numbers in parenthesis show number of modules in the program) These programs have identical compiler (same .hi and .o sizes, STG, and Cmm and Asm dumps): - Queens (1), andre_monad (1), cholewo-eval (2), cvh_unboxing (3), andy_cherry (7), fun_insts (1), hs-boot (4), fast2haskell (2), jl_defaults (1), jq_readsPrec (1), jules_xref (1), jtod_circint (4), jules_xref2 (1), lennart_range (1), lex (1), life_space_leak (1), bargon-mangler-bug (7), record_upd (1), rittri (1), sanders_array (1), strict_anns (1), thurston-module-arith (2), okeefe_neural (1), joao-circular (6), 10queens (1) Programs with different compiler outputs: - jl_defaults (1): For some reason GHC HEAD marks a lot of top-level `[Int]` closures as CAFFY for no reason. With this patch we no longer make them CAFFY and generate less SRT entries. For some reason Main.o is slightly larger with this patch (1.3%) and the executable sizes are the same. (I'd expect both to be smaller) - launchbury (1): Same as jl_defaults: top-level `[Int]` closures marked as CAFFY for no reason. Similarly `Main.o` is 1.4% larger but the executable sizes are the same. - galois_raytrace (13): Differences are in the Parse module. There are a lot, but some of the changes are caused by the fact that for some reason (I think a bug) GHC HEAD marks the dictionary for `Functor Identity` as CAFFY. Parse.o is 0.4% larger, the executable size is the same. - north_array: We now generate less SRT entries because some of array primops used in this program like `NewArrayOp` get eliminated during Stg-to-Cmm and turn some CAFFY things into non-CAFFY. Main.o gets 24% larger (9224 bytes from 9000 bytes), executable sizes are the same. - seward-space-leak: Difference in this program is better shown by this smaller example: module Lib where data CDS = Case [CDS] [(Int, CDS)] | Call CDS CDS instance Eq CDS where Case sels1 rets1 == Case sels2 rets2 = sels1 == sels2 && rets1 == rets2 Call a1 b1 == Call a2 b2 = a1 == a2 && b1 == b2 _ == _ = False In this program GHC HEAD builds a new SRT for the recursive group of `(==)`, `(/=)` and the dictionary closure. Then `/=` points to `==` in its SRT field, and `==` uses the SRT object as its SRT. With this patch we use the closure for `/=` as the SRT and add `==` there. Then `/=` gets an empty SRT field and `==` points to `/=` in its SRT field. This change looks fine to me. Main.o gets 0.07% larger, executable sizes are identical. head.hackage ------------ head.hackage's CI script builds 428 packages from Hackage using this patch with no failures. Compiler performance -------------------- The compiler perf tests report that the compiler allocates slightly more (worst case observed so far is 4%). However most programs in the test suite are small, single file programs. To benchmark compiler performance on something more realistic I build Cabal (the library, 236 modules) with different optimisation levels. For the "max residency" row I run GHC with `+RTS -s -A100k -i0 -h` for more accurate numbers. Other rows are generated with just `-s`. (This is because `-i0` causes running GC much more frequently and as a result "bytes copied" gets inflated by more than 25x in some cases) * -O0 | | GHC HEAD | This MR | Diff | | --------------- | -------------- | -------------- | ------ | | Bytes allocated | 54,413,350,872 | 54,701,099,464 | +0.52% | | Bytes copied | 4,926,037,184 | 4,990,638,760 | +1.31% | | Max residency | 421,225,624 | 424,324,264 | +0.73% | * -O1 | | GHC HEAD | This MR | Diff | | --------------- | --------------- | --------------- | ------ | | Bytes allocated | 245,849,209,992 | 246,562,088,672 | +0.28% | | Bytes copied | 26,943,452,560 | 27,089,972,296 | +0.54% | | Max residency | 982,643,440 | 991,663,432 | +0.91% | * -O2 | | GHC HEAD | This MR | Diff | | --------------- | --------------- | --------------- | ------ | | Bytes allocated | 291,044,511,408 | 291,863,910,912 | +0.28% | | Bytes copied | 37,044,237,616 | 36,121,690,472 | -2.49% | | Max residency | 1,071,600,328 | 1,086,396,256 | +1.38% | Extra compiler allocations -------------------------- Runtime allocations of programs are as reported above (NoFib section). The compiler now allocates more than before. Main source of allocation in this patch compared to base commit is the new SRT algorithm (GHC.Cmm.Info.Build). Below is some of the extra work we do with this patch, numbers generated by profiled stage 2 compiler when building a pathological case (the test 'ManyConstructors') with '-O2': - We now sort the final STG for a module, which means traversing the entire program, generating free variable set for each top-level binding, doing SCC analysis, and re-ordering the program. In ManyConstructors this step allocates 97,889,952 bytes. - We now do SRT analysis on static data, which in a program like ManyConstructors causes analysing 10,000 bindings that we would previously just skip. This step allocates 70,898,352 bytes. - We now maintain an SRT map for the entire module as we compile Cmm groups: data ModuleSRTInfo = ModuleSRTInfo { ... , moduleSRTMap :: SRTMap } (SRTMap is just a strict Map from the 'containers' library) This map gets an entry for most bindings in a module (exceptions are THUNKs and CAFFY static functions). For ManyConstructors this map gets 50015 entries. - Once we're done with code generation we generate a NameSet from SRTMap for the non-CAFFY names in the current module. This set gets the same number of entries as the SRTMap. - Finally we update CafInfos in ModDetails for the non-CAFFY Ids, using the NameSet generated in the previous step. This usually does the least amount of allocation among the work listed here. Only place with this patch where we do less work in the CAF analysis in the tidying pass (CoreTidy). However that doesn't save us much, as the pass still needs to traverse the whole program and update IdInfos for other reasons. Only thing we don't here do is the `hasCafRefs` pass over the RHS of bindings, which is a stateless pass that returns a boolean value, so it doesn't allocate much. (Metric changes blow are all increased allocations) Metric changes -------------- Metric Increase: ManyAlternatives ManyConstructors T13035 T14683 T1969 T9961
* Module hierarchy: Cmm (cf #13009)Sylvain Henry2020-01-251-2/+2
|
* asm-emit-time IND_STATIC eliminationGabor Greif2019-04-151-1/+1
| | | | | | | | | | | | When a new closure identifier is being established to a local or exported closure already emitted into the same module, refrain from adding an IND_STATIC closure, and instead emit an assembly-language alias. Inter-module IND_STATIC objects still remain, and need to be addressed by other measures. Binary-size savings on nofib are around 0.1%.
* llvmGen: Eliminate duplicate definitionGabor Greif2018-11-221-1/+1
| | | | remove local
* compiler: introduce custom "GhcPrelude" PreludeHerbert Valerio Riedel2017-09-191-0/+2
| | | | | | | | | | | | | | | | | | This switches the compiler/ component to get compiled with -XNoImplicitPrelude and a `import GhcPrelude` is inserted in all modules. This is motivated by the upcoming "Prelude" re-export of `Semigroup((<>))` which would cause lots of name clashes in every modulewhich imports also `Outputable` Reviewers: austin, goldfire, bgamari, alanz, simonmar Reviewed By: bgamari Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari Differential Revision: https://phabricator.haskell.org/D3989
* Clean up opt and llcMoritz Angermann2017-09-061-62/+1
| | | | | | | | | | | | | | | | | | | | | The LLVM backend shells out to LLVMs `opt` and `llc` tools. This clean up introduces a shared data structure to carry the arguments we pass to each tool so that corresponding flags are next to each other. It drops the hard coded data layouts in favor of using `-mtriple` and have LLVM infer them. Furthermore we add `clang` as a proper tool, so we don't rely on assuming that `clang` is called `clang` on the `PATH` when using `clang` as the assembler. Finally this diff also changes the type of `optLevel` from `Int` to `Word`, as we do not have negative optimization levels. Reviewers: erikd, hvr, austin, rwbarton, bgamari, kavon Reviewed By: kavon Subscribers: michalt, Ericson2314, ryantrinkle, dfeuer, carter, simonpj, kavon, simonmar, thomie, erikd, snowleopard Differential Revision: https://phabricator.haskell.org/D3352
* Revert "Make LLVM output robust to -dead_strip on mach-o platforms"Ben Gamari2017-06-081-71/+2
| | | | This reverts commit 667abf17dced8b4a4cd2dc6a291a6f244ffa031f.
* Typos in comments [ci skip]Gabor Greif2017-05-051-2/+2
|
* Make LLVM output robust to -dead_strip on mach-o platformsMoritz Angermann2017-05-011-2/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverses commit 1686f30951292e94bf3076ce8b3eafb0bcbba91d (Mangle .subsections_via_symbols away., D3287), and implements proper support for `-dead_strip` via the injection of `.alt_entry` symbols for the function definition pointing to the beginning of the prefix data. This is the result of a lengthy discussion with rwbarton, and the following llvm-dev mailing list thread: http://lists.llvm.org/pipermail/llvm-dev/2017-March/110733.html The essential problem is that there is no reference from a function to its info table. This combined with `.subsections_via_symbols`, which llvm emits unconditionally, leads the linker to believe that the prefix data is unnecessary and stripping it away if presented with the `-dead_strip` flag. The NCG has for this purpose special $dsp (dead strip preventer) symbols and adds a relocation to the end of each function body pointing to that function's $dsp symbol. We cannot easily do the same thing via LLVM. Instead we use the `.alt_entry` directive on the function symbol, which causes the linker to treat it as a continuation of the previous symbol, namely the $dsp symbol. As a result the function body will not be separated from its info table. Reviewers: erikd, austin, rwbarton, bgamari Reviewed By: bgamari Subscribers: michalt, thomie Differential Revision: https://phabricator.haskell.org/D3290
* Adds x86_64-apple-darwin14 target.Moritz Angermann2016-07-051-7/+10
| | | | | | | | | | | | | | | | | | x86_64-apple-darwin14, is the target for the 64bit simulator. Ideally, we'd have (i386|armv7|arm64|x64_86)-apple-ios, yet, many #ifdefs depend on `darwin`, notably libffi. Hence, this only adds x86_64-apple-darwin14 as a target. This also updates the comment to add the `-S` flag, and dump the output to stdout; and adjusts the `datalayout` and `triple` values, as obtained through the method mentioned in the comment. Reviewers: hvr, erikd, austin, bgamari, simonmar Reviewed By: simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2378
* Implement function-sections for Haskell code, #8405Simon Brenner2015-11-121-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a flag -split-sections that does similar things to -split-objs, but using sections in single object files instead of relying on the Satanic Splitter and other abominations. This is very similar to the GCC flags -ffunction-sections and -fdata-sections. The --gc-sections linker flag, which allows unused sections to actually be removed, is added to all link commands (if the linker supports it) so that space savings from having base compiled with sections can be realized. Supported both in LLVM and the native code-gen, in theory for all architectures, but really tested on x86 only. In the GHC build, a new SplitSections variable enables -split-sections for relevant parts of the build. Test Plan: validate with both settings of SplitSections Reviewers: dterei, Phyx, austin, simonmar, thomie, bgamari Reviewed By: simonmar, thomie, bgamari Subscribers: hsyl20, erikd, kgardas, thomie Differential Revision: https://phabricator.haskell.org/D1242 GHC Trac Issues: #8405
* Fix GHCi on Arm (#10375).Erik de Castro Lopo2015-10-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Arm has two instruction sets, Arm and Thumb, and an execution mode for each. Executing Arm code in Thumb mode or vice-versa will likely result in an Illegal instruction exception. Furthermore, Haskell code compiled via LLVM was generating Arm instructions while C code compiled via GCC was generating Thumb code by default. When these two object code types were being linked by the system linker, all was fine, because the system linker knows how to jump and call from one instruction set to the other. The first problem was with GHCi's object code loader which did not know about Thumb vs Arm. When loading an object file `StgCRun` would jump into the loaded object which could change the mode causing a crash after it returned. This was fixed by forcing all C code to generate Arm instructions by passing `-marm` to GCC. The second problem was the `mkJumpToAddr` function which was generating Thumb instructions. Changing that to generate Arm instructions instead results in a working GHCi on Arm. Test Plan: validate on x86_64 and arm Reviewers: bgamari, austin, hvr Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1323 GHC Trac Issues: #10375
* llvmGen: move to LLVM 3.6 exclusivelyBen Gamari2015-02-091-65/+25
| | | | | | | | | | | | | | | | | | | Summary: Rework llvmGen to use LLVM 3.6 exclusively. The plans for the 7.12 release are to ship LLVM alongside GHC in the interests of user (and developer) sanity. Along the way, refactor TNTC support to take advantage of the new `prefix` data support in LLVM 3.6. This allows us to drop the section-reordering component of the LLVM mangler. Test Plan: Validate, look at emitted code Reviewers: dterei, austin, scpmw Reviewed By: austin Subscribers: erikd, awson, spacekitteh, thomie, carter Differential Revision: https://phabricator.haskell.org/D530 GHC Trac Issues: #10074
* LlvmCodeGen cross-compiling fixes (#9895)Erik de Castro Lopo2014-12-291-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: * Throw an error when cross-compiling without a target definition. When cross compiling via LLVM, a target 'datalayout' and 'triple' must be defined or LLVM will generate code for the compile host instead of the compile target. * Add aarch64-unknown-linux-gnu target. The datalayout and triple lines were found by using clang to compile a small C program and -emit-llvm to get the LLVM IR output. Signed-off-by: Erik de Castro Lopo <erikd@mega-nerd.com> Test Plan: validate Reviewers: rwbarton, carter, hvr, bgamari, austin Reviewed By: austin Subscribers: carter, thomie, garious Differential Revision: https://phabricator.haskell.org/D585 GHC Trac Issues: #9895
* llvmGen: Compatibility with LLVM 3.5 (re #9142)Ben Gamari2014-11-211-7/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to changes in LLVM 3.5 aliases now may only refer to definitions. Previously to handle symbols defined outside of the current commpilation unit GHC would emit both an `external` declaration, as well as an alias pointing to it, e.g., @stg_BCO_info = external global i8 @stg_BCO_info$alias = alias private i8* @stg_BCO_info Where references to `stg_BCO_info` will use the alias `stg_BCO_info$alias`. This is not permitted under the new alias behavior, resulting in errors resembling, Alias must point to a definition i8* @"stg_BCO_info$alias" To fix this, we invert the naming relationship between aliases and definitions. That is, now the symbol definition takes the name `@stg_BCO_info$def` and references use the actual name, `@stg_BCO_info`. This means the external symbols can be handled by simply emitting an `external` declaration, @stg_BCO_info = external global i8 Whereas in the case of a forward declaration we emit, @stg_BCO_info = alias private i8* @stg_BCO_info$def Reviewed By: austin Differential Revision: https://phabricator.haskell.org/D155
* arm64: 64bit iOS and SMP support (#7942)Luke Iannini2014-11-191-0/+3
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* Allow -dead_strip linking on platforms with .subsections_via_symbolsMoritz Angermann2014-11-191-1/+7
| | | | | | | | | | | | | | | | | Summary: This allows to link objects produced with the llvm code generator to be linked with -dead_strip. This applies to at least the iOS cross compiler and OS X compiler. Signed-off-by: Moritz Angermann <moritz@lichtzwerge.de> Test Plan: Create a ffi library and link it with -dead_strip. If the resulting binary does not crash, the patch works as advertised. Reviewers: rwbarton, simonmar, hvr, dterei, mzero, ezyang, austin Reviewed By: dterei, ezyang, austin Subscribers: thomie, mzero, simonmar, ezyang, carter Differential Revision: https://phabricator.haskell.org/D206
* Add LANGUAGE pragmas to compiler/ source filesHerbert Valerio Riedel2014-05-151-1/+2
| | | | | | | | | | | | | | | | | | In some cases, the layout of the LANGUAGE/OPTIONS_GHC lines has been reorganized, while following the convention, to - place `{-# LANGUAGE #-}` pragmas at the top of the source file, before any `{-# OPTIONS_GHC #-}`-lines. - Moreover, if the list of language extensions fit into a single `{-# LANGUAGE ... -#}`-line (shorter than 80 characters), keep it on one line. Otherwise split into `{-# LANGUAGE ... -#}`-lines for each individual language extension. In both cases, try to keep the enumeration alphabetically ordered. (The latter layout is preferable as it's more diff-friendly) While at it, this also replaces obsolete `{-# OPTIONS ... #-}` pragma occurences by `{-# OPTIONS_GHC ... #-}` pragmas.
* Delete trailing whitespace in LlvmCodeGen/Ppr.hsAustin Seipp2013-08-241-2/+1
| | | | Signed-off-by: Austin Seipp <aseipp@pobox.com>
* Add support for iOS simulator (issue #8152).Austin Seipp2013-08-241-0/+3
| | | | | | | | The iOS simulator is essentially an iOS target but for an x86 machine instead. It doesn't support the native code generator either, though. Authored-by: Stephen Blackheath <...@blacksapphire.com> Signed-off-by: Austin Seipp <aseipp@pobox.com>
* Major Llvm refactoringPeter Wortmann2013-06-271-45/+37
| | | | | | | | | | | | | | | | | | | | | | This combined patch reworks the LLVM backend in a number of ways: 1. Most prominently, we introduce a LlvmM monad carrying the contents of the old LlvmEnv around. This patch completely removes LlvmEnv and refactors towards standard library monad combinators wherever possible. 2. Support for streaming - we can now generate chunks of Llvm for Cmm as it comes in. This might improve our speed. 3. To allow streaming, we need a more flexible way to handle forward references. The solution (getGlobalPtr) unifies LlvmCodeGen.Data and getHsFunc as well. 4. Skip alloca-allocation for registers that are actually never written. LLVM will automatically eliminate these, but output is smaller and friendlier to human eyes this way. 5. We use LlvmM to collect references for llvm.used. This allows places other than cmmProcLlvmGens to generate entries.
* Extend globals to aliasesPeter Wortmann2013-06-271-4/+4
| | | | | Also give them a proper constructor - getGlobalVar and getGlobalValue map directly to the accessors.
* Support QNXNTO for arm under LLVMStephen Paul Weber2013-06-201-0/+3
| | | | Signed-off-by: Austin Seipp <aseipp@pobox.com>
* Add iOS specific module layout entry to LLVM codegen; fixes #7721Ian Lynagh2013-03-021-0/+3
| | | | Patch from Stephen Blackheath
* Added support to cross-compile to androidNathan2013-01-241-0/+3
| | | | Signed-off-by: David Terei <davidterei@gmail.com>
* Remove OldCmm, convert backends to consume new CmmSimon Marlow2012-11-121-1/+1
| | | | | | | | | | | | | | | | | | This removes the OldCmm data type and the CmmCvt pass that converts new Cmm to OldCmm. The backends (NCGs, LLVM and C) have all been converted to consume new Cmm. The main difference between the two data types is that conditional branches in new Cmm have both true/false successors, whereas in OldCmm the false case was a fallthrough. To generate slightly better code we occasionally need to invert a conditional to ensure that the branch-not-taken becomes a fallthrough; this was previously done in CmmCvt, and it is now done in CmmContFlowOpt. We could go further and use the Hoopl Block representation for native code, which would mean that we could use Hoopl's postorderDfs and analyses for native code, but for now I've left it as is, using the old ListGraph representation for native code.
* Generate correct LLVM for the new register allocation scheme.Geoffrey Mainland2012-10-301-2/+2
| | | | | | | | | | | | | We now have accurate global register liveness information attached to all Cmm procedures and jumps. With this patch, the LLVM back end uses this information to pass only the live floating point (F and D) registers on tail calls. This makes the LLVM back end compatible with the new register allocation strategy. Ideally the GHC LLVM calling convention would put all registers that are always live first in the parameter sequence. Unfortunately the specification is written so that on x86-64 SpLim (always live) is passed after the R registers. Therefore we must always pass *something* in the R registers, so we pass the LLVM value undef.
* Attach global register liveness info to Cmm procedures.Geoffrey Mainland2012-10-301-1/+1
| | | | | | | All Cmm procedures now include the set of global registers that are live on procedure entry, i.e., the global registers used to pass arguments to the procedure. Only global registers that are use to pass arguments are included in this list.
* Move wORD_SIZE into platformConstantsIan Lynagh2012-09-161-2/+3
|
* Pass DynFlags down to llvmWordIan Lynagh2012-09-161-2/+2
|
* Remove some CPP from llvmGen/LlvmCodeGen/Ppr.hsIan Lynagh2012-08-281-35/+24
| | | | | | I changed the behaviour slightly, e.g. i386/FreeBSD will no longer fall through and use the Linux "i386-pc-linux-gnu", but will get the final empty case instead. I assume that that's the right thing to do.
* Use SDoc rather than Doc in LLVMIan Lynagh2012-06-121-7/+6
| | | | | In particular, this makes life simpler when we want to use a general GHC SDoc in the middle of some LLVM.
* Use Type Based Alias Analysis (TBAA) in LLVM backend (#5567)David Terei2012-01-121-5/+11
| | | | | | | TBAA allows us to specify a type hierachy in metadata with the property that nodes on different branches don't alias. This should somewhat improve the optimizations LLVM does that rely on alias information.
* More improvements to llvm output style (#5750)David Terei2012-01-121-1/+1
|
* Fix trac # 5486David Terei2011-12-051-2/+2
| | | | | | | | | LLVM has a problem when the user imports some FFI types like memcpy and memset in a manner that conflicts with the types that GHC uses internally. So now we pre-initialise the environment with the most general types for these functions.
* More CPP removal: pprDynamicLinkerAsmLabel in CLabelIan Lynagh2011-10-021-3/+3
| | | | And some knock-on changes
* Renaming onlySimon Peyton Jones2011-08-251-4/+4
| | | | | CmmTop -> CmmDecl CmmPgm -> CmmGroup
* enable ARM specific target data layout and triple againKarel Gardas2011-08-211-5/+5
| | | | | | | | | | This patch is allowed by the 'on ARMv7 with VFPv3[D16] support pass appropriate -mattr value to LLVM llc' patch. The trick is that LLVM by default (probably!) does not enable VFP, but GHC requires it so LLVM's llc asserts on unavailable floating point register. i.e. GHC/LLVM backend compiles into LLVM code which is using floats, but llc thinks no float regs for this are available. Passing appropriate llc option which is implemented in patch mentioned above fixes this issue.
* disable for now ARM specific target data layout and tripleKarel Gardas2011-08-101-5/+5
| | | | | | | | This patch disables ARM specific target data layout and triple. The reason for this is that LLVM asserts on some files if this is in use. The assert looks: Formal argument #8 has unhandled type i32UNREACHABLE executed at /llvm-ghc-arm/lib/CodeGen/CallingConvLower.cpp:81!
* fix ARM/LLVM target data layout specification together with target tripleKarel Gardas2011-08-101-2/+2
| | | | | | This patch fixes ARM/LLVM target data layout specification based on what Clang is using itself. I've modified Clang's used triple a little bit from armv4t-* to arm-* though
* LLVM: set target data layout for arm-unknown-linux tripletKarel Gardas2011-08-101-1/+8
|
* Refactoring: explicitly mark whether we have an info table in RawCmmMax Bolingbroke2011-07-061-10/+10
| | | | | | | | | | | | I introduced this to support explicitly recording the info table label in RawCmm for another patch I am working on, but it turned out to lead to significant simplification in those parts of the compiler that consume RawCmm. Now, instead of lots of tests for null [CmmStatic] we have a simple test of a Maybe, and have reduced the number of guys that need to know how to convert entry->info labels by a TON. There are only 3 callers of that function now!
* Refactoring: use a structured CmmStatics type rather than [CmmStatic]Max Bolingbroke2011-07-051-2/+2
| | | | | | | | | | | | | | | | | | I observed that the [CmmStatics] within CmmData uses the list in a very stylised way. The first item in the list is almost invariably a CmmDataLabel. Many parts of the compiler pattern match on this list and fail if this is not true. This patch makes the invariant explicit by introducing a structured type CmmStatics that holds the label and the list of remaining [CmmStatic]. There is one wrinkle: the x86 backend sometimes wants to output an alignment directive just before the label. However, this can be easily fixed up by parameterising the native codegen over the type of CmmStatics (though the GenCmmTop parameterisation) and using a pair (Alignment, CmmStatics) there instead. As a result, I think we will be able to remove CmmAlign and CmmDataLabel from the CmmStatic data type, thus nuking a lot of code and failing pattern matches. This change will come as part of my next patch.
* LLVM: Support LLVM 2.9 (#5103)David Terei2011-05-041-20/+11
| | | | | Instead of using the GNU As subsection feature on Linux/Windows for TNTC we now use the LLVM Mangler on all platforms.
* Merge in new code generator branch.Simon Marlow2011-01-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | This changes the new code generator to make use of the Hoopl package for dataflow analysis. Hoopl is a new boot package, and is maintained in a separate upstream git repository (as usual, GHC has its own lagging darcs mirror in http://darcs.haskell.org/packages/hoopl). During this merge I squashed recent history into one patch. I tried to rebase, but the history had some internal conflicts of its own which made rebase extremely confusing, so I gave up. The history I squashed was: - Update new codegen to work with latest Hoopl - Add some notes on new code gen to cmm-notes - Enable Hoopl lag package. - Add SPJ note to cmm-notes - Improve GC calls on new code generator. Work in this branch was done by: - Milan Straka <fox@ucw.cz> - John Dias <dias@cs.tufts.edu> - David Terei <davidterei@gmail.com> Edward Z. Yang <ezyang@mit.edu> merged in further changes from GHC HEAD and fixed a few bugs.
* LLVM: Use mangler to fix up stack alignment issues on OSXDavid Terei2010-07-181-5/+20
|
* LLVM: Fix mistype in last commit which broke TNTC under win/linux.David Terei2010-07-141-1/+1
|
* LLVM: Add in new LLVM mangler for implementing TNTC on OSXDavid Terei2010-07-131-11/+23
|
* LLVM: Add alias type defenitions to LlvmModule.David Terei2010-07-071-1/+5
|