path: root/compiler/cmm
Commit message | Author | Age | Files | Lines
...
* Strings and comments only: 'to to ' fixesGabor Greif2013-08-221-1/+1
| | | | I'd still prefer if a native English speaker would check them.
* Only use real XMM registers when assigning arguments.Geoffrey Mainland2013-08-061-5/+4
| | | | | | | | My original change to the calling convention mistakenly used all 6 XMM registers---which live in the global register table---on x86 (32 bit). This royally screwed up the floating point code generated for that platform because floating point arguments were passed in global registers instead of on the stack!
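The effect of the fix above can be sketched in a few lines of Haskell. This is only an illustrative model, not GHC's calling-convention code: the names `Loc` and `assignFloatArgs` are hypothetical, and the list of "real" registers is assumed to be supplied by the platform description (empty when no real XMM register may be used for arguments, as the commit message describes for 32-bit x86).

```haskell
-- | Where a floating-point argument ends up (illustrative constructors).
data Loc
  = InXmm Int    -- passed in the real XMM register with this number
  | OnStack Int  -- passed in the stack slot with this index
  deriving (Eq, Show)

-- | Assign locations to a number of floating-point arguments.
-- Real XMM registers are used first; every remaining argument is
-- spilled to consecutive stack slots. Nothing is ever assigned to a
-- register that is not in the list of real registers.
assignFloatArgs :: [Int] -> Int -> [Loc]
assignFloatArgs realRegs nArgs =
  take nArgs (map InXmm realRegs ++ map OnStack [0 ..])
```

With an empty register list every argument lands on the stack, which is the behaviour the commit restores; with six registers only the seventh and later arguments spill.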
* Rename SSE -> XMM for consistency.Geoffrey Mainland2013-08-061-13/+13
| | | | | We were using SSE in some places and XMM in others. Better to keep a consistent naming scheme.
* Implement "roles" into GHC.Richard Eisenberg2013-08-021-2/+40
| | | | | | | | | | | | | | | | Roles are a solution to the GeneralizedNewtypeDeriving type-safety problem. Roles were first described in the "Generative type abstraction" paper, by Stephanie Weirich, Dimitrios Vytiniotis, Simon PJ, and Steve Zdancewic. The implementation is a little different than that paper. For a quick primer, check out Note [Roles] in Coercion. Also see http://ghc.haskell.org/trac/ghc/wiki/Roles and http://ghc.haskell.org/trac/ghc/wiki/RolesImplementation For a more formal treatment, check out docs/core-spec/core-spec.pdf. This fixes Trac #1496, #4846, #7148.
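A small Haskell sketch may help show what roles control. It assumes the `type role` annotation syntax and the `Data.Coerce` module as they appear in later GHC releases, neither of which is confirmed by this commit message; the types `Age` and `Sorted` are made up for illustration.

```haskell
{-# LANGUAGE RoleAnnotations #-}

import Data.Coerce (coerce)

-- A newtype over Int; the two share a runtime representation.
newtype Age = MkAge Int deriving (Eq, Show)

-- The list type is representational in its element type, so a list of
-- Age can be converted to a list of Int without traversing it.
agesToInts :: [Age] -> [Int]
agesToInts = coerce

-- A container whose invariant could depend on the element type's
-- class instances. Marking the parameter nominal forbids conversions
-- between Sorted Age and Sorted Int.
data Sorted a = MkSorted [a]
type role Sorted nominal

-- The following would be rejected by the type checker because the
-- parameter of Sorted is nominal:
--
-- unsafeConvert :: Sorted Age -> Sorted Int
-- unsafeConvert = coerce
```

The commented-out definition is the kind of conversion that GeneralizedNewtypeDeriving could previously perform unchecked.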
* Fix a bug in stack layout with safe foreign calls (#8083)Simon Marlow2013-07-246-20/+21
| | | | | | | We weren't properly tracking the number of stack arguments in the continuation of a foreign call. It happened to work when the continuation was not a join point, but when it was a join point we were using the wrong amount of stack fixup.
* Temporarily disable common block elimination; fixes #8083 for nowIan Lynagh2013-07-231-3/+5
|
* Add support for byte endian swapping for Word 16/32/64.Austin Seipp2013-07-172-0/+2
|     * Exposes bSwap{,16,32,64}# primops
|     * Add a new machop: MO_BSwap
|     * Use a Stg implementation (hs_bswap{16,32,64}) for other implementation in NCG.
|     * Generate bswap in X86 NCG for 32 and 64 bits, and for 16 bits, bswap+shr instead of using xchg.
|     * Generate llvm.bswap intrinsics in llvm codegen.
|
|     Authored-by: Vincent Hanquez <tab@snarc.org>
|     Signed-off-by: Austin Seipp <aseipp@pobox.com>
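The "bswap+shr" strategy for 16-bit values mentioned above can be modelled in plain Haskell. This is a sketch of the arithmetic only, not of the native code generator; it assumes the `byteSwap32` function from `Data.Word` in base, and `bswap16ViaBswap32` is a hypothetical name.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word16, Word32, byteSwap32)

-- | Swap the two bytes of a 16-bit word by widening to 32 bits,
-- reversing all four bytes, and shifting the result right by 16.
-- After the 32-bit swap the two interesting bytes sit in the upper
-- half of the word, so the shift moves them back down.
bswap16ViaBswap32 :: Word16 -> Word16
bswap16ViaBswap32 w =
  fromIntegral (byteSwap32 (fromIntegral w :: Word32) `shiftR` 16)
```

For example, 0x1234 widens to 0x00001234, swaps to 0x34120000, and shifts down to 0x3412.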
* Fix many ASSERT uses under Clang.Austin Seipp2013-06-181-1/+1
| | | | | | Clang doesn't like whitespace between macro and arguments. Signed-off-by: Austin Seipp <aseipp@pobox.com>
* Revert "Add support for byte endian swapping for Word 16/32/64."Simon Peyton Jones2013-06-112-2/+0
| | | | This reverts commit 1c5b0511a89488f5280523569d45ee61c0d09ffa.
* Add support for byte endian swapping for Word 16/32/64.Ian Lynagh2013-06-092-0/+2
|     * Exposes bSwap{,16,32,64}# primops
|     * Add a new machop: MO_BSwap
|     * Use a Stg implementation (hs_bswap{16,32,64}) for other implementation in NCG.
|     * Generate bswap in X86 NCG for 32 and 64 bits, and for 16 bits, bswap+shr instead of using xchg.
|     * Generate llvm.bswap intrinsics in llvm codegen.
|
|     Patch from Vincent Hanquez.
* Implement cardinality analysisSimon Peyton Jones2013-06-061-2/+1
|     This major patch implements the cardinality analysis described
|     in our paper "Higher order cardinality analysis". It is joint
|     work with Ilya Sergey and Dimitrios Vytiniotis.
|
|     The basic idea is to augment the absence-analysis part of the demand
|     analyser so that it can tell when something is used
|          never
|          at most once
|          some other way
|
|     The "at most once" information is used
|         a) to enable transformations, and
|            in particular to identify one-shot lambdas
|         b) to allow updates on thunks to be omitted.
|
|     There are two new flags, mainly there so you can do performance
|     comparisons:
|         -fkill-absence   stops GHC doing absence analysis at all
|         -fkill-one-shot  stops GHC spotting one-shot lambdas
|                          and single-entry thunks
|
|     The big changes are:
|
|     * The Demand type is substantially refactored. In particular
|       the UseDmd is factored as follows
|           data UseDmd
|             = UCall Count UseDmd
|             | UProd [MaybeUsed]
|             | UHead
|             | Used
|
|           data MaybeUsed = Abs | Use Count UseDmd
|
|           data Count = One | Many
|
|       Notice that UCall recurses straight to UseDmd, whereas
|       UProd goes via MaybeUsed.
|
|       The "Count" embodies the "at most once" or "many" idea.
|
|     * The demand analyser itself was refactored a lot
|
|     * The previously ad-hoc stuff in the occurrence analyser for foldr and
|       build goes away entirely. Before if we had build (\cn -> ...x... )
|       then the "\cn" was hackily made one-shot (by spotting 'build' as
|       special). That's essential to allow x to be inlined. Now the
|       occurrence analyser propagates info gotten from 'build's strictness
|       signature (so build isn't special); and that strictness sig is
|       in turn derived entirely automatically. Much nicer!
|
|     * The ticky stuff is improved to count single-entry thunks separately.
|
|     One shortcoming is that there is no DEBUG way to spot if an
|     allegedly-single-entry thunk is actually entered more than once.
|     It would not be hard to generate a bit of code to check for this,
|     and it would be reassuring. But it's fiddly and I have not done it.
|
|     Despite all this fuss, the performance numbers are rather under-whelming.
|     See the paper for more discussion.
|
|            nucleic2          -0.8%    -10.9%      0.10      0.10     +0.0%
|              sphere          -0.7%     -1.5%      0.08      0.08     +0.0%
|     --------------------------------------------------------------------------------
|                 Min          -4.7%    -10.9%     -9.3%     -9.3%    -50.0%
|                 Max          -0.4%     +0.5%     +2.2%     +2.3%     +7.4%
|      Geometric Mean          -0.8%     -0.2%     -1.3%     -1.3%     -1.8%
|
|     I don't quite know how much credence to place in the runtime changes,
|     but movement seems generally in the right direction.
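The refactored usage-demand types quoted in the message above can be written out as a small standalone Haskell module. The three data declarations follow the commit message; the helper functions `lubCount` and `usedAtMostOnce` are hypothetical illustrations of how the `Count` component might be consulted and are not taken from GHC's Demand module.

```haskell
-- | "At most once" versus "possibly many times".
data Count = One | Many
  deriving (Eq, Show)

-- | Usage demand, as given in the commit message. UCall recurses
-- straight to UseDmd, whereas UProd goes via MaybeUsed.
data UseDmd
  = UCall Count UseDmd
  | UProd [MaybeUsed]
  | UHead
  | Used
  deriving (Eq, Show)

-- | Either absent (never used) or used with some cardinality.
data MaybeUsed = Abs | Use Count UseDmd
  deriving (Eq, Show)

-- | Hypothetical helper: combine two counts from alternative paths.
-- The result is One only if both paths use the value at most once.
lubCount :: Count -> Count -> Count
lubCount One One = One
lubCount _   _   = Many

-- | Hypothetical helper: is the value used never or at most once?
-- This is the kind of fact that would justify omitting a thunk update.
usedAtMostOnce :: MaybeUsed -> Bool
usedAtMostOnce Abs          = True
usedAtMostOnce (Use One _)  = True
usedAtMostOnce (Use Many _) = False
```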
* Comments and white space onlySimon Peyton Jones2013-06-061-3/+3
|
* Fix the GHC package DLL-splittingIan Lynagh2013-05-141-2/+2
| | | | | | | There's now an internal -dll-split flag, which we use to tell GHC how the GHC package is split into 2 separate DLLs. This is used by Packages.isDllName to determine whether a call is within the same DLL, or whether it is a call to another DLL.
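The same-DLL test described above can be sketched as follows. This is not the code of `Packages.isDllName`, whose actual signature is not shown in this log; `dllOf`, `crossesDll` and the module names in the usage note are assumptions made for illustration, and the split is modelled as a plain list of module-name lists.

```haskell
type ModuleName = String

-- | Index of the DLL that contains a module, if the module is listed
-- in the split description.
dllOf :: [[ModuleName]] -> ModuleName -> Maybe Int
dllOf split m =
  lookup m [ (name, i) | (i, names) <- zip [0 ..] split, name <- names ]

-- | True when a call from one module to another crosses a DLL
-- boundary, i.e. the two modules are assigned to different DLLs.
-- Two modules that are both absent from the split are treated as
-- being in the same place, which is a simplification of this sketch.
crossesDll :: [[ModuleName]] -> ModuleName -> ModuleName -> Bool
crossesDll split from to = dllOf split from /= dllOf split to
```

For a split such as `[["A", "B"], ["C"]]`, a call from `A` to `B` stays inside one DLL, while a call from `A` to `C` crosses into the other.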
* Make the current module available to labelDynamicIan Lynagh2013-05-131-2/+2
| | | | It doesn't actually use it yet
* Treat foreign imported things in CMM as being in this packageIan Lynagh2013-05-091-1/+1
| | | | | | They used to be treated as being in an external package, which went wrong on Windows (it tried to call them via an imp wrapper, rather than calling them directly).
* In CMM, only allow foreign calls to labels, not arbitrary expressionsIan Lynagh2013-04-244-28/+19
| | | | | | | | | I'm not sure if we want to make this change permanently, but for now it fixes the unreg build. I've also removed some redundant special-case code that generated prototypes for foreign functions. The standard pprTempAndExternDecls now generates them.
* Merge branch 'master' of http://darcs.haskell.org/ghcSimon Peyton Jones2013-04-1913-114/+89
|\
| * Whitespace only in CmmNodeIan Lynagh2013-04-141-21/+14
| |
| * Merge branch 'master' of darcs.haskell.org:/srv/darcs//ghcIan Lynagh2013-04-062-19/+8
| |\
| | * Rewrite usingInconsistentPicReg as a table for clarityGabor Greif2013-04-061-5/+5
| | | | | | | | | | | | No change in functionality intended
| | * Derive instance Eq for CmmNodeGabor Greif2013-04-061-14/+3
| | |
| * | Detab modules with tabs on 5 lines or fewerIan Lynagh2013-04-063-32/+13
| |/
| * Fix typosGabor Greif2013-04-061-3/+3
| |
| * ticky enhancementsNicolas Frisby2013-03-292-14/+43
| |     * the new StgCmmArgRep module breaks a dependency cycle; I also
| |       untabified it, but made no real changes
| |
| |     * updated the documentation in the wiki and changed the user guide to
| |       point there
| |
| |     * moved the allocation enters for ticky and CCS to after the heap check
| |
| |       * I left LDV where it was, which was before the heap check at least
| |         once, since I have no idea what it is
| |
| |     * standardized all (active?) ticky alloc totals to bytes
| |
| |     * in order to avoid double counting StgCmmLayout.adjustHpBackwards
| |       no longer bumps ALLOC_HEAP_ctr
| |
| |     * I resurrected the SLOW_CALL counters
| |
| |       * the new module StgCmmArgRep breaks cyclic dependency between
| |         Layout and Ticky (which the SLOW_CALL counters cause)
| |
| |       * renamed them SLOW_CALL_fast_<pattern> and VERY_SLOW_CALL
| |
| |     * added ALLOC_RTS_ctr and _tot ticky counters
| |
| |       * eg allocation by Storage.c:allocate or a BUILD_PAP in stg_ap_*_info
| |
| |     * resurrected ticky counters for ALLOC_THK, ALLOC_PAP, and ALLOC_PRIM
| |
| |     * added -ticky and -DTICKY_TICKY in ways.mk for debug ways
| |
| |     * added a ticky counter for total LNE entries
| |
| |     * new flags for ticky: -ticky-allocd -ticky-dyn-thunk -ticky-LNE
| |
| |       * all off by default
| |
| |       * -ticky-allocd: tracks allocation *of* closure in addition to
| |         allocation *by* that closure
| |
| |       * -ticky-dyn-thunk tracks dynamic thunks as if they were functions
| |
| |       * -ticky-LNE tracks LNEs as if they were functions
| |
| |     * updated the ticky report format, including making the argument
| |       categories (more?) accurate again
| |
| |     * the printed name for things in the report includes the unique of
| |       their ticky parent as well as if they are not top-level
| * Remove unnecessary warnings suppressions, fixes ticket #7756; thanks monoidal for submitting.Edward Z. Yang2013-03-095-11/+1
| | | | | | | | | | | | Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
| * Remove warning-suppression (not needed)Simon Peyton Jones2013-03-091-5/+0
| |
| * Remove unused functions cmmConstrTag, cmmGetTagSimon Peyton Jones2013-03-091-7/+4
| | | | | | | | | | Patch offered by Boris Sukholitko <boriss@gmail.com> Trac #7757
| * commentsSimon Marlow2013-03-051-2/+3
| |
* | Comment onlySimon Peyton Jones2013-04-191-1/+1
|/
* Mimic OldCmm basic block ordering in the LLVM backend.Geoffrey Mainland2013-02-011-1/+30
| | | | | | | | | In OldCmm, the false case of a conditional was a fallthrough. In Cmm, conditionals have both true and false successors. When we convert Cmm to LLVM, we now first re-order Cmm blocks so that the false successor of a conditional occurs next in the list of basic blocks, i.e., it is a fallthrough, just like it (necessarily) did in OldCmm. Surprisingly, this can make a big performance difference.
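The reordering described above can be sketched with a toy control-flow graph. This is a simplified model under stated assumptions, not the LLVM backend's code: the types `Label` and `Term` and the function `layoutBlocks` are hypothetical, blocks are given as an association list, and the only goal is that the false successor of a conditional is placed directly after it whenever it has not already been placed.

```haskell
type Label = Int

-- | How a block ends (illustrative).
data Term
  = Goto Label
  | CondBranch Label Label  -- true successor, then false successor
  | Return

-- | Order blocks so that the false successor of each conditional
-- follows it immediately when possible, making it a fallthrough.
layoutBlocks :: [(Label, Term)] -> [Label]
layoutBlocks blocks = go [] (map fst blocks)
  where
    -- Walk the blocks in their original order, skipping any block
    -- that an earlier fallthrough chain has already placed.
    go placed [] = reverse placed
    go placed (l : rest)
      | l `elem` placed = go placed rest
      | otherwise       = go (chain l placed) rest

    -- Place a block, then keep following false successors for as
    -- long as they are known and not yet placed.
    chain l placed
      | l `elem` placed = placed
      | otherwise =
          case lookup l blocks of
            Nothing               -> placed
            Just (CondBranch _ f) -> chain f (l : placed)
            Just _                -> l : placed
```

Given blocks listed as 1, 3, 2 where block 1 branches to 3 when true and 2 when false, the result is 1, 2, 3, so the false case becomes the fallthrough.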
* Add prefetch primops.Geoffrey Mainland2013-02-012-0/+5
|
* Add support for passing SSE vectors in registers.Geoffrey Mainland2013-02-015-19/+51
| | | | | | | This patch adds support for 6 XMM registers on x86-64 which overlap with the F and D registers and may hold 128-bit wide SIMD vectors. Because there is not a good way to attach type information to STG registers, we aggressively bitcast in the LLVM back-end.
* Add the Int32X4# primitive type and associated primops.Paul Monday2013-02-012-0/+52
|
* Add the Float32X4# primitive type and associated primops.Geoffrey Mainland2013-02-012-0/+59
|     This patch lays the groundwork needed for primop support for SIMD vectors. In addition to the groundwork, we add support for the FloatX4# primitive type and associated primops.
|
|     * Add the FloatX4# primitive type and associated primops.
|     * Add CodeGen support for Float vectors.
|     * Compile vector operations to LLVM vector operations in the LLVM code generator.
|     * Make the x86 native backend fail gracefully when encountering vector primops.
|     * Only generate primop wrappers for vector primops when using LLVM.
* Always pass vector values on the stack.Geoffrey Mainland2013-02-011-10/+24
| | | | | Vector values are now always passed on the stack. This isn't particularly efficient, but it will have to do for now.
* Add a bits128 type to C--.Geoffrey Mainland2013-02-012-0/+5
|
* Add Cmm support for representing 128-bit-wide SIMD vectors.Geoffrey Mainland2013-02-016-15/+89
|
* Merge branch 'master' of http://darcs.haskell.org/ghcSimon Peyton Jones2013-01-301-2/+5
|\ | | | | | | | | Conflicts: compiler/types/Coercion.lhs
| * hopefully fix #7620Simon Marlow2013-01-291-2/+5
| |
* | Merge branch 'master' of http://darcs.haskell.org/ghcSimon Peyton Jones2013-01-243-2/+156
|\ \ | |/
| * Tidy up: move info-table related stuff to CmmInfoSimon Marlow2013-01-233-2/+156
| | | | | | | | Prep for #709
* | Introduce CPR for sum types (Trac #5075)Simon Peyton Jones2013-01-241-1/+0
|/
|     The main payload of this patch is to extend CPR so that it
|     detects when a function always returns a result constructed
|     with the *same* constructor, even if the constructor comes from
|     a sum type. This doesn't matter very often, but it does improve
|     some things (results below).
|
|     Binary sizes increase a little bit, I think because there are more
|     wrappers. This with -split-objs. Without split-objs binary sizes
|     increased by 6% even for HelloWorld.hs. It's hard to see exactly why,
|     but I think it was because System.Posix.Types.o got included in the
|     linked binary, whereas it didn't before.
|
|             Program           Size    Allocs   Runtime   Elapsed  TotalMem
|               fluid          +1.8%     -0.3%      0.01      0.01     +0.0%
|                 tak          +2.2%     -0.2%      0.02      0.02     +0.0%
|                ansi          +1.7%     -0.3%      0.00      0.00     +0.0%
|           cacheprof          +1.6%     -0.3%     +0.6%     +0.5%     +1.4%
|             parstof          +1.4%     -4.4%      0.00      0.00     +0.0%
|             reptile          +2.0%     +0.3%      0.02      0.02     +0.0%
|     ----------------------------------------------------------------------
|                 Min          +1.1%     -4.4%     -4.7%     -4.7%    -15.0%
|                 Max          +2.3%     +0.3%     +8.3%     +9.4%    +50.0%
|      Geometric Mean          +1.9%     -0.1%     +0.6%     +0.7%     +0.3%
|
|     Other things in this commit
|     ~~~~~~~~~~~~~~~~~~~~~~~~~~~
|     * Got rid of the Lattice class in Demand
|     * Refactored the way that products and newtypes are decomposed
|       (no change in functionality)
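A small Haskell example may make the "same constructor of a sum type" case concrete. The type `Shape` and all function names are made up for illustration, and the worker/wrapper pair is written by hand with an ordinary tuple; it only approximates the shape of the split the optimiser could perform and is not GHC's actual output.

```haskell
-- | A sum type with two constructors.
data Shape = Circle Int | Rect Int Int
  deriving (Eq, Show)

-- | Every branch returns a Rect, even though Shape is a sum type.
-- This is the situation the extended CPR analysis is said to detect.
mkRect :: Int -> Shape
mkRect n
  | n > 0     = Rect n (n * 2)
  | otherwise = Rect 0 0

-- | Hand-written "worker": returns only the constructor's fields,
-- so no Shape value needs to be built inside it.
mkRectWorker :: Int -> (Int, Int)
mkRectWorker n
  | n > 0     = (n, n * 2)
  | otherwise = (0, 0)

-- | Hand-written "wrapper": reboxes the fields with the constructor
-- that the worker is known to produce.
mkRectWrapper :: Int -> Shape
mkRectWrapper n = case mkRectWorker n of
  (w, h) -> Rect w h
```

The wrapper is small enough to inline at call sites, where the `Rect` it builds can often be consumed immediately, which fits the allocation reductions reported above.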
* Rename all of the 'cmmz' flags and make them more consistent.Austin Seipp2012-12-192-20/+19
| | | | | | | | | | | | | | | | There's only a single compiler backend now, so the 'z' suffix means nothing. Also, the flags were confusingly named ('cmm-foo' vs 'foo-cmm',) and counter-intuitively, '-ddump-cmm' did not do at all what you expected since the new backend went live. Basically, all of the -ddump-cmmz-* flags are now -ddump-cmm-*. Some were renamed to be more consistent. This doesn't update the manual; it already mentions '-ddump-cmm' and that flag implies all the others anyway, which is probably what you want. Signed-off-by: Austin Seipp <mad.one@gmail.com>
* Implement word2Float# and word2Double#Johan Tibell2012-12-132-0/+3
|
* Pessimistically assume that unknown arches can't do unaligned loadsIan Lynagh2012-12-071-0/+3
|
* Tweak commentsIan Lynagh2012-12-021-2/+3
|
* Fix broken -fPIC on Darwin/PPC (#7442)PHO2012-11-241-4/+12
| | | | The workaround described in note [darwin-x86-pic] applies to Darwin/PPC too.
* C backend: put the entry block firstSimon Marlow2012-11-191-1/+1
|
* Code-size optimisation for top-level indirections (#7308)Simon Marlow2012-11-192-2/+12
|     Top-level indirections are often generated when there is a cast, e.g.
|
|         foo :: T
|         foo = bar `cast` (some coercion)
|
|     For these we were generating a full-blown CAF, which is a fair chunk of code.
|
|     This patch makes these indirections generate a single IND_STATIC closure (4 words) instead. This is exactly what the CAF would evaluate to eventually anyway, we're just shortcutting the whole process.
* C backend: ignore MO_TouchSimon Marlow2012-11-161-0/+2
|