| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
| |
I'd still prefer if a native english speaker would check them.
|
|
|
|
|
|
|
|
| |
My original change to the calling convention mistakenly used all 6 XMM
registers---which live in the global register table---on x86 (32 bit). This
royally screwed up the floating point code generated for that platform because
floating point arguments were passed in global registers instead of on the
stack!
|
|
|
|
|
| |
We were using SSE is some places and XMM in others. Better to keep a consistent
naming scheme.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Roles are a solution to the GeneralizedNewtypeDeriving type-safety
problem.
Roles were first described in the "Generative type abstraction" paper,
by Stephanie Weirich, Dimitrios Vytiniotis, Simon PJ, and Steve Zdancewic.
The implementation is a little different than that paper. For a quick
primer, check out Note [Roles] in Coercion. Also see
http://ghc.haskell.org/trac/ghc/wiki/Roles
and
http://ghc.haskell.org/trac/ghc/wiki/RolesImplementation
For a more formal treatment, check out docs/core-spec/core-spec.pdf.
This fixes Trac #1496, #4846, #7148.
|
|
|
|
|
|
|
| |
We weren't properly tracking the number of stack arguments in the
continuation of a foreign call. It happened to work when the
continuation was not a join point, but when it was a join point we
were using the wrong amount of stack fixup.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Exposes bSwap{,16,32,64}# primops
* Add a new machop: MO_BSwap
* Use a Stg implementation (hs_bswap{16,32,64}) for other implementation
in NCG.
* Generate bswap in X86 NCG for 32 and 64 bits, and for 16 bits, bswap+shr
instead of using xchg.
* Generate llvm.bswap intrinsics in llvm codegen.
Authored-by: Vincent Hanquez <tab@snarc.org>
Signed-off-by: Austin Seipp <aseipp@pobox.com>
|
|
|
|
|
|
| |
Clang doesn't like whitespace between macro and arguments.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
|
|
|
|
| |
This reverts commit 1c5b0511a89488f5280523569d45ee61c0d09ffa.
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Exposes bSwap{,16,32,64}# primops
* Add a new machops MO_BSwap
* Use a Stg implementation (hs_bswap{16,32,64}) for other implementation
in NCG.
* Generate bswap in X86 NCG for 32 and 64 bits, and for 16 bits, bswap+shr
instead of using xchg.
* Generate llvm.bswap intrinsics in llvm codegen.
Patch from Vincent Hanquez.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This major patch implements the cardinality analysis described
in our paper "Higher order cardinality analysis". It is joint
work with Ilya Sergey and Dimitrios Vytiniotis.
The basic is augment the absence-analysis part of the demand
analyser so that it can tell when something is used
never
at most once
some other way
The "at most once" information is used
a) to enable transformations, and
in particular to identify one-shot lambdas
b) to allow updates on thunks to be omitted.
There are two new flags, mainly there so you can do performance
comparisons:
-fkill-absence stops GHC doing absence analysis at all
-fkill-one-shot stops GHC spotting one-shot lambdas
and single-entry thunks
The big changes are:
* The Demand type is substantially refactored. In particular
the UseDmd is factored as follows
data UseDmd
= UCall Count UseDmd
| UProd [MaybeUsed]
| UHead
| Used
data MaybeUsed = Abs | Use Count UseDmd
data Count = One | Many
Notice that UCall recurses straight to UseDmd, whereas
UProd goes via MaybeUsed.
The "Count" embodies the "at most once" or "many" idea.
* The demand analyser itself was refactored a lot
* The previously ad-hoc stuff in the occurrence analyser for foldr and
build goes away entirely. Before if we had build (\cn -> ...x... )
then the "\cn" was hackily made one-shot (by spotting 'build' as
special. That's essential to allow x to be inlined. Now the
occurrence analyser propagates info gotten from 'build's stricness
signature (so build isn't special); and that strictness sig is
in turn derived entirely automatically. Much nicer!
* The ticky stuff is improved to count single-entry thunks separately.
One shortcoming is that there is no DEBUG way to spot if an
allegedly-single-entry thunk is acually entered more than once. It
would not be hard to generate a bit of code to check for this, and it
would be reassuring. But it's fiddly and I have not done it.
Despite all this fuss, the performance numbers are rather under-whelming.
See the paper for more discussion.
nucleic2 -0.8% -10.9% 0.10 0.10 +0.0%
sphere -0.7% -1.5% 0.08 0.08 +0.0%
--------------------------------------------------------------------------------
Min -4.7% -10.9% -9.3% -9.3% -50.0%
Max -0.4% +0.5% +2.2% +2.3% +7.4%
Geometric Mean -0.8% -0.2% -1.3% -1.3% -1.8%
I don't quite know how much credence to place in the runtime changes,
but movement seems generally in the right direction.
|
| |
|
|
|
|
|
|
|
| |
There's now an internal -dll-split flag, which we use to tell GHC how
the GHC package is split into 2 separate DLLs. This is used by
Packages.isDllName to determine whether a call is within the same
DLL, or whether it is a call to another DLL.
|
|
|
|
| |
It doesn't actually use it yet
|
|
|
|
|
|
| |
They used to be treated as being in an exnternal package, which went
wrong on Windows (it tried to call them via an imp wrapper, rather
than calling them directly).
|
|
|
|
|
|
|
|
|
| |
I'm not sure if we want to make this change permanently, but for now it
fixes the unreg build.
I've also removed some redundant special-case code that generated
prototypes for foreign functions. The standard pprTempAndExternDecls
now generates them.
|
|\ |
|
| | |
|
| |\ |
|
| | |
| | |
| | |
| | | |
No change in functionality intended
|
| | | |
|
| |/ |
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* the new StgCmmArgRep module breaks a dependency cycle; I also
untabified it, but made no real changes
* updated the documentation in the wiki and change the user guide to
point there
* moved the allocation enters for ticky and CCS to after the heap check
* I left LDV where it was, which was before the heap check at least
once, since I have no idea what it is
* standardized all (active?) ticky alloc totals to bytes
* in order to avoid double counting StgCmmLayout.adjustHpBackwards
no longer bumps ALLOC_HEAP_ctr
* I resurrected the SLOW_CALL counters
* the new module StgCmmArgRep breaks cyclic dependency between
Layout and Ticky (which the SLOW_CALL counters cause)
* renamed them SLOW_CALL_fast_<pattern> and VERY_SLOW_CALL
* added ALLOC_RTS_ctr and _tot ticky counters
* eg allocation by Storage.c:allocate or a BUILD_PAP in stg_ap_*_info
* resurrected ticky counters for ALLOC_THK, ALLOC_PAP, and
ALLOC_PRIM
* added -ticky and -DTICKY_TICKY in ways.mk for debug ways
* added a ticky counter for total LNE entries
* new flags for ticky: -ticky-allocd -ticky-dyn-thunk -ticky-LNE
* all off by default
* -ticky-allocd: tracks allocation *of* closure in addition to
allocation *by* that closure
* -ticky-dyn-thunk tracks dynamic thunks as if they were functions
* -ticky-LNE tracks LNEs as if they were functions
* updated the ticky report format, including making the argument
categories (more?) accurate again
* the printed name for things in the report include the unique of
their ticky parent as well as if they are not top-level
|
| |
| |
| |
| |
| |
| | |
monoidal for submitting.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
|
| | |
|
| |
| |
| |
| |
| | |
Patch offered by Boris Sukholitko <boriss@gmail.com>
Trac #7757
|
| | |
|
|/ |
|
|
|
|
|
|
|
|
|
| |
In OldCmm, the false case of a conditional was a fallthrough. In Cmm,
conditionals have both true and false successors. When we convert Cmm to LLVM,
we now first re-order Cmm blocks so that the false successor of a conditional
occurs next in the list of basic blocks, i.e., it is a fallthrough, just like it
(necessarily) did in OldCmm. Surprisingly, this can make a big performance
difference.
|
| |
|
|
|
|
|
|
|
| |
This patch adds support for 6 XMM registers on x86-64 which overlap with the F
and D registers and may hold 128-bit wide SIMD vectors. Because there is not a
good way to attach type information to STG registers, we aggressively bitcast in
the LLVM back-end.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch lays the groundwork needed for primop support for SIMD vectors. In
addition to the groundwork, we add support for the FloatX4# primitive type and
associated primops.
* Add the FloatX4# primitive type and associated primops.
* Add CodeGen support for Float vectors.
* Compile vector operations to LLVM vector operations in the LLVM code
generator.
* Make the x86 native backend fail gracefully when encountering vector primops.
* Only generate primop wrappers for vector primops when using LLVM.
|
|
|
|
|
| |
Vector values are now always passed on the stack. This isn't particularly
efficient, but it will have to do for now.
|
| |
|
| |
|
|\
| |
| |
| |
| | |
Conflicts:
compiler/types/Coercion.lhs
|
| | |
|
|\ \
| |/ |
|
| |
| |
| |
| | |
Prep for #709
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The main payload of this patch is to extend CPR so that it
detects when a function always returns a result constructed
with the *same* constructor, even if the constructor comes from
a sum type. This doesn't matter very often, but it does improve
some things (results below).
Binary sizes increase a little bit, I think because there are more
wrappers. This with -split-objs. Without split-ojbs binary sizes
increased by 6% even for HelloWorld.hs. It's hard to see exactly why,
but I think it was because System.Posix.Types.o got included in the
linked binary, whereas it didn't before.
Program Size Allocs Runtime Elapsed TotalMem
fluid +1.8% -0.3% 0.01 0.01 +0.0%
tak +2.2% -0.2% 0.02 0.02 +0.0%
ansi +1.7% -0.3% 0.00 0.00 +0.0%
cacheprof +1.6% -0.3% +0.6% +0.5% +1.4%
parstof +1.4% -4.4% 0.00 0.00 +0.0%
reptile +2.0% +0.3% 0.02 0.02 +0.0%
----------------------------------------------------------------------
Min +1.1% -4.4% -4.7% -4.7% -15.0%
Max +2.3% +0.3% +8.3% +9.4% +50.0%
Geometric Mean +1.9% -0.1% +0.6% +0.7% +0.3%
Other things in this commit
~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Got rid of the Lattice class in Demand
* Refactored the way that products and newtypes are
decomposed (no change in functionality)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's only a single compiler backend now, so the 'z' suffix means
nothing. Also, the flags were confusingly named ('cmm-foo' vs
'foo-cmm',) and counter-intuitively, '-ddump-cmm' did not do at all what
you expected since the new backend went live.
Basically, all of the -ddump-cmmz-* flags are now -ddump-cmm-*. Some were
renamed to be more consistent.
This doesn't update the manual; it already mentions '-ddump-cmm' and
that flag implies all the others anyway, which is probably what you
want.
Signed-off-by: Austin Seipp <mad.one@gmail.com>
|
| |
|
| |
|
| |
|
|
|
|
| |
The workaround described in note [darwin-x86-pic] applies to Darwin/PPC too.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Top-level indirections are often generated when there is a cast, e.g.
foo :: T
foo = bar `cast` (some coercion)
For these we were generating a full-blown CAF, which is a fair chunk
of code.
This patch makes these indirections generate a single IND_STATIC
closure (4 words) instead. This is exactly what the CAF would
evaluate to eventually anyway, we're just shortcutting the whole
process.
|
| |
|