|  | Commit message | Author | Age | Files | Lines | 
|---|
| | I've switched to passing DynFlags rather than Platform, as (a) it's
simpler to not have to extract targetPlatform in so many places, and
(b) it may be useful to have DynFlags around in future. | 
| | Simon Marlow spotted that we were #include'ing MachRegs.h several times,
but that doesn't work as (a) it uses ifdeffery to avoid being included
multiple times, and (b) even if we work around that, then the #define's
from previous inclusions are still defined when we #include it again.
So we now put the platform code for each platform in a separate .hs file. | 
| | This means that we now generate the same code whatever platform we are
on, which should help avoid changes on one platform breaking the build
on another.
It's also another step towards full cross-compilation. | 
| | No functional differences yet | 
| | This is a first step on the way to refactoring the FastString type.
FastBytes currently has no unique, mainly because there isn't currently
a nice way to produce them in Binary.
Also, we don't currently do the "Dictionary" thing with FastBytes in
Binary. I'm not sure whether this is important.
We can change both decisions later, but in the meantime this gets the
refactoring underway. | 
| | We now carry around with CmmJump statements a list of
the STG registers that are live at that jump site.
This is used by the LLVM backend so it can avoid
unnecessarily passing around dead registers, improving
performance. This gives us the framework to finally
fix trac #4308. | 
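To make the idea concrete, here is a minimal C sketch, not GHC's code. It assumes a hypothetical bitmask encoding of a few STG registers (`GlobalReg`, `RegSet`, `REG_BIT`, `UNDEF_ARG` and `build_jump_args` are all illustrative names). Given the set of registers live at a jump, a backend can pass real values only for those and a placeholder, standing in for LLVM's `undef`, for the dead ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the STG registers R1..R4. */
typedef enum { REG_R1, REG_R2, REG_R3, REG_R4, NUM_REGS } GlobalReg;

/* A set of registers, one bit per GlobalReg. */
typedef uint32_t RegSet;

#define REG_BIT(r) ((RegSet)1u << (r))
#define UNDEF_ARG (-1L) /* placeholder standing in for LLVM's undef */

/* Fill args[] with what to pass for each register at a jump site.
 * Live registers get their current value; dead ones get UNDEF_ARG, so
 * the backend never has to load or keep their values alive.
 * Returns the number of live registers passed. */
size_t build_jump_args(RegSet live, const long regs[NUM_REGS],
                       long args[NUM_REGS])
{
    size_t passed = 0;
    for (int r = 0; r < NUM_REGS; r++) {
        if (live & REG_BIT(r)) {
            args[r] = regs[r];
            passed++;
        } else {
            args[r] = UNDEF_ARG;
        }
    }
    return passed;
}
```

For example, with only R1 and R3 live, `build_jump_args` passes their values and marks R2 and R4 as undefined.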
| | This means that both time and heap profiling work for parallel
programs.  Main internal changes:
  - CCCS is no longer a global variable; it is now another
    pseudo-register in the StgRegTable struct.  Thus every
    Capability has its own CCCS.
  - There is a new built-in CCS called "IDLE", which records ticks for
    Capabilities in the idle state.  If you profile a single-threaded
    program with +RTS -N2, you'll see about 50% of time in "IDLE".
  - There is appropriate locking in rts/Profiling.c to protect the
    shared cost-centre-stack data structures.
This patch does enough to get it working, but I have cut one big corner:
the cost-centre-stack data structure is still shared amongst all
Capabilities, which means that multiple Capabilities will race when
updating the "allocations" and "entries" fields of a CCS.  Not only
does this give unpredictable results, but it runs very slowly due to
cache line bouncing.
It is strongly recommended that you use -fno-prof-count-entries to
disable the "entries" count when profiling parallel programs. (I shall
add a note to this effect to the docs). | 
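A simplified C sketch of the structural change described above, assuming hypothetical, cut-down versions of the RTS types: the current cost-centre stack lives in each Capability's register table, and ticks taken while a Capability is idle go to a built-in `IDLE` stack. The field and variable names echo the description but are illustrative only, and the sketch is single-threaded, so it sidesteps the locking and the races on shared counters mentioned above.

```c
/* Hypothetical, heavily simplified stand-ins for the RTS structures. */
typedef struct CostCentreStack {
    const char *label;
    unsigned long ticks;   /* profiling samples attributed to this stack */
} CostCentreStack;

/* The current cost-centre stack (CCCS) is a field of each Capability's
 * register table instead of a single global variable. */
typedef struct StgRegTable {
    CostCentreStack *rCCCS;
} StgRegTable;

typedef struct Capability {
    StgRegTable r;
    int idle;              /* non-zero when there is no Haskell work to run */
} Capability;

/* Built-in stack that records ticks for idle Capabilities. */
CostCentreStack CCS_IDLE = { "IDLE", 0 };

/* On each profiling tick, charge the sample to this Capability's own CCCS,
 * or to IDLE if the Capability is idle.  Single-threaded sketch: a real
 * RTS must protect updates to stacks shared between Capabilities. */
void profile_tick(Capability *cap)
{
    CostCentreStack *ccs = cap->idle ? &CCS_IDLE : cap->r.rCCCS;
    ccs->ticks++;
}
```

With one busy and one idle Capability, roughly half the ticks land in `IDLE`, which matches the +RTS -N2 observation above.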
| | This field was doing nothing.  I think it originally appeared in a
very old incarnation of the new code generator. | 
| | It's still a panic, as it wouldn't be trivial to give a proper error
at the point that we generate it, but it's now a bit nicer:
    Registers above R10 are not supported (tried to use R11) | 
| | See Note [atomic CAFs] in rts/sm/Storage.c | 
| | LitInteger now carries around the id of mkInteger, which it uses
to construct the core to build Integer literals. This way we don't
have to build in info about lots of Ids.
We also no longer have any special-casing for integer-simple, so
there is less code involved. | 
| | We now treat them as literals until CorePrep, when we finally
convert them into the real Core representation. This makes it a lot
simpler to implement built-in rules on them. | 
| | CmmTop -> CmmDecl
CmmPgm -> CmmGroup | 
| | I observed that the [CmmStatics] within CmmData uses the list in a very stylised way.
The first item in the list is almost invariably a CmmDataLabel. Many parts of the
compiler pattern match on this list and fail if this is not true.
This patch makes the invariant explicit by introducing a structured type CmmStatics
that holds the label and the list of remaining [CmmStatic].
There is one wrinkle: the x86 backend sometimes wants to output an alignment directive just
before the label. However, this can be easily fixed up by parameterising the native codegen
over the type of CmmStatics (through the GenCmmTop parameterisation) and using a pair
(Alignment, CmmStatics) there instead.
As a result, I think we will be able to remove CmmAlign and CmmDataLabel from the CmmStatic
data type, thus nuking a lot of code and failing pattern matches. This change will come as part
of my next patch. | 
| | assignTemp_ is intended to make sure that the expression gets assigned
to a temporary in case that's needed in order to avoid a register
getting trashed due to a function call. | 
| | This changes the new code generator to make use of the Hoopl package
for dataflow analysis.  Hoopl is a new boot package, and is maintained
in a separate upstream git repository (as usual, GHC has its own
lagging darcs mirror in http://darcs.haskell.org/packages/hoopl).
During this merge I squashed recent history into one patch.  I tried
to rebase, but the history had some internal conflicts of its own
which made rebase extremely confusing, so I gave up. The history I
squashed was:
  - Update new codegen to work with latest Hoopl
  - Add some notes on new code gen to cmm-notes
  - Enable Hoopl lag package.
  - Add SPJ note to cmm-notes
  - Improve GC calls on new code generator.
Work in this branch was done by:
   - Milan Straka <fox@ucw.cz>
   - John Dias <dias@cs.tufts.edu>
   - David Terei <davidterei@gmail.com>
Edward Z. Yang <ezyang@mit.edu> merged in further changes from GHC HEAD
and fixed a few bugs. | 
| | This is already handled by the Cmm code generator, so LLVM is simply
duplicating work. LLVM also doesn't know which ones are actually live,
so it saves them all, which causes a considerable performance overhead for C calls
on x64. We stop LLVM from saving them across the call by storing undef to
them just before the call. | 
| | This was done as part of an honours thesis at UNSW. The paper describing the
work and results can be found at:
http://www.cse.unsw.edu.au/~pls/thesis/davidt-thesis.pdf
A homepage for the backend can be found at:
http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/Backends/LLVM
Quick summary of performance: for the 'nofib' benchmark suite, runtimes
are up to 5% slower than with the NCG and generally better than with the C code
generator.  For some code though, such as the DPH project benchmarks, the LLVM
code generator outperforms both the NCG and the C code generator, reducing
run times by about 25%. | 
| | Allow a temporary assignment to be pushed past an assignment to a
global if the global is not mentioned in the rhs of the assignment we
are inlining.
This fixes up some bad code.  We should make sure we're doing
something equivalent in the new backend in due course. | 
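The side condition can be sketched in C as a simple set test. This is a hypothetical model, not the code generator's actual representation: each global register is a bit, a temporary assignment is summarised by the globals its right-hand side reads, and other effects a real optimiser would have to check (such as memory writes) are ignored. `TempAssign`, `GlobalAssign` and `can_push_past` are illustrative names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: one bit per global register. */
typedef uint32_t GlobalSet;
enum { G_SP = 1u << 0, G_HP = 1u << 1, G_R1 = 1u << 2 };

/* A temporary assignment `t = e`, summarised by the globals e reads. */
typedef struct { GlobalSet rhs_reads; } TempAssign;

/* An assignment `G = ...` to one or more global registers. */
typedef struct { GlobalSet assigned; } GlobalAssign;

/* `t = e` may be pushed past `G = ...` only if e does not mention G;
 * otherwise e would see the new value of G rather than the old one.
 * Other effects (e.g. memory writes) are deliberately ignored here. */
bool can_push_past(TempAssign t, GlobalAssign g)
{
    return (t.rhs_reads & g.assigned) == 0;
}
```

A temporary whose right-hand side reads Sp and R1 can move past an assignment to Hp, but not past one to R1.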
| | The type of the CmmLabel ctor is now
  CmmLabel :: PackageId -> FastString -> CmmLabelInfo -> CLabel
  
 - When you construct a CmmLabel you have to explicitly say what
   package it is in. Many of these will just use rtsPackageId, but
   I've left it this way to remind people not to pretend labels are
   in the RTS package when they're not. 
   
 - When parsing a Cmm file, labels that are not defined in the 
   current file are assumed to be in the RTS package. 
   
   Labels imported like
      import label
   are assumed to be in a generic "foreign" package, which is different
   from the current one.
   
   Labels imported like
      import "package-name" label
   are marked as coming from the named package.
   
   This last one is needed for the integer-gmp library as we want to
   refer to labels that are not in the same compilation unit, but
   are in the same non-rts package.
   
   This should help remove the nasty #ifdef __PIC__ stuff from
   integer-gmp/cbits/gmp-wrappers.cmm | 
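The three rules for resolving a label's package can be summarised as a small decision function. This C sketch is illustrative only: `LabelPackage`, `LabelOrigin` and `classify_label` are hypothetical names, and it assumes that a label defined in the file being parsed belongs to the current package.

```c
#include <stddef.h>

/* Hypothetical classification of where a label used in a .cmm file lives. */
typedef enum {
    PKG_CURRENT,   /* defined in this file (assumed: current package) */
    PKG_RTS,       /* not defined here and not imported: assume the RTS */
    PKG_FOREIGN,   /* `import label`: some other, unnamed package */
    PKG_NAMED      /* `import "package-name" label` */
} LabelPackage;

/* How a label came into scope while parsing the file. */
typedef struct {
    int defined_here;        /* non-zero if the file defines the label */
    int imported;            /* non-zero if there was an `import` for it */
    const char *import_pkg;  /* package from `import "pkg" label`, or NULL */
} LabelOrigin;

LabelPackage classify_label(const LabelOrigin *o)
{
    if (o->defined_here)
        return PKG_CURRENT;
    if (o->imported)
        return (o->import_pkg != NULL && o->import_pkg[0] != '\0')
                   ? PKG_NAMED : PKG_FOREIGN;
    return PKG_RTS;
}
```

Under this reading, a label imported with `import "integer-gmp" label` is tagged with that package, which is what lets integer-gmp refer to its own labels across compilation units.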
| | The first phase of this tidyup is focussed on the header files, and in
particular making sure we are exposing publicly exactly what we need
to, and no more.
 - Rts.h now includes everything that the RTS exposes publicly,
   rather than a random subset of it.
 - Most of the public header files have moved into subdirectories, and
   many of them have been renamed.  But clients should not need to
   include any of the other headers directly, just #include the main
   public headers: Rts.h, HsFFI.h, RtsAPI.h.
 - All the headers needed for via-C compilation have moved into the
   stg subdirectory, which is self-contained.  Most of the headers for
   the rest of the RTS APIs have moved into the rts subdirectory.
 - I left MachDeps.h where it is, because it is so widely used in
   Haskell code.
 
 - I left a deprecated stub for RtsFlags.h in place.  The flag
   structures are now exposed by Rts.h.
 - Various internal APIs are no longer exposed by public header files.
 - Various bits of dead code and declarations have been removed.
 - More gcc warnings are turned on, and the RTS code is more
   warning-clean.
 - More source files #include "PosixSource.h", and hence only use
   standard POSIX (1003.1c-1995) interfaces.
There is a lot more tidying up still to do; this is just the first
pass.  I also intend to standardise the names for external RTS APIs
(e.g. use the rts_ prefix consistently), and declare the internal APIs
as hidden for shared libraries. | 
| | We used to generate things like:
    extern StgWordArray (newCAF) __attribute__((aligned (8)));
    ((void (*)(void *))(W_)&newCAF)((void *)R1.w);
(which is to say, pretend that newCAF is some data, then cast it to a
function and call it).
This goes wrong on at least IA64, where:
    A function pointer on the ia64 does not point to the first byte of
    code. Instead, it points to a structure that describes the function.
    The first quadword in the structure is the address of the first byte
    of code
so we end up dereferencing function pointers one time too many, and
segfaulting. | 
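For contrast, a minimal C sketch of the portable pattern: declare the callee with its real function type and call it through a correctly typed function pointer, letting the compiler produce whatever the ABI requires. Here `newCAF` is a hypothetical stand-in with a simplified signature and `StgUnion` a cut-down model of the register union; the actual fix in the code generator may differ in detail.

```c
#include <stdint.h>

/* Hypothetical stand-in for newCAF with a simplified signature; it just
 * counts how often it is called. */
int caf_calls = 0;
void newCAF(void *caf) { (void)caf; caf_calls++; }

/* Cut-down model of the register union used by generated code. */
typedef union { uintptr_t w; void *p; } StgUnion;

/* Call newCAF through its real function type.  The compiler knows newCAF
 * is a function, so the function pointer it produces is whatever the ABI
 * requires (on IA64, the address of a function descriptor).  The old
 * generated code declared newCAF as a word array and cast the array's
 * address to a function pointer, so on IA64 the code address was treated
 * as a descriptor and dereferenced once too often. */
void call_newCAF(StgUnion R1)
{
    void (*fn)(void *) = newCAF;
    fn((void *)R1.w);
}
```

No cast between data and function pointers is needed, which is also what ISO C expects.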
| | This merge does not turn on the new codegen (which only compiles
a select few programs at this point),
but it does introduce some changes to the old code generator.
The high bits:
1. The Rep Swamp patch is finally here.
   The highlight is that the representation of types at the
   machine level has changed.
   Consequently, this patch contains updates across several back ends.
2. The new Stg -> Cmm path is here, although it appears to have a
   fair number of bugs lurking.
3. Many improvements along the CmmCPSZ path, including:
   o stack layout
   o some code for infotables, half of which is right and half wrong
   o proc-point splitting | 
| | Eager blackholing can improve parallel performance by reducing the
chances that two threads perform the same computation.  However, it
has a cost: one extra memory write per thunk entry.  
To get the best results, any code which may be executed in parallel
should be compiled with eager blackholing turned on.  But since
there's a cost for sequential code, we make it optional and turn it on
for the parallel package only.  It might be a good idea to compile
applications (or modules) that contain parallel code with
-feager-blackholing.
ToDo: document -feager-blackholing. | 
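A minimal single-threaded C sketch of where the extra cost comes from, using hypothetical, much-simplified closure states rather than the RTS's real closure layout. With eager blackholing, entering a thunk first overwrites it with a BLACKHOLE, so a second thread that reaches the same thunk can see that evaluation is under way. A real RTS would block that second thread; this sketch only counts the stores to the thunk's header. `Thunk`, `enter_thunk`, `forty_two` and `header_writes` are illustrative names.

```c
#include <stdbool.h>

/* Hypothetical, much-simplified closure states. */
typedef enum { THUNK, BLACKHOLE, EVALUATED } ClosureState;

typedef struct {
    ClosureState state;
    long (*code)(void);  /* the suspended computation */
    long value;          /* valid once state == EVALUATED */
} Thunk;

unsigned long header_writes = 0;  /* counts stores to a thunk's state */

/* An example suspended computation. */
long forty_two(void) { return 42; }

/* Enter a thunk.  `eager` models the compile-time -feager-blackholing
 * choice as a run-time flag.  With it, the thunk is overwritten with
 * BLACKHOLE before its code runs: one extra write per entry.  (A real
 * RTS would block a thread that enters a BLACKHOLE; this single-threaded
 * sketch never does.) */
long enter_thunk(Thunk *t, bool eager)
{
    if (t->state == EVALUATED)
        return t->value;
    if (eager) {
        t->state = BLACKHOLE;  /* the extra write */
        header_writes++;
    }
    t->value = t->code();
    t->state = EVALUATED;      /* the update, done either way */
    header_writes++;
    return t->value;
}
```

Entering a fresh thunk costs one header write without eager blackholing and two with it; re-entering an evaluated thunk costs none.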
| | o Moved BlockId stuff to a new file to avoid module recursion
o Defined stack areas for parameter-passing locations and spill slots
o Part way through replacing copy in and copy out nodes
  - added movement instructions for stack pointer
  - added movement instructions for call and return parameters
    (but not with the proper calling conventions)
o Inserting spills and reloads for proc points is now procpoint-aware
  (it was relying on the presence of a CopyIn node as a proxy for
   procpoint knowledge)
o Changed ZipDataflow to expect AGraphs (instead of being polymorphic in
   the type of graph) | 
| | C-- no longer has 'hints'; to guide parameter passing, it
has 'kinds'.  Renamed type constructor, data constructor, and record
fields accordingly. | 
| | This allows the instance of UserOfLocalRegs to be within Haskell98, and IMHO
 makes the code a little cleaner generally.
This is one small (though tedious) step towards making GHC's code more
 portable... | 