| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
| |
The card table is an array of bytes, placed directly following the
actual array data. This means that array reading is unaffected, but
array writing needs to read the array size from the header in order to
find the card table.
We use a bytemap rather than a bitmap, because updating the card table
must be multi-thread safe. Each byte refers to 128 entries of the
array, but this is tunable by changing the constant
MUT_ARR_PTRS_CARD_BITS in includes/Constants.h.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The GC had a two-level structure, G generations each of T steps.
Steps are for aging within a generation, mostly to avoid premature
promotion.
Measurements show that more than 2 steps is almost never worthwhile,
and 1 step is usually worse than 2. In theory fractional steps are
possible, so the ideal number of steps is somewhere between 1 and 3.
GHC's default has always been 2.
We can implement 2 steps quite straightforwardly by having each block
point to the generation to which objects in that block should be
promoted, so blocks in the nursery point to generation 0, and blocks
in gen 0 point to gen 1, and so on.
This commit removes the explicit step structures, merging generations
with steps, thus simplifying a lot of code. Performance is
unaffected. The tunable number of steps is now gone, although it may
be replaced in the future by a way to tune the aging in generation 0.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a batch of refactoring to remove some of the GC's global
state, as we move towards CPU-local GC.
- allocateLocal() now allocates large objects into the local
nursery, rather than taking a global lock and allocating
then in gen 0 step 0.
- allocatePinned() was still allocating from global storage and
taking a lock each time, now it uses local storage.
(mallocForeignPtrBytes should be faster with -threaded).
- We had a gen 0 step 0, distinct from the nurseries, which are
stored in a separate nurseries[] array. This is slightly strange.
I removed the g0s0 global that pointed to gen 0 step 0, and
removed all uses of it. I think now we don't use gen 0 step 0 at
all, except possibly when there is only one generation. Possibly
more tidying up is needed here.
- I removed the global allocate() function, and renamed
allocateLocal() to allocate().
- the alloc_blocks global is gone. MAYBE_GC() and
doYouWantToGC() now check the local nursery only.
|
|
|
|
|
| |
While fixing #3578 I noticed that this function was just a field
access to StgTRecHeader, so I inlined it manually.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There were two bugs, and had it not been for the first one we would
not have noticed the second one, so this is quite fortunate.
The first bug is in stg_unblockAsyncExceptionszh_ret, when we found a
pending exception to raise, but don't end up raising it, there was a
missing adjustment to the stack pointer.
The second bug was that this case was actually happening at all: it
ought to be incredibly rare, because the pending exception thread
would have to be killed between us finding it and attempting to raise
the exception. This made me suspicious. It turned out that there was
a race condition on the tso->flags field; multiple threads were
updating this bitmask field non-atomically (one of the bits is the
dirty-bit for the generational GC). The fix is to move the dirty bit
into its own field of the TSO, making the TSO one word larger (sadly).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The first phase of this tidyup is focussed on the header files, and in
particular making sure we are exposinng publicly exactly what we need
to, and no more.
- Rts.h now includes everything that the RTS exposes publicly,
rather than a random subset of it.
- Most of the public header files have moved into subdirectories, and
many of them have been renamed. But clients should not need to
include any of the other headers directly, just #include the main
public headers: Rts.h, HsFFI.h, RtsAPI.h.
- All the headers needed for via-C compilation have moved into the
stg subdirectory, which is self-contained. Most of the headers for
the rest of the RTS APIs have moved into the rts subdirectory.
- I left MachDeps.h where it is, because it is so widely used in
Haskell code.
- I left a deprecated stub for RtsFlags.h in place. The flag
structures are now exposed by Rts.h.
- Various internal APIs are no longer exposed by public header files.
- Various bits of dead code and declarations have been removed
- More gcc warnings are turned on, and the RTS code is more
warning-clean.
- More source files #include "PosixSource.h", and hence only use
standard POSIX (1003.1c-1995) interfaces.
There is a lot more tidying up still to do, this is just the first
pass. I also intend to standardise the names for external RTS APIs
(e.g use the rts_ prefix consistently), and declare the internal APIs
as hidden for shared libraries.
|
| |
|
| |
|
|
|
|
| |
No longer need them as temp vars in the cmm primop implementations.
|
| |
|
|
|
|
|
| |
We now also need to cast the values to (unsigned long), as on some
platforms sizeof returns (unsigned int).
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
not longer reachable.
Patch originally by Ivan Tomac <tomac@pacific.net.au>, amended by
Simon Marlow:
- mkWeakFinalizer# commoned up with mkWeakFinalizerEnv#
- GC parameters to ALLOC_PRIM fixed
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This merge does not turn on the new codegen (which only compiles
a select few programs at this point),
but it does introduce some changes to the old code generator.
The high bits:
1. The Rep Swamp patch is finally here.
The highlight is that the representation of types at the
machine level has changed.
Consequently, this patch contains updates across several back ends.
2. The new Stg -> Cmm path is here, although it appears to have a
fair number of bugs lurking.
3. Many improvements along the CmmCPSZ path, including:
o stack layout
o some code for infotables, half of which is right and half wrong
o proc-point splitting
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Eager blackholing can improve parallel performance by reducing the
chances that two threads perform the same computation. However, it
has a cost: one extra memory write per thunk entry.
To get the best results, any code which may be executed in parallel
should be compiled with eager blackholing turned on. But since
there's a cost for sequential code, we make it optional and turn it on
for the parallel package only. It might be a good idea to compile
applications (or modules) with parallel code in with
-feager-blackholing.
ToDo: document -feager-blackholing.
|
| |
|
|
|
|
|
| |
Fixes a long-standing bug that could in some cases cause sub-optimal
scheduling behaviour.
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
There was an accidental endian-dependency in changes related to RET_FUN.
The changes in question weren't strictly necessary - they were left
over from the original workaround for the compacting GC problems, so
I've just reverted those changes in this patch, which should hopefully
fix the PPC problems.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements pointer tagging as per our ICFP'07 paper "Faster
laziness using dynamic pointer tagging". It improves performance by
10-15% for most workloads, including GHC itself.
The original patches were by Alexey Rodriguez Yakushev
<mrchebas@gmail.com>, with additions and improvements by me. I've
re-recorded the development as a single patch.
The basic idea is this: we use the low 2 bits of a pointer to a heap
object (3 bits on a 64-bit architecture) to encode some information
about the object pointed to. For a constructor, we encode the "tag"
of the constructor (e.g. True vs. False), for a function closure its
arity. This enables some decisions to be made without dereferencing
the pointer, which speeds up some common operations. In particular it
enables us to avoid costly indirect jumps in many cases.
More information in the commentary:
http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/HaskellExecution/PointerTagging
|
| |
|
| |
|
| |
|
|
|
|
| |
This is a simplification & minor optimisation for GHCi
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following changes restore ticky-ticky profiling to functionality
from its formerly bit-rotted state. Sort of. (It got bit-rotted as part
of the switch to the C-- back-end.)
The way that ticky-ticky is supposed to work is documented in Section 5.7
of the GHC manual (though the manual doesn't mention that it hasn't worked
since sometime around 6.0, alas). Changes from this are as follows (which
I'll document on the wiki):
* In the past, you had to build all of the libraries with way=t in order to
use ticky-ticky, because it entailed a different closure layout. No longer.
You still need to do make way=t in rts/ in order to build the ticky RTS,
but you should now be able to mix ticky and non-ticky modules.
* Some of the counters that worked in the past aren't implemented yet.
I was originally just trying to get entry counts to work, so those should
be correct. The list of counters was never documented in the first place,
so I hope it's not too much of a disaster that some don't appear anymore.
Someday, someone (perhaps me) should document all the counters and what
they do. For now, all of the counters are either accurate (or at least as
accurate as they always were), zero, or missing from the ticky profiling
report altogether.
This hasn't been particularly well-tested, but these changes shouldn't
affect anything except when compiling with -fticky-ticky (famous last
words...)
Implementation details:
I got rid of StgTicky.h, which in the past had the macros and declarations
for all of the ticky counters. Now, those macros are defined in Cmm.h.
StgTicky.h was still there for inclusion in C code. Now, any remaining C
code simply cannot call the ticky macros -- or rather, they do call those
macros, but from the perspective of C code, they're defined as no-ops.
(This shouldn't be too big a problem.)
I added a new file TickyCounter.h that has all the declarations for ticky
counters, as well as dummy macros for use in C code. Someday, these
declarations should really be automatically generated, since they need
to be kept consistent with the macros defined in Cmm.h.
Other changes include getting rid of the header that was getting added to
closures before, and getting rid of various code having to do with eager
blackholing and permanent indirections (the changes under compiler/
and rts/Updates.*).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In preparation for parallel GC, split up the monolithic GC.c file into
smaller parts. Also in this patch (and difficult to separate,
unfortunatley):
- Don't include Stable.h in Rts.h, instead just include it where
necessary.
- consistently use STATIC_INLINE in source files, and INLINE_HEADER
in header files. STATIC_INLINE is now turned off when DEBUG is on,
to make debugging easier.
- The GC no longer takes the get_roots function as an argument.
We weren't making use of this generalisation.
|
| |
|
|
|
|
|
|
|
|
|
| |
Fixed version of an old patch by Simon Marlow. His description read:
Also, now an arbitrarily short context switch interval may now be
specified, as we increase the RTS ticker's resolution to match the
requested context switch interval. This also applies to +RTS -i (heap
profiling) and +RTS -I (the idle GC timer). +RTS -V is actually only
required for increasing the resolution of the profile timer.
|
|
|
|
| |
So that we can build the RTS with the NCG.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch makes throwTo work with -threaded, and also refactors large
parts of the concurrency support in the RTS to clean things up. We
have some new files:
RaiseAsync.{c,h} asynchronous exception support
Threads.{c,h} general threading-related utils
Some of the contents of these new files used to be in Schedule.c,
which is smaller and cleaner as a result of the split.
Asynchronous exception support in the presence of multiple running
Haskell threads is rather tricky. In fact, to my annoyance there are
still one or two bugs to track down, but the majority of the tests run
now.
|
|
Most of the other users of the fptools build system have migrated to
Cabal, and with the move to darcs we can now flatten the source tree
without losing history, so here goes.
The main change is that the ghc/ subdir is gone, and most of what it
contained is now at the top level. The build system now makes no
pretense at being multi-project, it is just the GHC build system.
No doubt this will break many things, and there will be a period of
instability while we fix the dependencies. A straightforward build
should work, but I haven't yet fixed binary/source distributions.
Changes to the Building Guide will follow, too.
|