| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
| |
Fix Win32 DEBUG warnings
|
|
|
|
| |
wibble
|
|
|
|
| |
Big re-hash of the threaded/SMP runtime
This is a significant reworking of the threaded and SMP parts of
the runtime. There are two overall goals here:
- To push down the scheduler lock, reducing contention and allowing
more parts of the system to run without locks. In particular,
the scheduler does not require a lock any more in the common case.
- To improve affinity, so that running Haskell threads stick to the
same OS threads as much as possible.
At this point we have the basic structure working, but there are some
pieces missing. I believe it's reasonably stable - the important
parts of the testsuite pass in all the (normal,threaded,SMP) ways.
In more detail:
- Each capability now has a run queue, instead of one global run
queue. The Capability and Task APIs have been completely
rewritten; see Capability.h and Task.h for the details.
- Each capability has its own pool of worker Tasks. Hence, Haskell
threads on a Capability's run queue will run on the same worker
Task(s). As long as the OS is doing something reasonable, this
should mean they usually stick to the same CPU. Another way to
look at this is that we're assuming each Capability is associated
with a fixed CPU.
- What used to be StgMainThread is now part of the Task structure.
Every OS thread in the runtime has an associated Task, and it
can ask for its current Task at any time with myTask().
- removed RTS_SUPPORTS_THREADS symbol, use THREADED_RTS instead
(it is now defined for SMP too).
- The RtsAPI has had to change; we must explicitly pass a Capability
around now. The previous interface assumed some global state.
SchedAPI has also changed a lot.
- The OSThreads API now supports thread-local storage, used to
implement myTask(), although it could be done more efficiently
using gcc's __thread extension when available.
- I've moved some POSIX-specific stuff into the posix subdirectory,
moving in the direction of separating out platform-specific
implementations.
- lots of lock-debugging and assertions in the runtime. In particular,
when DEBUG is on, we catch multiple ACQUIRE_LOCK()s, and there is
also an ASSERT_LOCK_HELD() call.
What's missing so far:
- I have almost certainly broken the Win32 build, will fix soon.
- any kind of thread migration or load balancing. This is high up
the agenda, though.
- various performance tweaks to do
- throwTo and forkProcess still do not work in SMP mode
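The thread-local storage point above can be illustrated with POSIX thread-specific data. This is only a minimal sketch, assuming pthreads: the `Task` struct and the names `set_my_task` and `my_task` are hypothetical stand-ins, not the definitions in Task.h or the OSThreads API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the RTS Task structure. */
typedef struct Task {
    int id;
} Task;

static pthread_key_t task_key;
static pthread_once_t task_key_once = PTHREAD_ONCE_INIT;

/* Runs exactly once, however many threads race to get here. */
static void create_task_key(void) {
    int rc = pthread_key_create(&task_key, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Record the Task that belongs to the calling OS thread. */
static void set_my_task(Task *task) {
    int rc;
    pthread_once(&task_key_once, create_task_key);
    rc = pthread_setspecific(task_key, task);
    assert(rc == 0);
    (void)rc;
}

/* Return the calling OS thread's Task, or NULL if none was recorded. */
static Task *my_task(void) {
    pthread_once(&task_key_once, create_task_key);
    return pthread_getspecific(task_key);
}
```

Each OS thread would call `set_my_task()` once when it is created or first enters the runtime. Where gcc's `__thread` extension is available, a `static __thread Task *` variable would avoid the key lookup, which is the efficiency point the message makes.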
|
|
|
|
|
|
|
|
| |
if TARGETPLATFORM differs from HOSTPLATFORM, don't attempt to build
DerivedConstants.h, ghcautoconf.h and GHCConstants.h. If these aren't
present, emit a message to remind the user to copy them from the
target system. Hopefully this should make bootstrapping slightly less
error prone.
|
|
|
|
| |
add protos for HeapStackCheck.cmm:stg_block_blackhole_* entry points
|
|
|
|
| |
DEBUG_FILL_SLOP: don't do anything on SMP, zeroing slop words isn't safe
|
|
|
|
| |
Fix bug in previous commit (fixes recent seg faults in nightly stage2)
|
|
|
|
| |
DEBUG_FILL_SLOP(): fill slop for AP_STACK closures too
|
|
|
|
|
|
| |
Remove the ForeignObj# type, and all its PrimOps. The new efficient
representation of ForeignPtr doesn't use ForeignObj# underneath, and
there seems no need to keep it.
|
|
|
|
|
| |
Tweaks to the GC to improve performance. Might be as much as 10% on
some programs.
|
|
|
|
| |
Fix mulMayOflo() on 64-bit archs. This fixes the arith003 failures on x86_64.
|
|
|
|
|
| |
Move mallocBytesRWX into RtsUtils, rename it to stgMallocBytesRWX, and
export it.
|
|
|
|
| |
declare stg_returnToSchedNotPaused (forgot to commit this yesterday)
|
|
|
|
| |
After some experiments, it seems like we're stealing too many registers from
newer GCCs on SPARC, leading to
"unable to find a register to spill in class `GENERAL_REGS'"
errors. The fix is to leave l6 and l7 to GCC. Tested with a full 2-stage
bootstrap (including OpenGL/GLUT packages) on SPARC Solaris 8 with GCC 3.4.4.
A test case for this (which I'm too lazy/tired to commit) is:
module Blah ( foo ) where
import Foreign.Ptr ( FunPtr )
type Bar = Int -> Double -> Double -> Double -> IO ()
foreign import ccall unsafe "dynamic" foo :: FunPtr Bar -> Bar
SimonM: MERGE TO STABLE (if nobody yells)
|
|
|
|
| |
Block allocator performance fix: instead of keeping the free list
ordered, keep it doubly-linked, and introduce a new flag BF_FREE so we
can tell when a block is free. We can still coalesce blocks on the
free list because block descriptors are kept consecutively in memory,
so we can tell based on the BF_FREE flag whether to coalesce with the
next higher/lower blocks when freeing a block.
This (almost) makes freeChain O(n) rather than O(n^2), and has been
reported to help a lot when dealing with very large heaps.
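The coalescing scheme described above can be sketched as follows. This is a simplified illustration, not the real block allocator: the `Desc` layout, `N_BLOCKS`, and the choice to record the group size on both the first and last descriptor of a group are assumptions made for the example. Only the `BF_FREE` flag name comes from the message.

```c
#include <assert.h>
#include <stddef.h>

#define BF_FREE  1u   /* flag name taken from the commit message */
#define N_BLOCKS 16   /* hypothetical heap size, in blocks */

/* Simplified descriptor; NOT the real bdescr layout. */
typedef struct Desc {
    unsigned flags;            /* BF_FREE on both ends of a free group */
    size_t blocks;             /* group size, stored on both ends */
    struct Desc *prev, *next;  /* free-list links, used on the first block */
} Desc;

static Desc descs[N_BLOCKS];   /* descriptors are consecutive in memory */
static Desc *free_list = NULL; /* unordered, doubly linked */

static void list_remove(Desc *d) {
    if (d->prev) d->prev->next = d->next; else free_list = d->next;
    if (d->next) d->next->prev = d->prev;
    d->prev = d->next = NULL;
}

static void list_push(Desc *d) {
    d->prev = NULL;
    d->next = free_list;
    if (free_list) free_list->prev = d;
    free_list = d;
}

/* Mark both ends of a group so neighbours can see its size and state. */
static void set_group(Desc *head, size_t n, unsigned flags) {
    assert(n > 0);
    head->blocks = head[n - 1].blocks = n;
    head->flags = head[n - 1].flags = flags;
}

/* Free n blocks starting at head, merging with free neighbours.
   No list walk is needed: the neighbours are found by address. */
static Desc *free_group(Desc *head, size_t n) {
    Desc *next = head + n;
    if (next < descs + N_BLOCKS && (next->flags & BF_FREE)) {
        list_remove(next);
        n += next->blocks;
    }
    if (head > descs && (head[-1].flags & BF_FREE)) {
        Desc *lower = head - head[-1].blocks;
        list_remove(lower);
        n += lower->blocks;
        head = lower;
    }
    set_group(head, n, BF_FREE);
    list_push(head);
    return head;
}

/* Demo: four allocated groups of 4 blocks; free the 1st, 3rd, then 2nd.
   Returns the size of the merged group, or 0 if the list looks wrong. */
static size_t coalesce_demo(void) {
    Desc *merged;
    size_t i;
    for (i = 0; i < N_BLOCKS; i += 4) set_group(&descs[i], 4, 0);
    free_list = NULL;
    free_group(&descs[0], 4);
    free_group(&descs[8], 4);
    merged = free_group(&descs[4], 4);
    return (merged == &descs[0] && free_list == merged && merged->next == NULL)
        ? merged->blocks : 0;
}
```

In this sketch each free takes constant time, so freeing a chain of n groups is linear in n, whereas inserting each group into an ordered list would cost a list walk per free.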
|
|
|
|
| |
Remove SMP-only fields from STM data structures from non-SMP builds
|
|
|
|
| |
Update STM implementation for SMP builds
|
|
|
|
| |
Move RTS_SUPPORTS_THREADS into RtsConfig.h
|
|
|
|
| |
implement lockClosure properly
|
|
|
|
| |
- Move the call to threadPaused() from the scheduler into STG land,
and put it in a new code fragment (stg_returnToSched) that we pass
through every time we return from STG to the scheduler. Also, the
SAVE_THREAD_STATE() is now in stg_returnToSched which might save a
little code space (at the expense of an extra jump for every return
to the scheduler).
- SMP: when blocking on an MVar, we now wait until the thread has been
made fully safe and placed on the blocked queue of the MVar before
we unlock the MVar. This closes a race whereby another OS thread could
begin waking us up before the current TSO had been properly tidied up.
Fixes one cause of crashes when using MVars with SMP. I still have a
deadlock problem to track down.
|
|
|
|
| |
Slop-filling fixes for SMP/DEBUG
|
|
|
|
| |
Fix unreg build on Windows
|
|
|
|
| |
Allow the amount of idle time which must pass before we force a major
GC to be configured at runtime with the +RTS -I<secs> option.
The idle GC only happens in the threaded RTS, and it is useful because
it can make finalizers run more promptly, and also detect cases of
deadlock. Without the idle GC, Haskell computation must be taking
place in order for finalizers to run or deadlock to be detected, and
the only way some Haskell computation can take place is usually by
in-calls.
+RTS -I0 turns off the idle GC; the default is +RTS -I0.3.
We might need to add more tuning if it turns out that the idle GC is
problematic, for example we don't check how long the GC actually took,
and we should probably back off if major GCs are taking too long and
adversely affecting interactive responsiveness.
|
|
|
|
| |
gcc 4.0.0 fix: don't declare static static_objects as extern
|
|
|
|
| |
Declare checkNurserySanity()
|
|
|
|
| |
Two SMP-related changes:
- New storage manager interface:
bdescr *allocateLocal(StgRegTable *reg, nat words)
which allocates from the current thread's nursery (being careful
not to clash with the heap pointer). It can do this without
taking any locks; the lock only has to be taken if a block needs
to be allocated. allocateLocal() is now used instead of allocate()
in a few PrimOps.
This removes locks from most Integer operations, cutting down
the overhead for SMP a bit more.
To make this work, we have to be able to grab the current thread's
Capability out of thin air (i.e. when called from GMP), so the
Capability subsystem needs to keep a hash from thread IDs to
Capabilities.
- Small MVar optimisation: instead of taking the global
storage-manager lock, do our own locking of MVars with a bit of
inline assembly (x86 only for now).
|
|
|
|
| |
Fix the offsets and macros for AP_STACK closures (was wrong for SMP only)
|
|
|
|
|
| |
StgFunInfoExtra_slow_apply(): convert the slow_apply_offset to a W_
before arithmetic.
|
|
|
|
|
|
| |
Small code-size optimisation: I forgot to add a specialised case for
functions with no argument words (which might happen if the function
takes a void argument, for example).
|
|
|
|
|
|
| |
When using -H<size> in SMP mode, divide the total nursery size amongst
the various nurseries. -H<size> now does something reasonable with
SMP.
|
|
|
|
|
|
|
| |
Hold the sm_mutex around access to the mutable list.
The SMP RTS now seems quite stable, I've run my simple test program
with 64 threads without crashes.
|
|
|
|
| |
[mingw only]
Better handling of I/O request abortions upon throwing an exception
to a Haskell thread. As was, a thread blocked on an I/O request was
simply unblocked, but its corresponding worker thread wasn't notified
that the request had been abandoned.
This manifested itself in GHCi upon Ctrl-C being hit at the prompt -- the
worker thread blocked waiting for input on stdin prior to Ctrl-C would
stick around even though its corresponding Haskell thread had been
thrown an Interrupted exception. The upshot was that the worker would
consume the next character typed in after Ctrl-C and then just drop
it. Dealing with this turned out to be even more interesting due to
Win32 aborting any console reads when Ctrl-C/Break events are delivered.
The story could be improved upon (at the cost of portability) by making
the Scheduler able to abort worker thread system calls; as is, requests
are cooperatively abandoned. Maybe later.
Also included are other minor tidyups to Ctrl-C handling under mingw.
Merge to STABLE.
|
|
|
|
| |
- Now that labels are always prefixed with '&' in .hc code, we have to
fix some sloppiness in the RTS .cmm code. Fortunately it's not too
painful.
- SMP: acquire/release the storage manager lock around
atomicModifyMutVar#. This is a hack: atomicModifyMutVar# isn't
atomic under SMP otherwise, but the SM lock is a large sledgehammer.
I think I'll apply the sledgehammer to the MVar primitives too, for
the time being.
|
|
|
|
|
|
|
| |
SMP: the rest of the changes to support safe thunk entry & updates. I
thought the compiler changes were independent, but I ended up breaking
the HEAD, so I'll have to commit the rest. non-SMP compilation should
not be affected.
|
|
|
|
| |
remove gaps in the closure type sequence, and add a big warning comment
|
|
|
|
|
|
| |
The in_haskell sanity check should be per-Capability rather than global.
I just ran a Haskell program in 8 pthreads simultaneously :-)
|
|
|
|
|
|
|
|
|
| |
Per-task nurseries for SMP. This was kind-of implemented before, but
it's much cleaner now. There is now one *step* per capability, so we
have somewhere to hang the block count. So for SMP, there are simply
multiple instances of generation 0 step 0. The rNursery entry in the
register table now points to the step rather than the head block of
the nursery.
|
|
|
|
|
|
|
|
|
| |
Support handling signals in the threaded RTS by passing the signal
number down the pipe to the IO manager. This avoids needing
synchronisation in the signal handler.
Signals should now work with -threaded. Since this is a bugfix, I'll
merge the changes into the 6.4 branch.
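The approach described above resembles the general "self-pipe" technique, sketched below for POSIX systems. This is an illustration under assumptions, not the RTS code: `signal_pipe`, `forward_signal`, `install_forwarder` and `next_signal` are hypothetical names, and the one-byte encoding of the signal number is chosen only to keep the example short.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical pipe: [0] is read by the IO manager, [1] written by the handler. */
static int signal_pipe[2] = { -1, -1 };

/* The handler only calls write(), which is async-signal-safe, so it
   needs no locks. errno is saved so the interrupted code is unaffected. */
static void forward_signal(int signum) {
    int saved_errno = errno;
    unsigned char byte = (unsigned char)signum;
    ssize_t rc = write(signal_pipe[1], &byte, 1);
    (void)rc;
    errno = saved_errno;
}

/* Create the pipe on first use and route signum through it. */
static int install_forwarder(int signum) {
    struct sigaction sa;
    if (signal_pipe[0] < 0 && pipe(signal_pipe) != 0) return -1;
    sa.sa_handler = forward_signal;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    return sigaction(signum, &sa, NULL);
}

/* IO-manager side: block until a signal number arrives on the pipe. */
static int next_signal(void) {
    unsigned char byte;
    ssize_t n;
    assert(signal_pipe[0] >= 0);
    n = read(signal_pipe[0], &byte, 1);
    return n == 1 ? (int)byte : -1;
}
```

The reader then handles the signal in ordinary thread context, where taking locks or running Haskell handlers is safe.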
|
|
|
|
| |
wibble
|
|
|
|
| |
unreg wibble
|
|
|
|
| |
wibble to fix the unreg way
|
|
|
|
| |
wibble
|
|
|
|
| |
Catch up with InfoTable changes
|
|
|
|
| |
Some multi-processor hackery, including
- Don't hang blocked threads off BLACKHOLEs any more, instead keep
them all on a separate queue which is checked periodically for
threads to wake up.
This is good because (a) we don't have to worry about locking the
closure in SMP mode when we want to block on it, and (b) it means
the standard update code doesn't need to wake up any threads or
check for a BLACKHOLE_BQ, simplifying the update code.
The downside is that if there are lots of threads blocked on
BLACKHOLEs, we might have to do a lot of repeated list traversal.
We don't expect this to be common, though. conc023 goes slower
with this change, but we expect most programs to benefit from the
shorter update code.
- Fixing up the Capability code to handle multiple capabilities (SMP
mode), and related changes to get the SMP mode at least building.
|
|
|
|
|
| |
Main x86_64 hacking: we have a problem on this arch where binutils
can't generate 64-bit relative relocations (R_X86_64_PC64), which many
of our info-table fields are. So far we've been hacking around it by
putting everything in the text section, but I've decided to adopt
another approach: we'll use explicit 32-bit offset fields on this
platform instead. This is safe in the default "small" memory model
where all symbols are guaranteed to be in the lower 2GB of the address
space.
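The idea of an explicit 32-bit offset field can be sketched in C as below. This is a runtime illustration only, with hypothetical names (`InfoTable`, `srt_offset`, `set_offset`, `get_target`). In the real system the offsets would be emitted by the code generator as assembler data, not computed at run time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical info table holding a 32-bit offset instead of a
   64-bit pointer or 64-bit relative relocation. */
typedef struct {
    int32_t srt_offset;   /* target address minus the table's own address */
} InfoTable;

/* Store target as an offset from info; fails if it does not fit in 32 bits. */
static int set_offset(InfoTable *info, const void *target) {
    intptr_t delta = (intptr_t)target - (intptr_t)info;
    if (delta < INT32_MIN || delta > INT32_MAX) return -1;
    info->srt_offset = (int32_t)delta;
    return 0;
}

/* Recover the full address by adding the offset back to the table address. */
static const void *get_target(const InfoTable *info) {
    assert(info != 0);
    return (const char *)info + info->srt_offset;
}
```

The range check in `set_offset` corresponds to the small-memory-model guarantee mentioned above: if every symbol lives in the lower 2GB, the difference between any two addresses fits in a signed 32-bit field.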
NCG changes coming; mangler changes are probably required too.
|
|
|
|
| |
Give prototypes for getAllocations and revertCAFs.
|
|
|
|
| |
unconditionally define oFFSET_StgRegTable_rL1
|
|
|
|
|
|
|
| |
Track size change of alloc_blocks and alloc_blocks_lim.
(They are of type nat, which used to be the same size as W_, but now is
the same size as CInt).
|
|
|
|
|
|
| |
reverse rev 1.4: nat should be unsigned int, not unsigned long. I'm
doing this (a) to fix some printf type errors, and (b) to see what
breaks.
|
|
|
|
|
|
|
|
|
|
| |
* Some preprocessors don't like the C99/C++ '//' comments after a
directive, so use '/* */' instead. For consistency, a lot of '//' in
the include files were converted, too.
* UnDOSified libraries/base/cbits/runProcess.c.
* My favourite sport: Killed $Id$s.
|