| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
| |
|
|
|
|
| |
Nothing from gmp is used in the rts anymore.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need this, or something equivalent, to be able to implement
stgAllocForGMP outside of the rts. That's because we want to use
allocateLocal which allocates from the given capability without
having to take any locks. In the gmp primops we're basically in
an unsafe foreign call, that is a context where we hold a current
capability. So it's safe for us to use allocateLocal. We just
need a way to get the current capability. The method to get the
current capability varies depending on whether we're using the
threaded rts or not. When stgAllocForGMP is built inside the rts
that's ok because we can do it conditionally on THREADED_RTS.
Outside the rts we need a single api we can call without knowing
if we're talking to a threaded rts or not, hence this addition.
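A minimal sketch of the kind of single entry point described above. All names here are hypothetical: `Capability` is a toy struct and `current_capability` is a stand-in, not the function the RTS actually exports. The caller sees one function, and only its body depends on whether THREADED_RTS is defined.

```c
#include <stddef.h>

/* Toy stand-in for the RTS capability; only the number is modelled. */
typedef struct Capability {
    unsigned no;
} Capability;

#if defined(THREADED_RTS)
/* Assumption: each OS thread records the capability it currently holds. */
static _Thread_local Capability *held_cap = NULL;

void set_current_capability(Capability *cap) { held_cap = cap; }
Capability *current_capability(void) { return held_cap; }
#else
/* Non-threaded case: there is exactly one capability. */
static Capability main_cap = { 0 };

Capability *current_capability(void) { return &main_cap; }
#endif
```

Code outside the rts would call current_capability() and pass the result to the allocator, without any #ifdef of its own.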
|
|
|
|
| |
No longer need them as temp vars in the cmm primop implementations.
|
|
|
|
|
| |
This means that, on Linux, we get functions like gamma defined when we
#include math.h
|
|
|
|
| |
Allows hs_free_fun_ptr() to be called by a separate thread
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
I've also added some missing $s to some makefiles. These aren't
technically necessary, but it's nice to be consistent.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* The old Reg type is now split into VirtualReg and RealReg.
* For the graph coloring allocator, the type of the register graph
is now (Graph VirtualReg RegClass RealReg), which shows that it colors
in nodes representing virtual regs with colors representing real regs.
(as was intended)
* RealReg contains two constructors, RealRegSingle and RealRegPair,
where RealRegPair is used to represent a SPARC double reg
constructed from two single precision FP regs.
* On SPARC we can now allocate double regs into an arbitrary register
pair, instead of reserving some reg ranges to only hold float/double values.
|
|
|
|
|
| |
We now also need to cast the values to (unsigned long), as on some
platforms sizeof returns (unsigned int).
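A small illustration of the cast, assuming a hypothetical helper named format_size (not part of the patch). The result of sizeof has type size_t, whose width differs between platforms, so it is converted to unsigned long to match the %lu conversion.

```c
#include <stddef.h>
#include <stdio.h>

/* Write value into buf as decimal text. The explicit cast makes the
   argument type match "%lu" on every platform, whatever size_t is. */
int format_size(char *buf, size_t buflen, size_t value)
{
    return snprintf(buf, buflen, "%lu", (unsigned long)value);
}
```

For example, format_size(buf, sizeof buf, sizeof(char[16])) writes the text 16 into buf.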
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
# -----------------------------------------------------------------------------
#
# (c) 2009 The University of Glasgow
#
# This file is part of the GHC build system.
#
# To understand how the build system works and how to modify it, see
# http://hackage.haskell.org/trac/ghc/wiki/Building/Architecture
# http://hackage.haskell.org/trac/ghc/wiki/Building/Modifying
#
# -----------------------------------------------------------------------------
|
| |
|
| |
|
|
|
|
| |
Also some tidyups and renaming
|
|
|
|
| |
Part of the fix for #3171
|
| |
|
|
|
|
|
|
|
|
|
|
| |
CapabilityNum to CapNo. Added helper functions postCapNo() and postThreadID().
ThreadID was StgWord64, but should have been StgThreadID, which is
currently StgWord32. Changed name from CapabilityNum to CapNo to better
reflect naming in Capability struct where "no" is the capability number.
Modified EventLog.c to use the helper functions postCapNo() and
postThreadID() for CapNo and ThreadID.
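A sketch of what typed helpers of this kind can look like. It is not the code from EventLog.c: the EventBuf type, the 16-bit width of CapNo, and the big-endian byte order are assumptions made for illustration. Only the 32-bit thread id follows from the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t CapNo;     /* assumption: 16 bits, chosen for illustration */
typedef uint32_t ThreadID;  /* the message says StgThreadID is currently 32 bits */

/* Hypothetical fixed-size output buffer. */
typedef struct {
    uint8_t data[64];
    size_t  pos;
} EventBuf;

static void post8(EventBuf *b, uint8_t v)
{
    assert(b->pos < sizeof b->data);
    b->data[b->pos++] = v;
}

/* Multi-byte values are written most significant byte first (assumption). */
static void post16(EventBuf *b, uint16_t v)
{
    post8(b, (uint8_t)(v >> 8));
    post8(b, (uint8_t)v);
}

static void post32(EventBuf *b, uint32_t v)
{
    post16(b, (uint16_t)(v >> 16));
    post16(b, (uint16_t)v);
}

/* The typed wrappers fix the width in one place, so a caller cannot
   write a thread id with the wrong number of bytes. */
static void postCapNo(EventBuf *b, CapNo no)       { post16(b, no); }
static void postThreadID(EventBuf *b, ThreadID id) { post32(b, id); }
```

If the width of either type changes later, only the typedef and the wrapper need to change.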
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes the out of memory errors we were getting on sparc
after the following patch:
Fri Mar 13 03:45:16 PDT 2009 Simon Marlow <marlowsd@gmail.com>
* Instead of a separate context-switch flag, set HpLim to zero
Ignore-this: 6c5bbe1ce2c5ef551efe98f288483b0
This reduces the latency between a context-switch being triggered and
the thread returning to the scheduler, which in turn should reduce the
cost of the GC barrier when there are many cores.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Generate binary log files from the RTS containing a log of runtime
events with timestamps. The log file can be visualised in various
ways, for investigating runtime behaviour and debugging performance
problems. See for example the forthcoming ThreadScope viewer.
New GHC option:
-eventlog (link-time option) Enables event logging.
+RTS -l (runtime option) Generates <prog>.eventlog with
the binary event information.
This replaces some of the tracing machinery we already had in the RTS:
e.g. +RTS -vg for GC tracing (we should do this using the new event
logging instead).
Event logging has almost no runtime cost when it isn't enabled, though
in the future we might add more fine-grained events and this might
change; hence having a link-time option and compiling a separate
version of the RTS for event logging. There's a small runtime cost
for enabling event-logging, but for most programs it shouldn't make
much difference.
(Todo: docs)
|
|
|
|
|
|
|
|
|
| |
Since we introduced pointer tagging, we no longer always enter a
closure to evaluate it. However, the biographical profiler relies on
closures being entered in order to mark them as "used", so we were
getting spurious amounts of data attributed to VOID. It turns out
there are various places that need to be fixed, and I think at least
one of them was also wrong before pointer tagging (CgCon.cgReturnDataCon).
|
|
|
|
|
| |
Somebody needs to implement getNumberOfProcessors() for MacOS X;
currently it will return 1.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
New flag: "+RTS -qb" disables load-balancing in the parallel GC
(though this is subject to change, I think we will probably want to do
something more automatic before releasing this).
To get the "PARGC3" configuration described in the "Runtime support
for Multicore Haskell" paper, use "+RTS -qg0 -qb -RTS".
The main advantage of this is that it allows us to easily disable
load-balancing altogether, which turns out to be important in parallel
programs. Maintaining locality is sometimes more important than
spreading the work out in parallel GC. There is a side benefit in
that the parallel GC should have improved locality even when
load-balancing, because each processor prefers to take work from its
own queue before stealing from others.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reduces the latency between a context-switch being triggered and
the thread returning to the scheduler, which in turn should reduce the
cost of the GC barrier when there are many cores.
We still retain the old context_switch flag which is checked at the
end of each block of allocation. The idea is that setting HpLim may
fail if the target thread is modifying HpLim at the same time; the
context_switch flag is a fallback. It also allows us to "context
switch soon" without forcing an immediate switch, which can be costly.
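A single-threaded toy model of the mechanism, with hypothetical names throughout (MiniThread, try_alloc and request_context_switch are not RTS identifiers). It only shows why zeroing the heap limit makes the next heap check fail, and why the flag is kept as a fallback. The real code runs concurrently, which this sketch does not model.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uintptr_t hp;             /* next free address */
    uintptr_t hp_lim;         /* end of the current allocation block */
    bool      context_switch; /* fallback flag, checked at end of block */
} MiniThread;

/* Heap check: succeeds only if n more bytes fit below hp_lim. */
static bool try_alloc(MiniThread *t, size_t n)
{
    if (t->hp + n > t->hp_lim) {
        return false;         /* caller must return to the scheduler */
    }
    t->hp += n;
    return true;
}

static void request_context_switch(MiniThread *t)
{
    t->hp_lim = 0;            /* makes the very next heap check fail */
    t->context_switch = true; /* still visible if hp_lim gets overwritten */
}
```

After request_context_switch(), the next try_alloc() fails regardless of how much room was left, so the thread yields at its next allocation instead of at the end of the block.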
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- add newAlignedPinnedByteArray# for allocating pinned BAs with
arbitrary alignment
- the old newPinnedByteArray# now aligns to 16 bytes
Foreign.alloca will use newAlignedPinnedByteArray#, and so might end
up wasting less space than before (we used to align to 8 by default).
Foreign.allocaBytes and Foreign.mallocForeignPtrBytes will get 16-byte
aligned memory, which is enough to avoid problems with SSE
instructions on x86, for example.
There was a bug in the old newPinnedByteArray#: it aligned to 8 bytes,
but would have failed if the header was not a multiple of 8
(fortunately it always was, even with profiling). Also we
occasionally wasted some space unnecessarily due to alignment in
allocatePinned().
I haven't done anything about Foreign.malloc/mallocBytes, which will
give you the same alignment guarantees as malloc() (8 bytes on
Linux/x86 here).
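The alignment arithmetic involved can be sketched as follows. The helper names are hypothetical and this is not the code of allocatePinned(). The point is that the payload, not the start of the object, is what must be aligned, so the header size has to enter the calculation. Aligning only the start works just as long as the header size happens to be a multiple of the alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Number of padding bytes to insert before an object that starts at
   `start` and has a header of header_bytes, so that the payload that
   follows the header is aligned to `align`. */
static size_t payload_padding(uintptr_t start, size_t header_bytes,
                              uintptr_t align)
{
    uintptr_t payload = start + header_bytes;
    return (size_t)(align_up(payload, align) - payload);
}
```

With an 8-byte header at address 0 and 16-byte alignment, payload_padding returns 8. With a 12-byte header and 8-byte alignment it returns 4, a case that aligning only the object start would get wrong.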
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The API is the same (for now). The new implementation has the
capability to define signal handlers that have access to the siginfo
of the signal (#592), but this functionality is not exposed in this
patch.
#2451 is the ticket for the new API.
The main purpose of bringing this in now is to fix race conditions in
the old signal handling code (#2858). Later we can enable the new
API in the HEAD.
Implementation differences:
- More of the signal-handling is moved into Haskell. We store the
table of signal handlers in an MVar, rather than having a table of
StablePtrs in the RTS.
- In the threaded RTS, the siginfo of the signal is passed down the
pipe to the IO manager thread, which manages the business of
starting up new signal handler threads. In the non-threaded RTS,
the siginfo of caught signals is stored in the RTS, and the
scheduler starts new signal handler threads.
|
| |
|
| |
|
|
|
|
|
| |
Darwin 9.6.0 + GCC 4.0.1 doesn't understand "msync".
I think "sync" means the same thing.
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
In this version, I untag R1 before using it, and even enter R2 at the
end rather than simply returning it (which didn't work right when R2
was a thunk).
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pthread_mutex_unlock().
Sun Jan 4 19:24:43 GMT 2009 Matthias Kilian <kili@outback.escape.de>
Don't check pthread_mutex_*lock() only on Linux and/or only if DEBUG
is defined. The return values of those functions are well defined
and should be supported on all operating systems with pthreads. The
checks are cheap enough to do them even in the default build (without
-DDEBUG).
While here, recycle an unused macro ASSERT_LOCK_NOTHELD, and let
the debugBelch part enabled with -DLOCK_DEBUG work independently
of -DDEBUG.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pthread_mutex_unlock().
This patch caused problems on Mac OS X, undoing until we can do it better.
rolling back:
Sun Jan 4 19:24:43 GMT 2009 Matthias Kilian <kili@outback.escape.de>
* Always check the result of pthread_mutex_lock() and pthread_mutex_unlock().
Don't check pthread_mutex_*lock() only on Linux and/or only if DEBUG
is defined. The return values of those functions are well defined
and should be supported on all operating systems with pthreads. The
checks are cheap enough to do them even in the default build (without
-DDEBUG).
While here, recycle an unused macro ASSERT_LOCK_NOTHELD, and let
the debugBelch part enabled with -DLOCK_DEBUG work independently
of -DDEBUG.
M ./includes/OSThreads.h -30 +10
|