| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
| | |
|
| | |
|
| | |
|
|/ |
|
|
|
|
|
| |
It was assuming that long's are word-sized, which is not the case on
Win64.
|
| |
|
|
|
|
|
| |
We may need to do this differently once we get as far as building the
RTS in the dyn ways.
|
| |
|
|
|
|
| |
Convert some sizes, as CLong is a different size to pointers
|
|
|
|
| |
e.g. Win64
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
Patchset from Stephen Blackheath <stephen.blackheath@ipwnstudios.com>
|
|
|
|
|
|
|
| |
I don't think it was causing any problems, but
TimeToUS(x+y)
would have evaluated to
x + (y / 1000)
|
| |
|
|
|
|
|
| |
I haven't been able to test whether this works or not due to #5754,
but at least it doesn't appear to break anything.
|
|
|
|
|
| |
This is working towards being able to put ghcautoconf.h and
ghcplatform.h in includes/dist
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The sizes obtained this way do not work on a target system in general.
So in a future cross-compilable setup we need another way of obtaining
expansions for the macros OFFSET, FIELD_SIZE and TYPE_SIZE.
Guarded against accidental use of 'sizeof' by poisoning.
Verified that the generated *Constants.h/hs files are unchanged.
|
|
|
|
| |
Needed by #5357
|
|
|
|
| |
Needed by #5357
|
|
|
|
|
|
| |
pseudo-register
Needed by #5357
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is an experimental tweak to the parallel GC that avoids waking up
a Capability to do parallel GC if we know that the capability has been
idle for a (tunable) number of GC cycles. The idea is that if you're
only using a few Capabilities, there's no point waking up the ones
that aren't busy.
e.g. +RTS -qi3
says "A Capability will participate in parallel GC if it was running
at all since the last 3 GC cycles."
Results are a bit hit and miss, and I don't completely understand why
yet. Hence, for now it is turned off by default, and also not
documented except in the +RTS -? output.
|
| |
|
|
|
|
|
|
|
|
| |
The primitive array types, such as 'ByteArray#', have kind #, but are represented by pointers. They are boxed, but unpointed types (i.e., they cannot be 'undefined').
The two categories of array types —[Mutable]Array# and [Mutable]ByteArray#— are containers for unboxed (and unpointed) as well as for boxed and pointed types. So far, we lacked support for containers for boxed, unpointed types (i.e., containers for the primitive arrays themselves). This is what the new primtypes provide.
Containers for boxed, unpointed types are crucial for the efficient implementation of scattered nested arrays, which are central to the new DPH backend library dph-lifted-vseg. Without such containers, we cannot eliminate all unboxing from the inner loops of traversals processing scattered nested arrays.
|
|
|
|
|
| |
At present the number of capabilities can only be *increased*, not
decreased. The latter presents a few more challenges!
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Consider this experimental for the time being. There are a lot of
things that could go wrong, but I've verified that at least it works
on the test cases we have.
I also did some API cleanups while I was here. Previously we had:
Capability * rts_eval (Capability *cap, HaskellObj p, /*out*/HaskellObj *ret);
but this API is particularly error-prone: if you forget to discard the
Capability * you passed in and use the return value instead, then
you're in for subtle bugs with +RTS -N later on. So I changed all
these functions to this form:
void rts_eval (/* inout */ Capability **cap,
/* in */ HaskellObj p,
/* out */ HaskellObj *ret)
It's much harder to use this version incorrectly, because you have to
pass the Capability in by reference.
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
- Attach a SrcSpan to every CostCentre. This had the side effect
that CostCentres that used to be merged because they had the same
name are now considered distinct; so I had to add a Unique to
CostCentre to give them distinct object-code symbols.
- New flag: -fprof-auto-calls. This flag adds an automatic SCC to
every call site (application, to be precise). This is typically
more useful for call stacks than annotating whole functions.
Various tidy-ups at the same time: removed unused NoCostCentre
constructor, and refactored a bit in Coverage.lhs.
The call stack we get from traceStack now looks like this:
Stack trace:
Main.CAF (<entire-module>)
Main.main.xs (callstack002.hs:18:12-24)
Main.map (callstack002.hs:13:12-16)
Main.map.go (callstack002.hs:15:21-34)
Main.map.go (callstack002.hs:15:21-23)
Main.f (callstack002.hs:10:7-43)
|
| | |
|
|/
|
|
|
|
|
|
|
|
| |
When they existed, they were getting included in the includes_H_FILES
variable (as it uses wildcard to find all header files). But the
.depends files for the programs that generate the headers depend on
$(includes_H_FILES), so the .depends files looked out-of-date once the
headers had been created. This caused unnecessary make reinvocations.
So now we put them in dist* directories, where they ought to be anyway.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The parallel GC was using setContextSwitches() to stop all the other
threads, which sets the context_switch flag on every Capability. That
had the side effect of causing every Capability to also switch
threads, and since GCs can be much more frequent than context
switches, this increased the context switch frequency. When context
switches are expensive (because the switch is between two bound
threads or a bound and unbound thread), the difference is quite
noticeable.
The fix is to have a separate flag to indicate that a Capability
should stop and return to the scheduler, but not switch threads. I've
called this the "interrupt" flag.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This means that both time and heap profiling work for parallel
programs. Main internal changes:
- CCCS is no longer a global variable; it is now another
pseudo-register in the StgRegTable struct. Thus every
Capability has its own CCCS.
- There is a new built-in CCS called "IDLE", which records ticks for
Capabilities in the idle state. If you profile a single-threaded
program with +RTS -N2, you'll see about 50% of time in "IDLE".
- There is appropriate locking in rts/Profiling.c to protect the
shared cost-centre-stack data structures.
This patch does enough to get it working, I have cut one big corner:
the cost-centre-stack data structure is still shared amongst all
Capabilities, which means that multiple Capabilities will race when
updating the "allocations" and "entries" fields of a CCS. Not only
does this give unpredictable results, but it runs very slowly due to
cache line bouncing.
It is strongly recommended that you use -fno-prof-count-entries to
disable the "entries" count when profiling parallel programs. (I shall
add a note to this effect to the docs).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Terminology cleanup: the type "Ticks" has been renamed "Time", which
is an StgWord64 in units of TIME_RESOLUTION (currently nanoseconds).
The terminology "tick" is now used consistently to mean the interval
between timer signals.
The ticker now always ticks in realtime (actually CLOCK_MONOTONIC if
we have it). Before it used CPU time in the non-threaded RTS and
realtime in the threaded RTS, but I've discovered that the CPU timer
has terrible resolution (at least on Linux) and isn't much use for
profiling. So now we always use realtime. This should also fix
The default tick interval is now 10ms, except when profiling where we
drop it to 1ms. This gives more accurate profiles without affecting
runtime too much (<1%).
Lots of cleanups - the resolution of Time is now in one place
only (Rts.h) rather than having calculations that depend on the
resolution scattered all over the RTS. I hope I found them all.
|
|
|
|
| |
hppa1, m68k
|
| |
|
| |
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch adds support to the autoconf scripts to detect
when we are using a C compiler that uses an LLVM back end.
An LLVM back end does not support all of the extensions use
by GCC, so we need to perform some conditional compilation
in the runtime, particularly for handling thread local
storage and global register variables.
The changes here will set the CC_LLVM_BACKEND in the
autoconf scripts if we detect an llvm-based compiler. We use
this variable to define the llvm_CC_FLAVOR variable that we
can use in the runtime code to conditionally compile for
LLVM.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
LLVM does not support the __thread attribute for thread
local storage and may generate incorrect code for global
register variables. We want to allow building the runtime with
LLVM-based compilers such as llvm-gcc and clang,
particularly for MacOS.
This patch changes the gct variable used by the garbage
collector to use pthread_getspecific() for thread local
storage when an llvm based compiler is used to build the
runtime.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We avoid calling "rm -rf" with no file arguments; this fixes cleaning
on Solaris, where that fails.
We also check for suspicious arguments: anything containing "..",
starting "/", or containing a "*" (you need to call $(wildcard ...)
yourself now if you really want globbing). This should make things
a little safer.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Rather than have main() be statically compiled as part of the RTS, we
now generate it into the tiny C file that we compile when linking a
binary.
The main motivation is that we want to pass the settings for the
-rtsotps and -with-rtsopts flags into the RTS, rather than relying on
fragile linking semantics to override the defaults, which don't work
with DLLs on Windows (#5373). In order to do this, we need to extend
the API for initialising the RTS, so now we have:
void hs_init_ghc (int *argc, char **argv[], // program arguments
RtsConfig rts_config); // RTS configuration
hs_init_ghc() can optionally be used instead of hs_init(), and allows
passing in configuration options for the RTS. RtsConfig is a struct,
which currently has two fields:
typedef struct {
RtsOptsEnabledEnum rts_opts_enabled;
const char *rts_opts;
} RtsConfig;
but might have more in the future. There is a default value for the
struct, defaultRtsConfig, the idea being that you start with this and
override individual fields as necessary.
In fact, main() was in a separate static library, libHSrtsmain.a.
That's now gone.
|
| | |
|
| |
| |
| |
| |
| |
| | |
The existing GHC.Conc.labelThread will now also emit the the thread
label into the eventlog. Profiling tools like ThreadScope could then
use the thread labels rather than thread numbers.
|