Problems were found on 32-bit platforms; I'll commit again when I have a fix.
This reverts the following commits:
54b31f744848da872c7c6366dea840748e01b5cf
b0534f78a73f972e279eed4447a5687bd6a8308e
|
This tracks the amount of memory allocated by each thread in a
counter stored in the TSO. Optionally, when the counter drops below
zero (it counts down), the thread can be sent an asynchronous
exception: AllocationLimitExceeded. When this happens, the thread is
given a small additional limit so that it can handle the exception.
See the documentation in GHC.Conc for more details.
Allocation limits are similar to timeouts, but
- timeouts use real time, not CPU time. Allocation limits do not
count anything while the thread is blocked or in foreign code.
- timeouts don't re-trigger if the thread catches the exception,
allocation limits do.
- timeouts can catch non-allocating loops, if you use
-fno-omit-yields. This doesn't work for allocation limits.
I couldn't measure any impact on benchmarks with these changes, even
for nofib/smp.
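A minimal sketch (mine, not part of the commit) of the GHC.Conc API
described above; the module providing AllocationLimitExceeded is an
assumption and has moved between GHC versions:

    import Control.Exception (evaluate, try)
    import GHC.Conc (enableAllocationLimit, setAllocationCounter)
    import GHC.IO.Exception (AllocationLimitExceeded)

    main :: IO ()
    main = do
      setAllocationCounter (10 * 1024 * 1024)  -- counts down from ~10MB
      enableAllocationLimit  -- deliver AllocationLimitExceeded below zero
      r <- try (evaluate (sum [1 ..] :: Integer))  -- allocates without bound
      case r of
        Left e  -> putStrLn ("caught: " ++ show (e :: AllocationLimitExceeded))
        Right n -> print n

Since the limit re-triggers (unlike timeouts), a handler that intends
to keep running should raise the counter again before allocating further.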
|
Improvements:
- we now turn off the timer signal in the non-threaded RTS after
idleGCDelay. This should make the xmonad users on #5991 happy.
- we now turn off the timer signal after idleGCDelay even if the
idle GC is disabled with +RTS -I0.
- we now do *not* turn off the timer when profiling.
- more comments to explain the meaning of the various ACTIVITY_*
values
|
This is an experimental tweak to the parallel GC that avoids waking up
a Capability to do parallel GC if we know that the capability has been
idle for a (tunable) number of GC cycles. The idea is that if you're
only using a few Capabilities, there's no point waking up the ones
that aren't busy.
e.g. +RTS -qi3
says "a Capability will participate in parallel GC only if it has been
running at some point during the last 3 GC cycles."
Results are a bit hit and miss, and I don't completely understand why
yet. Hence, for now it is turned off by default, and also not
documented except in the +RTS -? output.
|
Terminology cleanup: the type "Ticks" has been renamed "Time", which
is an StgWord64 in units of TIME_RESOLUTION (currently nanoseconds).
The terminology "tick" is now used consistently to mean the interval
between timer signals.
The ticker now always ticks in realtime (actually CLOCK_MONOTONIC if
we have it). Before it used CPU time in the non-threaded RTS and
realtime in the threaded RTS, but I've discovered that the CPU timer
has terrible resolution (at least on Linux) and isn't much use for
profiling. So now we always use realtime. This should also fix
The default tick interval is now 10ms, except when profiling where we
drop it to 1ms. This gives more accurate profiles without affecting
runtime too much (<1%).
Lots of cleanups - the resolution of Time is now in one place
only (Rts.h) rather than having calculations that depend on the
resolution scattered all over the RTS. I hope I found them all.
|
Enables people to turn them on/off. Defaults to on.
|
Replaces the existing EVENT_RUN/STEAL_SPARK events with 7 new events
covering all stages of the spark lifecycle:
create, dud, overflow, run, steal, fizzle, gc
The sampled spark events are still available. There are now two event
classes for sparks, the sampled and the fully accurate. They can be
enabled/disabled independently. By default +RTS -l includes the sampled
but not the full-detail spark events. Use +RTS -lf-p to enable the
detailed ('f') and disable the sampled ('p') spark events.
Includes work by Mikolaj <mikolaj.konarski@gmail.com>
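A rough usage sketch (mine, not from the commit): a spark-heavy toy
program, compiled and run so the detailed spark events reach the
eventlog; par/pseq from GHC.Conc and the ghc-events viewer are
assumptions here:

    -- Compile: ghc -threaded -rtsopts -eventlog Sparks.hs
    -- Run:     ./Sparks +RTS -N2 -lf
    -- then inspect Sparks.eventlog, e.g. with the ghc-events tool.
    import GHC.Conc (par, pseq)

    fib :: Int -> Int
    fib n | n < 2 = n
    fib n = x `par` (y `pseq` (x + y))  -- spark the left branch
      where
        x = fib (n - 1)
        y = fib (n - 2)

    main :: IO ()
    main = print (fib 30)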
|
Previously GC was included in the scheduler trace class. It can be
enabled specifically with +RTS -vg or -lg, though note that both -v
and -l on their own now default to a sensible set of trace classes,
currently: scheduler, gc and sparks.
|
The RTS now makes its own copy of the argv data in setupRtsFlags(),
rather than sharing the memory. Previously, if the caller of hs_init()
passed in dynamically-allocated memory and then freed it, random
crashes could happen later (#5177).
|
This code has accumulated a great deal of cruft over the years; this
pass cleans up a lot of the surrounding code but leaves the actual
argument processing alone, so there's still more that could be done.
Bug fixed:
- ghc_rts_opts should not be subject to the --rtsopts setting. If
the programmer explicitly declares options with ghc_rts_opts, they
shouldn't also have to accept command-line RTS options to make them
work.
|
It is still (silently) accepted for backwards compatibility.
|
This patch makes two changes to the way stacks are managed:
1. The stack is now stored in a separate object from the TSO.
This means that it is easier to replace the stack object for a thread
when the stack overflows or underflows; we don't have to leave behind
the old TSO as an indirection any more. Consequently, we can remove
ThreadRelocated and deRefTSO(), which were a pain.
This is obviously the right thing, but the last time I tried to do it,
it made performance worse. This time I seem to have cracked it.
2. Stacks are now represented as a chain of chunks, rather than
a single monolithic object.
The big advantage here is that individual chunks are marked clean or
dirty according to whether they contain pointers to the young
generation, and the GC can avoid traversing clean stack chunks during
a young-generation collection. This means that programs with deep
stacks will see a big saving in GC overhead when using the default GC
settings.
A secondary advantage is that there is much less copying involved as
the stack grows. Programs that quickly grow a deep stack will see big
improvements.
In some ways the implementation is simpler, as nothing special needs
to be done to reclaim stack as the stack shrinks (the GC just recovers
the dead stack chunks). On the other hand, we have to manage stack
underflow between chunks, so there's a new stack frame
(UNDERFLOW_FRAME), and we now have separate TSO and STACK objects.
The total amount of code is probably about the same as before.
There are new RTS flags:
-ki<size> Sets the initial thread stack size (default 1k), e.g. -ki4k, -ki2m
-kc<size> Sets the stack chunk size (default 32k)
-kb<size> Sets the stack chunk buffer size (default 1k)
-ki was previously called just -k, and the old name is still accepted
for backwards compatibility. These new options are documented.
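To see the chunked stack in action, here is a toy program of mine (not
part of the commit) that grows a deep stack quickly:

    -- Compile: ghc -rtsopts Deep.hs
    -- Try:     ./Deep +RTS -s -ki4k -kc64k -kb2k
    deep :: Int -> Int
    deep 0 = 0
    deep n = 1 + deep (n - 1)  -- non-tail call: every frame stays on the stack

    main :: IO ()
    main = print (deep 1000000)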
|
This patch extends the PAPI support in the RTS to allow collection of native
events. PAPI can collect data for native events that are exposed by the
hardware beyond the PAPI preset events. The native events supported on your
hardware can be found by using the papi_native_avail tool.
The RTS already allows users to specify PAPI preset events from the command
line. This patch extends that support to allow users to specify native events.
The changes needed are:
1) A new option (#) for the RTS PAPI flag for native events. For example, to
collect the native event 0x40000000, use ./a.out +RTS -a#0x40000000 -sstderr
2) Update the PAPI_FLAGS struct to store whether the user-specified event is a
PAPI preset or a native event
3) Update the init_countable_events function to add the native events after
parsing the event code and decoding the name using PAPI_event_code_to_name
|
-H alone causes the RTS to use a larger nursery, but without exceeding
the amount of memory that the application is already using. It trades
off GC time against locality: the default setting is to use a
fixed-size 512k nursery, but this is sometimes worse than using a very
large nursery despite the worse locality.
Not all programs get faster, but some programs that use large heaps do
much better with -H. e.g. this helps a lot with #3061 (binary-trees),
though not as much as specifying -H<large>. Typically using -H<large>
is better than plain -H, because the runtime doesn't know ahead of
time how much memory you want to use.
Should -H be on by default? I'm not sure; it makes some programs go
slower, but others go faster.
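One way to experiment, with a throwaway allocation-heavy program (my
example, not from the commit):

    -- Compile: ghc -rtsopts BigHeap.hs
    -- Compare the GC figures in the +RTS -s summaries of:
    --   ./BigHeap +RTS -s          (default 512k nursery)
    --   ./BigHeap +RTS -H -s       (larger nursery, same footprint)
    --   ./BigHeap +RTS -H256m -s   (explicitly large nursery)
    main :: IO ()
    main = print (length (filter even [1 .. 50000000 :: Int]))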
|
Added:
primop TraceEventOp "traceEvent#" GenPrimOp
Addr# -> State# s -> State# s
{ Emits an event via the RTS tracing framework. The contents
of the event is the zero-terminated byte string passed as the first
argument. The event will be emitted either to the .eventlog file,
or to stderr, depending on the runtime RTS flags. }
and added the required RTS functionality to support it. Also a bit of
refactoring in the RTS tracing code.
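A sketch of emitting a user event from Haskell, assuming the
Debug.Trace wrapper that accompanies this primop (traceEventIO in
later base versions):

    -- Compile: ghc -rtsopts -eventlog Trace.hs
    -- Run:     ./Trace +RTS -l    (events go to Trace.eventlog)
    --    or:   ./Trace +RTS -v    (events go to stderr)
    import Debug.Trace (traceEventIO)

    main :: IO ()
    main = do
      traceEventIO "sum: start"
      print (sum [1 .. 1000000 :: Int])
      traceEventIO "sum: done"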
|
Flags (from +RTS -?):
-qg[<n>] Use parallel GC only for generations >= <n>
(default: 0, -qg alone turns off parallel GC)
-qb[<n>] Use load-balancing in the parallel GC only for generations >= <n>
(default: 1, -qb alone turns off load-balancing)
These are good defaults for most parallel programs. Single-threaded
programs that want to make use of parallel GC will probably want +RTS
-qg1 (this is documented).
I've also updated the docs.
|
- tracing facilities are now enabled with -DTRACING, and -DDEBUG
additionally enables debug-tracing. -DEVENTLOG has been
removed.
- -debug now implies -eventlog
- events can be printed to stderr instead of being sent to the
binary .eventlog file by adding +RTS -v (which is implied by the
+RTS -Dx options).
- -Dx debug messages can be sent to the binary .eventlog file
by adding +RTS -l. This should help debugging by reducing
the impact of debug tracing on execution time.
- Various debug messages that duplicated the information in events
have been removed.
|
This has no effect with static libraries, but when the RTS is in a
shared library it does two things:
- it prevents the function from being exposed by the shared library
- internal calls to the function can use the faster non-PLT calls,
because the function cannot be overridden at link time.
|
I've updated the wiki page about the RTS headers
http://hackage.haskell.org/trac/ghc/wiki/Commentary/SourceTree/Includes
to reflect the new layout and explain some of the rationale. All the
header files now point to this page.
|
The first phase of this tidyup is focussed on the header files, and in
particular making sure we are exposing publicly exactly what we need
to, and no more.
- Rts.h now includes everything that the RTS exposes publicly,
rather than a random subset of it.
- Most of the public header files have moved into subdirectories, and
many of them have been renamed. But clients should not need to
include any of the other headers directly, just #include the main
public headers: Rts.h, HsFFI.h, RtsAPI.h.
- All the headers needed for via-C compilation have moved into the
stg subdirectory, which is self-contained. Most of the headers for
the rest of the RTS APIs have moved into the rts subdirectory.
- I left MachDeps.h where it is, because it is so widely used in
Haskell code.
- I left a deprecated stub for RtsFlags.h in place. The flag
structures are now exposed by Rts.h.
- Various internal APIs are no longer exposed by public header files.
- Various bits of dead code and declarations have been removed.
- More gcc warnings are turned on, and the RTS code is more
warning-clean.
- More source files #include "PosixSource.h", and hence only use
standard POSIX (1003.1c-1995) interfaces.
There is a lot more tidying up still to do; this is just the first
pass. I also intend to standardise the names for external RTS APIs
(e.g. use the rts_ prefix consistently), and declare the internal APIs
as hidden for shared libraries.