delta/haskell.git - gitlab.haskell.org: ghc/ghc.git

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	Fix #650: use a card table to mark dirty sections of mutable arrays	Simon Marlow	2009-12-17	3	-1/+38
	The card table is an array of bytes, placed directly following the actual array data. This means that array reading is unaffected, but array writing needs to read the array size from the header in order to find the card table.
	We use a bytemap rather than a bitmap, because updating the card table must be multi-thread safe. Each byte refers to 128 entries of the array, but this is tunable by changing the constant MUT_ARR_PTRS_CARD_BITS in includes/Constants.h.
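The layout described above can be sketched in C. This is only an illustration, not the RTS code: the `MutArr` type and the `mut_arr_*` helpers are hypothetical names, and only the constant `MUT_ARR_PTRS_CARD_BITS` and the figure of 128 entries per card come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* 2^7 = 128 array entries per card byte, as stated in the commit message. */
#define MUT_ARR_PTRS_CARD_BITS 7

/* Hypothetical array header: only the size field matters for this sketch. */
typedef struct {
    size_t ptrs;       /* number of elements */
    void  *payload[];  /* the elements, followed directly by the card table */
} MutArr;

/* Number of card bytes needed to cover `ptrs` elements (rounded up). */
static size_t mut_arr_cards(size_t ptrs) {
    size_t per_card = (size_t)1 << MUT_ARR_PTRS_CARD_BITS;
    return (ptrs + per_card - 1) >> MUT_ARR_PTRS_CARD_BITS;
}

/* The card table starts right after the last element, so finding it
 * requires reading the size from the header. */
static unsigned char *mut_arr_card_table(MutArr *a) {
    return (unsigned char *)&a->payload[a->ptrs];
}

/* Allocate header + elements + card table, all zeroed (all cards clean). */
static MutArr *mut_arr_new(size_t ptrs) {
    size_t bytes = sizeof(MutArr) + ptrs * sizeof(void *) + mut_arr_cards(ptrs);
    MutArr *a = calloc(1, bytes);
    if (a != NULL) a->ptrs = ptrs;
    return a;
}

/* Reading never touches the card table. */
static void *mut_arr_read(const MutArr *a, size_t i) {
    return a->payload[i];
}

/* Writing stores the element and marks its card dirty with a plain byte
 * store; no read-modify-write of a shared word is needed, unlike setting
 * a single bit in a bitmap. */
static void mut_arr_write(MutArr *a, size_t i, void *v) {
    assert(i < a->ptrs);
    a->payload[i] = v;
    mut_arr_card_table(a)[i >> MUT_ARR_PTRS_CARD_BITS] = 1;
}
```

With this layout, a collector would only need to scan the 128-element sections whose card byte is set, rather than the whole array.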
*	Expose all EventLog events as DTrace probes	Manuel M T Chakravarty	2009-12-12	1	-1/+2
	- Defines a DTrace provider, called 'HaskellEvent', that provides a probe for every event of the eventlog framework.
	- In contrast to the original eventlog, the DTrace probes are available in all flavours of the runtime system (DTrace probes have virtually no overhead if not enabled); when -DTRACING is defined, both the regular event log and the DTrace probes can be used.
	- Currently, Mac OS X only. User-space DTrace probes are implemented differently on Mac OS X than in the original DTrace implementation. Nevertheless, it shouldn't be too hard to enable these probes on other platforms, too.
	- Documentation is at http://hackage.haskell.org/trac/ghc/wiki/DTrace
*	Correction to the allocation stats following earlier refactoring	Simon Marlow	2009-12-04	1	-1/+2
*	GC refactoring, remove "steps"	Simon Marlow	2009-12-03	2	-41/+30
	The GC had a two-level structure, G generations each of T steps. Steps are for aging within a generation, mostly to avoid premature promotion.
	Measurements show that more than 2 steps is almost never worthwhile, and 1 step is usually worse than 2. In theory fractional steps are possible, so the ideal number of steps is somewhere between 1 and 3. GHC's default has always been 2.
	We can implement 2 steps quite straightforwardly by having each block point to the generation to which objects in that block should be promoted, so blocks in the nursery point to generation 0, and blocks in gen 0 point to gen 1, and so on.
	This commit removes the explicit step structures, merging generations with steps, thus simplifying a lot of code. Performance is unaffected. The tunable number of steps is now gone, although it may be replaced in the future by a way to tune the aging in generation 0.
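The "each block points to its destination generation" idea can be sketched as follows. All names here (`Generation`, `Block`, `nursery_block`, `gen_block`, `evacuate_target`) are hypothetical, and the rule that the oldest generation promotes into itself is an assumption of this sketch, not something the commit message states.

```c
#include <assert.h>

enum { N_GENS = 3 };  /* hypothetical generation count for the sketch */

typedef struct Generation {
    unsigned no;  /* generation number */
} Generation;

/* Hypothetical block descriptor: it records where live objects found in
 * this block should be copied to at the next collection. */
typedef struct Block {
    Generation *dest;
} Block;

static Generation gens[N_GENS] = { {0}, {1}, {2} };

/* Nursery blocks point at generation 0: a freshly allocated object that
 * survives one collection is aged in gen 0 rather than promoted further. */
static Block nursery_block(void) {
    Block b = { &gens[0] };
    return b;
}

/* Blocks in generation g point at generation g + 1.
 * Assumption: the oldest generation has nowhere older to go, so its
 * blocks point back at it. */
static Block gen_block(unsigned g) {
    assert(g < N_GENS);
    unsigned d = (g + 1 < N_GENS) ? g + 1 : g;
    Block b = { &gens[d] };
    return b;
}

/* The collector's inner loop only needs one load to find the target. */
static Generation *evacuate_target(const Block *b) {
    return b->dest;
}
```

Under this scheme an object must survive two collections (nursery to gen 0, then gen 0 to gen 1) before reaching gen 1, which reproduces the two-step aging without a separate step structure.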
*	Make allocatePinned use local storage, and other refactorings	Simon Marlow	2009-12-01	1	-28/+8
	This is a batch of refactoring to remove some of the GC's global state, as we move towards CPU-local GC.
	- allocateLocal() now allocates large objects into the local nursery, rather than taking a global lock and allocating them in gen 0 step 0.
	- allocatePinned() was still allocating from global storage and taking a lock each time; now it uses local storage. (mallocForeignPtrBytes should be faster with -threaded).
	- We had a gen 0 step 0, distinct from the nurseries, which are stored in a separate nurseries[] array. This is slightly strange. I removed the g0s0 global that pointed to gen 0 step 0, and removed all uses of it. I think now we don't use gen 0 step 0 at all, except possibly when there is only one generation. Possibly more tidying up is needed here.
	- I removed the global allocate() function, and renamed allocateLocal() to allocate().
	- the alloc_blocks global is gone. MAYBE_GC() and doYouWantToGC() now check the local nursery only.
*	Implement a new heap-tuning option: -H	Simon Marlow	2009-11-30	1	-0/+1
	-H alone causes the RTS to use a larger nursery, but without exceeding the amount of memory that the application is already using. It trades off GC time against locality: the default setting is to use a fixed-size 512k nursery, but this is sometimes worse than using a very large nursery despite the worse locality.
	Not all programs get faster, but some programs that use large heaps do much better with -H. For example, this helps a lot with #3061 (binary-trees), though not as much as specifying -H<large>. Typically using -H<large> is better than plain -H, because the runtime doesn't know ahead of time how much memory you want to use.
	Should -H be on by default? I'm not sure: it makes some programs go slower, but others go faster.
*	Store a destination step in the block descriptor	Simon Marlow	2009-11-29	2	-14/+25
	At the moment, this just saves a memory reference in the GC inner loop (worth a percent or two of GC time). Later, it will hopefully let me experiment with partial steps, and simplifying the generation/step infrastructure.
*	threadStackOverflow: check whether stack squeezing released some stack (#3677)	Simon Marlow	2009-11-25	1	-0/+7
	In a stack overflow situation, stack squeezing may reduce the stack size, but we don't know whether it has been reduced enough for the stack check to succeed if we try again. Fortunately stack squeezing is idempotent, so all we need to do is record whether any squeezing happened. If we are at the stack's absolute -K limit, and stack squeezing happened, then we try running the thread again.
	We also want to avoid enlarging the stack if squeezing has already released some of it. However, we don't want to get into a pathological situation where a thread has a nearly full stack (near its current limit, but not near the absolute -K limit), keeps allocating a little bit, squeezing removes a little bit, and then it runs again. So to avoid this, if we squeezed and there is still less than BLOCK_SIZE_W words free, then we enlarge the stack anyway.
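One way to read the policy above is as a small decision function. The sketch below is an interpretation of the commit message, not the actual threadStackOverflow code: the enum, the function name and its parameters are hypothetical, and the value given to `BLOCK_SIZE_W` is a placeholder chosen for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder value for the sketch; the real constant is defined by the RTS. */
#define BLOCK_SIZE_W 512

typedef enum {
    RETRY_THREAD,    /* run the thread again on the same stack */
    ENLARGE_STACK,   /* grow the stack, then run the thread */
    RAISE_OVERFLOW   /* report a stack overflow */
} OverflowAction;

/* squeezed:      did stack squeezing release anything this time round?
 * at_max_limit:  is the stack already at its absolute -K limit?
 * free_words:    words currently free on the stack */
static OverflowAction on_stack_overflow(bool squeezed, bool at_max_limit,
                                        size_t free_words) {
    if (at_max_limit) {
        /* Cannot grow any further: retrying only makes sense if
         * squeezing actually released some stack. */
        return squeezed ? RETRY_THREAD : RAISE_OVERFLOW;
    }
    if (squeezed && free_words >= BLOCK_SIZE_W) {
        /* Squeezing released a useful amount, so avoid enlarging. */
        return RETRY_THREAD;
    }
    /* Either nothing was squeezed, or too little was released; enlarging
     * avoids the allocate-a-bit / squeeze-a-bit loop. */
    return ENLARGE_STACK;
}
```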
*	add a comment to TSO_MARKED	Simon Marlow	2009-11-25	1	-0/+4
*	Switch EventThreadID back to 32 bits.	Simon Marlow	2009-11-12	1	-1/+1
	The log file format was still using 32 bits; this just updates the header file to match. There should be no functional changes.
*	Second attempt to fix #1185 (forkProcess and -threaded)	Simon Marlow	2009-11-11	2	-2/+7
	Patch 1/2: second part of the patch is to libraries/base.
	This time without dynamic linker hacks, instead I've expanded the existing rts/Globals.c to cache more CAFs, specifically those in GHC.Conc. We were already using this trick for signal handlers, I should have realised before.
	It's still quite unsavoury, but we can do away with rts/Globals.c in the future when we switch to a dynamically-linked GHCi.
*	Add events to show when GC threads are idle/working	Simon Marlow	2009-10-15	1	-1/+4
*	Use "rep; nop" inside a spin-lock loop on x86/x86-64	Simon Marlow	2009-09-29	1	-0/+2
	This helps on a hyperthreaded CPU by yielding to the other thread in a spinlock loop.
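A minimal sketch of a spin lock that issues "rep; nop" while waiting, assuming a GCC-compatible compiler and C11 atomics. The `SpinLock` type and the `spin_*` and `busy_wait_nop` names are illustrative and are not taken from the RTS sources; on non-x86 targets the hint compiles to nothing.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* "rep; nop" is the encoding of the x86 PAUSE instruction; it tells the
 * core that this is a spin-wait loop, so a sibling hyperthread can make
 * progress. */
static inline void busy_wait_nop(void) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ __volatile__("rep; nop");
#endif
}

typedef struct {
    atomic_flag held;
} SpinLock;

static void spin_init(SpinLock *l) {
    atomic_flag_clear(&l->held);
}

/* Returns true if the lock was acquired without waiting. */
static bool spin_try_lock(SpinLock *l) {
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Spin until the flag is acquired, pausing on every failed attempt. */
static void spin_lock(SpinLock *l) {
    while (!spin_try_lock(l)) {
        busy_wait_nop();
    }
}

static void spin_unlock(SpinLock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```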
*	Add a way to generate tracing events programmatically	Simon Marlow	2009-09-25	2	-2/+7
	added:
	  primop TraceEventOp "traceEvent#" GenPrimOp
	    Addr# -> State# s -> State# s
	    { Emits an event via the RTS tracing framework. The contents of the event is the zero-terminated byte string passed as the first argument. The event will be emitted either to the .eventlog file, or to stderr, depending on the runtime RTS flags. }
	and added the required RTS functionality to support it. Also a bit of refactoring in the RTS tracing code.
*	implement case-on-Word in the byte code generator/interpreter (#2881)	Simon Marlow	2009-09-18	1	-0/+2
*	Fix #3439: -debug implies -ticky, and -ticky code links with any RTS	Simon Marlow	2009-09-18	1	-1/+2
*	Event tracing: put the capability in the block marker, omit it from the events	Simon Marlow	2009-09-15	1	-19/+18
	This makes events smaller and tracing quicker, and speeds up reading and sorting the trace file.
	HEADS UP: this changes the format of event log files. Corresponding changes to the ghc-events package are required (and will be pushed soon). Normally we would make backwards-compatible changes, but this changes the format of every event (to remove the capability) so I'm breaking the rules this time. This will be the only time we can do this, since the format becomes public in 6.12.1.
*	Add event block markers	Simon Marlow	2009-09-13	1	-1/+2
	These indicate the size and time span of a sequence of events in the event log, to make it easier to sort and navigate a large event log.
*	Improve the default parallel GC settings, and sanitise the flags (#3340)	Simon Marlow	2009-09-15	1	-2/+7
	Flags (from +RTS -?):
	  -qg[<n>]  Use parallel GC only for generations >= <n> (default: 0, -qg alone turns off parallel GC)
	  -qb[<n>]  Use load-balancing in the parallel GC only for generations >= <n> (default: 1, -qb alone turns off load-balancing)
	These are good defaults for most parallel programs. Single-threaded programs that want to make use of parallel GC will probably want +RTS -qg1 (this is documented).
	I've also updated the docs.
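The "only for generations >= <n>" rule shared by -qg and -qb can be illustrated with a small helper. This is a sketch of the semantics as described in the flag help text, not the RTS flag parser: `GenThreshold`, `parse_threshold` and `applies_to` are hypothetical names, and the string passed in stands for whatever follows the flag letters.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical representation of a -qg or -qb setting. */
typedef struct {
    bool     enabled;  /* false when the flag is given with no number */
    unsigned min_gen;  /* feature applies to generations >= min_gen */
} GenThreshold;

/* `arg` is the text after "-qg" or "-qb": "" for the bare flag,
 * or a decimal generation number such as "1". */
static GenThreshold parse_threshold(const char *arg) {
    GenThreshold t;
    if (*arg == '\0') {
        /* The bare flag turns the feature off entirely. */
        t.enabled = false;
        t.min_gen = 0;
    } else {
        t.enabled = true;
        t.min_gen = (unsigned)strtoul(arg, NULL, 10);
    }
    return t;
}

/* Should the feature be used when collecting generation `gen`? */
static bool applies_to(GenThreshold t, unsigned gen) {
    return t.enabled && gen >= t.min_gen;
}
```

With the stated defaults, parallel GC corresponds to `{ true, 0 }` (every generation) and load-balancing to `{ true, 1 }` (generation 1 and older), while `+RTS -qg1` restricts parallel GC to generation 1 and older.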
*	Unify event logging and debug tracing.	Simon Marlow	2009-08-29	3	-12/+23
	- tracing facilities are now enabled with -DTRACING, and -DDEBUG additionally enables debug-tracing. -DEVENTLOG has been removed.
	- -debug now implies -eventlog
	- events can be printed to stderr instead of being sent to the binary .eventlog file by adding +RTS -v (which is implied by the +RTS -Dx options).
	- -Dx debug messages can be sent to the binary .eventlog file by adding +RTS -l. This should help debugging by reducing the impact of debug tracing on execution time.
	- Various debug messages that duplicated the information in events have been removed.
*	Fix incorrectly hidden RTS symbols	Simon Marlow	2009-08-29	4	-0/+89
*	Declare RTS-private prototypes with __attribute__((visibility("hidden")))	Simon Marlow	2009-08-05	1	-3/+3
	This has no effect with static libraries, but when the RTS is in a shared library it does two things:
	- it prevents the function from being exposed by the shared library
	- internal calls to the function can use the faster non-PLT calls, because the function cannot be overridden at link time.
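A minimal sketch of the pattern, assuming a GCC-compatible compiler. The macro name `PRIVATE_SYMBOL` and both functions are hypothetical and only show how a library-internal prototype might be marked.

```c
/* Hypothetical wrapper macro: expands to the GCC visibility attribute
 * where it is supported and to nothing elsewhere. */
#if defined(__GNUC__) && !defined(_WIN32)
#define PRIVATE_SYMBOL __attribute__((visibility("hidden")))
#else
#define PRIVATE_SYMBOL
#endif

/* Internal helper: when this file is built into a shared library, the
 * symbol is not exported, and calls to it from inside the library can
 * be bound directly instead of going through the PLT. */
PRIVATE_SYMBOL int internal_helper(int x);

/* Public entry point: keeps default visibility and stays exported. */
int public_entry(int x);

int internal_helper(int x) {
    return x + 1;
}

int public_entry(int x) {
    return internal_helper(x) * 2;
}
```

When linked statically the attribute changes nothing, which matches the note above that it has no effect with static libraries.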
*	Tidy up file headers and copyrights; point to the wiki for docs	Simon Marlow	2009-08-25	24	-19/+127
	I've updated the wiki page about the RTS headers http://hackage.haskell.org/trac/ghc/wiki/Commentary/SourceTree/Includes to reflect the new layout and explain some of the rationale. All the header files now point to this page.
*	Restore the entry field in StgInfoTable when !defined(TABLES_NEXT_TO_CODE)	Simon Marlow	2009-08-19	1	-0/+4
	Somehow this got lost, probably in the recent RTS tidy-up. Fixes segfaults in unregisterised compilation.
*	Fix #3429: a tricky race condition	Simon Marlow	2009-08-18	2	-13/+19
	There were two bugs, and had it not been for the first one we would not have noticed the second one, so this is quite fortunate.
	The first bug is in stg_unblockAsyncExceptionszh_ret: when we find a pending exception to raise but don't end up raising it, there was a missing adjustment to the stack pointer.
	The second bug was that this case was actually happening at all: it ought to be incredibly rare, because the pending exception thread would have to be killed between us finding it and attempting to raise the exception. This made me suspicious. It turned out that there was a race condition on the tso->flags field; multiple threads were updating this bitmask field non-atomically (one of the bits is the dirty-bit for the generational GC).
	The fix is to move the dirty bit into its own field of the TSO, making the TSO one word larger (sadly).
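The shape of the fix can be sketched as below. The struct layout, field types and helper names are hypothetical, as is the example flag value; only the idea of giving the dirty bit its own field comes from the commit message.

```c
#include <stdint.h>

/* Hypothetical flag bit, for illustration only. */
#define TSO_FLAG_EXAMPLE 0x1u

/* Before the fix (conceptually), the dirty bit lived inside `flags`, so
 * marking a thread dirty was a read-modify-write on the same word that
 * other threads were also updating, and one update could be lost.
 * After the fix, the dirty mark is a separate field. */
typedef struct {
    uint32_t flags;  /* bitmask of thread state flags */
    uint32_t dirty;  /* dirty mark for the generational GC, in its own word */
} TSO;

/* Still a read-modify-write, but no longer shared with the dirty mark. */
static void tso_set_flag(TSO *t, uint32_t f) {
    t->flags |= f;
}

static void tso_clear_flag(TSO *t, uint32_t f) {
    t->flags &= ~f;
}

/* A plain store to a distinct memory location: it cannot overwrite a
 * concurrent update to `flags`. */
static void tso_mark_dirty(TSO *t) {
    t->dirty = 1;
}
```

The cost is the extra word in every TSO, which is the trade-off the commit message notes.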
*	Fix the build on OS X	Ian Lynagh	2009-08-07	1	-0/+4
*	profiling build fixes	Simon Marlow	2009-08-05	2	-6/+9
*	move termios prototypes into a public header	Simon Marlow	2009-08-03	1	-0/+15
*	move StgEntCounter type into its own header	Simon Marlow	2009-08-03	1	-0/+31
*	remove the GUM closure types	Simon Marlow	2009-08-03	1	-16/+11
*	move gc_alloc_block to make it visible on 32-bit	Simon Marlow	2009-08-03	1	-4/+5
*	Windows build fixes	Simon Marlow	2009-08-03	1	-0/+7
*	x86_64 build fix: declare gc_alloc_block_sync	Simon Marlow	2009-08-03	1	-0/+4
*	RTS tidyup sweep, first phase	Simon Marlow	2009-08-02	34	-0/+4216
	The first phase of this tidyup is focussed on the header files, and in particular making sure we are exposing publicly exactly what we need to, and no more.
	- Rts.h now includes everything that the RTS exposes publicly, rather than a random subset of it.
	- Most of the public header files have moved into subdirectories, and many of them have been renamed. But clients should not need to include any of the other headers directly, just #include the main public headers: Rts.h, HsFFI.h, RtsAPI.h.
	- All the headers needed for via-C compilation have moved into the stg subdirectory, which is self-contained. Most of the headers for the rest of the RTS APIs have moved into the rts subdirectory.
	- I left MachDeps.h where it is, because it is so widely used in Haskell code.
	- I left a deprecated stub for RtsFlags.h in place. The flag structures are now exposed by Rts.h.
	- Various internal APIs are no longer exposed by public header files.
	- Various bits of dead code and declarations have been removed.
	- More gcc warnings are turned on, and the RTS code is more warning-clean.
	- More source files #include "PosixSource.h", and hence only use standard POSIX (1003.1c-1995) interfaces.
	There is a lot more tidying up still to do; this is just the first pass. I also intend to standardise the names for external RTS APIs (e.g. use the rts_ prefix consistently), and declare the internal APIs as hidden for shared libraries.