path: root/includes/Cmm.h
Commit message | Author | Age | Files | Lines
...
* profiling fixes | Simon Marlow | 2012-10-19 | 1 | -8/+6
* profiling fixes | Simon Marlow | 2012-10-09 | 1 | -5/+7
* Produce new-style Cmm from the Cmm parser | Simon Marlow | 2012-10-08 | 1 | -77/+216

  The main change here is that the Cmm parser now allows high-level cmm code with argument-passing and function calls. For example:

      foo ( gcptr a, bits32 b )
      {
        if (b > 0) {
           // we can make tail calls passing arguments:
           jump stg_ap_0_fast(a);
        }

        return (x,y);
      }

  More details on the new cmm syntax are in Note [Syntax of .cmm files] in CmmParse.y.

  The old syntax is still more-or-less supported for those occasional code fragments that really need to explicitly manipulate the stack. However there are a couple of differences: it is now obligatory to give a list of live GlobalRegs on every jump, e.g.

      jump %ENTRY_CODE(Sp(0)) [R1];

  Again, more details in Note [Syntax of .cmm files].

  I have rewritten most of the .cmm files in the RTS into the new syntax, except for AutoApply.cmm which is generated by the genapply program: this file could be generated in the new syntax instead and would probably be better off for it, but I ran out of enthusiasm.

  Some other changes in this batch:

  - The PrimOp calling convention is gone, primops now use the ordinary NativeNodeCall convention. This means that primops and "foreign import prim" code must be written in high-level cmm, but they can now take more than 10 arguments.
  - CmmSink now does constant-folding (should fix #7219)
  - .cmm files now go through the cmmPipeline, and as a result we generate better code in many cases. All the object files generated for the RTS .cmm files are now smaller. Performance should be better too, but I haven't measured it yet.
  - RET_DYN frames are removed from the RTS, lots of code goes away
  - we now have some more canned GC points to cover unboxed-tuples with 2-4 pointers, which will reduce code size a little.

* Start separating out the RTS and Haskell imports of MachRegs.h | Ian Lynagh | 2012-08-06 | 1 | -1/+1

  No functional differences yet

* Don't define STOLEN_X86_REGS in Cmm.h | Ian Lynagh | 2012-08-06 | 1 | -1/+0

  We weren't defining it in the other places that MachRegs.h gets imported, which seems a little suspicious. And if it's not defined then it defaults to 4 anyway, so this define doesn't seem necessary.

* comment-only typo | Gabor Greif | 2012-07-18 | 1 | -2/+2
* Define W_TO_LONG in Cmm.h | Ian Lynagh | 2012-06-20 | 1 | -0/+6
* Some Win64 fixes | Ian Lynagh | 2012-03-15 | 1 | -1/+1

  Convert some sizes, as CLong is a different size to pointers

* Make profiling work with multiple capabilities (+RTS -N) | Simon Marlow | 2011-11-29 | 1 | -1/+1

  This means that both time and heap profiling work for parallel programs. Main internal changes:

  - CCCS is no longer a global variable; it is now another pseudo-register in the StgRegTable struct. Thus every Capability has its own CCCS.
  - There is a new built-in CCS called "IDLE", which records ticks for Capabilities in the idle state. If you profile a single-threaded program with +RTS -N2, you'll see about 50% of time in "IDLE".
  - There is appropriate locking in rts/Profiling.c to protect the shared cost-centre-stack data structures.

  This patch does enough to get it working, I have cut one big corner: the cost-centre-stack data structure is still shared amongst all Capabilities, which means that multiple Capabilities will race when updating the "allocations" and "entries" fields of a CCS. Not only does this give unpredictable results, but it runs very slowly due to cache line bouncing.

  It is strongly recommended that you use -fno-prof-count-entries to disable the "entries" count when profiling parallel programs. (I shall add a note to this effect to the docs).

* Add array copy/clone primops | Daniel Peebles | 2011-05-19 | 1 | -2/+4
* Count allocations more accurately | Simon Marlow | 2010-12-21 | 1 | -1/+1

  The allocation stats (+RTS -s etc.) used to count the slop at the end of each nursery block (except the last) as allocated space, now we count the allocated words accurately. This should make allocation figures more predictable, too.

  This has the side effect of reducing the apparent allocations by a small amount (~1%), so remember to take this into account when looking at nofib results.

* Implement stack chunks and separate TSO/STACK objects | Simon Marlow | 2010-12-15 | 1 | -0/+6

  This patch makes two changes to the way stacks are managed:

  1. The stack is now stored in a separate object from the TSO. This means that it is easier to replace the stack object for a thread when the stack overflows or underflows; we don't have to leave behind the old TSO as an indirection any more. Consequently, we can remove ThreadRelocated and deRefTSO(), which were a pain. This is obviously the right thing, but the last time I tried to do it it made performance worse. This time I seem to have cracked it.

  2. Stacks are now represented as a chain of chunks, rather than a single monolithic object. The big advantage here is that individual chunks are marked clean or dirty according to whether they contain pointers to the young generation, and the GC can avoid traversing clean stack chunks during a young-generation collection. This means that programs with deep stacks will see a big saving in GC overhead when using the default GC settings. A secondary advantage is that there is much less copying involved as the stack grows. Programs that quickly grow a deep stack will see big improvements.

  In some ways the implementation is simpler, as nothing special needs to be done to reclaim stack as the stack shrinks (the GC just recovers the dead stack chunks). On the other hand, we have to manage stack underflow between chunks, so there's a new stack frame (UNDERFLOW_FRAME), and we now have separate TSO and STACK objects. The total amount of code is probably about the same as before.

  There are new RTS flags:

     -ki<size> Sets the initial thread stack size (default 1k)  Egs: -ki4k -ki2m
     -kc<size> Sets the stack chunk size (default 32k)
     -kb<size> Sets the stack chunk buffer size (default 1k)

  -ki was previously called just -k, and the old name is still accepted for backwards compatibility. These new options are documented.

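The clean/dirty chunk idea in the stack-chunks commit above can be illustrated with a minimal C sketch. It is not the RTS implementation: the `StackChunk` type and the `chunk_write` and `young_gc_scan_words` functions are hypothetical names invented for this example, and the real STACK objects carry far more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stack chunk; not GHC's actual STACK object. */
typedef struct StackChunk {
    bool dirty;                /* may contain pointers to the young generation */
    size_t words;              /* words in use in this chunk */
    struct StackChunk *next;   /* next (older) chunk in the chain, or NULL */
} StackChunk;

/* Any write into a chunk conservatively marks it dirty. */
static void chunk_write(StackChunk *c) {
    assert(c != NULL);
    c->dirty = true;
}

/* Number of stack words a young-generation collection would traverse:
   clean chunks are skipped entirely. */
static size_t young_gc_scan_words(const StackChunk *top) {
    size_t total = 0;
    for (const StackChunk *c = top; c != NULL; c = c->next) {
        if (c->dirty) {
            total += c->words;
        }
    }
    return total;
}
```

With a deep stack whose older chunks are clean, only the active chunk is counted, which is the saving the commit message describes.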
* FIX #38000 Store StgArrWords payload size in bytes | Antoine Latter | 2010-01-01 | 1 | -0/+3
* fix 64-bit value for W_SHIFT, which thankfully appears to be not used | Simon Marlow | 2010-04-22 | 1 | -1/+1
* Remove the IND_OLDGEN and IND_OLDGEN_PERM closure types | Simon Marlow | 2010-04-01 | 1 | -2/+0

  These are no longer used: once upon a time they used to have different layout from IND and IND_PERM respectively, but that is no longer the case since we changed the remembered set to be an array of addresses instead of a linked list of closures.

* Fix #650: use a card table to mark dirty sections of mutable arrays | Simon Marlow | 2009-12-17 | 1 | -0/+3

  The card table is an array of bytes, placed directly following the actual array data. This means that array reading is unaffected, but array writing needs to read the array size from the header in order to find the card table.

  We use a bytemap rather than a bitmap, because updating the card table must be multi-thread safe. Each byte refers to 128 entries of the array, but this is tunable by changing the constant MUT_ARR_PTRS_CARD_BITS in includes/Constants.h.

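As a rough illustration of the card-table layout described in the commit above, here is a minimal C sketch. The `MutArr` type and all function names are hypothetical, and `CARD_BITS` is assumed to be 7 (2^7 = 128 entries per card, matching the commit text); the real header layout and the real value of MUT_ARR_PTRS_CARD_BITS live in the GHC sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 128 array entries per card byte, as stated in the commit. */
#define CARD_BITS 7

/* Hypothetical layout: size header, then the elements, then one card
   byte per 128 elements placed directly after the array data. */
typedef struct {
    size_t size;       /* number of elements */
    void  *payload[];  /* elements, followed in memory by the card table */
} MutArr;

/* Number of card bytes needed for n elements (rounded up). */
static size_t card_count(size_t n) {
    return (n + ((size_t)1 << CARD_BITS) - 1) >> CARD_BITS;
}

/* Finding the card table requires reading the size from the header. */
static uint8_t *card_table(MutArr *arr) {
    return (uint8_t *)&arr->payload[arr->size];
}

static MutArr *mut_arr_new(size_t n) {
    MutArr *arr = calloc(1, sizeof(MutArr) + n * sizeof(void *) + card_count(n));
    assert(arr != NULL);
    arr->size = n;
    return arr;
}

/* Reads never touch the card table. */
static void *mut_arr_read(const MutArr *arr, size_t i) {
    return arr->payload[i];
}

/* Writes store the element and mark the covering card dirty with a
   whole-byte store, so no read-modify-write of shared bits is needed. */
static void mut_arr_write(MutArr *arr, size_t i, void *v) {
    assert(i < arr->size);
    arr->payload[i] = v;
    card_table(arr)[i >> CARD_BITS] = 1;
}
```

The byte-per-card choice shows up in `mut_arr_write`: marking a card is a plain store of 1, whereas a bitmap would need an atomic update to avoid losing a neighbouring bit set by another thread.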
* Correction to the allocation stats following earlier refactoring | Simon Marlow | 2009-12-04 | 1 | -1/+1
* GC refactoring, remove "steps" | Simon Marlow | 2009-12-03 | 1 | -1/+1

  The GC had a two-level structure, G generations each of T steps. Steps are for aging within a generation, mostly to avoid premature promotion.

  Measurements show that more than 2 steps is almost never worthwhile, and 1 step is usually worse than 2. In theory fractional steps are possible, so the ideal number of steps is somewhere between 1 and 3. GHC's default has always been 2.

  We can implement 2 steps quite straightforwardly by having each block point to the generation to which objects in that block should be promoted, so blocks in the nursery point to generation 0, and blocks in gen 0 point to gen 1, and so on.

  This commit removes the explicit step structures, merging generations with steps, thus simplifying a lot of code. Performance is unaffected. The tunable number of steps is now gone, although it may be replaced in the future by a way to tune the aging in generation 0.

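The "each block points to its destination generation" scheme from the commit above might be sketched in C as follows. Everything here is hypothetical: the `Block` and `Generation` types, the `NURSERY` marker, and the fixed `N_GENS` are invented for illustration, and the rule that the oldest generation promotes to itself is an assumption not stated in the commit.

```c
#include <assert.h>
#include <stddef.h>

#define N_GENS  3      /* hypothetical number of generations */
#define NURSERY (-1)   /* hypothetical marker for nursery blocks */

typedef struct { int no; } Generation;

typedef struct {
    int gen_no;        /* generation this block belongs to, or NURSERY */
    Generation *dest;  /* generation live objects in this block are promoted to */
} Block;

static Generation generations[N_GENS] = { {0}, {1}, {2} };

/* Nursery blocks point to gen 0, gen 0 blocks to gen 1, and so on.
   Assumption: blocks in the oldest generation point back to it. */
static void block_init(Block *b, int gen_no) {
    assert(gen_no >= NURSERY && gen_no < N_GENS);
    int dest_no = gen_no + 1;
    if (dest_no >= N_GENS) {
        dest_no = N_GENS - 1;
    }
    b->gen_no = gen_no;
    b->dest = &generations[dest_no];
}

/* The collector only follows the per-block pointer; no step structure. */
static int evacuate_to(const Block *b) {
    return b->dest->no;
}
```

The point of the design is that the promotion decision becomes a single pointer read per block, so the separate step structures are no longer needed.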
* Make allocatePinned use local storage, and other refactorings | Simon Marlow | 2009-12-01 | 1 | -3/+4

  This is a batch of refactoring to remove some of the GC's global state, as we move towards CPU-local GC.

  - allocateLocal() now allocates large objects into the local nursery, rather than taking a global lock and allocating them in gen 0 step 0.
  - allocatePinned() was still allocating from global storage and taking a lock each time, now it uses local storage. (mallocForeignPtrBytes should be faster with -threaded).
  - We had a gen 0 step 0, distinct from the nurseries, which are stored in a separate nurseries[] array. This is slightly strange. I removed the g0s0 global that pointed to gen 0 step 0, and removed all uses of it. I think now we don't use gen 0 step 0 at all, except possibly when there is only one generation. Possibly more tidying up is needed here.
  - I removed the global allocate() function, and renamed allocateLocal() to allocate().
  - the alloc_blocks global is gone. MAYBE_GC() and doYouWantToGC() now check the local nursery only.

* RTS tidyup sweep, first phase | Simon Marlow | 2009-08-02 | 1 | -10/+21

  The first phase of this tidyup is focussed on the header files, and in particular making sure we are exposing publicly exactly what we need to, and no more.

  - Rts.h now includes everything that the RTS exposes publicly, rather than a random subset of it.
  - Most of the public header files have moved into subdirectories, and many of them have been renamed. But clients should not need to include any of the other headers directly, just #include the main public headers: Rts.h, HsFFI.h, RtsAPI.h.
  - All the headers needed for via-C compilation have moved into the stg subdirectory, which is self-contained. Most of the headers for the rest of the RTS APIs have moved into the rts subdirectory.
  - I left MachDeps.h where it is, because it is so widely used in Haskell code.
  - I left a deprecated stub for RtsFlags.h in place. The flag structures are now exposed by Rts.h.
  - Various internal APIs are no longer exposed by public header files.
  - Various bits of dead code and declarations have been removed
  - More gcc warnings are turned on, and the RTS code is more warning-clean.
  - More source files #include "PosixSource.h", and hence only use standard POSIX (1003.1c-1995) interfaces.

  There is a lot more tidying up still to do, this is just the first pass. I also intend to standardise the names for external RTS APIs (e.g. use the rts_ prefix consistently), and declare the internal APIs as hidden for shared libraries.

* Stop building the rts against gmp | Duncan Coutts | 2009-06-13 | 1 | -5/+0

  Nothing from gmp is used in the rts anymore.

* add missing case in ENTER() (fixes readwrite002(profasm) crash) | Simon Marlow | 2009-03-19 | 1 | -0/+1
* FIX biographical profiling (#3039, probably #2297) | Simon Marlow | 2009-03-17 | 1 | -4/+26

  Since we introduced pointer tagging, we no longer always enter a closure to evaluate it. However, the biographical profiler relies on closures being entered in order to mark them as "used", so we were getting spurious amounts of data attributed to VOID. It turns out there are various places that need to be fixed, and I think at least one of them was also wrong before pointer tagging (CgCon.cgReturnDataCon).

* Merging in the new codegen branch | dias@eecs.harvard.edu | 2008-08-14 | 1 | -4/+5

  This merge does not turn on the new codegen (which only compiles a select few programs at this point), but it does introduce some changes to the old code generator.

  The high bits:

  1. The Rep Swamp patch is finally here. The highlight is that the representation of types at the machine level has changed. Consequently, this patch contains updates across several back ends.
  2. The new Stg -> Cmm path is here, although it appears to have a fair number of bugs lurking.
  3. Many improvements along the CmmCPSZ path, including:
     o stack layout
     o some code for infotables, half of which is right and half wrong
     o proc-point splitting

* refactor: move unlockClosure() into SMPClosureOps() where it should be | Simon Marlow | 2008-11-14 | 1 | -1/+1
* Move the context_switch flag into the Capability | Simon Marlow | 2008-09-19 | 1 | -1/+1

  Fixes a long-standing bug that could in some cases cause sub-optimal scheduling behaviour.

* Add a write barrier to the TSO link field (#1589) | Simon Marlow | 2008-04-16 | 1 | -3/+0
* update a comment | Simon Marlow | 2008-04-07 | 1 | -12/+2
* Fix warnings in the RTS | Ian Lynagh | 2008-03-25 | 1 | -3/+6

  For some reason this causes build failures for me in my 32-bit chroot,

* recordMutable: test for gen>0 before calling recordMutableCap | Simon Marlow | 2007-10-17 | 1 | -4/+8

  For some reason the C-- version of recordMutable wasn't verifying that the object was in an old generation before attempting to add it to the mutable list, and this broke maessen_hashtab. This version of recordMutable is only used in unsafeThaw#.

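The generation check that the recordMutable fix above adds can be pictured with a small C sketch. This is not the C-- code from Cmm.h: the `Closure` and `MutList` types, the fixed capacity, and the `record_mutable` name are all hypothetical and exist only to show the guard.

```c
#include <assert.h>
#include <stddef.h>

#define MUT_LIST_CAP 16   /* hypothetical fixed capacity for the sketch */

typedef struct { int gen_no; } Closure;   /* generation the object lives in */

typedef struct {
    const Closure *items[MUT_LIST_CAP];
    size_t len;
} MutList;

/* Only objects in an old generation (gen > 0) belong on the mutable
   list; young objects are traversed by every collection anyway. */
static void record_mutable(MutList *ml, const Closure *c) {
    if (c->gen_no > 0) {
        assert(ml->len < MUT_LIST_CAP);
        ml->items[ml->len++] = c;
    }
}
```

Without the `gen_no > 0` test, young objects would also be appended, which is the behaviour the commit reports as broken.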
* Pointer Tagging | Simon Marlow | 2007-07-27 | 1 | -4/+41

  This patch implements pointer tagging as per our ICFP'07 paper "Faster laziness using dynamic pointer tagging". It improves performance by 10-15% for most workloads, including GHC itself.

  The original patches were by Alexey Rodriguez Yakushev <mrchebas@gmail.com>, with additions and improvements by me. I've re-recorded the development as a single patch.

  The basic idea is this: we use the low 2 bits of a pointer to a heap object (3 bits on a 64-bit architecture) to encode some information about the object pointed to. For a constructor, we encode the "tag" of the constructor (e.g. True vs. False), for a function closure its arity. This enables some decisions to be made without dereferencing the pointer, which speeds up some common operations. In particular it enables us to avoid costly indirect jumps in many cases.

  More information in the commentary: http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/HaskellExecution/PointerTagging

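The low-bits encoding described in the pointer-tagging commit above can be sketched in portable C. The names `tag_ptr`, `get_tag`, and `untag_ptr` are hypothetical and chosen for this example; they are not the macros the patch adds to Cmm.h, and the sketch assumes heap objects are aligned to the word size so that the low bits of an untagged pointer are zero.

```c
#include <assert.h>
#include <stdint.h>

/* 2 tag bits on a 32-bit architecture, 3 on a 64-bit one, as in the commit. */
#define TAG_BITS (sizeof(void *) == 8 ? 3 : 2)
#define TAG_MASK (((uintptr_t)1 << TAG_BITS) - 1)

/* Store a small tag (e.g. a constructor tag or a function arity) in the
   low bits of a word-aligned pointer. */
static void *tag_ptr(void *p, unsigned tag) {
    assert(((uintptr_t)p & TAG_MASK) == 0);   /* pointer must be aligned */
    assert(tag <= TAG_MASK);                  /* tag must fit in the low bits */
    return (void *)((uintptr_t)p | tag);
}

/* Read the tag without dereferencing the pointer. */
static unsigned get_tag(const void *p) {
    return (unsigned)((uintptr_t)p & TAG_MASK);
}

/* Clear the tag bits before the pointer is actually followed. */
static void *untag_ptr(void *p) {
    return (void *)((uintptr_t)p & ~TAG_MASK);
}
```

A case analysis on an already-evaluated constructor can then branch on `get_tag` alone, which is how the indirect jump through the closure's entry code is avoided.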
* Implemented and fixed bugs in CmmInfo handling | Michael D. Adams | 2007-06-27 | 1 | -1/+1
* Make the threaded RTS compilable using -fasm | Simon Marlow | 2007-06-26 | 1 | -0/+2

  We needed to turn some inline C functions and C macros into either real C functions or C-- macros.

* Lightweight ticky-ticky profiling | Kirsten Chevalier | 2007-02-07 | 1 | -11/+6

  The following changes restore ticky-ticky profiling to functionality from its formerly bit-rotted state. Sort of. (It got bit-rotted as part of the switch to the C-- back-end.)

  The way that ticky-ticky is supposed to work is documented in Section 5.7 of the GHC manual (though the manual doesn't mention that it hasn't worked since sometime around 6.0, alas). Changes from this are as follows (which I'll document on the wiki):

  * In the past, you had to build all of the libraries with way=t in order to use ticky-ticky, because it entailed a different closure layout. No longer. You still need to do make way=t in rts/ in order to build the ticky RTS, but you should now be able to mix ticky and non-ticky modules.
  * Some of the counters that worked in the past aren't implemented yet. I was originally just trying to get entry counts to work, so those should be correct. The list of counters was never documented in the first place, so I hope it's not too much of a disaster that some don't appear anymore. Someday, someone (perhaps me) should document all the counters and what they do. For now, all of the counters are either accurate (or at least as accurate as they always were), zero, or missing from the ticky profiling report altogether.

  This hasn't been particularly well-tested, but these changes shouldn't affect anything except when compiling with -fticky-ticky (famous last words...)

  Implementation details:

  I got rid of StgTicky.h, which in the past had the macros and declarations for all of the ticky counters. Now, those macros are defined in Cmm.h. StgTicky.h was still there for inclusion in C code. Now, any remaining C code simply cannot call the ticky macros -- or rather, they do call those macros, but from the perspective of C code, they're defined as no-ops. (This shouldn't be too big a problem.)

  I added a new file TickyCounter.h that has all the declarations for ticky counters, as well as dummy macros for use in C code. Someday, these declarations should really be automatically generated, since they need to be kept consistent with the macros defined in Cmm.h.

  Other changes include getting rid of the header that was getting added to closures before, and getting rid of various code having to do with eager blackholing and permanent indirections (the changes under compiler/ and rts/Updates.*).

* STM invariants | tharris@microsoft.com | 2006-10-07 | 1 | -2/+3
* new RTS flag: -V to modify the resolution of the RTS timer | Ian Lynagh | 2006-09-05 | 1 | -2/+0

  Fixed version of an old patch by Simon Marlow. His description read:

  Also, an arbitrarily short context switch interval may now be specified, as we increase the RTS ticker's resolution to match the requested context switch interval. This also applies to +RTS -i (heap profiling) and +RTS -I (the idle GC timer). +RTS -V is actually only required for increasing the resolution of the profile timer.

* MAYBE_GC: initialise HpAlloc | Simon Marlow | 2006-08-30 | 1 | -0/+1

  HpAlloc was not being set when returning to the scheduler via MAYBE_GC(), which at the least was just wrong (the scheduler might allocate a large block more than once), and at worst could lead to crashes if HpAlloc contains garbage.

  Fixes at least one threaded2 test on Windows.

* Replace inline C functions with C-- macros in .cmm code | Simon Marlow | 2006-06-29 | 1 | -0/+28

  So that we can build the RTS with the NCG.

* fix sloppy conditionals | Simon Marlow | 2006-06-20 | 1 | -1/+1
* Reorganisation of the source tree | Simon Marlow | 2006-04-07 | 1 | -0/+517

  Most of the other users of the fptools build system have migrated to Cabal, and with the move to darcs we can now flatten the source tree without losing history, so here goes.

  The main change is that the ghc/ subdir is gone, and most of what it contained is now at the top level. The build system now makes no pretense at being multi-project, it is just the GHC build system.

  No doubt this will break many things, and there will be a period of instability while we fix the dependencies. A straightforward build should work, but I haven't yet fixed binary/source distributions. Changes to the Building Guide will follow, too.