path: root/includes/rts
Commit message (author, date; files changed, lines -removed/+added)
...
* Fix build on Win64 (Ian Lynagh, 2012-05-09; 1 file, -0/+4)
|
* Enable FileLock for win32 (#4363) (Paolo Capriotti, 2012-05-08; 1 file, -4/+2)
|
* OS X build fixes (Ian Lynagh, 2012-04-26; 1 file, -3/+12)
|   OS X doesn't understand 'gnu_printf', so we need to only use it
|   conditionally.
* Build fixes (Ian Lynagh, 2012-04-26; 1 file, -0/+2)
|
* Fix warnings on Win64 (Ian Lynagh, 2012-04-26; 2 files, -5/+5)
|   Mostly this meant getting pointer<->int conversions to use the right
|   sizes. lnat is now size_t, rather than unsigned long, as that seems a
|   better match for how it's used.
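The pointer<->int issue above can be sketched as follows. This is an illustration only, not the RTS code: on Win64, `unsigned long` is 32 bits wide while pointers are 64 bits, so an integer type sized for pointers (`uintptr_t`, or `size_t` for object sizes) is needed. All function names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round-trip a pointer through uintptr_t, an integer
 * type that is wide enough to hold an object pointer. */
int roundtrip_preserves_pointer(void *p)
{
    uintptr_t bits = (uintptr_t)p;  /* pointer -> integer, no truncation */
    void *q = (void *)bits;         /* integer -> pointer */
    return p == q;
}

/* Hypothetical helper: reports whether unsigned long could hold a pointer
 * on this platform. On LLP64 targets such as Win64 this returns 0. */
int ulong_fits_pointer(void)
{
    return sizeof(unsigned long) >= sizeof(void *);
}

/* A file-scope object so the demo has an address to round-trip. */
static int sample_object;

int demo_roundtrip(void)
{
    return roundtrip_preserves_pointer(&sample_object);
}
```

`ulong_fits_pointer()` is deliberately platform-dependent: it is the check that fails on Win64 and motivates using `size_t`-sized types instead.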
* Use gnu_printf rather than just printf in function format attributes (Ian Lynagh, 2012-04-24; 1 file, -3/+3)
|   On Windows, gcc thinks that printf means ms_printf, which is not the
|   case when we #define _POSIX_SOURCE 1.
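A minimal sketch of a conditional format attribute, in the spirit of the two fixes above (use `gnu_printf` where the compiler knows it, plain `printf` elsewhere). The macro name `SKETCH_PRINTF_ATTR`, the exact compiler test, and `formatted_length` are assumptions made for illustration; the real RTS headers may choose the condition differently.

```c
#include <stdarg.h>
#include <stdio.h>

/* Assumption: gnu_printf is accepted by GCC 4.4 and later, but not by the
 * OS X toolchain, so fall back to plain printf there. */
#if defined(__GNUC__) && !defined(__clang__) && !defined(__APPLE__) \
    && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4))
#  define SKETCH_PRINTF_ATTR(fmt_idx, arg_idx) \
       __attribute__((format(gnu_printf, fmt_idx, arg_idx)))
#elif defined(__GNUC__)
#  define SKETCH_PRINTF_ATTR(fmt_idx, arg_idx) \
       __attribute__((format(printf, fmt_idx, arg_idx)))
#else
#  define SKETCH_PRINTF_ATTR(fmt_idx, arg_idx)  /* no checking available */
#endif

/* Hypothetical printf-like function: argument 1 is the format string and
 * the variadic arguments start at position 2. */
int formatted_length(const char *fmt, ...) SKETCH_PRINTF_ATTR(1, 2);

/* Returns the number of characters the formatted output would occupy. */
int formatted_length(const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(NULL, 0, fmt, ap);  /* C99: size 0 only measures */
    va_end(ap);
    return n;
}
```

With the attribute in place, the compiler can check the format string against the argument types at each call site of `formatted_length`.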
* Improve the handling of threadDelay in the non-threaded RTS (Simon Marlow, 2012-04-11; 1 file, -1/+1)
|   Firstly, we were rounding up too much, such that the smallest delay
|   was 20ms. Secondly, there is no need to use millisecond resolution on
|   a 64-bit machine where we have room in the TSO to use the normal
|   nanosecond resolution that we use elsewhere in the RTS.
* Add the GC_GLOBAL_SYNC event marking that all caps are stopped for GC (Mikolaj, 2012-04-04; 1 file, -2/+3)
|   Quoting design rationale by dcoutts: The event indicates that we're
|   doing a stop-the-world GC and all other HECs should be between their
|   GC_START and GC_END events at that moment. We don't want to use
|   GC_STATS_GHC for that, because GC_STATS_GHC is for extra GHC-specific
|   info, not something we have to rely on to be able to match the GC
|   pauses across HECs to a particular global GC.
* Adjust the eventlog description header for the spark counter event (Duncan Coutts, 2012-04-04; 1 file, -1/+1)
|   The EventLogFormat.h described the spark counter fields in a different
|   order to that which ghc emits (the GC'd and fizzled fields were
|   reversed). At this stage it is easier to fix the ghc-events lib and to
|   have ghc continue to emit them in the current order.
* Add new eventlog events for various heap and GC statistics (Duncan Coutts, 2012-04-04; 1 file, -3/+13)
|   They cover much the same info as is available via the GHC.Stats module
|   or via the '+RTS -s' textual output, but via the eventlog and with a
|   better sampling frequency.
|
|   We have three new generic heap info events and two very GHC-specific
|   ones. (The hope is the general ones are usable by other
|   implementations that use the same eventlog system, or indeed not so
|   sensitive to changes in GHC itself.)
|
|   The general ones are:
|
|    * total heap mem allocated since prog start, on a per-HEC basis
|    * current size of the heap (MBlocks reserved from OS for the heap)
|    * current size of live data in the heap
|
|   Currently these are all emitted by GHC at GC time (live data only at
|   major GC).
|
|   The GHC specific ones are:
|
|    * an event giving various static heap parameters:
|      * number of generations (usually 2)
|      * max size if any
|      * nursery size
|      * MBlock and block sizes
|    * an event emitted on each GC containing:
|      * GC generation (usually just 0,1)
|      * total bytes copied
|      * bytes lost to heap slop and fragmentation
|      * the number of threads in the parallel GC (1 for serial)
|      * the maximum number of bytes copied by any par GC thread
|      * the total number of bytes copied by all par GC threads
|        (these last three can be used to calculate an estimate of the
|        work balance in parallel GCs)
* Change the presentation of parallel GC work balance in +RTS -s (Duncan Coutts, 2012-04-04; 1 file, -2/+2)
|   Also rename internal variables to make the names match what they hold.
|
|   The parallel GC work balance is calculated using the total amount of
|   memory copied by all GC threads, and the maximum copied by any
|   individual thread. You have serial GC when the max is the same as
|   copied, and perfectly balanced GC when total/max == n_caps.
|
|   Previously we presented this as the ratio total/max and told users
|   that the serial value was 1 and the ideal value N, for N caps, e.g.
|
|     Parallel GC work balance: 1.05 (4045071 / 3846774, ideal 2)
|
|   The downside of this is that the user always has to keep in mind the
|   number of cores being used. Our new presentation uses a normalised
|   scale 0--1 as a percentage. The 0% means completely serial and 100% is
|   perfect balance, e.g.
|
|     Parallel GC work balance: 4.56% (serial 0%, perfect 100%)
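The two presentations can be compared with a small sketch. The commit message gives only the endpoints of the new scale (0% serial, 100% perfect balance), so the linear rescaling of the ratio from [1, n_caps] to [0%, 100%] used below is an assumption, and the function names are hypothetical.

```c
/* Old presentation: the raw ratio total/max.
 * 1 means serial, n_caps means perfectly balanced. */
double work_balance_ratio(double total_copied, double max_copied)
{
    return total_copied / max_copied;
}

/* New presentation (assumed formula): rescale the ratio linearly so that
 * 1 maps to 0% and n_caps maps to 100%. With fewer than two capabilities
 * there is no parallel work to balance, so report 0%. */
double work_balance_percent(double total_copied, double max_copied,
                            unsigned n_caps)
{
    if (n_caps < 2 || max_copied <= 0.0) {
        return 0.0;
    }
    return 100.0 * (total_copied / max_copied - 1.0) / (double)(n_caps - 1);
}
```

The normalised form no longer requires the reader to know how many capabilities were in use in order to interpret the number.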
* Add eventlog/trace stuff for capabilities: create/delete/enable/disable (Duncan Coutts, 2012-04-04; 1 file, -4/+10)
|   Now that we can adjust the number of capabilities on the fly, we need
|   this reflected in the eventlog. Previously the eventlog had a single
|   startup event that declared a static number of capabilities. Obviously
|   that's no good anymore.
|
|   For compatibility we're keeping the EVENT_STARTUP but adding new
|   EVENT_CAP_CREATE/DELETE. The EVENT_CAP_DELETE is actually just the old
|   EVENT_SHUTDOWN but renamed and extended (using the existing mechanism
|   to extend eventlog events in a compatible way). So we now emit both
|   EVENT_STARTUP and EVENT_CAP_CREATE. One day we will drop EVENT_STARTUP.
|
|   Since reducing the number of capabilities at runtime does not really
|   delete them, it just disables them, then we also have new events for
|   disable/enable.
|
|   The old EVENT_SHUTDOWN was in the scheduler class of events. The new
|   EVENT_CAP_* events are in the unconditional class, along with the
|   EVENT_CAPSET_* ones. Knowing when capabilities are created and deleted
|   is crucial to making sense of eventlogs, you always want those events.
|   In any case, they're extremely low volume.
* Tabs -> Spaces (David Terei, 2012-03-23; 1 file, -25/+25)
|
* fix _BTM field of closureFlags[], and document what it means (#5923) (Simon Marlow, 2012-03-14; 1 file, -1/+1)
|
* use idiomatic (GHC) types (Gabor Greif, 2012-02-27; 1 file, -6/+6)
|
* tidied this up, the macro definitions were causing duplicate semis in the source (Gabor Greif, 2012-02-27; 1 file, -10/+10)
|
* use (GHC) idiomatic types (Gabor Greif, 2012-01-09; 1 file, -4/+4)
|
* Make the RTS linker API use wide-char pathnames on Windows (#5697) (Simon Marlow, 2012-01-09; 1 file, -6/+12)
|   I haven't been able to test whether this works or not due to #5754,
|   but at least it doesn't appear to break anything.
* setNumCapabilities: don't barf() if it isn't supported, just print an error (Simon Marlow, 2012-01-06; 1 file, -4/+0)
|
* Rename struct _CostCentreStack to struct CostCentreStack_ for consistency (Simon Marlow, 2012-01-05; 1 file, -8/+8)
|   Needed by #5357
* Rename the CCCS field of StgTSO so as not to conflict with the CCCS pseudo-register (Simon Marlow, 2012-01-05; 1 file, -1/+1)
|   Needed by #5357
* Fix alignment in the CostCentre struct (#5710) (Simon Marlow, 2011-12-19; 1 file, -1/+1)
|
* New flag +RTS -qi<n>, avoid waking up idle Capabilities to do parallel GC (Simon Marlow, 2011-12-13; 1 file, -0/+8)
|   This is an experimental tweak to the parallel GC that avoids waking up
|   a Capability to do parallel GC if we know that the capability has been
|   idle for a (tunable) number of GC cycles. The idea is that if you're
|   only using a few Capabilities, there's no point waking up the ones
|   that aren't busy.
|
|   e.g. +RTS -qi3 says "A Capability will participate in parallel GC if
|   it was running at all since the last 3 GC cycles."
|
|   Results are a bit hit and miss, and I don't completely understand why
|   yet. Hence, for now it is turned off by default, and also not
|   documented except in the +RTS -? output.
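The rule quoted for `-qi<n>` can be sketched as a per-capability idle counter compared against the threshold. This is a guess at the shape of the logic, not the RTS implementation: the struct, the function names, and the convention that a threshold of 0 means "feature off" are all assumptions.

```c
/* Hypothetical per-capability bookkeeping. */
typedef struct {
    unsigned idle_gcs;  /* consecutive GC cycles in which the cap did not run */
} CapSketch;

/* Update the counter at each GC: reset it if the capability ran since the
 * previous GC, otherwise count one more idle cycle. */
void note_gc_cycle(CapSketch *cap, int ran_since_last_gc)
{
    if (ran_since_last_gc) {
        cap->idle_gcs = 0;
    } else {
        cap->idle_gcs++;
    }
}

/* A capability participates if it ran at some point within the last
 * qi_threshold GC cycles. Assumption: 0 disables the tweak, so every
 * capability is woken up as before. */
int participates_in_par_gc(const CapSketch *cap, unsigned qi_threshold)
{
    if (qi_threshold == 0) {
        return 1;
    }
    return cap->idle_gcs < qi_threshold;
}

/* Demo: simulate a capability that stays idle for idle_cycles GCs. */
int participates_after_idle(unsigned idle_cycles, unsigned qi_threshold)
{
    CapSketch cap = { 0 };
    unsigned i;

    for (i = 0; i < idle_cycles; i++) {
        note_gc_cycle(&cap, 0);
    }
    return participates_in_par_gc(&cap, qi_threshold);
}
```

With a threshold of 3, a capability that has sat out two GC cycles is still woken up, while one that has been idle for three or more is left alone.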
* Define getNumberOfProcessors() even when !THREADED_RTS (Simon Marlow, 2011-12-07; 1 file, -4/+7)
|
* Allow the number of capabilities to be increased at runtime (#3729) (Simon Marlow, 2011-12-06; 1 file, -0/+10)
|   At present the number of capabilities can only be *increased*, not
|   decreased. The latter presents a few more challenges!
* Make forkProcess work with +RTS -N (Simon Marlow, 2011-12-06; 1 file, -2/+3)
|   Consider this experimental for the time being. There are a lot of
|   things that could go wrong, but I've verified that at least it works
|   on the test cases we have.
|
|   I also did some API cleanups while I was here. Previously we had:
|
|     Capability * rts_eval (Capability *cap, HaskellObj p, /*out*/HaskellObj *ret);
|
|   but this API is particularly error-prone: if you forget to discard the
|   Capability * you passed in and use the return value instead, then
|   you're in for subtle bugs with +RTS -N later on. So I changed all
|   these functions to this form:
|
|     void rts_eval (/* inout */ Capability **cap,
|                    /* in    */ HaskellObj p,
|                    /* out   */ HaskellObj *ret)
|
|   It's much harder to use this version incorrectly, because you have to
|   pass the Capability in by reference.
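The difference between the two API shapes can be illustrated with mock types. Nothing below is RTS code: `MockCap`, `MockObj`, `eval_old` and `eval_new` are hypothetical stand-ins, and the "evaluation" simply pretends the thread migrated to another capability.

```c
/* Hypothetical stand-ins for Capability and HaskellObj. */
typedef struct { int id; } MockCap;
typedef int MockObj;

static MockCap mock_caps[2] = { { 0 }, { 1 } };

/* Old shape: the possibly different capability comes back as the return
 * value. A caller that ignores it keeps using a stale pointer. */
MockCap *eval_old(MockCap *cap, MockObj p, MockObj *ret)
{
    *ret = p + 1;                              /* pretend result */
    return &mock_caps[(cap->id + 1) % 2];      /* pretend we migrated */
}

/* New shape: the capability is an inout parameter, so the caller's own
 * variable is updated and cannot silently go stale. */
void eval_new(MockCap **cap, MockObj p, MockObj *ret)
{
    *ret = p + 1;
    *cap = &mock_caps[((*cap)->id + 1) % 2];
}

/* Demo: with the old shape, forgetting the return value leaves the
 * caller on capability 0 even though evaluation moved to 1. */
int cap_after_forgetful_eval_old(void)
{
    MockCap *cap = &mock_caps[0];
    MockObj result = 0;

    (void)eval_old(cap, 41, &result);  /* bug: return value dropped */
    return cap->id;
}

/* Demo: with the new shape the caller's pointer always tracks the
 * capability it now holds. */
int cap_after_eval_new(void)
{
    MockCap *cap = &mock_caps[0];
    MockObj result = 0;

    eval_new(&cap, 41, &result);
    return cap->id;
}
```

The old shape compiles cleanly even when misused, which is what makes the mistake easy to miss; the new shape has no return value to forget.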
* More changes aimed at improving call stacks. (Simon Marlow, 2011-12-02; 1 file, -1/+3)
|   - Attach a SrcSpan to every CostCentre. This had the side effect that
|     CostCentres that used to be merged because they had the same name
|     are now considered distinct; so I had to add a Unique to CostCentre
|     to give them distinct object-code symbols.
|
|   - New flag: -fprof-auto-calls. This flag adds an automatic SCC to
|     every call site (application, to be precise). This is typically
|     more useful for call stacks than annotating whole functions.
|
|   Various tidy-ups at the same time: removed unused NoCostCentre
|   constructor, and refactored a bit in Coverage.lhs.
|
|   The call stack we get from traceStack now looks like this:
|
|     Stack trace:
|       Main.CAF (<entire-module>)
|       Main.main.xs (callstack002.hs:18:12-24)
|       Main.map (callstack002.hs:13:12-16)
|       Main.map.go (callstack002.hs:15:21-34)
|       Main.map.go (callstack002.hs:15:21-23)
|       Main.f (callstack002.hs:10:7-43)
* Make profiling work with multiple capabilities (+RTS -N) (Simon Marlow, 2011-11-29; 1 file, -3/+4)
|   This means that both time and heap profiling work for parallel
|   programs. Main internal changes:
|
|   - CCCS is no longer a global variable; it is now another
|     pseudo-register in the StgRegTable struct. Thus every Capability
|     has its own CCCS.
|
|   - There is a new built-in CCS called "IDLE", which records ticks for
|     Capabilities in the idle state. If you profile a single-threaded
|     program with +RTS -N2, you'll see about 50% of time in "IDLE".
|
|   - There is appropriate locking in rts/Profiling.c to protect the
|     shared cost-centre-stack data structures.
|
|   This patch does enough to get it working, I have cut one big corner:
|   the cost-centre-stack data structure is still shared amongst all
|   Capabilities, which means that multiple Capabilities will race when
|   updating the "allocations" and "entries" fields of a CCS. Not only
|   does this give unpredictable results, but it runs very slowly due to
|   cache line bouncing.
|
|   It is strongly recommended that you use -fno-prof-count-entries to
|   disable the "entries" count when profiling parallel programs. (I shall
|   add a note to this effect to the docs).
* Time handling overhaul (Simon Marlow, 2011-11-25; 2 files, -6/+21)
|   Terminology cleanup: the type "Ticks" has been renamed "Time", which
|   is an StgWord64 in units of TIME_RESOLUTION (currently nanoseconds).
|   The terminology "tick" is now used consistently to mean the interval
|   between timer signals.
|
|   The ticker now always ticks in realtime (actually CLOCK_MONOTONIC if
|   we have it). Before it used CPU time in the non-threaded RTS and
|   realtime in the threaded RTS, but I've discovered that the CPU timer
|   has terrible resolution (at least on Linux) and isn't much use for
|   profiling. So now we always use realtime. This should also fix
|
|   The default tick interval is now 10ms, except when profiling where we
|   drop it to 1ms. This gives more accurate profiles without affecting
|   runtime too much (<1%).
|
|   Lots of cleanups - the resolution of Time is now in one place only
|   (Rts.h) rather than having calculations that depend on the resolution
|   scattered all over the RTS. I hope I found them all.
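The idea of keeping the resolution in one place can be sketched as below. The type and macro names are prefixed to make clear they are stand-ins rather than the definitions in Rts.h, and the helper functions are hypothetical; only the nanosecond resolution and the 10ms/1ms tick intervals come from the commit message.

```c
#include <stdint.h>

/* Stand-in for the RTS "Time" type: a 64-bit count of resolution units. */
typedef uint64_t SketchTime;

/* Single definition of the resolution: units per second (nanoseconds). */
#define SKETCH_TIME_RESOLUTION 1000000000ULL

/* Hypothetical conversions. Every one is derived from the single constant
 * above, so changing the resolution needs no other edits. */
SketchTime ms_to_time(uint64_t ms)
{
    return ms * (SKETCH_TIME_RESOLUTION / 1000ULL);
}

uint64_t time_to_us(SketchTime t)
{
    return t / (SKETCH_TIME_RESOLUTION / 1000000ULL);
}

/* Default tick interval as described: 10ms, or 1ms when profiling. */
SketchTime default_tick_interval(int profiling)
{
    return profiling ? ms_to_time(1) : ms_to_time(10);
}
```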
* Remove some old comments about the mangler (David Terei, 2011-11-22; 1 file, -5/+0)
|
* Generate the C main() function when linking a binary (fixes #5373) (Simon Marlow, 2011-11-16; 1 file, -0/+21)
|   Rather than have main() be statically compiled as part of the RTS, we
|   now generate it into the tiny C file that we compile when linking a
|   binary.
|
|   The main motivation is that we want to pass the settings for the
|   -rtsopts and -with-rtsopts flags into the RTS, rather than relying on
|   fragile linking semantics to override the defaults, which don't work
|   with DLLs on Windows (#5373).
|
|   In order to do this, we need to extend the API for initialising the
|   RTS, so now we have:
|
|     void hs_init_ghc (int *argc, char **argv[],   // program arguments
|                       RtsConfig rts_config);      // RTS configuration
|
|   hs_init_ghc() can optionally be used instead of hs_init(), and allows
|   passing in configuration options for the RTS. RtsConfig is a struct,
|   which currently has two fields:
|
|     typedef struct {
|         RtsOptsEnabledEnum rts_opts_enabled;
|         const char *rts_opts;
|     } RtsConfig;
|
|   but might have more in the future. There is a default value for the
|   struct, defaultRtsConfig, the idea being that you start with this and
|   override individual fields as necessary.
|
|   In fact, main() was in a separate static library, libHSrtsmain.a.
|   That's now gone.
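The "start from the default and override fields" pattern described above can be sketched with mock declarations. The struct layout follows the commit message, but everything here is a self-contained stand-in: the `Sketch`-prefixed names, the enum members, and the default values are assumptions, not the contents of the RTS headers.

```c
#include <stddef.h>

/* Mock of the enum type; member names are placeholders. */
typedef enum {
    SketchRtsOptsNone,
    SketchRtsOptsSafeOnly,
    SketchRtsOptsAll
} SketchRtsOptsEnabled;

/* Mock of RtsConfig with the two fields listed in the commit message. */
typedef struct {
    SketchRtsOptsEnabled rts_opts_enabled;
    const char *rts_opts;
} SketchRtsConfig;

/* Mock default value; the real defaults may differ. */
const SketchRtsConfig sketchDefaultRtsConfig = { SketchRtsOptsSafeOnly, NULL };

/* Copy the defaults, then override only the field we care about. Fields
 * added to the struct later keep their default values automatically. */
SketchRtsConfig config_with_opts(const char *opts)
{
    SketchRtsConfig conf = sketchDefaultRtsConfig;

    conf.rts_opts = opts;
    return conf;
}
```

In a real program the resulting struct would be handed to `hs_init_ghc()` together with `argc` and `argv`, following the signature quoted in the commit message.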
* Allow the use of R9 and R10 in primops; fixes trac #5423 (Ian Lynagh, 2011-11-06; 1 file, -1/+1)
|
* Add eventlog event for thread labels (Duncan Coutts, 2011-11-04; 1 file, -3/+3)
|   The existing GHC.Conc.labelThread will now also emit the thread label
|   into the eventlog. Profiling tools like ThreadScope could then use the
|   thread labels rather than thread numbers.
* Overhaul of infrastructure for profiling, coverage (HPC) and breakpoints (Simon Marlow, 2011-11-02; 1 file, -94/+96)
|   User visible changes
|   ====================
|
|   Profiling
|   ---------
|
|   Flags renamed (the old ones are still accepted for now):
|
|     OLD          NEW
|     ---------    ------------
|     -auto-all    -fprof-auto
|     -auto        -fprof-exported
|     -caf-all     -fprof-cafs
|
|   New flags:
|
|     -fprof-auto              Annotates all bindings (not just top-level
|                              ones) with SCCs
|
|     -fprof-top               Annotates just top-level bindings with SCCs
|
|     -fprof-exported          Annotates just exported bindings with SCCs
|
|     -fprof-no-count-entries  Do not maintain entry counts when profiling
|                              (can make profiled code go faster; useful
|                              with heap profiling where entry counts are
|                              not used)
|
|   Cost-centre stacks have a new semantics, which should in most cases
|   result in more useful and intuitive profiles. If you find this not to
|   be the case, please let me know. This is the area where I have been
|   experimenting most, and the current solution is probably not the final
|   version, however it does address all the outstanding bugs and seems to
|   be better than GHC 7.2.
|
|   Stack traces
|   ------------
|
|   +RTS -xc now gives more information. If the exception originates from
|   a CAF (as is common, because GHC tends to lift exceptions out to the
|   top-level), then the RTS walks up the stack and reports the stack in
|   the enclosing update frame(s).
|
|   Result: +RTS -xc is much more useful now - but you still have to
|   compile for profiling to get it. I've played around a little with
|   adding 'head []' to GHC itself, and +RTS -xc does pinpoint the problem
|   quite accurately.
|
|   I plan to add more facilities for stack tracing (e.g. in GHCi) in the
|   future.
|
|   Coverage (HPC)
|   --------------
|
|    * derived instances are now coloured yellow if they weren't used
|    * likewise record field names
|    * entry counts are more accurate (hpc --fun-entry-count)
|    * tab width is now correct (markup was previously off in source with
|      tabs)
|
|   Internal changes
|   ================
|
|   In Core, the Note constructor has been replaced by
|
|     Tick (Tickish b) (Expr b)
|
|   which is used to represent all the kinds of source annotation we
|   support: profiling SCCs, HPC ticks, and GHCi breakpoints.
|
|   Depending on the properties of the Tickish, different transformations
|   apply to Tick. See CoreUtils.mkTick for details.
|
|   Tickets
|   =======
|
|   This commit closes the following tickets, test cases to follow:
|
|    - Close #2552: not a bug, but the behaviour is now more intuitive
|      (test is T2552)
|    - Close #680 (test is T680)
|    - Close #1531 (test is result001)
|    - Close #949 (test is T949)
|    - Close #2466: test case has bitrotted (doesn't compile against
|      current version of vector-space package)
* Add an RTS eventlog tracing class for user messages (Duncan Coutts, 2011-10-27; 1 file, -0/+1)
|   Enables people to turn them on/off. Defaults to on.
* Add new eventlog EVENT_WALL_CLOCK_TIME for time matching (Duncan Coutts, 2011-10-26; 1 file, -2/+3)
|   Eventlog timestamps are elapsed times (in nanoseconds) relative to the
|   process start. To be able to merge eventlogs from multiple processes
|   we need to be able to align their timelines. If they share a clock
|   domain (or a user judges that their clocks are sufficiently closely
|   synchronised) then it is sufficient to know how the eventlog
|   timestamps match up with the clock.
|
|   The EVENT_WALL_CLOCK_TIME contains the clock time with (up to)
|   nanosecond precision. It is otherwise an ordinary event and so
|   contains the usual timestamp for the same moment in time. It therefore
|   enables us to match up all the eventlog timestamps with clock time.
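The matching step can be written down as simple arithmetic. In this sketch the clock time is assumed to be carried as whole seconds plus nanoseconds; the commit message only says the precision is up to nanoseconds, so that split and the function name are assumptions.

```c
#include <stdint.h>

/* Convert an eventlog timestamp into clock time, in nanoseconds, using
 * one wall-clock sample taken from the same eventlog.
 *
 *   sample_ts_ns  eventlog timestamp carried by the wall-clock event
 *   wall_sec      clock time of that event, whole seconds (assumed field)
 *   wall_nsec     clock time of that event, nanoseconds (assumed field)
 *   event_ts_ns   eventlog timestamp of the event to convert
 *
 * Unsigned arithmetic wraps modulo 2^64, so the result is also correct
 * for events that were logged before the sample was taken. */
uint64_t event_wall_time_ns(uint64_t sample_ts_ns, uint64_t wall_sec,
                            uint32_t wall_nsec, uint64_t event_ts_ns)
{
    uint64_t wall_at_sample = wall_sec * 1000000000ULL + wall_nsec;

    return wall_at_sample + (event_ts_ns - sample_ts_ns);
}
```

Applying the same conversion to every eventlog from processes that share a clock domain puts all their events on one common timeline.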
* Revert "Move freeStablePtr() into the exported API (Lennart wants it)" (Simon Marlow, 2011-10-19; 1 file, -2/+1)
|   On second thoughts, hs_free_stable_ptr() is the official way to free a
|   StablePtr.
|
|   This reverts commit ae583f2949570755c8a03f68a71416c0fd7f257c.
* Move freeStablePtr() into the exported API (Lennart wants it) (Simon Marlow, 2011-10-18; 1 file, -1/+2)
|
* make CAFs atomic, to fix #5558 (Simon Marlow, 2011-10-17; 1 file, -2/+2)
|   See Note [atomic CAFs] in rts/sm/Storage.c
* Snapshot of codegen refactoring to share with simonpj (Simon Marlow, 2011-08-25; 1 file, -4/+4)
|
* Also include basic time statistics in GCStats. (Edward Z. Yang, 2011-08-06; 1 file, -0/+2)
|   Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* Implement public interface for GC statistics. (Edward Z. Yang, 2011-07-30; 1 file, -0/+44)
|   We add a new RTS flag -T for collecting statistics but not giving any
|   new inputs. There is one new struct in rts/storage/GC.h: GCStats. We
|   add two new global counters current_residency and current_slop, which
|   are useful for in-program GC statistics.
|
|   See GHC.Stats in base for a Haskell interface to this functionality.
|
|   Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* comment updates (Simon Marlow, 2011-07-20; 1 file, -2/+2)
|
* Sync EventLogFormat.h with ghc-events (Duncan Coutts, 2011-07-18; 1 file, -2/+5)
|
* Add new fully-accurate per-spark trace/eventlog events (Duncan Coutts, 2011-07-18; 2 files, -5/+15)
|   Replaces the existing EVENT_RUN/STEAL_SPARK events with 7 new events
|   covering all stages of the spark lifecycle: create, dud, overflow,
|   run, steal, fizzle, gc
|
|   The sampled spark events are still available. There are now two event
|   classes for sparks, the sampled and the fully accurate. They can be
|   enabled/disabled independently. By default +RTS -l includes the
|   sampled but not full detail spark events. Use +RTS -lf-p to enable the
|   detailed 'f' and disable the sampled 'p' spark.
|
|   Includes work by Mikolaj <mikolaj.konarski@gmail.com>
* Move GC tracing into a separate trace class (Duncan Coutts, 2011-07-18; 1 file, -0/+1)
|   Previously GC was included in the scheduler trace class. It can be
|   enabled specifically with +RTS -vg or -lg, though note that both -v
|   and -l on their own now default to a sensible set of trace classes,
|   currently: scheduler, gc and sparks.
* add a new trace class for spark events (Duncan Coutts, 2011-07-18; 1 file, -1/+1)
|
* Add spark counter tracing (Duncan Coutts, 2011-07-18; 1 file, -1/+2)
|   A new eventlog event containing 7 spark counters/statistics: sparks
|   created, dud, overflowed, converted, GC'd, fizzled and remaining.
|   These are maintained and logged separately for each capability. We
|   log them at startup, on each GC (minor and major) and on shutdown.
* remove getOrSetTypeableStore. This is no longer used after the switch (Simon Marlow, 2011-07-12; 1 file, -1/+0)
|   to using MD5 hashes to identify TypeReps in the Typeable library.
* Emit various bits of OS process info into the eventlog (Duncan Coutts, 2011-05-26; 1 file, -1/+39)
|   The process ID, parent process ID, rts name and version
|   The program arguments and environment.