| author | Simon Marlow <marlowsd@gmail.com> | 2011-11-28 16:48:43 +0000 |
|---|---|---|
| committer | Simon Marlow <marlowsd@gmail.com> | 2011-11-29 12:21:18 +0000 |
| commit | 50de6034343abc93a7b01daccff34121042c0e7c (patch) | |
| tree | 24496a5fc6bc39c6baaa574608e53c5d76c169f6 /compiler/codeGen/StgCmmUtils.hs | |
| parent | 1c2b838131134d44004dfdff18c302131478390d (diff) | |
| download | haskell-50de6034343abc93a7b01daccff34121042c0e7c.tar.gz | |
Make profiling work with multiple capabilities (+RTS -N)
This means that both time and heap profiling work for parallel
programs. Main internal changes:
- CCCS is no longer a global variable; it is now another
pseudo-register in the StgRegTable struct. Thus every
Capability has its own CCCS.
- There is a new built-in CCS called "IDLE", which records ticks for
Capabilities in the idle state. If you profile a single-threaded
program with +RTS -N2, you'll see about 50% of time in "IDLE".
- There is appropriate locking in rts/Profiling.c to protect the
shared cost-centre-stack data structures.
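
The first two points can be pictured with a toy model. This is only a sketch, not the RTS code: `Capability`, `capCCCS`, `tick`, and `profile` are hypothetical names, and a cost-centre stack is reduced to a string. Each capability carries its own current CCS, and each timer tick is charged to whatever CCS each capability currently holds, so a capability with nothing to run charges its ticks to "IDLE".

```haskell
import qualified Data.Map.Strict as Map

-- A cost-centre stack is reduced to a name here; "IDLE" mirrors the built-in CCS.
type CCS = String

idleCCS :: CCS
idleCCS = "IDLE"

-- Toy stand-in for a Capability: the only field modelled is its own CCCS.
newtype Capability = Capability { capCCCS :: CCS }

-- One timer tick: every capability charges the tick to its own current CCS.
tick :: [Capability] -> Map.Map CCS Int -> Map.Map CCS Int
tick caps counts = foldr (\c -> Map.insertWith (+) (capCCCS c) 1) counts caps

-- Percentage of all ticks charged to each CCS after n timer ticks.
profile :: Int -> [Capability] -> Map.Map CCS Double
profile n caps = Map.map share totals
  where
    totals = iterate (tick caps) Map.empty !! n
    grand = fromIntegral (sum (Map.elems totals)) :: Double
    share t = 100 * fromIntegral t / grand
```

In this model, `profile 10 [Capability "MAIN", Capability idleCCS]` splits the ticks evenly between "MAIN" and "IDLE", which is the 50% figure described for a single-threaded program run with +RTS -N2.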
This patch does enough to get it working, but I have cut one big corner:
the cost-centre-stack data structure is still shared amongst all
Capabilities, which means that multiple Capabilities will race when
updating the "allocations" and "entries" fields of a CCS. Not only
does this give unpredictable results, but it runs very slowly due to
cache line bouncing.
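
As an analogy only (the real counters are fields of a C struct in the RTS, not Haskell values), the lost-update problem on a shared counter might look like the sketch below. `bumpRacy`, `bumpAtomic`, and `countEntries` are hypothetical names introduced for illustration; the atomic variant is shown for contrast and is not what the patch does.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM_)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef, writeIORef)

-- Unsynchronised read-modify-write: two threads can read the same old value,
-- in which case one of the increments is lost.
bumpRacy :: IORef Int -> IO ()
bumpRacy ref = readIORef ref >>= \n -> writeIORef ref $! n + 1

-- Atomic read-modify-write: no increments are lost.
bumpAtomic :: IORef Int -> IO ()
bumpAtomic ref = atomicModifyIORef' ref (\n -> (n + 1, ()))

-- Run `workers` threads, each bumping one shared counter `perWorker` times,
-- and return the final count.
countEntries :: (IORef Int -> IO ()) -> Int -> Int -> IO Int
countEntries bump workers perWorker = do
  counter <- newIORef 0
  done <- newEmptyMVar
  forM_ [1 .. workers] $ \_ ->
    forkIO (replicateM_ perWorker (bump counter) >> putMVar done ())
  replicateM_ workers (takeMVar done)
  readIORef counter
```

With `bumpAtomic` the result is always `workers * perWorker`. With `bumpRacy` on several capabilities the total may come up short, and either way every thread keeps writing to the same memory location, which is the kind of sharing that causes cache line bouncing.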
It is strongly recommended that you use -fno-prof-count-entries to
disable the "entries" count when profiling parallel programs. (I shall
add a note to this effect to the docs).
Diffstat (limited to 'compiler/codeGen/StgCmmUtils.hs')
| -rw-r--r-- | compiler/codeGen/StgCmmUtils.hs | 8 |
1 files changed, 6 insertions, 2 deletions
diff --git a/compiler/codeGen/StgCmmUtils.hs b/compiler/codeGen/StgCmmUtils.hs
index f209005108..c3327138b3 100644
--- a/compiler/codeGen/StgCmmUtils.hs
+++ b/compiler/codeGen/StgCmmUtils.hs
@@ -253,7 +253,7 @@ callerSaveVolatileRegs = (caller_save, caller_load)
     caller_save = catAGraphs (map callerSaveGlobalReg regs_to_save)
     caller_load = catAGraphs (map callerRestoreGlobalReg regs_to_save)
 
-    system_regs = [ Sp,SpLim,Hp,HpLim,CurrentTSO,CurrentNursery
+    system_regs = [ Sp,SpLim,Hp,HpLim,CCCS,CurrentTSO,CurrentNursery
                     {- ,SparkHd,SparkTl,SparkBase,SparkLim -}
                   , BaseReg ]
 
@@ -366,6 +366,9 @@ callerSaves Hp = True
 #ifdef CALLER_SAVES_HpLim
 callerSaves HpLim = True
 #endif
+#ifdef CALLER_SAVES_CCCS
+callerSaves CCCS = True
+#endif
 #ifdef CALLER_SAVES_CurrentTSO
 callerSaves CurrentTSO = True
 #endif
@@ -385,7 +388,8 @@ baseRegOffset SpLim = oFFSET_StgRegTable_rSpLim
 baseRegOffset (LongReg 1) = oFFSET_StgRegTable_rL1
 baseRegOffset Hp = oFFSET_StgRegTable_rHp
 baseRegOffset HpLim = oFFSET_StgRegTable_rHpLim
-baseRegOffset CurrentTSO = oFFSET_StgRegTable_rCurrentTSO
+baseRegOffset CCCS = oFFSET_StgRegTable_rCCCS
+baseRegOffset CurrentTSO = oFFSET_StgRegTable_rCurrentTSO
 baseRegOffset CurrentNursery = oFFSET_StgRegTable_rCurrentNursery
 baseRegOffset HpAlloc = oFFSET_StgRegTable_rHpAlloc
 baseRegOffset GCEnter1 = oFFSET_stgGCEnter1
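
To show how the first two hunks fit together, here is a minimal standalone sketch. It assumes that `regs_to_save` is derived by filtering `system_regs` with `callerSaves`, which the diff itself does not show. The `GlobalReg` type below is a hypothetical subset of GHC's, and the per-platform `CALLER_SAVES_*` CPP macros are modelled as an explicit list argument rather than preprocessor conditionals.

```haskell
-- Hypothetical subset of GHC's GlobalReg type, for illustration only.
data GlobalReg
  = Sp | SpLim | Hp | HpLim | CCCS | CurrentTSO | CurrentNursery | BaseReg
  deriving (Eq, Show)

-- Mirrors the system_regs list after the patch, with CCCS included.
systemRegs :: [GlobalReg]
systemRegs = [Sp, SpLim, Hp, HpLim, CCCS, CurrentTSO, CurrentNursery, BaseReg]

-- In GHC the answer comes from CALLER_SAVES_* macros for the target platform;
-- here the platform is modelled as an explicit list (an assumption).
callerSaves :: [GlobalReg] -> GlobalReg -> Bool
callerSaves platformCallerSaved reg = reg `elem` platformCallerSaved

-- Registers that must be saved and restored around a foreign call.
regsToSave :: [GlobalReg] -> [GlobalReg]
regsToSave platformCallerSaved =
  filter (callerSaves platformCallerSaved) systemRegs
```

Under these assumptions, a platform that keeps CCCS in a caller-saves machine register would now have it saved and restored like Hp or CurrentTSO, while a platform that defines no such macro is unaffected by the new `callerSaves` clause.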
