From 50de6034343abc93a7b01daccff34121042c0e7c Mon Sep 17 00:00:00 2001
From: Simon Marlow
Date: Mon, 28 Nov 2011 16:48:43 +0000
Subject: Make profiling work with multiple capabilities (+RTS -N)

This means that both time and heap profiling work for parallel
programs. Main internal changes:

  - CCCS is no longer a global variable; it is now another
    pseudo-register in the StgRegTable struct. Thus every
    Capability has its own CCCS.

  - There is a new built-in CCS called "IDLE", which records ticks
    for Capabilities in the idle state. If you profile a
    single-threaded program with +RTS -N2, you'll see about 50% of
    time in "IDLE".

  - There is appropriate locking in rts/Profiling.c to protect the
    shared cost-centre-stack data structures.

This patch does enough to get it working; I have cut one big corner:
the cost-centre-stack data structure is still shared amongst all
Capabilities, which means that multiple Capabilities will race when
updating the "allocations" and "entries" fields of a CCS. Not only
does this give unpredictable results, but it runs very slowly due to
cache line bouncing.

It is strongly recommended that you use -fno-prof-count-entries to
disable the "entries" count when profiling parallel programs. (I
shall add a note to this effect to the docs.)
---
 compiler/codeGen/CgCase.lhs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'compiler/codeGen/CgCase.lhs')

diff --git a/compiler/codeGen/CgCase.lhs b/compiler/codeGen/CgCase.lhs
index e4fe386043..a36621bdaf 100644
--- a/compiler/codeGen/CgCase.lhs
+++ b/compiler/codeGen/CgCase.lhs
@@ -670,6 +670,6 @@ restoreCurrentCostCentre Nothing _freeit = nopC
 restoreCurrentCostCentre (Just slot) freeit
  = do   { sp_rel <- getSpRelOffset slot
        ; whenC freeit (freeStackSlots [slot])
-       ; stmtC (CmmStore curCCSAddr (CmmLoad sp_rel bWord)) }
+       ; stmtC (storeCurCCS (CmmLoad sp_rel bWord)) }
 
 \end{code}
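The commit message says CCCS moves from a global variable into a per-Capability pseudo-register, and that a built-in "IDLE" stack collects ticks for idle Capabilities. The only hunk shown swaps an explicit store to curCCSAddr for storeCurCCS, presumably so the code generator no longer assumes CCCS lives at one fixed global address. The following is a minimal C sketch of that idea under stated assumptions: the type and field names (RegTable, cccs, idle, CCS_IDLE, store_cur_ccs, profiling_tick) are hypothetical simplifications, not the RTS's actual definitions, and the real tick handling is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the RTS structures; the real
 * CostCentreStack, StgRegTable and Capability types have many more fields. */
typedef struct CostCentreStack {
    const char *label;
    unsigned long time_ticks;   /* profiling ticks charged to this stack */
} CostCentreStack;

typedef struct {
    CostCentreStack *cccs;      /* stands in for the CCCS pseudo-register */
} RegTable;

typedef struct {
    RegTable r;                 /* each Capability owns its own registers */
    int idle;                   /* non-zero while the Capability has no work */
} Capability;

/* Built-in stack that absorbs ticks taken while a Capability is idle. */
CostCentreStack CCS_IDLE = { "IDLE", 0 };

/* Set the current cost-centre stack of one Capability, instead of
 * writing a single process-wide global. */
void store_cur_ccs(Capability *cap, CostCentreStack *ccs)
{
    cap->r.cccs = ccs;
}

/* On each profiling tick, charge every Capability separately: its own
 * current stack if it is running, or IDLE if it is not. */
void profiling_tick(Capability *caps, size_t n_caps)
{
    for (size_t i = 0; i < n_caps; i++) {
        CostCentreStack *ccs = caps[i].idle ? &CCS_IDLE : caps[i].r.cccs;
        assert(ccs != NULL);    /* a running Capability must have a CCCS */
        ccs->time_ticks++;
    }
}
```

With one busy Capability and one idle Capability, every tick charges one unit to the running stack and one to IDLE, so IDLE ends up with half of all ticks. This matches the roughly 50% the commit message reports for a single-threaded program run with +RTS -N2.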
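The commit message also mentions locking in rts/Profiling.c to protect the shared cost-centre-stack data structures. As a rough illustration, here is a C sketch of the lookup-or-create step for a child stack, done under a mutex so that concurrent pushes of the same cost centre yield a single node. It assumes a single pthread mutex over the whole tree and a plain linked list of children. Every name here (Ccs, ccs_lock, push_cost_centre, count_children, run_concurrent_pushes) is hypothetical, and this is not GHC's actual code, which differs in structure and detail.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, simplified cost-centre-stack node: each node records the
 * cost centre on top and a linked list of stacks pushed onto it. */
typedef struct Ccs {
    int cc_id;
    struct Ccs *children;       /* first child stack */
    struct Ccs *next_sibling;   /* next child of the same parent */
} Ccs;

/* One lock guarding the whole shared tree (an assumption for this sketch). */
static pthread_mutex_t ccs_lock = PTHREAD_MUTEX_INITIALIZER;

/* Find parent's child for cc_id, creating it if absent. The lookup and the
 * insertion happen under the same lock, so two Capabilities pushing the same
 * cost centre at once cannot both allocate a node for it. */
Ccs *push_cost_centre(Ccs *parent, int cc_id)
{
    pthread_mutex_lock(&ccs_lock);
    Ccs *c = parent->children;
    while (c != NULL && c->cc_id != cc_id)
        c = c->next_sibling;
    if (c == NULL && (c = calloc(1, sizeof *c)) != NULL) {
        c->cc_id = cc_id;
        c->next_sibling = parent->children;
        parent->children = c;
    }
    pthread_mutex_unlock(&ccs_lock);
    return c;
}

/* Count the direct children of a stack (call when no pushes are running). */
size_t count_children(const Ccs *parent)
{
    size_t n = 0;
    for (const Ccs *c = parent->children; c != NULL; c = c->next_sibling)
        n++;
    return n;
}

/* Usage example: several threads (standing in for Capabilities) push the
 * same cost centre repeatedly; returns 1 if they all got the same node. */
struct push_args { Ccs *parent; int cc_id; Ccs *result; };

static void *push_worker(void *p)
{
    struct push_args *a = p;
    for (int i = 0; i < 1000; i++)
        a->result = push_cost_centre(a->parent, a->cc_id);
    return NULL;
}

int run_concurrent_pushes(Ccs *parent, int cc_id, int n_threads)
{
    pthread_t tids[8];
    struct push_args args[8];
    int started = 0, same = 1;
    assert(n_threads > 0 && n_threads <= 8);
    for (int i = 0; i < n_threads; i++) {
        args[i] = (struct push_args){ parent, cc_id, NULL };
        if (pthread_create(&tids[i], NULL, push_worker, &args[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    for (int i = 1; i < started; i++)
        same = same && args[i].result == args[0].result;
    return started == n_threads && same;
}
```

Compile with -pthread on systems that need it. As in the commit, the lock here protects only the shape of the tree. If several Capabilities incremented a per-stack counter such as "entries" outside the lock, increments could be lost. That is the race, and the cache-line contention, the commit message warns about when recommending -fno-prof-count-entries.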