| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
| |
We also generate much better code for safe foreign calls (and maybe
also unsafe foreign calls) than previously. See the two new Notes:
Note [lower safe foreign calls]
Note [safe foreign call convention]
|
| | |
|
| |\ |
|
| | | |
|
| | |\
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Conflicts:
compiler/cmm/CmmLint.hs
compiler/cmm/OldCmm.hs
compiler/codeGen/CgMonad.lhs
compiler/main/CodeOutput.lhs
|
| | | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
021a0dd265ff34c1e292813c06185eff1d6b5c1c appears to have only
partially added the new primops associated with ArrayArray#
and MutableArrayArray#
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
|
| | | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We now carry around with CmmJump statements a list of
the STG registers that are live at that jump site.
This is used by the LLVM backend so it can avoid
unnesecarily passing around dead registers, improving
perfromance. This gives us the framework to finally
fix trac #4308.
|
| | | | |
|
| | | | |
|
| | | | |
|
| | | | |
|
| | | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Instead of enterLocalIdLabel we should get the label from the
ClosureInfo, because that knows better whether the label should be
local or not.
Needed by #5357
|
| | | |
| | |
| | |
| | |
| | |
| | | |
pseudo-register
Needed by #5357
|
| | | | |
|
| | | | |
|
| | | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We no longer have many separate, clashing getDynFlags functions
I've given each GhcMonad its own HasDynFlags instance, rather than
using UndecidableInstances to make a GhcMonad m => HasDynFlags m
instance.
|
| | | | |
|
| |/ /
| |
| |
| |
| |
| |
| |
| | |
We were using the SRT information generated by the computeSRTs pass to
decide whether to add a static link field to a constructor or not, and
this broke when I disabled computeSRTs for the new code generator. So
I've hacked it for now to only rely on the SRT information generated
by CoreToStg.
|
| | | |
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Also:
- improvements to code generation: push slow-call continuations
on the stack instead of generating explicit continuations
- remove unused CmmInfo wrapper type (replace with CmmInfoTable)
- squash Area and AreaId together, remove now-unused RegSlot
- comment out old unused stack-allocation code that no longer
compiles after removal of RegSlot
|
| | | |
|
| | | |
|
| | |
| |
| |
| |
| |
| |
| |
| |
| | |
This is so that we can process the Stg code in constant space. Before
we were generating all the C-- up front, leading to a large space
leak.
I haven't converted the LLVM or C back ends to the incremental scheme,
but it's not hard to do.
|
| | | |
|
| |/ |
|
| | |
|
| | |
|
| |\ |
|
| | |
| |
| |
| |
| |
| |
| |
| | |
The primitive array types, such as 'ByteArray#', have kind #, but are represented by pointers. They are boxed, but unpointed types (i.e., they cannot be 'undefined').
The two categories of array types —[Mutable]Array# and [Mutable]ByteArray#— are containers for unboxed (and unpointed) as well as for boxed and pointed types. So far, we lacked support for containers for boxed, unpointed types (i.e., containers for the primitive arrays themselves). This is what the new primtypes provide.
Containers for boxed, unpointed types are crucial for the efficient implementation of scattered nested arrays, which are central to the new DPH backend library dph-lifted-vseg. Without such containers, we cannot eliminate all unboxing from the inner loops of traversals processing scattered nested arrays.
|
| |/ |
|
| |
|
|
|
|
|
| |
Otherwise the LLVM backend gets confused over whether its type should be
"void (i8*, i8*)" or "i64 (i8*, i8*)".
Signed-off-by: David Terei <davidterei@gmail.com>
|
| | |
|
| |\ |
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
- Attach a SrcSpan to every CostCentre. This had the side effect
that CostCentres that used to be merged because they had the same
name are now considered distinct; so I had to add a Unique to
CostCentre to give them distinct object-code symbols.
- New flag: -fprof-auto-calls. This flag adds an automatic SCC to
every call site (application, to be precise). This is typically
more useful for call stacks than annotating whole functions.
Various tidy-ups at the same time: removed unused NoCostCentre
constructor, and refactored a bit in Coverage.lhs.
The call stack we get from traceStack now looks like this:
Stack trace:
Main.CAF (<entire-module>)
Main.main.xs (callstack002.hs:18:12-24)
Main.map (callstack002.hs:13:12-16)
Main.map.go (callstack002.hs:15:21-34)
Main.map.go (callstack002.hs:15:21-23)
Main.f (callstack002.hs:10:7-43)
|
| |/
|
|
|
|
|
|
|
|
| |
When they existed, they were getting included in the includes_H_FILES
variable (as it uses wildcard to find all header files). But the
.depends files for the programs that generate the headers depend on
$(includes_H_FILES), so the .depends files looked out-of-date once the
headers had been created. This caused unnecessary make reinvocations.
So now we put them in dist* directories, where they ought to be anyway.
|
| |
|
|
|
|
|
|
|
| |
- add getCCSOf# :: a -> State# s -> (# State# s, Addr# #)
(returns the CCS attached to the supplied object)
- remove traceCcs# (obsoleted by getCCSOf#)
- rename getCCCS# to getCurrentCCS#
|
| |
|
|
|
| |
Returns a pointer to the current cost-centre stack when profiling,
NULL otherwise.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This means that both time and heap profiling work for parallel
programs. Main internal changes:
- CCCS is no longer a global variable; it is now another
pseudo-register in the StgRegTable struct. Thus every
Capability has its own CCCS.
- There is a new built-in CCS called "IDLE", which records ticks for
Capabilities in the idle state. If you profile a single-threaded
program with +RTS -N2, you'll see about 50% of time in "IDLE".
- There is appropriate locking in rts/Profiling.c to protect the
shared cost-centre-stack data structures.
This patch does enough to get it working, I have cut one big corner:
the cost-centre-stack data structure is still shared amongst all
Capabilities, which means that multiple Capabilities will race when
updating the "allocations" and "entries" fields of a CCS. Not only
does this give unpredictable results, but it runs very slowly due to
cache line bouncing.
It is strongly recommended that you use -fno-prof-count-entries to
disable the "entries" count when profiling parallel programs. (I shall
add a note to this effect to the docs).
|
| |
|
|
|
| |
This field was doing nothing. I think it originally appeared in a
very old incarnation of the new code generator.
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| |
|
|
|
|
|
|
|
| |
So the .prof file will be UTF-8. This is mostly ok, except that the
RTS doesn't calculate the column widths correctly (it assumes bytes =
chars).
hp2ps doesn't do anything sensible with Unicode strings, it just dumps
the bytes into the .ps file.
|
| | |
|
| |
|
|
|
|
| |
It's still a panic, as it wouldn't be trivial to give a proper error
at the point that we generate it, but it's now a bit nicer:
Registers above R10 are not supported (tried to use R11)
|
| |
|
|
|
| |
We only use it for "compiler" sources, i.e. not for libraries.
Many modules have a -fno-warn-tabs kludge for now.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
User visible changes
====================
Profilng
--------
Flags renamed (the old ones are still accepted for now):
OLD NEW
--------- ------------
-auto-all -fprof-auto
-auto -fprof-exported
-caf-all -fprof-cafs
New flags:
-fprof-auto Annotates all bindings (not just top-level
ones) with SCCs
-fprof-top Annotates just top-level bindings with SCCs
-fprof-exported Annotates just exported bindings with SCCs
-fprof-no-count-entries Do not maintain entry counts when profiling
(can make profiled code go faster; useful with
heap profiling where entry counts are not used)
Cost-centre stacks have a new semantics, which should in most cases
result in more useful and intuitive profiles. If you find this not to
be the case, please let me know. This is the area where I have been
experimenting most, and the current solution is probably not the
final version, however it does address all the outstanding bugs and
seems to be better than GHC 7.2.
Stack traces
------------
+RTS -xc now gives more information. If the exception originates from
a CAF (as is common, because GHC tends to lift exceptions out to the
top-level), then the RTS walks up the stack and reports the stack in
the enclosing update frame(s).
Result: +RTS -xc is much more useful now - but you still have to
compile for profiling to get it. I've played around a little with
adding 'head []' to GHC itself, and +RTS -xc does pinpoint the problem
quite accurately.
I plan to add more facilities for stack tracing (e.g. in GHCi) in the
future.
Coverage (HPC)
--------------
* derived instances are now coloured yellow if they weren't used
* likewise record field names
* entry counts are more accurate (hpc --fun-entry-count)
* tab width is now correct (markup was previously off in source with
tabs)
Internal changes
================
In Core, the Note constructor has been replaced by
Tick (Tickish b) (Expr b)
which is used to represent all the kinds of source annotation we
support: profiling SCCs, HPC ticks, and GHCi breakpoints.
Depending on the properties of the Tickish, different transformations
apply to Tick. See CoreUtils.mkTick for details.
Tickets
=======
This commit closes the following tickets, test cases to follow:
- Close #2552: not a bug, but the behaviour is now more intuitive
(test is T2552)
- Close #680 (test is T680)
- Close #1531 (test is result001)
- Close #949 (test is T949)
- Close #2466: test case has bitrotted (doesn't compile against current
version of vector-space package)
|
| | |
|