| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the code generator generated small code fragments labelled
with __stginit_M for each module M, and these performed whatever
initialisation was necessary for that module and recursively invoked
the initialisation functions for imported modules. This appraoch had
drawbacks:
- FFI users had to call hs_add_root() to ensure the correct
initialisation routines were called. This is a non-standard,
and ugly, API.
- unless we were using -split-objs, the __stginit dependencies would
entail linking the whole transitive closure of modules imported,
whether they were actually used or not. In an extreme case (#4387,
#4417), a module from GHC might be imported for use in Template
Haskell or an annotation, and that would force the whole of GHC to
be needlessly linked into the final executable.
So now instead we do our initialisation with C functions marked with
__attribute__((constructor)), which are automatically invoked at
program startup time (or DSO load-time). The C initialisers are
emitted into the stub.c file. This means that every time we compile
with -prof or -hpc, we now get a stub file, but thanks to #3687 that
is now invisible to the user.
There are some refactorings in the RTS (particularly for HPC) to
handle the fact that initialisers now get run earlier than they did
before.
The __stginit symbols are still generated, and the hs_add_root()
function still exists (but does nothing), for backwards compatibility.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new code generator was doing some interesting spilling across
unsafe foreign calls:
_c1ao::I32 = Hp - 4;
I32[Sp - 20] = _c1ao::I32;
foreign "ccall"
newCAF((BaseReg, PtrHint), (R1, PtrHint))[_unsafe_call_];
_c1ao::I32 = I32[Sp - 20];
This is fairly unnecessary, and resulted from over-conservative
liveness analysis from CmmLive. We can see that the old code
generator only saved volatile registers across unsafe foreign calls:
spilling variables was done by saveVolatileVarsAndRegs, which was
only performed for ordinary calls.
This commit removes the excess kill from the liveness analysis, as well
as the *redundant* excess kill from spilling-and-reloading, and adds a
note to CmmNode to this effect. The only registers we need to kill
are the ones that the foreign call assigns to, just like any other
machine instruction.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A common sequence of commands (at least for me) is this:
$ ghc hello
1 of 1] Compiling Main ( hello.hs, hello.o )
Linking hello ...
$ ./hello +RTS -s
hello: Most RTS options are disabled. Link with -rtsopts to enable them.
$ ghc hello -rtsopts
$
grr, nothing happened. I could use -fforce-recomp, but if this was a
large program I probably don't want to recompile it all again, so:
$ rm hello
removed `hello'
$ ghc hello -rtsopts
Linking hello ...
$ ./hello +RTS -s
./hello +RTS -s
Hello World!
51,264 bytes allocated in the heap
2,904 bytes copied during GC
43,808 bytes maximum residency (1 sample(s))
17,632 bytes maximum slop
etc.
With this patch, GHC notices when the options have changed and forces
a relink, so you don't need to rm the binary or use -fforce-recomp.
This is done by adding the pertinent stuff to the binary in a special
section called ".debug-ghc-link-info":
$ readelf -p .debug-ghc-link-info ./hello
String dump of section 'ghc-linker-opts':
[ 0] (["-lHSbase-4.3.1.0","-lHSinteger-gmp-0.2.0.2","-lgmp","-lHSghc-prim-0.2.0.0","-lHSrts","-lm","-lrt","-ldl","-u","ghczmprim_GHCziTypes_Izh_static_info","-u","ghczmprim_GHCziTypes_Czh_static_info","-u","ghczmprim_GHCziTypes_Fzh_static_info","-u","ghczmprim_GHCziTypes_Dzh_static_info","-u","base_GHCziPtr_Ptr_static_info","-u","base_GHCziWord_Wzh_static_info","-u","base_GHCziInt_I8zh_static_info","-u","base_GHCziInt_I16zh_static_info","-u","base_GHCziInt_I32zh_static_info","-u","base_GHCziInt_I64zh_static_info","-u","base_GHCziWord_W8zh_static_info","-u","base_GHCziWord_W16zh_static_info","-u","base_GHCziWord_W32zh_static_info","-u","base_GHCziWord_W64zh_static_info","-u","base_GHCziStable_StablePtr_static_info","-u","ghczmprim_GHCziTypes_Izh_con_info","-u","ghczmprim_GHCziTypes_Czh_con_info","-u","ghczmprim_GHCziTypes_Fzh_con_info","-u","ghczmprim_GHCziTypes_Dzh_con_info","-u","base_GHCziPtr_Ptr_con_info","-u","base_GHCziPtr_FunPtr_con_info","-u","base_GHCziStable_StablePtr_con_info","-u","ghczmprim_GHCziTypes_False_closure","-u","ghczmprim_GHCziTypes_True_closure","-u","base_GHCziPack_unpackCString_closure","-u","base_GHCziIOziException_stackOverflow_closure","-u","base_GHCziIOziException_heapOverflow_closure","-u","base_ControlziExceptionziBase_nonTermination_closure","-u","base_GHCziIOziException_blockedIndefinitelyOnMVar_closure","-u","base_GHCziIOziException_blockedIndefinitelyOnSTM_closure","-u","base_ControlziExceptionziBase_nestedAtomically_closure","-u","base_GHCziWeak_runFinalizzerBatch_closure","-u","base_GHCziTopHandler_runIO_closure","-u","base_GHCziTopHandler_runNonIO_closure","-u","base_GHCziConcziIO_ensureIOManagerIsRunning_closure","-u","base_GHCziConcziSync_runSparks_closure","-u","base_GHCziConcziSignal_runHandlers_closure","-lHSffi"],Nothing,RtsOptsAll,False,[],[])
And GHC itself uses the readelf command to extract it when deciding
whether to relink. The reason for the name ".debug-ghc-link-info" is
that sections beginning with ".debug" are removed automatically by
strip.
This currently only works on Linux; Windows and OS X still have the
old behaviour.
|
|
|
|
|
|
|
| |
This change may constitute a substantial performance hit, due to the new
creation of a set for every instruction we emit.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
|
|
|
|
| |
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, on register-deficient architectures like x86-32,
the new code generator would emit code for calls to stg_gc_l1,
stg_gc_d1 and stg_gc_f1 that pushed their single argument on
to the stack, while the functions themselves expected the
argument to live in L1, D1 and F1 (respectively). This was
because cmmCall with the GC calling convention allocated real
registers, not virtual registers.
This patch modifies the code for assigning registers/stack slots
to use the right calling convention for GC and adds an assertion
to ensure it did it properly.
|
| |
|
|
|
|
| |
There might be more simplification to do.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This changes the new code generator to make use of the Hoopl package
for dataflow analysis. Hoopl is a new boot package, and is maintained
in a separate upstream git repository (as usual, GHC has its own
lagging darcs mirror in http://darcs.haskell.org/packages/hoopl).
During this merge I squashed recent history into one patch. I tried
to rebase, but the history had some internal conflicts of its own
which made rebase extremely confusing, so I gave up. The history I
squashed was:
- Update new codegen to work with latest Hoopl
- Add some notes on new code gen to cmm-notes
- Enable Hoopl lag package.
- Add SPJ note to cmm-notes
- Improve GC calls on new code generator.
Work in this branch was done by:
- Milan Straka <fox@ucw.cz>
- John Dias <dias@cs.tufts.edu>
- David Terei <davidterei@gmail.com>
Edward Z. Yang <ezyang@mit.edu> merged in further changes from GHC HEAD
and fixed a few bugs.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch makes two changes to the way stacks are managed:
1. The stack is now stored in a separate object from the TSO.
This means that it is easier to replace the stack object for a thread
when the stack overflows or underflows; we don't have to leave behind
the old TSO as an indirection any more. Consequently, we can remove
ThreadRelocated and deRefTSO(), which were a pain.
This is obviously the right thing, but the last time I tried to do it
it made performance worse. This time I seem to have cracked it.
2. Stacks are now represented as a chain of chunks, rather than
a single monolithic object.
The big advantage here is that individual chunks are marked clean or
dirty according to whether they contain pointers to the young
generation, and the GC can avoid traversing clean stack chunks during
a young-generation collection. This means that programs with deep
stacks will see a big saving in GC overhead when using the default GC
settings.
A secondary advantage is that there is much less copying involved as
the stack grows. Programs that quickly grow a deep stack will see big
improvements.
In some ways the implementation is simpler, as nothing special needs
to be done to reclaim stack as the stack shrinks (the GC just recovers
the dead stack chunks). On the other hand, we have to manage stack
underflow between chunks, so there's a new stack frame
(UNDERFLOW_FRAME), and we now have separate TSO and STACK objects.
The total amount of code is probably about the same as before.
There are new RTS flags:
-ki<size> Sets the initial thread stack size (default 1k) Egs: -ki4k -ki2m
-kc<size> Sets the stack chunk size (default 32k)
-kb<size> Sets the stack chunk buffer size (default 1k)
-ki was previously called just -k, and the old name is still accepted
for backwards compatibility. These new options are documented.
|
|
|
|
|
| |
This was apparently needed at some point during the new typechecker
development, but does not seem to be required now.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
There are two main changes
* New LANGUAGE option RebindableSyntax, which implies NoImplicitPrelude
* if-the-else becomes rebindable, with function name "ifThenElse"
(but case expressions are unaffected)
Thanks to Sam Anklesaria for doing most of the work here
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is patch that adds support for interruptible FFI calls in the form
of a new foreign import keyword 'interruptible', which can be used
instead of 'safe' or 'unsafe'. Interruptible FFI calls act like safe
FFI calls, except that the worker thread they run on may be interrupted.
Internally, it replaces BlockedOnCCall_NoUnblockEx with
BlockedOnCCall_Interruptible, and changes the behavior of the RTS
to not modify the TSO_ flags on the event of an FFI call from
a thread that was interruptible. It also modifies the bytecode
format for foreign call, adding an extra Word16 to indicate
interruptibility.
The semantics of interruption vary from platform to platform, but the
intent is that any blocking system calls are aborted with an error code.
This is most useful for making function calls to system library
functions that support interrupting. There is no support for pre-Vista
Windows.
There is a partner testsuite patch which adds several tests for this
functionality.
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
We still have
insertList, insertListWith, deleteList
which aren't in Data.Map, and
foldRightWithKey
which works around the fold(r)WithKey addition and deprecation.
|
| |
|
|
|
|
|
|
|
|
|
| |
This major patch implements the new OutsideIn constraint solving
algorithm in the typecheker, following our JFP paper "Modular type
inference with local assumptions".
Done with major help from Dimitrios Vytiniotis and Brent Yorgey.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Happy generates
notHappyAtAll = error "Blah"
without a type signature, and currently the new
typechecker doesn't generalise it. This patch
says "no monomorphism restriction" which makes it
generalise again.
Better would be to add a type sig to Happy's template
|
| |
|
|
|
|
|
| |
"printable" is ASCII-only, whereas in other locales we can get things like
# 1 "<lĂnea-de-orden>"
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was done as part of an honours thesis at UNSW, the paper describing the
work and results can be found at:
http://www.cse.unsw.edu.au/~pls/thesis/davidt-thesis.pdf
A Homepage for the backend can be found at:
http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/Backends/LLVM
Quick summary of performance is that for the 'nofib' benchmark suite, runtimes
are within 5% slower than the NCG and generally better than the C code
generator. For some code though, such as the DPH projects benchmark, the LLVM
code generator outperforms the NCG and C code generator by about a 25%
reduction in run times.
|
|
|
|
| |
Noticed by Denys Rtveliashvili <rtvd@mac.com>, see #4004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This replaces the global blackhole_queue with a clever scheme that
enables us to queue up blocked threads on the closure that they are
blocked on, while still avoiding atomic instructions in the common
case.
Advantages:
- gets rid of a locked global data structure and some tricky GC code
(replacing it with some per-thread data structures and different
tricky GC code :)
- wakeups are more prompt: parallel/concurrent performance should
benefit. I haven't seen anything dramatic in the parallel
benchmarks so far, but a couple of threading benchmarks do improve
a bit.
- waking up a thread blocked on a blackhole is now O(1) (e.g. if
it is the target of throwTo).
- less sharing and better separation of Capabilities: communication
is done with messages, the data structures are strictly owned by a
Capability and cannot be modified except by sending messages.
- this change will utlimately enable us to do more intelligent
scheduling when threads block on each other. This is what started
off the whole thing, but it isn't done yet (#3838).
I'll be documenting all this on the wiki in due course.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
Allow a temporary assignment to be pushed past an assignment to a
global if the global is not mentioned in the rhs of the assignment we
are inlining.
This fixes up some bad code. We should make sure we're doing
something equivalent in the new backend in due course.
|
|
|
|
| |
From Maxime Henrion <mhenrion@gmail.com>
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
The problem in #3741 was that we had confused column numbers with byte
offsets, which fails in the case of UTF-8 (amongst other things).
Fortunately we're tracking correct column offsets now, so we didn't
have to make a calculation based on a byte offset. I got rid of two
fields from the PState (last_line_len and last_offs).and one field
from the AI (alex input) constructor.
|
| |
|
|
|
|
| |
Also corrected a couple of line 0's to line 1
|
| |
|
| |
|
| |
|