path: root/compiler/cmm
Commit message | Author | Age | Files | Lines
...
* Change the way module initialisation is done (#3252, #4417) | Simon Marlow | 2011-04-12 | 2 | -72/+14
  Previously the code generator generated small code fragments labelled with __stginit_M for each module M, and these performed whatever initialisation was necessary for that module and recursively invoked the initialisation functions for imported modules. This approach had drawbacks:

  - FFI users had to call hs_add_root() to ensure the correct initialisation routines were called. This is a non-standard, and ugly, API.
  - Unless we were using -split-objs, the __stginit dependencies would entail linking the whole transitive closure of modules imported, whether they were actually used or not. In an extreme case (#4387, #4417), a module from GHC might be imported for use in Template Haskell or an annotation, and that would force the whole of GHC to be needlessly linked into the final executable.

  So now instead we do our initialisation with C functions marked with __attribute__((constructor)), which are automatically invoked at program startup time (or DSO load time). The C initialisers are emitted into the stub.c file. This means that every time we compile with -prof or -hpc, we now get a stub file, but thanks to #3687 that is now invisible to the user.

  There are some refactorings in the RTS (particularly for HPC) to handle the fact that initialisers now get run earlier than they did before.

  The __stginit symbols are still generated, and the hs_add_root() function still exists (but does nothing), for backwards compatibility.
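A minimal sketch of the mechanism this commit switches to, assuming a GCC- or Clang-compatible compiler: a function marked `__attribute__((constructor))` runs before `main()` (or when the shared object is loaded), so nothing like `hs_add_root()` has to be called by hand. The names `hypothetical_init_M`, `module_M_initialised` and `module_M_ready` are made up for illustration; they are not the symbols GHC writes into the stub.c file.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-module state standing in for whatever a real
   initialiser would register (profiling or HPC data, for instance). */
static bool module_M_initialised = false;

/* Runs automatically at program startup (or at DSO load time) under
   GCC/Clang, before main() is entered. No explicit call is needed. */
__attribute__((constructor))
static void hypothetical_init_M(void)
{
    module_M_initialised = true;
}

/* Code running later can rely on the initialiser having already run. */
bool module_M_ready(void)
{
    return module_M_initialised;
}
```

Because the constructor runs before `main()`, any call to `module_M_ready()` made from `main()` already sees `true`.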
* Unsafe foreign calls (fat machine instructions) do not kill all registers. | Edward Z. Yang | 2011-04-11 | 3 | -7/+9
  The new code generator was doing some interesting spilling across unsafe foreign calls:

      _c1ao::I32 = Hp - 4;
      I32[Sp - 20] = _c1ao::I32;
      foreign "ccall" newCAF((BaseReg, PtrHint), (R1, PtrHint))[_unsafe_call_];
      _c1ao::I32 = I32[Sp - 20];

  This is fairly unnecessary, and resulted from over-conservative liveness analysis from CmmLive. We can see that the old code generator only saved volatile registers across unsafe foreign calls: spilling variables was done by saveVolatileVarsAndRegs, which was only performed for ordinary calls.

  This commit removes the excess kill from the liveness analysis, as well as the *redundant* excess kill from spilling-and-reloading, and adds a note to CmmNode to this effect. The only registers we need to kill are the ones that the foreign call assigns to, just like any other machine instruction.

  Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
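The point of the change is that a node should only clobber what it defines. A sketch in terms of a standard backward liveness transfer function, assuming a hypothetical bitmask encoding of variables and registers (this is not CmmLive's actual representation): a variable live across the call needs a spill only if the call clobbers it, and treating the unsafe call like a machine instruction makes the clobbered set equal to its `defs`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: bit i of a RegSet stands for variable/register i. */
typedef uint32_t RegSet;

#define ALL_REGS UINT32_MAX

typedef struct {
    RegSet uses;   /* registers the node reads */
    RegSet defs;   /* registers the node assigns to */
} Node;

/* Backward liveness transfer: live_in = uses | (live_out & ~defs). */
RegSet live_in(Node n, RegSet live_out)
{
    return n.uses | (live_out & ~n.defs);
}

/* Variables live on both sides of the node. */
RegSet live_across(Node n, RegSet live_out)
{
    return live_out & ~n.defs;
}

/* Variables that must be spilled around the node: live across it and
   clobbered by it. For an unsafe call treated like a machine
   instruction, clobbered == n.defs and the result is always empty;
   the over-conservative view uses clobbered == ALL_REGS. */
RegSet needs_spill(Node n, RegSet live_out, RegSet clobbered)
{
    return live_across(n, live_out) & clobbered;
}
```

Modelled on the Cmm above, the `newCAF` call reads `R1` and `BaseReg` and defines nothing, so `_c1ao` stays in a register unless every register is assumed killed.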
* Force re-linking if the options have changed (#4451) | Simon Marlow | 2011-04-08 | 1 | -12/+1
  A common sequence of commands (at least for me) is this:

      $ ghc hello
      [1 of 1] Compiling Main ( hello.hs, hello.o )
      Linking hello ...
      $ ./hello +RTS -s
      hello: Most RTS options are disabled. Link with -rtsopts to enable them.
      $ ghc hello -rtsopts
      $

  grr, nothing happened. I could use -fforce-recomp, but if this was a large program I probably don't want to recompile it all again, so:

      $ rm hello
      removed `hello'
      $ ghc hello -rtsopts
      Linking hello ...
      $ ./hello +RTS -s
      ./hello +RTS -s
      Hello World!
      51,264 bytes allocated in the heap
      2,904 bytes copied during GC
      43,808 bytes maximum residency (1 sample(s))
      17,632 bytes maximum slop
      etc.

  With this patch, GHC notices when the options have changed and forces a relink, so you don't need to rm the binary or use -fforce-recomp. This is done by adding the pertinent stuff to the binary in a special section called ".debug-ghc-link-info":

      $ readelf -p .debug-ghc-link-info ./hello
      String dump of section 'ghc-linker-opts':
        [ 0] (["-lHSbase-4.3.1.0","-lHSinteger-gmp-0.2.0.2","-lgmp","-lHSghc-prim-0.2.0.0","-lHSrts","-lm","-lrt","-ldl","-u","ghczmprim_GHCziTypes_Izh_static_info","-u","ghczmprim_GHCziTypes_Czh_static_info","-u","ghczmprim_GHCziTypes_Fzh_static_info","-u","ghczmprim_GHCziTypes_Dzh_static_info","-u","base_GHCziPtr_Ptr_static_info","-u","base_GHCziWord_Wzh_static_info","-u","base_GHCziInt_I8zh_static_info","-u","base_GHCziInt_I16zh_static_info","-u","base_GHCziInt_I32zh_static_info","-u","base_GHCziInt_I64zh_static_info","-u","base_GHCziWord_W8zh_static_info","-u","base_GHCziWord_W16zh_static_info","-u","base_GHCziWord_W32zh_static_info","-u","base_GHCziWord_W64zh_static_info","-u","base_GHCziStable_StablePtr_static_info","-u","ghczmprim_GHCziTypes_Izh_con_info","-u","ghczmprim_GHCziTypes_Czh_con_info","-u","ghczmprim_GHCziTypes_Fzh_con_info","-u","ghczmprim_GHCziTypes_Dzh_con_info","-u","base_GHCziPtr_Ptr_con_info","-u","base_GHCziPtr_FunPtr_con_info","-u","base_GHCziStable_StablePtr_con_info","-u","ghczmprim_GHCziTypes_False_closure","-u","ghczmprim_GHCziTypes_True_closure","-u","base_GHCziPack_unpackCString_closure","-u","base_GHCziIOziException_stackOverflow_closure","-u","base_GHCziIOziException_heapOverflow_closure","-u","base_ControlziExceptionziBase_nonTermination_closure","-u","base_GHCziIOziException_blockedIndefinitelyOnMVar_closure","-u","base_GHCziIOziException_blockedIndefinitelyOnSTM_closure","-u","base_ControlziExceptionziBase_nestedAtomically_closure","-u","base_GHCziWeak_runFinalizzerBatch_closure","-u","base_GHCziTopHandler_runIO_closure","-u","base_GHCziTopHandler_runNonIO_closure","-u","base_GHCziConcziIO_ensureIOManagerIsRunning_closure","-u","base_GHCziConcziSync_runSparks_closure","-u","base_GHCziConcziSignal_runHandlers_closure","-lHSffi"],Nothing,RtsOptsAll,False,[],[])

  And GHC itself uses the readelf command to extract it when deciding whether to relink. The reason for the name ".debug-ghc-link-info" is that sections beginning with ".debug" are removed automatically by strip.

  This currently only works on Linux; Windows and OS X still have the old behaviour.
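The relink decision itself reduces to comparing the recorded link information with the information for the current command line. A simplified sketch: the function name `needs_relink` is hypothetical, both inputs are assumed to be already-extracted strings (GHC obtains the stored copy with readelf), and a missing record is treated as a reason to relink.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* 'stored' is the link info read back from the existing binary, or NULL
   when the binary or its .debug-ghc-link-info section is missing.
   'current' is the same information rendered for this invocation. */
bool needs_relink(const char *stored, const char *current)
{
    if (stored == NULL)
        return true;                      /* nothing recorded: relink to be safe */
    return strcmp(stored, current) != 0;  /* any change in the options forces a relink */
}
```

This check only adds a reason to relink; a compiler would presumably still relink when object files are newer than the executable, which the sketch leaves out.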
* CmmOpt cannot assume single assignment for hand-written or new codegen Cmm. | Edward Z. Yang | 2011-04-05 | 1 | -6/+14
  This change may constitute a substantial performance hit, because a new set is now created for every instruction we emit.

  Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* Give infinite fuel to required C-- transformations. Fixes #4971. | Edward Z. Yang | 2011-04-05 | 2 | -12/+29
  Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
* Ignore comments when inlining. | Edward Z. Yang | 2011-03-25 | 1 | -0/+1
* RednCounts can contain CAFs, so support them in cvtToClosureLbl. | Edward Z. Yang | 2011-03-22 | 1 | -0/+1
* New codegen: GC calling convention must use registers. | Edward Z. Yang | 2011-02-18 | 1 | -1/+3
  Previously, on register-deficient architectures like x86-32, the new code generator would emit code for calls to stg_gc_l1, stg_gc_d1 and stg_gc_f1 that pushed their single argument on to the stack, while the functions themselves expected the argument to live in L1, D1 and F1 (respectively). This was because cmmCall with the GC calling convention allocated real registers, not virtual registers. This patch modifies the code for assigning registers/stack slots to use the right calling convention for GC and adds an assertion to ensure it did it properly.
* constant fold (a + N) - M and (a - N) + M | Simon Marlow | 2011-02-10 | 1 | -2/+11
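The rewrite named in this commit title is a reassociation of constant offsets. A minimal sketch, assuming 64-bit words and a hypothetical helper `fold_offset` that only computes the folded constant (it does not model cmmMachOpFold's expression types). Machine-word arithmetic wraps, so the sketch computes in `uint64_t`, whose arithmetic is modular in C.

```c
#include <assert.h>
#include <stdint.h>

/* Folds (a op1 n) op2 m, where op1 and op2 are '+' or '-' and n, m are
   literals, into a + k and returns k:
     (a + n) - m  ==>  a + (n - m)
     (a - n) + m  ==>  a + (m - n)                                       */
uint64_t fold_offset(char op1, uint64_t n, char op2, uint64_t m)
{
    assert((op1 == '+' || op1 == '-') && (op2 == '+' || op2 == '-'));
    uint64_t k = (op1 == '+') ? n : (uint64_t)0 - n;   /* a op1 n == a + k */
    return (op2 == '+') ? k + m : k - m;
}
```

For example, `(Sp + 8) - 8` folds to `Sp + 0`, which a further rule can reduce to plain `Sp`.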
* Recursively call cmmMachOpFold on divides that we turned into shifts | Simon Marlow | 2011-02-08 | 1 | -3/+3
  There might be more simplification to do.
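A toy version of the idea, assuming unsigned division only (signed division by a power of two needs a rounding correction before the shift) and a hypothetical one-level expression type rather than GHC's CmmExpr. Once `x / 2^k` has been rewritten to `x >> k`, the result is handed back to the folder, so the rules for shifts get a chance to fire. Here that re-folding is what turns `x / 1` into plain `x`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool is_lit;      /* operand is a known constant */
    uint64_t value;   /* meaningful only when is_lit */
} Operand;

typedef enum { E_OPERAND, E_UDIV, E_SHR } Kind;

typedef struct {
    Kind kind;
    Operand x;        /* E_OPERAND: the whole expression; otherwise the left operand */
    uint64_t y;       /* E_UDIV: divisor; E_SHR: shift amount */
} Expr;

/* Returns k if n == 2^k, or -1 if n is not a power of two. */
int exact_log2(uint64_t n)
{
    if (n == 0 || (n & (n - 1)) != 0)
        return -1;
    int k = 0;
    while ((n >>= 1) != 0)
        k++;
    return k;
}

static Expr operand(Operand x)
{
    return (Expr){ E_OPERAND, x, 0 };
}

Expr fold(Expr e)
{
    switch (e.kind) {
    case E_UDIV: {
        if (e.y == 0)
            return e;                                /* leave division by zero alone */
        if (e.x.is_lit)
            return operand((Operand){ true, e.x.value / e.y });
        int k = exact_log2(e.y);
        if (k >= 0)                                  /* x / 2^k ==> x >> k, then fold again */
            return fold((Expr){ E_SHR, e.x, (uint64_t)k });
        return e;
    }
    case E_SHR:
        if (e.y >= 64)
            return e;                                /* oversized shift: leave alone */
        if (e.x.is_lit)
            return operand((Operand){ true, e.x.value >> e.y });
        if (e.y == 0)
            return operand(e.x);                     /* x >> 0 ==> x */
        return e;
    default:
        return e;
    }
}
```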
* fix DEBUG build | Simon Marlow | 2011-01-31 | 1 | -1/+3
* Fix warnings | Simon Marlow | 2011-01-28 | 4 | -5/+27
* Merge in new code generator branch. | Simon Marlow | 2011-01-24 | 52 | -9501/+4954
  This changes the new code generator to make use of the Hoopl package for dataflow analysis. Hoopl is a new boot package, and is maintained in a separate upstream git repository (as usual, GHC has its own lagging darcs mirror in http://darcs.haskell.org/packages/hoopl).

  During this merge I squashed recent history into one patch. I tried to rebase, but the history had some internal conflicts of its own which made rebase extremely confusing, so I gave up. The history I squashed was:

  - Update new codegen to work with latest Hoopl
  - Add some notes on new code gen to cmm-notes
  - Enable Hoopl lag package.
  - Add SPJ note to cmm-notes
  - Improve GC calls on new code generator.

  Work in this branch was done by:

  - Milan Straka <fox@ucw.cz>
  - John Dias <dias@cs.tufts.edu>
  - David Terei <davidterei@gmail.com>

  Edward Z. Yang <ezyang@mit.edu> merged in further changes from GHC HEAD and fixed a few bugs.
* Fix longstanding bug in C-- inlining for function calls. | Edward Z. Yang | 2011-01-13 | 1 | -1/+1
* Remove code that is dead now that we need >= 6.12 to build | Ian Lynagh | 2010-12-15 | 5 | -10/+0
* Implement stack chunks and separate TSO/STACK objects | Simon Marlow | 2010-12-15 | 1 | -2/+2
  This patch makes two changes to the way stacks are managed:

  1. The stack is now stored in a separate object from the TSO. This means that it is easier to replace the stack object for a thread when the stack overflows or underflows; we don't have to leave behind the old TSO as an indirection any more. Consequently, we can remove ThreadRelocated and deRefTSO(), which were a pain.

     This is obviously the right thing, but the last time I tried to do it, it made performance worse. This time I seem to have cracked it.

  2. Stacks are now represented as a chain of chunks, rather than a single monolithic object. The big advantage here is that individual chunks are marked clean or dirty according to whether they contain pointers to the young generation, and the GC can avoid traversing clean stack chunks during a young-generation collection. This means that programs with deep stacks will see a big saving in GC overhead when using the default GC settings.

     A secondary advantage is that there is much less copying involved as the stack grows. Programs that quickly grow a deep stack will see big improvements.

     In some ways the implementation is simpler, as nothing special needs to be done to reclaim stack as the stack shrinks (the GC just recovers the dead stack chunks). On the other hand, we have to manage stack underflow between chunks, so there's a new stack frame (UNDERFLOW_FRAME), and we now have separate TSO and STACK objects. The total amount of code is probably about the same as before.

  There are new RTS flags:

      -ki<size>  Sets the initial thread stack size (default 1k), e.g. -ki4k, -ki2m
      -kc<size>  Sets the stack chunk size (default 32k)
      -kb<size>  Sets the stack chunk buffer size (default 1k)

  -ki was previously called just -k, and the old name is still accepted for backwards compatibility. These new options are documented.
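A minimal sketch of why chunking helps a young-generation collection, under stated simplifications: the `Chunk` layout, `push_word` and `young_gc_scan` are hypothetical, a chunk holds abstract words rather than stack frames, and a chunk never becomes clean again (deciding that is the real GC's job). Growing the stack links a fresh chunk instead of copying, and the collector visits only dirty chunks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stack chunk. GHC's real STACK objects
   hold frames and link chunks through an UNDERFLOW_FRAME. */
typedef struct Chunk {
    struct Chunk *prev;   /* next-older chunk, NULL at the stack bottom */
    bool dirty;           /* may contain pointers into the young generation */
    size_t used;          /* words in use */
    size_t size;          /* capacity in words */
} Chunk;

/* Push one word. When the top chunk is full, a fresh chunk is linked on
   top instead of copying the whole stack into a bigger object. */
Chunk *push_word(Chunk *top, size_t chunk_size, bool points_to_young)
{
    assert(chunk_size > 0);
    if (top == NULL || top->used == top->size) {
        Chunk *c = malloc(sizeof *c);
        if (c == NULL)
            abort();
        *c = (Chunk){ .prev = top, .dirty = false, .used = 0, .size = chunk_size };
        top = c;
    }
    top->used++;
    if (points_to_young)
        top->dirty = true;    /* this chunk now needs scanning by a young GC */
    return top;
}

/* A young-generation GC only visits dirty chunks; clean ones are
   skipped. Returns the number of chunks that would be scanned. */
size_t young_gc_scan(const Chunk *top)
{
    size_t scanned = 0;
    for (const Chunk *c = top; c != NULL; c = c->prev)
        if (c->dirty)
            scanned++;        /* a real GC would scan c's frames here */
    return scanned;
}
```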
* remove -XNoMonomorphismRestriction | Simon Marlow | 2010-11-17 | 1 | -9/+1
  This was apparently needed at some point during the new typechecker development, but does not seem to be required now.
* add some {-# LANGUAGE BangPatterns #-} to mollify GHC | Simon Marlow | 2010-11-17 | 1 | -0/+1
* Remove unnecessary fromIntegral calls | simonpj@microsoft.com | 2010-11-16 | 1 | -1/+1
* More modules that need LANGUAGE BangPatterns | simonpj@microsoft.com | 2010-11-12 | 1 | -0/+1
* Add rebindable syntax for if-then-else | simonpj@microsoft.com | 2010-10-22 | 1 | -2/+2
  There are two main changes:

  - New LANGUAGE option RebindableSyntax, which implies NoImplicitPrelude
  - if-then-else becomes rebindable, with function name "ifThenElse" (but case expressions are unaffected)

  Thanks to Sam Anklesaria for doing most of the work here
* Interruptible FFI calls with pthread_kill and CancelSynchronousIO. v4 | Edward Z. Yang | 2010-09-19 | 8 | -15/+31
  This is a patch that adds support for interruptible FFI calls in the form of a new foreign import keyword 'interruptible', which can be used instead of 'safe' or 'unsafe'. Interruptible FFI calls act like safe FFI calls, except that the worker thread they run on may be interrupted.

  Internally, it replaces BlockedOnCCall_NoUnblockEx with BlockedOnCCall_Interruptible, and changes the behavior of the RTS to not modify the TSO_ flags in the event of an FFI call from a thread that was interruptible. It also modifies the bytecode format for foreign call, adding an extra Word16 to indicate interruptibility.

  The semantics of interruption vary from platform to platform, but the intent is that any blocking system calls are aborted with an error code. This is most useful for making function calls to system library functions that support interrupting. There is no support for pre-Vista Windows.

  There is a partner testsuite patch which adds several tests for this functionality.
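On POSIX systems, interrupting a blocked call comes down to sending a signal to the thread stuck in the system call. A self-contained sketch of that mechanism, assuming a POSIX platform with pthreads. It is not the RTS code, it does not cover CancelSynchronousIO on Windows, and the names `blocking_call` and `run_and_interrupt` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static int pipe_fds[2];
static atomic_int call_finished;
static int call_errno;

/* The handler does nothing; its only job is to make read() return. */
static void on_interrupt(int sig)
{
    (void)sig;
}

/* Stand-in for the foreign call: blocks in read() on an empty pipe. */
static void *blocking_call(void *arg)
{
    (void)arg;
    char buf[1];
    ssize_t n = read(pipe_fds[0], buf, sizeof buf);
    call_errno = (n < 0) ? errno : 0;
    atomic_store(&call_finished, 1);
    return NULL;
}

/* Runs blocking_call on a worker thread, then interrupts it with
   pthread_kill. Returns the errno the blocked call saw (EINTR when the
   interruption worked), or -1 if setup failed. */
int run_and_interrupt(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_interrupt;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;   /* no SA_RESTART, so read() fails with EINTR */
    if (sigaction(SIGUSR1, &sa, NULL) != 0 || pipe(pipe_fds) != 0)
        return -1;

    atomic_store(&call_finished, 0);
    pthread_t worker;
    if (pthread_create(&worker, NULL, blocking_call, NULL) != 0) {
        close(pipe_fds[0]);
        close(pipe_fds[1]);
        return -1;
    }

    /* The signal can arrive before read() has started blocking, so keep
       sending it until the worker reports that the call has returned. */
    struct timespec delay = { 0, 1000000 };   /* 1 ms */
    while (!atomic_load(&call_finished)) {
        pthread_kill(worker, SIGUSR1);
        nanosleep(&delay, NULL);
    }
    pthread_join(worker, NULL);
    close(pipe_fds[0]);
    close(pipe_fds[1]);
    return call_errno;
}
```

Installing the handler without `SA_RESTART` is the choice that matters: with `SA_RESTART`, the kernel would transparently restart `read()` and the call would stay blocked.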
* Fix build following haskell98 and -fglasgow-exts changes | Ian Lynagh | 2010-10-06 | 1 | -3/+2
* Some refactoring and simplification in TcInteract.occurCheck | simonpj@microsoft.com | 2010-10-07 | 1 | -1/+1
* Remove (most of) the FiniteMap wrapper | Ian Lynagh | 2010-09-14 | 6 | -91/+102
  We still have insertList, insertListWith, deleteList which aren't in Data.Map, and foldRightWithKey which works around the fold(r)WithKey addition and deprecation.
* Fix build with 6.10 | Ian Lynagh | 2010-09-13 | 5 | -0/+10
* Super-monster patch implementing the new typechecker -- at last | simonpj@microsoft.com | 2010-09-13 | 8 | -9/+53
  This major patch implements the new OutsideIn constraint solving algorithm in the typechecker, following our JFP paper "Modular type inference with local assumptions".

  Done with major help from Dimitrios Vytiniotis and Brent Yorgey.
* Work around missing type signature in Happy | simonpj@microsoft.com | 2010-07-30 | 1 | -1/+9
  Happy generates

      notHappyAtAll = error "Blah"

  without a type signature, and currently the new typechecker doesn't generalise it. This patch says "no monomorphism restriction" which makes it generalise again.

  Better would be to add a type sig to Happy's template
* Add two local type signatures | simonpj@microsoft.com | 2010-07-29 | 2 | -12/+18
* Don't restrict filenames in line pragmas to printable characters; fixes #4207 | Ian Lynagh | 2010-08-05 | 1 | -1/+1
  "printable" is ASCII-only, whereas in other locales we can get things like

      # 1 "<línea-de-orden>"
* Remove an unnecessary #include | Ian Lynagh | 2010-07-15 | 1 | -2/+0
* typo in comment | Simon Marlow | 2010-06-16 | 1 | -1/+1
* Make mkPState and pragState take their arguments in the same order | Ian Lynagh | 2010-07-06 | 1 | -1/+1
* Add new LLVM code generator to GHC. (Version 2) | David Terei | 2010-06-15 | 2 | -1/+2
  This was done as part of an honours thesis at UNSW; the paper describing the work and results can be found at:

      http://www.cse.unsw.edu.au/~pls/thesis/davidt-thesis.pdf

  A homepage for the backend can be found at:

      http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/Backends/LLVM

  Quick summary of performance: for the 'nofib' benchmark suite, runtimes are at most 5% slower than with the NCG and generally better than with the C code generator. For some code, though, such as the DPH project benchmarks, the LLVM code generator outperforms the NCG and the C code generator, reducing run times by about 25%.
* Add missing constant folding and optimisation for unsigned division | Simon Marlow | 2010-04-22 | 1 | -0/+5
  Noticed by Denys Rtveliashvili <rtvd@mac.com>, see #4004
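Unsigned division deserves its own rules because the cheap power-of-two rewrites are only exact for unsigned operands. A sketch of the two standard identities, assuming 64-bit words. The helper names are hypothetical, and this does not describe which rules the commit actually added.

```c
#include <assert.h>
#include <stdint.h>

/* For unsigned operands, division and remainder by 2^k reduce to a
   shift and a mask. Both helpers are illustrative, not GHC code. */
uint64_t udiv_pow2(uint64_t x, unsigned k)
{
    assert(k < 64);
    return x >> k;                            /* x / 2^k */
}

uint64_t urem_pow2(uint64_t x, unsigned k)
{
    assert(k < 64);
    return x & ((UINT64_C(1) << k) - 1);      /* x % 2^k */
}
```

The same mask is wrong for signed operands: in C, `-7 % 4` is `-3`, while `-7 & 3` is `1`.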
* New implementation of BLACKHOLEs | Simon Marlow | 2010-03-29 | 1 | -0/+2
  This replaces the global blackhole_queue with a clever scheme that enables us to queue up blocked threads on the closure that they are blocked on, while still avoiding atomic instructions in the common case.

  Advantages:

  - gets rid of a locked global data structure and some tricky GC code (replacing it with some per-thread data structures and different tricky GC code :)
  - wakeups are more prompt: parallel/concurrent performance should benefit. I haven't seen anything dramatic in the parallel benchmarks so far, but a couple of threading benchmarks do improve a bit.
  - waking up a thread blocked on a blackhole is now O(1) (e.g. if it is the target of throwTo).
  - less sharing and better separation of Capabilities: communication is done with messages, the data structures are strictly owned by a Capability and cannot be modified except by sending messages.
  - this change will ultimately enable us to do more intelligent scheduling when threads block on each other. This is what started off the whole thing, but it isn't done yet (#3838).

  I'll be documenting all this on the wiki in due course.
* Comments and type signatures only | simonpj@microsoft.com | 2010-03-09 | 1 | -0/+23
* add a note | Simon Marlow | 2010-03-09 | 1 | -0/+3
* Beef up cmmMiniInline a tiny bit | Simon Marlow | 2010-02-16 | 2 | -20/+32
  Allow a temporary assignment to be pushed past an assignment to a global if the global is not mentioned in the rhs of the assignment we are inlining.

  This fixes up some bad code. We should make sure we're doing something equivalent in the new backend in due course.
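The new side condition is a set-membership test. A sketch, assuming a hypothetical bitmask encoding of which global registers an expression mentions (cmmMiniInline itself works on CmmExpr values): `t = rhs` may move past `G = e` only when `G` does not appear in `rhs`, because otherwise the moved `rhs` would read the new value of `G`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: bit i set means global register i is mentioned. */
typedef uint32_t GlobalSet;

enum { G_SP = 0, G_HP = 1 };   /* illustrative register numbering */

/* May the temporary assignment  t = rhs  be moved past the assignment
   global = e ?  Only if rhs does not read that global. */
bool can_push_past(GlobalSet rhs_mentions, unsigned assigned_global)
{
    assert(assigned_global < 32);
    return (rhs_mentions & (UINT32_C(1) << assigned_global)) == 0;
}
```

For example, `_t = Sp + 8` may move past `Hp = Hp + 16`, but `_t = Hp + 8` may not.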
* Patch for shared libraries support on FreeBSD | Ian Lynagh | 2010-01-06 | 1 | -3/+3
  From Maxime Henrion <mhenrion@gmail.com>
* Assume CmmLabels have dynamic linkage on non-Windows | Ben.Lippmeier.anu.edu.au | 2010-01-02 | 1 | -3/+5
* When compiling via C, don't need to emit prototypes for symbols in the RTS | Ben.Lippmeier@anu.edu.au | 2010-01-02 | 1 | -1/+9
* Tag ForeignCalls with the package they correspond to | Ben.Lippmeier@anu.edu.au | 2010-01-02 | 4 | -35/+141
* Typo in comment | Ben.Lippmeier@anu.edu.au | 2009-12-29 | 1 | -1/+1
* Fix #3741, simplifying things in the process | Simon Marlow | 2009-12-10 | 1 | -2/+2
  The problem in #3741 was that we had confused column numbers with byte offsets, which fails in the case of UTF-8 (amongst other things). Fortunately we're tracking correct column offsets now, so we didn't have to make a calculation based on a byte offset. I got rid of two fields from the PState (last_line_len and last_offs) and one field from the AI (alex input) constructor.
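This class of bug is easy to reproduce: in UTF-8 one character can span several bytes, so a byte offset overstates the column. A sketch of counting columns instead, assuming well-formed UTF-8 and 1-based columns. `utf8_column` is a hypothetical helper, and tabs and wide characters are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the 1-based column of the character starting at byte offset
   'off' within one line of UTF-8 text. Continuation bytes (10xxxxxx) do
   not start a new character, so they are not counted. */
size_t utf8_column(const char *line, size_t off)
{
    size_t col = 1;
    for (size_t i = 0; i < off; i++) {
        unsigned char b = (unsigned char)line[i];
        if ((b & 0xC0) != 0x80)   /* ASCII or lead byte: one more character */
            col++;
    }
    return col;
}
```

In "línea", the 'n' starts at byte offset 3 but sits in column 3, whereas a byte-based calculation would report column 4.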
* Add a new to-do to cmm-notes | simonpj@microsoft.com | 2009-12-07 | 1 | -0/+4
* Columns now start at 1, as lines already did | Ian Lynagh | 2009-11-27 | 2 | -2/+2
  Also corrected a couple of line 0's to line 1
* Comments only | simonpj@microsoft.com | 2009-11-12 | 2 | -14/+35
* Morguing dead code | dias@cs.tufts.edu | 2009-09-18 | 1 | -85/+8
* More sensible use of -fnew-codegen and less debugging output | dias@cs.tufts.edu | 2009-09-18 | 4 | -19/+19