delta/haskell.git - gitlab.haskell.org: ghc/ghc.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	Constants: add a note and fix minor doc glitches	Sylvain Henry	2021-04-10	1	-2/+2
\|
*	rts: Initialize card table in newArray#	Ben Gamari	2021-01-17	1	-13/+20
\| \| \| \| \| \| \| \| \| \|	Previously we would leave the card table of new arrays uninitialized. This wasn't a soundness issue: at worst we would end up doing unnecessary scavenging during GC, after which the card table would be reset. That being said, it seems worth initializing this properly to avoid both unnecessary work and non-determinism. Fixes #19143.
*	rts: Zero shrunk array slop in vanilla RTS	Ben Gamari	2021-01-07	1	-1/+5
\| \| \| \| \| \|	But only when profiling or DEBUG are enabled. Fixes #17572.
*	rts: Fix integer width in TICK_BUMP_BY	Ben Gamari	2020-10-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously `TICK_BUMP_BY` was defined as ```c #define TICK_BUMP_BY(ctr,n) CLong[ctr] = CLong[ctr] + n ``` Yet the tickers themselves were defined as `StgInt`s. This happened to work out correctly on Linux, where `CLong` is 64-bits. However, it failed on Windows, where `CLong` is 32-bits, resulting in #18782. Fixes #18783.
*	Define TICKY_TICKY when compiling cmm RTS files.	David Himmelstrup	2020-09-11	1	-1/+5
\|
*	Cmm: introduce SAVE_REGS/RESTORE_REGS	Sylvain Henry	2020-06-23	1	-69/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	We don't want to save both Fn and Dn register sets on x86-64 as they are aliased to the same arch register (XMMn). Moreover, when SAVE_STGREGS was used in conjunction with `jump foo []` which makes a set of Cmm registers alive so that they cover all arch registers used to pass parameter, we could have Fn, Dn and XMMn alive at the same time. It made the LLVM code generator choke (see #17920). Now `SAVE_REGS/RESTORE_REGS` and `jump foo []` use the same set of registers.
*	Cleanup OVERWRITING_CLOSURE logic	Daniel Gröber	2020-06-01	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \|	The code is just more confusing than it needs to be. We don't need to mix the threaded check with the ldv profiling check since ldv's init already checks for this. Hence they can be two separate checks. Taking the sanity checking into account is also cleaner via DebugFlags.sanity. No need for checking the DEBUG define. The ZERO_SLOP_FOR_LDV_PROF and ZERO_SLOP_FOR_SANITY_CHECK definitions the old code had also make things a lot more opaque IMO so I removed those.
*	Module hierarchy: Cmm (cf #13009)	Sylvain Henry	2020-01-25	1	-2/+2
\|
*	Nonmoving: Ensure write barrier vanishes in non-threaded RTS	Ben Gamari	2019-10-21	1	-4/+8
\|
*	rts: Implement concurrent collection in the nonmoving collector	Ben Gamari	2019-10-20	1	-0/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This extends the non-moving collector to allow concurrent collection. The full design of the collector implemented here is described in detail in a technical note B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell Compiler" (2018) This extension involves the introduction of a capability-local remembered set, known as the /update remembered set/, which tracks objects which may no longer be visible to the collector due to mutation. To maintain this remembered set we introduce a write barrier on mutations which is enabled while a concurrent mark is underway. The update remembered set representation is similar to that of the nonmoving mark queue, being a chunked array of `MarkEntry`s. Each `Capability` maintains a single accumulator chunk, which it flushed when it (a) is filled, or (b) when the nonmoving collector enters its post-mark synchronization phase. While the write barrier touches a significant amount of code it is conceptually straightforward: the mutator must ensure that the referee of any pointer it overwrites is added to the update remembered set. However, there are a few details: * In the case of objects with a dirty flag (e.g. `MVar`s) we can exploit the fact that only the first mutation requires a write barrier. * Weak references, as usual, complicate things. In particular, we must ensure that the referee of a weak object is marked if dereferenced by the mutator. For this we (unfortunately) must introduce a read barrier, as described in Note [Concurrent read barrier on deRefWeak#] (in `NonMovingMark.c`). * Stable names are also a bit tricky as described in Note [Sweeping stable names in the concurrent collector] (`NonMovingSweep.c`). We take quite some pains to ensure that the high thread count often seen in parallel Haskell applications doesn't affect pause times. To this end we allow thread stacks to be marked either by the thread itself (when it is executed or stack-underflows) or the concurrent mark thread (if the thread owning the stack is never scheduled). There is a non-trivial handshake to ensure that this happens without racing which is described in Note [StgStack dirtiness flags and concurrent marking]. Co-Authored-by: Ömer Sinan Ağacan <omer@well-typed.com>
*	Deduplicate `HaskellMachRegs.h` and `RtsMachRegs.h` headers	John Ericson	2019-09-17	1	-1/+1
\| \| \| \| \| \| \|	Until 0472f0f6a92395d478e9644c0dbd12948518099f there was a meaningful host vs target distinction (though it wasn't used right, in genapply). After that, they did not differ in meaningful ways, so it's best to just only keep one.
*	Correct closure observation, construction, and mutation on weak memory machines.	Travis Whitaker	2019-06-28	1	-1/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Here the following changes are introduced: - A read barrier machine op is added to Cmm. - The order in which a closure's fields are read and written is changed. - Memory barriers are added to RTS code to ensure correctness on out-or-order machines with weak memory ordering. Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this is lowered to an instruction that ensures memory reads that occur after said instruction in program order are not performed before reads coming before said instruction in program order. On machines with strong memory ordering properties (e.g. X86, SPARC in TSO mode) no such instruction is necessary, so MO_ReadBarrier is simply erased. However, such an instruction is necessary on weakly ordered machines, e.g. ARM and PowerPC. Weam memory ordering has consequences for how closures are observed and mutated. For example, consider a closure that needs to be updated to an indirection. In order for the indirection to be safe for concurrent observers to enter, said observers must read the indirection's info table before they read the indirectee. Furthermore, the entering observer makes assumptions about the closure based on its info table contents, e.g. an INFO_TYPE of IND imples the closure has an indirectee pointer that is safe to follow. When a closure is updated with an indirection, both its info table and its indirectee must be written. With weak memory ordering, these two writes can be arbitrarily reordered, and perhaps even interleaved with other threads' reads and writes (in the absence of memory barrier instructions). Consider this example of a bad reordering: - An updater writes to a closure's info table (INFO_TYPE is now IND). - A concurrent observer branches upon reading the closure's INFO_TYPE as IND. - A concurrent observer reads the closure's indirectee and enters it. (!!!) - An updater writes the closure's indirectee. Here the update to the indirectee comes too late and the concurrent observer has jumped off into the abyss. Speculative execution can also cause us issues, consider: - An observer is about to case on a value in closure's info table. - The observer speculatively reads one or more of closure's fields. - An updater writes to closure's info table. - The observer takes a branch based on the new info table value, but with the old closure fields! - The updater writes to the closure's other fields, but its too late. Because of these effects, reads and writes to a closure's info table must be ordered carefully with respect to reads and writes to the closure's other fields, and memory barriers must be placed to ensure that reads and writes occur in program order. Specifically, updates to a closure must follow the following pattern: - Update the closure's (non-info table) fields. - Write barrier. - Update the closure's info table. Observing a closure's fields must follow the following pattern: - Read the closure's info pointer. - Read barrier. - Read the closure's (non-info table) fields. This patch updates RTS code to obey this pattern. This should fix long-standing SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting out-of-order execution) and PowerPC. This fixes issue #15449. Co-Authored-By: Ben Gamari <ben@well-typed.com>
*	stg_floatToWord32zh: zero-extend the Word32 (#16617)	Kevin Buhr	2019-05-08	1	-1/+6
\| \| \| \| \| \| \|	The primop stgFloatToWord32 was sign-extending the 32-bit word, resulting in weird negative Word32s. Zero-extend them instead. Closes #16617.
*	Update Wiki URLs to point to GitLab	Takenobu Tani	2019-03-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This moves all URL references to Trac Wiki to their corresponding GitLab counterparts. This substitution is classified as follows: 1. Automated substitution using sed with Ben's mapping rule [1] Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy... New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy... 2. Manual substitution for URLs containing `#` index Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...#Zzz New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...#zzz 3. Manual substitution for strings starting with `Commentary` Old: Commentary/XxxYyy... New: commentary/xxx-yyy... See also !539 [1]: https://gitlab.haskell.org/bgamari/gitlab-migration/blob/master/wiki-mapping.json
*	Fix slop zeroing for AP_STACK eager blackholes in debug build	Ömer Sinan Ağacan	2018-09-21	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As #15571 reports, eager blackholing breaks sanity checks as we can't zero the payload when eagerly blackholing (because we'll be using the payload after blackholing), but by the time we blackhole a previously eagerly blackholed object (in `threadPaused()`) we don't have the correct size information for the object (because the object's type becomes BLACKHOLE when we eagerly blackhole it) so can't properly zero the slop. This problem can be solved for AP_STACK eager blackholing (which unlike eager blackholing in general, is not optional) by zeroing the payload after entering the stack. This patch implements this idea. Fixes #15571. Test Plan: Previously concprog001 when compiled and run with sanity checks ghc-stage2 Mult.hs -debug -rtsopts ./Mult +RTS -DS was failing with Mult: internal error: checkClosure: stack frame (GHC version 8.7.20180821 for x86_64_unknown_linux) Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug thic patch fixes this panic. The test still panics, but it runs for a while before panicking (instead of directly panicking as before), and the new problem seems unrelated: Mult: internal error: ASSERTION FAILED: file rts/sm/Sanity.c, line 296 (GHC version 8.7.20180919 for x86_64_unknown_linux) Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug The new problem will be fixed in another diff. I also tried slow validate (which requires D5164): this does not introduce any new failures. Reviewers: simonmar, bgamari, erikd Reviewed By: simonmar Subscribers: rwbarton, carter GHC Trac Issues: #15571 Differential Revision: https://phabricator.haskell.org/D5165
*	Use __FILE__ for Cmm assertion locations, fix #8619	Ömer Sinan Ağacan	2018-06-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It seems like we currently support string literals in Cmm, so we can use __LINE__ CPP macro in assertion macros. This improves error messages that previously looked like ASSERTION FAILED: file (null), line 1302 (null) part now shows the actual file name. Also inline some single-use string literals in PrimOps.cmm. Reviewers: bgamari, simonmar, erikd Reviewed By: bgamari Subscribers: rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4862
*	rts: Rip out support for STM invariants	Ben Gamari	2018-06-02	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This feature has some very serious correctness issues (#14310), introduces a great deal of complexity, and hasn't seen wide usage. Consequently we are removing it, as proposed in Proposal #77 [1]. This is heavily based on a patch from fryguybob. Updates stm submodule. [1] https://github.com/ghc-proposals/ghc-proposals/pull/77 Test Plan: Validate Reviewers: erikd, simonmar, hvr Reviewed By: simonmar Subscribers: rwbarton, thomie, carter GHC Trac Issues: #14310 Differential Revision: https://phabricator.haskell.org/D4760
*	Improve accuracy of get/setAllocationCounter	Ben Gamari	2018-03-19	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: get/setAllocationCounter didn't take into account allocations in the current block. This was known at the time, but it turns out to be important to have more accuracy when using these in a fine-grained way. Test Plan: New unit test to test incrementally larger allocaitons. Before I got results like this: ``` +0 +0 +0 +0 +0 +4096 +0 +0 +0 +0 +0 +4064 +0 +0 +4088 +4056 +0 +0 +0 +4088 +4096 +4056 +4096 ``` Notice how the results aren't always monotonically increasing. After this patch: ``` +344 +416 +488 +560 +632 +704 +776 +848 +920 +992 +1064 +1136 +1208 +1280 +1352 +1424 +1496 +1568 +1640 +1712 +1784 +1856 +1928 +2000 +2072 +2144 ``` Reviewers: hvr, erikd, simonmar, jrtc27, trommler Reviewed By: simonmar Subscribers: trommler, jrtc27, rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4363
*	Prefer #if defined to #ifdef	Ben Gamari	2017-04-28	1	-9/+9
\| \| \| \|	Our new CPP linter enforces this.
*	Use memcpy in cloneArray	Ben Gamari	2017-04-28	1	-16/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While looking at #13615 I noticed that there was this strange open-coded memcpy in the definition of the cloneArray macro. I don't see why this should be preferable to memcpy. Test Plan: Validate, particularly focusing on array operations Reviewers: simonmar, tibbe, austin, alexbiehl Reviewed By: tibbe, alexbiehl Subscribers: alexbiehl, rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3504
*	cpp: Use #pragma once instead of #ifndef guards	Ben Gamari	2017-04-23	1	-4/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This both says what we mean and silences a bunch of spurious CPP linting warnings. This pragma is supported by all CPP implementations which we support. Reviewers: austin, erikd, simonmar, hvr Reviewed By: simonmar Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3482
*	Fix cost-centre-stacks bug (#5654)	Simon Marlow	2016-12-15	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes some cases of wrong stacks being generated by the profiler. For background and details on the fix see `Note [Evaluating functions with profiling]` in `rts/Apply.cmm`. This does have an impact on allocations for some programs when profiling. nofib results: ``` k-nucleotide +0.0% +8.8% +11.0% +11.0% 0.0% puzzle +0.0% +12.5% 0.244 0.246 0.0% typecheck 0.0% +8.7% +16.1% +16.2% 0.0% ------------------------------------------------------------------------ -------- Min -0.0% -0.0% -34.4% -35.5% -25.0% Max +0.0% +12.5% +48.9% +49.4% +10.6% Geometric Mean +0.0% +0.6% +2.0% +1.8% -0.3% ``` But runtimes don't seem to be affected much, and the examples I looked at were completely legitimate. For example, in puzzle we have this: ``` position :: ItemType -> StateType -> BankType position Bono = bonoPos position Edge = edgePos position Larry = larryPos position Adam = adamPos ``` where the identifiers on the rhs are all record selectors. Previously the profiler gave a stack that looked like ``` position bonoPos ... ``` i.e. `bonoPos` was at the same level of the call stack as `position`, but now it looks like ``` position bonoPos ... ``` I used the normaliser from the testsuite to diff the profiling output from other nofib programs and they all looked better. Test Plan: * the broken test passes * validate * compiled and ran all of nofib, measured perf, diff'd several .prof files Reviewers: niteria, erikd, austin, scpmw, bgamari Reviewed By: bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2804 GHC Trac Issues: #5654, #10007
*	Use C99's bool	Ben Gamari	2016-11-29	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \|	Test Plan: Validate on lots of platforms Reviewers: erikd, simonmar, austin Reviewed By: erikd, simonmar Subscribers: michalt, thomie Differential Revision: https://phabricator.haskell.org/D2699
*	Use MO_Cmpxchg in Primops.cmm instead of ccall cas(..)	alexbiehl	2016-08-01	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Adjust `CmmParse.y` to parse the `cmpxchg{8, 16, 32, 64}` instructions and use the 32 respectively the 64 bit variant in `Primops.cmm`. This effectively eliminates the compare-and-swap ccall to the rts. Based off the mailing list question from @osa1 (https://mail.haskell.org/pipermail/ghc-devs/2016-July/012506.html). Reviewers: simonmar, austin, erikd, bgamari, trommler Reviewed By: erikd, bgamari, trommler Subscribers: carter, trommler, osa1, thomie Differential Revision: https://phabricator.haskell.org/D2431
*	NUMA support	Simon Marlow	2016-06-10	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The aim here is to reduce the number of remote memory accesses on systems with a NUMA memory architecture, typically multi-socket servers. Linux provides a NUMA API for doing two things: * Allocating memory local to a particular node * Binding a thread to a particular node When given the +RTS --numa flag, the runtime will * Determine the number of NUMA nodes (N) by querying the OS * Assign capabilities to nodes, so cap C is on node C%N * Bind worker threads on a capability to the correct node * Keep a separate free lists in the block layer for each node * Allocate the nursery for a capability from node-local memory * Allocate blocks in the GC from node-local memory For example, using nofib/parallel/queens on a 24-core 2-socket machine: ``` $ ./Main 15 +RTS -N24 -s -A64m Total time 173.960s ( 7.467s elapsed) $ ./Main 15 +RTS -N24 -s -A64m --numa Total time 150.836s ( 6.423s elapsed) ``` The biggest win here is expected to be allocating from node-local memory, so that means programs using a large -A value (as here). According to perf, on this program the number of remote memory accesses were reduced by more than 50% by using `--numa`. Test Plan: * validate * There's a new flag --debug-numa=<n> that pretends to do NUMA without actually making the OS calls, which is useful for testing the code on non-NUMA systems. * TODO: I need to add some unit tests Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2199
*	Remove unused IND_PERM	Joachim Breitner	2016-01-23	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	it seems that this closure type has not been in use since 5d52d9, so all this is dead and untested code. This removes it. Some of the code might be useful for a counting indirection as described in #10613, so when implementing that, have a look at what this commit removes. Test Plan: validate on harbormaster Reviewers: austin, bgamari, simonmar Reviewed By: simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1821
*	s/StgArrWords/StgArrBytes/	Siddhanathan Shanmugam	2015-09-11	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	Rename StgArrWords to StgArrBytes (see Trac #8552) Reviewed By: austin Differential Revision: https://phabricator.haskell.org/D1233 GHC Trac Issues: #8552
*	Encode alignment in MO_Memcpy and friends	Ben Gamari	2015-06-16	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Alignment needs to be a compile-time constant. Previously the code generators had to jump through hoops to ensure this was the case as the alignment was passed as a CmmExpr in the arguments list. Now we take care of this up front. This fixes #8131. Authored-by: Reid Barton <rwbarton@gmail.com> Dusted-off-by: Ben Gamari <ben@smart-cactus.org> Tests for T8131 Test Plan: Validate Reviewers: rwbarton, austin Reviewed By: rwbarton, austin Subscribers: bgamari, carter, thomie Differential Revision: https://phabricator.haskell.org/D624 GHC Trac Issues: #8131
*	Revert "Rename _closure to _static_closure, apply naming consistently."	Edward Z. Yang	2014-10-20	1	-1/+0
\| \| \| \| \| \| \|	This reverts commit 35672072b4091d6f0031417bc160c568f22d0469. Conflicts: compiler/main/DriverPipeline.hs
*	Rename _closure to _static_closure, apply naming consistently.	Edward Z. Yang	2014-10-01	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: In preparation for indirecting all references to closures, we rename _closure to _static_closure to ensure any old code will get an undefined symbol error. In order to reference a closure foobar_closure (which is now undefined), you should instead use STATIC_CLOSURE(foobar). For convenience, a number of these old identifiers are macro'd. Across C-- and C (Windows and otherwise), there were differing conventions on whether or not foobar_closure or &foobar_closure was the address of the closure. Now, all foobar_closure references are addresses, and no & is necessary. CHARLIKE/INTLIKE were not changed, simply alpha-renamed. Part of remove HEAP_ALLOCED patch set (#8199) Depends on D265 Signed-off-by: Edward Z. Yang <ezyang@mit.edu> Test Plan: validate Reviewers: simonmar, austin Subscribers: simonmar, ezyang, carter, thomie Differential Revision: https://phabricator.haskell.org/D267 GHC Trac Issues: #8199
*	[ci skip] includes: detabify/dewhitespace Cmm.h	Austin Seipp	2014-08-20	1	-117/+117
\| \| \| \|	Signed-off-by: Austin Seipp <austin@well-typed.com>
*	Implement {resize,shrink}MutableByteArray# primops	Herbert Valerio Riedel	2014-08-16	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The two new primops with the type-signatures resizeMutableByteArray# :: MutableByteArray# s -> Int# -> State# s -> (# State# s, MutableByteArray# s #) shrinkMutableByteArray# :: MutableByteArray# s -> Int# -> State# s -> State# s allow to resize MutableByteArray#s in-place (when possible), and are useful for algorithms where memory is temporarily over-allocated. The motivating use-case is for implementing integer backends, where the final target size of the result is either N or N+1, and only known after the operation has been performed. A future commit will implement a stateful variant of the `sizeofMutableByteArray#` operation (see #9447 for details), since now the size of a `MutableByteArray#` may change over its lifetime (i.e before it gets frozen or GCed). Test Plan: ./validate --slow Reviewers: ezyang, austin, simonmar Reviewed By: austin, simonmar Differential Revision: https://phabricator.haskell.org/D133
*	Add SmallArray# and SmallMutableArray# types	Johan Tibell	2014-03-29	1	-0/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	These array types are smaller than Array# and MutableArray# and are faster when the array size is small, as they don't have the overhead of a card table. Having no card table reduces the closure size with 2 words in the typical small array case and leads to less work when updating or GC:ing the array. Reduces both the runtime and memory allocation by 8.8% on my insert benchmark for the HashMap type in the unordered-containers package, which makes use of lots of small arrays. With tuned GC settings (i.e. `+RTS -A6M`) the runtime reduction is 15%. Fixes #8923.
*	Make copy array ops out-of-line by default	Johan Tibell	2014-03-28	1	-0/+53
\| \| \| \| \| \|	This should reduce code size when there's little to gain from inlining these primops, while still retaining the inlining benefit when the size of the copy is known statically.
*	codeGen: inline allocation optimization for clone array primops	Johan Tibell	2014-03-22	1	-0/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The inline allocation version is 69% faster than the out-of-line version, when cloning an array of 16 unit elements on a 64-bit machine. Comparing the new and the old primop implementations isn't straightforward. The old version had a missing heap check that I discovered during the development of the new version. Comparing the old and the new version would requiring fixing the old version, which in turn means reimplementing the equivalent of MAYBE_CG in StgCmmPrim. The inline allocation threshold is configurable via -fmax-inline-alloc-size which gives the maximum array size, in bytes, to allocate inline. The size does not include the closure header size. Allowing the same primop to be either inline or out-of-line has some implication for how we lay out heap checks. We always place a heap check around out-of-line primops, as they may allocate outside of our knowledge. However, for the inline primops we only allow allocation via the standard means (i.e. virtHp). Since the clone primops might be either inline or out-of-line the heap check layout code now consults shouldInlinePrimOp to know whether a primop will be inlined.
*	Remove use of R9, and fix associated bugs	Simon Marlow	2013-10-01	1	-13/+28
\| \| \| \| \| \| \| \| \| \|	We were passing the function address to stg_gc_prim_p in R9, which was wrong because the call was a high-level call and didn't declare R9 as a parameter. Passing R9 as an argument is the right way, but unfortunately that exposed another bug: we were using the same macro in some low-level Cmm, where it is illegal to call functions with arguments (see Note [syntax of cmm files]). So we now have low-level variants of STK_CHK() and STK_CHK_P() for use in low-level Cmm code.
*	Add support for 512-bit-wide vectors.	Geoffrey Mainland	2013-09-22	1	-0/+1
\|
*	Add support for 256-bit-wide vectors.	Geoffrey Mainland	2013-09-22	1	-0/+1
\|
*	added ticky counters for heap and stack checks	Nicolas Frisby	2013-04-11	1	-0/+2
\|
*	ticky enhancements	Nicolas Frisby	2013-03-29	1	-17/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* the new StgCmmArgRep module breaks a dependency cycle; I also untabified it, but made no real changes * updated the documentation in the wiki and change the user guide to point there * moved the allocation enters for ticky and CCS to after the heap check * I left LDV where it was, which was before the heap check at least once, since I have no idea what it is * standardized all (active?) ticky alloc totals to bytes * in order to avoid double counting StgCmmLayout.adjustHpBackwards no longer bumps ALLOC_HEAP_ctr * I resurrected the SLOW_CALL counters * the new module StgCmmArgRep breaks cyclic dependency between Layout and Ticky (which the SLOW_CALL counters cause) * renamed them SLOW_CALL_fast_<pattern> and VERY_SLOW_CALL * added ALLOC_RTS_ctr and _tot ticky counters * eg allocation by Storage.c:allocate or a BUILD_PAP in stg_ap__info resurrected ticky counters for ALLOC_THK, ALLOC_PAP, and ALLOC_PRIM * added -ticky and -DTICKY_TICKY in ways.mk for debug ways * added a ticky counter for total LNE entries * new flags for ticky: -ticky-allocd -ticky-dyn-thunk -ticky-LNE * all off by default * -ticky-allocd: tracks allocation of closure in addition to allocation by that closure * -ticky-dyn-thunk tracks dynamic thunks as if they were functions * -ticky-LNE tracks LNEs as if they were functions * updated the ticky report format, including making the argument categories (more?) accurate again * the printed name for things in the report include the unique of their ticky parent as well as if they are not top-level
*	Only emit %write_barrier primitive for THREADED_RTS	Gabor Greif	2013-02-26	1	-1/+7
\|
*	Always pass vector values on the stack.	Geoffrey Mainland	2013-02-01	1	-3/+4
\| \| \| \| \|	Vector values are now always passed on the stack. This isn't particularly efficient, but it will have to do for now.
*	STM: Only wake up once	Ben Gamari	2013-01-30	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	Previously, threads blocked on an STM retry would be sent a wakeup message each time an unpark was requested. This could result in the accumulation of a large number of wake-up messages, which would slow wake-up once the sleeping thread is finally scheduled. Here, we introduce a new closure type, STM_AWOKEN, which marks a TSO which has been sent a wake-up message, allowing us to send only one wakeup.
*	Fix C macro bug that was causing some stack checks to erroneously succeed	Simon Marlow	2012-10-31	1	-8/+8
\|
*	Draw STG F and D registers from the same pool of available SSE registers on ↵	Geoffrey Mainland	2012-10-30	1	-2/+14
\| \| \| \| \| \| \| \| \| \| \| \|	x86-64. On x86-64 F and D registers are both drawn from SSE registers, so there is no reason not to draw them from the same pool of available SSE registers. This means that whereas previously a function could only receive two Double arguments in registers even if it did not have any Float arguments, now it can receive up to 6 arguments that are any mix of Float and Double in registers. This patch breaks the LLVM back end. The next patch will fix this breakage.
*	Properly mark C-- calls to _assertFail as "never returns".	Geoffrey Mainland	2012-10-30	1	-1/+1
\|
*	Use canned heap checks to save a few bytes of code	Simon Marlow	2012-10-23	1	-0/+3
\|
*	profiling fixes	Simon Marlow	2012-10-19	1	-8/+6
\|
*	profiling fixes	Simon Marlow	2012-10-09	1	-5/+7
\|
*	Produce new-style Cmm from the Cmm parser	Simon Marlow	2012-10-08	1	-77/+216
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The main change here is that the Cmm parser now allows high-level cmm code with argument-passing and function calls. For example: foo ( gcptr a, bits32 b ) { if (b > 0) { // we can make tail calls passing arguments: jump stg_ap_0_fast(a); } return (x,y); } More details on the new cmm syntax are in Note [Syntax of .cmm files] in CmmParse.y. The old syntax is still more-or-less supported for those occasional code fragments that really need to explicitly manipulate the stack. However there are a couple of differences: it is now obligatory to give a list of live GlobalRegs on every jump, e.g. jump %ENTRY_CODE(Sp(0)) [R1]; Again, more details in Note [Syntax of .cmm files]. I have rewritten most of the .cmm files in the RTS into the new syntax, except for AutoApply.cmm which is generated by the genapply program: this file could be generated in the new syntax instead and would probably be better off for it, but I ran out of enthusiasm. Some other changes in this batch: - The PrimOp calling convention is gone, primops now use the ordinary NativeNodeCall convention. This means that primops and "foreign import prim" code must be written in high-level cmm, but they can now take more than 10 arguments. - CmmSink now does constant-folding (should fix #7219) - .cmm files now go through the cmmPipeline, and as a result we generate better code in many cases. All the object files generated for the RTS .cmm files are now smaller. Performance should be better too, but I haven't measured it yet. - RET_DYN frames are removed from the RTS, lots of code goes away - we now have some more canned GC points to cover unboxed-tuples with 2-4 pointers, which will reduce code size a little.