| Commit message | Author | Age | Files | Lines |
| |
In order to make the packages in this repo "reinstallable", we need to
associate source code with specific packages. Having a top-level
`/includes` dir that mixes concerns (which packages' includes?) gets in
the way of this.
To start, I have moved everything to `rts/`, which is mostly correct.
There are a few things, however, that really don't belong in the rts (like
the generated constants Haskell type, `CodeGen.Platform.h`). Those
needed to be manually adjusted.
Things of note:
- No symlinking for the sake of Windows, so we hard-link at configure time.
- `CodeGen.Platform.h` no longer has a `.hs` extension (in addition to
  being moved to `compiler/`) so as not to confuse anyone, since it is
  next to Haskell files.
- The blanket `-Iincludes` is gone in both build systems; include paths now
  more strictly respect per-package dependencies.
- `deriveConstants` has been taught to not require a `--target-os` flag
  when generating the platform-agnostic Haskell type. Make takes
  advantage of this, but Hadrian has yet to.
|
| |
This implements the core heap structure and a serial mark/sweep
collector which can be used to manage the oldest-generation heap.
This is the first step towards a concurrent mark-and-sweep collector
aimed at low-latency applications.
The full design of the collector implemented here is described in detail
in a technical note
B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell
Compiler" (2018)
The basic heap structure used in this design is heavily inspired by
K. Ueno & A. Ohori. "A fully concurrent garbage collector for
functional programs on multicore processors." /ACM SIGPLAN Notices/
Vol. 51. No. 9 (presented at ICFP 2016)
This design is intended to allow both marking and sweeping to proceed
concurrently with the execution of a multi-core mutator. Unlike the Ueno
design, which requires no global synchronization pauses, the collector
introduced here requires a stop-the-world pause at the beginning and end
of the mark phase.
To avoid heap fragmentation, the allocator consists of a number of
fixed-size /sub-allocators/. Each of these sub-allocators allocates into
its own set of /segments/, themselves allocated from the block
allocator. Each segment is broken into a set of fixed-size allocation
blocks (which back allocations) in addition to a bitmap (used to track
the liveness of blocks) and some additional metadata (also used
to track liveness).
This heap structure enables collection via mark-and-sweep, which can be
performed concurrently via a snapshot-at-the-beginning scheme (although
concurrent collection is not implemented in this patch).
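To make the layout concrete, here is a rough C sketch of a segment; the
struct shape, field names, and sizes are illustrative assumptions rather
than the definitions introduced by this patch:
```c
#include <stdint.h>

#define SEGMENT_BLOCKS 512                   /* hypothetical number of allocation blocks */

struct segment {
    struct segment *link;                    /* next segment owned by this sub-allocator  */
    uint16_t        next_free;               /* index of the next free allocation block   */
    uint8_t         block_size_log2;         /* every block in a segment is the same size */
    uint8_t         bitmap[SEGMENT_BLOCKS];  /* per-block liveness marks                  */
    /* ...the fixed-size allocation blocks that back allocations follow the metadata... */
};
```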
The mark queue is a fairly straightforward chunked-array structure.
The representation is a bit more verbose than a typical mark queue to
accommodate a combination of two features (a rough sketch of the entry
layout appears after the list):
* a mark FIFO, which improves the locality of marking, reducing one of
the major overheads seen in mark/sweep allocators (see [1] for
details)
* the selector optimization and indirection shortcutting, which
requires that we track where we found each reference to an object
in case we need to update the reference at a later point (e.g. when
we find that it is an indirection). See Note [Origin references in
the nonmoving collector] (in `NonMovingMark.h`) for details.
Beyond this the mark/sweep is fairly run-of-the-mill.
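A minimal sketch of what such an entry might look like, assuming one plain
struct per entry; the names and chunk size are illustrative and the actual
representation in this patch may differ:
```c
typedef struct StgClosure_ StgClosure;  /* stand-in for the RTS closure type */

/* One entry: the closure to mark plus the slot in which we found it, so an
 * indirection discovered later can be short-cutted by writing back through
 * `origin`. */
typedef struct {
    StgClosure  *obj;     /* closure waiting to be marked    */
    StgClosure **origin;  /* the field where `obj` was found */
} MarkQueueEnt;

/* Entries live in fixed-size chunks linked together, giving the
 * "chunked array" shape described above. */
typedef struct MarkQueueChunk_ {
    struct MarkQueueChunk_ *next;
    unsigned int            head;          /* index of the first unused slot */
    MarkQueueEnt            entries[256];  /* illustrative chunk size        */
} MarkQueueChunk;
```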
[1] R. Garner, S.M. Blackburn, D. Frampton. "Effective Prefetch for
Mark-Sweep Garbage Collection." ISMM 2007.
Co-Authored-By: Ben Gamari <ben@well-typed.com>
|
| |
This implements support for block group allocations which are aligned to
an integral number of blocks.
This will be used by the nonmoving garbage collector, which uses the
block allocator to allocate the segments which back its heap. These
segments are a fixed number of blocks in size, with each segment being
aligned to the segment size boundary. This allows us to easily find the
segment metadata stored at the beginning of the segment.
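A minimal, self-contained illustration of why the alignment matters, using
posix_memalign as a stand-in for the block allocator (the RTS allocates
aligned block groups instead; the names here are illustrative):
```c
#include <stdint.h>
#include <stdlib.h>

#define SEGMENT_SIZE (32 * 1024UL)   /* illustrative segment size in bytes */

/* Allocate a segment aligned to its own size.  In the RTS this would be an
 * aligned block group from the block allocator; posix_memalign just keeps
 * the example self-contained. */
void *alloc_segment(void) {
    void *seg = NULL;
    if (posix_memalign(&seg, SEGMENT_SIZE, SEGMENT_SIZE) != 0)
        return NULL;
    return seg;
}

/* Because a segment is aligned to SEGMENT_SIZE, the metadata stored at its
 * start can be recovered from any interior pointer with a single mask. */
static inline void *segment_of(void *p) {
    return (void *)((uintptr_t)p & ~((uintptr_t)SEGMENT_SIZE - 1));
}
```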
|
| |
Reviewers: bgamari, simonmar, erikd
Reviewed By: bgamari, simonmar
Subscribers: thomie, carter
Differential Revision: https://phabricator.haskell.org/D4539
|
| |
Our new CPP linter enforces this.
|
| |
This both says what we mean and silences a bunch of spurious CPP linting
warnings. This pragma is supported by all CPP implementations which we
support.
Reviewers: austin, erikd, simonmar, hvr
Reviewed By: simonmar
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3482
|
| |
This brings in initial support for compact regions, as described in the
ICFP 2015 paper "Efficient Communication and Collection with Compact
Normal Forms" (Edward Z. Yang et al.) and implemented by Giovanni
Campagna.
Some things may change before the 8.2 release, but I (Simon M.) wanted
to get the main patch committed so that we can iterate.
What documentation there is lives in the Data.Compact module in the new
compact package. We'll need to extend and polish the documentation
before the release.
Test Plan:
validate
(new test cases included)
Reviewers: ezyang, simonmar, hvr, bgamari, austin
Subscribers: vikraman, Yuras, RyanGlScott, qnikst, mboes, facundominguez, rrnewton, thomie, erikd
Differential Revision: https://phabricator.haskell.org/D1264
GHC Trac Issues: #11493
|
| |
Summary:
The aim here is to reduce the number of remote memory accesses on
systems with a NUMA memory architecture, typically multi-socket servers.
Linux provides a NUMA API for doing two things:
* Allocating memory local to a particular node
* Binding a thread to a particular node
When given the +RTS --numa flag, the runtime will
* Determine the number of NUMA nodes (N) by querying the OS
* Assign capabilities to nodes, so cap C is on node C%N
* Bind worker threads on a capability to the correct node
* Keep separate free lists in the block layer for each node
* Allocate the nursery for a capability from node-local memory
* Allocate blocks in the GC from node-local memory
For example, using nofib/parallel/queens on a 24-core 2-socket machine:
```
$ ./Main 15 +RTS -N24 -s -A64m
Total time 173.960s ( 7.467s elapsed)
$ ./Main 15 +RTS -N24 -s -A64m --numa
Total time 150.836s ( 6.423s elapsed)
```
The biggest win here is expected to come from allocating from node-local
memory, which means programs using a large -A value (as here).
According to perf, on this program the number of remote memory accesses
was reduced by more than 50% by using `--numa`.
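A minimal sketch of the capability-to-node assignment described above; the
variable and function names are illustrative, not the RTS's own:
```c
#include <stdint.h>

static uint32_t n_numa_nodes;  /* N, queried from the OS during RTS startup */

/* Capability C lives on NUMA node C % N; its nursery, worker threads and
 * GC allocations are then kept node-local. */
static inline uint32_t capability_node(uint32_t cap_no) {
    return cap_no % n_numa_nodes;
}
```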
Test Plan:
* validate
* There's a new flag --debug-numa=<n> that pretends to do NUMA without
actually making the OS calls, which is useful for testing the code
on non-NUMA systems.
* TODO: I need to add some unit tests
Reviewers: erikd, austin, rwbarton, ezyang, bgamari, hvr, niteria
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2199
|
| |
The `nat` type was an alias for `unsigned int` with a comment saying
it was at least 32 bits. We keep the typedef in case client code is
using it but mark it as deprecated.
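A rough sketch of the compatibility shim, assuming a GCC-style deprecation
attribute; the exact spelling and header location in the RTS are assumptions:
```c
#include <stdint.h>

/* New code uses a fixed-width type directly; the old name survives only as
 * a deprecated alias for clients that still mention it. */
typedef uint32_t nat __attribute__((deprecated));
```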
Test Plan: Validated on Linux, OS X and Windows
Reviewers: simonmar, austin, thomie, hvr, bgamari, hsyl20
Differential Revision: https://phabricator.haskell.org/D2166
|
| |
Avoids contention for the block allocator lock in the GC; this can be
seen in the gc_alloc_block_sync counter emitted by +RTS -s.
I experimented with this a while ago, and there was already
commented-out code for it in GCUtils.c, but I've now improved it so that
it doesn't result in significantly worse memory usage.
* The old method of putting spare blocks on ws->part_list was wasteful;
  the spare blocks are now shared between all generations and retained
  between GCs.
* Repeated allocGroup() calls result in fragmentation, so I switched to
  using allocLargeChunk() instead, which is fragmentation-friendly; we
  already use it for the same reason in nursery allocation.
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
| |
Fixes #8839
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
| |
Signed-off-by: Herbert Valerio Riedel <hvr@gnu.org>
|
| |
The particular problematic code in #7762 was this:
nat newSize = size - n;
char *freeAddr = MBLOCK_ROUND_DOWN(bd->start);
freeAddr += newSize * MBLOCK_SIZE;
^^^^^^^^^^^^^^^^^^^^^^ OVERFLOW!!!
For good measure, I'm going to fix the bug twice. This patch fixes
the class of bugs of this kind, by making sure that any expressions
involving BLOCK_SIZE or MBLOCK_SIZE are promoted to unsigned long. In
a separate patch, I'll fix a bunch of individual instances (including
the one above).
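A minimal sketch of the class-wide fix; the shift values are the usual GHC
defaults but should be read as illustrative here:
```c
/* Before, (1 << MBLOCK_SHIFT) had type int, so products such as
 * `newSize * MBLOCK_SIZE` were computed (and overflowed) at int width.
 * Suffixing the literal promotes the whole expression to unsigned long. */
#define MBLOCK_SHIFT 20                     /* illustrative: 1 MB megablocks */
#define MBLOCK_SIZE  (1UL << MBLOCK_SHIFT)  /* unsigned long, not int        */

#define BLOCK_SHIFT  12                     /* illustrative: 4 KB blocks     */
#define BLOCK_SIZE   (1UL << BLOCK_SHIFT)
```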
|
| |
All uses of it cast the result anyway. However, DeriveConstants needs
it to not include the cast, as (void *) casts can't be used in constant
expressions.
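A small illustration (not code from this patch) of the constraint being
worked around: a pointer cast disqualifies an expression from being an
integer constant expression, which DeriveConstants-style uses require.
```c
#define OFFSET_WITH_CAST    ((void *)16)  /* fine for ordinary uses, after casting back        */
#define OFFSET_WITHOUT_CAST (16)          /* also usable where a constant expression is needed */

enum { ok = OFFSET_WITHOUT_CAST };        /* accepted */
/* enum { bad = (int)OFFSET_WITH_CAST };     rejected: pointer casts are not
                                             allowed in integer constant expressions */
```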
|
| |
Store the *number* of the destination generation in the Bdescr struct,
so that in evacuate() we don't have to deref gen to get it.
This is another improvement ported over from my GC branch.
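A rough sketch of the change on evacuate()'s hot path; the field names are
illustrative rather than the actual Bdescr layout:
```c
/* Before: two dependent loads          After: one load of a cached number
 *   dest_gen_no = bd->dest->no;          dest_gen_no = bd->dest_no;      */
typedef struct bdescr_ {
    struct generation_ *dest;     /* destination generation (still kept)         */
    unsigned int        dest_no;  /* its number, cached when the block is set up */
    /* ... */
} bdescr;
```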
|
| |
The checkHeap function assumed the allocated part of the block contained only
live objects and slop. This is not true for blocks that are collected using
mark-sweep. The code in this patch skips the test for such blocks.
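A minimal sketch of the idea; the flag name is an assumption, not the actual
RTS block flag:
```c
#include <stdbool.h>
#include <stdint.h>

#define BF_MARK_SWEEP 0x1  /* hypothetical flag: block is collected by mark-sweep */

/* Blocks collected by mark-sweep may still hold dead objects between live
 * ones, so the linear "live objects and slop only" walk is skipped for them. */
static bool should_sanity_scan(uint16_t block_flags) {
    return (block_flags & BF_MARK_SWEEP) == 0;
}
```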
|
| |
The list of threads blocked on an MVar is now represented as a list of
separately allocated objects rather than being linked through the TSOs
themselves. This lets us remove a TSO from the list in O(1) time
rather than O(n) time, by marking the list object. Removing this
linear component fixes some pathological performance cases where many
threads were blocked on an MVar and became unreachable simultaneously
(nofib/smp/threads007), or when sending an asynchronous exception to a
TSO in a long list of threads blocked on an MVar.
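A minimal sketch of the separately allocated queue cell, assuming plain C
structs; in the RTS the cell is a heap closure and these names are
illustrative:
```c
typedef struct StgTSO_ StgTSO;  /* stand-in for the RTS thread object */

typedef struct MVarQueueCell_ {
    StgTSO                *tso;   /* the blocked thread (NULL once removed) */
    struct MVarQueueCell_ *link;  /* next cell on this MVar's queue         */
} MVarQueueCell;

/* O(1) removal: mark the cell rather than unlinking it; a later traversal
 * of the queue simply skips marked cells. */
static inline void mvar_queue_remove(MVarQueueCell *cell) {
    cell->tso = NULL;
}
```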
MVar performance has actually improved by a few percent as a result of
this change, slightly to my surprise.
This is the final cleanup in the sequence, which let me remove the old
way of waking up threads (unblockOne(), MSG_WAKEUP) in favour of the
new way (tryWakeupThread and MSG_TRY_WAKEUP, which is idempotent). It
is now the case that only the Capability that owns a TSO may modify
its state (well, almost), and this simplifies various things. More of
the RTS is based on message-passing between Capabilities now.
|
| |
The GC had a two-level structure, G generations each of T steps.
Steps are for aging within a generation, mostly to avoid premature
promotion.
Measurements show that more than 2 steps is almost never worthwhile,
and 1 step is usually worse than 2. In theory fractional steps are
possible, so the ideal number of steps is somewhere between 1 and 3.
GHC's default has always been 2.
We can implement 2 steps quite straightforwardly by having each block
point to the generation to which objects in that block should be
promoted, so blocks in the nursery point to generation 0, and blocks
in gen 0 point to gen 1, and so on.
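A minimal sketch of the per-block promotion pointer described above; the
field names are illustrative rather than the actual block descriptor
definition:
```c
typedef struct generation_ {
    unsigned int no;    /* generation number */
} generation;

typedef struct bdescr_ {
    generation *dest;   /* generation that live objects in this block are copied to */
    /* ... */
} bdescr;

/* Nursery blocks promote into gen 0, and blocks in gen g promote into
 * gen g+1 (the oldest generation promotes into itself), which gives the
 * effect of two steps without a separate step structure. */
static void point_block_at_next_gen(bdescr *bd, generation *gens,
                                    unsigned int g, unsigned int n_gens) {
    unsigned int next = (g + 1 < n_gens) ? g + 1 : n_gens - 1;
    bd->dest = &gens[next];
}
```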
This commit removes the explicit step structures, merging generations
with steps, thus simplifying a lot of code. Performance is
unaffected. The tunable number of steps is now gone, although it may
be replaced in the future by a way to tune the aging in generation 0.
|
| |
At the moment, this just saves a memory reference in the GC inner loop
(worth a percent or two of GC time). Later, it will hopefully let me
experiment with partial steps, and simplifying the generation/step
infrastructure.
|
|
The first phase of this tidyup is focussed on the header files, and in
particular making sure we are exposing publicly exactly what we need
to, and no more.
- Rts.h now includes everything that the RTS exposes publicly,
rather than a random subset of it.
- Most of the public header files have moved into subdirectories, and
many of them have been renamed. But clients should not need to
include any of the other headers directly, just #include the main
public headers: Rts.h, HsFFI.h, RtsAPI.h.
- All the headers needed for via-C compilation have moved into the
stg subdirectory, which is self-contained. Most of the headers for
the rest of the RTS APIs have moved into the rts subdirectory.
- I left MachDeps.h where it is, because it is so widely used in
Haskell code.
- I left a deprecated stub for RtsFlags.h in place. The flag
structures are now exposed by Rts.h.
- Various internal APIs are no longer exposed by public header files.
- Various bits of dead code and declarations have been removed
- More gcc warnings are turned on, and the RTS code is more
warning-clean.
- More source files #include "PosixSource.h", and hence only use
standard POSIX (1003.1c-1995) interfaces.
There is a lot more tidying up still to do; this is just the first
pass. I also intend to standardise the names for external RTS APIs
(e.g. use the rts_ prefix consistently), and declare the internal APIs
as hidden for shared libraries.
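As a small illustration of the intended usage after this reorganisation, an
external client embedding the RTS should only need the main public headers:
```c
/* The only headers a client is now expected to include directly: */
#include "Rts.h"     /* everything the RTS exposes publicly          */
#include "RtsAPI.h"  /* API for calling into the RTS from C clients  */
#include "HsFFI.h"   /* standard Haskell FFI definitions             */
```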
|