summaryrefslogtreecommitdiff
path: root/includes/stg/SMP.h
Commit message (Collapse)AuthorAgeFilesLines
* Move `/includes` to `/rts/include`, sort per package betterJohn Ericson2021-08-091-579/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | In order to make the packages in this repo "reinstallable", we need to associate source code with a specific packages. Having a top level `/includes` dir that mixes concerns (which packages' includes?) gets in the way of this. To start, I have moved everything to `rts/`, which is mostly correct. There are a few things however that really don't belong in the rts (like the generated constants haskell type, `CodeGen.Platform.h`). Those needed to be manually adjusted. Things of note: - No symlinking for sake of windows, so we hard-link at configure time. - `CodeGen.Platform.h` no longer as `.hs` extension (in addition to being moved to `compiler/`) so as not to confuse anyone, since it is next to Haskell files. - Blanket `-Iincludes` is gone in both build systems, include paths now more strictly respect per-package dependencies. - `deriveConstants` has been taught to not require a `--target-os` flag when generating the platform-agnostic Haskell type. Make takes advantage of this, but Hadrian has yet to.
* Implement riscv64 LLVM backendAndreas Schwab2021-03-051-0/+6
| | | | This enables a registerised build for the riscv64 architecture.
* rts: Use weaker cas in WSDequeDouglas Wilson2020-12-191-0/+24
| | | | | | | | The algorithm described in the referenced paper uses this slightly weaker atomic op. This is the first "exotic" cas we're using. I've added a macro in the <ORDERING>_OP style to match existing ones.
* Merge branch 'wip/tsan/wsdeque' into wip/tsan/allBen Gamari2020-11-011-0/+15
|\
| * rts/WSDeque: Rewrite with proper atomicswip/tsan/wsdequeBen Gamari2020-10-241-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | After a few attempts at shoring up the previous implementation, I ended up turning to the literature and now use the proven implementation, > N.M. Lê, A. Pop, A.Cohen, and F.Z. Nardelli. "Correct and Efficient > Work-Stealing for Weak Memory Models". PPoPP'13, February 2013, > ACM 978-1-4503-1922/13/02. Note only is this approach formally proven correct under C11 semantics but it is also proved to be a bit faster in practice.
* | rts: Use relaxed ordering on spinlock counterswip/tsan/storageBen Gamari2020-10-301-0/+2
|/
* SMP.h: Add C11-style atomic operationsBen Gamari2020-10-241-1/+60
|
* Fix duplicated words and typos in comments and user guideJan Hrček2020-06-281-2/+2
|
* Fix typos, via a Levenshtein-style correctorBrian Wignall2020-01-041-1/+1
|
* Merge non-moving garbage collectorBen Gamari2019-10-231-0/+19
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This introduces a concurrent mark & sweep garbage collector to manage the old generation. The concurrent nature of this collector typically results in significantly reduced maximum and mean pause times in applications with large working sets. Due to the large and intricate nature of the change I have opted to preserve the fully-buildable history, including merge commits, which is described in the "Branch overview" section below. Collector design ================ The full design of the collector implemented here is described in detail in a technical note > B. Gamari. "A Concurrent Garbage Collector For the Glasgow Haskell > Compiler" (2018) This document can be requested from @bgamari. The basic heap structure used in this design is heavily inspired by > K. Ueno & A. Ohori. "A fully concurrent garbage collector for > functional programs on multicore processors." /ACM SIGPLAN Notices/ > Vol. 51. No. 9 (presented at ICFP 2016) This design is intended to allow both marking and sweeping concurrent to execution of a multi-core mutator. Unlike the Ueno design, which requires no global synchronization pauses, the collector introduced here requires a stop-the-world pause at the beginning and end of the mark phase. To avoid heap fragmentation, the allocator consists of a number of fixed-size /sub-allocators/. Each of these sub-allocators allocators into its own set of /segments/, themselves allocated from the block allocator. Each segment is broken into a set of fixed-size allocation blocks (which back allocations) in addition to a bitmap (used to track the liveness of blocks) and some additional metadata (used also used to track liveness). This heap structure enables collection via mark-and-sweep, which can be performed concurrently via a snapshot-at-the-beginning scheme (although concurrent collection is not implemented in this patch). Implementation structure ======================== The majority of the collector is implemented in a handful of files: * `rts/Nonmoving.c` is the heart of the beast. It implements the entry-point to the nonmoving collector (`nonmoving_collect`), as well as the allocator (`nonmoving_allocate`) and a number of utilities for manipulating the heap. * `rts/NonmovingMark.c` implements the mark queue functionality, update remembered set, and mark loop. * `rts/NonmovingSweep.c` implements the sweep loop. * `rts/NonmovingScav.c` implements the logic necessary to scavenge the nonmoving heap. Branch overview =============== ``` * wip/gc/opt-pause: | A variety of small optimisations to further reduce pause times. | * wip/gc/compact-nfdata: | Introduce support for compact regions into the non-moving |\ collector | \ | \ | | * wip/gc/segment-header-to-bdescr: | | | Another optimization that we are considering, pushing | | | some segment metadata into the segment descriptor for | | | the sake of locality during mark | | | | * | wip/gc/shortcutting: | | | Support for indirection shortcutting and the selector optimization | | | in the non-moving heap. | | | * | | wip/gc/docs: | |/ Work on implementation documentation. | / |/ * wip/gc/everything: | A roll-up of everything below. |\ | \ | |\ | | \ | | * wip/gc/optimize: | | | A variety of optimizations, primarily to the mark loop. | | | Some of these are microoptimizations but a few are quite | | | significant. In particular, the prefetch patches have | | | produced a nontrivial improvement in mark performance. | | | | | * wip/gc/aging: | | | Enable support for aging in major collections. | | | | * | wip/gc/test: | | | Fix up the testsuite to more or less pass. | | | * | | wip/gc/instrumentation: | | | A variety of runtime instrumentation including statistics | | / support, the nonmoving census, and eventlog support. | |/ | / |/ * wip/gc/nonmoving-concurrent: | The concurrent write barriers. | * wip/gc/nonmoving-nonconcurrent: | The nonmoving collector without the write barriers necessary | for concurrent collection. | * wip/gc/preparation: | A merge of the various preparatory patches that aren't directly | implementing the GC. | | * GHC HEAD . . . ```
| * rts: Shrink size of STACK's dirty and marking fieldsBen Gamari2019-10-201-0/+19
| |
* | Implement s390x LLVM backend.Stefan Schulze Frielinghaus2019-10-221-0/+6
|/ | | | | | This patch adds support for the s390x architecture for the LLVM code generator. The patch includes a register mapping of STG registers onto s390x machine registers which enables a registerised build.
* Module hierarchy: StgToCmm (#13009)Sylvain Henry2019-09-101-3/+3
| | | | | | Add StgToCmm module hierarchy. Platform modules that are used in several other places (NCG, LLVM codegen, Cmm transformations) are put into GHC.Platform.
* Correct closure observation, construction, and mutation on weak memory machines.Travis Whitaker2019-06-281-0/+145
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here the following changes are introduced: - A read barrier machine op is added to Cmm. - The order in which a closure's fields are read and written is changed. - Memory barriers are added to RTS code to ensure correctness on out-or-order machines with weak memory ordering. Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this is lowered to an instruction that ensures memory reads that occur after said instruction in program order are not performed before reads coming before said instruction in program order. On machines with strong memory ordering properties (e.g. X86, SPARC in TSO mode) no such instruction is necessary, so MO_ReadBarrier is simply erased. However, such an instruction is necessary on weakly ordered machines, e.g. ARM and PowerPC. Weam memory ordering has consequences for how closures are observed and mutated. For example, consider a closure that needs to be updated to an indirection. In order for the indirection to be safe for concurrent observers to enter, said observers must read the indirection's info table before they read the indirectee. Furthermore, the entering observer makes assumptions about the closure based on its info table contents, e.g. an INFO_TYPE of IND imples the closure has an indirectee pointer that is safe to follow. When a closure is updated with an indirection, both its info table and its indirectee must be written. With weak memory ordering, these two writes can be arbitrarily reordered, and perhaps even interleaved with other threads' reads and writes (in the absence of memory barrier instructions). Consider this example of a bad reordering: - An updater writes to a closure's info table (INFO_TYPE is now IND). - A concurrent observer branches upon reading the closure's INFO_TYPE as IND. - A concurrent observer reads the closure's indirectee and enters it. (!!!) - An updater writes the closure's indirectee. Here the update to the indirectee comes too late and the concurrent observer has jumped off into the abyss. Speculative execution can also cause us issues, consider: - An observer is about to case on a value in closure's info table. - The observer speculatively reads one or more of closure's fields. - An updater writes to closure's info table. - The observer takes a branch based on the new info table value, but with the old closure fields! - The updater writes to the closure's other fields, but its too late. Because of these effects, reads and writes to a closure's info table must be ordered carefully with respect to reads and writes to the closure's other fields, and memory barriers must be placed to ensure that reads and writes occur in program order. Specifically, updates to a closure must follow the following pattern: - Update the closure's (non-info table) fields. - Write barrier. - Update the closure's info table. Observing a closure's fields must follow the following pattern: - Read the closure's info pointer. - Read barrier. - Read the closure's (non-info table) fields. This patch updates RTS code to obey this pattern. This should fix long-standing SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting out-of-order execution) and PowerPC. This fixes issue #15449. Co-Authored-By: Ben Gamari <ben@well-typed.com>
* Update Wiki URLs to point to GitLabTakenobu Tani2019-03-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | This moves all URL references to Trac Wiki to their corresponding GitLab counterparts. This substitution is classified as follows: 1. Automated substitution using sed with Ben's mapping rule [1] Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy... New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy... 2. Manual substitution for URLs containing `#` index Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...#Zzz New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...#zzz 3. Manual substitution for strings starting with `Commentary` Old: Commentary/XxxYyy... New: commentary/xxx-yyy... See also !539 [1]: https://gitlab.haskell.org/bgamari/gitlab-migration/blob/master/wiki-mapping.json
* Fix specification of load_load_barrier [skip-ci]Peter Trommler2019-03-201-1/+1
|
* Enable new warning for fragile/incorrect CPP #if usageErik de Castro Lopo2017-04-281-17/+20
| | | | | | | | | | | | | | | | The C code in the RTS now gets built with `-Wundef` and the Haskell code (stages 1 and 2 only) with `-Wcpp-undef`. We now get warnings whereever `#if` is used on undefined identifiers. Test Plan: Validate on Linux and Windows Reviewers: austin, angerman, simonmar, bgamari, Phyx Reviewed By: bgamari Subscribers: thomie, snowleopard Differential Revision: https://phabricator.haskell.org/D3278
* cpp: Use #pragma once instead of #ifndef guardsBen Gamari2017-04-231-4/+1
| | | | | | | | | | | | | | This both says what we mean and silences a bunch of spurious CPP linting warnings. This pragma is supported by all CPP implementations which we support. Reviewers: austin, erikd, simonmar, hvr Reviewed By: simonmar Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3482
* Revert "Enable new warning for fragile/incorrect CPP #if usage"Ben Gamari2017-04-051-20/+17
| | | | | | | | This is causing too much platform dependent breakage at the moment. We will need a more rigorous testing strategy before this can be merged again. This reverts commit 7e340c2bbf4a56959bd1e95cdd1cfdb2b7e537c2.
* Enable new warning for fragile/incorrect CPP #if usageErik de Castro Lopo2017-04-051-17/+20
| | | | | | | | | | | | | | | | The C code in the RTS now gets built with `-Wundef` and the Haskell code (stages 1 and 2 only) with `-Wcpp-undef`. We now get warnings whereever `#if` is used on undefined identifiers. Test Plan: Validate on Linux and Windows Reviewers: austin, angerman, simonmar, bgamari, Phyx Reviewed By: bgamari Subscribers: thomie, snowleopard Differential Revision: https://phabricator.haskell.org/D3278
* RTS SMP: Use compiler built-ins on all platforms.Peter Trommler2016-06-041-166/+23
| | | | | | | | | | | | | | | | | | | Use C compiler builtins for atomic SMP primitives. This saves a lot of CPP ifdefs. Add test for atomic xchg: Test if __sync_lock_test_and_set() builtin stores the second argument. The gcc manual says the actual value stored is implementation defined. Test Plan: validate and eyeball generated assembler code Reviewers: kgardas, simonmar, hvr, bgamari, austin, erikd Reviewed By: simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2233
* PPC: Implement SMP primitives using gcc built-insPeter Trommler2016-05-161-64/+9
| | | | | | | | | | | | | | | | | | | | | | | The SMP primitives were missing appropriate memory barriers (sync, isync instructions) on all PowerPCs. Use the built-ins _sync_* provided by gcc and clang. This reduces code size significantly. Remove broken mark for concprog001 on powerpc64. The referenced ticket number (11259) was wrong. Test Plan: validate on powerpc and ARM Reviewers: erikd, austin, simonmar, bgamari, hvr Reviewed By: bgamari, hvr Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2225 GHC Trac Issues: #12070
* Avoid local label syntax for assembler on AIXHerbert Valerio Riedel2016-03-241-0/+20
| | | | | | | | | | | | | Unfortunately (for inline `__asm__()` uses), IBM's `as` doesn't seem to support local labels[1] like GNU `as` does so we need to workaround this when on AIX. [1]: https://sourceware.org/binutils/docs/as/Symbol-Names.html#Symbol-Names Turns out this also addresses the long-standing bug #485 Reviewed By: bgamari, trommler Differential Revision: https://phabricator.haskell.org/D2029
* Ensure that we don't produce code for pre-ARMv7 without barriersBen Gamari2016-01-251-14/+3
| | | | | | | | | | | | | | | | | | | | | | We are unable to produce load/store barriers for pre-ARMv7 targets. Phab:D894 added dummy cases to SMP.h for these barriers to prevent the build from failing under the assumption that there are no SMP-capable devices of this vintage. However, #10433 points out that it is more correct to simply set NOSMP for such targets. Tested By: rwbarton Test Plan: Validate Reviewers: erikd, rwbarton, austin Reviewed By: rwbarton Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1704 GHC Trac Issues: #10433
* Make headers C++ compatible (fixes #10700)Alexey Shmalko2015-07-301-8/+8
| | | | | | | | | | | | | | | | | Some headers used `new` as parameter name, which is reserved word in C++. This patch changes these names to `new_`. Test Plan: validate Reviewers: austin, ezyang, bgamari, simonmar Reviewed By: simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1107 GHC Trac Issues: #10700
* Implement PowerPC 64-bit native code backend for LinuxPeter Trommler2015-07-031-3/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extend the PowerPC 32-bit native code generator for "64-bit PowerPC ELF Application Binary Interface Supplement 1.9" by Ian Lance Taylor and "Power Architecture 64-Bit ELF V2 ABI Specification -- OpenPOWER ABI for Linux Supplement" by IBM. The latter ABI is mainly used on POWER7/7+ and POWER8 Linux systems running in little-endian mode. The code generator supports both static and dynamic linking. PowerPC 64-bit code for ELF ABI 1.9 and 2 is mostly position independent anyway, and thus so is all the code emitted by the code generator. In other words, -fPIC does not make a difference. rts/stg/SMP.h support is implemented. Following the spirit of the introductory comment in PPC/CodeGen.hs, the rest of the code is a straightforward extension of the 32-bit implementation. Limitations: * Code is generated only in the medium code model, which is also gcc's default * Local symbols are not accessed directly, which seems to also be the case for 32-bit * LLVM does not work, but this does not work on 32-bit either * Must use the system runtime linker in GHCi, because the GHC linker for "static" object files (rts/Linker.c) for PPC 64-bit is not implemented. The system runtime (dynamic) linker works. * The handling of the system stack (register 1) is not ELF- compliant so stack traces break. Instead of allocating a new stack frame, spill code should use the "official" spill area in the current stack frame and deallocation code should restore the back chain * DWARF support is missing Fixes #9863 Test Plan: validate (on powerpc, too) Reviewers: simonmar, trofi, erikd, austin Reviewed By: trofi Subscribers: bgamari, arnons1, kgardas, thomie Differential Revision: https://phabricator.haskell.org/D629 GHC Trac Issues: #9863
* rts: Fix aarch64 implementation of xchgErik de Castro Lopo2015-06-011-4/+2
| | | | | | | | | | | | | | | | In the previous implementation, the `stlxr` instruction clobbered the value that was supposed to be returned by the the `xchg` function. Signed-off-by: Erik de Castro Lopo <erikd@mega-nerd.com> Test Plan: build on aarch64 Reviewers: austin, bgamari, rwbarton Subscribers: bgamari, thomie Differential Revision: https://phabricator.haskell.org/D932
* Add a TODO FIXME w.r.t. D894Austin Seipp2015-05-191-0/+5
| | | | | | | | | | | | As Reid mentioned in a comment on D894, the case fixed by this revision likely isn't really correct, because old ARM binaries could run on newer machines, meaning we need to detect at runtime whether we need a proper barrier. But in the mean time, this actually stops the build from failing - which is better off. So we'll just remember this when we fix it in the future. Signed-off-by: Austin Seipp <austin@well-typed.com>
* includes/stg/SMP.h: implement simple load_/store_load_barrier on armv6 and olderSergei Trofimovich2015-05-181-0/+4
| | | | | | | | | | | | | | | | | | | | Assuming there is no real SMP systems on these CPUs I've added only compiler barrier (otherwise write_barrier and friends need to be fixed as well). Patch also fixes build breakage reported in #10244. Signed-off-by: Sergei Trofimovich <siarheit@google.com> Reviewers: rwbarton, nomeata, austin Reviewed By: nomeata, austin Subscribers: bgamari, thomie Differential Revision: https://phabricator.haskell.org/D894 GHC Trac Issues: #10244
* commentsSimon Marlow2014-12-151-1/+11
|
* Link pre-ARMv6 spinlocks into all RTS variantsJoachim Breitner2014-12-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: For compatibility with ARM machines from pre v6, the RTS provides implementations of certain atomic operations. Previously, these were only included in the threaded RTS. But ghc (the library) contains the code in compiler/cbits/genSym.c, which uses these operations if there is more than one capability. But there is only one libHSghc, so the linker wants to resolve these symbols in every case. By providing these operations in all RTSs, the linker is happy. The only downside is a small amount of dead code in the non-threaded RTS on old ARM machines. Test Plan: It helped here. Reviewers: bgamari, austin Reviewed By: bgamari, austin Subscribers: carter, thomie Differential Revision: https://phabricator.haskell.org/D564 GHC Trac Issues: #8951
* arm64: 64bit iOS and SMP support (#7942)Luke Iannini2014-11-191-1/+36
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* [ci skip] includes: detabify/dewhitespace stg/SMP.hAustin Seipp2014-08-201-21/+21
| | | | Signed-off-by: Austin Seipp <austin@well-typed.com>
* includes/stg/SMP.h: use 'NOSMP' instead of never defined 'WITHSMP' (Trac #8789)Sergei Trofimovich2014-07-021-21/+21
| | | | | | | | | | | | | | | | | | Summary: git history does not contain state where 'WITHSMP' macro was ever defined. It should have always been '!NOSMP'. Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> Test Plan: build tested Reviewers: austin, simonmar Reviewed By: austin, simonmar Subscribers: simonmar, relrod, carter Differential Revision: https://phabricator.haskell.org/D33
* Globally replace "hackage.haskell.org" with "ghc.haskell.org"Simon Marlow2013-10-011-1/+1
|
* Fix the definition of cas() on x86 (#8219)Patrick Palka2013-09-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | *p is both read and written to by the cmpxchg instruction, and therefore should be given the '+' constraint modifier. (In GCC's extended ASM language, '+' means that the operand is both read and written to whereas '=' means that it is only written to.) Otherwise, the compiler is allowed to rewrite something like SpinLock lock; initSpinLock(&lock); /* sets lock = 1 */ ACQUIRE_SPIN_LOCK(&lock); into SpinLock lock; ACQUIRE_SPIN_LOCK(&lock); because according to the asm statement, the previous value of 'lock' is not important.
* Merge branch 'master' into atomicsRyan Newton2013-08-311-3/+6
|\
| * In the non-threaded RTS, make *_barrier functions EXTERN_INLINE, not ↵Ryan Newton2013-08-211-3/+6
| | | | | | | | #define. (fixes #8077)
* | Eliminate atomic_inc_by and instead medofiy atomic_inc.Ryan Newton2013-08-211-27/+8
| |
* | Add PrimOp fetchAddIntArray# plus supporting C function atomic_inc_by.Ryan Newton2013-08-211-4/+28
|/
* ARMv5 compatibility for registerized runtime changes.Stephen Blackheath2011-08-101-26/+12
| | | | | | | When the bootstrap compiler does not include this patch, you must add this line to mk/build.mk, otherwise the ARM architecture cannot be detected due to a -undef option given to the C pre-processor. SRC_HC_OPTS = -pgmP 'gcc -E -traditional'
* RTS: fix xchg/cas fcns to invoke memory barrier on ARMv7 platformKarel Gardas2011-08-101-0/+6
| | | | | | This patch fixes RTS' xchg and cas functions. On ARMv7 it is recommended to add memory barrier after using ldrex/strex for implementing atomic lock or operation.
* implement ARMv7 specific memory barriersKarel Gardas2011-08-101-1/+11
| | | | | | This patch provides implementation of ARMv7 specific memory barriers. It uses dmb sy isn (or shortly dmb) for store/load and load/load barriers and dmb st isn for store/store barrier.
* make StgReturn and cas functions Thumb friendlyKarel Gardas2011-08-101-0/+2
|
* implement ARMv6/7 specific xchg functionKarel Gardas2011-08-101-2/+18
|
* Stephen Blackheath's GHC/ARM registerised portKarel Gardas2011-08-101-0/+43
| | | | | | This is the Stephen Blackheath's GHC/ARM registerised port which is using modified version of LLVM and which provides basic registerised build functionality
* Update some files for new testsuite tests locationDavid Terei2011-07-201-1/+1
|
* Don't expose the cas definition to .hc filesIan Lynagh2011-04-301-0/+2
| | | | | This is more pleasant than having the C generator check whether the function it's calling is cas, and not generate a prototype if so.
* add casMutVar#Simon Marlow2011-04-111-1/+2
|
* messageBlackHole: fix deadlock bug caused by a missing 'volatile'Simon Marlow2010-06-101-0/+8
|