author    Ben Gamari <ben@smart-cactus.org>  2019-02-10 12:49:51 -0500
committer Ben Gamari <ben@smart-cactus.org>  2019-02-10 12:49:51 -0500
commit    df9aa8651b054bcf869fd7d7d5dfe76c5c5b031f (patch)
tree      0af374a675b3d77659e4128c3dc843533b4ccedd /includes/stg/SMP.h
parent    5585c55f8db4b6a0cc36d924dbbe0bd285aac8c0 (diff)
rts: Ensure thunk updates are safe on non-TSO platforms (wip/T15449)
Previously we wouldn't bother to initialize the `indirectee` field of a
thunk during construction. However, on architectures with weak memory
ordering this can result in unsoundness unless we add an expensive
second write barrier in `updateWithIndirection`.

To see how this happens, consider a thunk X and two threads. Say thread
1 evaluates X. When thread 1 finishes evaluation it will call
`updateWithIndirection` to replace X with an indirection to the result,
Y. To first order, `updateWithIndirection` does the following:

    void updateWithIndirection (Capability *cap,
                                StgClosure *p1,
                                StgClosure *p2)
    {
        write_barrier();
        ((StgInd *)p1)->indirectee = p2;
        SET_INFO(p1, &stg_BLACKHOLE_info);
    }

The write barrier ensures that the writes constructing the result Y are
made visible to other cores before Y is introduced as the indirectee.
We then set the `indirectee`, and only then the info table pointer.
However, we don't impose any ordering relationship on these two writes.
This means that on a machine with a weak memory model we could observe
an indirection where `p->info == stg_BLACKHOLE_info` yet `indirectee`
does not hold a valid value.

One solution to this would be to add another `write_barrier` between
these two writes. However, write barriers are expensive. Instead of
adding more write barriers, we take care to initialize the `indirectee`
field with a known value (a non-enterable closure,
`stg_NO_INDIRECTEE_closure`) on architectures that don't have total
store ordering. The indirection entry code can then check for this
value and loop as necessary.

This incurs two costs:

* an additional write during thunk allocation. However, given that we
  have to touch the cache line anyway, this should have negligible
  performance impact since the write goes straight to the store buffer.

* an additional branch in the indirection closures' entry code.
  However, indirections are eventually short-cut out of existence
  anyway, so we should be able to avoid this cost much of the time.
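To make the mechanism concrete, here is a minimal sketch in C of the two
pieces described above, under stated assumptions: `StgInd`,
`stg_NO_INDIRECTEE_closure`, `busy_wait_nop()`, and the
`ARCH_TOTAL_STORE_ORDER` macro added by this patch are the RTS names
involved, while `init_thunk_indirectee` and `follow_indirection` are
hypothetical stand-ins; the real entry code is written in Cmm and the
allocation-time store is emitted by the code generator.

    #include "Rts.h"

    /* Sketch only, not the actual RTS implementation. */

    /* Allocation side: on non-TSO architectures, give the indirectee
     * field a known sentinel value so it never holds garbage. */
    static void init_thunk_indirectee(StgInd *thunk)
    {
    #if !ARCH_TOTAL_STORE_ORDER
        thunk->indirectee = (StgClosure *)&stg_NO_INDIRECTEE_closure;
    #else
        (void)thunk; /* TSO orders the two stores, so no sentinel needed */
    #endif
    }

    /* Entry side: if the info pointer already reads as
     * stg_BLACKHOLE_info but the indirectee write is not yet visible,
     * we see the sentinel; spin until the real result arrives. */
    static StgClosure *follow_indirection(StgInd *p)
    {
        /* Volatile loads keep the compiler from hoisting the read out
         * of the loop; the real RTS would use its barrier primitives. */
        StgClosure *ind = *(StgClosure * volatile *)&p->indirectee;
        while (ind == (StgClosure *)&stg_NO_INDIRECTEE_closure) {
            busy_wait_nop();
            ind = *(StgClosure * volatile *)&p->indirectee;
        }
        return ind;
    }

The branch in `follow_indirection` is the second cost listed above; once
the GC short-cuts the indirection out of existence, the check disappears
with it.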
Diffstat (limited to 'includes/stg/SMP.h')
-rw-r--r--  includes/stg/SMP.h | 7
1 file changed, 7 insertions, 0 deletions
diff --git a/includes/stg/SMP.h b/includes/stg/SMP.h
index 4020aef0d9..9fc5389e0e 100644
--- a/includes/stg/SMP.h
+++ b/includes/stg/SMP.h
@@ -18,6 +18,13 @@ void arm_atomic_spin_lock(void);
 void arm_atomic_spin_unlock(void);
 #endif
 
+/* Does the platform maintain ordering of stores by a single core? */
+#if !defined(THREADED_RTS) || defined(x86_64_HOST_ARCH) || defined(i386_HOST_ARCH)
+#define ARCH_TOTAL_STORE_ORDER 1
+#else
+#define ARCH_TOTAL_STORE_ORDER 0
+#endif
+
 #if defined(THREADED_RTS)
 
 /* ----------------------------------------------------------------------------