So I've come across Jeff Preshing's wonderful blog posts on what Acquire/Release semantics are and how they may be achieved with some CPU barriers.
I've also read that SeqCst is about a single total order that is guaranteed to be consistent with the coherence order, though for historical reasons it may at times contradict the happens-before relation established by plain Acquire/Release operations.
My question is: how do the old GCC built-ins map onto the memory model introduced by C++11 (and later revisions)? In particular, how does __sync_synchronize() map onto C++11 or later modern C/C++?
The GCC manual simply describes this call as a full memory barrier, which I take to mean the combination of all four major kinds of barrier (LoadLoad, LoadStore, StoreLoad and StoreStore) at once. But is __sync_synchronize() equivalent to std::atomic_thread_fence(memory_order_seq_cst)? Or, formally speaking, is one of them stronger than the other? I suspect that is the case: in general a SeqCst fence should be stronger, since it requires the toolchain/platform to come up with a global ordering somehow, no? And does it just happen that most CPUs out there only provide instructions that satisfy both requirements at once (a full memory barrier for __sync_synchronize, a total sequential order for std::atomic_thread_fence(memory_order_seq_cst)), for example x86 mfence and PowerPC hwsync?
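One way to look at the codegen side of this question is to put the different fence spellings into otherwise identical functions and compare the assembly (for example with `gcc -O2 -S`). The following is only a sketch of that idea: the function names are made up for illustration, and identical instructions on one target would say nothing about whether the two fences are formally equivalent.

```c
/* Sketch: the same "publish" sequence with three spellings of a full fence.
 * Function names are hypothetical; compile with -O2 -S and compare output. */
#include <stdatomic.h>

/* Legacy GCC built-in, documented as a full memory barrier. */
int publish_sync_builtin(atomic_int *data, atomic_int *flag, int value)
{
    atomic_store_explicit(data, value, memory_order_relaxed);
    __sync_synchronize();
    atomic_store_explicit(flag, 1, memory_order_relaxed);
    return atomic_load_explicit(data, memory_order_relaxed);
}

/* Newer GCC/Clang __atomic built-in with an explicit memory order. */
int publish_atomic_builtin(atomic_int *data, atomic_int *flag, int value)
{
    atomic_store_explicit(data, value, memory_order_relaxed);
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
    atomic_store_explicit(flag, 1, memory_order_relaxed);
    return atomic_load_explicit(data, memory_order_relaxed);
}

/* Standard C11 fence from <stdatomic.h>. */
int publish_c11_fence(atomic_int *data, atomic_int *flag, int value)
{
    atomic_store_explicit(data, value, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(flag, 1, memory_order_relaxed);
    return atomic_load_explicit(data, memory_order_relaxed);
}
```

Each function returns the value it just stored, so a single-threaded call only checks that the code compiles and runs; the interesting part is the instruction emitted for the fence on each target.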
Whether __sync_synchronize and std::atomic_thread_fence(memory_order_seq_cst) are formally equal or only effectively equal (i.e. formally different, but no commercial CPU bothers to distinguish between the two), technically speaking a memory_order_relaxed load of the same atomic still cannot be relied upon to synchronize-with it or to create a happens-before relation with it, no?
I.e. technically speaking all of these assertions are allowed to fail, right?
// Experiment 1, using C11 `atomic_thread_fence`: assertion is allowed to fail, right?
// global
static atomic_bool lock = false;
static atomic_bool critical_section = false;
// thread 1
atomic_store_explicit(&critical_section, true, memory_order_relaxed);
atomic_thread_fence(memory_order_seq_cst);
atomic_store_explicit(&lock, true, memory_order_relaxed);
// thread 2
if (atomic_load_explicit(&lock, memory_order_relaxed)) {
    // We should really `memory_order_acquire` the `lock`
    // or `atomic_thread_fence(memory_order_acquire)` here,
    // or this assertion may fail, no?
    assert(atomic_load_explicit(&critical_section, memory_order_relaxed));
}
// Experiment 2, using `SeqCst` directly on the atomic store
// global
static atomic_bool lock = false;
static atomic_bool critical_section = false;
// thread 1
atomic_store_explicit(&critical_section, true, memory_order_relaxed);
atomic_store_explicit(&lock, true, memory_order_seq_cst);
// thread 2
if (atomic_load_explicit(&lock, memory_order_relaxed)) {
    // Again we should really `memory_order_acquire` the `lock`
    // or `atomic_thread_fence(memory_order_acquire)` here,
    // or this assertion may fail, no?
    assert(atomic_load_explicit(&critical_section, memory_order_relaxed));
}
// Experiment 3, using GCC built-in: assertion is allowed to fail, right?
// global
static atomic_bool lock = false;
static atomic_bool critical_section = false;
// thread 1
atomic_store_explicit(&critical_section, true, memory_order_relaxed);
__sync_synchronize();
atomic_store_explicit(&lock, true, memory_order_relaxed);
// thread 2
if (atomic_load_explicit(&lock, memory_order_relaxed)) {
    // We should somehow put a `LoadLoad` memory barrier here,
    // or the assert might fail, no?
    assert(atomic_load_explicit(&critical_section, memory_order_relaxed));
}
I've tried these snippets on my RPi 5 but I haven't seen any assertion fail. Granted, this doesn't formally prove anything, but it also sheds no light on the difference between __sync_synchronize and std::atomic_thread_fence(memory_order_seq_cst).
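For anyone who wants to reproduce this, a repeated version of Experiment 1 could look roughly like the sketch below. It assumes POSIX threads, and `run_trials`, `violations` and `reader_uses_acquire_fence` are names invented for illustration. Unlike the snippet above, the reader spins until it observes `lock`, so every trial exercises the second load, and a stale read is counted rather than asserted. With the acquire fence enabled, the writer's fence synchronizes with the reader's fence, so the count must be 0; without it, a non-zero count is permitted but may never show up in practice.

```c
/* Sketch of a repeated message-passing test for Experiment 1.
 * Assumes POSIX threads; all helper names are hypothetical. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool lock;
static atomic_bool critical_section;
static atomic_long violations;
static bool reader_uses_acquire_fence; /* set before the threads start */

static void *writer(void *arg)
{
    (void)arg;
    atomic_store_explicit(&critical_section, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(&lock, true, memory_order_relaxed);
    return NULL;
}

static void *reader(void *arg)
{
    (void)arg;
    /* Spin until the writer's store to `lock` becomes visible. */
    while (!atomic_load_explicit(&lock, memory_order_relaxed))
        ;
    if (reader_uses_acquire_fence)
        atomic_thread_fence(memory_order_acquire);
    /* Count a stale read instead of aborting, so the run can continue. */
    if (!atomic_load_explicit(&critical_section, memory_order_relaxed))
        atomic_fetch_add_explicit(&violations, 1, memory_order_relaxed);
    return NULL;
}

/* Runs n trials and returns the number of stale reads, or -1 on error. */
long run_trials(long n, bool use_acquire_fence)
{
    reader_uses_acquire_fence = use_acquire_fence;
    atomic_store(&violations, 0);
    for (long i = 0; i < n; i++) {
        pthread_t w, r;
        atomic_store(&lock, false);
        atomic_store(&critical_section, false);
        if (pthread_create(&r, NULL, reader, NULL) != 0)
            return -1;
        if (pthread_create(&w, NULL, writer, NULL) != 0) {
            /* Release the spinning reader before bailing out. */
            atomic_store(&lock, true);
            pthread_join(r, NULL);
            return -1;
        }
        pthread_join(w, NULL);
        pthread_join(r, NULL);
    }
    return atomic_load(&violations);
}
```

Creating two threads per trial keeps the sketch short but gives a poor race window; long-lived threads with a per-iteration barrier would probably stress the hardware harder. Swapping the fence in `writer` for `__sync_synchronize()` gives the corresponding version of Experiment 3.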