In C++, I have two threads. Each thread first stores to one variable and then loads the other one, with the roles of the two variables swapped between the threads:
std::atomic<bool> please_wake_me_up{false};
uint32_t cnt{0};

void thread_1() {
    std::atomic_ref atomic_cnt(cnt);
    please_wake_me_up.store(true, std::memory_order_seq_cst);
    atomic_cnt.load(std::memory_order_seq_cst); // <-- Is this line necessary or can it be omitted?
    futex_wait(&cnt, 0); // <-- The performed syscall must read the counter.
                         //     But with which memory ordering?
}

void thread_2() {
    std::atomic_ref atomic_cnt(cnt);
    atomic_cnt.store(1, std::memory_order_seq_cst);
    if (please_wake_me_up.load(std::memory_order_seq_cst)) {
        futex_wake(&cnt);
    }
}
Full code example: Godbolt.
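The snippet above does not show `futex_wait` and `futex_wake`. As a minimal sketch, assuming Linux and thin wrappers around the raw `futex` syscall, they might look like the following. The wrapper bodies, the choice of the `_PRIVATE` operations, and the default `count` parameter are assumptions for illustration and are not taken from the linked example.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical wrapper (assumption): sleeps only if *addr still equals
// `expected` when the kernel checks it. Returns 0 when woken, or -1 with
// errno set (EAGAIN if *addr != expected at the time of the check).
inline long futex_wait(uint32_t* addr, uint32_t expected) {
    return syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, expected,
                   nullptr, nullptr, 0);
}

// Hypothetical wrapper (assumption): wakes at most `count` threads waiting
// on addr and returns the number of threads actually woken.
inline long futex_wake(uint32_t* addr, int count = 1) {
    return syscall(SYS_futex, addr, FUTEX_WAKE_PRIVATE, count,
                   nullptr, nullptr, 0);
}
```

With wrappers like these, `futex_wait(&cnt, 0)` returns immediately with `EAGAIN` if the kernel's own read of `cnt` already sees a nonzero value, which is why the ordering of that kernel-side read matters for the question.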
If all four atomic accesses are sequentially consistent, it is guaranteed that at least one thread sees the other thread's store when it performs its load. This is what I want to achieve.
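The guarantee described here is the classic store-buffering pattern. A minimal sketch in pure C++, without any syscall, could look like this. The names `flag`, `counter`, and `sb_round_seq_cst` are hypothetical stand-ins for `please_wake_me_up`, `cnt`, and the two thread functions.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs one round of the store-buffering pattern with seq_cst accesses.
// Returns true if at least one thread observed the other thread's store.
inline bool sb_round_seq_cst() {
    std::atomic<int> flag{0};     // stands in for please_wake_me_up
    std::atomic<int> counter{0};  // stands in for cnt
    int seen_by_a = -1;
    int seen_by_b = -1;

    std::thread a([&] {
        flag.store(1, std::memory_order_seq_cst);
        seen_by_a = counter.load(std::memory_order_seq_cst);
    });
    std::thread b([&] {
        counter.store(1, std::memory_order_seq_cst);
        seen_by_b = flag.load(std::memory_order_seq_cst);
    });
    a.join();
    b.join();

    // seq_cst puts all four operations into one total order, so the
    // outcome "both loads return 0" is forbidden by the C++ memory model.
    return seen_by_a == 1 || seen_by_b == 1;
}
```

Repeating the round in a loop never yields a `false` result on a conforming implementation. With weaker orderings such as release/acquire, both loads may return 0.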
Since the futex syscall must itself load the variable it operates on, I'm wondering whether I can omit the (duplicate) load right before the syscall.
- Every syscall should lead to a compiler memory barrier, right?
- Do syscalls in general act like full memory barriers?
- As the `futex` syscall is guaranteed to read the counter, is it safe to omit the marked line? Is there any guarantee the load inside the syscall occurs with sequential consistency?
- If the line is necessary, would a `std::atomic_thread_fence(std::memory_order_seq_cst)` be better, as I'm not needing the value, just a fence?
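To make the fence alternative from the last bullet concrete, here is a minimal sketch of the same store-buffering pattern with a `seq_cst` fence between a relaxed store and a relaxed load in each thread. The names `flag`, `counter`, and `sb_round_fences` are hypothetical. The sketch covers only the case where both loads are ordinary C++ atomic loads. Whether the load performed by the kernel inside the futex syscall is covered in the same way remains the open question.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs one round of the store-buffering pattern using relaxed accesses
// separated by seq_cst fences. Returns true if at least one thread
// observed the other thread's store.
inline bool sb_round_fences() {
    std::atomic<int> flag{0};     // stands in for please_wake_me_up
    std::atomic<int> counter{0};  // stands in for cnt
    int seen_by_a = -1;
    int seen_by_b = -1;

    std::thread a([&] {
        flag.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        seen_by_a = counter.load(std::memory_order_relaxed);
    });
    std::thread b([&] {
        counter.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        seen_by_b = flag.load(std::memory_order_relaxed);
    });
    a.join();
    b.join();

    // The two seq_cst fences are totally ordered. The thread whose fence
    // comes second must observe the store that precedes the first fence.
    return seen_by_a == 1 || seen_by_b == 1;
}
```

The fence has to appear in both threads for this guarantee to hold. A fence in only one thread combined with relaxed accesses in the other does not rule out both loads returning 0.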
If the answer to the question is architecture-specific, I would be interested in x86_64 and arm64.