Consider this outline of a simple multi-threaded application. It has one writer thread, and ten reader threads.
    #include <atomic>
    #include <thread>

    const int Num_readers{10};

    std::atomic<bool> write_lock;
    std::atomic<bool> read_lock[Num_readers];

    void reader_thread(int reader_number)
    {
        auto &rl{read_lock[reader_number]};
        for (;;)
        {
            for (;;)
            {
                rl.store(true, std::memory_order_seq_cst); // #1
                if (not write_lock.load(std::memory_order_seq_cst)) // #2
                    break;
                // Avoid writer starvation.
                rl.store(false, std::memory_order_seq_cst); // #3
                while (write_lock.load(std::memory_order_relaxed)) // #4
                    std::this_thread::yield();
            }
            // Read stuff from writer.
        }
    }

    void writer_thread()
    {
        for (;;)
        {
            write_lock.store(true, std::memory_order_seq_cst); // #5
            bool have_lock;
            do
            {
                have_lock = true;
                for (const auto &rl : read_lock)
                    if (rl.load(std::memory_order_seq_cst)) // #6
                    {
                        have_lock = false;
                        std::this_thread::yield();
                        break;
                    }
            }
            while (not have_lock);
            // Write stuff for readers.
            write_lock.store(false, std::memory_order_seq_cst); // #7
            std::this_thread::yield();
        }
    }
(Assume the threads are static.) The purpose of the atomic variables is mutual exclusion of writing and reading. Could the memory order for any of the atomic loads or stores be made less strict?
Is writer_thread really supposed to inspect only two of the ten read_lock values?
[...] std::atomic<bool> to int, which looks like a bug.
[...] for (int i : read_lock) { ... } was supposed to be a loop over the array, checking whether there were any readers (and spin-waiting until there aren't any), because that would, I think, be a readers/writers version of Peterson's. Not using their bool values as 0/1 indexes back into the same array, as the code is currently doing!
[...] for (const auto &rl_entry : read_lock) { if (rl_entry) { ...} } or similar.
[...] seq_cst fence before and/or after looping over all the read_lock[] entries, instead of making each load seq_cst. That would be a lot faster on PowerPC (where an SC load has to include a full barrier), and maybe on ARMv7 for the same reason, but slower on x86, where an SC load is just a plain mov instruction but an SC fence is slow.
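The fence idea in the last comment could look roughly like the sketch below. It is one possible reading, not a verified or benchmarked replacement. The helper names any_reader_active, writer_acquire and writer_release are hypothetical, and the sketch assumes the readers keep their seq_cst store and load at #1 and #2 unchanged. The writer's store (#5) and the loads over read_lock[] (#6) become relaxed, with a single seq_cst fence between them and an acquire fence after the scan.

```cpp
#include <atomic>
#include <thread>

const int Num_readers{10};

// Static storage duration, so both start out false.
std::atomic<bool> write_lock;
std::atomic<bool> read_lock[Num_readers];

// Hypothetical helper: scan every reader flag with relaxed loads.
bool any_reader_active()
{
    for (const auto &rl : read_lock)
        if (rl.load(std::memory_order_relaxed))
            return true;
    return false;
}

// Hypothetical helper: writer side of the lock, using fences
// instead of a seq_cst load per read_lock[] entry.
void writer_acquire()
{
    write_lock.store(true, std::memory_order_relaxed); // was #5
    // Full fence: keeps the store above ordered before the loads below.
    // This is the store-then-load step that the seq_cst operations
    // provide in the original code.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    while (any_reader_active()) // was #6
        std::this_thread::yield();
    // Acquire fence: pairs with the readers' stores of false (#3), so the
    // writer's data accesses are ordered after those readers' reads.
    std::atomic_thread_fence(std::memory_order_acquire);
}

// Hypothetical helper: publish the written data and drop the lock.
void writer_release()
{
    write_lock.store(false, std::memory_order_release); // was #7
}
```

In writer_thread, the body of the outer loop would then be writer_acquire(), the write itself, and writer_release(). Whether this wins depends on the target, as the comment says: one full fence replaces ten SC loads, which helps where SC loads are expensive and hurts on x86.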