In P2300, the "1.4. Asynchronous Windows socket recv" example uses a pattern to mark completion (of setting the cancellation callback) that looks like this:

if (ready.load(std::memory_order_acquire) ||
    ready.exchange(true, std::memory_order_acq_rel))
{ ... }

where ready is an std::atomic<bool>.

It looks to me as though we could use just exchange:

if (ready.exchange(true, std::memory_order_acq_rel))
{ ... }

My question is: why also do the load, as in the first example? It's not clear to me whether this is done for correctness or for efficiency.

Background info:

The example in P2300 handles a scenario with potentially two threads. The first thread calls a C API that takes a C callback, and then needs to do some work: emplacing an optional called stopCallback. Typically the first thread then loads ready, sees false, sets it to true via the exchange, sees that it was still false, and does not execute the if block. The C callback might be called from a second thread, which loads ready, sees true, and executes the if block (the completion of the C API).

As far as I can see, using just exchange is sufficient, but then both threads always do an exchange. With load followed by exchange, typically one thread does a load and an exchange and the other does only a load (though both threads might do a load and an exchange).
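
To make the scenario above concrete, here is a minimal sketch (not the P2300 code itself) that runs both variants with two threads and counts how many of them enter the if block. The names arrive_load_then_exchange, arrive_exchange_only and count_completions are hypothetical helpers introduced only for this illustration; the two threads stand in for "the thread that set up the stop callback" and "the thread running the C callback".

```cpp
#include <atomic>
#include <thread>

// Variant from the P2300 example: cheap read-only check first, RMW only if needed.
// Returns true for the side that finds the other side already done.
inline bool arrive_load_then_exchange(std::atomic<bool>& ready) {
    return ready.load(std::memory_order_acquire) ||
           ready.exchange(true, std::memory_order_acq_rel);
}

// Simplified variant: always perform the RMW.
inline bool arrive_exchange_only(std::atomic<bool>& ready) {
    return ready.exchange(true, std::memory_order_acq_rel);
}

// Hypothetical harness: two threads each "arrive" once; returns how many
// of them saw true, i.e. how many would execute the if block.
template <class Arrive>
int count_completions(Arrive arrive) {
    std::atomic<bool> ready{false};
    std::atomic<int> completions{0};
    auto side = [&] {
        if (arrive(ready)) {
            completions.fetch_add(1, std::memory_order_relaxed);
        }
    };
    std::thread a(side);
    std::thread b(side);
    a.join();
    b.join();
    return completions.load(std::memory_order_relaxed);
}
```

With either variant, exactly one of the two threads gets true: whichever exchange is first in the modification order of ready returns false, and the other thread observes true either from its load or from its own exchange. So the sketch suggests the two forms differ in cost rather than in outcome.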

  • 5
    Likely a performance optimization avoiding expensive exchange() requiring exclusive access by first checking with cheap load(). If load() returns true, exchange() is skipped due to short-circuiting. Commented Jul 28 at 10:35
  • 4
    Like @TheAliceBaskerville, I also think this is an optimization. But to me, what they are trying to avoid is the memory_order_acq_rel synchronization, which is more expensive than memory_order_acquire on some platforms (but not on the x86 family). Commented Jul 28 at 10:43
  • 5
    That pattern is called test and test-and-set. Commented Jul 28 at 15:52
  • @prapin: Avoiding the atomic-RMW attempt is also a big deal on most platforms, but yeah especially x86 where it's a full barrier. Read-only can leave the cache line in Shared state, so other cores can read it in parallel, instead of requesting Exclusive state. For a spinlock, often you want the first access to be an RMW attempt to optimize for the hopefully-common case of the lock being available, without first a Share request then another cache miss to Read-For-Ownership (RFO). But this is just an if, maybe it's normal that it won't be ready most times you check. Commented Jul 28 at 16:56
  • Related: Does cmpxchg write destination cache line on failure? If not, is it better than xchg for spinlock? / Locks around memory manipulation via inline assembly re: whether the first access should be read-only or not for a lock. (Generally not, but spinning read-only after the first xchg fails is good.) Commented Jul 28 at 17:00
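
As an illustration of the spinlock advice in the comments above (first access is an RMW attempt, then spin read-only after it fails), here is a minimal sketch. TtasSpinLock and increment_under_lock are hypothetical names made up for this example, and the memory orders are one reasonable choice rather than the only valid one.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Sketch of a test and test-and-set spinlock.
class TtasSpinLock {
    std::atomic<bool> locked_{false};

public:
    void lock() {
        for (;;) {
            // Optimistic RMW first: succeeds immediately if the lock is free.
            if (!locked_.exchange(true, std::memory_order_acquire)) {
                return;
            }
            // Lock is held: spin with read-only loads so the cache line can
            // stay shared instead of being requested for exclusive ownership.
            while (locked_.load(std::memory_order_relaxed)) {
                std::this_thread::yield();
            }
        }
    }

    void unlock() { locked_.store(false, std::memory_order_release); }
};

// Hypothetical usage: several threads increment a plain int under the lock.
int increment_under_lock(int threads, int iterations) {
    TtasSpinLock lock;
    int counter = 0;  // protected by lock
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

The if in the question is the mirror image of this lock: there the read-only load comes first, which fits a check that may well find ready already set.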
