
I can't afford to hold a std::shared_lock because of its weight. Instead I implemented what I think amounts to a shared lock, but built on an atomic value. Will this code work, or have I missed something?

Update: I added RAII classes around the atomic mutex.

#include <atomic>
#include <limits>
#include <immintrin.h> // _mm_pause

namespace atm {
// 0 means unlocked, the max value is reserved as the exclusive-lock
// sentinel, and anything in between counts the shared holders.
static const unsigned int g_unlocked = 0;
static const unsigned int g_exclusive = std::numeric_limits<unsigned int>::max();
using mutex_type = std::atomic<unsigned int>;

class exclusive_lock {
  public:
    exclusive_lock(mutex_type& mutex) : _mutex(mutex) {
      // Spin until we swap unlocked -> exclusive in one step.
      unsigned int expected = g_unlocked;
      while (!_mutex.compare_exchange_weak(expected, g_exclusive)) {
        _mm_pause();
        expected = g_unlocked; // a failed CAS overwrote expected; reset it
      }
    }
    ~exclusive_lock() { _mutex.store(g_unlocked, std::memory_order_release); }
  private:
    mutex_type& _mutex;
};

class shared_lock {
  public:
    shared_lock(mutex_type& mutex) : _mutex(mutex) {
      // Spin until no writer holds the lock, then increment the reader count.
      unsigned int expected = _mutex;
      while (expected == g_exclusive || !_mutex.compare_exchange_weak(expected, expected + 1)) {
        _mm_pause();
        expected = _mutex;
      }
    }
    ~shared_lock() { _mutex.fetch_sub(1, std::memory_order_release); }
  private:
    mutex_type& _mutex;
};
} // namespace atm
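
Intended usage, for reference (reader/writer here are just illustrative):

atm::mutex_type m{atm::g_unlocked};

void reader() {
    atm::shared_lock lock(m);    // many readers may hold this at once
    // ... read shared data ...
}

void writer() {
    atm::exclusive_lock lock(m); // excludes readers and other writers
    // ... modify shared data ...
}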

1 Answer


For correctness I think this looks reasonable; I don't see a problem. I might have missed something, but a seq_cst CAS is more than sufficient for acquiring a lock. It looks like you're avoiding integer wrap-around by reserving the max value as the exclusive sentinel.

The mutex = 0 unlock only needs to be release, not seq_cst (your updated destructor already does this). The same goes for the -= 1 shared unlock, though that won't make it more efficient on x86, only on weakly-ordered ISAs. Also, compare_exchange_weak is totally fine: you're retrying in a loop anyway, so a spurious failure is no different from a failed compare.
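
To make the difference concrete, it's just the memory_order argument passed to the store; the function names here are only for illustration:

#include <atomic>

void unlock_seq_cst(std::atomic<unsigned int>& m) {
    m.store(0); // defaults to seq_cst: compiles to a full barrier on x86
}

void unlock_release(std::atomic<unsigned int>& m) {
    m.store(0, std::memory_order_release); // earlier loads/stores can't
                                           // reorder past it; a plain mov on x86
}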

If you're on x86, you'd normally want _mm_pause() inside your spin loop (your updated code has it), and possibly some kind of backoff to reduce contention when multiple threads are all trying to acquire the lock at once.
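
A sketch of one simple backoff scheme (the constants are arbitrary, and try_acquire is a stand-in for your CAS attempt, not a real function here):

#include <immintrin.h> // _mm_pause

inline void backoff(unsigned int& delay) {
    for (unsigned int i = 0; i < delay; ++i)
        _mm_pause();   // tell the CPU we're spin-waiting
    if (delay < 1024)
        delay <<= 1;   // exponentially longer waits, capped
}

// In the acquire loop:
//   unsigned int delay = 1;
//   while (!try_acquire()) backoff(delay);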

And usually you want to spin read-only until you see the lock available, not keep hammering with atomic RMWs: a failed CAS may still take exclusive ownership of the cache line. (See "Does cmpxchg write destination cache line on failure? If not, is it better than xchg for spinlock?".)
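
Applied to your exclusive_lock constructor, that could look something like this (a sketch, untested):

exclusive_lock(mutex_type& mutex) : _mutex(mutex) {
    for (;;) {
        // Read-only spin: a plain load doesn't request exclusive ownership
        // of the cache line, so waiters don't bounce it between cores.
        while (_mutex.load(std::memory_order_relaxed) != g_unlocked)
            _mm_pause();
        // The lock looked free; now attempt the RMW. acquire on success
        // is all you need to take a lock.
        unsigned int expected = g_unlocked;
        if (_mutex.compare_exchange_weak(expected, g_exclusive,
                                         std::memory_order_acquire,
                                         std::memory_order_relaxed))
            return;
    }
}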

Also, short would be a strange choice for the counter type; if any size is going to perform worse than int, it's often short. But it's probably fine, and OK I guess if it helps the lock pack into the same cache line as the data you're modifying. (Although that cache line will then be the victim of false-sharing contention from other threads hammering on it while trying to take the lock.)
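
If the false sharing turns out to cost more than the packing saves, the usual fix is to give the lock its own cache line (a sketch; 64 bytes is the typical x86 line size, and C++17's std::hardware_destructive_interference_size is the portable spelling where implemented):

struct alignas(64) padded_mutex {
    atm::mutex_type m{atm::g_unlocked};
    // alignas pads sizeof to a full line, so neighbours can't false-share.
};
static_assert(sizeof(padded_mutex) == 64, "one full cache line");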


2 Comments

You gave a detailed answer, thanks. Unfortunately, I don't know much about atomics. Could you please explain, or show with an example, what you meant by preferring release over seq_cst? Does it have something to do with store/fetch/load? Also, the part about spinning 'read-only until you see the lock available' isn't clear to me, just because I'd have to spend a couple of nights researching assembler and the terms from the provided link :)
@vamirio-chan: I mean memory_order_release, like mutex.store(0, std::memory_order_release);
