
I can't afford to hold a std::shared_lock because of its weight. Instead I implemented what I think amounts to a shared lock, but built on an atomic value. Will this code work, or have I missed something?

Update: I added RAII classes around the atomic mutex.

#include <atomic>
#include <limits>
#include <immintrin.h> // _mm_pause

namespace atm {
// 0 means unlocked, the max value is reserved as the exclusive-lock
// sentinel, and anything in between counts the shared holders.
static const unsigned int g_unlocked = 0;
static const unsigned int g_exclusive = std::numeric_limits<unsigned int>::max();
using mutex_type = std::atomic<unsigned int>;

class exclusive_lock {
  public:
    exclusive_lock(mutex_type& mutex) : _mutex(mutex) {
      // Spin until we swap unlocked -> exclusive in one step.
      unsigned int expected = g_unlocked;
      while (!_mutex.compare_exchange_weak(expected, g_exclusive)) {
        _mm_pause();
        expected = g_unlocked; // a failed CAS overwrote expected; reset it
      }
    }
    ~exclusive_lock() { _mutex.store(g_unlocked, std::memory_order_release); }
  private:
    mutex_type& _mutex;
};

class shared_lock {
  public:
    shared_lock(mutex_type& mutex) : _mutex(mutex) {
      // Spin until no writer holds the lock, then increment the reader count.
      unsigned int expected = _mutex;
      while (expected == g_exclusive || !_mutex.compare_exchange_weak(expected, expected + 1)) {
        _mm_pause();
        expected = _mutex;
      }
    }
    ~shared_lock() { _mutex.fetch_sub(1, std::memory_order_release); }
  private:
    mutex_type& _mutex;
};
} // namespace atm
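
Intended usage, for reference (reader/writer here are just illustrative):

atm::mutex_type m{atm::g_unlocked};

void reader() {
    atm::shared_lock lock(m);    // many readers may hold this at once
    // ... read shared data ...
}

void writer() {
    atm::exclusive_lock lock(m); // excludes readers and other writers
    // ... modify shared data ...
}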

1 Answer


For correctness I think this looks reasonable; I don't see a problem. I might have missed something, but a seq_cst CAS is more than sufficient for acquiring a lock. It looks like you're avoiding integer wrap-around by reserving the max value as the exclusive sentinel.

The mutex = 0 unlock only needs to be release, not seq_cst (your updated destructor already does this). The same goes for the -= 1 shared unlock, though that won't make it more efficient on x86, only on weakly-ordered ISAs. Also, compare_exchange_weak is totally fine: you're retrying in a loop anyway, so a spurious failure is no different from a failed compare.
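
To make the difference concrete, it's just the memory_order argument passed to the store; the function names here are only for illustration:

#include <atomic>

void unlock_seq_cst(std::atomic<unsigned int>& m) {
    m.store(0); // defaults to seq_cst: compiles to a full barrier on x86
}

void unlock_release(std::atomic<unsigned int>& m) {
    m.store(0, std::memory_order_release); // earlier loads/stores can't
                                           // reorder past it; a plain mov on x86
}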

If you're on x86, you'd normally want _mm_pause() inside your spin loop (your updated code has it), and possibly some kind of backoff to reduce contention when multiple threads are all trying to acquire the lock at once.
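
A sketch of one simple backoff scheme (the constants are arbitrary, and try_acquire is a stand-in for your CAS attempt, not a real function here):

#include <immintrin.h> // _mm_pause

inline void backoff(unsigned int& delay) {
    for (unsigned int i = 0; i < delay; ++i)
        _mm_pause();   // tell the CPU we're spin-waiting
    if (delay < 1024)
        delay <<= 1;   // exponentially longer waits, capped
}

// In the acquire loop:
//   unsigned int delay = 1;
//   while (!try_acquire()) backoff(delay);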

And usually you want to spin read-only until you see the lock available, not keep hammering with atomic RMWs: a failed CAS may still take exclusive ownership of the cache line. (See "Does cmpxchg write destination cache line on failure? If not, is it better than xchg for spinlock?".)
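
Applied to your exclusive_lock constructor, that could look something like this (a sketch, untested):

exclusive_lock(mutex_type& mutex) : _mutex(mutex) {
    for (;;) {
        // Read-only spin: a plain load doesn't request exclusive ownership
        // of the cache line, so waiters don't bounce it between cores.
        while (_mutex.load(std::memory_order_relaxed) != g_unlocked)
            _mm_pause();
        // The lock looked free; now attempt the RMW. acquire on success
        // is all you need to take a lock.
        unsigned int expected = g_unlocked;
        if (_mutex.compare_exchange_weak(expected, g_exclusive,
                                         std::memory_order_acquire,
                                         std::memory_order_relaxed))
            return;
    }
}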

Also, short would be a strange choice for the counter type; if any size is going to perform worse than int, it's often short. But it's probably fine, and OK I guess if it helps the lock pack into the same cache line as the data you're modifying. (Although that cache line will then be the victim of false-sharing contention from other threads hammering on it while trying to take the lock.)
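
If the false sharing turns out to cost more than the packing saves, the usual fix is to give the lock its own cache line (a sketch; 64 bytes is the typical x86 line size, and C++17's std::hardware_destructive_interference_size is the portable spelling where implemented):

struct alignas(64) padded_mutex {
    atm::mutex_type m{atm::g_unlocked};
    // alignas pads sizeof to a full line, so neighbours can't false-share.
};
static_assert(sizeof(padded_mutex) == 64, "one full cache line");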


2 Comments

You gave a detailed answer, thanks. Unfortunately, I don't know much about atomics. Could you please explain, or show with an example, what you meant by preferring release over seq_cst? Does it have something to do with store/fetch/load? Also, the part about spinning 'read-only until you see the lock available' isn't clear to me, just because I'd have to spend a couple of nights researching assembler and the terms from the provided link :)
@vamirio-chan: I mean memory_order_release, like mutex.store(0, std::memory_order_release);
