
While looking through implementations of std::atomic&lt;T&gt;::wait, I've found that most of them use a simple hash table to map each atomic location to its wait state.

libcxx

static constexpr size_t __libcpp_contention_table_size = (1 << 8); /* < there's no magic in this number */

struct alignas(64) /*  aim to avoid false sharing */ __libcpp_contention_table_entry {
  __cxx_atomic_contention_t __contention_state;
  __cxx_atomic_contention_t __platform_state;
  inline constexpr __libcpp_contention_table_entry() : __contention_state(0), __platform_state(0) {}
};

static __libcpp_contention_table_entry __libcpp_contention_table[__libcpp_contention_table_size];

libstdc++

static __waitable_state&
_S_state_for(const void* __addr) noexcept
{
  constexpr __UINTPTR_TYPE__ __ct = 16;
  static __waitable_state __w[__ct];
  auto __key = ((__UINTPTR_TYPE__)__addr >> 2) % __ct;
  return __w[__key];
}

Using a hash table itself seems pretty reasonable to me, since atomic wait and notify should be a lightweight, lock-free synchronization mechanism. But the size of the table is what made me raise a question.

Since the table has a fixed size, I thought it might get hash collisions once enough threads wait on different locations. And if so, some states might overlap, which would cause false notifies on waiting threads. And other threads might not get notified (or unblocked) at all, which I think would be a serious issue.

So here are my questions.

  1. How is the size of the table determined? (The comment in libcxx says there is no magic in that number.)
  2. Will there be no overlap between multiple atomic variable locations? Or is it fine for states to overlap?
  • Isn't that the cause of spurious wakeup? Commented Oct 20 at 17:37
  • @Jarod42 Referring to cppreference, atomic wait should guarantee not returning from the function until it is notified. But I was unable to find any retry of blocking (or waiting) in the implementation. Looking at the libcxx implementation of notify, it calls __libcpp_platform_wake_by_address, which just seems to be a futex wake (or unlock). So I guessed that the state must be unique per address. Please tell me if I have misunderstood. Commented Oct 20 at 17:58
  • I added tags for libstdc++ and libc++ since you're looking at their implementations specifically, but an answer about what MSVC's standard library does could still be relevant. Commented Oct 20 at 18:37
  • @JeremyFriesner The "no magic number here" comment almost certainly means that the number is just an educated guess and future developers should not be afraid to change it. Commented Oct 20 at 19:21
  • Related glibc bug lamenting poor hashing: gcc.gnu.org/bugzilla/show_bug.cgi?id=115955 Commented Oct 20 at 19:45

1 Answer


After reading the comments from @dewaffled, I realized I was missing something. Reading the implementation again, I found that the wait itself sits inside a loop.

libcxx:atomic_sync.h
libcxx:poll_with_backoff.h

template <class _AtomicWaitable, class _Poll>
_LIBCPP_HIDE_FROM_ABI void __atomic_wait_unless(const _AtomicWaitable& __a, memory_order __order, _Poll&& __poll) {
  std::__libcpp_thread_poll_with_backoff(
      /* poll */
      [&]() {
        auto __current_val = __atomic_waitable_traits<__decay_t<_AtomicWaitable> >::__atomic_load(__a, __order);
        return __poll(__current_val);
      },
      /* backoff */ __spinning_backoff_policy());
}
template <class _Poll, class _Backoff>
_LIBCPP_HIDE_FROM_ABI bool __libcpp_thread_poll_with_backoff(_Poll&& __poll, _Backoff&& __backoff, chrono::nanoseconds __max_elapsed) {
  auto const __start = chrono::high_resolution_clock::now();
  for (int __count = 0;;) {
    if (__poll())
      return true; // __poll completion means success

  // code that checks whether the elapsed time has exceeded __max_elapsed ...
}

__atomic_wait calls __libcpp_thread_poll_with_backoff with a polling function and a backoff policy, which does the spinning work.

And as mentioned by @dewaffled, the same goes for libstdc++.

libstdc++:atomic_wait.h

template<typename _Tp, typename _Pred, typename _ValFn>
    void
    __atomic_wait_address(const _Tp* __addr, _Pred&& __pred, _ValFn&& __vfn,
              bool __bare_wait = false) noexcept
    {
      __detail::__wait_args __args{ __addr, __bare_wait };
      _Tp __val = __args._M_setup_wait(__addr, __vfn);
      while (!__pred(__val))
    {
      auto __res = __detail::__wait_impl(__addr, __args);
      __val = __args._M_setup_wait(__addr, __vfn, __res);
    }
      // C++26 will return __val
    }

So just looking at the implementation, atomic&lt;T&gt;::wait can spuriously wake up internally (as @Jarod42 mentioned), but it does not return from the function until the value has actually changed.

To answer my questions:

  1. It's an educated guess (as @Homer512 mentioned), which minimizes spurious wakeups within a reasonable amount of space.
  2. Since the value itself is re-checked on each wakeup, sharing state across multiple addresses is fine.

1 Comment

It would be basically fine if the (address >> 2) % 16 hash function in libstdc++, i.e. bits #2 to #5 of the address, weren't total garbage for the typical use-case, which often has all objects aligned the same relative to a 64-byte cache line, so they all pick the same bucket. Homer512 linked gcc.gnu.org/bugzilla/show_bug.cgi?id=115955. (libstdc++'s 16-entry table size is also probably too small for modern systems with many cores and many threads. libc++'s 256-entry table is larger than most programs need on client systems, but not on servers, assuming the thread count is similar to the core count.)
