0

Suppose I have some global:

std::atomic_int next_free_block;

and a number of threads each with access to a

std::atomic_int child_offset;

that may be shared between threads. I would like to allocate free blocks to child offsets in a contiguous manner, that is, I want to perform the following operation atomically:

if (child_offset != 0) child_offset = next_free_block++;

Obviously the above implementation does not work as multiple threads may enter the body of the if statement and then try to assign different blocks to child_offset.

I have also considered the following:

int expected = child_offset;
do {
    if (expected == 0) break;
    int updated = next_free_block++;
} while (!child_offset.compare_exchange_weak(&expected, updated);

But this also doesn't work because if the CAS fails, the side effect of incrementing next_free_block remains even if nothing is assigned to child_offset. This leaves gaps in the allocation of free blocks.

I am aware that I could do this with a mutex (or some kind of spin lock) around each child_offset and potentially DCLP, but I would like to know if this is possible to implement efficiently with atomic operations.

The use case for this is as follows: I have a large tree that I'm building in parallel. The tree is an array of the following:

struct tree_page {
    atomic<uint32_t> allocated;
    uint32_t child_offset[8];
    uint32_t nodes[1015];
};

The tree is built level by level: first the nodes at depth 0 are created, then at depth 1, etc. A separate thread is dispatched for each non-leaf node at the previous step. If no more space is left in a page, a new page is allocated from the global next_free_page which points to the first unused page in the array of struct tree_page and is assigned to an element of child_ptr. A bit field is then set in the node word that indicates which element of the child_ptr array should be used to find the node's children.

The code I am trying to write looks like this:

int expected = allocated.load(relaxed), updated;
do {
    updated = expected + num_children;
    if (updated > NODES_PER_PAGE) {
        expected = -1; break;
    }
} while (!allocated.compare_exchange_weak(&expected, updated));

if (expected != -1) {
    // successfully allocated in the same page
} else {
    for (int i = 0; i < 8; ++i) {
        // this is the operation I would like to be atomic
        if (child_offset[i] == 0) 
            child_offset[i] = next_free_block++;
        
        int offset = try_allocating_at_page(pages[child_offset[i]]);
        if (offset != -1) {
            // successfully allocated at child_offset i
            // ...

            break;
        }
    }
}
8
  • int val = next_free_block++; then set_if_greater(child_offset, val); where set_if_greater is a CAS loop like your, might work depending on the use case of child_offset. Commented Jan 12, 2023 at 10:02
  • So there are multiple different child_offset variables, so you can't just increment it? And you can't keep it in a 64-bit struct with next_free_block to let you update them both together? There's no general way to atomically update 2 disjoint locations in a lock-free way, without transactional memory or DCAS (en.wikipedia.org/wiki/Double_compare-and-swap - supported only on a very few machines, like 68020 through 68040) Commented Jan 12, 2023 at 10:04
  • But what if another thread sets child_offset after val is incremented but before the CAS loop succeeds? Then next_free_block has been incremented, but the allocated value has not been assigned to anything Commented Jan 12, 2023 at 10:04
  • Something which will work is combine those 2 values into a single std::atomic<uint64_t> and do the math yourself (along with helper functions to extract each part for reading) Commented Jan 12, 2023 at 10:04
  • 1
    There's one global next_free_block and many different child_offsets which are stored in different locations Commented Jan 12, 2023 at 10:05

1 Answer 1

0

As far as I understood from you description you array of child_offset is filled with 0 initially and then filled with some concrete values concurrently by different threads.

In this case you can atomically "tag" value first and if you are successful assign valid value. Something like this:

    constexpr int INVALID_VALUE = -1;

    for (int i = 0; i < 8; ++i) {
        int expected = 0;
        // this is the operation I would like to be atomic
        if (child_offset[i].compare_exchange_weak(expected, INVALID_VALUE)) {
            child_offset[i] = next_free_block++;
        }
        // Not sure if this is needed in your environment, but just in case
        if (child_offset[i] == INVALID_VALUE) continue;
        ...
    }

This doesn't guarantee that all values in child_offset array will be in ascending order. But if you need that why not fill it without multithreading involved?

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.