Suppose I have some global:
std::atomic_int next_free_block;
and a number of threads each with access to a
std::atomic_int child_offset;
that may be shared between threads. I would like to allocate free blocks to child offsets in a contiguous manner, that is, I want to perform the following operation atomically:
if (child_offset == 0) child_offset = next_free_block++;
Obviously this implementation does not work: multiple threads may pass the check and enter the body of the if statement. Each of them then increments next_free_block and assigns a different block to child_offset, so every assignment except the last one is lost.
I have also considered the following:
int expected = child_offset;
int updated;
do {
    if (expected != 0) break;   // already assigned by another thread
    updated = next_free_block++;
} while (!child_offset.compare_exchange_weak(expected, updated));
But this also doesn't work because if the CAS fails, the side effect of incrementing next_free_block remains even if nothing is assigned to child_offset. This leaves gaps in the allocation of free blocks.
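A minimal sketch of one way to avoid that gap with atomics alone, assuming child_offset is a std::atomic<std::uint32_t> in which 0 means "unassigned" and UINT32_MAX is never a real block index. The thread that wins a CAS from 0 to a sentinel value is the only one that increments next_free_block, and it then publishes the result. The names kUnassigned, kClaiming and get_or_assign are illustrative, and next_free_block is assumed to start at 1. This is not lock-free: a thread that loses the race spins until the winner publishes, so it behaves like a very short per-slot lock.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

constexpr std::uint32_t kUnassigned = 0;         // child_offset not set yet
constexpr std::uint32_t kClaiming = UINT32_MAX;  // hypothetical "being assigned" marker

// Assumption: block indices start at 1 so that 0 can mean "unassigned".
std::atomic<std::uint32_t> next_free_block{1};

std::uint32_t get_or_assign(std::atomic<std::uint32_t>& child_offset) {
    std::uint32_t expected = kUnassigned;
    // Step 1: try to move the slot from "unassigned" to "claimed".
    if (child_offset.compare_exchange_strong(expected, kClaiming,
                                             std::memory_order_acq_rel,
                                             std::memory_order_acquire)) {
        // Only the single winning thread gets here for this slot, so
        // next_free_block is bumped exactly once per assigned slot.
        std::uint32_t block = next_free_block.fetch_add(1, std::memory_order_relaxed);
        assert(block != kUnassigned && block != kClaiming);
        // Step 2: publish the real block index.
        child_offset.store(block, std::memory_order_release);
        return block;
    }
    // Lost the race: `expected` now holds the current value. If the winner
    // has not published yet, wait for it (this is the non-lock-free part).
    while (expected == kClaiming) {
        std::this_thread::yield();
        expected = child_offset.load(std::memory_order_acquire);
    }
    return expected;
}
```

Because the counter is only touched after the claim succeeds, a losing thread never consumes a block. The blocks handed out are gap-free, although the order in which slots receive them is arbitrary.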
I am aware that I could do this with a mutex (or some kind of spin lock) around each child_offset, potentially combined with double-checked locking (DCLP), but I would like to know whether this can be implemented efficiently with atomic operations alone.
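For comparison, the mutex plus double-checked locking variant could look roughly like this. The guarded_offset struct, which pairs each offset with its own std::mutex, and the function name get_or_assign_dclp are hypothetical choices for the sketch; the lock is only taken while a slot is still 0, and next_free_block is assumed to start at 1.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical layout: each offset carries its own mutex.
struct guarded_offset {
    std::atomic<std::uint32_t> value{0};  // 0 means "no block assigned yet"
    std::mutex m;
};

std::atomic<std::uint32_t> next_free_block{1};

std::uint32_t get_or_assign_dclp(guarded_offset& slot) {
    // First check without the lock: the common case once the slot is filled.
    std::uint32_t v = slot.value.load(std::memory_order_acquire);
    if (v != 0) return v;
    std::lock_guard<std::mutex> lock(slot.m);
    // Second check under the lock: another thread may have filled it meanwhile.
    // Relaxed is enough here because any earlier store happened under this mutex.
    v = slot.value.load(std::memory_order_relaxed);
    if (v == 0) {
        v = next_free_block.fetch_add(1, std::memory_order_relaxed);
        slot.value.store(v, std::memory_order_release);
    }
    assert(v != 0);
    return v;
}
```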
The use case for this is as follows: I have a large tree that I'm building in parallel. The tree is an array of the following:
struct tree_page {
atomic<uint32_t> allocated;
uint32_t child_offset[8];
uint32_t nodes[1015];
};
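As a quick check of this layout: 1 + 8 + 1015 = 1024 32-bit words, so a tree_page is exactly 4096 bytes, assuming std::atomic<uint32_t> has the same size as uint32_t. That holds on mainstream platforms but is not guaranteed by the standard. NODES_PER_PAGE = 1015 is an assumption taken from the size of the nodes array.

```cpp
#include <atomic>
#include <cstdint>

constexpr std::uint32_t NODES_PER_PAGE = 1015;  // assumed to match nodes[]

struct tree_page {
    std::atomic<std::uint32_t> allocated;
    std::uint32_t child_offset[8];
    std::uint32_t nodes[NODES_PER_PAGE];
};

// 1 (allocated) + 8 (child_offset) + 1015 (nodes) = 1024 words of 4 bytes.
static_assert(1 + 8 + NODES_PER_PAGE == 1024, "word count");
static_assert(sizeof(std::atomic<std::uint32_t>) == sizeof(std::uint32_t),
              "atomic<uint32_t> is larger than uint32_t on this platform");
static_assert(sizeof(tree_page) == 4096, "tree_page is not exactly 4 KiB");
```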
The tree is built level by level: first the nodes at depth 0 are created, then those at depth 1, and so on. A separate thread is dispatched for each non-leaf node created at the previous level. If no space is left in a page, a new page is allocated from the global next_free_block, which points to the first unused page in the array of struct tree_page, and its index is stored in an element of child_offset. A bit field is then set in the node word to indicate which element of the child_offset array should be used to find the node's children.
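The layout of that bit field is not specified. Purely as an illustration, here is one hypothetical encoding that stores the child_offset index in the top 4 bits of a 32-bit node word, with 0 meaning the children are in the same page. kChildSlotShift, kChildSlotMask, set_child_slot and get_child_slot are made-up names, and the remaining 28 bits are left for whatever else a node stores.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding: top 4 bits = 0 for "children in the same page",
// or i + 1 for "children in the page referenced by child_offset[i]".
constexpr std::uint32_t kChildSlotShift = 28;
constexpr std::uint32_t kChildSlotMask = 0xFu << kChildSlotShift;

std::uint32_t set_child_slot(std::uint32_t node_word, int slot) {
    assert(slot >= 0 && slot < 8);  // child_offset has 8 entries
    return (node_word & ~kChildSlotMask) |
           (static_cast<std::uint32_t>(slot + 1) << kChildSlotShift);
}

// Returns -1 if the children are in the same page, else the child_offset index.
int get_child_slot(std::uint32_t node_word) {
    return static_cast<int>((node_word & kChildSlotMask) >> kChildSlotShift) - 1;
}
```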
The code I am trying to write looks like this:
uint32_t expected = allocated.load(std::memory_order_relaxed);
uint32_t updated;
bool fits = true;
do {
    updated = expected + num_children;
    if (updated > NODES_PER_PAGE) {
        fits = false; break;   // not enough room left in this page
    }
} while (!allocated.compare_exchange_weak(expected, updated));
if (fits) {
    // successfully allocated in the same page, starting at index `expected`
} else {
    for (int i = 0; i < 8; ++i) {
        // this is the operation I would like to be atomic
        if (child_offset[i] == 0)
            child_offset[i] = next_free_block++;
        int offset = try_allocating_at_page(pages[child_offset[i]]);
        if (offset != -1) {
            // successfully allocated at child_offset i
            // ...
            break;
        }
    }
}
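try_allocating_at_page is not shown. A possible sketch, reusing the same bump-allocation CAS loop as the first half of the code, is below. The extra num_children parameter and the return convention (start index, or -1 if the page is full) are assumptions. Relaxed ordering is enough to reserve indices, assuming the node contents themselves are published by other synchronization, such as joining the threads between levels.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr std::uint32_t NODES_PER_PAGE = 1015;

struct tree_page {
    std::atomic<std::uint32_t> allocated{0};
    std::uint32_t child_offset[8] = {};
    std::uint32_t nodes[NODES_PER_PAGE] = {};
};

// Hypothetical helper: reserves num_children consecutive node slots in `page`
// and returns the index of the first one, or -1 if there is not enough room.
int try_allocating_at_page(tree_page& page, std::uint32_t num_children) {
    assert(num_children <= NODES_PER_PAGE);
    std::uint32_t expected = page.allocated.load(std::memory_order_relaxed);
    std::uint32_t updated;
    do {
        updated = expected + num_children;
        if (updated > NODES_PER_PAGE) return -1;  // page full: nothing reserved
    } while (!page.allocated.compare_exchange_weak(expected, updated,
                                                   std::memory_order_relaxed));
    return static_cast<int>(expected);  // old value = start of our range
}
```

On failure the function returns before any CAS succeeds, so unlike the next_free_block case, a full page never loses reserved slots.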
int val = next_free_block++; then set_if_greater(child_offset, val); where set_if_greater is a CAS loop like yours, might work depending on the use case of child_offset.
child_offset variables, so you can't just increment it? And you can't keep it in a 64-bit struct with next_free_block to let you update them both together? There's no general way to atomically update 2 disjoint locations in a lock-free way without transactional memory or DCAS (en.wikipedia.org/wiki/Double_compare-and-swap, supported only on very few machines, like the 68020 through 68040).
child_offset after val is incremented but before the CAS loop succeeds? Then next_free_block has been incremented, but the allocated value has not been assigned to anything.
std::atomic<uint64_t> and do the math yourself (along with helper functions to extract each part for reading).
next_free_block and many different child_offsets which are stored in different locations.
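The two atomic techniques mentioned in these comments can be sketched as follows. set_if_greater is a CAS-based atomic maximum. The packed variant keeps a counter and one offset in a single std::atomic<uint64_t> so both change in one CAS; as the last comment points out, this only helps when each counter pairs with a single offset, not with one shared next_free_block and many child_offsets. pack, block_part, offset_part and get_or_assign_packed are hypothetical helper names, and the counter is assumed to start at a nonzero value.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Atomically raise `target` to `val` if `val` is larger (a CAS-based max).
void set_if_greater(std::atomic<std::uint32_t>& target, std::uint32_t val) {
    std::uint32_t cur = target.load(std::memory_order_relaxed);
    // On failure compare_exchange_weak reloads `cur`, so the loop re-checks.
    while (cur < val && !target.compare_exchange_weak(cur, val)) {
        // retry with the freshly loaded `cur`
    }
}

// Packing: high 32 bits = block counter, low 32 bits = offset (0 = unassigned).
constexpr std::uint64_t pack(std::uint32_t block, std::uint32_t offset) {
    return (static_cast<std::uint64_t>(block) << 32) | offset;
}
constexpr std::uint32_t block_part(std::uint64_t w) { return static_cast<std::uint32_t>(w >> 32); }
constexpr std::uint32_t offset_part(std::uint64_t w) { return static_cast<std::uint32_t>(w); }

// Assign a block only if the offset half is still 0. The counter half is
// incremented in the same CAS, so a failed attempt leaves no gap.
std::uint32_t get_or_assign_packed(std::atomic<std::uint64_t>& word) {
    std::uint64_t cur = word.load();
    while (offset_part(cur) == 0) {
        assert(block_part(cur) != 0);  // 0 would read back as "unassigned"
        std::uint64_t next = pack(block_part(cur) + 1, block_part(cur));
        if (word.compare_exchange_weak(cur, next)) return block_part(cur);
    }
    return offset_part(cur);
}
```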