0

Can I assume that Thrust stable_sort_by_key performed on unsigned int has complexity O(n)? If not what should I do to be sure that this complexity will be achieved? (Except of implementing radix sort on my own)

2
  • You can directly use cub::DeviceRadixSort::SortPairs. CUB is used in the backend by Thrust, but I don't see the guarantee you want in the docs, so even if it uses the right thing, that would be an implementation detail that I wouldn't base my code on. Commented Nov 10, 2022 at 12:08
  • That being said, from looking at those implementation details Thrust is able to figure out if it can use a radix sort and will use it when appropriate. Commented Nov 10, 2022 at 12:24

1 Answer 1

2

It depends a bit on your circumstances/view. Just from the docs/API there doesn't seem to be a guarantee for thrust::stable_sort_by_key on unsigned int keys using a radix sort.

On the other hand the necessary algorithm cub::DeviceRadixSort::SortPairs is implemented in the CUB library which is used by Thrust in the backend and there is no good reason for Thrust not to use it as the prerequisites can easily be queried at compile time.

From the code in thrust/system/cuda/detail/sort.h(the "detail" should warn you that this is not part of the public API) one can see that thrust::stable_sort_by_key can launch a cub::DeviceRadixSort::SortPairs under the right circumstances (arithmetic key type and using thrust::less or thrust::greater as comparison operation) at least on the main branch of Thrust at the time of writing (ca. v2.0.0rc2) but probably for a long time already. Else it will fall back to a merge sort.

Directly using cub::DeviceRadixSort::SortPairs could have benefits even if this is enough for you, as this makes it easier to reuse temporary buffers and avoid unnecessary synchronization. Both can be done in Thrust via a

auto exec = thrust::cuda::par_nosync(custom_allocator).on(custom_stream);

execution policy (the most recent CUDA Toolkit 11.8 still comes with old Thrust v1.15 without v1.16 par_nosync). One thing that can not be avoided using Thrust is the in-place nature of the sorting algorithms which is achieved by potentially copying the results back to the input buffers. These copy operations can only be elided using CUB.

Sign up to request clarification or add additional context in comments.

2 Comments

So just to clarify I can sort type T by key unsigned int with complexity O(n) using only default stable_sort_by_key from thrust api? I understand that there is no guarantee that it will use radix sort in the future.
As far as I understand the source, it should call a radix sort. You could check in a profiler, i.e. nSight Systems, to make sure. In a massively parallel environment O-notation can be even more missleading than it is already. The only good reason I could imagine for Thrust to not use the radix sort in the future is that something else is faster in practice on the given hardware (not necessarily better O).

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.