2

I have a bool array "a flag array stored as char*" I want to efficiently sort it using radix sort, what I understand is that radix sort complexity is O(#bits N) which will be O(8N) for char* and O(N) for a bool* (if it exists)

I've tried thrust::stable_sort_by_key, on my device sorting 2^21 char pointer takes 1.7 ms another test was by using a compare function for the first bit only:

struct OBCmp {
    __host__ __device__
    bool operator()(const char& o1, const char& o2) {
        return (o1 & 1) < (o2 & 1);
    }
};

the result time using thrust::stable_sort_by_key(d_vec.begin(), d_vec.end(), d_vec_values.begin(), OBCmp ()); was 9 ms

but I'm almost sure that it has used another algorithm "not radix sort", and my guess is that if I can make radix sort to use only the first bit, the time should be around 0.3 ms

any hint how to do this? "even manually without thrust"

9
  • 5
    Does sorting a bit-array even make sense? The entire data set can be characterized by its size and the number of trues. If you really need it sorted, I would go with a variation of quick sort. It will only need one pass and, therefore, is O(n). But you should implement it yourself because your requirements are quite special. General sorting algorithms won't get max performance. Commented May 31, 2014 at 16:59
  • 6
    Why don't you just count the number of 1 (or 0) ? Commented May 31, 2014 at 16:59
  • sorting bit-array to avoid branching within warps @matovitch explain more Commented May 31, 2014 at 17:15
  • 5
    How about if somebody (matovitch) provides an answer. The sensible thing to do is to count the ones or zeroes. Thrust counting operations is one possibility Commented May 31, 2014 at 17:58
  • 3
    thrust::partition and thrust::sort both have to move data. thrust::count and thrust::fill don't do any data movement. Commented May 31, 2014 at 18:34

2 Answers 2

3

I suggest counting ones (or zeroes) and then creating the solution vector based on that. Here's a worked example using thrust:

$ cat t427.cu
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>
#include <thrust/count.h>
#include <stdio.h>
#include <stdlib.h>

#define DSIZE (1<<21)
int main(){
  cudaEvent_t start, stop;
  float et;
  cudaEventCreate(&start); cudaEventCreate(&stop);
  thrust::host_vector<char> h_data(DSIZE);
  for (int i = 0; i < DSIZE; i++) h_data[i] = rand()%2;
  thrust::device_vector<char> d_data = h_data;
  cudaEventRecord(start);
  thrust::sort(d_data.begin(), d_data.end());
  cudaEventRecord(stop);
  cudaEventSynchronize(stop);
  cudaEventElapsedTime(&et, start, stop);
  printf("Sort time: %fms\n", et);
  thrust::device_vector<char> d2_data = h_data;
  thrust::device_vector<char> d3_data(DSIZE);
  cudaEventRecord(start);
  int zeroes = thrust::count(d2_data.begin(), d2_data.end(), 0);
  thrust::fill(d3_data.begin(), d3_data.begin()+zeroes, 0);
  thrust::fill(d3_data.begin() + zeroes, d3_data.end(), 1);
  cudaEventRecord(stop);
  cudaEventSynchronize(stop);
  cudaEventElapsedTime(&et, start, stop);
  printf("count/fill time: %fms\n", et);
  if (thrust::equal(d_data.begin(), d_data.end(), d3_data.begin())) printf("Success!\n");
  else printf("Failure\n");
  return 0;
}

$ nvcc -O3 -arch=sm_20 -o t427 t427.cu
$ ./t427
Sort time: 1.450560ms
count/fill time: 0.302656ms
Success!
$
Sign up to request clarification or add additional context in comments.

1 Comment

Also note that I could just as easily have done the thrust::fill operations on the d2_data vector.
3

In plain old ugly C ?

#include <stdlib.h>
#include <stdio.h>

#define SIZE 100

void sort(char* my_array, int size)
{
    int counter = 0;

    for (int i = 0; i < size; i++, counter += (my_array[i] == 1));

    for (int i = 0; i < counter; i++, my_array[i] = 1);

    for (int i = counter; i < size; i++, my_array[i] = 0);
}

int main()
{
    char* my_array = malloc(SIZE * sizeof(char));

    for (int i = 0; i < SIZE; i++)
    {
        my_array[i] = (((i + 43) * (i + 17)) % 677) & 1; //Thought less than 0.5s too garantee randomness of chosen numbers
    }

    sort(my_array, SIZE);

    return 0;
}

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.