
Input and starting arrays:

dv_A = { 5, -3, 2, 6} //4 elements
dv_B = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }

Expected output:

dv_B = { 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1 }

For every element in dv_A, there are (dv_A.size() - 1) elements in dv_B, because each element of dv_A should have a child element in dv_B for each of the other dv_A elements (i.e. excluding itself). Therefore, with 4 elements in dv_A, there are 3 elements in dv_B for each dv_A element, 12 in total.

I want to transform each dv_B element to have a value of 1 if its corresponding dv_A element has a value > 0. Correspondence is determined based on the position of the element in dv_B. For example:

The first 3 dv_B values will be transformed by the value in dv_A[0], the second 3 dv_B values by the value in dv_A[1], and so on.

Here's my attempt so far:

thrust::transform(
    dv_B.begin(),
    dv_B.end(),
    thrust::make_transform_iterator(
        dv_A.begin(),
        _1 % dv_A
    ), 
    dv_B.begin(),
    _2 > 0 // When the 2nd argument is greater than 0, set the position in dv_B to 1.
);
  • What is your question? Commented Sep 25, 2022 at 9:21
  • You can nest counting_iterator, transform_iterator (i / (A_size - 1)) and permutation_iterator to get the view into dv_A that you need to do the transform. Commented Sep 25, 2022 at 14:46
  • @paleonix I think that's what I'm after, but just can't get it to work. Updated my code with my best attempt before nesting the iterators, which I could not get to work either. Commented Sep 25, 2022 at 15:02
  • c.godbolt.org/z/1beh3e3sP untested, but compiles. Commented Sep 25, 2022 at 16:04
  • @paleonix that works-- turn it into an answer so that I can mark it. Commented Sep 25, 2022 at 16:42

2 Answers

The serial code could look something like this:

for(int i = 0; i < dv_b.size(); i++){
    const int readIndex = i / (dv_a.size() - 1);
    if(dv_a[readIndex] > 0) dv_b[i] = 1;
    else dv_b[i] = 0;
}

which can easily be written using for_each. I think this makes the code clearer than using transform together with various fancy iterators.

thrust::for_each(
    thrust::device,
    thrust::make_counting_iterator(0),
    thrust::make_counting_iterator(0) + dv_b.size(),
    [
     s = dv_a.size() - 1,
     dv_a = thrust::raw_pointer_cast(dv_a.data()),
     dv_b = thrust::raw_pointer_cast(dv_b.data())
    ] __device__ (int i){
        const int readIndex = i / s;
        if(dv_a[readIndex] > 0) dv_b[i] = 1;
        else dv_b[i] = 0;
    }
);
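
For checking the device result, the same index mapping can be run on the host. The following is only a minimal sketch: reference_flags is a hypothetical helper name (not part of Thrust), and it assumes the input has at least two elements, as in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only reference for the mapping used above.
// reference_flags is a hypothetical helper name, not a Thrust function.
std::vector<int> reference_flags(const std::vector<int>& a) {
    assert(a.size() > 1);  // assumption: with one element there are no "other" elements
    const std::size_t group = a.size() - 1;       // output entries per input element
    std::vector<int> out(a.size() * group);
    for (std::size_t i = 0; i < out.size(); ++i) {
        const std::size_t readIndex = i / group;  // same mapping as in the device lambda
        out[i] = a[readIndex] > 0 ? 1 : 0;
    }
    return out;
}
```

Copying dv_b back into a thrust::host_vector and comparing it element by element with this result is one way to validate the kernel on small inputs.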

3 Comments

This is more readable than the nested iterator idea from @paleonix's original comment on my post. One slight code improvement on this is to remove the if() in the last argument and naively set dv_b[i] = dv_a[readIndex] > 0 to eliminate the conditional (since >0 will return 1, and <=0 will return 0 anyway).
@aiwyn I don't think shortening the if is a code improvement. The way it is now clearly expresses that the value should be 1, rather than relying on the implicit conversion of a boolean value to int (which happens to be 1). In any case, the resulting PTX code is the same.
You're right, looks like the compiler handles it the same. Also, this version of the solution is 4x faster than the other one so I'm marking this as the answer.
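
The two forms discussed in these comments produce the same values, because a bool converts to int as 1 (true) or 0 (false). A small host-side sketch with hypothetical helper names (flag_with_branch, flag_with_conversion) shows the equivalence of the results only; it says nothing about generated code.

```cpp
// Hypothetical helpers contrasting the two forms from the comments.

// Explicit form: states directly that the result is 1 or 0.
constexpr int flag_with_branch(int value) {
    return value > 0 ? 1 : 0;
}

// Implicit form: relies on the bool-to-int conversion (true -> 1, false -> 0).
constexpr int flag_with_conversion(int value) {
    return value > 0;
}

// Checked at compile time for values on both sides of zero and at zero.
static_assert(flag_with_branch(5) == flag_with_conversion(5), "positive input");
static_assert(flag_with_branch(0) == flag_with_conversion(0), "zero input");
static_assert(flag_with_branch(-3) == flag_with_conversion(-3), "negative input");
```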

Packing the fancy iterator creation into an appropriately named factory function makes this version quite readable as well. Especially if you need this kind of pattern more than once, this solution might be more elegant.

#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/transform.h>

#include <iostream>
#include <iterator>  // std::ostream_iterator

// nvcc doesn't seem to like __device__ or __host__ __device__ lambdas in auto
// return types, so I defined this functor instead
template <typename T>
class Div {
    T div_{};
    public:
    Div(T div) : div_{div} {}
    __host__ __device__ T operator()(T in) const noexcept { return in / div_; }
};

// I stole "repeat" from numpy. Another version using modulo (%) and therefore
// repeating the whole input instead of single elements would be called "tile".
template <class It>
auto make_repeat_it_begin(It input, int repetitions) {
    using diff_t = typename It::difference_type;
    return thrust::make_permutation_iterator(
                input,
                thrust::make_transform_iterator(
                    thrust::make_counting_iterator(diff_t{0}),
                    Div{static_cast<diff_t>(repetitions)}));
}

int main() {
    int A[] = {5, -3, 2, 6};
    constexpr int size_A = sizeof(A) / sizeof(A[0]);

    thrust::host_vector<int> hv_A(A, A + size_A);
    thrust::device_vector<int> dv_A(hv_A);
    thrust::device_vector<int> dv_B(size_A * (size_A - 1));

    auto A_repeat_it = make_repeat_it_begin(dv_A.begin(), size_A - 1);
    
    thrust::transform(A_repeat_it, A_repeat_it + dv_B.size(), 
                      dv_B.begin(),
                      [] __device__ (int a){ return a > 0 ? 1 : 0; });

    thrust::host_vector<int> hv_B(dv_B);
    thrust::copy(hv_B.begin(), hv_B.end(),
                 std::ostream_iterator<int>(std::cout, ","));
}

Due to the device lambda, nvcc needs the -extended-lambda flag.
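
To illustrate the difference between "repeat" and the "tile" variant mentioned in the code comment above, here is a host-only sketch of both index maps. The names repeat_host and tile_host are hypothetical and chosen for illustration; on the device, the modulo map would presumably replace Div in the transform iterator, but that variant is not shown or tested here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only sketches; repeat_host and tile_host are hypothetical names.

// "repeat": each input element appears `repetitions` times in a row.
// Index map: i -> i / repetitions (the same map the Div functor applies).
std::vector<int> repeat_host(const std::vector<int>& in, std::size_t repetitions) {
    assert(repetitions > 0);  // assumption: at least one repetition
    std::vector<int> out(in.size() * repetitions);
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = in[i / repetitions];
    }
    return out;
}

// "tile": the whole input sequence is repeated `repetitions` times.
// Index map: i -> i % in.size().
std::vector<int> tile_host(const std::vector<int>& in, std::size_t repetitions) {
    std::vector<int> out(in.size() * repetitions);
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = in[i % in.size()];
    }
    return out;
}
```

For the input {5, -3} with 2 repetitions, repeat_host gives {5, 5, -3, -3}, while tile_host gives {5, -3, 5, -3}.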

1 Comment

Thank you, I'll compare this with @Abator's solution.
