970 questions
3
votes
1
answer
111
views
Deleted function compiler errors using thrust::remove in C++
I am currently attempting to use the thrust::remove function on a thrust::device_vector of structs in my main function as shown below:
#include <iostream>
#include <thrust/device_vector.h>...
2
votes
1
answer
101
views
Efficiently Sorting Rows of a Flattened Matrix Using thrust::sort_by_key
I am maintaining a matrix of integers with dimensions a × b as a flattened array in row-major order. Now, I need to rearrange the rows of this matrix according to a priority array (length a) (Think ...
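For orientation, here is a minimal host-side sketch of the row rearrangement this question describes, written in standard C++ rather than Thrust. The excerpt is truncated, so the ordering rule is an assumption: rows are placed in ascending priority, with ties kept in their original order. The name `reorder_rows` and its parameters are hypothetical and do not come from the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not a Thrust API): returns a copy of a row-major
// matrix whose rows are ordered by ascending priority. Ties keep their
// original relative order. The ascending rule is an assumption.
std::vector<int> reorder_rows(const std::vector<int>& matrix,
                              std::size_t cols,
                              const std::vector<int>& priority) {
    const std::size_t rows = priority.size();
    assert(matrix.size() == rows * cols);

    // Sort row indices by priority instead of moving whole rows around.
    std::vector<std::size_t> order(rows);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t x, std::size_t y) {
                         return priority[x] < priority[y];
                     });

    // Gather each row once into its final position.
    std::vector<int> out;
    out.reserve(matrix.size());
    for (std::size_t r : order) {
        const auto first = matrix.begin() + static_cast<std::ptrdiff_t>(r * cols);
        out.insert(out.end(), first, first + static_cast<std::ptrdiff_t>(cols));
    }
    return out;
}
```

Sorting an index array and then gathering keeps the data movement to one pass over the matrix. A device version would likely follow the same two steps, but that is not shown here.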
3
votes
1
answer
300
views
CUB reduce_by_key
Thrust has the thrust::reduce_by_key algorithm which works well for a problem of mine. I wanted to try to use CUB for finer control of memory and streams as well as interaction with my own kernels, ...
0
votes
1
answer
145
views
Is it possible to overcome the maximum number of iterators in thrust::zip_iterator?
I’m using Thrust for some tasks at work and have found that there seems to be a maximum number of iterators when constructing a zip_iterator.
For example
#include <thrust/iterator/zip_iterator.h>...
1
vote
0
answers
219
views
Unable to include thrust/host_vector.h and others with CUDA 12.5
This test program compiled fine with CUDA 12.4 and lower, but fails to compile w/ 12.5.1:
#include <thrust/host_vector.h>
#include <thrust/scan.h>
#include <iostream>
int main() {
...
2
votes
0
answers
91
views
Optimizing Complex Number Computations for GPU with Thrust: Seeking Efficient Migration Advice
I am fairly new to Thrust and am trying to migrate the following code to make use of GPUs, but this one seems a little difficult:
#include <iostream>
#include <complex>
#include <...
2
votes
1
answer
1k
views
CUDA thrust iterator: how to use iterator to implement efficient fill and copy on device_vectors?
My project contains many fill, copy and other basic operations.
However, I'm new to CUDA programming, and my current implementation just uses a for loop to operate on a device_vector, which is far less ...
2
votes
2
answers
638
views
Attempt to use an extended __device__ lambda in a context that requires querying its return type in host code
I'm receiving the compiler error
static_assert failed: 'Attempt to use an extended __device__ lambda in a context that requires querying its return type in host code. Use a named function object, a ...
49
votes
5
answers
26k
views
Thrust inside user written kernels
I am a newbie to Thrust. I see that all Thrust presentations and examples only show host code.
I would like to know if I can pass a device_vector to my own kernel? How?
If yes, what are the ...
0
votes
1
answer
141
views
Pair deduplication on CUDA
I have a data structure already running on CUDA and collect the data as below:
struct SearchDataOnDevice
{
size_t npair;
int * id1;
int * id2;
};
I'd like to remove the duplicated id pair ...
1
vote
2
answers
385
views
error: "__forceinline__" redefined in simple program
Compiling the 3-line program test-cuda.cpp
#include <thrust/execution_policy.h>
int main() { return 0; }
results in a compiler warning/error:
$ g++ -std=c++17 test-cuda.cpp -I/opt/cuda/targets/...
1
vote
1
answer
185
views
Cyclically rotating a GPU vector?
I have an algorithm I would like to implement, which involves
coordinatewise addition,
coordinatewise multiplication, and
cyclic rotation of coordinates.
My addition and multiplication are a little ...
3
votes
1
answer
194
views
Replace/Merge operations in vectors using CUDA Thrust
I have two operations for manipulating elements in device vectors using CUDA Thrust. Which methods can implement these two functions more efficiently?
Replace part of values of a vector in batch with ...
1
vote
1
answer
2k
views
How do I thrust::sort() or thrust::sort_by_key() with raw pointers [duplicate]
I want to sort an array using raw device pointers with thrust::sort() and thrust::sort_by_key() because it uses radix sort. The data is in a raw uint64_t device pointer, and I initialize with random ...
1
vote
1
answer
267
views
CUDA, how to find the first item in an array that makes a function maximal
In CUDA C++, I have a big array Arr of integers, and a function F: int -> int. I want to find the index of the first item in Arr that makes F maximal.
How can I write a kernel that always keeps ...
1
vote
1
answer
464
views
How can I do segmented reduction using CUDA thrust?
I want to store partial reduction results in an array.
Say I have data[8] = {10,20,30,40,50,60,70,80}.
And if I divide the data with the chunk_size of 2, the chunks will be {10,20}, {30,40}, ... , {70,...
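As a reference for what the question is asking for, here is a minimal host-side sketch of chunked (segmented) summation in standard C++. It is not Thrust code, and `chunked_sums` is a hypothetical helper name. It assumes the reduction is a sum and that a final chunk shorter than the chunk size is still reduced.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not a Thrust API): sums consecutive chunks of
// `chunk_size` elements. A shorter trailing chunk is summed as well.
std::vector<int> chunked_sums(const std::vector<int>& data,
                              std::size_t chunk_size) {
    assert(chunk_size > 0);
    std::vector<int> sums;
    for (std::size_t begin = 0; begin < data.size(); begin += chunk_size) {
        const std::size_t end = std::min(begin + chunk_size, data.size());
        sums.push_back(std::accumulate(
            data.begin() + static_cast<std::ptrdiff_t>(begin),
            data.begin() + static_cast<std::ptrdiff_t>(end), 0));
    }
    return sums;
}
```

For the data in the question with a chunk size of 2, this yields {30, 70, 110, 150}. On the device, one commonly suggested route is thrust::reduce_by_key with a key of index / chunk_size for each element, though whether that fits this asker's case is not stated in the excerpt.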
-2
votes
2
answers
330
views
How to use thrust::transform on larger Vector derived from smaller Vector?
Input and starting arrays:
dv_A = { 5, -3, 2, 6} //4 elements
dv_B = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }
Expected output:
dv_B = { 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1 }
For every element in dv_A{},...
0
votes
1
answer
193
views
CUDA Thrust: How can I combine copy_if and transform without materializing data?
Let's say we have two inputs, the first one is an array, and the second is a bitmap
thrust::device_vector<point_t> points;
Bitset bits; // Imagine this can be accessed within the kernel.
What I ...
1
vote
1
answer
158
views
Further chance of optimization of Thrust operation of CUDA kernel
I have a CUDA kernel which essentially looks like the following.
__global__ void myOpKernel(double *vals, double *data, int *nums, double *crit, int N, int K) {
int index = blockIdx.x*blockDim.x + ...
2
votes
1
answer
411
views
CUDA force instruction execution order
I'm trying to transfer some data manipulations from CPU to GPU (CUDA), but there's one small part that requires instructions to be run in a specific order. In principle I could do the first few ...
0
votes
1
answer
371
views
Parallelization of a for loop consisting of Thrust Transforms
I've implemented a for loop consisting of several Thrust transformations. My aim is to calculate r[i] for each value of i from 0 to N. To put it simply, r is a column vector and each of its elements can ...
2
votes
1
answer
290
views
Why is the iterating range of thrust::reduce limited to 2048 doubles in device code?
I am using the NVIDIA HPC SDK (2022) to compile the following code, the basic purpose of which is to sum an NxM matrix into a vector of size N.
#include <thrust/host_vector.h>
#include <thrust/...
0
votes
1
answer
432
views
Is THRUST stable_sort_by_key O(n)?
Can I assume that Thrust stable_sort_by_key performed on unsigned int has complexity O(n)? If not, what should I do to be sure that this complexity will be achieved? (Except for implementing radix sort ...
1
vote
1
answer
1k
views
How to do a reduction over one dimension of 2D data in Thrust
I'm new to CUDA and the thrust library. I'm learning and trying to implement a function that will have a for loop doing a thrust function. Is there a way to convert this loop into another thrust ...
-1
votes
1
answer
72
views
Storing data from device to main memory
I have a device vector that I continuously modify and then want to save in an HDF5 file. Because of the size of the device vector I cannot make multiple modifications and then save them to reduce the ...