970 questions
3
votes
1
answer
111
views
Deleted function compiler errors using thrust::remove in C++
I am currently attempting to use the thrust::remove function on a thrust::device_vector of structs in my main function as shown below:
#include <iostream>
#include <thrust/device_vector.h>...
2
votes
0
answers
39
views
Use thrust::reduce for multiplying a sequence of matrices
I am trying to use a reduction algorithm like thrust::reduce for a sequence of matrices. Let's say I want to do the product of N matrices: A1*A2*...*AN. I think a reduction algorithm would be great ...
3
votes
1
answer
300
views
CUB reduce_by_key
Thrust has the thrust::reduce_by_key algorithm which works well for a problem of mine. I wanted to try to use CUB for finer control of memory and streams as well as interaction with my own kernels, ...
0
votes
1
answer
145
views
Is it possible to overcome the maximum number of iterators in thrust::zip_iterator?
I’m using Thrust for some tasks at work and have found that there seems to be a maximum number of iterators when constructing a zip_iterator.
For example
#include <thrust/iterator/zip_iterator.h>...
1
vote
0
answers
219
views
Unable to include thrust/host_vector.h and others with CUDA 12.5
This test program compiled fine with CUDA 12.4 and lower, but fails to compile w/ 12.5.1:
#include <thrust/host_vector.h>
#include <thrust/scan.h>
#include <iostream>
int main() {
...
0
votes
1
answer
604
views
Cannot find thrust/universal_vector.h in the Thrust library included with the CUDA Toolkit
I am currently using Thrust to transfer data between GPU and CPU. But when I include <thrust/universal_vector.h> in my code, using CMake to configure the project, "fatal error : No such ...
-1
votes
1
answer
92
views
Build error using custom data types with Thrust vectors and CUDA
I am using Thrust vectors for CUDA kernels, and I have created my own struct so I can pass in all of the data together. When I initialise a host vector of my custom type, I get a build error from Thrust....
0
votes
1
answer
224
views
How to find indices of a vector given another vector in Thrust
I have two thrust device vectors, let's say a and b. I would like to find the indices of vector a where it is smaller/greater than the pair-wise component of vector b ($a_{ij}>b_{ij}$).
I found the ...
-2
votes
1
answer
491
views
How to install the CUDA toolkit on GitHub Codespaces
I would like to compile CUDA code, in particular code that uses the Thrust library, on a GitHub codespace (one without a GPU). I do not need to be able to run the CUDA code.
I installed the NSight VS ...
3
votes
1
answer
194
views
Replace/Merge operations in vectors using CUDA Thrust
I have two operations for manipulating elements in device vectors using CUDA Thrust. Which methods can implement these two functions more efficiently?
Replace part of values of a vector in batch with ...
1
vote
1
answer
620
views
memory pool in thrust execution policy
I am looking for solutions to use a memory pool within thrust as I want to limit the number of calls to cudaMalloc.
device_vector definitely accepts an allocator, but it's not so easy to deal with ...
-1
votes
1
answer
72
views
Storing data from device to main memory
I have a device vector that I continuously modify and then want to save in an HDF5 file. Because of the size of the device vector I cannot make multiple modifications and then save them to reduce the ...
2
votes
1
answer
189
views
Fusing two reduction operations in CUDA Thrust
Is there a way to do a reduce_by_key operation and a reduce (or ideally another reduce_by_key) operation in only one kernel call in Thrust? Besides gaining computational speed, let us say I want to do ...
1
vote
1
answer
158
views
Further chance of optimization of Thrust operation of CUDA kernel
I have a CUDA kernel which essentially looks like the following.
__global__ void myOpKernel(double *vals, double *data, int *nums, double *crit, int N, int K) {
int index = blockIdx.x*blockDim.x + ...
2
votes
1
answer
110
views
Simpson's Integration code with Thrust outputs different results on two machines with NVC++
I wrote a numerical integration code:
#include <thrust/inner_product.h>
#include <thrust/transform.h>
#include <thrust/for_each.h>
#include <thrust/iterator/zip_iterator.h>
#...
1
vote
2
answers
385
views
error: "__forceinline__" redefined in simple program
Compiling the 3-line program test-cuda.cpp
#include <thrust/execution_policy.h>
int main() { return 0; }
results in a compiler warning/error:
$ g++ -std=c++17 test-cuda.cpp -I/opt/cuda/targets/...
0
votes
0
answers
37
views
How do I use complex thrust::device_vector in cuFFT functions [duplicate]
I have working code (not shown) that performs a series of complex->complex fast Fourier transforms using the cuFFT library. I have been attempting to simplify this code by using the thrust ...
1
vote
1
answer
167
views
thrust::make_zip_iterator - What happens with inconsistent iterator ranges?
I am currently trying to understand the following example from the Boost library that uses parallel processing with Thrust.
struct lorenz_system
{
struct lorenz_functor
{
template< ...
1
vote
1
answer
2k
views
How do I thrust::sort() or thrust::sort_by_key() with raw pointers [duplicate]
I want to sort an array using raw device pointers with thrust::sort() and thrust::sort_by_key() because it uses radix sort. The data is in a raw uint64_t device pointer, and I initialize with random ...
0
votes
0
answers
37
views
casting iterator returned by thrust::find_if to a struct pointer [duplicate]
I am using C++ for my project and am facing this error in one of my functions.
"a const_cast can only adjust type qualifiers; it cannot change the underlying type"
I am working with thrust ...
1
vote
1
answer
267
views
CUDA: how to find the first item in an array that makes a function maximal
In CUDA C++, I have a big array Arr of integers, and a function F: int -> int. I want to find the first index of an item in Arr that makes F maximal.
How can I write a kernel that always keeps ...
0
votes
1
answer
432
views
Is THRUST stable_sort_by_key O(n)?
Can I assume that Thrust stable_sort_by_key performed on unsigned int has complexity O(n)? If not, what should I do to be sure that this complexity will be achieved? (Except for implementing radix sort ...
2
votes
1
answer
1k
views
CUDA thrust iterator: how to use iterator to implement efficient fill and copy on device_vectors?
My project contains many fill, copy and other basic operations.
However, I'm new to CUDA programming, and my current implementation just uses a for loop to operate on device_vector, which is far less ...
1
vote
2
answers
652
views
Why is thrust reduce_by_key almost 75x slower than for_each with atomicAdd()?
I was not satisfied with the performance of the thrust::reduce_by_key below, so I rewrote it in a variety of ways with little benefit (including removing the permutation iterator). However, it ...
0
votes
1
answer
224
views
How to multiply two iterators and return the product to a thrust::reduce algorithm?
I have this reduce_by_key working fine, except I want to multiply dv_Vals with another vector (dv_Active) that contains 0 or 1 values so that the resultant product is a new iterator with either a ...