Trending 'thrust' questions

3 votes

1 answer

111 views

Deleted function compiler errors using thrust::remove in C++

I am currently attempting to use the thrust::remove function on a thrust::device_vector of structs in my main function as shown below: #include <iostream> #include <thrust/device_vector.h>...

AowynB

33

asked Nov 11 at 8:44

2 votes

1 answer

101 views

Efficiently Sorting Rows of a Flattened Matrix Using thrust::sort_by_key

I am maintaining a matrix of integers with dimensions 𝑎 × 𝑏 as a flattened array in row-major order. Now, I need to rearrange the rows of this matrix according to a priority array (length 𝑎) (Think ...

Samiran K

63

asked Feb 3 at 22:06

3 votes

1 answer

300 views

CUB reduce_by_key

Thrust has the thrust::reduce_by_key algorithm which works well for a problem of mine. I wanted to try to use CUB for finer control of memory and streams as well as interaction with my own kernels, ...

Treeman

116

asked Nov 15, 2024 at 16:16

0 votes

1 answer

145 views

Is it possible to overcome the maximum number of iterators in thrust::zip_iterator?

I’m using Thrust for some tasks at work and have found that there seems to be a maximum number of iterators when constructing a zip_iterator. For example #include <thrust/iterator/zip_iterator.h>...

DocWho

11

asked Sep 3, 2024 at 13:43

1 vote

0 answers

219 views

Unable to include thrust/host_vector.h and others with CUDA 12.5

This test program compiled fine with CUDA 12.4 and lower, but fails to compile w/ 12.5.1: #include <thrust/host_vector.h> #include <thrust/scan.h> #include <iostream> int main() { ...

Matt

20.8k

asked Jul 23, 2024 at 15:05

2 votes

0 answers

91 views

Optimizing Complex Number Computations for GPU with Thrust: Seeking Efficient Migration Advice

I am a little bit new to Thrust. I am trying to migrate the following code to make use of GPUs, but this one seems a little difficult: #include <iostream> #include <complex> #include <...

kiragon kiriyo

35

asked Aug 2, 2024 at 9:20

2 votes

1 answer

1k views

CUDA thrust iterator: how to use iterator to implement efficient fill and copy on device_vectors?

My project contains many fill, copy and other basic operations. However, I'm new to CUDA programming, and my current implementation just uses a for loop to operate on device_vector, which is far less ...

Dylan

89

asked Oct 25, 2022 at 7:59

2 votes

2 answers

638 views

Attempt to use an extended device lambda in a context that requires querying its return type in host code

I'm receiving the compiler error static_assert failed: 'Attempt to use an extended __device__ lambda in a context that requires querying its return type in host code. Use a named function object, a ...

0xbadf00d

18.4k

asked Apr 12, 2023 at 15:31

49 votes

5 answers

26k views

Thrust inside user written kernels

I am a newbie to Thrust. I see that all Thrust presentations and examples only show host code. I would like to know if I can pass a device_vector to my own kernel? How? If yes, what are the ...

Ashwin Nanjappa

79k

asked Apr 1, 2011 at 8:14

0 votes

1 answer

141 views

Pair deduplication on CUDA

I have a data structure already running on CUDA and collect the data as below: struct SearchDataOnDevice { size_t npair; int * id1; int * id2; }; I'd like to remove the duplicated id pair ...

holmessh

89

asked Nov 3, 2023 at 1:10

1 vote

2 answers

385 views

error: "forceinline" redefined in simple program

Compiling the 3-line program test-cuda.cpp #include <thrust/execution_policy.h> int main() { return 0; } results in a compiler warning/error: $ g++ -std=c++17 test-cuda.cpp -I/opt/cuda/targets/...

Matt

20.8k

asked Apr 6, 2023 at 21:51

1 vote

1 answer

185 views

Cyclically rotating a GPU vector?

I have an algorithm I would like to implement, which involves coordinatewise addition, coordinatewise multiplication, and cyclic rotation of coordinates. My addition and multiplication are a little ...

Mark Schultz-Wu

374

asked Oct 2, 2023 at 20:01

3 votes

1 answer

194 views

Replace/Merge operations in vectors using CUDA Thrust

I have two operations for manipulating elements in device vectors using CUDA Thrust. Which methods can implement these two functions more efficiently? Replace part of values of a vector in batch with ...

Chris

33

asked Aug 9, 2023 at 8:15

1 vote

1 answer

2k views

How do I thrust::sort() or thrust::sort_by_key() with raw pointers [duplicate]

I want to sort an array using raw device pointers with thrust::sort() and thrust::sort_by_key() because it uses radix sort. The data is in a raw uint64_t device pointer, and I initialize with random ...

PlatinumFrog

23

asked Jan 15, 2023 at 20:02

1 vote

1 answer

267 views

CUDA, how to find the first item in an array that makes a function maximal

In CUDA C++, I have a big array Arr of integers, and a function F: int -> int. I want to find the first index of an item in Arr that makes F maximal. How can I write a kernel that always keeps ...

Mojtaba Valizadeh

766

asked Dec 6, 2022 at 23:01

1 vote

1 answer

464 views

How can I do segmented reduction using CUDA thrust?

I want to store partial reduction results in an array. Say I have data[8] = {10,20,30,40,50,60,70,80}. And if I divide the data with the chunk_size of 2, the chunks will be {10,20}, {30,40}, ... , {70,...

Sangjun Lee

455

asked May 14, 2023 at 12:35

-2 votes

2 answers

330 views

How to use thrust::transform on larger Vector derived from smaller Vector?

Input and starting arrays: dv_A = { 5, -3, 2, 6} //4 elements dv_B = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 } Expected output: dv_B = { 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1 } For every element in dv_A{},...

aiwyn

278

asked Sep 25, 2022 at 9:03

0 votes

1 answer

193 views

CUDA Thrust How can I combine copy_if and transform without materialize data

Let's say we have two inputs, the first one is an array, and the second is a bitmap thrust::device_vector<point_t> points; Bitset bits; // Imagine this can be accessed within the kernel. What I ...

geng liang

90

asked Aug 5, 2023 at 17:01

1 vote

1 answer

158 views

Further chance of optimization of Thrust operation of CUDA kernel

I have a CUDA kernel which essentially looks like the following. __global__ void myOpKernel(double *vals, double *data, int *nums, double *crit, int N, int K) { int index = blockIdx.x*blockDim.x + ...

Sangjun Lee

455

asked May 17, 2023 at 7:18

2 votes

1 answer

411 views

CUDA force instruction execution order

I'm trying to transfer some data manipulations from CPU to GPU (CUDA), but there's one small part that requires instructions to be run in a specific order. In principle I could do the first few ...

defladamouse

635

asked Nov 23, 2021 at 17:33

0 votes

1 answer

371 views

Parallelization of a for loop consisting of Thrust Transforms

I've implemented a for loop consisting of several Thrust transformations. My aim is to calculate r[i] for each value of i from 0 to N. To put it simply, r is a column vector and each of its elements can ...

Muhteva

2,840

asked Jan 5, 2023 at 11:18

2 votes

1 answer

290 views

Why is the iterating range of thrust::reduce limited to 2048 doubles in device code?

I am using the NVIDIA HPC SDK (2022) to compile the following code, the basic purpose of which is to sum a NxM matrix into a vector of size N. #include <thrust/host_vector.h> #include <thrust/...

batman216

77

asked Nov 20, 2022 at 11:40

0 votes

1 answer

432 views

Is THRUST stable_sort_by_key O(n)?

Can I assume that Thrust stable_sort_by_key performed on unsigned int has complexity O(n)? If not, what should I do to be sure that this complexity will be achieved? (Except for implementing radix sort ...

complikator

292

asked Nov 10, 2022 at 12:02

1 vote

1 answer

1k views

How to do a reduction over one dimension of 2D data in Thrust

I'm new to CUDA and the thrust library. I'm learning and trying to implement a function that will have a for loop doing a thrust function. Is there a way to convert this loop into another thrust ...

KLi2708

13

asked Feb 25, 2022 at 8:38

-1 votes

1 answer

72 views

Storing data from device to main memory

I have a device vector that I continuously modify and then want to save in an HDF5 file. Because of the size of the device vector I cannot make multiple modifications and then save them to reduce the ...

Luluio

129

asked Jun 6, 2023 at 21:31
