
I have been checking the code sum_rows.cu from the thrust examples and I could not understand what exactly is happening in linear_index_to_row_index.

Can someone please explain to me, with an example, what it means to convert a linear index to a row index?

Reference: https://github.com/thrust/thrust/blob/master/examples/sum_rows.cu

1 Answer

In thrust, a common storage format is a vector container (e.g. device_vector and host_vector). In typical usage, these containers provide one-dimensional storage. The concepts of "row" and "column" don't really apply to a 1D vector. We simply talk about the index of an element. In this case, it is a linear (1D) index.
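
As a minimal sketch (my own illustration, not code from the linked example), a device_vector addressed purely by linear index:

    #include <thrust/device_vector.h>
    #include <iostream>

    int main()
    {
      // A 1D container: elements are addressed only by a single linear index.
      thrust::device_vector<int> v(4);
      v[0] = 1; v[1] = 2; v[2] = 3; v[3] = 4;

      std::cout << "element at linear index 2: " << v[2] << std::endl; // prints 3
      return 0;
    }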

In order to store a two-dimensional item, such as a matrix where the concept of "row" and "column" indexing of elements is applicable, a common approach with the 1D containers in thrust is to "flatten" or "linearize" the storage:

Matrix A:
        column:
        0    1 
row: 0  1    2
     1  3    4


Vector A:
index:   0    1    2    3
element: 1    2    3    4
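
As a concrete sketch (my own code, assuming row-major order, which is what the linked example uses), flattening matrix A into a 1D device_vector looks like this:

    #include <thrust/device_vector.h>

    int main()
    {
      // Matrix A from above, stored row-major ("flattened") in a 1D vector:
      // row 0: 1 2
      // row 1: 3 4
      const int R = 2, C = 2;
      int A[R][C] = { {1, 2}, {3, 4} };

      thrust::device_vector<int> flat(R * C);
      for (int r = 0; r < R; ++r)
        for (int c = 0; c < C; ++c)
          flat[r * C + c] = A[r][c];   // linear index = r*C + c

      // flat now holds {1, 2, 3, 4} at linear indices 0..3
      return 0;
    }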

At this point, we can conveniently use thrust operations. But what if we want to do operations on specific rows or columns of the original matrix? That information (row, column) is "lost" when we linearize or flatten it into a 1D vector. But if we know the dimensions of the original matrix, we can use that to convert a linear index:

  0     1     2     3

into a row/column index:

(0,0) (0,1) (1,0) (1,1)

and the reverse (row/column to linear index).
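
A minimal sketch of those conversions, using hypothetical helper functions of my own (not part of the thrust example), for a row-major matrix with C columns:

    #include <cstdio>

    int row_of(int i, int C)           { return i / C; }      // linear -> row
    int col_of(int i, int C)           { return i % C; }      // linear -> column
    int linear_of(int r, int c, int C) { return r * C + c; }  // (row,col) -> linear

    int main()
    {
      const int C = 2; // 2 columns, as in matrix A above
      for (int i = 0; i < 4; ++i)
        printf("linear %d -> (%d,%d)\n", i, row_of(i, C), col_of(i, C));
      // prints (0,0) (0,1) (1,0) (1,1)
      return 0;
    }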

In the case of the thrust example you linked, the functor linear_index_to_row_index converts a given linear index to its associated row index. This allows the programmer to write a thrust operation that works on specific rows of data, such as summing the "rows" of the original matrix, even though it is now stored in a linear 1D vector.
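
The functor in the linked example is essentially the following (reproduced here slightly simplified), where C is the number of columns of the original matrix:

    // convert a linear index to a row index (C = number of columns)
    template <typename T>
    struct linear_index_to_row_index
    {
      T C; // number of columns

      __host__ __device__
      linear_index_to_row_index(T C) : C(C) {}

      __host__ __device__
      T operator()(T i) const
      {
        return i / C; // integer division: every C consecutive elements share a row
      }
    };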

Specifically, when given the following linear indices:

  0    1    2    3

For my example, the functor would return:

  0    0    1    1

Because that represents the row of the original matrix that the specific element in the vector belongs to.

If I want to sum all the elements of each row together, producing one sum per row, then I can use the row index generated by that functor to identify the row of each element. At that point, reduce_by_key can easily sum together the elements that have the same row index, producing one result per row.
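
Putting it together, here is a self-contained sketch along the lines of the linked example (variable names are my own; the real example fills a larger matrix with random data), applied to the 2x2 matrix above:

    #include <thrust/device_vector.h>
    #include <thrust/iterator/counting_iterator.h>
    #include <thrust/iterator/transform_iterator.h>
    #include <thrust/reduce.h>
    #include <iostream>

    // Same idea as the functor in the example: row = linear_index / columns.
    template <typename T>
    struct linear_index_to_row_index
    {
      T C;
      __host__ __device__ linear_index_to_row_index(T C) : C(C) {}
      __host__ __device__ T operator()(T i) const { return i / C; }
    };

    int main()
    {
      const int R = 2, C = 2;

      // Flattened matrix A = {1 2; 3 4}
      thrust::device_vector<int> array(R * C);
      array[0] = 1; array[1] = 2; array[2] = 3; array[3] = 4;

      thrust::device_vector<int> row_indices(R);
      thrust::device_vector<int> row_sums(R);

      // Keys are generated on the fly: counting_iterator produces 0,1,2,3 and
      // the functor maps them to row indices 0,0,1,1. reduce_by_key then sums
      // runs of equal keys, producing one sum per row.
      thrust::reduce_by_key(
        thrust::make_transform_iterator(thrust::counting_iterator<int>(0),
                                        linear_index_to_row_index<int>(C)),
        thrust::make_transform_iterator(thrust::counting_iterator<int>(0),
                                        linear_index_to_row_index<int>(C)) + (R * C),
        array.begin(),
        row_indices.begin(),
        row_sums.begin());

      for (int r = 0; r < R; ++r)
        std::cout << "row " << r << " sum: " << row_sums[r] << std::endl; // 3, 7
      return 0;
    }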


8 Comments

Thanks Robert, very good example. In the same functor, what exactly is the value of i? T operator()(T i) { return i / C; }
i is the linear index passed to the functor. The functor returns the corresponding row (i/C) that the element at linear index i belongs to.
Hello Robert, coming back to the reduce_by_key implementation of sum_rows.cu, I noticed that it is possible to get the same result (one sum per row) by generating the keys this way: thrust::counting_iterator<int> first(0); thrust::counting_iterator<int> last = first + nR; repeated_range<thrust::counting_iterator<int> > keys(first, last, nC); // nC = number of columns. The resulting key vector is then used as input for thrust::reduce_by_key(keys.begin(), keys.end(), tempVector.begin(), Vector_indices.begin(), Vector.begin());
As mentioned, this implementation using counting_iterator can also produce the keys for input. But the computation is very slow compared to linear_index_to_row_index, roughly two times slower. Do you know what the reason could be? Is it that, when using counting_iterator, the implementation does not know when to stop reducing until it finds a different key, whereas with linear_index_to_row_index it knows exactly when to stop reducing?
I believe you have stumbled on a typical "fusion" optimization/comparison. These problems (e.g. reduction) will all be memory-bound on the GPU. So the slower method first generates the vector of keys and stores it in memory, then uses this generated vector to select the reduction points. This two-step process is fused into a single process if we generate the key vector "on the fly" using a functor (linear_index_to_row_index), which doesn't require generation (and storage in global memory) of a key vector. The two-step process approximately doubles the time for a memory-bound problem.
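
For comparison, a hypothetical sketch of that two-step variant, continuing from the earlier sketch (same array, row_indices, row_sums, and functor; the keys are first materialized in global memory, costing an extra write and read of R*C integers):

    // needs <thrust/transform.h> in addition to the headers above

    // Step 1: materialize the key vector 0,0,1,1 in global memory.
    thrust::device_vector<int> keys(R * C);
    thrust::transform(thrust::counting_iterator<int>(0),
                      thrust::counting_iterator<int>(R * C),
                      keys.begin(),
                      linear_index_to_row_index<int>(C));

    // Step 2: read the keys back from global memory to drive the reduction.
    thrust::reduce_by_key(keys.begin(), keys.end(),
                          array.begin(),
                          row_indices.begin(),
                          row_sums.begin());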
