3

Is there a way to get array elements in one operation for known rows and columns of those elements? In each row I would like to access elements from col_start to col_end (each row has different starting and ending index). Number of elements is the same for each row, elements are consecutive. Example:

[ . . . . | | | . . . . . ]
[ | | | . . . . . . . . . ]
[ . . | | | . . . . . . . ]
[ . . . . . . . . | | | . ]

One solution would be to build the indexes (row-column pairs) of the elements, and then use my_array[row_list,col_list].
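For concreteness, that index-pair approach could look like the sketch below. The array contents, col_start and n are illustrative assumptions, not data from the actual problem.

```python
import numpy as np

# Illustrative data (an assumption): 4 rows, 12 columns, n = 3 elements per row.
my_array = np.arange(48).reshape(4, 12)
col_start = np.array([4, 0, 2, 8])  # first wanted column in each row
n = 3

# One (row, column) pair per wanted element.
row_list = np.repeat(np.arange(len(col_start)), n)
col_list = np.concatenate([np.arange(s, s + n) for s in col_start])

# Fancy indexing returns a 1d result; reshape it back to one row per source row.
picked = my_array[row_list, col_list].reshape(len(col_start), n)
print(picked)
```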

Is there any other (simpler) way without using for loops?

4
  • Yes, but can you provide a better example? Commented Jan 17, 2015 at 23:38
  • In the example ( | ) are elements I want to access, ( . ) are other elements. Would you like to know anything else? Commented Jan 17, 2015 at 23:45
  • @tjons: what convinces you that we are working with a dictionary? The OP repeatedly refers to an array; the OP added the numpy tag; the representation looks a lot more like that of an array than a dictionary; etc. Commented Jan 18, 2015 at 0:04
  • @DSM my own confusedness. I'm wrong, and I've deleted the other comments. Thank you for pointing this out! On top of it, I didn't mean dictionary - I meant list. Whoops! Commented Jan 19, 2015 at 12:57

3 Answers

4
import numpy as np

A = np.arange(40).reshape(4,10)*.1
startend = [[2,5],[3,6],[4,7],[5,8]]
index_list = [np.arange(v[0],v[1]) + i*A.shape[1] 
                 for i,v in enumerate(startend)]
# [array([2, 3, 4]), array([13, 14, 15]), array([24, 25, 26]), array([35, 36, 37])]
A.flat[index_list]

producing

array([[ 0.2,  0.3,  0.4],
       [ 1.3,  1.4,  1.5],
       [ 2.4,  2.5,  2.6],
       [ 3.5,  3.6,  3.7]])

This still has an iteration, but it's a rather basic one over a list. I'm indexing the flattened, 1d, version of A. np.take(A, index_list) also works.
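As a quick check of that equivalence, here is a small sketch comparing indexing the flattened array with np.take. The names flat_idx, via_ravel and via_take are mine, and I use A.ravel() in place of A.flat to keep the comparison explicit.

```python
import numpy as np

A = np.arange(40).reshape(4, 10) * .1
startend = [[2, 5], [3, 6], [4, 7], [5, 8]]

# Flat (1d) positions: column index plus row offset i * number_of_columns.
index_list = [np.arange(v[0], v[1]) + i * A.shape[1]
              for i, v in enumerate(startend)]
flat_idx = np.array(index_list)   # shape (4, 3); needs equal-length intervals

via_ravel = A.ravel()[flat_idx]   # index the flattened A
via_take = np.take(A, flat_idx)   # np.take flattens A when axis is None

print(np.array_equal(via_ravel, via_take))
```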

If the row intervals differ in size, I can use np.r_ to concatenate them. It's not absolutely necessary, but it is a convenience when building up indices from multiple intervals and values.

A.flat[np.r_[tuple(index_list)]]
# array([ 0.2,  0.3,  0.4,  1.3,  1.4,  1.5,  2.4,  2.5,  2.6,  3.5,  3.6, 3.7])

The idx that ajcr uses can be used without choose:

idx = [np.arange(v[0], v[1]) for i,v in enumerate(startend)]
A[np.arange(A.shape[0])[:,None], idx]

idx is like my index_list except that it doesn't add the row length.

np.array(idx)

array([[2, 3, 4],
       [3, 4, 5],
       [4, 5, 6],
       [5, 6, 7]])

Since each arange has the same length, idx can be generated without iteration:

col_start = np.array([2,3,4,5])
idx = col_start[:,None] + np.arange(3)

The first index is a column array that broadcasts to match this idx.

np.arange(A.shape[0])[:,None] 
array([[0],
       [1],
       [2],
       [3]])
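Putting the two pieces together, a loop-free version might look like this sketch. The names rows, n and result are mine; np.take_along_axis is included only as a cross-check and needs a reasonably recent NumPy.

```python
import numpy as np

A = np.arange(40).reshape(4, 10) * .1
col_start = np.array([2, 3, 4, 5])
n = 3  # number of consecutive elements taken from each row

rows = np.arange(A.shape[0])[:, None]    # shape (4, 1)
idx = col_start[:, None] + np.arange(n)  # shape (4, 3)

# rows broadcasts against idx, so result[i, j] == A[i, idx[i, j]].
result = A[rows, idx]

# Cross-check with take_along_axis, which expresses the same selection.
same = np.array_equal(result, np.take_along_axis(A, idx, axis=1))
print(result.shape, same)
```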

With this A and idx I get the following timings:

In [515]: timeit np.choose(idx,A.T[:,:,None])
10000 loops, best of 3: 30.8 µs per loop

In [516]: timeit A[np.arange(A.shape[0])[:,None],idx]
100000 loops, best of 3: 10.8 µs per loop

In [517]: timeit A.flat[idx+np.arange(A.shape[0])[:,None]*A.shape[1]]
10000 loops, best of 3: 24.9 µs per loop

The flat indexing itself is faster, but for this small array the time spent calculating the fancier index outweighs that gain.

For large arrays, the speed of flat indexing dominates.

A=np.arange(4000).reshape(40,100)*.1
col_start=np.arange(20,60)
idx=col_start[:,None]+np.arange(30)

In [536]: timeit A[np.arange(A.shape[0])[:,None],idx]
10000 loops, best of 3: 108 µs per loop

In [537]: timeit A.flat[idx+np.arange(A.shape[0])[:,None]*A.shape[1]]
10000 loops, best of 3: 59.4 µs per loop

With this larger array the np.choose method is not an option. It needs one choice array per column (100 here) and runs into a hardcoded limit: Need between 2 and (32) array objects (inclusive).


What about out of bounds idx? Going back to the original 4x10 A:

col_start=np.array([2,4,6,8])
idx=col_start[:,None]+np.arange(3)
A[np.arange(A.shape[0])[:,None], idx]

produces an error because the last idx value is 10, too large.

You could clip idx

idx=idx.clip(0,A.shape[1]-1)

producing duplicate values in the last row

[ 3.8,  3.9,  3.9]

You could also pad A before indexing. See np.pad for more options.

np.pad(A,((0,0),(0,2)),'edge')[np.arange(A.shape[0])[:,None], idx]
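For this particular case, clipping and 'edge' padding should give the same result. A small sketch to confirm it; the names rows, overshoot, clipped and padded are mine, and the pad width is computed from idx rather than hardcoded.

```python
import numpy as np

A = np.arange(40).reshape(4, 10) * .1
col_start = np.array([2, 4, 6, 8])
idx = col_start[:, None] + np.arange(3)  # last row asks for columns 8, 9, 10
rows = np.arange(A.shape[0])[:, None]

# Option 1: clip the column indices into the valid range.
clipped = A[rows, idx.clip(0, A.shape[1] - 1)]

# Option 2: pad A on the right by however far idx overshoots the last column.
overshoot = max(int(idx.max()) - (A.shape[1] - 1), 0)
padded = np.pad(A, ((0, 0), (0, overshoot)), 'edge')[rows, idx]

print(np.array_equal(clipped, padded))
```

Other np.pad modes (for example 'constant') would give different fill values, so the two options only coincide for 'edge'.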

Another option is to remove out of bounds values. idx would then become a ragged list of lists (or array of lists). The flat approach can handle this, though the result will not be a matrix.

startend = [[2,5],[4,7],[6,9],[8,10]]
index_list = [np.arange(v[0],v[1]) + i*A.shape[1] 
                 for i,v in enumerate(startend)]
# [array([2, 3, 4]), array([14, 15, 16]), array([26, 27, 28]), array([38, 39])]

A.flat[np.r_[tuple(index_list)]]
# array([ 0.2,  0.3,  0.4,  1.4,  1.5,  1.6,  2.6,  2.7,  2.8,  3.8,  3.9])
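If the end goal is a per-row reduction such as a maximum over each interval (an assumption about the use case), the flat result can still be reduced row by row without a Python loop. A sketch using np.maximum.reduceat; the names values, lengths, offsets and row_max are mine, and it assumes every interval is non-empty.

```python
import numpy as np

A = np.arange(40).reshape(4, 10) * .1
startend = [[2, 5], [4, 7], [6, 9], [8, 10]]
index_list = [np.arange(v[0], v[1]) + i * A.shape[1]
              for i, v in enumerate(startend)]

values = A.ravel()[np.concatenate(index_list)]  # 1d array of 11 values
lengths = [len(ix) for ix in index_list]        # [3, 3, 3, 2]
offsets = np.cumsum([0] + lengths[:-1])         # start of each row's segment

# reduceat reduces values[offsets[k]:offsets[k+1]] for each k (last runs to the end).
row_max = np.maximum.reduceat(values, offsets)
print(row_max)
```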

9 Comments

Do you think using list comprehension will be faster than simply using a for loop?
For constant length ranges you don't need any iteration - just matrix addition.
I measured it myself and your method is faster indeed. Do you have any suggestions on how to prevent index out of bounds?
Out of bounds - like if a col_start value is too large, so col_start+n>A.shape[1]? What should happen?
I added some examples of dealing with out-of-bounds.
3

You can use np.choose.

Here's an example NumPy array arr:

array([[ 0,  1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12, 13],
       [14, 15, 16, 17, 18, 19, 20]])

Let's say we want to pick the values [1, 2, 3] from the first row, [11, 12, 13] from the second row and [17, 18, 19] from the third row.

In other words, we'll pick out the indices from each row of arr as shown in an array idx:

array([[1, 2, 3],
       [4, 5, 6],
       [3, 4, 5]])

Then using np.choose:

>>> np.choose(idx, arr.T[:,:,np.newaxis])
array([[ 1,  2,  3],
       [11, 12, 13],
       [17, 18, 19]])

To explain what just happened: arr.T[:,:,np.newaxis] meant that arr was temporarily viewed as a 3D array with shape (7, 3, 1). You can imagine this as a 3D array where each column of the original arr is now a 2D column vector with three values. The 3D array looks a bit like this:

#  0       1       2       3       4       5       6
[[ 0]   [[ 1]   [[ 2]   [[ 3]   [[ 4]   [[ 5]   [[ 6]   # choose values from 1, 2, 3
 [ 7]    [ 8]    [ 9]    [10]    [11]    [12]    [13]   # choose values from 4, 5, 6
 [14]]   [15]]   [16]]   [17]]   [18]]   [19]]   [20]]  # choose values from 3, 4, 5

To get the zeroth row of the output array, choose selects the zeroth element from the 2D column at index 1, the zeroth element from the 2D column at index 2, and the zeroth element from the 2D column at index 3.

To get the first row of the output array, choose selects the first element from the 2D column at index 4, the first element from the 2D column at index 5, ... and so on.
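Put differently, the output satisfies out[i, j] = arr[i, idx[i, j]]. A small sketch that checks this against an explicit loop; the names chosen and expected are mine, and the loop is only there for illustration.

```python
import numpy as np

arr = np.arange(21).reshape(3, 7)
idx = np.array([[1, 2, 3],
                [4, 5, 6],
                [3, 4, 5]])

chosen = np.choose(idx, arr.T[:, :, np.newaxis])

# Explicit loop with the same meaning: expected[i, j] = arr[i, idx[i, j]].
expected = np.empty_like(idx)
for i in range(idx.shape[0]):
    for j in range(idx.shape[1]):
        expected[i, j] = arr[i, idx[i, j]]

print(np.array_equal(chosen, expected))
```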

6 Comments

Thanks, that looks like what I was thinking of. Now I have to check the performance of given solutions.
I have one more question. What is the best way to create idx array if I have col_start vector and col_end vector equals (col_start + n)?
@soccersniper: one way could be to use np.vstack and a list comprehension, e.g. np.vstack([np.arange(x, x+n) for x in col_start]). So above in my example, n is 3 and col_start is [1, 4, 3].
Because n << len(col_start) I would rather do this: np.array( [col_start+i for i in range(n)] ) (same result if I use np.vstack). I would have to transpose this array to use your solution. Is there any other way?
It is possible to index arr with idx without choose - just use a matching column array for the 1st dimension.
1

I think you're looking for something like the below. I'm not sure what you want to do with them when you access them though.

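indexes = [(4, 6), (0, 2), (2, 4), (8, 10)]
# Each row is a list of markers: "|" is a wanted element, "." is any other.
arr = [
    ". . . . | | | . . . . .".split(),
    "| | | . . . . . . . . .".split(),
    ". . | | | . . . . . . .".split(),
    ". . . . . . . . | | | .".split(),
]

for (start, end), row in zip(indexes, arr):
    # end is inclusive, hence the +1
    print(row[start:end + 1])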

5 Comments

only problem is you now don't have a numpy array
I want to find max value for "masked" elements in each row. Solution for accessing those elements would be simple if columns would be the same for all rows: my_array[:,col_start:col_end]. What I was looking for was a modification of previous statement in the case of different column indexes.
Where is numpy array coming from? OP says nothing about that? And @tjons: nothing in my answer is a dictionary?
Original array contains dot products between direction vectors and gradient vectors on "rays" pointing from the center outwards for given angles. So i-th row of dot_product array contains dot products along the "ray" for i-th angle.
@mattm my own confusedness. I'm wrong, and I've deleted the other comments. Thank you for pointing this out! On top of it, I didn't mean dictionary - I meant list. Whoops!
