Access multiple elements of an array

Question

Is there a way to get array elements in one operation, given known rows and columns for those elements? In each row I would like to access the elements from col_start to col_end (each row has a different starting and ending index). The number of elements is the same for each row, and the elements are consecutive. Example:

[ . . . . | | | . . . . . ]
[ | | | . . . . . . . . . ]
[ . . | | | . . . . . . . ]
[ . . . . . . . . | | | . ]

One solution would be to get the indexes (row-column pairs) of the elements, and then use my_array[row_list,col_list].

Is there any other (simpler) way without using for loops?

In the example ( | ) are elements I want to access, ( . ) are other elements. Would you like to know anything else?
– recodeFuture, Commented Jan 17, 2015 at 23:45
@tjons: what convinces you that we are working with a dictionary? The OP repeatedly refers to an array; the OP added the numpy tag; the representation looks a lot more like that of an array than a dictionary; etc.
– DSM, Commented Jan 18, 2015 at 0:04
@DSM my own confusedness. I'm wrong, and I've deleted the other comments. Thank you for pointing this out! On top of it, I didn't mean dictionary - I meant list. Whoops!
– tjons, Commented Jan 19, 2015 at 12:57

hpaulj · Accepted Answer · 2015-01-21 03:49:30Z

4

import numpy as np

A = np.arange(40).reshape(4,10)*.1
startend = [[2,5],[3,6],[4,7],[5,8]]
# flat index of an element = column + row_number * row_length
index_list = [np.arange(v[0],v[1]) + i*A.shape[1]
                 for i,v in enumerate(startend)]
# [array([2, 3, 4]), array([13, 14, 15]), array([24, 25, 26]), array([35, 36, 37])]
A.flat[index_list]

producing

array([[ 0.2,  0.3,  0.4],
       [ 1.3,  1.4,  1.5],
       [ 2.4,  2.5,  2.6],
       [ 3.5,  3.6,  3.7]])

This still has an iteration, but it's a rather basic one over a list. I'm indexing the flattened, 1d, version of A. np.take(A, index_list) also works.
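As a minimal sketch of the flat-index idea (the names flat_idx, via_flat and via_take are only illustrative, and the per-row ranges are stacked into one 2-D integer array rather than kept as a list), both A.flat[...] and np.take can be given the same index and compared:

```python
import numpy as np

A = np.arange(40).reshape(4, 10) * 0.1
startend = [[2, 5], [3, 6], [4, 7], [5, 8]]

# Offset each row's column range by row_number * row_length to get
# positions in the flattened array, then stack them into a (4, 3) array.
flat_idx = np.array([np.arange(start, end) + i * A.shape[1]
                     for i, (start, end) in enumerate(startend)])

via_flat = A.flat[flat_idx]      # index through the 1-D iterator over A
via_take = np.take(A, flat_idx)  # np.take uses the flattened A when axis is None

print(via_flat.shape)
print(np.array_equal(via_flat, via_take))
```

The result keeps the shape of the index array, so equal-length ranges come back as one row per original row.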

If the row intervals differ in size, I can use np.r_ to concatenate them. It's not absolutely necessary, but it is a convenience when building up indices from multiple intervals and values.

A.flat[np.r_[tuple(index_list)]]
# array([ 0.2,  0.3,  0.4,  1.3,  1.4,  1.5,  2.4,  2.5,  2.6,  3.5,  3.6, 3.7])

The idx that ajcr uses can be used without choose:

idx = [np.arange(v[0], v[1]) for i,v in enumerate(startend)]
A[np.arange(A.shape[0])[:,None], idx]

idx is like my index_list except that it doesn't add the row length.

np.array(idx)

array([[2, 3, 4],
       [3, 4, 5],
       [4, 5, 6],
       [5, 6, 7]])

Since each arange has the same length, idx can be generated without iteration:

col_start = np.array([2,3,4,5])
idx = col_start[:,None] + np.arange(3)

The first index is a column array that broadcasts to match this idx.

np.arange(A.shape[0])[:,None] 
array([[0],
       [1],
       [2],
       [3]])
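Put together as a standalone script, the loop-free version might look like the sketch below. The names n and rows are assumptions made for illustration; the arrays are the ones used above.

```python
import numpy as np

A = np.arange(40).reshape(4, 10) * 0.1
col_start = np.array([2, 3, 4, 5])
n = 3  # number of consecutive elements taken from each row

# (4, 1) column of starts + (3,) offsets broadcasts to a (4, 3) index array.
idx = col_start[:, None] + np.arange(n)

# (4, 1) column of row numbers; it broadcasts against idx when indexing.
rows = np.arange(A.shape[0])[:, None]

result = A[rows, idx]
print(result)

# A per-row reduction over the selected elements, e.g. the maximum.
print(result.max(axis=1))
```

Because result is an ordinary 2-D array, any per-row reduction over the selected elements is a single call on it.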

With this A and idx I get the following timings:

In [515]: timeit np.choose(idx,A.T[:,:,None])
10000 loops, best of 3: 30.8 µs per loop

In [516]: timeit A[np.arange(A.shape[0])[:,None],idx]
100000 loops, best of 3: 10.8 µs per loop

In [517]: timeit A.flat[idx+np.arange(A.shape[0])[:,None]*A.shape[1]]
10000 loops, best of 3: 24.9 µs per loop

The flat indexing itself is faster, but calculating the fancier flat index takes up some time, so for this small array the two-index version is quicker overall.

For large arrays, the speed of flat indexing dominates.

A=np.arange(4000).reshape(40,100)*.1
col_start=np.arange(20,60)
idx=col_start[:,None]+np.arange(30)

In [536]: timeit A[np.arange(A.shape[0])[:,None],idx]
10000 loops, best of 3: 108 µs per loop

In [537]: timeit A.flat[idx+np.arange(A.shape[0])[:,None]*A.shape[1]]
10000 loops, best of 3: 59.4 µs per loop
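To repeat this comparison outside IPython, something like the following sketch with the standard timeit module should work. The function names fancy_2d and fancy_flat are made up for this example, and the absolute numbers will depend on the machine and NumPy version, so no timings are quoted here.

```python
import timeit

import numpy as np

A = np.arange(4000).reshape(40, 100) * 0.1
col_start = np.arange(20, 60)
idx = col_start[:, None] + np.arange(30)
rows = np.arange(A.shape[0])[:, None]


def fancy_2d():
    # Two index arrays: row numbers and column numbers.
    return A[rows, idx]


def fancy_flat():
    # One index array into the flattened A.
    return A.flat[idx + rows * A.shape[1]]


# Both approaches should select exactly the same elements.
same = np.array_equal(fancy_2d(), fancy_flat())
print("same result:", same)

for fn in (fancy_2d, fancy_flat):
    seconds = timeit.timeit(fn, number=1000)
    print(f"{fn.__name__}: {seconds / 1000 * 1e6:.1f} us per call")
```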

With this larger array (100 columns, so 100 choice arrays) the np.choose method runs into a hardcoded limit: Need between 2 and (32) array objects (inclusive).

What about out-of-bounds idx? (The examples below use the original 4x10 A.)

col_start=np.array([2,4,6,8])
idx=col_start[:,None]+np.arange(3)
A[np.arange(A.shape[0])[:,None], idx]

produces an error because the last idx value is 10, too large.

You could clip idx

idx=idx.clip(0,A.shape[1]-1)

producing duplicate values in the last row

[ 3.8,  3.9,  3.9]

You could also pad A before indexing. See np.pad for more options.

np.pad(A,((0,0),(0,2)),'edge')[np.arange(A.shape[0])[:,None], idx]
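A small sketch comparing the two options on the 4x10 array. The names n, rows and overshoot are assumptions for illustration; overshoot just pads by however many columns the largest index runs past the last valid one.

```python
import numpy as np

A = np.arange(40).reshape(4, 10) * 0.1
col_start = np.array([2, 4, 6, 8])
n = 3
idx = col_start[:, None] + np.arange(n)   # last row reaches column 10
rows = np.arange(A.shape[0])[:, None]

# Option 1: clip the column indices into the valid range 0..9.
clipped = A[rows, idx.clip(0, A.shape[1] - 1)]

# Option 2: pad A on the right with copies of its last column,
# by as many columns as the largest index overshoots.
overshoot = max(int(idx.max()) - (A.shape[1] - 1), 0)
padded = np.pad(A, ((0, 0), (0, overshoot)), mode="edge")[rows, idx]

print(clipped[-1])
print(np.array_equal(clipped, padded))
```

With mode="edge" the padded columns repeat the last real value, so here both options return the same array. Other pad modes (a constant fill, for instance) would make the out-of-bounds positions distinguishable.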

Another option is to remove out of bounds values. idx would then become a ragged list of lists (or array of lists). The flat approach can handle this, though the result will not be a matrix.

startend = [[2,5],[4,7],[6,9],[8,10]]
index_list = [np.arange(v[0],v[1]) + i*A.shape[1] 
                 for i,v in enumerate(startend)]
# [array([2, 3, 4]), array([14, 15, 16]), array([26, 27, 28]), array([38, 39])]

A.flat[np.r_[tuple(index_list)]]
# array([ 0.2,  0.3,  0.4,  1.4,  1.5,  1.6,  2.6,  2.7,  2.8,  3.8,  3.9])
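If the starts and a fixed length n are given rather than explicit end points, the trimming can be sketched like this. This assumes col_start and n as in the earlier examples; col_end, pieces, lengths and per_row are illustrative names, not part of the original answer.

```python
import numpy as np

A = np.arange(40).reshape(4, 10) * 0.1
col_start = np.array([2, 4, 6, 8])
n = 3
ncols = A.shape[1]

# Cap every end point at the row length so no range leaves its row.
col_end = np.minimum(col_start + n, ncols)

# Flat indices for each (possibly shortened) row range.
pieces = [np.arange(s, e) + i * ncols
          for i, (s, e) in enumerate(zip(col_start, col_end))]
values = A.flat[np.concatenate(pieces)]

# The result is 1-D; the per-row lengths let you split it back up.
lengths = col_end - col_start
per_row = np.split(values, np.cumsum(lengths)[:-1])

print(values)
print(lengths)
```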

edited Jan 21, 2015 at 3:49

answered Jan 18, 2015 at 4:52

hpaulj


9 Comments

recodeFuture Over a year ago

Do you think using list comprehension will be faster than simply using a for loop?

hpaulj Over a year ago

For constant length ranges you don't need any iteration - just matrix addition.

recodeFuture Over a year ago

I measured it myself and your method is faster indeed. Do you have any suggestions on how to prevent index out of bounds?

hpaulj Over a year ago

Out of bounds - like if a col_start value is too large, so col_start+n>A.shape[1]? What should happen?

hpaulj Over a year ago

I added some examples of dealing with out-of-bounds.


Alex Riley · Accepted Answer · 2015-01-18 18:07:06Z

3

You can use np.choose.

Here's an example NumPy array arr:

array([[ 0,  1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12, 13],
       [14, 15, 16, 17, 18, 19, 20]])

Let's say we want to pick the values [1, 2, 3] from the first row, [11, 12, 13] from the second row and [17, 18, 19] from the third row.

In other words, we'll pick out the indices from each row of arr as shown in an array idx:

array([[1, 2, 3],
       [4, 5, 6],
       [3, 4, 5]])

Then using np.choose:

>>> np.choose(idx, arr.T[:,:,np.newaxis])
array([[ 1,  2,  3],
       [11, 12, 13],
       [17, 18, 19]])

To explain what just happened: arr.T[:,:,np.newaxis] meant that arr was temporarily viewed as a 3D array with shape (7, 3, 1). You can imagine this as a 3D array in which each column of the original arr is now a 2D column vector with three values. The 3D array looks a bit like this:

#  0       1       2       3       4       5       6
[[ 0]   [[ 1]   [[ 2]   [[ 3]   [[ 4]   [[ 5]   [[ 6]   # choose values from 1, 2, 3
 [ 7]    [ 8]    [ 9]    [10]    [11]    [12]    [13]   # choose values from 4, 5, 6
 [14]]   [15]]   [16]]   [17]]   [18]]   [19]]   [20]]  # choose values from 3, 4, 5

To get the zeroth row of the output array, choose selects the zeroth element from the 2D column at index 1, the zeroth element from the 2D column at index 2, and the zeroth element from the 2D column at index 3.

To get the first row of the output array, choose selects the first element from the 2D column at index 4, the first element from the 2D column at index 5, ... and so on.
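For reference, here is the example as a self-contained script. It also checks the result against plain integer-array indexing; that comparison and the names choices, picked, rows and direct are additions for illustration.

```python
import numpy as np

arr = np.arange(21).reshape(3, 7)
idx = np.array([[1, 2, 3],
                [4, 5, 6],
                [3, 4, 5]])

# Seven "choices", one per column of arr, each of shape (3, 1).
choices = arr.T[:, :, np.newaxis]
picked = np.choose(idx, choices)
print(picked)

# Same selection with integer-array indexing: a (3, 1) column of row
# numbers broadcast against the (3, 3) column indices.
rows = np.arange(arr.shape[0])[:, None]
direct = arr[rows, idx]
print(np.array_equal(picked, direct))
```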

edited Jan 18, 2015 at 18:07

answered Jan 18, 2015 at 13:48

Alex Riley


6 Comments

recodeFuture Over a year ago

Thanks, that looks like what I was thinking of. Now I have to check the performance of given solutions.

recodeFuture Over a year ago

I have one more question. What is the best way to create the idx array if I have a col_start vector, and the col_end vector equals col_start + n?

Alex Riley Over a year ago

@soccersniper: one way could be to use np.vstack and a list comprehension, e.g. np.vstack([np.arange(x, x+n) for x in col_start]). So above in my example, n is 3 and col_start is [1, 4, 3].

recodeFuture Over a year ago

Because n << len(col_start) I would rather do this: np.array( [col_start+i for i in range(n)] ) (same result if I use np.vstack). I would have to transpose this array to use your solution. Is there any other way?

hpaulj Over a year ago

It is possible to index arr with idx without choose - just use a matching column array for the 1st dimension.


dursk · Accepted Answer · 2015-01-17 23:46:28Z

1

I think you're looking for something like the below. I'm not sure what you want to do with them when you access them though.

indexes = [(4, 6), (0, 2), (2, 4), (8, 10)]
# Each row is a plain list of characters: '|' marks wanted elements, '.' the rest.
arr = [
    list("....|||....."),
    list("|||........."),
    list("..|||......."),
    list("........|||."),
]

for (start, end), row in zip(indexes, arr):
    # end is inclusive here, hence the +1 in the slice
    print(row[start:end + 1])

answered Jan 17, 2015 at 23:46

dursk


5 Comments

Padraic Cunningham Over a year ago

only problem is you now don't have a numpy array

recodeFuture Over a year ago

I want to find max value for "masked" elements in each row. Solution for accessing those elements would be simple if columns would be the same for all rows: my_array[:,col_start:col_end]. What I was looking for was a modification of previous statement in the case of different column indexes.

dursk Over a year ago

Where is numpy array coming from? OP says nothing about that? And @tjons: nothing in my answer is a dictionary?

recodeFuture Over a year ago

Original array contains dot products between direction vectors and gradient vectors on "rays" pointing from the center outwards for given angles. So i-th row of dot_product array contains dot products along the "ray" for i-th angle.

tjons Over a year ago

@mattm my own confusedness. I'm wrong, and I've deleted the other comments. Thank you for pointing this out! On top of it, I didn't mean dictionary - I meant list. Whoops!
