10

I went through several questions on StackOverflow but could not find the relevant answer. I want to fetch indices of k maximum values from a numpy ndarray. This link discusses the same but for 1D array. np.argsort for 2D array resulted into sorting of elements row-wise . i.e

Note: array elements are not unique.

input:

import numpy as np
n = np.arange(9).reshape(3,3)
>>> n
array([[0, 1, 2],
   [3, 4, 5],
   [6, 7, 8]])
s = n.argsort()
>>> s
array([[0, 1, 2],
   [0, 1, 2],
   [0, 1, 2]], dtype=int32)

Also,

import numpy as np
n = np.arange(9).reshape(3,3)
s = n.argsort(axis=None)
>>>s
array([0, 1, 2, 3, 4, 5, 6, 7, 8], dtype=int32)

but I am loosing the array structure here and can not redeem the original indices of the elements.

Any knid of help is appreciated.

1
  • do you want an answer for 2D arrays of for nD arrays where n > 2? Commented Apr 13, 2017 at 7:53

1 Answer 1

12

Couple of approaches with np.argpartition and np.argsort for ndarrays -

def k_largest_index_argpartition_v1(a, k):
    idx = np.argpartition(-a.ravel(),k)[:k]
    return np.column_stack(np.unravel_index(idx, a.shape))

def k_largest_index_argpartition_v2(a, k):
    idx = np.argpartition(a.ravel(),a.size-k)[-k:]
    return np.column_stack(np.unravel_index(idx, a.shape))

def k_largest_index_argsort(a, k):
    idx = np.argsort(a.ravel())[:-k-1:-1]
    return np.column_stack(np.unravel_index(idx, a.shape))

Discussion on two versions with argpartition

Difference between k_largest_index_argpartition_v1 and k_largest_index_argpartition_v2 is how we are using argparition. In the first version, we are negating input array and then using argpartition to get the indices for the smallest k indices, thus effectively getting the largest k indices, whereas in the second version, we are getting the first a.size-k smallest indices and then we are choosing the leftover largest k indices.

Also, its worth mentioning here that with argpartition, we are not getting the indices in their sorted order. If the sorted order is needed, we need to feed in range array to np.argpartition, as mentioned in this post.

Sample runs -

1) 2D case :

In [42]: a    # 2D array
Out[42]: 
array([[38, 14, 81, 50],
       [17, 65, 60, 24],
       [64, 73, 25, 95]])

In [43]: k_largest_index_argsort(a, k=2)
Out[43]: 
array([[2, 3],
       [0, 2]])

In [44]: k_largest_index_argsort(a, k=4)
Out[44]: 
array([[2, 3],
       [0, 2],
       [2, 1],
       [1, 1]])

In [66]: k_largest_index_argpartition_v1(a, k=4)
Out[66]: 
array([[2, 1], # Notice the order is different
       [2, 3],
       [0, 2],
       [1, 1]])

2) 3D case :

In [46]: a # 3D array
Out[46]: 
array([[[20, 98, 27, 73],
        [33, 78, 48, 59],
        [28, 91, 64, 70]],

       [[47, 34, 51, 19],
        [73, 38, 63, 94],
        [95, 25, 93, 64]]])

In [47]: k_largest_index_argsort(a, k=2)
Out[47]: 
array([[0, 0, 1],
       [1, 2, 0]])

Runtime test -

In [56]: a = np.random.randint(0,99999999999999,(3000,4000))

In [57]: %timeit k_largest_index_argsort(a, k=10)
1 loops, best of 3: 2.18 s per loop

In [58]: %timeit k_largest_index_argpartition_v1(a, k=10)
10 loops, best of 3: 178 ms per loop

In [59]: %timeit k_largest_index_argpartition_v2(a, k=10)
10 loops, best of 3: 128 ms per loop
Sign up to request clarification or add additional context in comments.

11 Comments

Just for clarity, argpartition won't give the top k in order, just the top k in their initial order. A way to have the best of both worlds (speed of argpartition, order of argsort) is to sort after the partition: do idx2=np.argsort(-a.ravel()[idx]) and then row,col = np.unravel.index(idx[idx2], a.shape)
Well that's easier!
"As such, the first version would be better" - why does it matter whether you take the first or last k? Argpartition doesn't sort the first k, does it?
Also, the cost of doing -a.ravel() is somewhat significant here
@mLstudent33 So, what I do when I am quoting codes from elsewhere on here, I put the link and the author name with @ on top of the function code, something like in here - stackoverflow.com/a/51951787. So, I guess that could be used in assignments/outside of Stackoverflow too.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.