
This post helped me achieve what I wanted, but that implementation takes too long for some of the large datasets I work on. I have two NumPy arrays (fairly large):

p[:24]=array([[ 0.18264738, -0.00326727,  0.01799096],
   [ 0.18198644, -0.00051316,  0.01800063],
   [ 0.18999948,  0.        ,  0.0226188 ],
   [ 0.18215604,  0.00157497,  0.01799999],
   [ 0.18286349,  0.0036474 ,  0.01799824],
   [ 0.18999948,  0.        ,  0.0226188 ],
   [ 0.18399446,  0.00528562,  0.01799998],
   [ 0.18573835,  0.0068323 ,  0.01799908],
   [ 0.18999948,  0.        ,  0.0226188 ],
   [ 0.18573835,  0.0068323 ,  0.01799908],
   [ 0.18744153,  0.00758001,  0.018     ],
   [ 0.18999948,  0.        ,  0.0226188 ],
   [ 0.18744153,  0.00758001,  0.018     ],
   [ 0.18956973,  0.00801727,  0.01800126],
   [ 0.18999948,  0.        ,  0.0226188 ],
   [ 0.19157426,  0.0078435 ,  0.018     ],
   [ 0.19366005,  0.00714792,  0.01800038],
   [ 0.18999948,  0.        ,  0.0226188 ],
   [ 0.18999948,  0.        ,  0.0226188 ],
   [ 0.19584496,  0.0055142 ,  0.01799665],
   [ 0.19701494,  0.00384344,  0.01800058],
   [ 0.19366005,  0.00714792,  0.01800038],
   [ 0.19584496,  0.0055142 ,  0.01799665],
   [ 0.18999948,  0.        ,  0.0226188 ]])

v[:24]=array([[ 0.18264738, -0.00326727,  0.01799096],
   [ 0.18198644, -0.00051316,  0.01800063],
   [ 0.18999948,  0.        ,  0.0226188 ],
   [ 0.18215604,  0.00157497,  0.01799999],
   [ 0.18286349,  0.0036474 ,  0.01799824],
   [ 0.18399446,  0.00528562,  0.01799998],
   [ 0.18573835,  0.0068323 ,  0.01799908],
   [ 0.18744153,  0.00758001,  0.018     ],
   [ 0.18956973,  0.00801727,  0.01800126],
   [ 0.19157426,  0.0078435 ,  0.018     ],
   [ 0.19366005,  0.00714792,  0.01800038],
   [ 0.19584496,  0.0055142 ,  0.01799665],
   [ 0.19701494,  0.00384344,  0.01800058],
   [ 0.19775054,  0.0019907 ,  0.01800372],
   [ 0.19800517, -0.00065405,  0.01800135],
   [ 0.19731225, -0.00330035,  0.01799999],
   [ 0.19596213, -0.00537427,  0.01800001],
   [ 0.18937038, -0.00797523,  0.018     ],
   [ 0.18739267, -0.00759293,  0.01799974],
   [ 0.18565072, -0.00671446,  0.018     ],
   [ 0.18411626, -0.00545196,  0.01800367],
   [ 0.19136006, -0.00791202,  0.01799961],
   [ 0.1938769 , -0.00702934,  0.01799973],
   [ 0.1314003 , -0.06724723,  0.0645    ]])

The v array is generated from the p array using:

p_uniques, p_indices, p_inverse, p_counts = np.unique(
                                              p, return_index=True, 
                                              return_inverse=True, 
                                              return_counts=True, 
                                              axis=0)

v = p[np.sort(p_indices, axis=None)]

Now, the target is to generate an array containing, for each row of p, the index of that row in v, i.e. all the occurrences of the elements of v in the p array, including duplicates. Therefore, the desired output would be:

indices[:24]=array([ 0,  1,  2,  3,  4,  2,  5,  6,  2,  6,  7,  2,  
                     7,  8,  2,  9, 10, 2,  2, 11, 12, 10, 11,  2])

I just posted the first 24 indices from the indices array to save space.
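
To make the target concrete, here is a tiny made-up example (the array q below is not part of my data; it only mimics the structure):

import numpy as np

# toy array with duplicate rows (made up purely for illustration)
q = np.array([[0., 0.],
              [1., 1.],
              [0., 0.],
              [2., 2.],
              [1., 1.]])

# unique rows in order of first appearance, built the same way as v above
q_uniques, q_indices = np.unique(q, return_index=True, axis=0)
w = q[np.sort(q_indices, axis=None)]   # [[0., 0.], [1., 1.], [2., 2.]]

# the desired output maps every row of q to its position in w:
# indices == [0, 1, 0, 2, 1]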

I tried various methods using np.where, np.isin, and others, but I could not achieve the desired result with better performance than the solution shared in the linked post.

I'd greatly appreciate your help.

1 Answer


The key insight here is that v is a permutation of p_uniques and np.argsort(p_indices) provides this permutation. Inverting this permutation gives us the mapping that we have to apply to p_inverse to get what we want.

To invert the permutation, we use the code from How to invert a permutation array in numpy.
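
For intuition, here is that trick on a small hand-made permutation (the values are chosen only for illustration):

import numpy as np

perm = np.array([2, 0, 1])           # a permutation of 0..2
inv = np.empty_like(perm)
inv[perm] = np.arange(len(perm))     # scatter positions back: inv == [1, 2, 0]
# composing them gives the identity: perm[inv] == inv[perm] == [0, 1, 2]

Applying the same trick to the arrays returned by np.unique: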

# p_indices: len(v), range(0, len(p)). Maps v indices to p indices
# p_inverse: len(p), range(0, len(v)). Maps p indices to p_unique indices
p_uniques, p_indices, p_inverse = np.unique(
      p, return_index=True, return_inverse=True, axis=0)

# len(v), range(0, len(v)). Maps v indices to p_unique indices
sort_permut = np.argsort(p_indices)
v = p_uniques[sort_permut]

# len(v), range(0, len(v)). Maps p_unique indices to v indices
inv_sort = np.empty_like(sort_permut)
inv_sort[sort_permut] = np.arange(len(inv_sort))

# len(p), range(0, len(v)). Maps p indices to v indices
indices = inv_sort[p_inverse]
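
As a quick sanity check (assuming p, v, and indices are the arrays built above), indexing v with the computed indices should reproduce p row for row:

# every row of p is recovered from v via the computed mapping
assert np.array_equal(v[indices], p)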

5 Comments

Thanks for the explanation. Unfortunately, it does not work. I think I should have explained my problem better. The v array is generated from the p array using: p_uniques, p_indices = np.unique(p, return_index=True, axis=0); v = p[np.sort(p_indices, axis=None)]. Therefore, v is nothing but the unique values of the p array. Now, I want to generate an indices array that gives all the occurrences of the elements of v in the p array.
@Ravi then I suggest you edit your question, because I'm not answering a question that isn't asked ;-) But I can already tell you that it is literally just the return_inverse=True option, followed by some index mapping, maybe with argsort, to keep in sync with the sorting.
I edited my question and added more clarification. return_inverse=True generates an array of indices to reconstruct the original array, in my case the p array. I tried argsort, but it generates indices of the v array and does not include duplicate occurrences. For more clarification, you can visit this post, but the solution mentioned there takes longer and I want to achieve a faster solution.
@Ravi this should do it
You're awesome, man!!! It works like a charm and significantly brings down the computation time. To give you the context, for some datasets the time has come down from 7 secs to 0.63 msec.
