
This post helped me achieve what I wanted, but that implementation takes too long for some of the large datasets I work on. I have two NumPy arrays (fairly large):

p[:24]=array([[ 0.18264738, -0.00326727,  0.01799096],
   [ 0.18198644, -0.00051316,  0.01800063],
   [ 0.18999948,  0.        ,  0.0226188 ],
   [ 0.18215604,  0.00157497,  0.01799999],
   [ 0.18286349,  0.0036474 ,  0.01799824],
   [ 0.18999948,  0.        ,  0.0226188 ],
   [ 0.18399446,  0.00528562,  0.01799998],
   [ 0.18573835,  0.0068323 ,  0.01799908],
   [ 0.18999948,  0.        ,  0.0226188 ],
   [ 0.18573835,  0.0068323 ,  0.01799908],
   [ 0.18744153,  0.00758001,  0.018     ],
   [ 0.18999948,  0.        ,  0.0226188 ],
   [ 0.18744153,  0.00758001,  0.018     ],
   [ 0.18956973,  0.00801727,  0.01800126],
   [ 0.18999948,  0.        ,  0.0226188 ],
   [ 0.19157426,  0.0078435 ,  0.018     ],
   [ 0.19366005,  0.00714792,  0.01800038],
   [ 0.18999948,  0.        ,  0.0226188 ],
   [ 0.18999948,  0.        ,  0.0226188 ],
   [ 0.19584496,  0.0055142 ,  0.01799665],
   [ 0.19701494,  0.00384344,  0.01800058],
   [ 0.19366005,  0.00714792,  0.01800038],
   [ 0.19584496,  0.0055142 ,  0.01799665],
   [ 0.18999948,  0.        ,  0.0226188 ]])

v[:24]=array([[ 0.18264738, -0.00326727,  0.01799096],
   [ 0.18198644, -0.00051316,  0.01800063],
   [ 0.18999948,  0.        ,  0.0226188 ],
   [ 0.18215604,  0.00157497,  0.01799999],
   [ 0.18286349,  0.0036474 ,  0.01799824],
   [ 0.18399446,  0.00528562,  0.01799998],
   [ 0.18573835,  0.0068323 ,  0.01799908],
   [ 0.18744153,  0.00758001,  0.018     ],
   [ 0.18956973,  0.00801727,  0.01800126],
   [ 0.19157426,  0.0078435 ,  0.018     ],
   [ 0.19366005,  0.00714792,  0.01800038],
   [ 0.19584496,  0.0055142 ,  0.01799665],
   [ 0.19701494,  0.00384344,  0.01800058],
   [ 0.19775054,  0.0019907 ,  0.01800372],
   [ 0.19800517, -0.00065405,  0.01800135],
   [ 0.19731225, -0.00330035,  0.01799999],
   [ 0.19596213, -0.00537427,  0.01800001],
   [ 0.18937038, -0.00797523,  0.018     ],
   [ 0.18739267, -0.00759293,  0.01799974],
   [ 0.18565072, -0.00671446,  0.018     ],
   [ 0.18411626, -0.00545196,  0.01800367],
   [ 0.19136006, -0.00791202,  0.01799961],
   [ 0.1938769 , -0.00702934,  0.01799973],
   [ 0.1314003 , -0.06724723,  0.0645    ]])

The v array is generated from the p array using:

p_uniques, p_indices, p_inverse, p_counts = np.unique(
                                              p, return_index=True, 
                                              return_inverse=True, 
                                              return_counts=True, 
                                              axis=0)

v = p[np.sort(p_indices, axis=None)]

Now, the target is to generate an array containing, for each row of p, the index of that row in v, i.e. all the occurrences of the elements of v in the p array, including duplicates. Therefore, the desired output would be:

indices[:24]=array([ 0,  1,  2,  3,  4,  2,  5,  6,  2,  6,  7,  2,  
                     7,  8,  2,  9, 10, 2,  2, 11, 12, 10, 11,  2])

I just posted the first 24 indices from the indices array to save space.
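
To make the target concrete, here is a tiny made-up example (the array q below is not part of my data; it only mimics the structure):

import numpy as np

# toy array with duplicate rows (made up purely for illustration)
q = np.array([[0., 0.],
              [1., 1.],
              [0., 0.],
              [2., 2.],
              [1., 1.]])

# unique rows in order of first appearance, built the same way as v above
q_uniques, q_indices = np.unique(q, return_index=True, axis=0)
w = q[np.sort(q_indices, axis=None)]   # [[0., 0.], [1., 1.], [2., 2.]]

# the desired output maps every row of q to its position in w:
# indices == [0, 1, 0, 2, 1]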

I tried various methods using np.where, np.isin, and others, but I could not achieve the desired result with better performance than the solution shared in the linked post.

I'd greatly appreciate your help.

1 Answer


The key insight here is that v is a permutation of p_uniques and np.argsort(p_indices) provides this permutation. Inverting this permutation gives us the mapping that we have to apply to p_inverse to get what we want.

To invert the permutation, we use the code from How to invert a permutation array in numpy.
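
For intuition, here is that trick on a small hand-made permutation (the values are chosen only for illustration):

import numpy as np

perm = np.array([2, 0, 1])           # a permutation of 0..2
inv = np.empty_like(perm)
inv[perm] = np.arange(len(perm))     # scatter positions back: inv == [1, 2, 0]
# composing them gives the identity: perm[inv] == inv[perm] == [0, 1, 2]

Applying the same trick to the arrays returned by np.unique: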

# p_indices: len(v), range(0, len(p)). Maps v indices to p indices
# p_inverse: len(p), range(0, len(v)). Maps p indices to p_unique indices
p_uniques, p_indices, p_inverse = np.unique(
      p, return_index=True, return_inverse=True, axis=0)

# len(v), range(0, len(v)). Maps v indices to p_unique indices
sort_permut = np.argsort(p_indices)
v = p_uniques[sort_permut]

# len(v), range(0, len(v)). Maps p_unique indices to v indices
inv_sort = np.empty_like(sort_permut)
inv_sort[sort_permut] = np.arange(len(inv_sort))

# len(p), range(0, len(v)). Maps p indices to v indices
indices = inv_sort[p_inverse]
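
As a quick sanity check (assuming p, v, and indices are the arrays built above), indexing v with the computed indices should reproduce p row for row:

# every row of p is recovered from v via the computed mapping
assert np.array_equal(v[indices], p)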

5 Comments

Thanks for the explanation. Unfortunately, it does not work. I think I should have explained my problem better. The v array is generated from the p array using: p_uniques, p_indices = np.unique(p, return_index=True, axis=0); v = p[np.sort(p_indices, axis=None)]. Therefore, v is nothing but the unique values of the p array. Now, I want to generate an indices array that gives all the occurrences of the elements of v in the p array.
@Ravi then I suggest you edit your question, because I'm not answering a question that isn't asked ;-) But I can already tell you that it is literally just the return_inverse=True option, followed by some index mapping, maybe with argsort, to keep in sync with the sorting.
I edited my question and added more clarification. return_inverse=True generates an array of indices to reconstruct the original array, in my case the p array. I tried argsort, but it generates indices of the v array and does not include duplicate occurrences. For more clarification, you can visit this post, but the solution mentioned there takes longer and I want to achieve a faster solution.
@Ravi this should do it
You're awesome, man!!! It works like a charm and significantly brings down the computation time. To give you the context, for some datasets the time has come down from 7 secs to 0.63 msec.
