
I have two NumPy arrays, A with shape (N, 3) and B with shape (N,), and from A I generate the array of its unique rows, e.g.:

import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [1., 2., 3.],
              [7., 8., 9.]])

B = np.array([10., 33., 15., 17.])

AUnique, directInd, inverseInd, counts = np.unique(A,
                                                   return_index=True,
                                                   return_inverse=True,
                                                   return_counts=True,
                                                   axis=0)

So that AUnique will be array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
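For reference, the other return values on this example are (depending on your NumPy version, the inverse index array for axis=0 may need a .ravel() to be 1-D):

directInd    # array([0, 1, 3])    - index in A of the first occurrence of each unique row
inverseInd   # array([0, 1, 0, 2]) - for each row of A, its row index in AUnique
counts       # array([2, 1, 1])    - how many times each unique row appears in A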

Then I build the analogue of B associated with AUnique, where for each row of A that appears more than once I sum the associated values of B, that is:

BNew = B[directInd]

# here BNew is [10., 33., 17.]

for Id in np.asarray(counts > 1).nonzero()[0]:
    BNew[Id] = np.sum(B[inverseInd == Id])

# here BNew is [25., 33., 17.]

The problem is that the for loop gets extremely slow for large N (millions or tens of millions of rows), and I was wondering if there is a way to avoid the loop and/or to make the code much faster.

Thanks in advance!

1 Answer


I think you can do what you want with np.bincount:

BNew = np.bincount(inverseInd, weights=B)
BNew

Out[]: array([25., 33., 17.])
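np.bincount treats inverseInd as bin labels and sums the matching entries of B, so all duplicates are accumulated in a single vectorized pass instead of one masked sum per repeated row. As a minimal alternative sketch with the same result, using the arrays from the question, np.add.at also works (usually a bit slower than bincount, but it generalizes to multi-column B):

BNew = np.zeros(len(AUnique))
np.add.at(BNew, inverseInd, B)  # unbuffered in-place add: repeated indices accumulate
BNew

Out[]: array([25., 33., 17.])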

