
I have two NumPy arrays, A with shape (N, 3) and B with shape (N,), and from A I generate the array of its unique rows, e.g.:

import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [1., 2., 3.],
              [7., 8., 9.]])

B = np.array([10., 33., 15., 17.])

AUnique, directInd, inverseInd, counts = np.unique(A,
                                                   return_index=True,
                                                   return_inverse=True,
                                                   return_counts=True,
                                                   axis=0)

So that AUnique will be array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
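For reference, the other return values on this example are (depending on your NumPy version, the inverse index array for axis=0 may need a .ravel() to be 1-D):

directInd    # array([0, 1, 3])    - index in A of the first occurrence of each unique row
inverseInd   # array([0, 1, 0, 2]) - for each row of A, its row index in AUnique
counts       # array([2, 1, 1])    - how many times each unique row appears in A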

Then I build the analogue of B associated with AUnique, where for each row of A that appears more than once I sum the associated values of B, that is:

BNew = B[directInd]

# here BNew is [10., 33., 17.]

for Id in np.asarray(counts > 1).nonzero()[0]:
    BNew[Id] = np.sum(B[inverseInd == Id])

# here BNew is [25., 33., 17.]

The problem is that the for loop gets extremely slow for large N (millions or tens of millions of rows), and I was wondering if there is a way to avoid the loop and/or to make the code much faster.

Thanks in advance!

1 Answer


I think you can do what you want with np.bincount:

BNew = np.bincount(inverseInd, weights=B)
BNew

Out[]: array([25., 33., 17.])
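np.bincount treats inverseInd as bin labels and sums the matching entries of B, so all duplicates are accumulated in a single vectorized pass instead of one masked sum per repeated row. As a minimal alternative sketch with the same result, using the arrays from the question, np.add.at also works (usually a bit slower than bincount, but it generalizes to multi-column B):

BNew = np.zeros(len(AUnique))
np.add.at(BNew, inverseInd, B)  # unbuffered in-place add: repeated indices accumulate
BNew

Out[]: array([25., 33., 17.])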

