I would like to take an array of numpy.bytes_ and compute a grouped sum with another array of weights. So given

x = np.array(['X', 'Y', 'X', 'Z'], dtype=np.bytes_)
w = np.array([0, 2, 1, 1])

I would like to get something like

{'X': 1, 'Y': 2, 'Z': 1}

or, if numpy has named arrays, that would work too. The np.bytes_ array is guaranteed to contain single bytes representing ASCII characters, and the weights array can hold integers or floats.

I've tried using some solutions here but I keep getting an exception

numpy.core._exceptions._UFuncNoLoopError: ufunc 'minimum' did not contain a loop with signature matching types (dtype('S1'), dtype('S1')) -> None

I am not familiar with numpy so I don't know what that exception means or if I should even be trying to do this with numpy. It seems like it would be pretty simple in pandas, but I thought numpy would be more efficient and I didn't want to use pandas just for this.

2 Answers

You can use np.unique(..., return_inverse=True) and np.bincount(..., weights=...) to do this; however, I don't know whether this is faster than pandas.

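import numpy as np
a = np.array(['X', 'Y', 'X', 'Z', 'Z', 'Y'], dtype=np.bytes_)
w = np.array([0, 2, 1, 1, 10, 100])

# names: sorted unique labels; groups: index into names for each element of a
names, groups = np.unique(a, return_inverse=True)
# sum the weights that share a group index (the result is a float array)
counts = np.bincount(groups, weights=w)
# the keys are bytes objects (b'X', ...), not str
result = dict(zip(names, counts))
print(result)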
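
If plain str keys like those in the question are wanted, the same idea can be wrapped in a small helper. This is only a sketch: grouped_sum is a hypothetical name, not a NumPy function, and it assumes the labels are single ASCII bytes as the question states.

```python
import numpy as np

def grouped_sum(labels, weights):
    """Hypothetical helper: sum weights per distinct label, keyed by str."""
    # names: sorted unique labels; inverse: index into names for each element
    names, inverse = np.unique(labels, return_inverse=True)
    # bincount adds up the weights that share the same index (float64 result)
    sums = np.bincount(inverse, weights=weights)
    # decode the byte labels so the keys are plain str rather than bytes
    return {
        name.decode("ascii"): total
        for name, total in zip(names.tolist(), sums.tolist())
    }

x = np.array(['X', 'Y', 'X', 'Z'], dtype=np.bytes_)
w = np.array([0, 2, 1, 1])
result = grouped_sum(x, w)
print(result)  # {'X': 1.0, 'Y': 2.0, 'Z': 1.0}
```

The sums come back as floats even for integer weights, because np.bincount returns a float array whenever weights is given.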

Use a dictionary comprehension

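You can use an if clause to exclude items with a w value of 0. Note that a dict comprehension does not add weights together: when a key repeats, only the last positive weight seen for that key is kept, so this matches the grouped sum for the example in the question but not in general.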

import numpy as np

x = np.array(['X', 'Y', 'X', 'Z'], dtype=np.bytes_)
w = np.array([0, 2, 1, 1])

out = {k:v for k,v in zip(x,w) if v>0}
print(out)
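
A dict comprehension stores one value per key, so repeated keys overwrite each other instead of being summed. If a pure-Python approach is preferred, a minimal sketch that actually accumulates the weights could look like this (the variable name totals is illustrative, and decoding the keys to str is an assumption about the desired output):

```python
from collections import defaultdict

import numpy as np

x = np.array(['X', 'Y', 'X', 'Z', 'Z', 'Y'], dtype=np.bytes_)
w = np.array([0, 2, 1, 1, 10, 100])

# accumulate the weight of every occurrence of each label
totals = defaultdict(int)
for key, weight in zip(x.tolist(), w.tolist()):
    # key is a Python bytes object; decode it to get a str key
    totals[key.decode("ascii")] += weight

totals = dict(totals)
print(totals)  # {'X': 1, 'Y': 102, 'Z': 11}
```

Unlike the filtered comprehension, this keeps labels whose total is 0. The loop runs in Python rather than in NumPy, so it is likely slower on large arrays.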
