So I have this array, right?

import numpy as np
a = np.zeros(5)

I want to add values to it at the given indices, where indices can be duplicates.

e.g.

a[[1, 2, 2]] += [1, 2, 3]

I want this to produce array([ 0., 1., 5., 0., 0.]), but the answer I get is array([ 0., 1., 3., 0., 0.]).

I'd like this to work with multidimensional arrays and broadcastable indices and all that. Any ideas?

2 Answers

3

Use np.add.at to get around the buffering you run into with +=. The statement a[idx] += vals is evaluated as a[idx] = a[idx] + vals, so for a repeated index only the last assignment survives and the values are not accumulated. Pass the array, the indices, and the values to add in place at those indices:

>>> a = np.zeros(5)
>>> np.add.at(a, [1, 2, 2], [1, 2, 3])
>>> a
array([ 0.,  1.,  5.,  0.,  0.])

at is available on other ufuncs too (np.multiply, np.divide, and so on), and it also works with multidimensional arrays.
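As a minimal sketch of the multidimensional case (the names grid, rows, cols and scale are made up for illustration), the indices can be passed as a tuple of index arrays, the same way you would write grid[rows, cols]:

```python
import numpy as np

# 2-D target array
grid = np.zeros((3, 4))

# Row and column index arrays; the position (1, 2) appears twice
rows = np.array([0, 1, 1])
cols = np.array([3, 2, 2])
vals = np.array([10.0, 1.0, 2.0])

# Pass a tuple of index arrays, just as in grid[rows, cols]
np.add.at(grid, (rows, cols), vals)
print(grid[1, 2])  # both contributions to (1, 2) are accumulated

# Other binary ufuncs expose .at as well, for example np.multiply
scale = np.ones(4)
np.multiply.at(scale, [1, 1, 3], [2.0, 3.0, 5.0])
print(scale)
```

Here grid[1, 2] ends up as 3.0 (1.0 + 2.0) and scale[1] as 6.0 (2.0 * 3.0), which is the accumulation that plain fancy-index assignment skips.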

1

The operation you are performing can be viewed as binning. More specifically, it is weighted binning, with the values as the weights and the indices as the bins. For such an operation, you can use np.bincount.

Here's the implementation -

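import numpy as np

a = np.zeros(5)      # initialize output array

idx  = [1, 2, 2]     # indices (non-negative integers)
vals = [1, 2, 3]     # values, used as weights

# Sum vals per index and store the result; this assigns rather than adds,
# so it relies on a starting out as zeros
a[:max(idx)+1] = np.bincount(idx, vals)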
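If a does not start out as zeros, one possible variant (a sketch, not part of the approach as originally posted) is to pad the bincount result with the minlength argument and add it to a instead of assigning it. The starting values below are made up for illustration:

```python
import numpy as np

a = np.array([10.0, 20.0, 30.0, 40.0, 50.0])  # non-zero starting values
idx = np.array([1, 2, 2])
vals = np.array([1.0, 2.0, 3.0])

# minlength pads the result to len(a), so no slicing is needed;
# += adds the per-bin sums to the existing contents instead of replacing them
a += np.bincount(idx, weights=vals, minlength=a.size)
print(a)
```

This assumes every index is smaller than a.size; otherwise bincount returns a longer array and the addition fails with a shape error. Like the original, it only handles 1-D arrays of non-negative integer indices.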

Runtime tests

Here are some runtime tests for two input sizes, comparing the proposed bincount-based approach with the add.at-based approach from the other answer:

Datasize #1 -

In [251]: a=np.zeros(1000)
     ...: idx = np.sort(np.random.randint(1,1000,(500))).tolist()
     ...: vals = np.random.rand(500).tolist()
     ...: 

In [252]: %timeit np.add.at(a, idx, vals)
10000 loops, best of 3: 63.4 µs per loop

In [253]: %timeit a[:max(idx)+1] = np.bincount(idx,vals)
10000 loops, best of 3: 42.4 µs per loop

Datasize #2 -

In [254]: a=np.zeros(10000)
     ...: idx = np.sort(np.random.randint(1,10000,(5000))).tolist()
     ...: vals = np.random.rand(5000).tolist()
     ...: 

In [255]: %timeit np.add.at(a, idx, vals)
1000 loops, best of 3: 597 µs per loop

In [256]: %timeit a[:max(idx)+1] = np.bincount(idx,vals)
1000 loops, best of 3: 404 µs per loop
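To rerun the comparison outside IPython, something like the following sketch with the standard timeit module could be used. The function names, the fixed seed, and the choice to pass NumPy arrays rather than lists are assumptions of this sketch, and the relative speed of the two approaches may differ across NumPy versions and machines:

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen here only for repeatability
n = 10_000
idx = np.sort(rng.integers(1, n, size=n // 2))
vals = rng.random(n // 2)

def with_add_at():
    # Accumulate with the unbuffered ufunc method
    out = np.zeros(n)
    np.add.at(out, idx, vals)
    return out

def with_bincount():
    # Accumulate with weighted binning, then store into a zeroed array
    out = np.zeros(n)
    out[:idx.max() + 1] = np.bincount(idx, weights=vals)
    return out

# Both approaches should agree before their speed is compared
same = np.allclose(with_add_at(), with_bincount())
print("results match:", same)

runs = 200
t_add_at = timeit.timeit(with_add_at, number=runs)
t_bincount = timeit.timeit(with_bincount, number=runs)
print(f"add.at:   {t_add_at / runs * 1e6:.1f} µs per call")
print(f"bincount: {t_bincount / runs * 1e6:.1f} µs per call")
```

Each function allocates a fresh output array so that repeated runs do not keep accumulating into the same buffer.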
