6
[1, 1, 1, 0, 0, 0, 1, 1, 0, 0]

I have a NumPy array consisting of 0's and 1's like above. How can I add all consecutive 1's like below? Any time I encounter a 0, I reset.

[1, 2, 3, 0, 0, 0, 1, 2, 0, 0]

I can do this using a for loop, but is there a vectorized solution using NumPy?

2 Answers 2

9

Here's a vectorized approach -

def island_cumsum_vectorized(a):
    a_ext = np.concatenate(( [0], a, [0] ))
    idx = np.flatnonzero(a_ext[1:] != a_ext[:-1])
    a_ext[1:][idx[1::2]] = idx[::2] - idx[1::2]
    return a_ext.cumsum()[1:-1]

Sample run -

In [91]: a = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 0])

In [92]: island_cumsum_vectorized(a)
Out[92]: array([1, 2, 3, 0, 0, 0, 1, 2, 0, 0])

In [93]: a = np.array([0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1])

In [94]: island_cumsum_vectorized(a)
Out[94]: array([0, 1, 2, 3, 4, 0, 0, 0, 1, 2, 0, 0, 1])

Runtime test

For the timings , I would use OP's sample input array and repeat/tile it and hopefully this should be a less opportunistic benchmark -

Small case :

In [16]: a = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 0])

In [17]: a = np.tile(a,10)  # Repeat OP's data 10 times

# @Paul Panzer's solution
In [18]: %timeit np.concatenate([np.cumsum(c) if c[0] == 1 else c for c in np.split(a, 1 + np.where(np.diff(a))[0])])
10000 loops, best of 3: 73.4 µs per loop

In [19]: %timeit island_cumsum_vectorized(a)
100000 loops, best of 3: 8.65 µs per loop

Bigger case :

In [20]: a = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 0])

In [21]: a = np.tile(a,1000)  # Repeat OP's data 1000 times

# @Paul Panzer's solution
In [22]: %timeit np.concatenate([np.cumsum(c) if c[0] == 1 else c for c in np.split(a, 1 + np.where(np.diff(a))[0])])
100 loops, best of 3: 6.52 ms per loop

In [23]: %timeit island_cumsum_vectorized(a)
10000 loops, best of 3: 49.7 µs per loop

Nah, I want really huge case :

In [24]: a = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 0])

In [25]: a = np.tile(a,100000)  # Repeat OP's data 100000 times

# @Paul Panzer's solution
In [26]: %timeit np.concatenate([np.cumsum(c) if c[0] == 1 else c for c in np.split(a, 1 + np.where(np.diff(a))[0])])
1 loops, best of 3: 725 ms per loop

In [27]: %timeit island_cumsum_vectorized(a)
100 loops, best of 3: 7.28 ms per loop
Sign up to request clarification or add additional context in comments.

20 Comments

benchmark these: a1 = np.repeat(np.arange(1000)%2, np.random.randint(1, 1000, (1000,))); a2 = np.repeat(np.arange(100)%2, np.random.randint(1, 10000, (100,))); a3 = np.repeat(np.random.random((100,)) < 0.2, np.random.randint(1, 10000, (100,))) ;-)
@PaulPanzer I decided to run all of these benchmarks myself on your suggested inputs. Divakar's implementation is still faster :P... by about 2x.
@rayreng Also on the last one? But I admit his is better; I'm just objecting to opportunistic benchmarks ;-)
@PaulPanzer :D. I've already upvoted yours as it's very nice and in one line. The last one I just ran... it's just beaten by a slight margin.
@rayryeng Strange, on my laptop (100 repeats) Divakar:0.47050797403790057;0.26334572909399867;0.3771821779664606 Paul Panzer:0.6545195570215583;0.4508163968566805;0.17192376195453107
|
2

If a list comprehension is acceptable

np.concatenate([np.cumsum(c) if c[0] == 1 else c for c in np.split(a, 1 + np.where(np.diff(a))[0])])

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.