Split numpy array into segments where condition is met

Question

I have an array like so:

arr = np.array([1, 2, 3, 4, -5, -6, 3, 5, 1, -2, 5, -1, -1, 10])

I want to get rid of all negative values, and split the array at each index where there was a negative value. The result should look like this:

split_list = [[1, 2, 3, 4], [3, 5, 1], [5], [10]]

I know how to do this with a list comprehension, but since the array can get quite large and I have to repeat the calculation many times, I want a NumPy solution. I found this page: https://www.geeksforgeeks.org/python-split-list-into-lists-by-particular-value/, which lets me split the array where there are negative values, but it does not remove them at the same time.

For the linked solution, couldn't you change the condition to if len(sublist) > 0 and sublist[0] > 0?
– B Remmelzwaal, Commented Sep 5, 2023 at 17:26
That removes the subarrays that start with a negative value, but it could also remove subarrays that include positive numbers, and it does not remove subarrays that end with negative values. I don't think it quite solves the problem.
– Alex V., Commented Sep 5, 2023 at 17:34

Swifty · Accepted Answer · 2023-09-05 21:42:22Z

6

Note that instead of numpy, you could use itertools.groupby as shown below (though, judging by the question "NumPy grouping using itertools.groupby performance", pure numpy will likely be more efficient):

import numpy as np
from itertools import groupby

arr = np.array([1, 2, 3, 4, -5, -6, 3, 5, 1, -2, 5, -1, -1, 10])
split_list = [list(group) for key, group in groupby(arr, key=lambda x:x>=0) if key]

# [[1, 2, 3, 4], [3, 5, 1], [5], [10]]
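Iterating over a NumPy array directly yields NumPy scalar objects, so the lists above may print as np.int64(...) values on recent NumPy versions. A minimal sketch of one way around that, assuming plain Python numbers are wanted in the result: convert with tolist() before grouping. The helper name split_non_negative is made up for this illustration and is not part of the original answer.

```python
from itertools import groupby

import numpy as np


def split_non_negative(values):
    """Return runs of consecutive non-negative values as plain Python lists."""
    # tolist() converts NumPy scalars to built-in Python numbers before grouping
    plain = np.asarray(values).tolist()
    return [list(group) for keep, group in groupby(plain, key=lambda x: x >= 0) if keep]


arr = np.array([1, 2, 3, 4, -5, -6, 3, 5, 1, -2, 5, -1, -1, 10])
split_list = split_non_negative(arr)
print(split_list)  # [[1, 2, 3, 4], [3, 5, 1], [5], [10]]
```

Zero counts as non-negative here (x >= 0), which matches the goal of removing only negative values.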

edited Sep 5, 2023 at 21:42

answered Sep 5, 2023 at 17:36

Swifty


2 Comments

Vitalizzare Over a year ago

Looks nice, I like it. Any thoughts about the performance of groupby on big data?

Swifty Over a year ago

I'm not really sure; you might want to read the answers there for insight: stackoverflow.com/questions/4651683/…
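One way to settle the performance question for your own data is to time both styles directly. The following is only a sketch: the array size, the value range, the random seed, and the function names with_groupby and with_split are assumptions chosen for illustration, and no particular timings are claimed. The groupby variant here converts with tolist() first, which differs slightly from the answer above.

```python
import timeit
from itertools import groupby

import numpy as np

# Assumed test data: 100_000 random integers in [-5, 10)
rng = np.random.default_rng(0)
big = rng.integers(-5, 10, size=100_000)


def with_groupby(a):
    # Group consecutive values by sign and keep the non-negative runs
    return [list(g) for k, g in groupby(a.tolist(), key=lambda x: x >= 0) if k]


def with_split(a):
    # Split before each negative value, then drop negatives and empty pieces
    pieces = np.split(a, np.flatnonzero(a < 0))
    return [p[p >= 0] for p in pieces if (p >= 0).any()]


# Sanity check: both approaches must agree before timing them
assert with_groupby(big) == [p.tolist() for p in with_split(big)]

for fn in (with_groupby, with_split):
    seconds = timeit.timeit(lambda: fn(big), number=3)
    print(f"{fn.__name__}: {seconds / 3:.4f} s per call")
```

Results will depend on how long the runs are: many short segments mean many small arrays, which tends to reduce the benefit of vectorized operations.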

Vitalizzare · Accepted Answer · 2023-09-05 21:01:01Z

2

Find the indices where the sign changes, split the array at those indices, and keep every second sub-array of the result:

start = 0 if arr[0] >= 0 else 1
np.split(arr, np.arange(1, len(arr))[np.diff(arr < 0)])[start::2]

Note that numpy.diff on boolean data applies XOR to neighboring elements, as in the example below:

data = np.array([1,1,0,0,0,1,1], dtype=bool)
assert all(np.diff(data) == (data[1:] ^ data[:-1]))
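To see how the one-liner arrives at its result, here is a step-by-step sketch on the array from the question. The variable names are chosen for illustration, and np.flatnonzero(...) + 1 is used as an equivalent way to write the np.arange(1, len(arr))[...] indexing.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, -5, -6, 3, 5, 1, -2, 5, -1, -1, 10])

negative = arr < 0
# True wherever the sign flag differs between element i and element i + 1
changes = np.diff(negative)
# Entry i of the diff describes the boundary before element i + 1, hence the offset
boundaries = np.flatnonzero(changes) + 1
print(boundaries.tolist())  # [4, 6, 9, 10, 11, 13]

# The resulting runs alternate between non-negative and negative stretches
runs = np.split(arr, boundaries)
start = 0 if arr[0] >= 0 else 1
result = runs[start::2]
print([r.tolist() for r in result])  # [[1, 2, 3, 4], [3, 5, 1], [5], [10]]
```

Because the runs strictly alternate, taking every second one (starting at 0 or 1 depending on the sign of the first element) selects exactly the non-negative segments, with no empty pieces to filter out.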

edited Sep 5, 2023 at 21:01

answered Sep 5, 2023 at 19:53

Vitalizzare


mozway · Accepted Answer · 2023-09-05 17:38:04Z

1

For a pure numpy approach:

m = arr<0
np.split(arr[~m], np.unique((np.arange(m.shape[0])-np.cumsum(m))[m])+1)

Or with a loop:

m = arr<0
m2 = m & ~np.r_[False, m[:-1]]

# use >= 0 so that zeros are kept; only negative values are dropped
out = [a[a>=0] for a in np.split(arr, np.nonzero(m2)[0])]

Output:

[array([1, 2, 3, 4]), array([3, 5, 1]), array([5]), array([10])]

Intermediates (first approach):

np.unique((np.arange(m.shape[0])-np.cumsum(m))[m])+1
# array([4, 7, 8])

arr[~m]
# array([ 1,  2,  3,  4,  3,  5,  1,  5, 10])
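One edge case of the first approach: if the array starts or ends with a negative value, np.split returns an empty array at that end. A hedged sketch of a wrapper that filters those out follows; the function name split_drop_negative is hypothetical and not part of the answer above.

```python
import numpy as np


def split_drop_negative(arr):
    """Split arr into runs of non-negative values (hypothetical helper)."""
    arr = np.asarray(arr)
    m = arr < 0
    # For each negative element: the number of non-negative elements before it,
    # which is exactly the cut position inside arr[~m]
    cut_points = np.unique((np.arange(m.shape[0]) - np.cumsum(m))[m] + 1)
    pieces = np.split(arr[~m], cut_points)
    # Leading or trailing negatives create empty pieces; drop them
    return [p for p in pieces if p.size]


print([p.tolist() for p in split_drop_negative([-1, 2, 3, -4, 0, 5, -6])])
# [[2, 3], [0, 5]]
```

Without the final filter, the same input would produce an empty array at both ends of the list.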

answered Sep 5, 2023 at 17:38

mozway


JRiggles · Accepted Answer · 2023-09-05 17:45:26Z

Here's an approach that builds on the linked example. Using filter to remove the empty lists produced by the comprehension may not be best practice, and there are better answers here already, but I wanted to add this for the sake of completeness.

import numpy as np

# input array
arr = np.array([1, 2, 3, 4, -5, -6, 3, 2, 1, -2, 5, -1, -1, 10])
# get indices of negative values
idx = np.where(arr < 0)[0]
# split the input array at that index
subarrays = np.split(arr, idx)
# build the final list, removing all negative values
# ('filter' is used to remove the empty lists caused by 'sub[sub >= 0]';
# '>= 0' keeps zeros, since only negative values should be dropped)
result = list(filter(None, [sub[sub >= 0].tolist() for sub in subarrays]))

print(result)
# => [[1, 2, 3, 4], [3, 2, 1], [5], [10]]

PaulS · Accepted Answer · 2023-09-05 21:02:19Z

1

Another possible solution:

[x[mask] for x in np.split(arr, np.where(arr < 0)[0]) if (mask := x >= 0).any()]

Output:

[array([1, 2, 3, 4]), array([3, 5, 1]), array([5]), array([10])]
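The := operator used above needs Python 3.8 or newer. For older interpreters, the same idea can be written in two steps. This is a sketch of an equivalent formulation, not the original answer's code.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, -5, -6, 3, 5, 1, -2, 5, -1, -1, 10])

# Split before every negative value, then strip the negatives from each piece
stripped = (x[x >= 0] for x in np.split(arr, np.where(arr < 0)[0]))
# Keep only the pieces that still contain at least one element
result = [x for x in stripped if x.size]
print([x.tolist() for x in result])  # [[1, 2, 3, 4], [3, 5, 1], [5], [10]]
```

The generator expression avoids building the intermediate list of unfiltered pieces, at the cost of computing each piece before checking whether it is empty.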

edited Sep 5, 2023 at 21:02

answered Sep 5, 2023 at 20:57

PaulS


1 Comment

jared Over a year ago

Note to future readers: this requires Python 3.8+ since it makes use of the walrus operator, :=.

Mayar Eskafi · Accepted Answer · 2023-09-05 22:22:21Z

0

import numpy as np

arr = np.array([1, 2, 3, 4, -5, -6, 3, 5, 1, -2, 5, -1, -1, 10])

# Find the indices where the array is negative
neg_indices = np.where(arr < 0)[0]

# Split the array at the indices where it is negative
split_arr = np.split(arr, neg_indices)

# Remove the negative values from each subarray
split_list = [subarr[subarr >= 0] for subarr in split_arr]

# Convert the subarrays to lists and remove any empty lists
split_list = [subarr.tolist() for subarr in split_list if len(subarr) > 0]

print(split_list)

edited Sep 5, 2023 at 22:22

desertnaut


answered Sep 5, 2023 at 17:48

Mayar Eskafi


1 Comment

Diego Borba Over a year ago

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.
