
I have an array like so:

arr = np.array([1, 2, 3, 4, -5, -6, 3, 5, 1, -2, 5, -1, -1, 10])

I want to get rid of all negative values, and split the array at each index where there was a negative value. The result should look like this:

split_list = [[1, 2, 3, 4], [3, 5, 1], [5], [10]]

I know how to do this with a list comprehension, but the array can get quite large and I have to do the calculation many times, so I want to find a solution using numpy. I found this page, https://www.geeksforgeeks.org/python-split-list-into-lists-by-particular-value/, which I can use to split the array where there are negative values, but I can't remove the negative values at the same time.
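For comparison, a plain-Python loop of the kind the question alludes to could look like the sketch below. It is hypothetical (the name `split_on_negatives_py` is made up for illustration) and is mainly useful as a reference for checking the numpy answers against.

```python
import numpy as np


def split_on_negatives_py(values):
    """Hypothetical plain-Python reference: drop negatives, start a new group after each run."""
    groups = []
    current = []
    for v in values:
        if v < 0:
            # A negative value closes the current group (only if it is non-empty)
            if current:
                groups.append(current)
            current = []
        else:
            current.append(int(v))  # int() turns numpy scalars into plain ints
    # Flush the trailing group, if any
    if current:
        groups.append(current)
    return groups


arr = np.array([1, 2, 3, 4, -5, -6, 3, 5, 1, -2, 5, -1, -1, 10])
print(split_on_negatives_py(arr))  # [[1, 2, 3, 4], [3, 5, 1], [5], [10]]
```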

  • For the linked solution, couldn't you change the condition to if len(sublist) > 0 and sublist[0] > 0? Commented Sep 5, 2023 at 17:26
  • That removes the subarrays that start with a negative value, but this could remove subarrays that include positive numbers, and also does not remove subarrays that end with negative values. I don't think it quite solves the problem. Commented Sep 5, 2023 at 17:34

6 Answers


Note that instead of numpy, you could use itertools.groupby like this (though, judging by NumPy grouping using itertools.groupby performance, pure numpy will likely be more efficient):

import numpy as np
from itertools import groupby

arr = np.array([1, 2, 3, 4, -5, -6, 3, 5, 1, -2, 5, -1, -1, 10])
split_list = [list(group) for key, group in groupby(arr, key=lambda x: x >= 0) if key]

# [[1, 2, 3, 4], [3, 5, 1], [5], [10]]

2 Comments

Looks nice, I like it. Any thoughts about the performance of groupby on big data?
I'm not really sure; you might want to read the answers there for insight: stackoverflow.com/questions/4651683/…
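One way to answer the performance question for your own data is to time both styles directly. The sketch below compares the groupby answer with the sign-change numpy answer; the test data (100,000 random integers from `rng.integers(-10, 10)`) and the function names are assumptions for illustration, and no timings are claimed here since they depend on the machine and the data.

```python
import timeit
from itertools import groupby

import numpy as np

# Hypothetical test data: 100,000 random integers, roughly half of them negative
rng = np.random.default_rng(0)
big = rng.integers(-10, 10, size=100_000)


def with_groupby(a):
    # groupby answer: group consecutive elements by sign, keep the non-negative groups
    return [list(g) for k, g in groupby(a, key=lambda x: x >= 0) if k]


def with_numpy(a):
    # sign-change answer: split wherever the sign flips, keep every second piece
    start = 0 if a[0] >= 0 else 1
    return np.split(a, np.arange(1, len(a))[np.diff(a < 0)])[start::2]


# Sanity check: both approaches produce the same groups
assert [list(x) for x in with_numpy(big)] == with_groupby(big)

for fn in (with_groupby, with_numpy):
    seconds = timeit.timeit(lambda: fn(big), number=5)
    print(f"{fn.__name__}: {seconds / 5:.4f} s per call")
```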

Get the indices where the sign changes, use them as split points, and keep every second array of the result:

start = 0 if arr[0] >= 0 else 1
np.split(arr, np.arange(1, len(arr))[np.diff(arr < 0)])[start::2]

Note that numpy.diff on boolean data applies XOR to neighboring elements, as the example below shows:

data = np.array([1,1,0,0,0,1,1], dtype=bool)
assert all(np.diff(data) == (data[1:] ^ data[:-1]))
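The same one-liner, unpacked into named steps, may make it easier to see why taking every second piece works. This is a sketch; the function name `split_on_sign_changes` is hypothetical, and it assumes a non-empty input array.

```python
import numpy as np


def split_on_sign_changes(arr):
    """Hypothetical wrapper around the sign-change answer; assumes arr is non-empty."""
    neg = arr < 0
    # True at position i when arr[i] and arr[i + 1] have different signs
    # (np.diff on booleans compares neighbours with "not equal", i.e. XOR)
    changes = np.diff(neg)
    # Each new run starts at i + 1 for every change at i
    cuts = np.arange(1, len(arr))[changes]
    pieces = np.split(arr, cuts)
    # Runs alternate between non-negative and negative, so keep every second one,
    # starting at piece 0 if the array starts non-negative, otherwise at piece 1
    start = 0 if arr[0] >= 0 else 1
    return pieces[start::2]


arr = np.array([1, 2, 3, 4, -5, -6, 3, 5, 1, -2, 5, -1, -1, 10])
print(np.arange(1, len(arr))[np.diff(arr < 0)])  # [ 4  6  9 10 11 13]
print([p.tolist() for p in split_on_sign_changes(arr)])  # [[1, 2, 3, 4], [3, 5, 1], [5], [10]]
```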



For a pure numpy approach:

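m = arr < 0  # True where the value is negative
# split the non-negative values at the positions where each run of negatives was removed
np.split(arr[~m], np.unique((np.arange(m.shape[0]) - np.cumsum(m))[m]) + 1)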

Or with a list comprehension:

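m = arr < 0
# True only at the first negative value of each run of negatives
m2 = m & ~np.r_[False, m[:-1]]

# split before each run of negatives, then drop the negatives from every piece
out = [a[a >= 0] for a in np.split(arr, np.nonzero(m2)[0])]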

Output:

[array([1, 2, 3, 4]), array([3, 5, 1]), array([5]), array([10])]

Intermediates (first approach):

np.unique((np.arange(m.shape[0])-np.cumsum(m))[m])+1
# array([4, 7, 8])

arr[~m]
# array([ 1,  2,  3,  4,  3,  5,  1,  5, 10])
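The index arithmetic in the first approach works because, at a negative position i, `i - cumsum(m)[i] + 1` counts the non-negative values before i, which is exactly where that gap falls in `arr[~m]`. The sketch below spells this out; the function name `split_via_cumsum` is hypothetical, and the final filtering of empty pieces is an addition of mine (leading or trailing negatives otherwise produce empty arrays), not part of the original answer.

```python
import numpy as np


def split_via_cumsum(arr):
    """Hypothetical wrapper around the cumsum-based answer."""
    m = arr < 0
    kept = arr[~m]  # all non-negative values, in their original order
    # At a negative position i, i - cumsum(m)[i] + 1 equals the number of
    # non-negative values before i, i.e. the gap's position inside `kept`
    gap_pos = (np.arange(m.shape[0]) - np.cumsum(m))[m] + 1
    # Consecutive negatives map to the same position, so deduplicate
    pieces = np.split(kept, np.unique(gap_pos))
    # Addition (not in the original answer): drop empty pieces that come
    # from leading or trailing negatives
    return [p for p in pieces if p.size]


arr = np.array([1, 2, 3, 4, -5, -6, 3, 5, 1, -2, 5, -1, -1, 10])
print([p.tolist() for p in split_via_cumsum(arr)])  # [[1, 2, 3, 4], [3, 5, 1], [5], [10]]
```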



Here's an approach that leans on the linked example. Using filter to remove the empty lists produced by the comprehension may not be best practice, and there are better answers here already, but I wanted to add this for the sake of completeness.

import numpy as np

# input array
arr = np.array([1, 2, 3, 4, -5, -6, 3, 2, 1, -2, 5, -1, -1, 10])
# get indices of negative values
idx = np.where(arr < 0)[0]
# split the input array at that index
subarrays = np.split(arr, idx)
# build the final list, removing all negative values
# ('filter' is used to remove the empty lists caused by 'sub[sub >= 0]')
result = list(filter(None, [sub[sub >= 0].tolist() for sub in subarrays]))

print(result)
# => [[1, 2, 3, 4], [3, 2, 1], [5], [10]]
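To see what filter is removing, it may help to print the intermediate lists. In this sketch, `filter(None, ...)` keeps only truthy items, and since empty lists are falsy they are dropped; the equivalent comprehension shown is one alternative that avoids filter.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, -5, -6, 3, 2, 1, -2, 5, -1, -1, 10])
subarrays = np.split(arr, np.where(arr < 0)[0])
# Pieces that held only negative values become empty lists here
lists = [sub[sub >= 0].tolist() for sub in subarrays]
print(lists)  # [[1, 2, 3, 4], [], [3, 2, 1], [5], [], [10]]

# filter(None, ...) drops falsy items (empty lists); a comprehension does the same
print(list(filter(None, lists)) == [lst for lst in lists if lst])  # True
```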



Another possible solution:

[x[mask] for x in np.split(arr, np.where(arr < 0)[0]) if (mask := x >= 0).any()]

Output:

[array([1, 2, 3, 4]), array([3, 5, 1]), array([5]), array([10])]

1 Comment

Note to future readers: this requires Python 3.8+ since it makes use of the walrus operator, :=.
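For Python versions before 3.8, the same logic can be written without the walrus operator by computing the mask inside an explicit loop. This is a sketch of an equivalent, not code from the answer above.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, -5, -6, 3, 5, 1, -2, 5, -1, -1, 10])

# Split before every negative value, then keep only the non-negative part of each piece
pieces = np.split(arr, np.where(arr < 0)[0])
result = []
for x in pieces:
    mask = x >= 0
    if mask.any():  # skip pieces that contained only negatives (or are empty)
        result.append(x[mask])

print([r.tolist() for r in result])  # [[1, 2, 3, 4], [3, 5, 1], [5], [10]]
```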
import numpy as np

arr = np.array([1, 2, 3, 4, -5, -6, 3, 5, 1, -2, 5, -1, -1, 10])

# Find the indices where the array is negative
neg_indices = np.where(arr < 0)[0]

# Split the array at the indices where it is negative
split_arr = np.split(arr, neg_indices)

# Remove the negative values from each subarray
split_list = [subarr[subarr >= 0] for subarr in split_arr]

# Convert the subarrays to lists and remove any empty lists
split_list = [subarr.tolist() for subarr in split_list if len(subarr) > 0]

print(split_list)

1 Comment

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.
