1

I have a 2d array that takes this kind of form:

[5643, 22, 0.67, [1.00, 0.05, -0.044....]]
[6733, 12, -0.44, [0.00, 1.00, -0.08...]] 

So it has dimensions of roughly 13k x 4, but the 4th column of every row is itself an array.

What I'd like to do is subset this array so that I keep only the rows for which the yth element of the 4th column is greater than 0.

My current approach is:

mask = [x[y] > 0 for x in array[:,3]]

new_array = array[mask]

Is there a faster way to do this?

  • You could attempt to utilize the filter method. Commented Jul 25, 2020 at 18:34
  • What is the expected output? Commented Jul 26, 2020 at 0:24

3 Answers

1

Try this:

y = 1

new_array = list(filter(lambda x: x[3][y] > 0, array))

0

Use the if clause of a list comprehension:

new_array = [r for r in array if r[3][y] > 0]
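As a rough check, the filter and list-comprehension answers above should select the same rows. The sketch below uses three made-up rows shaped like the question's data (the third row and the variable names are assumptions for illustration). Both forms still loop in Python and return a plain list rather than a NumPy array, so a large speedup over the original mask is unlikely.

```python
# Hypothetical sample rows shaped like the question's data
rows = [
    [5643, 22, 0.67, [1.00, 0.05, -0.044]],
    [6733, 12, -0.44, [0.00, 1.00, -0.08]],
    [7001, 9, 0.10, [0.30, -0.20, 0.50]],
]
y = 2  # index into the embedded 4th-column list

# Keep rows whose y-th embedded element is positive, written both ways
kept_by_comprehension = [r for r in rows if r[3][y] > 0]
kept_by_filter = list(filter(lambda r: r[3][y] > 0, rows))
```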

0

The fastest way to do this is to not pack arrays inside other arrays. Nesting forces NumPy to use an object dtype, which causes many issues, including not being able to use the shape attribute or vectorized indexing effectively.

So, first split your data into two arrays: X, with 13k rows and 3 columns, and Y, with 13k rows and as many columns as the embedded array has elements.

You can then do the following:

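import numpy as np

# Split off the three scalar columns
X = array[:, :3]
# Stack the embedded arrays into a real 2D array
# (np.asarray would leave the column as a 1D object array)
Y = np.stack(array[:, 3])

# Keep the rows whose y-th embedded element is positive
mask = Y[:, y] > 0
X, Y = X[mask], Y[mask]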
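To make the split-and-mask idea concrete, here is a minimal end-to-end sketch. It assumes the 4th column holds equal-length sequences, which `np.stack` needs. The sample rows, `inner`, `loop_mask`, and `vec_mask` are illustrative names, not part of the original question.

```python
import numpy as np

# Hypothetical stand-in for the question's data: 3 scalars plus an embedded vector
rows = [
    [5643, 22, 0.67, [1.00, 0.05, -0.044]],
    [6733, 12, -0.44, [0.00, 1.00, -0.08]],
    [7001, 9, 0.10, [0.30, -0.20, 0.50]],
]
# Build an object array element by element so the embedded lists stay intact
array = np.empty((len(rows), 4), dtype=object)
for i, row in enumerate(rows):
    for j, value in enumerate(row):
        array[i, j] = value

y = 1  # index inside the embedded vector

# Original approach: Python-level loop over the 4th column
loop_mask = np.array([x[y] > 0 for x in array[:, 3]])

# Vectorized approach: stack the embedded vectors into an (n_rows, n_inner) float array
inner = np.stack(array[:, 3]).astype(float)
vec_mask = inner[:, y] > 0

# Boolean indexing keeps only the selected rows
new_array = array[vec_mask]
```

If the mask is needed for several values of y, `inner` can be stacked once and reused, so only the cheap column comparison is repeated.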
