I want to delete rows when a few conditions are met:

An example dataframe is shown below:

        one       two     three      four
0 -0.225730 -1.376075  0.187749  0.763307
1  0.031392  0.752496 -1.504769 -1.247581
2 -0.442992 -0.323782 -0.710859 -0.502574
3 -0.948055 -0.224910 -1.337001  3.328741
4  1.879985 -0.968238  1.229118 -1.044477
5  0.440025 -0.809856 -0.336522  0.787792
6  1.499040  0.195022  0.387194  0.952725
7 -0.923592 -1.394025 -0.623201 -0.738013
8 -1.775043 -1.279997  0.194206 -1.176260
9 -0.602815  1.183396 -2.712422 -0.377118

I want to delete rows based on the conditions that:

A row should be deleted if the value in column 'one', 'two', or 'three' is greater than 0 and the value in column 'four' is less than 0.

I tried to implement this as follows:

df = df[df.one > 0 or df.two > 0 or df.three > 0 and df.four < 1]

However, it results in the following error message:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Could someone help me delete rows based on multiple conditions?

2 Answers

For reasons that aren't 100% clear to me, pandas plays nice with the bitwise logical operators | and &, but not the boolean ones or and and.

Try this instead:

df = df[((df.one > 0) | (df.two > 0) | (df.three > 0)) & (df.four < 1)]
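
As a rough check of why the grouping matters, here is a minimal sketch that rebuilds the question's example dataframe and compares the mask with and without the outer parentheses. It assumes the threshold `four < 0` from the stated condition rather than the `< 1` in the snippet, and the variable names are only illustrative.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "one":   [-0.225730, 0.031392, -0.442992, -0.948055, 1.879985,
              0.440025, 1.499040, -0.923592, -1.775043, -0.602815],
    "two":   [-1.376075, 0.752496, -0.323782, -0.224910, -0.968238,
              -0.809856, 0.195022, -1.394025, -1.279997, 1.183396],
    "three": [0.187749, -1.504769, -0.710859, -1.337001, 1.229118,
              -0.336522, 0.387194, -0.623201, 0.194206, -2.712422],
    "four":  [0.763307, -1.247581, -0.502574, 3.328741, -1.044477,
              0.787792, 0.952725, -0.738013, -1.176260, -0.377118],
})

# & binds more tightly than |, so this is read as a | b | (c & d).
unparenthesized = (df.one > 0) | (df.two > 0) | (df.three > 0) & (df.four < 0)

# Grouping the three "or" terms first gives (a | b | c) & d.
parenthesized = ((df.one > 0) | (df.two > 0) | (df.three > 0)) & (df.four < 0)

# Indexing with a mask keeps the matching rows; negate it with ~ to delete them.
remaining = df[~parenthesized]
print(remaining.index.tolist())  # [0, 2, 3, 5, 6, 7]
```

On this data the two masks disagree for rows 5 and 6, which have a positive value in 'one' but a positive value in 'four' as well.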

4 Comments

You want df = df[((df.one > 0) | (df.two > 0) | (df.three > 0)) & (df.four < 1)]. As to why: it's ambiguous to compare arrays, as there are potentially multiple matches. See this: stackoverflow.com/questions/10062954/…
Oh, whoops, didn't see the and at the end. Edited.
@Brionius: it's basically because or and and can't have their behaviour customized by a class. They do what they do based on the result of bool(the_object), and that's it.
To delete, say, any row with a string that contains 1 of 20 possible subkeys, look here
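
The point made in the comments about or and and relying on bool() can be checked with a small sketch. The two Series below are made up for illustration; the only assumption is that pandas raises ValueError when a multi-element Series is converted to a single truth value, which matches the error class in the question.

```python
import pandas as pd

# Two small boolean Series, invented for this illustration.
s = pd.Series([True, False, True])
t = pd.Series([False, False, True])

# `or` first calls bool() on its left operand; pandas refuses to reduce
# a multi-element Series to a single True/False and raises ValueError.
try:
    s or t
    raised = False
except ValueError:
    raised = True

# `|` dispatches to Series.__or__, which pandas defines element-wise.
elementwise = (s | t).tolist()

print(raised)       # True
print(elementwise)  # [True, False, True]
```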

drop could be used to drop rows

The most obvious way is to construct a boolean mask from the condition, filter the index by it to get an array of indices to drop, and drop these indices using drop(). If the condition is:

A row should be deleted if the value in column 'one', 'two', or 'three' is greater than 0 and the value in column 'four' is less than 0.

then the following works.

msk = (df['one'].gt(0) | df['two'].gt(0) | df['three'].gt(0)) & df['four'].lt(0)
idx_to_drop = df.index[msk]
df1 = df.drop(idx_to_drop)

The first part of the condition, i.e. col 'one', 'two', or 'three' greater than 0, can be written a little more concisely with .any(axis=1):

msk = df[['one', 'two', 'three']].gt(0).any(axis=1) & df['four'].lt(0)
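
A self-contained sketch of the drop() approach, using only the first five rows of the question's example so it stays short. The names msk_long and msk_short are illustrative, not from the answer.

```python
import pandas as pd

# First five rows of the example dataframe from the question.
df = pd.DataFrame({
    "one":   [-0.225730, 0.031392, -0.442992, -0.948055, 1.879985],
    "two":   [-1.376075, 0.752496, -0.323782, -0.224910, -0.968238],
    "three": [0.187749, -1.504769, -0.710859, -1.337001, 1.229118],
    "four":  [0.763307, -1.247581, -0.502574, 3.328741, -1.044477],
})

# Long form: chain the three comparisons with |.
msk_long = (df["one"].gt(0) | df["two"].gt(0) | df["three"].gt(0)) & df["four"].lt(0)

# Short form: compare the three columns at once, then reduce across columns.
msk_short = df[["one", "two", "three"]].gt(0).any(axis=1) & df["four"].lt(0)

same_mask = msk_long.equals(msk_short)

# Turn the mask into index labels and drop those labels.
df1 = df.drop(df.index[msk_short])

print(same_mask)           # True
print(df1.index.tolist())  # [0, 2, 3]
```

Note that drop() works on index labels, so with a non-unique index it would remove every row sharing a dropped label, not only the rows where the mask is True.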

Keep the complement of the rows to drop

Deleting/removing/dropping rows is the inverse of keeping rows. So another way to do this task is to negate (~) the boolean mask for dropping rows and filter the dataframe by it.

msk = df[['one', 'two', 'three']].gt(0).any(axis=1) & df['four'].lt(0)
df1 = df[~msk]

query() the rows to keep

pd.DataFrame.query() is a pretty readable API for filtering rows to keep. It also "understands" and/or etc. So the following works.

# negate the condition to drop
df1 = df.query("not ((one > 0 or two > 0 or three > 0) and four < 0)")

# the same condition transformed using de Morgan's laws
df1 = df.query("one <= 0 and two <= 0 and three <= 0 or four >= 0")
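
To see that the negated mask and the two query() strings select the same rows, here is a minimal sketch on the first five rows of the question's example. The result names are illustrative, and the data contains no missing values, so `not (x > 0)` and `x <= 0` coincide here.

```python
import pandas as pd

# First five rows of the example dataframe from the question.
df = pd.DataFrame({
    "one":   [-0.225730, 0.031392, -0.442992, -0.948055, 1.879985],
    "two":   [-1.376075, 0.752496, -0.323782, -0.224910, -0.968238],
    "three": [0.187749, -1.504769, -0.710859, -1.337001, 1.229118],
    "four":  [0.763307, -1.247581, -0.502574, 3.328741, -1.044477],
})

# Mask of rows to drop, then keep its complement.
msk = df[["one", "two", "three"]].gt(0).any(axis=1) & df["four"].lt(0)
kept_by_mask = df[~msk]

# Same selection expressed as a negated query.
kept_by_not = df.query("not ((one > 0 or two > 0 or three > 0) and four < 0)")

# De Morgan: not ((a or b or c) and d) == (not a and not b and not c) or not d
kept_by_de_morgan = df.query("one <= 0 and two <= 0 and three <= 0 or four >= 0")

print(kept_by_mask.index.tolist())  # [0, 2, 3]
```

With missing values the two query strings can differ, because every comparison against NaN is False, so the negated form keeps a row that the rewritten form may drop.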

All of the above perform the following transformation:

[Figure: result]
