How to remove rows from a dataframe based on conditions in Python?

Question

I've been trying to select the rows that meet two conditions in my dataset and then randomly remove 25% of those rows from the full dataset. I've been piecing together code from similar questions on here, but I don't know Python well and can't figure out where I'm going wrong.

I've tried 2 ways:

#Store rows meeting conditions in a variable
test = dataset[(dataset['betamax'].isnull()) & (dataset['label'] == "probable")]

#Only select 75% of them in a new variable
test2 = test.sample(frac=.75)

#Remove any matches from test2 in my total dataset
test3 = dataset[~dataset.isin(test2)].dropna()

test2 is 146 rows by 84 columns and dataset is 750 rows by 84 columns. When I create test3 it is 0 rows by 84 columns - why does this happen?

I've also tried to remove the selection of rows by:

cond = dataset['Gene'].isin(test2['Gene']) #Gene is my only unique column per row
test4 = dataset.drop(dataset[cond].index, inplace = True)

TypeError: 'NoneType' object is not subscriptable

Unfortunately I can't give example data, but if I have 2 variables - one where I've subset random rows based on conditions and one which is my total data, how do I remove the subset from my total dataset?

What's the size of the test df? Also, dropna might drop more rows than you want. You should specify how, or at least the columns via subset. See -> pandas.pydata.org/pandas-docs/stable/reference/api/…
– el_bobo, Commented Nov 20, 2020 at 10:33
The test df is 195 rows. Thank you for this, I'll check it out.
– DN1, Commented Nov 20, 2020 at 10:34

jezrael · Accepted Answer · 2020-11-20 10:29:33Z

2

In your solution, remove inplace = True. With it, drop modifies dataset in place and returns None, so there is nothing to assign to the new variable test4:

test4 = dataset.drop(dataset[cond].index)

A better option is to invert the mask with ~ so that you keep only the rows whose Gene value does not exist in test2['Gene']:

cond = dataset['Gene'].isin(test2['Gene'])

test4 = dataset[~cond]
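
As a minimal sketch of the two points above, the following uses a small made-up DataFrame (the values are hypothetical; only the column names Gene, betamax and label come from the question). It shows that drop with inplace=True returns None, and that dropping by index and inverting the mask keep the same rows.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data; only the column names are from the question.
dataset = pd.DataFrame({
    "Gene": ["A", "B", "C", "D", "E", "F"],
    "betamax": [np.nan, 0.2, np.nan, np.nan, 0.5, np.nan],
    "label": ["probable", "probable", "probable", "possible", "probable", "probable"],
})

# Rows meeting both conditions, then a random subset of them.
test = dataset[dataset["betamax"].isnull() & (dataset["label"] == "probable")]
test2 = test.sample(frac=0.75, random_state=0)

# Boolean mask: True where the Gene also appears in test2.
cond = dataset["Gene"].isin(test2["Gene"])

# With inplace=True, drop changes the frame itself and returns None.
scratch = dataset.copy()
result = scratch.drop(scratch[cond].index, inplace=True)

# Without inplace, drop returns a new DataFrame that can be assigned.
dropped = dataset.drop(dataset[cond].index)

# Inverting the mask gives the same rows without calling drop at all.
test4 = dataset[~cond]
```

This relies on Gene being unique per row, as stated in the question. If it were not, every row sharing a Gene value with test2 would be removed.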

rftr · Answer · 2020-11-20 10:38:45Z

1

In your first solution, you can compare the indexes instead of the whole DataFrames:

#Remove any matches from test2 in my total dataset
test3 = dataset[~dataset.index.isin(test2.index)].dropna()
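
A rough sketch of the index-based approach on made-up data (the values and the column named other are hypothetical). It assumes that "remove 25%" means the sampled rows are the ones to delete, so it samples frac=0.25 rather than 0.75. It also compares a bare dropna() with the dropna(subset=...) suggested in the comments, since a bare call discards every row that has a missing value in any column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 8 rows, with one unrelated missing value in "other".
dataset = pd.DataFrame({
    "Gene": list("ABCDEFGH"),
    "betamax": [np.nan, np.nan, np.nan, np.nan, 0.1, 0.2, 0.3, 0.4],
    "label": ["probable"] * 4 + ["possible"] * 4,
    "other": [1.0, 2.0, 3.0, 4.0, np.nan, 6.0, 7.0, 8.0],
})

matches = dataset[dataset["betamax"].isnull() & (dataset["label"] == "probable")]

# Sample the 25% of matching rows that should be removed.
to_remove = matches.sample(frac=0.25, random_state=0)

# Keep every row whose index label is not among the sampled ones.
kept = dataset[~dataset.index.isin(to_remove.index)]

# The question's approach masks cell by cell, then dropna() removes any row
# that contains a NaN, including rows that were never sampled.
after_dropna = dataset[~dataset.isin(to_remove)].dropna()

# Restricting dropna to chosen columns only drops rows missing those values.
kept_with_other = kept.dropna(subset=["other"])
```

Here kept still contains the rows with a missing betamax that were not sampled, while after_dropna loses all of them. That is likely why the original attempt returned far fewer rows than expected.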
