
I've been trying to select rows that meet two conditions in my dataset, then randomly remove 25% of those rows from my total dataset. I've been piecing together code from similar questions on here, but my Python knowledge is limited and I can't figure out where I'm going wrong.

I've tried 2 ways:

#Store rows meeting conditions in a variable
test = dataset[(dataset['betamax'].isnull()) & (dataset['label'] == "probable")]

#Only select 75% of them in a new variable
test2 = test.sample(frac=.75)

#Remove any matches from test2 in my total dataset
test3 = dataset[~dataset.isin(test2)].dropna()

test2 is 146 rows by 84 columns and dataset is 750 rows by 84 columns. When I create test3 it is 0 rows by 84 columns - why does this happen?

I've also tried to remove the selection of rows by:

cond = dataset['Gene'].isin(test2['Gene']) #Gene is my only unique column per row
test4 = dataset.drop(dataset[cond].index, inplace = True)

TypeError: 'NoneType' object is not subscriptable

Unfortunately I can't give example data, but if I have 2 variables - one where I've subset random rows based on conditions and one which is my total data, how do I remove the subset from my total dataset?

  • What's the size of the test df? Also, the dropna might drop more rows than you want. You should specify the how or at least the columns subset. See -> pandas.pydata.org/pandas-docs/stable/reference/api/… Commented Nov 20, 2020 at 10:33
  • The test df is 195 rows. Thank you for this, I'll check it out. Commented Nov 20, 2020 at 10:34

2 Answers

2

In your second attempt, remove inplace=True: with it, drop modifies dataset in place and returns None, so None is what gets assigned to test4:

test4 = dataset.drop(dataset[cond].index)

A better option is to invert the mask with ~, which keeps only the rows whose Gene value does not appear in test2['Gene']:

cond = dataset['Gene'].isin(test2['Gene'])

test4 = dataset[~cond]
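
A minimal, self-contained sketch of the mask approach follows. The toy data is hypothetical (the question gives no sample data); only the column names Gene, betamax and label come from the question, and random_state is added purely to make the sample repeatable.

```python
import pandas as pd

nan = float("nan")

# Hypothetical stand-in for the real 750 x 84 dataset
dataset = pd.DataFrame(
    {
        "Gene": ["g1", "g2", "g3", "g4", "g5", "g6"],
        "betamax": [nan, nan, 0.5, nan, nan, 0.1],
        "label": ["probable", "probable", "probable",
                  "probable", "probable", "benign"],
    }
)

# Rows meeting both conditions (g1, g2, g4, g5 in this toy data)
test = dataset[dataset["betamax"].isnull() & (dataset["label"] == "probable")]

# Random 75% of those rows; random_state is an assumption for reproducibility
test2 = test.sample(frac=0.75, random_state=0)

# True for every row of dataset whose Gene appears in test2
cond = dataset["Gene"].isin(test2["Gene"])

# Keep only the rows that are NOT in test2
test4 = dataset[~cond]

# For comparison: drop(..., inplace=True) returns None
returned = dataset.copy().drop(dataset[cond].index, inplace=True)
print(len(test2), len(test4), returned)
```

This relies on Gene being unique per row, as stated in the question; with duplicated Gene values the mask would also remove rows that were never sampled.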

1

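In your first attempt you can compare the indexes instead. Keep the trailing .dropna() only if you also want to discard any remaining rows that have a missing value in some column: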

#Remove any matches from test2 in my total dataset
test3 = dataset[~dataset.index.isin(test2.index)].dropna()
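
As for why the original attempt came out empty, a likely explanation can be sketched on hypothetical toy data: DataFrame.isin with a DataFrame argument compares cell by cell, indexing with the resulting boolean DataFrame blanks out cells rather than removing rows, and dropna() then discards every row that contains any NaN, including rows that had missing values to begin with. The example below assumes nothing about the real data beyond the column names from the question.

```python
import pandas as pd

nan = float("nan")

# Hypothetical toy data; only the column names come from the question
dataset = pd.DataFrame(
    {
        "Gene": ["g1", "g2", "g3", "g4"],
        "betamax": [nan, nan, 0.5, nan],
        "label": ["probable", "probable", "benign", "probable"],
    }
)

# Stand-in for the sampled subset (rows with index 0 and 1)
test2 = dataset.loc[[0, 1]]

# Cell-by-cell comparison, aligned on index and column labels
cell_mask = dataset.isin(test2)

# A boolean DataFrame as indexer sets non-selected cells to NaN;
# no rows are removed at this point
masked = dataset[~cell_mask]

# dropna() removes the blanked-out rows AND any row that already had a NaN
test3_original = masked.dropna()

# Index-based row filter: removes exactly the rows that are in test2
test3 = dataset[~dataset.index.isin(test2.index)]

print(masked)
print(test3_original)
print(test3)
```

Here the index-based filter keeps g3 and g4, while the original expression keeps only g3 because g4 has a missing betamax. If every remaining row in the real dataset has a missing value in at least one of its 84 columns, the same mechanism would produce 0 rows.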
