I've been trying to select the rows in my dataset that meet two conditions and then randomly remove 25% of those rows from the full dataset. I've been piecing together code from similar questions on here, but my Python knowledge isn't great and I can't figure out where I'm going wrong.
I've tried two ways:
#Store rows meeting conditions in a variable
test = dataset[(dataset['betamax'].isnull()) & (dataset['label'] == "probable")]
#Only select 75% of them in a new variable
test2 = test.sample(frac=.75)
#Remove any matches from test2 in my total dataset
test3 = dataset[~dataset.isin(test2)].dropna()
test2 is 146 rows by 84 columns and dataset is 750 rows by 84 columns. When I create test3, it has 0 rows and 84 columns. Why does this happen?
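A minimal reproduction on hypothetical toy data (only the column names `Gene`, `betamax` and `label` come from the question; `score` and all values are invented) suggests what is going on. `dataset.isin(test2)` compares cell by cell, aligned on index and column labels, so it returns a boolean DataFrame rather than one value per row. Indexing with a boolean DataFrame masks cells instead of selecting rows: `dataset[~mask]` keeps the full shape and replaces the matching cells with NaN. `dropna()` then drops every row that contains at least one NaN. The selected rows already have a null `betamax`, and with 84 columns it seems likely that every other row has a gap somewhere too, so nothing survives.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names Gene/betamax/label come from the question
dataset = pd.DataFrame({
    "Gene": ["g1", "g2", "g3", "g4"],
    "betamax": [np.nan, np.nan, 0.5, np.nan],
    "label": ["probable", "probable", "probable", "other"],
    "score": [1.0, 2.0, np.nan, 4.0],  # invented extra column with a gap
})

test = dataset[(dataset["betamax"].isnull()) & (dataset["label"] == "probable")]
test2 = test.sample(frac=0.5, random_state=0)

# isin with a DataFrame returns one boolean per cell, not one per row
mask = dataset.isin(test2)
print(mask.shape)  # (4, 4), same shape as dataset

# A boolean DataFrame as the key masks cells (like DataFrame.where), it does not drop rows
masked = dataset[~mask]
print(masked.shape)  # still (4, 4)

# dropna() removes any row with at least one NaN, which here is every row
test3 = masked.dropna()
print(test3.shape)  # (0, 4)
```

In other words, the row count of `test3` says more about where the NaNs are than about which rows are in `test2`.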
I've also tried removing the selected rows with:
cond = dataset['Gene'].isin(test2['Gene']) #Gene is my only unique column per row
test4 = dataset.drop(dataset[cond].index, inplace = True)
TypeError: 'NoneType' object is not subscriptable
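The `TypeError` is consistent with how `inplace=True` works: `drop(..., inplace=True)` modifies `dataset` itself and returns `None`, so `test4` is `None`, and the error presumably comes from a later line (not shown) that indexes `test4`. A sketch on hypothetical toy data, assuming `Gene` is unique per row as stated, showing that behaviour and two ways to keep the result:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; Gene is assumed unique per row, as stated in the question
dataset = pd.DataFrame({
    "Gene": ["g1", "g2", "g3", "g4"],
    "betamax": [np.nan, np.nan, 0.5, np.nan],
    "label": ["probable", "probable", "probable", "other"],
})
test2 = dataset.iloc[[0]]  # stand-in for the sampled subset

cond = dataset["Gene"].isin(test2["Gene"])

# With inplace=True, drop() changes the frame it is called on and returns None
scratch = dataset.copy()
result = scratch.drop(dataset[cond].index, inplace=True)
print(result)        # None, so result[...] would raise TypeError
print(len(scratch))  # 3: the row was removed from scratch itself

# Option 1: leave inplace at its default (False) and keep the returned DataFrame
test4 = dataset.drop(dataset[cond].index)

# Option 2: a boolean row mask that keeps rows whose Gene is NOT in test2
test5 = dataset[~cond]

print(test4["Gene"].tolist())  # ['g2', 'g3', 'g4']
print(test4.equals(test5))     # True
```

Note that in the original code, `inplace=True` has already removed the rows from `dataset`, so re-running earlier cells afterwards will work on the reduced frame.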
Unfortunately I can't share example data, but in general: if I have two variables, one holding a random subset of rows selected by conditions and one holding my full data, how do I remove the subset from the full dataset?
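Since `sample()` keeps the original index labels, one straightforward approach is to drop the sampled rows by index. Also note that the first attempt samples 75% of the matching rows and then removes *those*, which would keep only 25% of the matches; if the aim is to remove 25%, sample `frac=0.25` for the rows to drop. A minimal sketch on hypothetical toy data (values and the `random_state` seed are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 8 rows meet both conditions, 2 do not
dataset = pd.DataFrame({
    "Gene": [f"g{i}" for i in range(10)],
    "betamax": [np.nan] * 8 + [0.5, np.nan],
    "label": ["probable"] * 9 + ["other"],
})

# Rows meeting both conditions
matches = dataset[dataset["betamax"].isnull() & (dataset["label"] == "probable")]

# Randomly pick 25% of the matching rows to remove (random_state only for reproducibility)
to_remove = matches.sample(frac=0.25, random_state=42)

# sample() keeps the original index labels, so they can be dropped from dataset directly
remaining = dataset.drop(to_remove.index)

print(len(matches), len(to_remove), len(remaining))  # 8 2 8
```

This relies on `dataset` having a unique index (the default index from `read_csv` is); if it does not, the `Gene`-based mask `dataset[~dataset["Gene"].isin(to_remove["Gene"])]` should do the same job.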