I am trying to write a function that removes rows from a DataFrame based on criteria listed in an Excel file. The Excel file (bad words2) contains word pairs that should be removed from the DataFrame and looks like this:
header
the man
is a
The second part of my code is the function I am trying to apply:
import pandas as pd
data = ({'words':['the man','is a','good guy']})
df = pd.DataFrame(data)
xl = pd.ExcelFile('C:/Users/j/Desktop/bad words2.xlsx')
badwords = xl.parse()
badwords = badwords['header']
def removewords(x):
    for w in x:
        pattern = '^'+''.join('(?=.*{})'.format(word) for word in w.split())
        df[df['words'].str.contains(pattern)==False]
        df.dropna()
print(removewords(badwords))
So ideally, at the end of applying this function, I should end up with a DF that contains only:
words
good guy
However, right now, all that this function returns is 'None'. What am I doing wrong?
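The filtered result is never stored. This line:
df[df['words'].str.contains(pattern)==False]
should be:
df = df[df['words'].str.contains(pattern)==False]
The function also has no return statement, so it returns None, which is exactly what print shows; add return df at the end. Because assigning to df inside the function makes it a local name, pass the DataFrame in as a parameter (or declare it global) rather than relying on the module-level df.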
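
Putting those fixes together, here is a minimal sketch of how the function might look. The name remove_words, the column parameter, and the in-memory badwords Series (standing in for badwords['header'] parsed from the Excel file) are assumptions for illustration and are not part of the original code. re.escape and na=False are also additions, meant to guard against regex metacharacters and missing values.

```python
import re
import pandas as pd

def remove_words(df, badwords, column='words'):
    """Return df without rows that contain every word of any bad phrase."""
    for phrase in badwords:
        # One lookahead per word: a row matches only if all words of the
        # phrase occur somewhere in it (in any order).
        pattern = '^' + ''.join(
            '(?=.*{})'.format(re.escape(word)) for word in phrase.split()
        )
        # Boolean indexing returns a new frame, so assign the result back.
        df = df[~df[column].str.contains(pattern, na=False)]
    return df

df = pd.DataFrame({'words': ['the man', 'is a', 'good guy']})
# Hypothetical stand-in for badwords['header'] read from the Excel file.
badwords = pd.Series(['the man', 'is a'], name='header')

result = remove_words(df, badwords)
print(result)
```

With the sample data from the question, only the 'good guy' row should remain. Note that the lookaheads match substrings, so a bad word such as a also matches inside man; wrapping each word in \b word boundaries would be one way to tighten that if whole-word matching is wanted.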