Right now, I am trying to create a function that removes rows based on criteria listed in an Excel file. This Excel file (bad words2) contains word pairs that should be removed from the DataFrame and looks like this:

header
the man
is a

The second part of my code is the function I am trying to apply:

import pandas as pd
data = ({'words':['the man','is a','good guy']})
df = pd.DataFrame(data)

xl = pd.ExcelFile('C:/Users/j/Desktop/bad words2.xlsx')
badwords = xl.parse()
badwords = badwords['header']

def removewords(x):
    for w in x:
        pattern = '^'+''.join('(?=.*{})'.format(word) for word in w.split())
        df[df['words'].str.contains(pattern)==False]
        df.dropna()


print(removewords(badwords))

So ideally, at the end of applying this function, I should end up with a DF that contains only:

 words
 good guy

However, right now, all that this function returns is 'None'. What am I doing wrong?

  • Sorry, do you want to filter out the rows that match the pattern, or the rows that do not? Either way, it is unclear what you are trying to do, but this line does nothing without an assignment: df[df['words'].str.contains(pattern)==False] should be df = df[df['words'].str.contains(pattern)==False] Commented Sep 30, 2014 at 20:04

1 Answer

Some thoughts:

  1. The final two operations each return a new DataFrame; they do not modify the DataFrame in place. You need to assign the result of those operations to something, e.g. df.
  2. Once you do that, df becomes a local variable inside the function, so it can no longer be read from the enclosing scope before it is assigned. You can pass it in as an argument. (Please note: this is not a problem with your original code, only with the fix proposed in point 1.) Alternatively, you could instantiate a new DataFrame within the function.
  3. You are not returning the DataFrame at the end of your function.
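
To illustrate point 1, here is a minimal sketch using the sample data from the question. The comparison against 'the man' is only a stand-in condition chosen for brevity, not the lookahead pattern from your function:

```python
import pandas as pd

df = pd.DataFrame({'words': ['the man', 'is a', 'good guy']})

# Boolean indexing builds a new DataFrame; the result is discarded here,
# so df itself still has all three rows.
df[df['words'] != 'the man']
unchanged_len = len(df)

# Assigning the result to a name is what keeps the filtered rows.
filtered = df[df['words'] != 'the man']
filtered_words = filtered['words'].tolist()

print(unchanged_len)
print(filtered_words)
```

The same applies to dropna(): without an assignment its result is thrown away.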

Try instead:

def removewords(df,x):
    for w in x:
        pattern = '^'+''.join('(?=.*{})'.format(word) for word in w.split())
        df = df[df['words'].str.contains(pattern)==False]
        df = df.dropna()
    return df

print(removewords(df,badwords))
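
If it helps to check this without the Excel file, below is a self-contained sketch. It assumes the 'header' column can be replaced by an in-memory Series holding the same two word pairs, and it uses ~ to negate the mask instead of ==False, which should be equivalent here because the column has no missing values:

```python
import pandas as pd

df = pd.DataFrame({'words': ['the man', 'is a', 'good guy']})

# Assumption: stand-in for badwords['header'] as read from the Excel file.
badwords = pd.Series(['the man', 'is a'], name='header')

def removewords(df, x):
    for w in x:
        # One lookahead per word, so a row matches only if it contains
        # every word of the pair, in any order.
        pattern = '^' + ''.join('(?=.*{})'.format(word) for word in w.split())
        # Keep the rows that do NOT match, and carry them into the next pass.
        df = df[~df['words'].str.contains(pattern)]
    # Returning after the loop means every word pair has been applied.
    return df

result = removewords(df, badwords)
print(result)
```

Note that each word is inserted into the regular expression as-is. If the word pairs could contain characters such as . or (, wrapping each word in re.escape would be one way to guard against that.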

3 Comments

Hi Bernie, thanks for the input. However, when I try to assign anything to df inside the function (i.e. df = ), I get the error UnboundLocalError: local variable 'df' referenced before assignment
So that problem is solved, but a second one springs up: the changes being made don't "stack" within the for loop. Using the code above, the returned output is "is a / good guy". I was hoping to remove all word pairs that appear in my Excel file, so that the final output was JUST 'good guy'
I made a mistake in the indentation of the return statement. Please see edited code which now does what you want.
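
For anyone hitting the same UnboundLocalError, here is a minimal sketch of why it appears and how passing df in as an argument avoids it. The function names broken and fixed are hypothetical, and the filter is a simplified stand-in for the pattern match:

```python
import pandas as pd

df = pd.DataFrame({'words': ['the man', 'is a', 'good guy']})

def broken():
    # Assigning to df anywhere in the body makes df local to this function,
    # so the right-hand side reads a local name that has no value yet.
    df = df[df['words'] != 'the man']
    return df

try:
    broken()
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__

def fixed(df):
    # As a parameter, df is already bound when the body starts running.
    df = df[df['words'] != 'the man']
    return df

fixed_words = fixed(df)['words'].tolist()

print(error_name)
print(fixed_words)
```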
