
I am comparing two large CSVs with pandas, both containing contact information. I want to remove from one CSV every row whose email address appears in the other CSV.

So if I had

DF1

name phone email
1    1     [email protected]
2    2     [email protected]
3    3     [email protected]

DF2

name phone email
x    y     [email protected]
a    b     [email protected]

I would be left with

DF3

name phone email
1    1     [email protected]

I don't care about any columns except the email addresses. This seems like it would be easy, but I'm really struggling with this one.

Here is what I have, but I don't think this is even close:

def remove_warm_list_duplicates(dataframe):
    '''Remove rows that have emails from the warmlist'''
    warm_list = pd.read_csv(r'warmlist/' + 'warmlist.csv'
                            , encoding="ISO-8859-1"
                            , error_bad_lines=False)
    warm_list_emails = warm_list['Email Address'].tolist()
    dataframe = dataframe[dataframe['Email Address'].isin(warm_list_emails) == False]

3 Answers

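You can use pandas isin() and negate the resulting mask with ~, which keeps only the rows of df1 whose email does not appear in df2: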

df3 = df1[~df1['email'].isin(df2['email'])]

Resulting df

    name    phone   email
0   1       1       [email protected]
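For anyone who wants to run this end to end, here is a minimal sketch of the same isin() approach. The frames are rebuilt by hand and the email addresses are hypothetical placeholders (the originals are not visible in the post), so treat the data as an assumption rather than the asker's actual values.

```python
import pandas as pd

# Hypothetical sample data; the addresses are placeholders, not the asker's.
df1 = pd.DataFrame({
    "name": [1, 2, 3],
    "phone": [1, 2, 3],
    "email": ["one@example.com", "two@example.com", "three@example.com"],
})
df2 = pd.DataFrame({
    "name": ["x", "a"],
    "phone": ["y", "b"],
    "email": ["two@example.com", "three@example.com"],
})

# isin() returns a boolean Series that is True where df1's email occurs in df2.
mask = df1["email"].isin(df2["email"])

# ~ inverts the mask, so only the rows with no match in df2 are kept.
df3 = df1[~mask]
print(df3)
```

Note that isin() compares values exactly, so two addresses that differ only in case or surrounding whitespace are treated as different.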

Try a left merge with indicator=True, then keep only the rows marked left_only:

In [143]: pd.merge(df1, df2[['email']], on='email', how='left', indicator=True) \
            .query("_merge == 'left_only'") \
            .drop(columns='_merge')
Out[143]:
   name  phone      email
0     1      1  [email protected]
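To see what the indicator column contributes, the merge can be split into steps. This is only a sketch with hypothetical placeholder addresses and made-up frame names (left, right); it shows the intermediate _merge values before the filter is applied.

```python
import pandas as pd

# Hypothetical sample data; names and addresses are placeholders.
left = pd.DataFrame({
    "name": [1, 2, 3],
    "email": ["one@example.com", "two@example.com", "three@example.com"],
})
right = pd.DataFrame({"email": ["two@example.com", "three@example.com"]})

# indicator=True adds a _merge column saying where each row was found:
# 'left_only', 'right_only', or 'both'.
merged = left.merge(right, on="email", how="left", indicator=True)
print(merged)

# Keep the rows that exist only in the left frame, then drop the helper column.
anti = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(anti)
```

Unlike boolean indexing with isin(), merge builds a new frame with a fresh integer index, which matters if the original index carries meaning.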

1 Comment

This is the answer I went with, but I think several approaches work.

You could simplify a bit with unique() and sets:

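warm_list = pd.read_csv(r'warmlist/' + 'warmlist.csv',
                        encoding="ISO-8859-1",
                        error_bad_lines=False)  # on pandas >= 1.3, use on_bad_lines='skip'

warm_list_emails = set(warm_list['Email Address'].unique())
# ~ negates the mask so rows whose email is in the warm list are dropped
df = df.loc[~df['Email Address'].isin(warm_list_emails), :]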
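One more thing worth checking in the question's function: it assigns the filtered result to the local name dataframe and never returns it, so the caller never sees the change. Below is a rough sketch of a version that returns the result. Passing the warm list in as a parameter (rather than reading the CSV inside the function) and the column argument are my own assumptions for the sake of a runnable example, and the addresses are placeholders.

```python
import pandas as pd

def remove_warm_list_duplicates(dataframe, warm_list, column="Email Address"):
    """Return the rows of dataframe whose email is not in warm_list.

    warm_list and column are illustrative parameters, not from the question.
    """
    warm_list_emails = set(warm_list[column].unique())
    # Return the filtered frame; rebinding a local name alone has no effect
    # on the caller's variable.
    return dataframe[~dataframe[column].isin(warm_list_emails)]

# Hypothetical data to exercise the function.
contacts = pd.DataFrame({"Email Address": ["one@example.com", "two@example.com"]})
warm = pd.DataFrame({"Email Address": ["two@example.com"]})

# The caller has to capture the return value.
contacts = remove_warm_list_duplicates(contacts, warm)
print(contacts)
```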
