I am comparing two large CSVs with Pandas, both of which contain contact information. I want to remove every row from one CSV whose email address also appears in the other CSV.
So if I had
DF1
name phone email
1 1 [email protected]
2 2 [email protected]
3 3 [email protected]
DF2
name phone email
x y [email protected]
a b [email protected]
I would be left with
DF3
name phone email
1 1 [email protected]
I don't care about any columns except the email addresses. This seems like it would be easy, but I'm really struggling with this one.
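A minimal sketch of the filtering described above, using a boolean mask built with `Series.isin` and negated with `~`. The email addresses here are made-up placeholders (the example tables do not show real ones), and the column name `email` follows the example tables:

```python
import pandas as pd

# Hypothetical placeholder data mirroring the DF1/DF2 example above.
df1 = pd.DataFrame({
    "name": [1, 2, 3],
    "phone": [1, 2, 3],
    "email": ["one@example.com", "two@example.com", "three@example.com"],
})
df2 = pd.DataFrame({
    "name": ["x", "a"],
    "phone": ["y", "b"],
    "email": ["two@example.com", "three@example.com"],
})

# isin() marks rows of df1 whose email appears in df2; ~ keeps the others.
df3 = df1[~df1["email"].isin(df2["email"])]
print(df3)
```

Only the `email` column is compared, so the other columns can differ freely between the two frames.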
Here is what I have, but I don't think this is even close:
import pandas as pd

def remove_warm_list_duplicates(dataframe):
    '''Remove rows that have emails from the warmlist'''
    # on_bad_lines="skip" replaces error_bad_lines=False (deprecated in pandas 1.3, removed in 2.0)
    warm_list = pd.read_csv('warmlist/warmlist.csv',
                            encoding="ISO-8859-1",
                            on_bad_lines="skip")
    warm_list_emails = warm_list['Email Address'].tolist()
    # ~ negates the mask: keep rows whose email is NOT in the warm list
    return dataframe[~dataframe['Email Address'].isin(warm_list_emails)]
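If the two files format addresses differently (for example `Bob@X.com` vs `bob@x.com `), an exact `isin` match will miss them. Below is a sketch of an alternative, a left "anti-join" via `DataFrame.merge(..., indicator=True)`, that also normalizes case and surrounding whitespace. The function name `drop_matching_emails`, the default column name `email`, and the helper column `_key` are hypothetical, and the code assumes the email column holds strings:

```python
import pandas as pd

def drop_matching_emails(left, right, column="email"):
    """Return rows of `left` whose email is absent from `right`,
    ignoring case and leading/trailing whitespace."""
    # Normalized lookup keys; "_key" is a hypothetical helper column name.
    left_keys = left[column].str.strip().str.lower()
    # Deduplicate so the merge cannot multiply rows of `left`.
    right_keys = right[column].str.strip().str.lower().drop_duplicates()

    merged = left.assign(_key=left_keys).merge(
        right_keys.rename("_key").to_frame(),
        on="_key",
        how="left",
        indicator=True,  # adds a "_merge" column: "both" or "left_only"
    )
    # "left_only" rows found no match in `right`, so they are kept.
    kept = merged[merged["_merge"] == "left_only"]
    return kept.drop(columns=["_key", "_merge"]).reset_index(drop=True)
```

For exact matches the `isin` mask is simpler; the merge form is mainly useful when you also want to inspect which rows matched, e.g. by keeping the `_merge` column before dropping it.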