- Use pandas Boolean Indexing to determine whether each 'Cust_id' of parent is in 'Cust_id' of child.
- Use .isin on an array of the unique 'Cust_id' values from child.
  - See "Indexing with isin" in the pandas documentation.
- child.Cust_id.unique() creates an array of all the unique values in 'Cust_id'.
import pandas as pd
child = pd.DataFrame({'Cust_id': [1, 34, 45], 'Description': ['Good', 'Excellent', 'Bulk'], 'Detail': ['Regular', 'Normal', 'Buyer']})
parent = pd.DataFrame({'Name': ['xyz', 'abc', 'mno', 'pqr', 'rst', 'ert'], 'Cust_id': [1, 45, 56, 67, 34, 1], 'order': ['ice', 'bread', 'Butter', 'cookies', 'Rice', 'egg'],
'date': ['01-02-2019', '01-02-2019', '01-02-2019', '01-02-2019', '01-02-2019', '01-02-2019'], 'Payment': ['online', 'offline', 'offline', 'online', 'online', 'online']})
# mask using isin
mask = parent.Cust_id.isin(child.Cust_id.unique())
# return only the data from parent, where parent Cust_id isin child Cust_id
parent[mask]
# add a column to the parent dataframe
parent['in_child'] = mask
# display(parent)
  Name  Cust_id    order        date  Payment  in_child
0  xyz        1      ice  01-02-2019   online      True
1  abc       45    bread  01-02-2019  offline      True
2  mno       56   Butter  01-02-2019  offline     False
3  pqr       67  cookies  01-02-2019   online     False
4  rst       34     Rice  01-02-2019   online      True
5  ert        1      egg  01-02-2019   online      True
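As a small follow-up to the mask above, here is a minimal sketch of using the same boolean Series in both directions. It assumes trimmed-down copies of the parent and child dataframes (only the columns needed for the check), and the names in_child_rows and not_in_child_rows are hypothetical. It also passes the 'Cust_id' Series to .isin directly, which should give the same mask as passing .unique(), since .isin accepts any list-like.

```python
import pandas as pd

# Trimmed-down versions of the dataframes from the answer (assumption:
# only the columns needed for the membership check are kept)
child = pd.DataFrame({'Cust_id': [1, 34, 45]})
parent = pd.DataFrame({'Name': ['xyz', 'abc', 'mno', 'pqr', 'rst', 'ert'],
                       'Cust_id': [1, 45, 56, 67, 34, 1]})

# True where the parent Cust_id appears anywhere in child
mask = parent['Cust_id'].isin(child['Cust_id'])

# Hypothetical names, for illustration only
in_child_rows = parent[mask]        # parent rows whose Cust_id is in child
not_in_child_rows = parent[~mask]   # parent rows whose Cust_id is not in child
```

The ~ operator inverts the boolean mask, so the two selections together cover every row of parent exactly once.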
pandas.DataFrame.merge can be used in various ways as well.
- The following solution uses an 'outer' merge with indicator=True.
- The '_merge' column indicates which dataframe the 'Cust_id' is in.
  - 'left_only' means the 'Cust_id' is only in the parent dataframe (the left side of the merge); 'both' means it is in parent and child.
- .merge combines the information from both dataframes, and I'm not sure if that's the desired output.
merged = parent.merge(child, on='Cust_id', how='outer', indicator=True)
# display(merged)
  Name  Cust_id    order        date  Payment  Description   Detail     _merge
0  xyz        1      ice  01-02-2019   online         Good  Regular       both
1  ert        1      egg  01-02-2019   online         Good  Regular       both
2  abc       45    bread  01-02-2019  offline         Bulk    Buyer       both
3  mno       56   Butter  01-02-2019  offline          NaN      NaN  left_only
4  pqr       67  cookies  01-02-2019   online          NaN      NaN  left_only
5  rst       34     Rice  01-02-2019   online    Excellent   Normal       both
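If only some of the merged rows are wanted, the '_merge' column can be used as a filter. The following is a minimal sketch under the same example data, trimmed to a few columns. The names only_in_parent and in_both are hypothetical, and the row order of an outer merge may differ between pandas versions, so the sketch does not rely on it.

```python
import pandas as pd

# Trimmed-down versions of the dataframes from the answer
child = pd.DataFrame({'Cust_id': [1, 34, 45],
                      'Description': ['Good', 'Excellent', 'Bulk']})
parent = pd.DataFrame({'Name': ['xyz', 'abc', 'mno', 'pqr', 'rst', 'ert'],
                       'Cust_id': [1, 45, 56, 67, 34, 1]})

# Same merge as above: keep every Cust_id and record where it came from
merged = parent.merge(child, on='Cust_id', how='outer', indicator=True)

# Hypothetical names, for illustration only
only_in_parent = merged[merged['_merge'] == 'left_only']  # no match in child
in_both = merged[merged['_merge'] == 'both']              # matched in child
```

Compared with the .isin mask, this keeps the child columns alongside the parent rows, which is only useful if that extra information is wanted.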