0

I have 2 dataframes for which I want to do a comparison. Please find the information below and appreciate any help on this.

df 1 shows the relationship between the IDs

df1 = 

IDA   IDB   Relationship
A100  A200   Parent
A200  A500   Spouse
A111  A112   Child
A112  A111   Parent

df2 contains a list of IDs that I will check against df1 if there's any form of relationship between the ID's Party 1 and Party 2 and map the relationship over (the first instance in the event there's multiple relationships)

df2 = 

Sender      Receiver
[A900,A200] [A500,A220]
[A150,A100] [A400]
[A400,A112] [A500]
[A700,A112] [A111,A001]

Here's my expected output with explanations

Output =
 
Sender      Receiver     Relationship
[A900,A200] [A500,A220]  Spouse         #A200 and A500 
[A150,A100] [A400]       NAN            #No match
[A400,A112] [A500]       NAN            #No match
[A700,A112] [A111,A001]  Parent         #A112 and A111

2 Answers 2

1

You could extract the information into native python data structures and then merge it back with your original DataFrames -

To do this - I would first make pairs out of the Sender and Receiver columns in df2 -

def make_pairs(row):
    senders = row['Sender'].replace("[", "").replace("]", "").split(",")
    receivers = row['Receiver'].replace("[", "").replace("]", "").split(",")
    pairs = [(s, r) for s in senders for r in receivers]
    return pairs
send_receive_combinations = df2.apply(make_pairs, axis=1).to_dict()

Then map the combination of IDA and IDB from df1 into a dictionary:

rels = {(ida, idb): rel for ida, idb, rel in df1.values}

A dict comprehension (or even a simple for loop) can then be used to subset values of interest

rel_pairs = {key: rels[pair] for key, combination in send_receive_combinations.items() for pair in combination if pair in rels}

And finally, we can merge this dict with df2 -

df2['relationship'] = df2.index
df2['relationship'] = df2['relationship'].map(rel_pairs)
print(df2)
    Sender     Receiver relationship
#0  [A900,A200]  [A500,A220]       Spouse
#1  [A150,A100]       [A400]          NaN
#2  [A400,A112]       [A500]          NaN
#3  [A700,A112]  [A111,A001]       Parent
Sign up to request clarification or add additional context in comments.

Comments

0

I wasn't able to test it as you did not provide a data sample but something like that should work:

Output = df2.copy()
detected_relations = []

for transaction in df2.iterrows:
    Receiver = transaction.Receiver
    Sender = transaction.Sender
    
    df = df1[(df1.IDA.isin(Sender) & df1.IDB.isin(Receiver)) | (df1.IDB.isin(Sender) & df1.IDA.isin(Receiver))]
    
    detected_relations = detected_relations + df.Relationship
    
Outpout["Relationship"] = detected_relations

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.