0

I have three data frames:

Frame1:

import pandas as pd
import numpy as np
d = {'instrument': ['a','b','c'], 'CUSIP': ['US1',np.NaN,'US2'],'ISIN':[np.NaN,"EU1",np.NaN]}
ip = pd.DataFrame(data=d)
ip

Frame2:

cusipid={'CUSIP':['a','c'],'ID':["a1","c1"]}
cusipmap=pd.DataFrame(data=cusipid)
cusipmap

Frame3:

isinid={'ISIN':['b','c'],'ID':["b1","c2"]}
isinmap=pd.DataFrame(data=isinid)
isinmap

I want to add a column to frame 1 , keeping the number of rows in Frame 1 constant ( left join) by first joining on CUSIP on Frame 2 and Then with ISIN on Frame 3. HOWEVER, I want to do the ISIN join on frame 3 only if there is a NaN produced as part of the Frame 2 join. SO, the output that I am looking for is something like:

op= {'instrument': ['a','b','c'], 'CUSIP': ['US1',np.NaN,'US2'],'ISIN':[np.NaN,"EU1",np.NaN],'ID':['a1','b1','c1']}
op = pd.DataFrame(data=op)
op

Does the pd.merge have any functionality to perform the operations above ?

2 Answers 2

1

You can create new Series with Series.map and then replace missing values by another Series with Series.fillna or Series.combine_first:

s1 = ip['instrument'].map(cusipmap.set_index('CUSIP')['ID'])
s2 = ip['instrument'].map(isinmap.set_index('ISIN')['ID'])
ip['Id'] = s1.fillna(s2)
print (ip)
  instrument CUSIP ISIN  Id
0          a   US1  NaN  a1
1          b   NaN  EU1  b1
2          c   US2  NaN  c1

Solution with merge:

s1 = ip.merge(cusipmap.rename(columns={'CUSIP':'instrument'}),on='instrument',how='left')['ID']
s2 = ip.merge(isinmap.rename(columns={'ISIN':'instrument'}),on='instrument',how='left')['ID']

ip['Id'] = s1.combine_first(s2)
print (ip)
  instrument CUSIP ISIN  Id
0          a   US1  NaN  a1
1          b   NaN  EU1  b1
2          c   US2  NaN  c1
Sign up to request clarification or add additional context in comments.

2 Comments

thanks a lot ! what about if I had to do more than two joins ?
@NumberLogic - Then use ip['Id'] = s1.fillna(s2).fillna(s3) - s3 is coun similar like s1, s2
0

The way I would do it would be to merge both dataframes and then build the desired column using .loc. Example below:

import pandas as pd
import numpy as np
d = {'instrument': ['a','b','c'], 'CUSIP': ['US1',np.NaN,'US2'],'ISIN':[np.NaN,"EU1",np.NaN]}
ip = pd.DataFrame(data=d)

cusipid={'CUSIP':['a','c'],'ID':["a1","c1"]}
cusipmap=pd.DataFrame(data=cusipid)

isinid={'ISIN':['b','c'],'ID':["b1","c2"]}
isinmap=pd.DataFrame(data=isinid)


df = (ip
      .merge(cusipmap.rename(columns={"CUSIP": "instrument"}), how="left", on="instrument")
      .merge(isinmap.rename(columns={"ISIN": "instrument"}), how="left", on="instrument", suffixes=["CUSIP", "ISIN"]))

df["ID"] = df.IDCUSIP
df.loc[df.ID.isnull(), "ID"] = df.loc[df.ID.isnull(), "IDISIN"]
df = df.drop(["IDCUSIP", "IDISIN"], axis=1)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.