
Let's say I have two DataFrames:

import pandas as pd

df1 = pd.DataFrame({'name': ['Jack', 'Lucy', 'Mark'], 'age': [1, 2, 3]})
df2 = pd.DataFrame({'name': ['Jack', 'Mark'], 'age': [10, 11], 'address': ['addr1', 'addr2']})

What operation should I use so that df1 becomes the following, with values from df2 taking precedence wherever the name matches:

name    age    address
--------------------
Jack    10     addr1
Lucy    2      NaN
Mark    11     addr2

3 Answers


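You could merge both DataFrames and then fill in the missing values:

# Left join keeps every row of df1; the overlapping age column becomes age_x (df1) and age_y (df2)
df_out = df1.merge(df2, on=['name'], how='left')
# Take df2's age where it exists, otherwise keep df1's age
df_out['age'] = df_out.apply(lambda x: x['age_y'] if pd.notna(x['age_y']) else x['age_x'], axis=1)
df_out[['name', 'age', 'address']]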

Output

| name   |   age | address   |
|:-------|------:|:----------|
| Jack   |    10 | addr1     |
| Lucy   |     2 | nan       |
| Mark   |    11 | addr2     |
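The same merge-then-fill idea can also be written without a row-wise apply. The following is only a minimal sketch, assuming the two frames from the question; the variable names merged and result are illustrative, and the final cast to int assumes every row ends up with an age from one of the two frames.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['Jack', 'Lucy', 'Mark'], 'age': [1, 2, 3]})
df2 = pd.DataFrame({'name': ['Jack', 'Mark'], 'age': [10, 11], 'address': ['addr1', 'addr2']})

# Left join keeps every row of df1; the shared age column is split into age_x / age_y.
merged = df1.merge(df2, on='name', how='left')

# Prefer df2's age (age_y) where present, otherwise fall back to df1's age (age_x).
# The cast back to int assumes no age is missing after the fill.
merged['age'] = merged['age_y'].fillna(merged['age_x']).astype(int)

result = merged[['name', 'age', 'address']]
print(result)
```

Using fillna on the whole column avoids calling a Python function per row, which tends to matter on larger frames.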

Use DataFrame.combine_first after converting the name column to the index in both DataFrames:

df1 = df1.set_index('name') 
df2 = df2.set_index('name')

df1 = df2.combine_first(df1).reset_index()
print (df1)
   name address   age
0  Jack   addr1  10.0
1  Lucy     NaN   2.0
2  Mark   addr2  11.0
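As the output shows, combine_first returns the columns in sorted order, and age may come back as float. If the layout from the question matters, a small tidy-up step could follow. This is a sketch only, assuming the original df1 and df2 from the question and that no age is missing in the combined result; the name combined is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['Jack', 'Lucy', 'Mark'], 'age': [1, 2, 3]})
df2 = pd.DataFrame({'name': ['Jack', 'Mark'], 'age': [10, 11], 'address': ['addr1', 'addr2']})

# Values from df2 take precedence; gaps are filled from df1.
combined = df2.set_index('name').combine_first(df1.set_index('name')).reset_index()

# Cast age back to integers (assumes no age is missing after combining).
combined['age'] = combined['age'].astype('int64')

# Restore the column order used in the question.
combined = combined[['name', 'age', 'address']]
print(combined)
```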

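To use DataFrame.update instead, first add the columns that exist only in df2, because update only modifies columns already present in df1 (starting again from the original df1 and df2):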

df1 = df1.set_index('name')
df2 = df2.set_index('name')
df1 = df1.reindex(df1.columns.union(df2.columns, sort=False), axis=1)

df1.update(df2)
df1 = df1.reset_index()
print (df1)
   name   age address
0  Jack  10.0   addr1
1  Lucy   2.0     NaN
2  Mark  11.0   addr2

Or use a left join with DataFrame.merge followed by DataFrame.combine_first (again starting from the original df1 and df2):

# left join df2; columns that already exist in df1 get a trailing _ appended
df = df1.merge(df2, on='name', how='left', suffixes=('','_'))

# select the names of the appended columns
new_cols = df.columns[df.columns.str.endswith('_')]

# remove the last character to recover the original column names
orig_cols = new_cols.str[:-1]
# dictionary for renaming the appended columns back to the original names
d = dict(zip(new_cols, orig_cols))

# take values from the appended columns and fill their NaNs from the original columns
df[orig_cols] = df[new_cols].rename(columns=d).combine_first(df[orig_cols])
# remove the appended columns
df = df.drop(new_cols, axis=1)
print (df)
   name   age address
0  Jack  10.0   addr1
1  Lucy   2.0     NaN
2  Mark  11.0   addr2

1 Comment

Hi jezrael, thanks for your answer. I've updated my question. What if there is another column from df2 that I also want to add to df1?

You could also do this with concat, drop_duplicates, sort_index and reset_index. For duplicated names, keep="last" keeps the row that comes from df2:

df = pd.concat([df1, df2], ignore_index=False, sort=False).drop_duplicates(["name"], keep="last")
df = df.sort_index().reset_index(drop=True)
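Note that sort_index orders rows by their original index labels, so the result only matches df1's row order when the labels happen to line up. A variant that restores df1's order by name could look like the sketch below. It assumes the original df1 and df2 from the question and unique names within each frame; the names stacked and result are illustrative. As with the concat approach in general, the whole row from df2 wins for a matching name, so a column that exists only in df1 would come back as NaN for those rows.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['Jack', 'Lucy', 'Mark'], 'age': [1, 2, 3]})
df2 = pd.DataFrame({'name': ['Jack', 'Mark'], 'age': [10, 11], 'address': ['addr1', 'addr2']})

# Stack both frames; for duplicated names the df2 row comes last and is kept.
stacked = pd.concat([df1, df2], sort=False).drop_duplicates('name', keep='last')

# Left-join onto df1's name column so the rows follow df1's order,
# independent of the index labels in either frame.
result = df1[['name']].merge(stacked, on='name', how='left')
print(result)
```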
