How to update one dataframe using values from another dataframe in pandas

Question

I have two DataFrames. df1 looks like this:

primary_key    code    amount
220492763      763     32.41
213274768      764     23.41
226835769      766     88.41
224874836      7766    100.31
219074759      74836   111.33

df2 looks like this:

primary_key    code    amount
213274768      764     24.41
224874836      7766    101.31
217774816      768     123.43
222176762      798     111.44
219374759      24774   134.56

I would like to use df2 to update df1 wherever the primary_key matches, and append the remaining rows of df2 to the end of df1, so that the result looks like this:

primary_key    code    amount
220492763      763     32.41
213274768      764     24.41
226835769      766     88.41
224874836      7766    101.31
219074759      74836   111.33
217774816      768     123.43
222176762      798     111.44
219374759      24774   134.56

I have tried

df1.set_index('primary_key').combine_first(df2.set_index('primary_key')).reset_index()

but the rows of the two DataFrames end up mixed together. How can I fix this?
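
The attempt can be reproduced with the sketch below. `combine_first` keeps the caller's non-null values and only fills gaps from its argument, so calling it on df1 leaves the old amounts in place for shared keys, and the result comes back sorted by primary_key rather than in df1's order. The frame construction and the names `attempt` and `old_amount` are assumptions added for illustration.

```python
import pandas as pd

# Sample data copied from the tables in the question.
df1 = pd.DataFrame({
    "primary_key": [220492763, 213274768, 226835769, 224874836, 219074759],
    "code": [763, 764, 766, 7766, 74836],
    "amount": [32.41, 23.41, 88.41, 100.31, 111.33],
})
df2 = pd.DataFrame({
    "primary_key": [213274768, 224874836, 217774816, 222176762, 219374759],
    "code": [764, 7766, 768, 798, 24774],
    "amount": [24.41, 101.31, 123.43, 111.44, 134.56],
})

# The call from the question: df1 is the caller, so df1's values win.
attempt = (
    df1.set_index("primary_key")
       .combine_first(df2.set_index("primary_key"))
       .reset_index()
)

# Key 213274768 exists in both frames; the amount is still df1's 23.41.
old_amount = attempt.loc[attempt["primary_key"] == 213274768, "amount"].iloc[0]
print(old_amount)

# The rows follow the sorted union of the keys, not df1's original order.
print(attempt["primary_key"].is_monotonic_increasing)
```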

BENY · Accepted Answer · 2019-01-07 15:08:44Z

2

Use combine_first, calling it on df2 so that df2's values take priority:

yourdf=df2.set_index('primary_key').combine_first(df1.set_index('primary_key')).reset_index()
yourdf
Out[287]: 
   primary_key     code  amount
0    213274768    764.0   24.41
1    217774816    768.0  123.43
2    219074759  74836.0  111.33
3    219374759  24774.0  134.56
4    220492763    763.0   32.41
5    222176762    798.0  111.44
6    224874836   7766.0  101.31
7    226835769    766.0   88.41

Update: to keep the original row order, reindex on the de-duplicated keys:

idx=pd.concat([df1.primary_key,df2.primary_key]).drop_duplicates()
yourdf=df2.set_index('primary_key').combine_first(df1.set_index('primary_key')).reindex(idx).reset_index()
yourdf
Out[293]: 
   primary_key     code  amount
0    220492763    763.0   32.41
1    213274768    764.0   24.41
2    226835769    766.0   88.41
3    224874836   7766.0  101.31
4    219074759  74836.0  111.33
5    217774816    768.0  123.43
6    222176762    798.0  111.44
7    219374759  24774.0  134.56
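
A self-contained sketch of the ordered version above, using the question's data. The frame construction and the name `result` are assumptions for illustration. The final `astype` call is an addition, not part of the answer: it assumes `code` has no missing values after the merge and undoes the float upcast visible in the output above (`764.0`).

```python
import pandas as pd

# Sample data copied from the tables in the question.
df1 = pd.DataFrame({
    "primary_key": [220492763, 213274768, 226835769, 224874836, 219074759],
    "code": [763, 764, 766, 7766, 74836],
    "amount": [32.41, 23.41, 88.41, 100.31, 111.33],
})
df2 = pd.DataFrame({
    "primary_key": [213274768, 224874836, 217774816, 222176762, 219374759],
    "code": [764, 7766, 768, 798, 24774],
    "amount": [24.41, 101.31, 123.43, 111.44, 134.56],
})

# Desired key order: df1's keys first, then keys that appear only in df2.
idx = pd.concat([df1["primary_key"], df2["primary_key"]]).drop_duplicates()

result = (
    df2.set_index("primary_key")
       .combine_first(df1.set_index("primary_key"))  # df2 values take priority
       .reindex(idx)                                  # restore the desired order
       .reset_index()
       .astype({"code": "int64"})                     # assumption: no NaN in code
)
print(result)
```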

edited Jan 7, 2019 at 15:08

answered Jan 7, 2019 at 15:02


1 Comment

daiyue Over a year ago

Thank you for the solution, but the output isn't exactly what I'm looking for.

Scott Boston · Accepted Answer · 2019-01-07 15:16:59Z

2

Use pd.concat, drop_duplicates, and reindex:

idx=pd.concat([df1.primary_key,df2.primary_key]).drop_duplicates()
pd.concat([df2,df1]).drop_duplicates('primary_key').set_index('primary_key').reindex(idx).reset_index()

Output:

   primary_key   code  amount
0    220492763    763   32.41
1    213274768    764   24.41
2    226835769    766   88.41
3    224874836   7766  101.31
4    219074759  74836  111.33
5    217774816    768  123.43
6    222176762    798  111.44
7    219374759  24774  134.56
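
The approach works because df2 is concatenated first: `drop_duplicates` keeps the first occurrence by default, so df2's row survives for every shared key, and `reindex` then restores the df1-first order. Below is a minimal sketch that wraps the same steps in a function. The helper name `upsert` and its parameters are hypothetical, and it assumes the key is unique within each frame.

```python
import pandas as pd


def upsert(base: pd.DataFrame, updates: pd.DataFrame, key: str) -> pd.DataFrame:
    """Hypothetical helper: rows in `updates` replace rows in `base` that
    share the same key; rows with new keys are appended at the end."""
    # Keys of `base` first, followed by keys that appear only in `updates`.
    order = pd.concat([base[key], updates[key]]).drop_duplicates()
    return (
        pd.concat([updates, base])   # updates first, so its rows are kept
          .drop_duplicates(key)      # default keep="first"
          .set_index(key)
          .reindex(order)
          .reset_index()
    )


# Sample data copied from the tables in the question.
df1 = pd.DataFrame({
    "primary_key": [220492763, 213274768, 226835769, 224874836, 219074759],
    "code": [763, 764, 766, 7766, 74836],
    "amount": [32.41, 23.41, 88.41, 100.31, 111.33],
})
df2 = pd.DataFrame({
    "primary_key": [213274768, 224874836, 217774816, 222176762, 219374759],
    "code": [764, 7766, 768, 798, 24774],
    "amount": [24.41, 101.31, 123.43, 111.44, 134.56],
})

result = upsert(df1, df2, "primary_key")
print(result)
```

Since every key in `order` is present after the concat, no missing values are introduced, which is consistent with `code` staying an integer column in the output above.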

answered Jan 7, 2019 at 15:16
