1

I have two dfs, df1 is like,

primary_key    code    amount
220492763      763     32.41
213274768      764     23.41
226835769      766     88.41
224874836      7766    100.31
219074759      74836   111.33

df2 is like,

primary_key    code    amount
213274768      764     24.41
224874836      7766    101.31
217774816      768     123.43
222176762      798     111.44
219374759      24774   134.56

I like to use df2 to update df_1 based on the same primary_key, and for the rest of rows in df2, append them to the end of df1, so the result looks like,

primary_key    code    amount
220492763      763     32.41
213274768      764     24.41
226835769      766     88.41
224874836      7766    101.31
219074759      74836   111.33
217774816      768     123.43
222176762      798     111.44
219374759      24774   134.56

have tried to use

df1.set_index('primary_key').combine_first(df2.set_index('primary_key')).reset_index()  

but the two dfs mixed together, I am wondering how to fix it.

2 Answers 2

2

Using combine_first

yourdf=df2.set_index('primary_key').combine_first(df1.set_index('primary_key')).reset_index()
yourdf
Out[287]: 
   primary_key     code  amount
0    213274768    764.0   24.41
1    217774816    768.0  123.43
2    219074759  74836.0  111.33
3    219374759  24774.0  134.56
4    220492763    763.0   32.41
5    222176762    798.0  111.44
6    224874836   7766.0  101.31
7    226835769    766.0   88.41

Update adding the order

idx=pd.concat([df1.primary_key,df2.primary_key]).drop_duplicates()
yourdf=df2.set_index('primary_key').combine_first(df1.set_index('primary_key')).reindex(idx).reset_index()
yourdf
Out[293]: 
   primary_key     code  amount
0    220492763    763.0   32.41
1    213274768    764.0   24.41
2    226835769    766.0   88.41
3    224874836   7766.0  101.31
4    219074759  74836.0  111.33
5    217774816    768.0  123.43
6    222176762    798.0  111.44
7    219374759  24774.0  134.56
Sign up to request clarification or add additional context in comments.

1 Comment

thank you for the solution, but the output isn't exactly what I look for.
2

Use pd.concat, drop_duplicates, and reindex:

idx=pd.concat([df1.primary_key,df2.primary_key]).drop_duplicates()
pd.concat([df2,df1]).drop_duplicates('primary_key').set_index('primary_key').reindex(idx).reset_index()

Output:

   primary_key   code  amount
0    220492763    763   32.41
1    213274768    764   24.41
2    226835769    766   88.41
3    224874836   7766  101.31
4    219074759  74836  111.33
5    217774816    768  123.43
6    222176762    798  111.44
7    219374759  24774  134.56

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.