Updating dataframe rows based on another dataframe's rows

Question

I have two dataframes, and the bigger one needs to be updated with data from the smaller one. If df2 contains a record with a matching name, I want to update the price in df1, as in the example below. There may be multiple rows with the same name in df1.

df1

id  name    price
1   name_1  5,34
2   name_2  5,36
3   name_3  4,74
4   name_4  5,23
5   name_5  5,94
6   name_1  5,34

df2

name    price
name_4  5,17
name_1  5,37

df_result

id  name    price
1   name_1  5,37
2   name_2  5,36
3   name_3  4,74
4   name_4  5,17
5   name_5  5,94
6   name_1  5,37

I'm quite stuck. Tried doing this with df.loc[] but I got nowhere. Any ideas?

"Tried doing this with df.loc[] but I got nowhere": please add the code from your attempt to your question.
– Dan, Commented Oct 2, 2020 at 11:35
Use pd.concat([df1,df2]).drop_duplicates(subset=['name'], keep='last')
– jezrael, Commented Oct 2, 2020 at 11:40
Alternatively df1.merge(df2, on="name", how="left").ffill(axis=1).drop("price_x", axis=1)
– Dan, Commented Oct 2, 2020 at 11:43
@jezrael your method doesn't preserve the "id" column on the replaced rows (nor the index, if that matters). I'm not sure this is an exact duplicate.
– Dan, Commented Oct 2, 2020 at 11:44

Dan · Accepted Answer · 2020-10-02 13:25:58Z

2

You are trying to do multiple one-to-one matches; merge can help you here:

df1.merge(df2, on="name", how="left").ffill(axis=1).drop("price_x", axis=1).rename(columns={"price_y": "price"})

By doing a left join, you keep all the rows in df1 that don't have matches in df2. The ffill then does null-coalescing, keeping the rightmost non-null value in each row, so price_y holds df2's price where one exists and df1's price otherwise.
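The left join plus null-coalescing idea can also be spelled out with an explicit fillna instead of ffill(axis=1). This is only a sketch on the question's sample data: the "_new" suffix and the names merged and result are illustrative, and prices are kept as strings because of the decimal commas.

```python
import pandas as pd

# Sample data from the question; prices stay as strings (assumption).
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "name": ["name_1", "name_2", "name_3", "name_4", "name_5", "name_1"],
    "price": ["5,34", "5,36", "4,74", "5,23", "5,94", "5,34"],
})
df2 = pd.DataFrame({"name": ["name_4", "name_1"], "price": ["5,17", "5,37"]})

# Left join keeps every df1 row; df2's price lands in "price_new" (NaN when unmatched).
merged = df1.merge(df2, on="name", how="left", suffixes=("", "_new"))

# Null-coalesce: take the new price where present, otherwise keep the old one.
merged["price"] = merged["price_new"].fillna(merged["price"])
result = merged.drop(columns="price_new")
print(result)
```

This keeps the id column and the original column name price, and it assumes each name appears at most once in df2.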

Another option based on Sandeep's answer:

df3 = df1.set_index("name")
df3.update(df2.set_index("name"))  # update works in place and returns None
df3 = df3.reset_index()

edited Oct 2, 2020 at 13:25

answered Oct 2, 2020 at 11:47


4 Comments

Tomasz Słowik Over a year ago

I updated my question. I didn't write one important thing - there are places where there are multiple rows with the same name but with different id

Dan Over a year ago

@TomaszSłowik I think my solution works as is for that case...?

Tomasz Słowik Over a year ago

It does! Thank you :) Had to play around since I have some mess with my data.

Dan Over a year ago

@TomaszSłowik btw check my comment on Sandeep's answer, it might be cleaner for you

Sandeep Kothari · Accepted Answer · 2020-10-02 11:48:27Z

0

You can use update, like so:

df1.update(df2)

answered Oct 2, 2020 at 11:48


1 Comment

Dan Over a year ago

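This could work if you set the index to name first: df3 = df1.set_index("name"), then df3.update(df2.set_index("name")), then df3 = df3.reset_index(). Note that update modifies df3 in place and returns None, so reset_index() cannot be chained onto it.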
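The point about keying on name rather than on the default integer index can also be sketched without update, using a lookup Series and Series.map. This is a different technique from the one in the answer, shown on the question's sample data; the names new_price and result are illustrative, and it assumes names in df2 are unique.

```python
import pandas as pd

# Sample data from the question; prices stay as strings (assumption).
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "name": ["name_1", "name_2", "name_3", "name_4", "name_5", "name_1"],
    "price": ["5,34", "5,36", "4,74", "5,23", "5,94", "5,34"],
})
df2 = pd.DataFrame({"name": ["name_4", "name_1"], "price": ["5,17", "5,37"]})

# Look up each df1 name in df2; names missing from df2 map to NaN.
new_price = df1["name"].map(df2.set_index("name")["price"])

# Keep the old price wherever no replacement was found; df1 itself is not modified.
result = df1.assign(price=new_price.fillna(df1["price"]))
print(result)
```

Because the lookup goes through the name column, every df1 row sharing a name gets the same new price, and both the id column and the original index are preserved.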
