
I have two dataframes, and the bigger one (df1) needs to be updated with data from the smaller one (df2). If a row in df1 has a name that also appears in df2, I want to replace its price with the one from df2, as in the example below. Note that df1 may contain several rows with the same name.

df1

id  name    price
1   name_1  5,34
2   name_2  5,36
3   name_3  4,74
4   name_4  5,23
5   name_5  5,94
6   name_1  5,34

df2

name    price
name_4  5,17
name_1  5,37

df_result

id  name    price
1   name_1  5,37
2   name_2  5,36
3   name_3  4,74
4   name_4  5,17
5   name_5  5,94
6   name_1  5,37

I'm quite stuck. I tried doing this with df.loc[] but got nowhere. Any ideas?
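For anyone who wants to experiment, here is a sketch that rebuilds the example frames in pandas. It assumes the comma in values like 5,34 is a decimal separator, so prices are stored as floats; the name `expected` for df_result is chosen here for illustration.

```python
import pandas as pd

# Rebuild the example frames; "5,34" is read as the float 5.34 (decimal comma assumed)
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "name": ["name_1", "name_2", "name_3", "name_4", "name_5", "name_1"],
    "price": [5.34, 5.36, 4.74, 5.23, 5.94, 5.34],
})

df2 = pd.DataFrame({
    "name": ["name_4", "name_1"],
    "price": [5.17, 5.37],
})

# Desired result: every df1 row whose name appears in df2 takes df2's price,
# while id and row order stay unchanged
expected = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "name": ["name_1", "name_2", "name_3", "name_4", "name_5", "name_1"],
    "price": [5.37, 5.36, 4.74, 5.17, 5.94, 5.37],
})
```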

  • "Tried doing this with df.loc[] but I got nowhere": add the code from your attempt to your question. Commented Oct 2, 2020 at 11:35
  • Use pd.concat([df1,df2]).drop_duplicates(subset=['name'], keep='last') Commented Oct 2, 2020 at 11:40
  • Alternatively, df1.merge(df2, on="name", how="left").ffill(axis=1).drop("price_x", axis=1) Commented Oct 2, 2020 at 11:43
  • @jezrael your method doesn't preserve the "id" column on the replaced rows (nor the index, if that matters). I'm not sure this is an exact duplicate. Commented Oct 2, 2020 at 11:44
  • @Dan Agreed, reopened. Commented Oct 2, 2020 at 11:45
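To see the problem Dan raises with the concat/drop_duplicates suggestion, here is a small check, assuming the example data with the decimal comma read as a float. For each matched name the row that survives comes from df2, which has no id column, and because df1 itself contains name_1 twice, one of those rows disappears entirely.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "name": ["name_1", "name_2", "name_3", "name_4", "name_5", "name_1"],
    "price": [5.34, 5.36, 4.74, 5.23, 5.94, 5.34],
})
df2 = pd.DataFrame({"name": ["name_4", "name_1"], "price": [5.17, 5.37]})

# Stack df1 on top of df2, then keep only the last row for each name
out = pd.concat([df1, df2]).drop_duplicates(subset=["name"], keep="last")

print(len(out))                 # 5: one row per distinct name, although df1 has 6 rows
print(out["id"].isna().sum())   # 2: the rows kept from df2 have no id
```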

2 Answers


You are trying to do multiple one-to-one matches; merge can help you here:

df1.merge(df2, on="name", how="left").ffill(axis=1).drop("price_x", axis=1)

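By doing a left join, you keep every row of df1, including those with no match in df2 (their price_y is NaN). The ffill(axis=1) then acts as null-coalescing: along each row it carries values forward from left to right, so the rightmost non-null price wins, i.e. price_y where df2 has a match and price_x otherwise. Note that the remaining column is named price_y, so you may want to rename it back to price.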
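Here is a sketch of the same left-join idea, assuming the example data with the decimal comma read as a float. Instead of ffill(axis=1), it coalesces the two price columns explicitly with Series.fillna, so only the price columns are involved and the result keeps the original column name. The suffix "_new" is an arbitrary choice, not something from the answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "name": ["name_1", "name_2", "name_3", "name_4", "name_5", "name_1"],
    "price": [5.34, 5.36, 4.74, 5.23, 5.94, 5.34],
})
df2 = pd.DataFrame({"name": ["name_4", "name_1"], "price": [5.17, 5.37]})

# Left join keeps every df1 row (including repeated names) in df1's order;
# the empty left suffix keeps df1's column as "price", df2's becomes "price_new"
merged = df1.merge(df2, on="name", how="left", suffixes=("", "_new"))

# Coalesce: take df2's price where there was a match, otherwise keep df1's
merged["price"] = merged["price_new"].fillna(merged["price"])
result = merged.drop(columns="price_new")

print(result)
```

This relies on names being unique in df2; if df2 repeated a name, the merge would duplicate the matching df1 rows.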


Another option based on Sandeep's answer:

df3 = df1.set_index("name")
df3.update(df2.set_index("name"))  # update() works in place and returns None
df3 = df3.reset_index()
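A self-contained sketch of the set_index/update route, assuming the example data with the decimal comma read as a float. Since DataFrame.update modifies the frame in place and returns None, it gets its own line; the final column selection just restores df1's original column order, which reset_index would otherwise change by putting name first.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "name": ["name_1", "name_2", "name_3", "name_4", "name_5", "name_1"],
    "price": [5.34, 5.36, 4.74, 5.23, 5.94, 5.34],
})
df2 = pd.DataFrame({"name": ["name_4", "name_1"], "price": [5.17, 5.37]})

# Align both frames on "name"; df1 may repeat a name, but df2's names should be unique
df3 = df1.set_index("name")

# Overwrite df3's price wherever df2 has a value for that name (in place)
df3.update(df2.set_index("name"))

# Move "name" back to a column and restore df1's column order
result = df3.reset_index()[df1.columns]
print(result)
```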

4 Comments

I updated my question. I had left out one important detail: some rows share the same name but have different ids.
@TomaszSłowik I think my solution works as is for that case...?
It does! Thank you :) I had to play around a bit since my data is somewhat messy.
@TomaszSłowik btw, check my comment on Sandeep's answer; it might be cleaner for you.

You can use update, like so:

df1.update(df2)
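To see why this needs the name index first, here is a quick check assuming the example data with the decimal comma read as a float. Without a shared name index, update aligns on the default 0..n-1 row labels, so df2's two rows overwrite both the name and the price of df1's first two rows instead of the rows with matching names.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "name": ["name_1", "name_2", "name_3", "name_4", "name_5", "name_1"],
    "price": [5.34, 5.36, 4.74, 5.23, 5.94, 5.34],
})
df2 = pd.DataFrame({"name": ["name_4", "name_1"], "price": [5.17, 5.37]})

# Both frames have a default RangeIndex, so df2's rows 0 and 1
# are matched with df1's rows 0 and 1, not by name
df1.update(df2)
print(df1)
```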

1 Comment

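This could work if you set the index to name first: df3 = df1.set_index("name"), then df3.update(df2.set_index("name")), then df3 = df3.reset_index(). (update() modifies df3 in place and returns None, so it can't be chained with reset_index().)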
