
I have two data frames below:

import pandas as pd

data1 = {'date' : ['1', '2','3'],
     'value1' : ['a', 'b' ,'c'],
     'value2' : ['12','24','4']}
data2 = {'date' : ['2','3','4'],
     'value1' : ['b', 'c' ,'g'],
     'value2' : ['24','4','55']}

df1 = pd.DataFrame(data1)
df1 = df1.set_index('date')
df2 = pd.DataFrame(data2)
df2 = df2.set_index('date')

And here is my desired output:

desired_result = {'date' : ['1','2','3','4'],
     'value1' : ['a', 'b', 'c', 'g'],
     'value2' : ['12', '24', '4', '55']}

I have tried various kinds of merge, join and concat, but couldn't figure it out.

4 Answers


This isn't exactly a merge problem, but you can use combine_first:

df1.combine_first(df2).reset_index()

  date value1 value2
0    1      a     12
1    2      b     24
2    3      c      4
3    4      g     55
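To see which frame wins when both have a value, here is a minimal sketch that reuses the question's data but, purely as an assumption for illustration, gives df2 a conflicting value2 of '99' for date '3'. combine_first keeps the caller's (df1's) value and only fills in labels or NaN cells that df1 lacks:

```python
import pandas as pd

df1 = pd.DataFrame({'date': ['1', '2', '3'],
                    'value1': ['a', 'b', 'c'],
                    'value2': ['12', '24', '4']}).set_index('date')
# Hypothetical df2: date '3' disagrees with df1 on value2 ('99' vs '4').
df2 = pd.DataFrame({'date': ['2', '3', '4'],
                    'value1': ['b', 'c', 'g'],
                    'value2': ['24', '99', '55']}).set_index('date')

# df1 takes priority; df2 only contributes dates/cells that df1 is missing.
result = df1.combine_first(df2).reset_index()
print(result)
```

Swapping the order, df2.combine_first(df1), would give df2's values priority instead.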

Another option is to concat and then drop_duplicates:

pd.concat([df1, df2]).reset_index('date').drop_duplicates('date')

  date value1 value2
0    1      a     12
1    2      b     24
2    3      c      4
5    4      g     55
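The leftover index labels (0, 1, 2, 5) are row positions from the concatenated frame. A small sketch using the question's df1 and df2 that renumbers the result with drop_duplicates' ignore_index parameter (available since pandas 1.0); because keep='first' is the default, df1's row wins when a date appears in both frames, while keep='last' would prefer df2's:

```python
import pandas as pd

df1 = pd.DataFrame({'date': ['1', '2', '3'],
                    'value1': ['a', 'b', 'c'],
                    'value2': ['12', '24', '4']}).set_index('date')
df2 = pd.DataFrame({'date': ['2', '3', '4'],
                    'value1': ['b', 'c', 'g'],
                    'value2': ['24', '4', '55']}).set_index('date')

# Stack the frames and turn the 'date' index back into a regular column.
stacked = pd.concat([df1, df2]).reset_index()

# Keep the first row seen for each date (df1's) and renumber rows 0..n-1.
result = stacked.drop_duplicates('date', keep='first', ignore_index=True)
print(result)
```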

This feels like a groupby problem:

pd.concat([df1,df2]).groupby(level=0).last()
     value1 value2
date              
1         a     12
2         b     24
3         c      4
4         g     55
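Since df2 is concatenated after df1, last() takes df2's value for any date present in both frames, and first() would take df1's. GroupBy.last also computes the last non-null entry per column, so a missing cell in one frame is filled from the other. A minimal sketch, assuming for illustration a hypothetical df2 whose value2 for date '3' is '99' instead of '4':

```python
import pandas as pd

df1 = pd.DataFrame({'date': ['1', '2', '3'],
                    'value1': ['a', 'b', 'c'],
                    'value2': ['12', '24', '4']}).set_index('date')
# Hypothetical df2: date '3' disagrees with df1 on value2.
df2 = pd.DataFrame({'date': ['2', '3', '4'],
                    'value1': ['b', 'c', 'g'],
                    'value2': ['24', '99', '55']}).set_index('date')

stacked = pd.concat([df1, df2])
prefer_df2 = stacked.groupby(level=0).last()   # the later frame (df2) wins
prefer_df1 = stacked.groupby(level=0).first()  # the earlier frame (df1) wins
print(prefer_df2)
print(prefer_df1)
```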


If you use a simple join/merge, you will end up with some null values.

pandas.DataFrame.combine_first and pandas.DataFrame.combine exist for exactly this purpose.

A simple df1.combine_first(df2) should work just fine.
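To illustrate where those null values come from: a plain outer merge on the index keeps both copies of every column (suffixed _x and _y by default) and leaves NaN wherever a date exists in only one frame. The sketch below, using the question's df1 and df2, fills df1's columns from df2's by hand with fillna, which is essentially what combine_first does in a single call:

```python
import pandas as pd

df1 = pd.DataFrame({'date': ['1', '2', '3'],
                    'value1': ['a', 'b', 'c'],
                    'value2': ['12', '24', '4']}).set_index('date')
df2 = pd.DataFrame({'date': ['2', '3', '4'],
                    'value1': ['b', 'c', 'g'],
                    'value2': ['24', '4', '55']}).set_index('date')

# Outer merge on the 'date' index: overlapping column names get _x/_y suffixes,
# and date '4' has NaN in the _x columns while date '1' has NaN in the _y ones.
merged = df1.merge(df2, left_index=True, right_index=True, how='outer')
print(merged)

# Fill each df1 column (_x) from the matching df2 column (_y).
filled = pd.DataFrame({col: merged[f'{col}_x'].fillna(merged[f'{col}_y'])
                       for col in ['value1', 'value2']})
print(filled)
```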


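This is most definitely a merge problem: simply use an outer merge and pick the right keys for the join, like this. Because the overlapping dates ('2' and '3') have identical values in both frames, merging on all three columns collapses them into single rows.

Remove the set_index calls on the DataFrames; you don't need them.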

import pandas as pd

data1 = {'date' : ['1', '2','3'],
     'value1' : ['a', 'b' ,'c'],
     'value2' : ['12','24','4']}
data2 = {'date' : ['2','3','4'],
     'value1' : ['b', 'c' ,'g'],
     'value2' : ['24','4','55']}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)


# join with the key columns date, value1 & value2
df4 = pd.merge(df1, df2, on=['date', 'value1', 'value2'], how='outer')

Output

    date    value1  value2
0   1       a       12
1   2       b       24
2   3       c       4
3   4       g       55
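Merging on every column only deduplicates dates whose rows match exactly. If the frames disagreed on a date, the outer merge would keep both versions and that date would appear twice. A minimal sketch, assuming a hypothetical df2 whose value2 for date '3' is '99', that flags such dates with duplicated:

```python
import pandas as pd

df1 = pd.DataFrame({'date': ['1', '2', '3'],
                    'value1': ['a', 'b', 'c'],
                    'value2': ['12', '24', '4']})
# Hypothetical df2: date '3' has a different value2 than in df1.
df2 = pd.DataFrame({'date': ['2', '3', '4'],
                    'value1': ['b', 'c', 'g'],
                    'value2': ['24', '99', '55']})

df4 = pd.merge(df1, df2, on=['date', 'value1', 'value2'], how='outer')

# keep=False marks every row of a repeated date, exposing the conflict.
conflicts = df4[df4['date'].duplicated(keep=False)]
print(conflicts)
```

If conflicts like this are possible, an approach that resolves them by priority, such as combine_first or groupby().last(), may be a better fit.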
