20

I have two different DataFrames that I want to merge with date and hours columns. I saw some threads that are there, but I could not find the solution for my issue. I also read this document and tried different combinations, however, did not work well.

Example of my two different DataFrames,

DF1

        date    hours        var1            var2 
0   2013-07-10  00:00:00    150.322617  52.225920   
1   2013-07-10  01:00:00    155.250917  53.365296   
2   2013-07-10  02:00:00    124.918667  51.158249   
3   2013-07-10  03:00:00    143.839217  53.138251
 .....  
9   2013-09-10  09:00:00    148.135818  86.676341
10  2013-09-10  10:00:00    147.833517  53.658016   
11  2013-09-10  12:00:00    149.580233  69.745368   
12  2013-09-10  13:00:00    163.715317  14.524894   
13  2013-09-10  14:00:00    168.856650  10.762779   

DF2

       date      hours      myvar1        myvar2 
0   2013-07-10  09:00:00    1.617         98.56 
1   2013-07-10  10:00:00    2.917         23.60 
2   2013-07-10  12:00:00    19.667        36.15 
3   2013-07-10  13:00:00    14.217        45.16
 .....  
20 2013-09-10   20:00:00    1.517         53.56 
21 2013-09-10   21:00:00    5.233         69.47
22 2013-09-10   22:00:00    13.717        14.25
23 2013-09-10   23:00:00    18.850        10.69 

As you can see in both DataFrames, DF2 starts with 09:00:00 and I want to join with DF1 09:00:00, which is basically the matchind dates and times. So far, I tried many different combination using previous threads and the documentation mentioned above. An example,

merged_df = DF2.merge(DF1, how = 'left', on = ['date', 'hours'])

This was introduces NAN values for right right DataFrame. I know, I do not have to use both date and hours columns, however, still getting the same result. I tried R quick like this, which works perfectly fine.

merged_df  <- left_join(DF1, DF2, by = 'date')

Is there anyway in pandas to merge DatFrames just with matching values without getting NaN values?

1
  • 1
    Your R code and pandas code do not match. Either way, please show the outputs (incorrect and correct) on a small DataFrame. Commented Jul 18, 2017 at 19:07

1 Answer 1

32

Use how='inner' in pd.merge:

merged_df = DF2.merge(DF1, how = 'inner', on = ['date', 'hours'])

This will perform and "inner-join" thereby omitting rows in each dataframe that do not match. Hence, no NaN in either the right or left part of merged dataframe.

Sign up to request clarification or add additional context in comments.

3 Comments

check your dtypes of all for columns they much match... either both dates are strings or datetime and the hours should be strings.
@ Scott Boston both are object type now and I am going to convert them to date_time thanks for suggestion
it's a wonderful solution, thanks you help me so much!

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.