Adding two pandas dataframes

Question

I have two dataframes, both indexed by timeseries. I need to add the elements together to form a new dataframe, but only if the index and column are the same. If the item does not exist in one of the dataframes then it should be treated as a zero.

I've tried using .add, but this sums regardless of index and column. I also tried a simple combined_data = dataframe1 + dataframe2, but this gives a NaN if both dataframes don't have the element.

Any suggestions?

Can you clarify what you want to happen if an item does not exist in one or both dataframes? You say if the item does not exist in one dataframe, it should be treated as zero. Do you mean the value in that dataframe should be treated as zero and added to the value from the other dataframe, or do you mean the value in the result dataframe should be zero? Also, you say df1+df2 doesn't work because it gives NaN if both don't have the element. What do you want to happen in this case? Do you want a zero in the result?
– BrenBarn, Commented Jun 19, 2012 at 18:44

Renaud · Accepted Answer · 2014-03-11 08:55:02Z

126

How about x.add(y, fill_value=0)?

import pandas as pd

df1 = pd.DataFrame([(1,2),(3,4),(5,6)], columns=['a','b'])
Out: 
   a  b
0  1  2
1  3  4
2  5  6

df2 = pd.DataFrame([(100,200),(300,400),(500,600)], columns=['a','b'])
Out: 
     a    b
0  100  200
1  300  400
2  500  600

df_add = df1.add(df2, fill_value=0)
Out: 
     a    b
0  101  202
1  303  404
2  505  606
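The frames above share the same index and columns. As a rough sketch of the case the question describes (a date index that only partly overlaps, and columns that differ between the frames), the example below uses made-up data; the frames and the names plain, summed and result are illustrative assumptions, not from the original post. It suggests that fill_value=0 covers cells missing from one frame, while a cell missing from both frames still comes out as NaN and needs a final fillna(0) if a zero is wanted there.

```python
import pandas as pd

# Hypothetical frames: columns A, B, C vs A, B, D, with only one shared date.
df1 = pd.DataFrame(
    {"A": [1.0, 2.0], "B": [3.0, 4.0], "C": [5.0, 6.0]},
    index=pd.to_datetime(["2012-06-01", "2012-06-02"]),
)
df2 = pd.DataFrame(
    {"A": [10.0, 20.0], "B": [30.0, 40.0], "D": [50.0, 60.0]},
    index=pd.to_datetime(["2012-06-02", "2012-06-03"]),
)

# Plain addition aligns on labels, but any cell missing on either side is NaN.
plain = df1 + df2

# fill_value=0 substitutes 0 when exactly one side is missing.
summed = df1.add(df2, fill_value=0)

# A cell missing from both frames is still NaN, so fill those afterwards.
result = summed.fillna(0)

print(result)
```

The result has the union of both indexes and the columns A, B, C, D.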

edited Mar 11, 2014 at 8:55

Renaud


answered Jun 20, 2012 at 3:28

Wes McKinney


3 Comments

cs0679 Over a year ago

Perfect, just what I was after. Thanks

user8188120 Over a year ago

I recently found that this method doesn't work if you first create two dataframes and then set one of their existing columns as the index, for example with df.set_index('Column_A').

questionto42 Over a year ago

@user8188120 That is probably because the number of columns and/or the column names differ between the two. The column names must be the same in both dfs; otherwise the extra columns are simply appended to the existing df.

saurish · Answer · 2019-02-21 21:29:35Z

19

If I understand you correctly, you want something like:

x.reindex_like(y).fillna(0) + y.fillna(0)

This will give the sum of the two dataframes. If a value is in one dataframe and not the other, the result at that position will be that existing value (compare B0 in x and B0 in y with the final output). If a value is missing in both dataframes, the result at that position will be zero (compare B1 in x and B1 in y with the final output).

>>> x
   A   B   C
0  1   2 NaN
1  3 NaN   4
>>> y
    A   B   C
0   8 NaN  88
1   2 NaN   5
2  10  11  12
>>> x.reindex_like(y).fillna(0) + y.fillna(0)
    A   B   C
0   9   2  88
1   5   0   9
2  10  11  12
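Note that reindex_like(y) conforms x to y's labels, so any row or column that exists only in x would be dropped. As a hedged sketch of one way to keep everything, both frames can be reindexed to the union of their labels before filling and adding. The data below is made up (y has a column D instead of C), and the names rows, cols and total are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: x has columns A, B, C; y has A, B, D and one extra row.
x = pd.DataFrame({"A": [1, 3], "B": [2, np.nan], "C": [np.nan, 4]})
y = pd.DataFrame({"A": [8, 2, 10], "B": [np.nan, np.nan, 11], "D": [88, 5, 12]})

# Union of row labels and of column labels across both frames.
rows = x.index.union(y.index)
cols = x.columns.union(y.columns)

# Reindex both frames to the same shape, treat every missing cell as 0, then add.
total = (x.reindex(index=rows, columns=cols).fillna(0)
         + y.reindex(index=rows, columns=cols).fillna(0))

print(total)
```

Because every gap is filled with 0 before the addition, a cell missing from both frames ends up as 0 rather than NaN.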

edited Feb 21, 2019 at 21:29

saurish


answered Jun 19, 2012 at 19:02

BrenBarn


1 Comment

cs0679 Over a year ago

Thanks, but I didn't explain my data very well: I have different columns in the two DataFrames, e.g. A, B, C in dataframe1 and A, B, D in dataframe2. The output should be a dataframe with columns A, B, C, D.

Prafulla Pallal · Answer · 2017-02-16 12:19:46Z

4

Both of the above approaches, fillna(0) and direct addition, would give you NaN values if the two dataframes have different structures.

It's better to use fill_value:

df.add(other_df, fill_value=0)

answered Feb 16, 2017 at 12:19

Prafulla Pallal


1 Comment

questionto42 Over a year ago

Downvote. It is an old answer anyway, but this is exactly the accepted answer that had been given three years before this answer.

Xavi · Answer · 2014-11-07 11:13:09Z

To make the answer more general: first I build a combined index to synchronize both dataframes, then I join each of them to that pattern (dates), sum the columns that share a name, and finally join the two dataframes (after deleting the already-summed columns from one of them).

Here is an example (with Google's stock prices, taken from Google):

import pandas as pd
import datetime as dt

prices = pd.DataFrame([[553.0, 555.5, 549.3, 554.11, 0],
                       [556.8, 556.8, 544.05, 545.92, 545.92],
                       [545.5, 546.89, 540.97, 542.04, 542.04]],
                       index=[dt.datetime(2014,11,4), dt.datetime(2014,11,5), dt.datetime(2014,11,6)],
                       columns=['Open', 'High', 'Low', 'Close', 'Adj Close'])

corrections = pd.DataFrame([[0, 555.22], [1238900, 0]],
                    index=[dt.datetime(2014,11,3), dt.datetime(2014,11,4)],
                    columns=['Volume', 'Adj Close'])

# Build the union of both date indexes, drop duplicates and sort it
dates = pd.concat([pd.DataFrame(prices.index, columns=['Dates']),
                   pd.DataFrame(corrections.index, columns=['Dates'])]
                  ).drop_duplicates('Dates').set_index('Dates').sort_index()
# Align each dataframe to the full set of dates; missing entries become 0
df_corrections = dates.join(corrections).fillna(0)
df_prices = dates.join(prices).fillna(0)

for col in prices.columns:
    if col in corrections.columns:
        df_prices[col]+=df_corrections[col]
        del df_corrections[col]

df_prices = df_prices.join(df_corrections)
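For comparison, here is a hedged sketch of how the same totals might be obtained in a single expression with add and fill_value on the same sample data. The name combined is illustrative, and the column order of the result may differ from the step-by-step version above, since the aligned columns come back sorted.

```python
import datetime as dt
import pandas as pd

prices = pd.DataFrame(
    [[553.0, 555.5, 549.3, 554.11, 0],
     [556.8, 556.8, 544.05, 545.92, 545.92],
     [545.5, 546.89, 540.97, 542.04, 542.04]],
    index=[dt.datetime(2014, 11, 4), dt.datetime(2014, 11, 5), dt.datetime(2014, 11, 6)],
    columns=["Open", "High", "Low", "Close", "Adj Close"],
)
corrections = pd.DataFrame(
    [[0, 555.22], [1238900, 0]],
    index=[dt.datetime(2014, 11, 3), dt.datetime(2014, 11, 4)],
    columns=["Volume", "Adj Close"],
)

# Align on the union of dates and columns, treating a value missing from one
# frame as 0; the trailing fillna(0) covers cells missing from both frames.
combined = prices.add(corrections, fill_value=0).fillna(0)

print(combined)
```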
