71

I have two dataframes, both indexed by timeseries. I need to add the elements together to form a new dataframe, but only if the index and column are the same. If the item does not exist in one of the dataframes then it should be treated as a zero.

I've tried using .add but this sums regardless of index and column. Also tried a simple combined_data = dataframe1 + dataframe2 but this give a NaN if both dataframes don't have the element.

Any suggestions?

1
  • 2
    Can you clarify what you want to happen if an item does not exist in one or both dataframes? You say if the item does not exist in one dataframe, it should be treated as zero --- do you mean the value in that dataframe should be treated as zero and added to the value from the other dataframe, or do you mean the value in the result dataframe should be zero? Also, you say df1+df2 doesn't work because it gives NaN if both don't have the element. What do you want to happen in this case? You want a zero in the result? Commented Jun 19, 2012 at 18:44

4 Answers 4

126

How about x.add(y, fill_value=0)?

import pandas as pd

df1 = pd.DataFrame([(1,2),(3,4),(5,6)], columns=['a','b'])
Out: 
   a  b
0  1  2
1  3  4
2  5  6

df2 = pd.DataFrame([(100,200),(300,400),(500,600)], columns=['a','b'])
Out: 
     a    b
0  100  200
1  300  400
2  500  600

df_add = df1.add(df2, fill_value=0)
Out: 
     a    b
0  101  202
1  303  404
2  505  606
Sign up to request clarification or add additional context in comments.

3 Comments

Perfect, just what I was after. Thanks
I recently found that this method doesn't work if you first create two dataframes and then set their index columns as one of the existing dataframe columns using df.set_index('Column_A') for example
@user8188120 That is probably because the column number and / or the column names differ between the two. The column names must be the same in both dfs, else, the df columns get just concatenated to the existing df.
19

If I understand you correctly, you want something like:

(x.reindex_like(y).fillna(0) + y.fillna(0).fillna(0))

This will give the sum of the two dataframes. If a value is in one dataframe and not the other, the result at that position will be that existing value (look at B0 in X and B0 in Y and look at final output). If a value is missing in both dataframes, the result at that position will be zero (look at B1 in X and B1 in Y and look at final output).

>>> x
   A   B   C
0  1   2 NaN
1  3 NaN   4
>>> y
    A   B   C
0   8 NaN  88
1   2 NaN   5
2  10  11  12
>>> (x.reindex_like(y).fillna(0) + y.fillna(0).fillna(0))
    A   B   C
0   9   2  88
1   5   0   9
2  10  11  12

1 Comment

Thanks, but I didn't explain my data very well as I have different columns in both DataFrames e.g. A, B, C in dataframe1 and A, B, D in dataframe 2. The output should be a dataframe with A, B, C, D
4

Both the above answers - fillna(0) and a direct addition would give you Nan values if either of them have different structures.

Its Better to use fill_value

df.add(other_df, fill_value=0)

1 Comment

Downvote. It is an old answer anyway, but this is exactly the accepted answer that had been given three years before this answer.
2

For making more general the answer... first I will take the common index for synchronizing both dataframes, then I will join each of them to my pattern (dates) and I will sum the columns of the same name and finally join both dataframes (deleting added columns in one of them),

you can see an example (with google's stock prices taken from google) here:

import numpy as np
import pandas as pd
import datetime as dt

prices = pd.DataFrame([[553.0, 555.5, 549.3, 554.11, 0],
                       [556.8, 556.8, 544.05, 545.92, 545.92],
                       [545.5, 546.89, 540.97, 542.04, 542.04]],
                       index=[dt.datetime(2014,11,04), dt.datetime(2014,11,05), dt.datetime(2014,11,06)],
                       columns=['Open', 'High', 'Low', 'Close', 'Adj Close'])

corrections = pd.DataFrame([[0, 555.22], [1238900, 0]],
                    index=[dt.datetime(2014,11,3), dt.datetime(2014,11,4)],
                    columns=['Volume', 'Adj Close'])

dates = pd.DataFrame(prices.index, columns = ['Dates']).append(pd.DataFrame(corrections.index, columns = ['Dates'])).drop_duplicates('Dates').set_index('Dates').sort(axis=0)
df_corrections = dates.join(corrections).fillna(0)
df_prices = dates.join(prices).fillna(0)

for col in prices.columns:
    if col in corrections.columns:
        df_prices[col]+=df_corrections[col]
        del df_corrections[col]

df_prices = df_prices.join(df_corrections)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.