I simply want to add DataFrames that are stored in a dictionary. Intuitively, I would loop over the dictionary, but I do not have an initial DataFrame of zeros to start from. What is the most elegant way to accomplish this? Currently I am doing the following:

dict = {'B': df1, 'C': df2, 'D': df3}

total = dict['B'] + dict['C'] + dict['D']

The DataFrames are initialized by reading from CSV files, and there could be more than three.

How can I accomplish this in a loop?

4 Answers

You can pass the dict values to concat, for example:

In [195]:
d = {}
d['a'] = pd.DataFrame({'a':np.arange(5)})
d['b'] = pd.DataFrame({'b':np.arange(5)})
total = pd.concat(d.values(), axis=1)
total.sum()

Out[195]:
a    10
b    10
dtype: int64
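
Note that the snippet above gives per-column totals of the concatenated frame rather than an element-wise sum. If an element-wise sum is what is wanted, one possible variation (a minimal sketch, assuming every frame has the same columns and unique index labels; the `frames` dict and its values are made up for illustration) is to stack the frames row-wise and group by the index:

```python
import pandas as pd

# Hypothetical example frames sharing the same index and columns.
frames = {
    "B": pd.DataFrame({"x": [1, 2], "y": [3, 4]}),
    "C": pd.DataFrame({"x": [10, 20], "y": [30, 40]}),
    "D": pd.DataFrame({"x": [100, 200], "y": [300, 400]}),
}

# Stack all frames on top of each other, then sum the rows
# that share the same index label.
stacked = pd.concat(list(frames.values()))
total = stacked.groupby(level=0).sum()

print(total)
```

This works for any number of frames, so it does not matter how many CSV files were read.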

Assuming you want to add (and not concatenate as shown in another answer) these DataFrames you could use something like the following:

#!/usr/bin/env python3
# coding: utf-8

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.rand(3,2))
df2 = pd.DataFrame(np.random.rand(3,2))
df3 = pd.DataFrame(np.random.rand(3,2))
df4 = pd.DataFrame(np.random.rand(3,2))

d = {'a': df1, 'b': df2, 'c': df3, 'd': df4}
total = 0  # adding a DataFrame to the scalar 0 yields a DataFrame, so no zero-filled frame is needed

for key, df in d.items():
    total += df
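
The same accumulation can also be written without an explicit starting value. The following is only a sketch: the `frames` dict and the helper name `add_frames` are hypothetical, and `fill_value=0` is an assumption about how labels missing from one frame should be treated (as zero instead of producing NaN).

```python
from functools import reduce

import pandas as pd

# Hypothetical frames whose indexes only partly overlap.
frames = {
    "a": pd.DataFrame({"x": [1.0, 2.0]}, index=["r1", "r2"]),
    "b": pd.DataFrame({"x": [10.0, 20.0]}, index=["r2", "r3"]),
}


def add_frames(left, right):
    # fill_value=0 treats a label that is missing from one frame as zero.
    return left.add(right, fill_value=0)


# reduce uses the first frame as the starting value, so no zero frame is needed.
total = reduce(add_frames, frames.values())

print(total)
```

When all frames share identical labels, the built-in `sum(d.values())` should give the same result as the loop above, since it also starts from 0.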

4 Comments

Here is the error I get:

NameError                                 Traceback (most recent call last)
<ipython-input-1-1843cd9e7fa8> in <module>()
     10
     11 for key, df in d.items():
---> 12     total += df

NameError: name 'total' is not defined

@AlMerchant: You're right. I forgot to initialize the variable total and edited my answer accordingly. Sorry for that.
I guess the problem boils down to this: how do I know whether a variable is undefined in Python?
As you described, you get a NameError telling you that a variable (in this case total) is not defined. Did I solve your main problem concerning the addition of DataFrames?

You could create a panel and then sum (pd.Panel was deprecated in pandas 0.20 and removed in 0.25, so this only works on older versions):

pd.Panel(dict).sum()

On a side note, it's not best practice to overwrite the built-in dict type.
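
On pandas versions that no longer ship Panel, a comparable "stack, then sum over the items" idea can be sketched with NumPy. This assumes all frames have the same shape and the same row and column order; the `frames` dict is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frames with identical shape and labels.
frames = {
    "B": pd.DataFrame({"x": [1, 2], "y": [3, 4]}),
    "C": pd.DataFrame({"x": [10, 20], "y": [30, 40]}),
}

# Take the labels from the first frame.
first = next(iter(frames.values()))

# Build a 3-D array of shape (n_frames, n_rows, n_cols) and sum over the frame axis.
cube = np.stack([df.to_numpy() for df in frames.values()])
total = pd.DataFrame(cube.sum(axis=0), index=first.index, columns=first.columns)

print(total)
```

Unlike DataFrame addition, this ignores labels entirely and adds purely by position, so it is only appropriate when the frames are known to line up.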

Just for completeness, here is code that demonstrates the problem and a solution:

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.rand(3,2))
df2 = pd.DataFrame(np.random.rand(3,2))
df3 = pd.DataFrame(np.random.rand(3,2))
df4 = pd.DataFrame(np.random.rand(3,2))

d = {'a': df1, 'b': df2, 'c': df3, 'd': df4}

for key, df in d.items():
    if 'total' in locals():
        print("found")
        total += df
    else:
        print("not")
        total = df

print(total)
del total
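
An alternative to checking `locals()` is to use the first DataFrame as the starting value. This is a minimal sketch with made-up frames; it assumes the dictionary is not empty (otherwise `next` raises StopIteration).

```python
import pandas as pd

# Hypothetical example frames.
frames = {
    "a": pd.DataFrame({"x": [1, 2]}),
    "b": pd.DataFrame({"x": [10, 20]}),
    "c": pd.DataFrame({"x": [100, 200]}),
}

values = iter(frames.values())

# Start from a copy of the first frame so the original is not modified.
total = next(values).copy()

# Add the remaining frames one by one.
for df in values:
    total = total + df

print(total)
```

This avoids both the zero-filled starting frame and the need to test whether the variable already exists.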
