Compute Average/Mean across Dataframes in Python Pandas

Question

I have a list of dataframes. Each dataframe contains numerical data, and all of them are shaped identically with 21 rows and 5 columns. The first column is an index (index 0 to index 20). I want to compute the average (mean) values across the dataframes into a single dataframe. Then I want to export that dataframe to Excel.

Here's a simplified version of my existing code:

#look to concatenate the dataframes together all at once
#dataFrameList is the given list of dataFrames
concatenatedDataframes = pd.concat(dataFrameList, axis = 1)

#grouping the dataframes by the index, which is the same across all of the dataframes
groupedByIndex = concatenatedDataframes.groupby(level = 0)

#take the mean 
meanDataFrame = groupedByIndex.mean()

# Create a Pandas Excel writer using openpyxl as the engine.
writer = pd.ExcelWriter(filepath, engine='openpyxl')
meanDataFrame.to_excel(writer)

However, when I open the Excel file, it looks like EVERY dataframe has been copied into the sheet and the average/mean values are not shown. A simplified example is shown below (cutting most of the rows and dataframes):

              Dataframe 1                   Dataframe 2                   Dataframe 3
Index  Col2   Col3   Col4   Col5     Col2   Col3   Col4   Col5     Col2   Col3   Col4   Col5
0      Data   Data   Data   Data     Data   Data   Data   Data     Data   Data   Data   Data
1      Data   Data   Data   Data     Data   Data   Data   Data     Data   Data   Data   Data
2      Data   Data   Data   Data     Data   Data   Data   Data     Data   Data   Data   Data
....

I'm looking for something more like:

           Averaged DF
Index  Col2                                   Col3                                   Col4
0      Mean Index0,Col2 across DFs    Mean Index0,Col3 across DFs    Mean Index0,Col4 across DFs
1      Mean Index1,Col2 across DFs    Mean Index1,Col3 across DFs    Mean Index1,Col4 across DFs
2      Mean Index2,Col2 across DFs    Mean Index2,Col3 across DFs    Mean Index2,Col4 across DFs
...

I have also already seen this answer: Get the mean across multiple Pandas DataFrames

If possible, I'm looking for a clean solution, not one which would simply involve looping through each dataFrame value by value. Any suggestions?

Sharky · Accepted Answer · 2019-08-08 18:49:21Z

2

Perhaps I misunderstood what you asked

The solution is simple. You just need to concat along the correct axis

dummy data

import pandas as pd
rows, columns = 3, 2  # sizes inferred from the output shown below
df1 = pd.DataFrame(index=range(rows), columns=range(columns), data=[[10 + i * j for j in range(columns)] for i in range(rows)])
df2 = pd.DataFrame(index=range(rows), columns=range(columns), data=[[i + j for j in range(columns)] for i in range(rows)])

ps. this should be your job as OP

pd.concat

df_concat0 = pd.concat((df1, df2), axis=1)

puts all the dataframes next to each other.

    0   1   0   1
0   10  10  0   1
1   10  11  1   2
2   10  12  2   3
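To see why grouping this side-by-side frame by its index does not average anything, here is a minimal sketch. It rebuilds the dummy frames so that it runs on its own; the 3x2 size is an assumption taken from the printed output. After concatenating along axis=1, each index label occurs only once, so every group holds a single row and its mean is that row again.

```python
import pandas as pd

rows, columns = 3, 2  # assumed size, matching the printed output above
df1 = pd.DataFrame([[10 + i * j for j in range(columns)] for i in range(rows)])
df2 = pd.DataFrame([[i + j for j in range(columns)] for i in range(rows)])

side_by_side = pd.concat((df1, df2), axis=1)

# Each index label appears exactly once, so each group contains one row
# and the "mean" of that group is just the row itself.
grouped_mean = side_by_side.groupby(level=0).mean()

print(side_by_side.shape)   # (3, 4)
print(grouped_mean.shape)   # (3, 4): nothing was averaged across frames
```

This matches what the question describes: every dataframe shows up in the Excel sheet and no averaging has taken place.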

If we want to do a groupby now, we first need to stack, group by both index levels, and then unstack again

df_concat0.stack().groupby(level=[0,1]).mean().unstack()

     0    1
0  5.0  5.5
1  5.5  6.5
2  6.0  7.5
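If the frames must stay side by side, one possible alternative to stacking is to label each frame with the keys argument of pd.concat and then average over the original column labels. This is only a sketch under the same assumed 3x2 dummy data; the key names "df1" and "df2" are made up for illustration.

```python
import pandas as pd

rows, columns = 3, 2  # assumed size, matching the dummy data above
df1 = pd.DataFrame([[10 + i * j for j in range(columns)] for i in range(rows)])
df2 = pd.DataFrame([[i + j for j in range(columns)] for i in range(rows)])

# keys= adds an outer column level naming the source frame, so the
# columns become (frame, column) pairs instead of duplicated labels.
wide = pd.concat((df1, df2), axis=1, keys=["df1", "df2"])

# Transpose so the (frame, column) pairs sit on the index, average over
# the original column label (level 1), then transpose back.
mean_wide = wide.T.groupby(level=1).mean().T
print(mean_wide)
```

The result has the same rows and columns as a single input frame, with each cell holding the mean across the two frames.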

If we do

df_concat = pd.concat((df1, df2))

This puts all the dataframes on top of each other

Now we just need to group by the index, like you did

result = df_concat.groupby(level=0).mean()

     0    1
0  5.0  5.5
1  5.5  6.5
2  6.0  7.5

and then use ExcelWriter as a context manager, so the file is saved when the block exits

with pd.ExcelWriter(filepath, engine='openpyxl') as writer:
    result.to_excel(writer)

or just plain

result.to_excel(filepath, engine='openpyxl')

if you can overwrite whatever is already at filepath
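Putting the concat and groupby steps together for a whole list of frames might look like the sketch below. The helper name mean_across_frames and the generated 21-row frames are hypothetical and only mirror the shape mentioned in the question; the Excel export is left out so the snippet runs without openpyxl.

```python
import pandas as pd


def mean_across_frames(frames):
    """Element-wise mean of equally indexed DataFrames (hypothetical helper)."""
    stacked = pd.concat(frames)             # axis=0: frames on top of each other
    return stacked.groupby(level=0).mean()  # average the rows sharing an index label


# Three illustrative frames with 21 rows and 5 columns each.
frames = [
    pd.DataFrame([[k * (r + c) for c in range(5)] for r in range(21)])
    for k in (1, 2, 3)
]

result = mean_across_frames(frames)
print(result.shape)  # (21, 5)
```

Note that groupby sorts the index labels, so the rows of the result come out in sorted index order.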

edited Aug 8, 2019 at 18:49

Sharky

answered Jun 13, 2017 at 8:53

Maarten Fabré

3 Comments

Keith Pham Over a year ago

This seems to generate a Series, which isn't quite what I'm looking for

Maarten Fabré Over a year ago

I adapted my answer, now you clarified what you needed

Keith Pham Over a year ago

Perfect, exactly what I was looking for!

Abhimanyu · Accepted Answer · 2017-06-13 14:25:31Z

I suppose you need the mean of all rows against each column.

Concatenating a list of data frames that share the same index along the columns (axis=1) adds the columns of the other data frames to the right of the first data frame, as below:

      col1  col2  col3  col1  col2  col3
    0     1     2     3     2     3     4
    1     2     3     4     3     4     5
    2     3     4     5     4     5     6
    3     4     5     6     5     6     7

Try appending the data frames on top of each other, then group by the index and take the mean to get the desired result.

    import pandas as pd

    ##creating data frames
    df1= pd.DataFrame({'col1':[1,2,3,4],
        'col2':[2,3,4,5],
        'col3':[3,4,5,6]})

    df2= pd.DataFrame({'col1':[2,3,4,5],
        'col2':[3,4,5,6],
        'col3':[4,5,6,7]})

    ## list of data frames
    dflist = [df1,df2]

    ## empty data frame to use for appending
    df=pd.DataFrame()

    # looping through each item in the list and adding it to the bottom of the data frame
    # (DataFrame.append was removed in pandas 2.0, so pd.concat is used here)
    for i in dflist:
        df = pd.concat([df, i])

    # group by and calculating mean on index
    data_mean=df.groupby(level=0).mean()

Then write the result to the file as you already do.

Alternatively: instead of appending in a for loop, you can specify the axis along which to concatenate the data frames. In your case you want to concatenate along the index (axis=0) to put the data frames on top of each other, as below:

       col1  col2  col3
    0     1     2     3
    1     2     3     4
    2     3     4     5
    3     4     5     6
    0     2     3     4
    1     3     4     5
    2     4     5     6
    3     5     6     7

    ##creating data frames
    df1= pd.DataFrame({'col1':[1,2,3,4],
                       'col2':[2,3,4,5],
                       'col3':[3,4,5,6]})

    df2= pd.DataFrame({'col1':[2,3,4,5],
                       'col2':[3,4,5,6],
                       'col3':[4,5,6,7]})

    ## list of data frames
    dflist = [df1,df2]

    #concat the dflist along axis 0 to put the data frames on top of each other
    df_concat=pd.concat(dflist,axis=0)

    # group by and calculating mean on index
    data_mean=df_concat.groupby(level=0).mean()

Then write the result to the file as you already do.

Richard M · Accepted Answer · 2024-03-23 20:53:08Z

0

# First convert all your data frames to NumPy arrays, then use NumPy's
# vectorized mean function.
# import the two libraries
import numpy as np
import pandas as pd
# convert each data frame to a NumPy array
ar_frame = [df.to_numpy() for df in [df1, df2]]
# calculate the mean across axis 0 (across the data frames), giving one
# value per row and column
data_mean = np.mean(ar_frame, axis=0)
# convert back to pandas if required, reusing the column names of any one
# of the data frames; of course this assumes that the column names are
# the same across data frames
df_mean = pd.DataFrame(data_mean, columns=df1.columns)
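Because NumPy averages purely by position, it can help to confirm that the frames line up before stacking them. The following is a sketch of that idea, assuming two small example frames (the values are illustrative) and using np.stack to build one three-dimensional array.

```python
import numpy as np
import pandas as pd

# Illustrative frames with identical index and column labels.
df1 = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": [2, 3, 4, 5]})
df2 = pd.DataFrame({"col1": [2, 3, 4, 5], "col2": [3, 4, 5, 6]})
frames = [df1, df2]

# NumPy ignores labels, so check that every frame matches the first one.
first = frames[0]
for other in frames[1:]:
    assert other.index.equals(first.index)
    assert other.columns.equals(first.columns)

# Stack into one (n_frames, n_rows, n_cols) array and average over frames.
stacked = np.stack([frame.to_numpy() for frame in frames])
df_mean = pd.DataFrame(stacked.mean(axis=0), index=first.index, columns=first.columns)
print(df_mean)
```

Passing index and columns back in keeps the original labels, which plain NumPy arrays would otherwise lose.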

edited Mar 23, 2024 at 20:53

answered Mar 23, 2024 at 20:50

Richard M
