
I am using pandas to parse a CSV file. The file contains one value for each day of the last 10 years.

The CSV looks like this:

production,day,year
5.0,50,2015
80.0,51,2015
190.0,52,2015
10.0,53,2015
.
.
.
2.0,50,2016
2.0,51,2016
40.0,52,2016
20.0,53,2016
.
.

I use the following code:

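import pandas as pd

def calcAverageFirstYears(productionCSV):
    # Load the production data from the CSV file.
    myFile = pd.read_csv(productionCSV)

    # Keep only the rows for day 52.
    result = myFile[myFile['day'] == 52]
    print(result)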

This gives the following result:

   production   day    year
2       190.0  52.0  2015.0
9        40.0  52.0  2016.0
16       60.0  52.0  2017.0
23        6.0  52.0  2018.0

How can I calculate the average of these values? And how can I calculate the average for 2015 and 2016 only?

Thank you for your help

4 Answers

2

describe gives summary statistics (count, mean, standard deviation, min, quartiles and max) for all numeric columns; the 50% row is the median.

result.describe()

If you want the mean for each year, use groupby:

result.groupby('year').mean()
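
To make these calls concrete, here is a minimal sketch that rebuilds the four filtered rows from the question by hand (the frame is typed in, not read from the asker's file) and restricts each call to the production column. Selecting the column first is my own choice so that day and year are not averaged as well.

```python
import pandas as pd

# The four day-52 rows shown in the question, typed in by hand.
result = pd.DataFrame({
    "production": [190.0, 40.0, 60.0, 6.0],
    "day": [52, 52, 52, 52],
    "year": [2015, 2016, 2017, 2018],
})

# Summary statistics for one column; the "50%" entry is the median.
stats = result["production"].describe()

# Mean of production over all four rows.
overall_mean = result["production"].mean()

# Mean of production per year (one row per year here,
# so each group mean is simply that row's value).
per_year = result.groupby("year")["production"].mean()

print(stats)
print(overall_mean)
print(per_year)
```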

1

Use:

# mean of the production column
print(result['production'].mean())

And:

# mean of production for the filtered years 2015 and 2016 only
print(result.loc[result['year'].isin([2015, 2016]), 'production'].mean())

For every year:

# mean per year in the filtered DataFrame
print(result.groupby('year')['production'].mean())

# mean per year in the original DataFrame (df is the unfiltered frame, myFile in the question)
print(df.groupby('year')['production'].mean())
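
If it helps, the day filter and the year filter can also be combined into one mask on the unfiltered frame. This is only a sketch: the frame below holds the eight sample rows from the question's CSV excerpt, typed in by hand, and the names mask and mean_day_52 are mine.

```python
import pandas as pd

# Sample rows copied from the question's CSV excerpt (2015 and 2016 only).
df = pd.DataFrame({
    "production": [5.0, 80.0, 190.0, 10.0, 2.0, 2.0, 40.0, 20.0],
    "day": [50, 51, 52, 53, 50, 51, 52, 53],
    "year": [2015, 2015, 2015, 2015, 2016, 2016, 2016, 2016],
})

# Combine both conditions with &; each condition needs its own parentheses.
mask = (df["day"] == 52) & (df["year"].isin([2015, 2016]))

# Mean production on day 52 across the selected years.
mean_day_52 = df.loc[mask, "production"].mean()
print(mean_day_52)
```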

EDIT:

Filter by boolean indexing with between and then get mean:

print (df)
   production  day  year
0         5.0   50  2010
1        80.0   51  2011
2       190.0   52  2012
3        10.0   52  2013
4         2.0   52  2014
5         2.0   51  2015
6        40.0   52  2016
7        20.0   53  2017

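# earliest year in the data
min_year = df['year'].min()
# between() is inclusive on both ends, so this window spans min_year to min_year + 5;
# use min_year + 4 if exactly five calendar years are wanted
s = df.loc[df['year'].between(min_year, min_year + 5) & (df['day'] == 52), 'production']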

print (s)
2    190.0
3     10.0
4      2.0
Name: production, dtype: float64

a = s.mean()
print (a)
67.33333333333333
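
The same filter can be wrapped in a small helper so that the day and the number of years become parameters. This is a sketch under two assumptions of mine: the "first year" is the smallest year anywhere in the frame, and the window should cover exactly n_years calendar years (hence the "- 1", because between includes both endpoints). The name mean_for_first_years is hypothetical.

```python
import pandas as pd

def mean_for_first_years(df, day, n_years=5):
    """Mean production on `day` over the first `n_years` calendar years in df."""
    first_year = df["year"].min()
    # between() includes both endpoints, so subtract 1 to cover exactly n_years.
    in_window = df["year"].between(first_year, first_year + n_years - 1)
    return df.loc[in_window & (df["day"] == day), "production"].mean()

# Same example frame as printed above.
df = pd.DataFrame({
    "production": [5.0, 80.0, 190.0, 10.0, 2.0, 2.0, 40.0, 20.0],
    "day": [50, 51, 52, 52, 52, 51, 52, 53],
    "year": [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017],
})

value = mean_for_first_years(df, day=52)
print(value)
```

If no row matches the day inside the window, mean() of the empty selection returns NaN, so the caller may want to check for that.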

5 Comments

Thank you. This works for the mean of all years or of filtered years (2015 and 2016), but I don't always know which year was the first year of production. How do I calculate the mean production for the first 5 years?
@relash - do you need the mean of the original data for the first 5 years?
I need the mean for a specific day over the first 5 years, e.g. the mean of day 53 for the years 2012-2017.
@relash - I think I understand, please check the edited answer.
Thank you. This is exactly what I was looking for.
0

You can use groupby and mean (I assume that you want the mean of the production column):

result[['day','production']].groupby('day').mean()

or

result[['year','production']].groupby('year').mean()

You can also use the same approach on the whole data frame to get the averages for every day or every year:

myFile[['day','production']].groupby('day').mean()

or

myFile[['year','production']].groupby('year').mean()
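
As a side note, selecting the column after groupby gives the same numbers as a Series instead of a one-column DataFrame. A minimal sketch, using the eight sample rows from the question typed in by hand as a stand-in for myFile:

```python
import pandas as pd

# Stand-in for myFile: the sample rows from the question's CSV excerpt.
myFile = pd.DataFrame({
    "production": [5.0, 80.0, 190.0, 10.0, 2.0, 2.0, 40.0, 20.0],
    "day": [50, 51, 52, 53, 50, 51, 52, 53],
    "year": [2015, 2015, 2015, 2015, 2016, 2016, 2016, 2016],
})

# Form used above: pick the columns first, then group (returns a DataFrame).
by_day_frame = myFile[["day", "production"]].groupby("day").mean()

# Alternative: group first, then pick the column (returns a Series).
by_day_series = myFile.groupby("day")["production"].mean()

print(by_day_frame)
print(by_day_series)
```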


0

If I understand correctly, you need the mean production by day. The solution below can easily be switched around to give the mean production by year instead.

import pandas as pd

df = pd.read_csv('productionCSV.csv')

s = df.groupby('day')['production'].mean()

# day
# 50      3.5
# 51     41.0
# 52    115.0
# 53     15.0
# Name: production, dtype: float64

s[52]

# 115.0

Explanation

  • Separate reading data from querying. There shouldn't be a need to read the file for each function call.
  • Grouping by day and calculating production mean gives you average production by day.
  • The resulting series s can be used dictionary-like, as the groupby result uses day as an index.
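
The points above can be sketched as "read once, query many times". To keep the example self-contained it reads the question's eight sample rows from an in-memory string instead of the real file, and the helper name mean_for_day is hypothetical.

```python
import io
import pandas as pd

# Stand-in for the CSV file: the sample rows from the question.
csv_text = """production,day,year
5.0,50,2015
80.0,51,2015
190.0,52,2015
10.0,53,2015
2.0,50,2016
2.0,51,2016
40.0,52,2016
20.0,53,2016
"""

# Read and aggregate once ...
df = pd.read_csv(io.StringIO(csv_text))
mean_by_day = df.groupby("day")["production"].mean()

# ... then query as often as needed without touching the file again.
def mean_for_day(day):
    """Return the mean production for `day`, or None if the day is absent."""
    return mean_by_day.get(day)

print(mean_for_day(52))
print(mean_for_day(99))
```

Using get rather than square brackets is a design choice here: a day that is not in the index yields None instead of raising a KeyError.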
