Python/ Pandas CSV Parsing / search for value

Question

I am using pandas to parse a CSV file. The file contains one value for each day of the last 10 years.

The CSV looks like this:

production,day,year
5.0,50,2015
80.0,51,2015
190.0,52,2015
10.0,53,2015
.
.
.
2.0,50,2016
2.0,51,2016
40.0,52,2016
20.0,53,2016
.
.

I use the following code:

import pandas as pd

def calcAverageFirstYears(productionCSV):

    myFile = pd.read_csv(productionCSV)

    result = myFile[myFile['day']==52]
    print(result)

So I get this result:

   production   day    year
2       190.0  52.0  2015.0
9        40.0  52.0  2016.0
16       60.0  52.0  2017.0
23        6.0  52.0  2018.0

How can I calculate the average of these values? And how can I calculate the average for 2015 and 2016 only?

Thank you for your help

Thirupathi Thangavel · Accepted Answer · 2018-02-22 10:16:32Z

2

describe gives summary statistics (count, mean, standard deviation, min, quartiles including the median, and max) for every numeric column.

result.describe()

If you want the mean for each year, use groupby:

result.groupby('year').mean()
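As a rough illustration, the sketch below rebuilds the four day-52 rows printed in the question (typed in by hand here, which is an assumption made so the snippet runs without the CSV) and shows where the mean and median sit in the describe output.

```python
import pandas as pd

# The filtered rows shown in the question (day == 52), entered by hand.
result = pd.DataFrame({
    'production': [190.0, 40.0, 60.0, 6.0],
    'day': [52, 52, 52, 52],
    'year': [2015, 2016, 2017, 2018],
})

# describe() returns a DataFrame indexed by statistic name:
# 'count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max'.
summary = result.describe()
overall_mean = summary.loc['mean', 'production']
overall_median = summary.loc['50%', 'production']

# One row per year; every remaining column is averaged within the year.
per_year = result.groupby('year').mean()
```

Because result holds a single row per year, the per-year mean is simply that row's value; the grouping becomes more informative on the unfiltered data.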


jezrael · Accepted Answer · 2018-02-22 11:41:46Z

1

Use:

# mean of the production column
print(result['production'].mean())

And:

# mean of production for selected years only (2015 and 2016)
print (result.loc[result['year'].isin([2015, 2016]), 'production'].mean())

All values:

# mean of production per year in the filtered DataFrame
print (result.groupby('year')['production'].mean())


# mean of production per year in the original DataFrame (myFile in the question)
print (df.groupby('year')['production'].mean())
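To make the first two expressions concrete, here is a small sketch using the four day-52 rows printed in the question. The DataFrame is typed in by hand as an assumption, so that the snippet runs without the CSV file.

```python
import pandas as pd

# Rows from the question's printed output (day == 52).
result = pd.DataFrame({
    'production': [190.0, 40.0, 60.0, 6.0],
    'day': [52, 52, 52, 52],
    'year': [2015, 2016, 2017, 2018],
})

# Mean over all four years: (190 + 40 + 60 + 6) / 4
mean_all = result['production'].mean()

# Boolean mask keeps only the rows whose year is in the given list.
mask = result['year'].isin([2015, 2016])
mean_2015_2016 = result.loc[mask, 'production'].mean()

print(mean_all)        # 74.0
print(mean_2015_2016)  # 115.0
```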

EDIT:

Filter by boolean indexing with between and then get mean:

print (df)
   production  day  year
0         5.0   50  2010
1        80.0   51  2011
2       190.0   52  2012
3        10.0   52  2013
4         2.0   52  2014
5         2.0   51  2015
6        40.0   52  2016
7        20.0   53  2017

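#get minimal year
min_year = df['year'].min()
#between is inclusive on both ends, so this covers min_year through min_year + 5
#(six calendar years); use min_year + 4 for exactly five
s = df.loc[df['year'].between(min_year, min_year + 5) & (df['day'] == 52), 'production']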

print (s)
2    190.0
3     10.0
4      2.0
Name: production, dtype: float64

a = s.mean()
print (a)
67.33333333333333
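The same filter can be wrapped in a reusable helper. The sketch below is one possible shape, not part of the original answer: the function name and the n_years parameter are hypothetical, and it assumes "first n years" counts the earliest year itself, so the upper bound is first_year + n_years - 1.

```python
import pandas as pd

def mean_for_day_in_first_years(df, day, n_years=5):
    """Mean production on `day` over the first `n_years` calendar years in df."""
    first_year = df['year'].min()
    # between() includes both endpoints, hence the "- 1".
    in_window = df['year'].between(first_year, first_year + n_years - 1)
    return df.loc[in_window & (df['day'] == day), 'production'].mean()

# Same sample frame as printed above.
df = pd.DataFrame({
    'production': [5.0, 80.0, 190.0, 10.0, 2.0, 2.0, 40.0, 20.0],
    'day': [50, 51, 52, 52, 52, 51, 52, 53],
    'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017],
})

five = mean_for_day_in_first_years(df, day=52)              # years 2010-2014
seven = mean_for_day_in_first_years(df, day=52, n_years=7)  # years 2010-2016
```

If no row matches the day within the window, the selection is empty and mean() returns NaN, so callers may want to check for that.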

edited Feb 22, 2018 at 11:41

answered Feb 22, 2018 at 10:14


5 Comments

relash Over a year ago

Thank you. It works for calculating the mean of all years or of filtered years (2015 and 2016). But I don't always know which year was the first year of production. How do I calculate the mean of the production for the first 5 years?

jezrael Over a year ago

@relash - do you need the mean of the original data for the first 5 years?

relash Over a year ago

I need the mean for a specific day of the first 5 years, e.g. the mean of day 53 for the years 2012-2017.

jezrael Over a year ago

@relash - I think I understand, please check edited answer.

relash Over a year ago

Thank you. This is exactly what I was looking for

honzajolic · Accepted Answer · 2018-02-22 10:26:32Z

0

You can use groupby and mean (I assume that you want the mean of the production column):

result[['day','production']].groupby('day').mean()

or

result[['year','production']].groupby('year').mean()

You can also use the same approach on the whole data frame, which gives the averages for all days / years:

myFile[['day','production']].groupby('day').mean()

or

myFile[['year','production']].groupby('year').mean()
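Selecting the columns before grouping and selecting the column after grouping yield the same numbers in different containers. The sketch below contrasts the two on the eight sample rows from the question; the DataFrame is built by hand and the name my_file is an assumption for illustration.

```python
import pandas as pd

# The eight sample rows shown in the question.
my_file = pd.DataFrame({
    'production': [5.0, 80.0, 190.0, 10.0, 2.0, 2.0, 40.0, 20.0],
    'day': [50, 51, 52, 53, 50, 51, 52, 53],
    'year': [2015] * 4 + [2016] * 4,
})

# Select columns first, then group: the result is a DataFrame.
as_frame = my_file[['day', 'production']].groupby('day').mean()

# Group first, then select the column: the result is a Series.
as_series = my_file.groupby('day')['production'].mean()
```

The DataFrame form is convenient when several columns are averaged at once; the Series form is handier for looking up a single day by label.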


jpp · Accepted Answer · 2018-02-22 10:29:32Z

0

If I understand correctly, you need the mean production by day. The solution below can easily be switched around to give the mean production by year instead.

df = pd.read_csv('productionCSV.csv')

s = df.groupby('day')['production'].mean()

# day
# 50      3.5
# 51     41.0
# 52    115.0
# 53     15.0
# Name: production, dtype: float64

s[52]

# 115.0

Explanation

Separate reading data from querying. There shouldn't be a need to read the file for each function call.
Grouping by day and calculating production mean gives you average production by day.
The resulting series s can be used dictionary-like, as the groupby result uses day as an index.
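The points above can be sketched as two small functions, one that reads and one that aggregates. The function names are hypothetical, and the CSV text is inlined through io.StringIO only so the snippet runs without a file on disk.

```python
import io

import pandas as pd

def load_production(source):
    """Read the CSV once; `source` may be a path or a file-like object."""
    return pd.read_csv(source)

def mean_by_day(df):
    """Return a Series of mean production indexed by day."""
    return df.groupby('day')['production'].mean()

# Sample rows from the question.
csv_text = """production,day,year
5.0,50,2015
80.0,51,2015
190.0,52,2015
10.0,53,2015
2.0,50,2016
2.0,51,2016
40.0,52,2016
20.0,53,2016
"""

df = load_production(io.StringIO(csv_text))
s = mean_by_day(df)

day_52 = s[52]        # label lookup, like a dict key
has_day_99 = 99 in s  # membership tests the index, not the values
day_99 = s.get(99)    # None when the day is absent
```

Keeping the DataFrame around means further queries, such as a per-year mean, can reuse it without touching the file again.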

edited Feb 22, 2018 at 10:29

answered Feb 22, 2018 at 10:23

