4

I am iterating through a number of csv files and want to append the mean temperatures to a blank csv file. How do you create an empty csv file with pandas?

for EachMonth in MonthsInAnalysis:
    TheCurrentMonth = pd.read_csv('MonthlyDataSplit/Day/Day%s.csv' % EachMonth)
    MeanDailyTemperaturesForCurrentMonth = TheCurrentMonth.groupby('Day')['AirTemperature'].mean().reset_index(name='MeanDailyAirTemperature')
    with open('my_csv.csv', 'a') as f:
        MeanDailyTemperaturesForCurrentMonth.to_csv(f, header=False)

So in the above code how do I create the my_csv.csv prior to the for loop?

Just a note: I know you can create a data frame and then save it to csv, but I am interested in whether you can skip this step.

In terms of context I have the following csv files:

enter image description here

Each of which has the following structure:

enter image description here

The Day column reads up to 30 days for each file.

I would like to output a csv file that looks like this:

enter image description here

But obviously includes all the days for all the months.

My issue is that I don't know which months are included in each analysis. Hence I wanted a for loop that uses a list holding that information to access the relevant csvs, calculate the mean temperature, and save it all into one csv.

Input as text:

    Unnamed: 0  AirTemperature  AirHumidity  SoilTemperature  SoilMoisture  LightIntensity  WindSpeed  Year  Month  Day  Hour  Minute  Second  TimeStamp  MonthCategorical  TimeOfDay
 6           6              18           84               17            41              40          4  2016      1    1     6       1       1      10106           January        Day
 7           7              20           88               22            92              31          0  2016      1    1     7       1       1      10107           January        Day
 8           8              23            1               22            59               3          0  2016      1    1     8       1       1      10108           January        Day
 9           9              23            3               22            72              41          4  2016      1    1     9       1       1      10109           January        Day
10          10              24           63               23            83              85          0  2016      1    1    10       1       1      10110           January        Day
11          11              29           73               27            50               1          4  2016      1    1    11       1       1      10111           January        Day
10
  • why do you need to create it first? surely creating from scratch at save time is equivalent to append to an already existing, empty csv? Commented Mar 10, 2016 at 12:34
  • Because I don't know which csv's are present before the grouping occurs so I figure it is easier to create first and fill with whatever is present. How would you approach this? Commented Mar 10, 2016 at 12:38
  • So you want to overwrite 'my_csv.csv' file len(MonthsInAnalysis) times - is that what you want? ;-) Commented Mar 10, 2016 at 12:54
  • Well, not overwrite. The for loop will run len(MonthsInAnalysis) times, and each time I get a new groupby object I want to append it to the csv. I thought that's what the with open part achieved. Commented Mar 10, 2016 at 12:56
  • @PaulBarr, I guess it would be easier to help you if you would explain a bit more: what is your source data and what do you want to achieve (i.e. what the output should look like)? There might be a more elegant solution where you won't need any loops... Commented Mar 10, 2016 at 13:02

5 Answers

5

Just open the file in write mode to create it.

with open('my_csv.csv', 'w'):
    pass

Anyway, I do not think you should be opening and closing the file so many times. It is better to open the file once and write to it several times.

with open('my_csv.csv', 'w') as f:
    for EachMonth in MonthsInAnalysis:
        TheCurrentMonth = pd.read_csv('MonthlyDataSplit/Day/Day%s.csv' % EachMonth)
        MeanDailyTemperaturesForCurrentMonth = TheCurrentMonth.groupby('Day')['AirTemperature'].mean().reset_index(name='MeanDailyAirTemperature')
        MeanDailyTemperaturesForCurrentMonth.to_csv(f, header=False)
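
The comments below argue about whether this overwrites the file on each pass. Here is a minimal sketch of the difference, assuming made-up frames and temporary file names (none of them come from the question): writing through one open file object keeps adding rows, while passing a file name to to_csv reopens the file in write mode on every call.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-ins for the per-month results: three frames of four rows each.
frames = [pd.DataFrame(np.random.randn(4, 2), columns=["a", "b"]) for _ in range(3)]

tmp_dir = tempfile.mkdtemp()
by_handle = os.path.join(tmp_dir, "by_handle.csv")
by_name = os.path.join(tmp_dir, "by_name.csv")

# One open handle: each to_csv call continues where the previous one stopped.
with open(by_handle, "w", newline="") as f:
    for frame in frames:
        frame.to_csv(f, header=False, index=False)

# A file name: to_csv defaults to mode="w", so every call starts the file over
# and only the last frame survives.
for frame in frames:
    frame.to_csv(by_name, header=False, index=False)

rows_by_handle = len(pd.read_csv(by_handle, header=None))
rows_by_name = len(pd.read_csv(by_name, header=None))
print(rows_by_handle, rows_by_name)
```

With three frames of four rows each, the handle version ends up with 12 rows and the file-name version with 4.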

7 Comments

Thank you, this makes a lot more sense than what I was doing. I will accept in a few minutes.
this will overwrite CSV file len(MonthsInAnalysis) times
@MaxU no it won't.
@Goyo, OK run the following test: [pd.DataFrame(np.random.randn(4, 4)).to_csv('out.csv') for i in range(5)] and tell us how many rows do you have in the out.csv at the end! Following your logic there must be 5*4 = 20 rows in the CSV file. Please test
@MaxU That has nothing to do with my suggestion. It's more like [pd.DataFrame(np.random.randn(4, 4)).to_csv(f) for i in range(5)] where f is a writeable file object, not a file name.
3

Creating a blank csv file is as simple as this one

import pandas as pd

pd.DataFrame({}).to_csv("filename.csv")


1

I would do it this way: first read all your CSV files (but only the columns that you really need) into one DF, then call groupby(['Year','Month','Day']).mean() and save the resulting DF to a CSV file:

import glob
import pandas as pd

fmask = 'MonthlyDataSplit/Day/Day*.csv'
df = pd.concat((pd.read_csv(f, sep=',', usecols=['Year','Month','Day','AirTemperature']) for f in glob.glob(fmask)))
df.groupby(['Year','Month','Day']).mean().to_csv('my_csv.csv')

and if you want to ignore the year:

import glob
import pandas as pd

fmask = 'MonthlyDataSplit/Day/Day*.csv'
df = pd.concat((pd.read_csv(f, sep=',', usecols=['Month','Day','AirTemperature']) for f in glob.glob(fmask)))
df.groupby(['Month','Day']).mean().to_csv('my_csv.csv')

Some details:

(pd.read_csv(f, sep=',', usecols=['Month','Day','AirTemperature']) for f in glob.glob('*.csv'))

will lazily yield one data frame per CSV file (it is a generator expression, not a tuple)

pd.concat(...)

will concatenate them into a single DF

df.groupby(['Year','Month','Day']).mean()

will produce the wanted report as a data frame, which can be saved into a new CSV file:

.to_csv('my_csv.csv')
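
To see the whole pipeline run end to end, here is a small sketch under stated assumptions: the two input files, their names and their values are made up and written to a temporary directory, so only the read, concatenate, group and save steps mirror the answer. It selects the AirTemperature column explicitly before taking the mean.

```python
import glob
import os
import tempfile

import pandas as pd

# Build two tiny stand-in files so the sketch runs on its own (values are made up).
data_dir = tempfile.mkdtemp()
january = pd.DataFrame({"Year": [2016, 2016, 2016], "Month": [1, 1, 1],
                        "Day": [1, 1, 2], "AirTemperature": [18, 20, 24]})
february = pd.DataFrame({"Year": [2016, 2016], "Month": [2, 2],
                         "Day": [1, 1], "AirTemperature": [10, 14]})
january.to_csv(os.path.join(data_dir, "DayJanuary.csv"), index=False)
february.to_csv(os.path.join(data_dir, "DayFebruary.csv"), index=False)

# Same steps as in the answer, pointed at the temporary directory.
fmask = os.path.join(data_dir, "Day*.csv")
columns = ["Year", "Month", "Day", "AirTemperature"]
df = pd.concat([pd.read_csv(path, usecols=columns) for path in sorted(glob.glob(fmask))])
daily_means = df.groupby(["Year", "Month", "Day"])["AirTemperature"].mean()

# The output file is created here, at save time; no empty file is needed beforehand.
out_path = os.path.join(data_dir, "my_csv.csv")
daily_means.to_csv(out_path)
print(daily_means)
```

Because the output name my_csv.csv does not match the Day*.csv mask, rerunning the script does not read the report back in as input.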

3 Comments

The csv's are in a subdirectory, MonthlyDataSplit/Day, and I don't quite understand how I would point this example at it. Would I use glob.glob('MonthlyDataSplit/Day/*.csv')?
Thank you I think this approach is very clean and also more flexible. I appreciate your help
I'm happy to help. Please next time asking 'Pandas' questions post sample input and desired output (as text) - it helps to better understand what OP wants and also helps to develop a solution. :)
0

The problem is a little unclear, but assuming you have to iterate month by month, and apply the groupby as stated just use:

 #Before loops
 dflist=[]

Then in each loop do something like:

 dflist.append(MeanDailyTemperaturesForCurrentMonth)

Then at the end:

 final_df = pd.concat(dflist, ignore_index=True)

and this will join everything into one dataframe.

Look at:

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html

http://pandas.pydata.org/pandas-docs/stable/merging.html
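
A minimal runnable sketch of the collect-then-concatenate idea, assuming in-memory stand-ins for the monthly files (the values and the monthly_data dict are made up). It stacks the monthly results row-wise, which is pd.concat's default and an assumption about the desired layout, and it adds a Month column, which is not in the question's code, so that days from different months stay distinguishable.

```python
import pandas as pd

# Hypothetical in-memory stand-ins for the monthly CSV files (values are made up).
monthly_data = {
    "January": pd.DataFrame({"Day": [1, 1, 2], "AirTemperature": [18, 20, 24]}),
    "February": pd.DataFrame({"Day": [1, 1], "AirTemperature": [10, 14]}),
}
MonthsInAnalysis = ["January", "February"]

dflist = []
for EachMonth in MonthsInAnalysis:
    TheCurrentMonth = monthly_data[EachMonth]  # pd.read_csv(...) in the real loop
    means = (TheCurrentMonth.groupby("Day")["AirTemperature"]
             .mean()
             .reset_index(name="MeanDailyAirTemperature"))
    means.insert(0, "Month", EachMonth)  # assumption: label rows with their month
    dflist.append(means)

# Concatenate once, after the loop, rather than growing a frame inside it.
final_df = pd.concat(dflist, ignore_index=True)
print(final_df)
```

A single final_df.to_csv('my_csv.csv', index=False) after the loop then writes everything at once.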

1 Comment

IMO doing pd.concat() in a loop is not the best idea - you may want to collect the data frames into a list and concatenate them in one shot, provided of course they are not huge.
0

You could do this to create an empty CSV and add columns without an index column as well.

import pandas as pd
# to_csv returns None when given a path, so there is nothing useful to assign
pd.DataFrame(columns=["Col1","Col2","Col3"]).to_csv("filename.csv", index=False)
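
If the goal is to create the file first and fill it later, one possible pattern is sketched below. The column names, values and temporary path are assumptions made for illustration. The header row is written once, and later rows are appended with mode="a" and header=False.

```python
import os
import tempfile

import pandas as pd

out_path = os.path.join(tempfile.mkdtemp(), "my_csv.csv")  # hypothetical location
columns = ["Month", "Day", "MeanDailyAirTemperature"]  # assumed column names

# Step 1: write only the header row.
pd.DataFrame(columns=columns).to_csv(out_path, index=False)

# Step 2: append data rows later, without repeating the header.
chunk = pd.DataFrame([[1, 1, 19.0], [1, 2, 24.0]], columns=columns)
chunk.to_csv(out_path, mode="a", header=False, index=False)

result = pd.read_csv(out_path)
print(result)
```

The appended frame has to list its columns in the same order as the header, since header=False means the names are never checked against the file.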

