
I have a question very similar to this one, but I need to take it a step further by saving the split data frames to CSV.

import pandas as pd
import numpy as np
import os

df = pd.DataFrame({ 'CITY' : np.random.choice(['PHOENIX','ATLANTA','CHICAGO', 'MIAMI', 'DENVER'], 1000),
                    'DAY': np.random.choice(['Monday','Tuesday','Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'], 1000),
                    'TIME_BIN': np.random.randint(1, 86400, size=1000),
                    'COUNT': np.random.randint(1, 700, size=1000)})

df['TIME_BIN'] = pd.to_datetime(df['TIME_BIN'], unit='s').dt.round('10min').dt.strftime('%H:%M:%S')
print(df)

OUTPUT:
         CITY  COUNT        DAY  TIME_BIN
0     ATLANTA    476   Thursday  12:20:00
1     PHOENIX     50   Saturday  15:40:00
2       MIAMI    250     Friday  08:20:00
3     CHICAGO    358     Monday  15:40:00
4     PHOENIX    217   Thursday  22:10:00
5       MIAMI     12   Thursday  21:40:00
6      DENVER     22     Friday  10:30:00
7     CHICAGO    645     Sunday  23:40:00
8       MIAMI    188     Sunday  08:40:00

I want to make a separate data frame for each city and save it as a .csv file. The code below works, but how do I do it Pythonically, without explicitly naming each city? The real data set has about 20 cities, so I don't want to paste this 20 times. I think it can be done in 1-2 lines with a for loop, but I don't know what that would look like. Something like "for city in df['CITY']"?

df_phoenix = df[df['CITY'] == "PHOENIX"]
df_atlanta = df[df['CITY'] == "ATLANTA"]
df_chicago = df[df['CITY'] == "CHICAGO"]
df_phoenix.to_csv(os.getcwd() + "/data_phoenix.csv")
df_atlanta.to_csv(os.getcwd() + "/data_atlanta.csv")
df_chicago.to_csv(os.getcwd() + "/data_chicago.csv")

1 Answer


I think you need groupby, either with a custom lambda function or with a loop:

f = lambda x: x.to_csv(os.getcwd() + "/data_{}.csv".format(x.name.lower()), index=False)
df.groupby('CITY').apply(f)

for i, x in df.groupby('CITY'):
    x.to_csv(os.getcwd() + "/data_{}.csv".format(i.lower()), index=False)

EDIT following a comment, thanks @Anton vBR:

for i, x in df.groupby('CITY'):
    p = os.path.join(os.getcwd(), "data_{}.csv".format(i.lower()))
    x.to_csv(p, index=False)
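For reference, the loop version can be wrapped in a small helper. This is only a sketch, not part of the original answer: the function name `save_groups_to_csv`, the `out_dir` parameter, the `city_csvs` folder, and the reduced sample frame are all made up for illustration, and it uses `pathlib` in place of `os.path.join`.

```python
from pathlib import Path

import numpy as np
import pandas as pd


def save_groups_to_csv(df, column, out_dir="."):
    """Write one CSV per distinct value of `column` and return the paths.

    Hypothetical helper: the name and the `out_dir` parameter are assumptions.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    written = {}
    for key, group in df.groupby(column):
        # Same file naming as in the answer: data_<lowercased key>.csv
        path = out_dir / f"data_{str(key).lower()}.csv"
        group.to_csv(path, index=False)
        written[key] = path
    return written


# Small stand-in for the question's data (fewer rows and columns).
rng = np.random.default_rng(0)
sample = pd.DataFrame({
    "CITY": rng.choice(["PHOENIX", "ATLANTA", "CHICAGO"], 30),
    "COUNT": rng.integers(1, 700, size=30),
})

# "city_csvs" is an arbitrary example folder, created relative to the
# current working directory.
paths = save_groups_to_csv(sample, "CITY", out_dir="city_csvs")
```

With the default `out_dir="."`, the files land in the current working directory, since relative paths are resolved against it. Returning the paths is optional; it just makes it easy to check what was written.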

5 Comments

Thanks, it works and is a beautiful one-liner that I was looking for. I see your name a lot in the answers section when researching Pandas questions :)
Nice, didn't know this was possible without a loop. Is the os.getcwd() + necessary though? Surely we just write to current working directory by default if we don't specify.
@Calculus One-liners might be beautiful but they are sometimes less readable too.
@AntonvBR - I added a solution; what do you think?
Beautiful solution. Note os.path.join(os.getcwd(), ) is redundant and can be removed.
