Pandas: split data frame into multiple CSVs based on column value

Question

I have a question very similar to this one, but I need to take it a step further by saving the split data frames to CSV files.

import pandas as pd
import numpy as np
import os

df = pd.DataFrame({ 'CITY' : np.random.choice(['PHOENIX','ATLANTA','CHICAGO', 'MIAMI', 'DENVER'], 1000),
                    'DAY': np.random.choice(['Monday','Tuesday','Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'], 1000),
                    'TIME_BIN': np.random.randint(1, 86400, size=1000),
                    'COUNT': np.random.randint(1, 700, size=1000)})

df['TIME_BIN'] = pd.to_datetime(df['TIME_BIN'], unit='s').dt.round('10min').dt.strftime('%H:%M:%S')
print(df)

OUTPUT:
         CITY  COUNT        DAY  TIME_BIN
0     ATLANTA    476   Thursday  12:20:00
1     PHOENIX     50   Saturday  15:40:00
2       MIAMI    250     Friday  08:20:00
3     CHICAGO    358     Monday  15:40:00
4     PHOENIX    217   Thursday  22:10:00
5       MIAMI     12   Thursday  21:40:00
6      DENVER     22     Friday  10:30:00
7     CHICAGO    645     Sunday  23:40:00
8       MIAMI    188     Sunday  08:40:00

I want to make a separate data frame for each city and save it as a .csv. The code below works, but how do I do it Pythonically, without having to explicitly state each city? The real data set has about 20 cities, so I don't want to paste this 20 times. I think the code below can be done in 1-2 lines using a for loop, but I don't know what it would look like. Something like "for city in df['CITY']".

df_phoenix = df[df['CITY'] == "PHOENIX"]
df_atlanta = df[df['CITY'] == "ATLANTA"]
df_chicago = df[df['CITY'] == "CHICAGO"]
df_phoenix.to_csv(os.getcwd() + "/data_phoenix.csv")
df_atlanta.to_csv(os.getcwd() + "/data_atlanta.csv")
df_chicago.to_csv(os.getcwd() + "/data_chicago.csv")

Anton vBR · Accepted Answer · 2018-03-01 13:34:16Z

17

I think you need groupby, either with a custom lambda function passed to apply or with a loop:

f = lambda x: x.to_csv(os.getcwd() + "/data_{}.csv".format(x.name.lower()), index=False)
df.groupby('CITY').apply(f)

for i, x in df.groupby('CITY'):
    x.to_csv(os.getcwd() + "/data_{}.csv".format(i.lower()), index=False)

EDIT, following a comment (thanks @Anton vBR), building the path with os.path.join:

for i, x in df.groupby('CITY'):
    p = os.path.join(os.getcwd(), "data_{}.csv".format(i.lower()))
    x.to_csv(p, index=False)
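
For reuse, the loop above could be wrapped in a small helper. This is only a sketch: the function name split_to_csv, its out_dir parameter and the tiny sample frame are hypothetical and not part of the original answer, and it writes into a temporary directory purely so the example cleans up after itself.

```python
import tempfile
from pathlib import Path

import pandas as pd


def split_to_csv(df, column, out_dir):
    """Write one CSV per distinct value of `column`; return the paths written.

    Hypothetical helper, not from the original answer.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    # Grouping by a single column name yields (key, sub-frame) pairs.
    for key, group in df.groupby(column):
        path = out_dir / f"data_{str(key).lower()}.csv"
        group.to_csv(path, index=False)
        paths.append(path)
    return paths


# Small illustrative frame (assumed data, not the question's random frame).
sample = pd.DataFrame({
    "CITY": ["PHOENIX", "ATLANTA", "PHOENIX", "MIAMI"],
    "COUNT": [1, 2, 3, 4],
})

with tempfile.TemporaryDirectory() as tmp:
    written = split_to_csv(sample, "CITY", tmp)
    names = sorted(p.name for p in written)
    # Read one file back to confirm it holds only that city's rows.
    phoenix_rows = len(pd.read_csv(Path(tmp) / "data_phoenix.csv"))
```

With the question's frame, the call would look like split_to_csv(df, 'CITY', os.getcwd()), or any other target directory.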

edited Mar 1, 2018 at 13:34

Anton vBR

answered Mar 1, 2018 at 13:13

jezrael

5 Comments

Calculus Over a year ago

Thanks, it works and is a beautiful one-liner that I was looking for. I see your name a lot in the answers section when researching Pandas questions :)

sjw Over a year ago

Nice, didn't know this was possible without a loop. Is the os.getcwd() + necessary though? Surely we just write to the current working directory by default if we don't specify one.

Anton vBR Over a year ago

@Calculus One-liners might be beautiful but they are sometimes less readable too.

jezrael Over a year ago

@AntonvBR - I added your solution to the answer; what do you think?

arielf Over a year ago

Beautiful solution. Note os.path.join(os.getcwd(), ) is redundant and can be removed.
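
The point raised in the comments, that a bare file name is already resolved against the current working directory, can be checked with a short sketch like the one below. The sample frame and the switch into a temporary directory are assumptions made only to keep the example self-contained.

```python
import os
import tempfile

import pandas as pd

# Assumed sample data, standing in for one city's group.
sample = pd.DataFrame({"CITY": ["DENVER", "DENVER"], "COUNT": [10, 20]})

original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)  # make the temporary directory the working directory
    try:
        # Relative path: no os.getcwd() prefix is given.
        sample.to_csv("data_denver.csv", index=False)
        # The file is found at the explicit cwd-based path as well.
        found_in_cwd = os.path.exists(
            os.path.join(os.getcwd(), "data_denver.csv")
        )
    finally:
        os.chdir(original_cwd)  # restore the previous working directory
```

So inside the loop, x.to_csv("data_{}.csv".format(i.lower()), index=False) should write to the same place as the os.getcwd() variants.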
