Split a large CSV based on one column's value and write to smaller CSVs

Question

How can I split a large CSV with many columns into smaller files, based on changes in one column, e.g. ID? Here is an example:

import pandas as pd
from io import StringIO  # pandas.compat.StringIO was removed in newer pandas releases
csvdata = StringIO("""ID,f1
1,3.2
1,4.3
1,10
7,9.1
7,2.3
7,4.4
""") 

df = pd.read_csv(csvdata, sep=",")
df

My aim is to save each block in a separate CSV whose name is generated in a loop based on the ID:

df_ID_1.csv

    ID f1
    1  3.2
    1  4.3
    1  10.0

df_ID_7.csv

    ID f1
    7  9.1
    7  2.3
    7  4.4

Thank you very much!

related: stackoverflow.com/questions/26103676/…

– EdChum, Commented Apr 17, 2019 at 13:47

Ben Dickson · Accepted Answer · 2019-04-17 14:01:52Z

2

Just loop over the unique IDs, slice out the rows for each one, and write that slice to its own .csv file:

for id_value in df['ID'].unique():
    temp_df = df.loc[df['ID'] == id_value]
    file_name = "df_ID_{}.csv".format(id_value)
    # make the path to where you want it saved
    file_path = "C:/Users/you/Desktop/" + file_name
    # write the single ID dataframe to a csv (index=False omits the row index)
    temp_df.to_csv(file_path, index=False)
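
Since the question mentions a large file, the same per-ID split can also be applied chunk by chunk instead of loading everything at once. The following is only a sketch under assumptions: the function name `split_csv_in_chunks`, the tiny `chunksize`, and the temporary output directory are illustrative and not part of the answer above. Because it appends to files, it should be run against an empty output directory.

```python
import tempfile
from io import StringIO
from pathlib import Path

import pandas as pd


def split_csv_in_chunks(source, out_dir, column="ID", chunksize=2):
    """Append each chunk's rows to one CSV per distinct value of `column`."""
    out_dir = Path(out_dir)
    seen = set()  # values whose output file already has a header row
    for chunk in pd.read_csv(source, chunksize=chunksize):
        for value, block in chunk.groupby(column):
            path = out_dir / f"df_{column}_{value}.csv"
            # Append rows; write the header only the first time a value appears
            block.to_csv(path, mode="a", header=value not in seen, index=False)
            seen.add(value)
    return sorted(out_dir.glob(f"df_{column}_*.csv"))


# Sample data from the question; a real call would pass a file path instead
sample = StringIO("ID,f1\n1,3.2\n1,4.3\n1,10\n7,9.1\n7,2.3\n7,4.4\n")
out_dir = tempfile.mkdtemp()  # hypothetical output location
written = split_csv_in_chunks(sample, out_dir)
print([p.name for p in written])
```

Note that pandas infers column types per chunk, so number formatting can differ slightly between chunks unless `dtype` is passed to `read_csv`.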

answered Apr 17, 2019 at 14:01


Erfan · Accepted Answer · 2019-04-17 14:07:50Z

2

You can use the groupby method for this: access each separate group and write it to a CSV using DataFrame.to_csv.

for _, r in df.groupby('ID'):
    r.to_csv(f'df_ID_{r.ID.iloc[0]}.csv')

Or if your Python version is older than 3.6, use .format for string formatting instead of an f-string:

for _, r in df.groupby('ID'):
    r.to_csv('df_ID_{}.csv'.format(r.ID.iloc[0]))

This splits our dataframe into separate CSV files, one per ID.

Explanation of the loop we use:

for _, r in df.groupby('ID'):
    print(r, '\n')
    print(f'This is our ID {r.ID.iloc[0]}', '\n')

   ID    f1
0   1   3.2
1   1   4.3
2   1  10.0 

This is our ID 1 

   ID   f1
3   7  9.1
4   7  2.3
5   7  4.4 

This is our ID 7
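
As the printed output suggests, groupby yields (key, sub-frame) pairs, so the key can be used for the file name directly rather than reading it back from the first row. A minimal sketch, assuming the question's sample data and a temporary output directory (both stand-ins for the real file and destination); `id_value` and `block` are illustrative names:

```python
import tempfile
from io import StringIO
from pathlib import Path

import pandas as pd

# Sample data from the question
df = pd.read_csv(StringIO("ID,f1\n1,3.2\n1,4.3\n1,10\n7,9.1\n7,2.3\n7,4.4\n"))
out_dir = Path(tempfile.mkdtemp())  # hypothetical output location

# Each iteration gives the shared ID value and the rows that carry it
for id_value, block in df.groupby("ID"):
    # index=False leaves out the row index, matching the layout in the question
    block.to_csv(out_dir / f"df_ID_{id_value}.csv", index=False)

print(sorted(p.name for p in out_dir.iterdir()))
```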

edited Apr 17, 2019 at 14:07

answered Apr 17, 2019 at 14:02


wwii · Accepted Answer · 2019-04-17 15:07:52Z

0

Without using Pandas: read the file with the csv module, sort the rows by the specified column, group them by that column with itertools.groupby, then iterate over the groups and write a new file for each.

import csv, itertools, operator

key = operator.itemgetter('ID')
# assumes csvdata is a file-like object positioned at its start (io.StringIO in OP's example)
reader = csv.DictReader(csvdata)
fields = reader.fieldnames
# groupby only groups consecutive rows, so sort by the key first
data = sorted(reader, key=key)
for id_value, group in itertools.groupby(data, key):
    # newline='' lets the csv module control line endings
    with open(f'ID_{id_value}.csv', 'w', newline='') as f:
        writer = csv.DictWriter(f, fields)
        writer.writeheader()
        writer.writerows(group)
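
Sorting requires holding every row in memory, which may not suit a very large file. A different technique, shown here only as a sketch, streams the rows once and routes each one to a writer for its ID, so no sort is needed. The function name `split_stream` and the temporary output directory are assumptions for illustration. It keeps one file open per distinct ID, so it fits best when the number of distinct IDs is modest.

```python
import csv
import io
import tempfile
from contextlib import ExitStack
from pathlib import Path


def split_stream(source, out_dir, column="ID"):
    """Route each row to a per-value CSV without sorting or loading all rows."""
    out_dir = Path(out_dir)
    reader = csv.DictReader(source)
    writers = {}  # column value -> DictWriter for that value's file
    with ExitStack() as stack:  # closes every opened file on exit
        for row in reader:
            value = row[column]
            if value not in writers:
                handle = stack.enter_context(
                    open(out_dir / f"{column}_{value}.csv", "w", newline="")
                )
                writers[value] = csv.DictWriter(handle, reader.fieldnames)
                writers[value].writeheader()
            writers[value].writerow(row)
    return sorted(writers)


# Rows deliberately interleaved to show that no sorting is required
sample = io.StringIO("ID,f1\n1,3.2\n7,9.1\n1,4.3\n7,2.3\n")
out_dir = tempfile.mkdtemp()  # hypothetical output location
ids = split_stream(sample, out_dir)
print(ids)
```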

answered Apr 17, 2019 at 15:07
