1

How can I split a large csv with many columns into separate files whenever the value in one column, e.g. ID, changes? Here is an example:

import pandas as pd
from io import StringIO
csvdata = StringIO("""ID,f1
1,3.2
1,4.3
1,10
7,9.1
7,2.3
7,4.4
""") 

df = pd.read_csv(csvdata, sep=",")
df

My aim is to save each block to a separate csv whose name is generated in a loop based on the ID:

df_ID_1.csv

    ID f1
    1  3.2
    1  4.3
    1  10.0

df_ID_7.csv

    ID f1
    7  9.1
    7  2.3
    7  4.4

Thank you very much!

3 Answers

2

Just cycle through the IDs, create a sliced dataframe for each one, and write it to its own .csv file:

for id_value in df['ID'].unique():
    temp_df = df.loc[df['ID'] == id_value]
    file_name = "df_ID_{}.csv".format(id_value)
    # make the path to where you want it saved
    file_path = "C:/Users/you/Desktop/" + file_name
    # write the single-ID dataframe to a csv, without the index column
    temp_df.to_csv(file_path, index=False)
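
The Desktop path in the loop above is only a placeholder. Here is a minimal sketch of the same idea with the output folder passed in and the paths built with pathlib. The function name `split_by_unique` and the `out_dir` parameter are made up for illustration, and a temporary folder stands in for the real destination:

```python
from io import StringIO
from pathlib import Path
import tempfile

import pandas as pd


def split_by_unique(df, column, out_dir):
    """Write one CSV per distinct value of `column`; return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for value in df[column].unique():
        # boolean mask keeps only the rows for this value
        subset = df.loc[df[column] == value]
        path = out_dir / f"df_{column}_{value}.csv"
        subset.to_csv(path, index=False)  # index=False leaves out the row index
        written.append(path)
    return written


sample = pd.read_csv(StringIO("ID,f1\n1,3.2\n1,4.3\n1,10\n7,9.1\n7,2.3\n7,4.4\n"))
paths = split_by_unique(sample, "ID", tempfile.mkdtemp())
names = sorted(p.name for p in paths)
print(names)
```

Note that this scans the whole frame once per distinct ID, so it may get slow when there are many IDs.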

2

You can use the groupby method for this: access each separate group and write it to a csv using DataFrame.to_csv.

for _, r in df.groupby('ID'):
    r.to_csv(f'df_ID_{r.ID.iloc[0]}.csv')

Or, if your Python version is older than 3.6, use .format for string formatting instead of an f-string:

for _, r in df.groupby('ID'):
    r.to_csv('df_ID_{}.csv'.format(r.ID.iloc[0]))

This splits our dataframe into separate csv files.

Explanation of the loop we use: each iteration yields one group, a dataframe holding only the rows for a single ID.

for _, r in df.groupby('ID'):
    print(r, '\n')
    print(f'This is our ID {r.ID.iloc[0]}', '\n')

   ID    f1
0   1   3.2
1   1   4.3
2   1  10.0 

This is our ID 1 

   ID   f1
3   7  9.1
4   7  2.3
5   7  4.4 

This is our ID 7 
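
The loops above throw away the first element of each pair with `_`, but that element is the group key itself, so it can name the file directly. A small sketch of that variant, assuming a temporary folder as the destination (`out_dir` and `id_value` are illustrative names, not part of the original answer):

```python
from io import StringIO
from pathlib import Path
import tempfile

import pandas as pd

df = pd.read_csv(StringIO("ID,f1\n1,3.2\n1,4.3\n1,10\n7,9.1\n7,2.3\n7,4.4\n"))
out_dir = Path(tempfile.mkdtemp())  # hypothetical output folder

# groupby yields (key, sub-dataframe) pairs; the key is the ID value
for id_value, group in df.groupby("ID"):
    group.to_csv(out_dir / f"df_ID_{id_value}.csv", index=False)

first = (out_dir / "df_ID_1.csv").read_text()
print(first)
```

Passing `index=False` keeps the original row numbers out of the files, which matches the layout asked for in the question.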

0

Without using Pandas: read the file using the csv module, sort the rows by the specified column, group them by that column using itertools.groupby, then iterate over the groups and write a new file for each.

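import csv
import itertools
import operator

get_id = operator.itemgetter('ID')
# assumes csvdata is a file-like object (io.StringIO in OP's example)
reader = csv.DictReader(csvdata)
fields = reader.fieldnames
# itertools.groupby only merges adjacent rows, so sort by the key first
data = sorted(reader, key=get_id)
for id_value, group in itertools.groupby(data, get_id):
    # newline='' is what the csv module expects for files it writes
    with open(f'ID_{id_value}.csv', 'w', newline='') as f:
        writer = csv.DictWriter(f, fields)
        writer.writeheader()
        writer.writerows(group)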
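
Sorting loads every row into memory, which may be a problem for a really large file. One possible alternative, sketched here under the assumption that the number of distinct IDs is small enough to keep one output file open per ID, is to stream the rows in a single pass. The name `split_csv_stream` and its parameters are hypothetical:

```python
import csv
import io
import tempfile
from contextlib import ExitStack
from pathlib import Path


def split_csv_stream(source, column, out_dir):
    """Copy each row of `source` into out_dir/ID_<value>.csv in a single pass."""
    out_dir = Path(out_dir)
    reader = csv.DictReader(source)
    writers = {}
    with ExitStack() as stack:  # closes every opened output file on exit
        for row in reader:
            value = row[column]
            if value not in writers:
                # first time this ID is seen: open its file and write the header
                f = stack.enter_context(
                    open(out_dir / f"ID_{value}.csv", "w", newline="")
                )
                writers[value] = csv.DictWriter(f, reader.fieldnames)
                writers[value].writeheader()
            writers[value].writerow(row)
    return sorted(writers)


# rows for the two IDs are interleaved on purpose: no sorting is needed
csvdata = io.StringIO("ID,f1\n1,3.2\n7,9.1\n1,4.3\n7,2.3\n")
out_dir = tempfile.mkdtemp()
ids = split_csv_stream(csvdata, "ID", out_dir)
print(ids)
```

The trade-off is one open file handle per distinct ID, so with a very large number of IDs this could run into the operating system's open-file limit.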
