pandas: split dataframe into multiple csvs

Question

I have a large file imported into a single DataFrame in pandas. I want to split it into many segments by row count and write each segment to its own file.

E.g. with 10 rows: file 1 gets rows [0:4], file 2 gets rows [5:9].

Is there a way to do this without having to create more dataframes?

Thanks for the catch. I've updated the question with that detail.
– billyc59, Commented Nov 21, 2017 at 20:16
What's the reason why this has to be done in pandas? From the current description (large file, being split by rows) you could do it from the command line using 'split'.
– Silenced Temporarily, Commented Nov 21, 2017 at 20:18

BENY · Accepted Answer · 2017-11-21 20:28:56Z

4

Assign a new column g; you just need to specify how many items you want in each group. Here I am using 3.

df.assign(g=df.index//3)
Out[324]: 
    0  g
0   1  0
1   2  0
2   3  0
3   4  1
4   5  1
5   6  1
6   7  2
7   8  2
8   9  2
9  10  3

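assign returns a new frame, so store the result first (for example df = df.assign(g=df.index//3)); you can then use df[df.g==1] to get the chunk you need.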
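A minimal sketch of how the g column could be used to write every chunk to its own CSV in one pass. The column name value, the constant ROWS_PER_FILE, the temporary output directory and the part_*.csv file names are assumptions made for illustration; they are not part of the answer.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"value": range(1, 11)})  # sample data: 10 rows
ROWS_PER_FILE = 3  # assumed chunk size, matching the 3 used in the answer

# Label each row with its chunk number (relies on the default 0..n-1 index).
labelled = df.assign(g=df.index // ROWS_PER_FILE)

out_dir = tempfile.mkdtemp()  # hypothetical output location
written = []
for g, part in labelled.groupby("g"):
    path = os.path.join(out_dir, f"part_{g}.csv")  # hypothetical file name
    # Drop the helper column so the output matches the original columns.
    part.drop(columns="g").to_csv(path, index=False)
    written.append(path)
```

With 10 rows and 3 rows per file this should produce four files, the last one holding the single leftover row.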

edited Nov 21, 2017 at 20:28

answered Nov 21, 2017 at 20:27

BENY


1 Comment

MaxU - stand with Ukraine Over a year ago

do we really need that new column? df[np.arange(len(df))//3==1]
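A small sketch of what the positional version buys, assuming a sample frame whose index is not the default 0..n-1 (the index values and the column name value are made up for illustration). Dividing the index labels then no longer groups rows in threes, while np.arange(len(df)) depends only on row order.

```python
import numpy as np
import pandas as pd

# Sample frame with a non-default index, e.g. after filtering or set_index.
df = pd.DataFrame({"value": range(1, 8)}, index=[10, 20, 30, 40, 50, 60, 70])

by_label = (df.index // 3).tolist()                # depends on the index labels
by_position = (np.arange(len(df)) // 3).tolist()   # depends only on row order

# Boolean mask selecting the second chunk of three rows, no helper column.
second_chunk = df[np.arange(len(df)) // 3 == 1]
```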

Neil · Accepted Answer · 2017-11-21 20:46:14Z

4

There are two ways of doing this. I believe you are looking for the former. Basically, we open a series of csv writers, then we write to the correct csv writer by using some basic math with the index, then we close all files.

A single DataFrame evenly divided into N number of CSV files

import csv
import pandas as pd

df = pd.DataFrame([1,2,3,4,5,6,7,8,9,10]) # uncreative input values: one column, 10 rows
NUMBER_OF_SPLITS = 2
# one open file and one csv writer per output file
fileOpens = [open(f"out{i}.csv","w") for i in range(NUMBER_OF_SPLITS)]
fileWriters = [csv.writer(v, lineterminator='\n') for v in fileOpens]
# i is the index label, so this assumes the default 0..n-1 index
for i,row in df.iterrows():
    fileWriters[i * NUMBER_OF_SPLITS // df.shape[0]].writerow(row.tolist())
for file in fileOpens:
    file.close()

More than one DataFrame evenly divided into N number of CSV files

import pandas as pd
import numpy as np

df = pd.DataFrame([1,2,3,4,5,6,7,8,9,10]) # uncreative input values: one column, 10 rows
NUMBER_OF_SPLITS = 2
for i, new_df in enumerate(np.array_split(df,NUMBER_OF_SPLITS)):
    # to_csv writes the file itself; wrapping it in open()/write() can add blank rows on some platforms
    new_df.to_csv(f"out{i}.csv")

edited Nov 21, 2017 at 20:46

answered Nov 21, 2017 at 20:20

Neil


3 Comments

billyc59 Over a year ago

This solution forces the creation of a new df.

Neil Over a year ago

@billyc59 Updated it.

Rene Over a year ago

Why do you use the file write method in combination with df.to_csv()? The .to_csv() method already writes data to a file. In your case, I get empty rows in the new CSVs.

kerfuffle · Accepted Answer · 2022-11-06 12:03:50Z

2

Use numpy.array_split to split your DataFrame dfX and save it in N CSV files of roughly equal size (exactly equal when N divides the row count): dfX_1.csv to dfX_N.csv

import numpy as np

N = 10
# dfX is the DataFrame to split
for i, df in enumerate(np.array_split(dfX, N)):
    df.to_csv(f"dfX_{i + 1}.csv", index=False)
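To see how the chunk sizes come out when N does not divide the row count, here is a hedged variant that splits the row positions with numpy.array_split and selects each chunk with iloc. This is a swapped-in approach, not the code above, chosen so the sketch does not depend on how a given pandas version handles array_split on a DataFrame. The sample data, N = 3 and the temporary directory are assumptions.

```python
import os
import tempfile

import numpy as np
import pandas as pd

dfX = pd.DataFrame({"value": range(1, 11)})  # sample data: 10 rows
N = 3  # deliberately does not divide 10

out_dir = tempfile.mkdtemp()  # hypothetical output location
sizes = []
# Split the row positions rather than the frame, then select with iloc.
for i, positions in enumerate(np.array_split(np.arange(len(dfX)), N)):
    chunk = dfX.iloc[positions]
    chunk.to_csv(os.path.join(out_dir, f"dfX_{i + 1}.csv"), index=False)
    sizes.append(len(chunk))
```

array_split gives the first len(dfX) % N chunks one extra row each, so the files differ in length by at most one row.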

answered Nov 6, 2022 at 12:03

kerfuffle


billyc59 · Accepted Answer · 2017-11-21 22:09:37Z

0

Iterating over the slice bounds passed to iloc will do the trick.
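The answer gives no code, so the following is only one possible reading of it: step through the start positions and hand each iloc slice straight to to_csv. ROWS_PER_FILE, the sample data, the temporary directory and the out{n}.csv names are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"value": range(1, 11)})  # sample data: 10 rows
ROWS_PER_FILE = 5  # assumed chunk size

out_dir = tempfile.mkdtemp()  # hypothetical output location
paths = []
for n, start in enumerate(range(0, len(df), ROWS_PER_FILE)):
    path = os.path.join(out_dir, f"out{n}.csv")  # hypothetical file name
    # iloc slices by position, so this works for any index; the end bound is exclusive.
    df.iloc[start:start + ROWS_PER_FILE].to_csv(path, index=False)
    paths.append(path)
```

Each slice is still a DataFrame object, but it is never bound to a name and can be released as soon as its file is written.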

edited Nov 21, 2017 at 22:09

answered Nov 21, 2017 at 20:27

billyc59
