The data file is too big for RAM, so I can't use .read_csv() -> concat -> .to_csv(). Is there an easy way to concatenate two DataFrames?

  • Have you considered using generators? stackoverflow.com/questions/18915941/… Commented May 30, 2021 at 12:25
  • read_csv() has parameters like iterator and chunksize to help with reading big files. Check those out. Commented May 30, 2021 at 12:41
  • If you just need to append files, you can read the individual files and append them to one output using mode="a" with to_csv. Commented May 30, 2021 at 12:42
  • If your dataset exceeds memory, you should try Dask, which lets you work with large datasets and integrates well with Python libraries like NumPy, scikit-learn, etc. (see the sketch below). More info: Dask and pandas: There’s No Such Thing as Too Much Data Commented May 30, 2021 at 12:52
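Following up on the Dask suggestion above, here is a minimal, untested sketch of what that could look like, assuming the input files are named file1.csv, file2.csv, and file3.csv and share the same columns:

import dask.dataframe as dd

# Dask reads the CSVs lazily as partitioned blocks, so the full
# dataset is never loaded into RAM at once.
df = dd.read_csv(['file1.csv', 'file2.csv', 'file3.csv'])

# single_file=True writes one combined CSV instead of one file per partition
df.to_csv('new.csv', single_file=True, index=False)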

1 Answer


The idea is to read a batch of n rows (small enough to fit in RAM) from each CSV file and append it to a new CSV file. Note that all files must have the same column schema.

The code below seems to work on my small CSV files. You could try it on larger ones with a bigger batch size and let me know if it works.

import pandas as pd

filenames = ['file1.csv', 'file2.csv', 'file3.csv']
batch_size = 2

# Write the header row once, taken from the first file
df = pd.read_csv(filenames[0], nrows=0)
df.to_csv('new.csv', index=False)

for filename in filenames:
    this_batch = batch_size
    i = 0
    while this_batch == batch_size:
        # skiprows=range(1, ...) skips the rows already read but keeps
        # line 0, so pandas still parses the header of each batch correctly
        df = pd.read_csv(filename, nrows=batch_size,
                         skiprows=range(1, batch_size * i + 1))
        this_batch = len(df)
        i += 1
        df.to_csv('new.csv', mode='a', index=False, header=None)
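As the comments on the question point out, pandas can also do this batching itself via the chunksize parameter of read_csv(), which avoids the manual skiprows bookkeeping. A minimal sketch under the same assumptions (same filenames, identical column schema):

import pandas as pd

filenames = ['file1.csv', 'file2.csv', 'file3.csv']

# Write the header once, then append data rows chunk by chunk
pd.read_csv(filenames[0], nrows=0).to_csv('new.csv', index=False)

for filename in filenames:
    # chunksize makes read_csv return an iterator of DataFrames,
    # so only one chunk is held in memory at a time
    for chunk in pd.read_csv(filename, chunksize=10_000):
        chunk.to_csv('new.csv', mode='a', index=False, header=False)

Each chunk is parsed with the file's own header, so the columns line up automatically; header=False keeps that header from being repeated in the output.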