
I want to read the file f in chunks into a dataframe. Here is part of the code that I used:

for i in range(0, maxline, chunksize):
    df = pandas.read_csv(f, sep=',', nrows=chunksize, skiprows=i)
    df.to_sql(member, engine, if_exists='append', index=False, index_label=None, chunksize=chunksize)

I get the error:

pandas.io.common.EmptyDataError: No columns to parse from file

The code works only when chunksize >= maxline (where maxline is the total number of lines in file f). In my case, however, chunksize < maxline.

Please advise a fix.


1 Answer


I think it is better to use the chunksize parameter in read_csv. Also, use concat with ignore_index=True, to avoid duplicate values in the index:

import pandas as pd

chunksize = 5
# chunksize makes read_csv return a TextFileReader, an iterator of DataFrames
TextFileReader = pd.read_csv(f, chunksize=chunksize)
# concat rebuilds the full DataFrame; ignore_index avoids duplicate index values
df = pd.concat(TextFileReader, ignore_index=True)

See pandas docs.


5 Comments

Thanks! Now I get df as a TextFileReader. The next step of my code requires df to be a DataFrame. How can I convert the TextFileReader to a DataFrame?
My actual data is about 85 GB. Wouldn't concatenation make the dataframe huge? I want to use chunksize to read and write in chunks (a sketch of this approach follows the comments). Please advise.
Yes, it will be very big. Maybe you can check this question.
That looks very difficult for a novice like me. df = pandas.read_csv(f, sep=',', nrows=chunksize, skiprows=i) actually gives a DataFrame. Can't this be modified to solve my problem? I updated the question. Thanks!
I used your solution some time ago and got the same error. Unfortunately I have never used to_sql, so I can't help you with it.
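For the large-file case raised in the comments, one option (not from the answer above) is to skip the concat and write each chunk to the database as it is read, so only one chunk is ever in memory. A minimal sketch, reusing f and member from the question as placeholders; the connection string is hypothetical:

import pandas as pd
from sqlalchemy import create_engine

f = 'data.csv'                                  # CSV path, placeholder for the question's f
member = 'member'                               # table name, placeholder for the question's member
engine = create_engine('sqlite:///example.db')  # hypothetical connection string

chunksize = 100000
# each iteration yields a DataFrame of up to chunksize rows, so only one
# chunk is held in memory at a time; to_sql appends it to the table
for chunk in pd.read_csv(f, sep=',', chunksize=chunksize):
    chunk.to_sql(member, engine, if_exists='append', index=False)

This also avoids the skiprows loop from the question, which rereads the file from the start on every pass.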
