
I have a large CSV file (about a million records). I want to process each record and write it into a DB.

Since loading the complete file into RAM makes no sense, I need to read the file in chunks (or some other, better way).

So, I wrote this code:

import csv

with open('/home/praful/Desktop/a.csv') as csvfile:
    config_file = csv.reader(csvfile, delimiter=',', quotechar='|')
    print(config_file)
    for row in config_file:
        print(row)

I guess it loads everything into memory first and then processes it.

Upon looking at this thread and many others, I didn't see any difference between my code and the solution. Kindly advise: is this the only method for efficient processing of CSV files?

1 Answer


No, the csv module produces an iterator; rows are produced on demand. Unless you keep references to the rows elsewhere, the file will not be loaded into memory in its entirety.
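
For illustration, a minimal sketch of processing each row as it is read and inserting it straight into a database. The sqlite3 target, the table name, and the three-column layout are assumptions for the example, not something taken from your question:

import csv
import sqlite3

conn = sqlite3.connect('records.db')   # hypothetical database file
cur = conn.cursor()
cur.execute('CREATE TABLE IF NOT EXISTS records (col1 TEXT, col2 TEXT, col3 TEXT)')

with open('/home/praful/Desktop/a.csv') as csvfile:
    config_file = csv.reader(csvfile, delimiter=',', quotechar='|')
    for row in config_file:
        # each row is handed to you as it is read from disk;
        # nothing beyond the current row needs to stay in memory
        cur.execute('INSERT INTO records VALUES (?, ?, ?)', row[:3])

conn.commit()
conn.close()

Because the reader is consumed lazily, memory use stays flat regardless of how many records the file holds.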

Note that this is exactly what I say in the other answer you linked to; the problem there was that the OP built a list (data) holding all the rows after reading them, instead of processing the rows as they were being read.
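
If per-row inserts turn out to be too slow for a million records, you can still stream the file while batching the inserts. Here is a sketch using itertools.islice with an arbitrary batch size; the table layout is again an assumption:

import csv
import sqlite3
from itertools import islice

BATCH_SIZE = 1000   # arbitrary; tune to your data and DB

conn = sqlite3.connect('records.db')   # hypothetical database file
cur = conn.cursor()
cur.execute('CREATE TABLE IF NOT EXISTS records (col1 TEXT, col2 TEXT, col3 TEXT)')

with open('/home/praful/Desktop/a.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='|')
    while True:
        # islice pulls at most BATCH_SIZE rows from the iterator,
        # so only one batch is ever held in memory at a time
        batch = list(islice(reader, BATCH_SIZE))
        if not batch:
            break
        cur.executemany('INSERT INTO records VALUES (?, ?, ?)', batch)
        conn.commit()

conn.close()

This keeps the chunked behaviour you were after without ever materialising the whole file as a list.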
