
I need to fetch data from a DB and write it to a corresponding column in a CSV file. The following code does it very slowly (iteratively, one row at a time):

import csv

import asyncpg


async def fetch_and_write():
    conn = await asyncpg.connect('...')
    with open('/Users/mac/Desktop/input.csv', 'r') as csvinput:
        with open('/Users/mac/Desktop/output.csv', 'w') as csvoutput:
            reader = csv.reader(csvinput)
            writer = csv.writer(csvoutput, lineterminator='\n')

            all = []
            row = next(reader)
            row.append('new_column_name')
            all.append(row)

            for row in reader:
                query = "SELECT .. FROM .. WHERE id = '%s';"
                query = query % row[14]
                try:
                    result = await conn.fetch(query)
                except BaseException:
                    print("Oops! That was not a valid number.")
                    continue

                row.append(result[0][0])
                all.append(row)

            writer.writerows(all)

How can I read the ids from the CSV in chunks and use an IN clause to improve performance?

  • You can read the ids from the CSV in chunks and use SQL's IN clause. Commented Apr 2, 2017 at 15:43
  • Yes, can you provide some examples, please? Commented Apr 2, 2017 at 15:44

2 Answers


You can use Postgres' COPY command to do the trick, e.g. your query should be:

COPY (SELECT * FROM foo) TO '/tmp/test.csv' WITH CSV DELIMITER ',';
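If you want to drive the export from Python, asyncpg can also stream a query result straight into a client-side CSV file via Connection.copy_from_query. A minimal sketch, where mytable and the output path are placeholders, not names from the question:

import asyncio

import asyncpg


async def export_csv():
    conn = await asyncpg.connect('...')
    try:
        # stream the query result directly into a CSV file on the client;
        # 'mytable' and the output path are placeholders
        await conn.copy_from_query(
            'SELECT * FROM mytable',
            output='/tmp/test.csv',
            format='csv',
            delimiter=',',
        )
    finally:
        await conn.close()


asyncio.get_event_loop().run_until_complete(export_csv())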




As per my suggestion in the comments, you can fetch n records in one query. Below is a modified version of the code you provided.

Not Tested

import csv

import asyncpg


async def fetch_and_write():
    n = 500  # number of ids to fetch in one query
    conn = await asyncpg.connect('...')
    with open('/Users/mac/Desktop/input.csv', 'r') as csvinput:
        with open('/Users/mac/Desktop/output.csv', 'w') as csvoutput:
            reader = csv.reader(csvinput)
            writer = csv.writer(csvoutput, lineterminator='\n')

            header = next(reader)
            header.append('new_column_name')
            writer.writerow(header)

            async def flush(batch):
                # one query for the whole batch; select the id as well so the
                # results can be matched back to the buffered csv rows
                in_p = ', '.join("'%s'" % row[14] for row in batch)
                query = "SELECT id, .. FROM .. WHERE id IN (%s);" % in_p
                try:
                    result = await conn.fetch(query)
                except BaseException:
                    print("Oops! That was not a valid number.")
                    return
                values = {r[0]: r[1] for r in result}
                for row in batch:
                    row.append(values.get(row[14]))
                writer.writerows(batch)

            batch = []
            for row in reader:
                batch.append(row)
                if len(batch) >= n:
                    await flush(batch)
                    batch = []

            if batch:  # flush whatever rows are left over
                await flush(batch)
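For reference, the same batching idea can also be expressed with asyncpg's native parameter binding, which avoids building the IN list by string formatting. This is only a sketch; mytable and the value column are placeholders for the names elided above:

import asyncpg


async def fetch_batch(conn, ids):
    # one round trip for a whole batch of ids; 'mytable' and 'value' are
    # placeholders for the real table and column names
    rows = await conn.fetch(
        'SELECT id, value FROM mytable WHERE id = ANY($1::text[])',
        ids,
    )
    # map each id back to its fetched value so it can be appended to the csv row
    return {r['id']: r['value'] for r in rows}

It would be called from the loop above as values = await fetch_batch(conn, [row[14] for row in batch]).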

2 Comments

But you also do this iteratively; maybe you should use while instead of if?
Yes, but now you are getting 500 results in one query, so the number of iterations decreases by a factor of 500.
