
I need to fetch data from a DB and write it to a corresponding column in a CSV file. The following code does it very slowly (iteratively, one row at a time):

import csv

import asyncpg


async def fetch_and_write():
    conn = await asyncpg.connect('...')
    with open('/Users/mac/Desktop/input.csv', 'r') as csvinput:
        with open('/Users/mac/Desktop/output.csv', 'w') as csvoutput:
            reader = csv.reader(csvinput)
            writer = csv.writer(csvoutput, lineterminator='\n')

            all = []
            row = next(reader)
            row.append('new_column_name')
            all.append(row)

            for row in reader:
                query = "SELECT .. FROM .. WHERE id = '%s';"
                query = query % row[14]
                try:
                    result = await conn.fetch(query)
                except BaseException:
                    print("Oops! That was not a valid number.")
                    continue

                row.append(result[0][0])
                all.append(row)

            writer.writerows(all)

How can I read the ids from the CSV in chunks and use an IN clause to improve performance?

  • You can read the ids from the CSV in chunks and use SQL's IN clause. Commented Apr 2, 2017 at 15:43
  • Yes, can you provide some examples, please? Commented Apr 2, 2017 at 15:44

2 Answers


You can use Postgres' COPY command to do the trick, e.g. your query should be:

COPY (SELECT * FROM foo) TO '/tmp/test.csv' WITH CSV DELIMITER ',';
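If you want to drive the export from Python, asyncpg can also stream a query result straight into a client-side CSV file via Connection.copy_from_query. A minimal sketch, where mytable and the output path are placeholders, not names from the question:

import asyncio

import asyncpg


async def export_csv():
    conn = await asyncpg.connect('...')
    try:
        # stream the query result directly into a CSV file on the client;
        # 'mytable' and the output path are placeholders
        await conn.copy_from_query(
            'SELECT * FROM mytable',
            output='/tmp/test.csv',
            format='csv',
            delimiter=',',
        )
    finally:
        await conn.close()


asyncio.get_event_loop().run_until_complete(export_csv())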




As per my suggestion in the comments, you can fetch n records in one query. Below is a modified version of the code you provided.

Not Tested

import csv

import asyncpg


async def fetch_and_write():
    n = 500  # number of ids to fetch in one query
    conn = await asyncpg.connect('...')
    with open('/Users/mac/Desktop/input.csv', 'r') as csvinput:
        with open('/Users/mac/Desktop/output.csv', 'w') as csvoutput:
            reader = csv.reader(csvinput)
            writer = csv.writer(csvoutput, lineterminator='\n')

            header = next(reader)
            header.append('new_column_name')
            writer.writerow(header)

            async def flush(batch):
                # one query for the whole batch; select the id as well so the
                # results can be matched back to the buffered csv rows
                in_p = ', '.join("'%s'" % row[14] for row in batch)
                query = "SELECT id, .. FROM .. WHERE id IN (%s);" % in_p
                try:
                    result = await conn.fetch(query)
                except BaseException:
                    print("Oops! That was not a valid number.")
                    return
                values = {r[0]: r[1] for r in result}
                for row in batch:
                    row.append(values.get(row[14]))
                writer.writerows(batch)

            batch = []
            for row in reader:
                batch.append(row)
                if len(batch) >= n:
                    await flush(batch)
                    batch = []

            if batch:  # flush whatever rows are left over
                await flush(batch)
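For reference, the same batching idea can also be expressed with asyncpg's native parameter binding, which avoids building the IN list by string formatting. This is only a sketch; mytable and the value column are placeholders for the names elided above:

import asyncpg


async def fetch_batch(conn, ids):
    # one round trip for a whole batch of ids; 'mytable' and 'value' are
    # placeholders for the real table and column names
    rows = await conn.fetch(
        'SELECT id, value FROM mytable WHERE id = ANY($1::text[])',
        ids,
    )
    # map each id back to its fetched value so it can be appended to the csv row
    return {r['id']: r['value'] for r in rows}

It would be called from the loop above as values = await fetch_batch(conn, [row[14] for row in batch]).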

2 Comments

But you also do this iteratively; maybe you should use while instead of if?
Yes, but now you are getting 500 results in one query, so the number of iterations decreases by a factor of 500.
