How to extract all rows from a large postgres table using python efficiently?

Question

I have been able to extract close to 3.5 mil rows from a postgres table using python and write to a file. However the process is extremely slow and I'm sure not the most efficient. Following is my code:

import psycopg2, time,csv
conn_string = "host='compute-1.amazonaws.com' dbname='re' user='data' password='reck' port=5433"
conn = psycopg2.connect(conn_string)
cursor = conn.cursor()
quert = '''select data from table;'''
cursor.execute(quert)

def get_data():
    while True:
        recs = cursor.fetchmany(10000)

        if not recs:
            break

        for columns in recs:
            # do transformation of data here
            yield(columns) 

solr_input=get_data()

with open('prc_ind.csv','a') as fh:
    for i in solr_input:
        count += 1

        if count % 1000 == 0:
             print(count)

         a,b,c,d = i['Skills'],i['Id'],i['History'],i['Industry']
         fh.write("{0}|{1}|{2}|{3}\n".format(a,b,c,d))

The table has about 8 mil rows. I want to ask is there is a better, faster and less memory intensive way to accomplish this.

Have you profiled it? Where is the bottleneck, and what are the constraints on time/memory/cpu/disk? — Chad S.
– Chad S., Commented Jan 4, 2016 at 19:16
if these are your real db credentials I suggest you delete this question, change your credentials on the server (and anywhere in your code), and paste your question again (without the db credentials). — Loïc
– Loïc, Commented Jan 4, 2016 at 19:25
@ChadS. I haven't profiled yet but I'll do it. All I wanted to know was whether my method has glaring programming flaw in it or not — ajaanbaahu
– ajaanbaahu, Commented Jan 4, 2016 at 19:35

Loïc · Accepted Answer · 2016-01-04 19:57:43Z

3

I can see four fields, so I'll assume you are selecting only these.

But even then, you are still loading 8 mil x 4 x n Bytes of data from what seems to be another server. So yes it'll take some time.

Though you are trying to rebuild the wheel, why not use the PostgreSQL client?

psql -d dbname -t -A -F"," -c "select * from users" > output.csv

answered Jan 4, 2016 at 19:57

Loïc

12k2 gold badges35 silver badges52 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

bucket · Accepted Answer · 2016-01-05 18:53:20Z

3

Psycopg2's copy_to command does the exact same thing as a psql dump, as Loïc suggested, except it's in the python side of things. I've found this to be the fastest way to get a table dump.

The formatting for certain data types (such as hstore/json and composite types) is a bit funky, but the command is very simple.

f = open('foobar.dat', 'wb')
cursor.copy_to(f, 'table', sep='|', columns=['skills', 'id', 'history', 'industry'])

Docs here: http://initd.org/psycopg/docs/cursor.html#cursor.copy_to

answered Jan 5, 2016 at 18:53

bucket

613 bronze badges

Collectives™ on Stack Overflow

How to extract all rows from a large postgres table using python efficiently?

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related