
I wonder what the fastest way is to write data from a pandas DataFrame to a table in a Postgres DB.

1) I've tried pandas.to_sql, but for some reason it takes an eternity to copy the data,

2) besides that, I've tried the following:

import io
f = io.StringIO()
pd.DataFrame({'a':[1,2], 'b':[3,4]}).to_csv(f)
cursor = conn.cursor()
cursor.execute('create table bbbb (a int, b int);COMMIT; ')
cursor.copy_from(f, 'bbbb', columns=('a', 'b'), sep=',')
cursor.execute("select * from bbbb;")
a = cursor.fetchall()
print(a)
cursor.close()

but it returns an empty list [].

So I have two questions: what is the fastest way to copy data from Python code (a DataFrame) to a Postgres DB, and what was incorrect in the second approach I tried?

1 Answer


Your second approach should be very fast.

There are two problems with your code:

  1. After writing the csv to f, you are positioned at the end of the file. You need to move your position back to the beginning before starting to read.
  2. When writing the csv, you need to omit the header and index.

Here is what your final code should look like:

import io
f = io.StringIO()
pd.DataFrame({'a':[1,2], 'b':[3,4]}).to_csv(f, index=False, header=False)  # omit header and index
f.seek(0)  # move position to beginning of file before reading
cursor = conn.cursor()
cursor.execute('create table bbbb (a int, b int);COMMIT; ')
cursor.copy_from(f, 'bbbb', columns=('a', 'b'), sep=',')
cursor.execute("select * from bbbb;")
a = cursor.fetchall()
print(a)
cursor.close()
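
If the data may contain commas, quotes, or NULLs, psycopg2's copy_expert with an explicit COPY ... FROM STDIN WITH (FORMAT csv) is a bit more robust than copy_from, since Postgres then handles the CSV quoting rules itself. A sketch along the lines of the answer (frame_to_buffer and copy_frame are illustrative names, not part of any library):

```python
import io

import pandas as pd

def frame_to_buffer(df):
    # Serialize without index or header, as COPY expects raw data rows
    buf = io.StringIO()
    df.to_csv(buf, index=False, header=False)
    buf.seek(0)  # rewind so COPY reads from the start
    return buf

def copy_frame(df, conn, table):
    # copy_expert takes a full COPY statement, so CSV quoting and
    # escaping are parsed by Postgres itself rather than by copy_from
    cols = ', '.join(df.columns)
    with conn.cursor() as cur:
        cur.copy_expert(
            f'COPY {table} ({cols}) FROM STDIN WITH (FORMAT csv)',
            frame_to_buffer(df),
        )
    conn.commit()
```

Usage would be copy_frame(df, conn, 'bbbb') after creating the table, assuming the DataFrame columns match the table columns by name.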

6 Comments

I receive the following error: InternalError: current transaction is aborted, commands ignored until end of transaction block - looks like some SQL error?
You are in the middle of a transaction that had an error in it. You need to do conn.rollback(); psycopg2 will start a new transaction automatically on the next command.
Thanks a lot, but still: DataError: extra data after last expected column CONTEXT: COPY bbbb, line 1: "0,1,3"
OK, it looks like it is trying to insert three values, but your table only has two columns. I think the extra value is the DataFrame index, so when you write the csv, you should also omit the index. I edited the code above to do that.
Finally, I got it! Thank you for your help! Could you also give a hint as to why the pandas.to_sql implementation is so slow?
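
On that last comment: pandas.to_sql is slow by default because it emits individual INSERT statements rather than a bulk COPY. to_sql accepts a callable for its method= parameter that can route the rows through COPY instead; a sketch following that pattern (copy_insert and rows_to_csv_buffer are illustrative names, and the serialization assumes plain numeric data with no commas, quotes, or NULLs):

```python
import io

def rows_to_csv_buffer(data_iter):
    # Naive CSV serialization; assumes values contain no commas or quotes
    buf = io.StringIO()
    buf.writelines(','.join(map(str, row)) + '\n' for row in data_iter)
    buf.seek(0)
    return buf

def copy_insert(table, conn, keys, data_iter):
    # Signature required by DataFrame.to_sql(method=...):
    # `table` is a pandas SQLTable, `conn` a SQLAlchemy connection,
    # `keys` the column names, `data_iter` an iterable of row tuples.
    cols = ', '.join(keys)
    with conn.connection.cursor() as cur:  # underlying psycopg2 connection
        cur.copy_expert(
            f'COPY {table.name} ({cols}) FROM STDIN WITH (FORMAT csv)',
            rows_to_csv_buffer(data_iter),
        )

# usage, assuming a SQLAlchemy engine named `engine`:
# df.to_sql('bbbb', engine, if_exists='append', index=False, method=copy_insert)
```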
