
I wonder what the fastest way is to write data from a pandas DataFrame to a table in a Postgres DB.

1) I've tried pandas.to_sql, but for some reason it takes an eternity to copy the data,

2) besides that, I've tried the following:

import io
f = io.StringIO()
pd.DataFrame({'a':[1,2], 'b':[3,4]}).to_csv(f)
cursor = conn.cursor()
cursor.execute('create table bbbb (a int, b int);COMMIT; ')
cursor.copy_from(f, 'bbbb', columns=('a', 'b'), sep=',')
cursor.execute("select * from bbbb;")
a = cursor.fetchall()
print(a)
cursor.close()

but it returns an empty list [].

So I have two questions: what is the fastest way to copy data from Python code (a DataFrame) to a Postgres DB, and what was incorrect in the second approach I tried?

1 Answer


Your second approach should be very fast.

There are two problems with your code:

  1. After writing the csv to f, you are positioned at the end of the file. You need to move your position back to the beginning before starting to read.
  2. When writing the csv, you need to omit the header and index.

Here is what your final code should look like:

import io
f = io.StringIO()
pd.DataFrame({'a':[1,2], 'b':[3,4]}).to_csv(f, index=False, header=False)  # omit header and index
f.seek(0)  # move position to beginning of file before reading
cursor = conn.cursor()
cursor.execute('create table bbbb (a int, b int);COMMIT; ')
cursor.copy_from(f, 'bbbb', columns=('a', 'b'), sep=',')
cursor.execute("select * from bbbb;")
a = cursor.fetchall()
print(a)
cursor.close()
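
If the data may contain commas, quotes, or NULLs, psycopg2's copy_expert with an explicit COPY ... FROM STDIN WITH (FORMAT csv) is a bit more robust than copy_from, since Postgres then handles the CSV quoting rules itself. A sketch along the lines of the answer (frame_to_buffer and copy_frame are illustrative names, not part of any library):

```python
import io

import pandas as pd

def frame_to_buffer(df):
    # Serialize without index or header, as COPY expects raw data rows
    buf = io.StringIO()
    df.to_csv(buf, index=False, header=False)
    buf.seek(0)  # rewind so COPY reads from the start
    return buf

def copy_frame(df, conn, table):
    # copy_expert takes a full COPY statement, so CSV quoting and
    # escaping are parsed by Postgres itself rather than by copy_from
    cols = ', '.join(df.columns)
    with conn.cursor() as cur:
        cur.copy_expert(
            f'COPY {table} ({cols}) FROM STDIN WITH (FORMAT csv)',
            frame_to_buffer(df),
        )
    conn.commit()
```

Usage would be copy_frame(df, conn, 'bbbb') after creating the table, assuming the DataFrame columns match the table columns by name.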

6 Comments

I receive the following error: InternalError: current transaction is aborted, commands ignored until end of transaction block - looks like some SQL error?
You are in the middle of a transaction that had an error in it. You need to do conn.rollback(); psycopg2 will start a new transaction automatically on the next command.
Thanks a lot, but still: DataError: extra data after last expected column CONTEXT: COPY bbbb, line 1: "0,1,3"
OK, it looks like it is trying to insert three values, but your table only has two columns. I think the extra value is the DataFrame index, so when you write the csv, you should also omit the index. I edited the code above to do that.
Finally, I got it! Thank you for your help! Could you also give a hint as to why the pandas.to_sql implementation is so slow?
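
On that last comment: pandas.to_sql is slow by default because it emits individual INSERT statements rather than a bulk COPY. to_sql accepts a callable for its method= parameter that can route the rows through COPY instead; a sketch following that pattern (copy_insert and rows_to_csv_buffer are illustrative names, and the serialization assumes plain numeric data with no commas, quotes, or NULLs):

```python
import io

def rows_to_csv_buffer(data_iter):
    # Naive CSV serialization; assumes values contain no commas or quotes
    buf = io.StringIO()
    buf.writelines(','.join(map(str, row)) + '\n' for row in data_iter)
    buf.seek(0)
    return buf

def copy_insert(table, conn, keys, data_iter):
    # Signature required by DataFrame.to_sql(method=...):
    # `table` is a pandas SQLTable, `conn` a SQLAlchemy connection,
    # `keys` the column names, `data_iter` an iterable of row tuples.
    cols = ', '.join(keys)
    with conn.connection.cursor() as cur:  # underlying psycopg2 connection
        cur.copy_expert(
            f'COPY {table.name} ({cols}) FROM STDIN WITH (FORMAT csv)',
            rows_to_csv_buffer(data_iter),
        )

# usage, assuming a SQLAlchemy engine named `engine`:
# df.to_sql('bbbb', engine, if_exists='append', index=False, method=copy_insert)
```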
