
I'm testing the to_sql method of pandas DataFrame.

to_sql uses the INSERT SQL statement, which is slower than COPY FROM.

Can I make to_sql use COPY FROM instead, with a SQLAlchemy and PostgreSQL (psycopg2) engine?


2 Answers


This isn't quite true. You can make this use COPY FROM by providing your own function for the method keyword argument that to_sql accepts. The key bit in the pandas documentation for DataFrame.to_sql is that method can be not only a string but also a callable, and that callable is what you'll provide.

For example:

from csv import (writer as csv_writer, QUOTE_MINIMAL)
from io import StringIO

def some_function(df, connection):
    try:
        df.to_sql(
            'my_table',
            connection,
            schema='public',
            if_exists='append',
            index=False,
            method=copy_from_method)
    except Exception as err:
        print('Got an error ({})'.format(str(err)))

Then, you could implement copy_from_method as something like this:

def copy_from_method(table, conn, keys, data_iter, pre_truncate=False, fatal_failure=False):
    """Custom method for pandas.DataFrame.to_sql that will use COPY FROM"""

    # Get the raw psycopg2 connection underneath the SQLAlchemy connection
    dbapi_conn = conn.connection
    cur = dbapi_conn.cursor()

    # Write the incoming rows to an in-memory CSV buffer
    s_buf = StringIO()
    writer = csv_writer(s_buf, quoting=QUOTE_MINIMAL)
    writer.writerows(data_iter)
    s_buf.seek(0)

    columns = ', '.join('"{}"'.format(k) for k in keys)
    table_name = '{}.{}'.format(
        table.schema, table.name) if table.schema else table.name

    # Stream the buffer to Postgres with COPY ... FROM STDIN
    sql_query = 'COPY %s (%s) FROM STDIN WITH CSV' % (table_name, columns)
    cur.copy_expert(sql=sql_query, file=s_buf)
    return cur.rowcount

The example copy_from_method could obviously use some work, and you'll need to be careful with quoting and escaping in the CSV writer in some cases. But this rough version should work relatively well, barring any typos or copy-paste errors (it's a stripped-down version of something I use in a project).
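For completeness, here's a minimal end-to-end sketch of calling to_sql with this method; the connection string, table name, and DataFrame contents are just placeholders you'd replace with your own:

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string; adjust user, password, host and database
engine = create_engine('postgresql+psycopg2://user:password@localhost/mydb')

df = pd.DataFrame({'id': [1, 2, 3], 'name': ['a', 'b', 'c']})

# Pass the callable directly as the method argument
df.to_sql('my_table', engine, schema='public',
          if_exists='append', index=False,
          method=copy_from_method)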

EDIT: For big datasets, don't use format(); f-strings are faster.
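For example, the query construction above could be rewritten like this (same behavior, just f-strings):

columns = ', '.join(f'"{k}"' for k in keys)
table_name = f'{table.schema}.{table.name}' if table.schema else table.name
sql_query = f'COPY {table_name} ({columns}) FROM STDIN WITH CSV'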




You are correct that to_sql uses INSERT INTO via SQLAlchemy (the code where this happens is here), so out of the box you cannot use COPY FROM with to_sql.

But if you need this, you can of course write your own implementation of to_sql that uses it. And if you think it has much better performance with the same functionality as the current implementation, you can always send a contribution to pandas (https://github.com/pydata/pandas/blob/master/CONTRIBUTING.md).

But, as far as I understand, if you just want to execute an SQL statement like "COPY table FROM file", you can always execute it directly against the Postgres connection.
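A rough sketch of what that might look like with plain psycopg2 (the connection parameters, table, and file path are placeholders):

import psycopg2

# Hypothetical connection parameters
conn = psycopg2.connect('dbname=mydb user=user password=password host=localhost')
cur = conn.cursor()

# Stream a local CSV file into the table via COPY
with open('/tmp/my_table.csv') as f:
    cur.copy_expert('COPY public.my_table FROM STDIN WITH CSV HEADER', f)

conn.commit()
cur.close()
conn.close()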

