
I feel like this is a very basic question, but I cannot find a comprehensive answer. I am wondering what the best practice is for copying Python data types, such as lists or dictionaries, to a PostgreSQL database (using psycopg2).

Assuming I create or have a table that I want to fill up with data from a previous calculation, like

import random

data = [(n, random.randint(0, 100)) for n in range(0, 100)]

As far as I know, the standard way to copy the data would be something like

curs.executemany("""
  INSERT INTO my_table (id, rand_int)
  VALUES (%s, %s)""", data)

I am guessing this will loop through the list and be rather slow. Is there a smarter or more efficient way?

UPDATE:

In the meantime I found this answer, suggesting to use something like:

args_str = ','.join(cur.mogrify("(%s,%s)", x) for x in data)
cur.execute("INSERT INTO table VALUES " + args_str) 

And Craig suggests in his answer here to use copy_from. So my question has changed a little:

What is the most efficient method and how could it be implemented?

  • I'm voting to close this question as off-topic because this is a Python question and, as such, belongs on StackOverflow - recommend migration. Commented Jun 21, 2015 at 21:29
  • @Vérace While I agree that it is OT here, questions about improving a working solution are not overly welcome on SO. Commented Jun 21, 2015 at 21:56
  • I've only posted a few times on SO (mostly to give a bounty for a problem I was having). I'm surprised though - surely optimisation is a significant part of programming? AFAIC, questions about optimising poorly performing SQL are on-topic here. Commented Jun 21, 2015 at 21:59
  • Oh, I was under the impression that all database-related questions should go here (after seeing people complaining on SO). I will move the question to SO once I am back on my computer. Thanks anyway. Commented Jun 21, 2015 at 22:53

1 Answer


For big data sets, use COPY, via psycopg2's copy_from function.

http://initd.org/psycopg/docs/cursor.html#cursor.copy_from

See also how to speed up insertion performance in PostgreSQL.
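
A minimal sketch of what this could look like for the example data in the question, assuming a table my_table(id integer, rand_int integer) and a placeholder connection string (copy_from reads a file-like object in PostgreSQL's text COPY format, tab-separated by default):

import io
import random
import psycopg2

data = [(n, random.randint(0, 100)) for n in range(0, 100)]

# build an in-memory "file" with one tab-separated line per row
buf = io.StringIO()
for row in data:
    buf.write('\t'.join(str(v) for v in row) + '\n')
buf.seek(0)

conn = psycopg2.connect("dbname=test")  # placeholder connection string
with conn, conn.cursor() as curs:
    curs.copy_from(buf, 'my_table', columns=('id', 'rand_int'))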


4 Comments

Thank you! copy_from is actually the function that got me thinking in the first place. However, I am not sure how to make it work with Python data types. Would I need to pickle them first or something? May I ask you to give an example of how this could look, for instance based on my dummy case above?
You need to form a CSV row per tuple, where each field is a PostgreSQL-input-compatible representation of a value of that column's type. The Python csv module is very useful for creating the stream of rows. Most values are trivial to store - integers, floating point values, text, etc you just store as-is. Dates you just format as YYYY-MM-DD. Only a few things, like timestamps, take any real thought about how to represent them in the CSV input.
So something like data_csv = [str(i) for i in data] and then cur.copy_from(data_csv, 'my_table', columns=('id', 'rand_int'))? Would you mind if I edited your answer?
No, because data_csv is a list. You need a file-like object where each data row is a comma-separated string, terminated with a newline. Use the Python csv module. I don't have time to write a canned example right now, but I have the feeling I might've written one in the past; search around for examples using psycopg2's copy_from. A rough sketch along those lines follows below.
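
Following the approach described in the comments (one CSV row per tuple, written into a file-like object with the csv module), a rough sketch might look like this; the table name comes from the question's dummy example, the connection string is a placeholder, and copy_expert with CSV format handles quoting that plain copy_from's text format does not:

import csv
import io
import random
import psycopg2

data = [(n, random.randint(0, 100)) for n in range(0, 100)]

# write the rows as CSV into an in-memory file-like object
buf = io.StringIO()
csv.writer(buf).writerows(data)
buf.seek(0)

conn = psycopg2.connect("dbname=test")  # placeholder connection string
with conn, conn.cursor() as curs:
    curs.copy_expert("COPY my_table (id, rand_int) FROM STDIN WITH (FORMAT csv)", buf)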
