2

Ok my giant friends, once again I seek a little space on your shoulders :P

Here is the issue: I have a Python script that is fixing some database issues, but it is taking way too long. The main update statement is this:

cursor.execute("UPDATE jiveuser SET username = '%s' WHERE userid = %d" % (newName,userId))

That is getting called about 9500 times with different newName and userid pairs...

Any suggestions on how to speed up the process? Maybe there is a way I can do all the updates with just one query?

Any help will be much appreciated!

PS: Postgres is the db being used.

4
  • Could you post the output of "EXPLAIN ANALYZE <your_query>"? Commented Jun 3, 2010 at 17:57
  • Can you give some idea of how long is too long? Ten seconds, five hours? Commented Jun 6, 2010 at 20:19
  • @Anonymouse I need it to run in 2.5h and it is currently taking 5 Commented Jun 7, 2010 at 23:20
  • I don't understand why this can't run in seconds? Commented Jun 9, 2010 at 1:23

8 Answers

4

Insert all the data into another empty table (called userchanges, say) then UPDATE in a single batch:

UPDATE jiveuser
SET username = userchanges.username
FROM userchanges
WHERE userchanges.userid = jiveuser.userid
    AND userchanges.username <> jiveuser.username

See this documentation on the COPY command for bulk loading your data.

There are also tips for improving performance when populating a database.
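A rough sketch of that flow from Python, assuming psycopg2 and that the (userid, new username) pairs are already collected in a list called changes (the connection string and variable names are illustrative, not from the question):

import io
import psycopg2

changes = [(101, 'alice'), (102, 'bob')]  # illustrative (userid, username) pairs

conn = psycopg2.connect("dbname=jive")  # hypothetical connection string
cur = conn.cursor()
# The temp table vanishes at the end of the session, so no cleanup is needed.
cur.execute("CREATE TEMP TABLE userchanges (userid integer PRIMARY KEY, username text NOT NULL)")
# Bulk-load the pairs with COPY instead of issuing ~9500 INSERTs.
# (Values containing tabs or newlines would need escaping in a real run.)
buf = io.StringIO()
for userid, username in changes:
    buf.write("%d\t%s\n" % (userid, username))
buf.seek(0)
cur.copy_from(buf, 'userchanges', columns=('userid', 'username'))
# One set-based UPDATE instead of one UPDATE per row.
cur.execute("""
    UPDATE jiveuser
    SET username = userchanges.username
    FROM userchanges
    WHERE userchanges.userid = jiveuser.userid
      AND userchanges.username <> jiveuser.username
""")
conn.commit()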


9 Comments

Make that a temporary table and you don't have to bother about it after you're done.
@Cade Roux: I updated your statement a bit as PostgreSQL's UPDATE...FROM construct differs a little from what you wrote (but close enough that I only noticed it on the second pass...)
Also, look into using COPY to push the data to the server, although it will need some driver-specific support to do that (usually called putcopydata or similar)
So in order to create that temp table I'll need to run the same number of insert statements as I was running update statements before... Are inserts that much faster than updates done based on an indexed field?
@Jay, Like I said, do one insert of 9500 rows and one update of 9500 rows. Fewer large set operations are generally faster on RDBMS than many small set operations.
3

First of all, do not use the % operator to construct your SQL. Instead, pass your tuple of arguments as the second parameter to cursor.execute, which also negates the need to quote your argument and allows you to use %s for everything:

cursor.execute("UPDATE jiveuser SET username = %s WHERE userid = %s", (newName, userId))

This is important to prevent SQL Injection attacks.

To answer your question, you can speed up these updates by creating an index on the userid column, which lets the database locate each row in roughly O(log n) time rather than scanning the entire table, which is O(n). Since you're using PostgreSQL, here's the syntax to create your index:

CREATE INDEX username_lookup ON jiveuser (userid);

EDIT: Since your comment reveals that you already have an index on the userid column, there's not much you could possibly do to speed up that query. So your main choices are either living with the slowness, since this sounds like a one-time fix-something-broken thing, or following VeeArr's advice and testing whether cursor.executemany will give you a sufficient boost.

4 Comments

Well I have an index for the userid which is what is in the where clause... would an index on the username really help?
@hdx: No, that was my mistake, I carelessly typed username rather than userid, and I didn't realize you already had an index on that column. I have just corrected my typo.
Is %s the placeholder for the Postgres DB-API? I don't know, I'm asking, since the placeholder is typically ?.
@TZ: %s is the DB-API 2.0 standard placeholder; implementations may use placeholders such as ? as well as %s, but my understanding is that %s is required to work for an implementation to be considered DB-API 2.0 compliant.
3

The reason it's taking so long is probably that you've got autocommit enabled and each update gets done in its own transaction.

This is slow because even if you have a battery-backed raid controller (which you should definitely have on all database servers, of course), it still needs to do a write into that device for every transaction commit to ensure durability.

The solution is to do more than one row per transaction. But don't make transactions TOO big or you run into problems too. Try committing every 10,000 rows of changes as a rough guess.
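A minimal sketch of that batching with psycopg2, assuming the (newName, userId) pairs are already collected in a list called changes and that conn is an open connection (both assumptions, not from the question):

BATCH_SIZE = 10000  # rough guess, per the advice above

conn.autocommit = False  # make sure each UPDATE does not commit on its own
cur = conn.cursor()
for i, (new_name, user_id) in enumerate(changes, start=1):
    cur.execute("UPDATE jiveuser SET username = %s WHERE userid = %s",
                (new_name, user_id))
    if i % BATCH_SIZE == 0:
        conn.commit()  # one commit (and one WAL flush) per batch, not per row
conn.commit()  # commit whatever is left in the final partial batch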

Comments

2

You might want to look into executemany(): Information here
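For what it's worth, a sketch of what that might look like, assuming a DB-API connection conn and a list of (newName, userId) tuples called changes (names are illustrative):

cur = conn.cursor()
# executemany still runs one statement per pair, but saves Python-side overhead
cur.executemany("UPDATE jiveuser SET username = %s WHERE userid = %s", changes)
conn.commit()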

Comments

1

Perhaps you can create an index on userid to speed things up.

Comments

1

I'd do an explain on this. If it's doing an indexed lookup to find the record -- which it should if you have an index on userid -- then I don't see what you could do to improve performance. If it's not using the index, then the trick is figuring out why not and fixing it.

Oh, you could try using a prepared statement. With 9500 updates, that should help.
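If it helps, a small sketch of checking the plan from Python, assuming a psycopg2 connection conn (plain EXPLAIN only plans the statement, it does not run the update; the literal values are just placeholders):

cur = conn.cursor()
cur.execute("EXPLAIN UPDATE jiveuser SET username = %s WHERE userid = %s",
            ('someName', 42))
for (line,) in cur.fetchall():
    print(line)  # prints each line of the query plan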

Comments

0

Move this into a stored procedure and execute it from the database itself.

1 Comment

Well, by the time you've put the data onto the db in a temp table or similar, you don't need a server function.

0

First, ensure you have an index on userid; this will ensure the DBMS doesn't have to do a table scan each time

CREATE INDEX jiveuser_userid ON jiveuser (userid);

Next, try preparing the statement and then calling EXECUTE on it. This stops the planner from having to re-plan the query each time

PREPARE update_username(text, integer) AS UPDATE jiveuser SET username = $1 WHERE userid = $2;
EXECUTE update_username('New Name', 123);

Finally, a bit more performance could be squeezed out by turning off autocommit

\set AUTOCOMMIT off

Comments
