Locking Postgres Tables with Twisted Python

Question

I have a Twisted daemon that does some XML feed parsing.

I store my data in PostgreSQL via twisted.enterprise.adbapi, which IIRC is wrapping psycopg2.

I've run into a few problems with storing data in the database: duplicate data periodically gets in there.

To be honest, there are some underlying issues with my implementation, which should be redone and designed much better. I lack the time and resources to do that, though, so we're in 'just keep it running' mode for now.

I think the problem may come from either my usage of deferToThread or the way I spawn the server at startup.

As a quick overview of the functionality I think is at fault:

Twisted queries Postgres for accounts that should be analyzed, and sets a block on them:

SELECT 
    id 
FROM 
    xml_sources 
WHERE 
    timestamp_last_process < ( CURRENT_TIMESTAMP AT TIME ZONE 'UTC' - INTERVAL '4 HOUR' ) 
    AND
    is_processing_block IS NULL ;

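# psycopg2 renders a tuple as (1, 2, 3), which is what IN expects; a list would be rendered as ARRAY[1, 2, 3]
lock_ids = tuple(row['id'] for row in results)

UPDATE xml_sources SET is_processing_block = True WHERE id IN %(lock_ids)s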

What I think is happening is that something (accidentally having multiple servers running, or various other issues) results in multiple threads processing the same data.

I think this would likely be fixed, or at least ruled out as a cause, if I wrapped this short section in an exclusive table lock.

I've never done table locking through Twisted before, though. Can anyone point me in the right direction?

Glyph · Accepted Answer · 2012-09-12 23:28:37Z


You can do a SELECT FOR UPDATE instead of a plain SELECT, and that will lock the rows returned by your query. If you actually want table locking you can just issue a LOCK statement, but based on the rest of your question I think you want row locking.
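A minimal sketch of both options, assuming the xml_sources columns from the question and psycopg2 behind adbapi. claim_stale_sources is a hypothetical helper; it expects the transaction object that runInteraction passes to an interaction (which forwards execute and fetchall to the underlying cursor). The default path uses row locking; table_lock=True shows the LOCK alternative instead.

```python
# Sketch only: claim_stale_sources is a hypothetical helper, and the SQL assumes
# the xml_sources columns shown in the question (PostgreSQL via psycopg2).

SELECT_STALE = """
    SELECT id
    FROM xml_sources
    WHERE timestamp_last_process < (CURRENT_TIMESTAMP AT TIME ZONE 'UTC' - INTERVAL '4 HOUR')
      AND is_processing_block IS NULL
"""

# Row locking: FOR UPDATE locks every returned row until the transaction ends.
# A concurrent claimer blocks on those rows and, under the default READ COMMITTED
# isolation, re-checks the WHERE clause after the first transaction commits,
# so rows that were flagged in the meantime are no longer returned.
SELECT_STALE_FOR_UPDATE = SELECT_STALE + "    FOR UPDATE"

# Table locking: EXCLUSIVE mode still lets plain SELECTs read the table, but
# blocks other writers and other SELECT ... FOR UPDATE until this transaction ends.
LOCK_TABLE = "LOCK TABLE xml_sources IN EXCLUSIVE MODE"

# psycopg2 adapts a Python list to an ARRAY, hence = ANY(...) rather than IN.
MARK_BLOCKED = "UPDATE xml_sources SET is_processing_block = TRUE WHERE id = ANY(%(lock_ids)s)"


def claim_stale_sources(txn, table_lock=False):
    """Select and flag stale sources inside one transaction; return their ids."""
    if table_lock:
        txn.execute(LOCK_TABLE)
        txn.execute(SELECT_STALE)
    else:
        txn.execute(SELECT_STALE_FOR_UPDATE)
    # Default (non-dict) cursors return tuples, so take the first column.
    lock_ids = [row[0] for row in txn.fetchall()]
    if lock_ids:
        txn.execute(MARK_BLOCKED, {"lock_ids": lock_ids})
    return lock_ids
```

Both locks are released when the transaction commits or rolls back, so the SELECT and the UPDATE have to run in the same transaction for either variant to help.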

If you are using adbapi, then keep in mind that you will need to use runInteraction if you want to run more than one statement in a transaction. Functions passed to runInteraction will run in a thread, so you may need to use callFromThread or blockingCallFromThread to reach from the database interaction back into the reactor.
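A sketch of that wiring, with hypothetical names (claim_and_report, start_claim, report). pool is assumed to be a twisted.enterprise.adbapi.ConnectionPool and reactor the running Twisted reactor; they are passed in as parameters so the sketch itself does not import Twisted. Extra positional arguments given to runInteraction are handed to the interaction after the transaction object, which is how reactor.callFromThread gets into the worker thread here.

```python
# Sketch with hypothetical names. In real use (assumption):
#   from twisted.enterprise import adbapi
#   from twisted.internet import reactor
#   pool = adbapi.ConnectionPool("psycopg2", "dbname=feeds")  # made-up DSN
#   d = start_claim(pool, reactor, report=print)

# The staleness condition from the question is omitted here for brevity.
CLAIM_SQL = "SELECT id FROM xml_sources WHERE is_processing_block IS NULL FOR UPDATE"
MARK_SQL = "UPDATE xml_sources SET is_processing_block = TRUE WHERE id = ANY(%(ids)s)"


def claim_and_report(txn, call_from_thread, report):
    """Runs in an adbapi worker thread; both statements share one transaction.

    adbapi commits when this function returns and rolls back if it raises.
    """
    txn.execute(CLAIM_SQL)
    ids = [row[0] for row in txn.fetchall()]
    if ids:
        txn.execute(MARK_SQL, {"ids": ids})
    # Reactor-side code is not thread-safe, so ask the reactor thread to call
    # `report` instead of calling it here. It may run before the commit, so
    # use it for progress messages only, not for committed results.
    call_from_thread(report, "claimed %d sources" % len(ids))
    return ids


def start_claim(pool, reactor, report):
    """Call from the reactor thread.

    The returned Deferred fires with the claimed ids after the transaction
    has been committed.
    """
    return pool.runInteraction(claim_and_report, reactor.callFromThread, report)
```

If the code inside the interaction needs a result back from the reactor (rather than just notifying it), twisted.internet.threads.blockingCallFromThread(reactor, f, *args) is the variant that waits for f to finish.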

However, locking may not be your problem. For one thing, if you are mixing deferToThread and adbapi, something's likely wrong. adbapi is already doing the equivalent of deferToThread for you. You should be able to do everything on the main thread.
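As a sketch of keeping everything on the main thread: the function below is meant to be called from the reactor thread and never uses deferToThread. pool is assumed to be an adbapi ConnectionPool; run_cycle, interaction, process_ids and log_failure are hypothetical placeholders for your own code.

```python
# Sketch: all names here are hypothetical placeholders.


def run_cycle(pool, interaction, process_ids, log_failure):
    """Start one claim/process cycle; call this from the reactor thread.

    runInteraction already runs `interaction` in adbapi's own thread pool and
    returns a Deferred, so there is no need to wrap it in deferToThread. The
    Deferred fires back on the reactor thread, so process_ids and log_failure
    may use Twisted APIs directly.
    """
    d = pool.runInteraction(interaction)
    d.addCallback(process_ids)
    # Added after the callback so it also catches errors raised by
    # process_ids. Returning None from log_failure marks the failure handled.
    d.addErrback(log_failure)
    return d
```

In the daemon, something like task.LoopingCall(run_cycle, pool, claim_interaction, process_ids, log_failure).start(3600) could drive it (claim_interaction is another placeholder, and the interval in seconds is arbitrary). When the called function returns a Deferred, LoopingCall waits for it to fire before scheduling the next call, so cycles within one process do not overlap; a second daemon is still only kept out by the database lock.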

You'll have to include a representative example for a more specific answer though. Consider your question: it's basically "Sometimes I get duplicate data, with a self-admittedly problematic implementation, that is big and I can't fix and I also can't show you." This is not a question which it is possible to answer.


2 Comments

Jonathan Vanasco

Thanks. 1. I'll implement the SELECT FOR UPDATE. 2. I'll read up on runInteraction.

Jonathan Vanasco

The full 'thanks' didn't go through in the comment. Yes, the question could have been better, but I would have had to reformat 200+ lines of code to convey this, so I should do that in another question.
