I have a Twisted daemon that does some XML feed parsing.
I store my data in PostgreSQL via twisted.enterprise.adbapi, which IIRC wraps psycopg2.
I've run into a few problems storing data in the database: duplicate data periodically gets in there.
To be honest, there are some underlying issues with my implementation which should be redone and designed much better. I lack the time and resources to do that though - so we're in 'just keep it running' mode for now.
I think the problem may stem from either my use of deferToThread or how I spawn the server at startup.
As a quick overview of the functionality I think is at fault:
Twisted queries Postgres for accounts that should be analyzed, and sets a block on them:
    SELECT id
    FROM xml_sources
    WHERE timestamp_last_process < ( CURRENT_TIMESTAMP AT TIME ZONE 'UTC' - INTERVAL '4 HOUR' )
      AND is_processing_block IS NULL ;

    lock_ids = [ i['id'] for i in results ]

    UPDATE xml_sources SET is_processing_block = True WHERE id IN %(lock_ids)s
What I think is happening is that accidentally having multiple servers running, or various other issues, results in multiple threads processing this data.
I think this would likely be fixed - or at least ruled out as an issue - if I wrap this quick section in an exclusive table lock.
I've never done table locking through Twisted before, though. Can anyone point me in the right direction?
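For concreteness, this is roughly what I imagine the locked section looking like. It is an untested sketch, not something I have run against the real database. The function name claim_sources and the three SQL constants are made up for illustration. It assumes the rows come back dict-style (as in my i['id'] snippet above), and it uses "= ANY(...)" with a list instead of my original "IN", since as far as I know psycopg2 adapts a Python list to a Postgres array.

```python
# Hypothetical names throughout: claim_sources, LOCK_SQL, SELECT_SQL, UPDATE_SQL.

# EXCLUSIVE mode still lets other sessions do plain reads of the table, but a
# second session asking for the same lock waits until this transaction ends.
LOCK_SQL = "LOCK TABLE xml_sources IN EXCLUSIVE MODE"

SELECT_SQL = """
    SELECT id
    FROM xml_sources
    WHERE timestamp_last_process
              < (CURRENT_TIMESTAMP AT TIME ZONE 'UTC' - INTERVAL '4 HOUR')
      AND is_processing_block IS NULL
"""

# Assumption: "= ANY(list)" replaces the original "IN %(lock_ids)s".
UPDATE_SQL = """
    UPDATE xml_sources
    SET is_processing_block = True
    WHERE id = ANY(%(lock_ids)s)
"""


def claim_sources(txn):
    """Lock the table, select unblocked sources, and mark them as blocked.

    `txn` is the cursor-like object that adbapi hands to an interaction.
    All three statements share one transaction, so the table lock is held
    until that transaction commits or rolls back.
    """
    txn.execute(LOCK_SQL)
    txn.execute(SELECT_SQL)
    # Assumption: rows are dict-like, matching i['id'] in the snippet above.
    lock_ids = [row["id"] for row in txn.fetchall()]
    if lock_ids:
        # Skip the UPDATE entirely when nothing is due for processing.
        txn.execute(UPDATE_SQL, {"lock_ids": lock_ids})
    return lock_ids
```

The idea would be to call it as dbpool.runInteraction(claim_sources), where dbpool is my existing adbapi.ConnectionPool. As I understand it, runInteraction runs the function in a pool thread inside a single transaction and returns a Deferred that fires with the list of claimed ids, so a second server hitting the same section should wait on the LOCK until the first one has committed its UPDATE.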