I have a PostgreSQL events table partitioned by event_timestamp:
CREATE TABLE events
(
    id SERIAL,
    event_timestamp TIMESTAMP NOT NULL,
    processed BOOLEAN DEFAULT FALSE,
    payload JSONB,
    -- a primary key on a partitioned table must include the partition key
    PRIMARY KEY (id, event_timestamp)
) PARTITION BY RANGE (event_timestamp);
Currently, a single worker polls and processes events, marking them as processed to avoid reprocessing. The query used is:
SELECT *
FROM events
WHERE processed = false
ORDER BY event_timestamp
LIMIT 10_000;
To increase throughput, I need multiple workers. However, this risks duplicate processing as workers may select the same events simultaneously.
I'm seeking an efficient strategy to allow multiple workers to process events concurrently without duplicates. The solution should ensure each event is processed exactly once. How can I achieve this in PostgreSQL? Any guidance or examples would be greatly appreciated.
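Add FOR UPDATE SKIP LOCKED at the end of this SELECT and that's it: each worker locks the rows it selects, and other workers skip rows that are already locked instead of waiting on them. The locks are held until the transaction ends, so mark the rows as processed in that same transaction before committing. Make sure your workers use separate sessions/transactions; some connection pools can be configured to reuse the same session and transaction for different queries, which won't work with this type of locking.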
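For illustration, here is a minimal sketch of what one worker's polling transaction might look like. The CTE name batch and the choice to flip processed in the same statement that claims the rows are assumptions made for the example, not something the question specifies; the batch size is the one from the question.

```sql
BEGIN;

-- Claim up to 10000 unprocessed rows. Rows already locked by another
-- worker's open transaction are skipped instead of being waited on.
-- (LIMIT is written without the underscore so it also parses on
-- PostgreSQL versions older than 16.)
WITH batch AS (
    SELECT id, event_timestamp
    FROM events
    WHERE processed = false
    ORDER BY event_timestamp
    LIMIT 10000
    FOR UPDATE SKIP LOCKED
)
UPDATE events AS e
SET processed = true
FROM batch AS b
WHERE e.id = b.id
  AND e.event_timestamp = b.event_timestamp  -- lets the planner prune partitions
RETURNING e.id, e.event_timestamp, e.payload;

-- The worker handles the returned rows here, then commits.
-- A ROLLBACK (or a crashed connection) undoes the update and releases
-- the locks, so the rows become available to other workers again.
COMMIT;
```

Because processed only becomes visible as true after COMMIT, a worker that fails midway leaves its batch unclaimed. Note that RETURNING does not guarantee the rows come back in event_timestamp order, so sort them in the worker if order matters.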