
I have a Postgres (PostgreSQL 12.3) table with 100k rows, and a Python script that reads from this table and performs some action based on the data. I want to run the same script on multiple machines so the data can be processed faster. But when I run it from multiple machines, I need to make sure a given row is processed by only one machine at a time, basically by locking that row.

Can you provide some pointers on how row locking can be achieved through Python? I am using the psycopg2 module to read and update data in the table, but did not find a way to lock a row.

1 Answer


Use SELECT ... FOR UPDATE SKIP LOCKED, which takes a row-level lock on each row it returns and skips any rows already locked by another transaction.

Session 1:

testdb=# create table tt(x text);
CREATE TABLE
testdb=# insert into tt select 'foo';
INSERT 0 1
testdb=# insert into tt select 'bar';
INSERT 0 1
testdb=# BEGIN;
BEGIN
testdb=# SELECT * FROM tt ORDER BY x LIMIT 1 FOR UPDATE SKIP LOCKED;
  x  
-----
 bar
(1 row)

testdb=# -- SELECT in session 2 is performed now.
testdb=# commit;
COMMIT

Session 2:

testdb=# BEGIN;
BEGIN
testdb=# SELECT * FROM tt ORDER BY x LIMIT 1 FOR UPDATE SKIP LOCKED;
  x  
-----
 foo
(1 row)

testdb=# commit;
COMMIT

You won't need to do anything special with psycopg2 for this. As long as your query already has a reasonably small LIMIT (so the SELECT doesn't lock every row), adding FOR UPDATE SKIP LOCKED should do what you're asking for. See the docs on the locking clause for more details.
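A minimal worker sketch with psycopg2, assuming the tt/x table from the example above; the DSN, the `process()` body, and the one-row-at-a-time loop are placeholders, not part of the answer:

```python
def claim_sql(limit=1):
    """Build the claim query; the small LIMIT keeps the lock footprint small."""
    return ("SELECT x FROM tt ORDER BY x LIMIT %s FOR UPDATE SKIP LOCKED",
            (limit,))

def process(row):
    """Placeholder for whatever your script does with each row."""
    print("processing", row)

def worker(dsn):
    """Claim rows one at a time until none are left unlocked."""
    import psycopg2  # imported here so the sketch loads without the driver
    conn = psycopg2.connect(dsn)
    try:
        while True:
            with conn.cursor() as cur:
                sql, params = claim_sql()
                cur.execute(sql, params)  # transaction starts implicitly here
                row = cur.fetchone()
                if row is None:
                    break                 # every remaining row is locked or gone
                process(row)
            conn.commit()                 # releases the row lock
    finally:
        conn.close()                      # closing without a commit rolls back
```

Each machine would run something like `worker("dbname=testdb")`. Note that once a worker commits, the row is unlocked and becomes claimable again; in practice the per-row action would also mark the row as done (e.g. an UPDATE in the same transaction), but that bookkeeping is outside the scope of the answer.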


5 Comments

Session 1:

postgres=# select * from company limit 1 for update skip locked;
 id | name | age | address    | salary
----+------+-----+------------+--------
  1 | Paul |  32 | California |  20000

Session 2:

postgres=# select * from company limit 1 for update skip locked;
  1 | Paul |  32 | California |  20000

I tried the same, but got the same output in both sessions. I am using the psql shell from a default pg install.
In psql, unless you issue a BEGIN, every statement is its own transaction, so the lock is released as soon as the SELECT finishes. With psycopg2, the transaction is started implicitly and lasts until you issue a connection.commit().
Thanks, it works now. Just curious: is there any chance that two sessions could pick up the same row, with one finally getting the lock? I am wondering which session would process the data in that case, and what happens to the other session. In my setup, I am using the same user to access the table in both sessions.
Also, for psycopg2, can we say that the transaction will close if the script exits even if connection.commit() was never issued, and hence the lock is released?
No: if a connection with an open transaction gets a row from a SELECT FOR UPDATE SKIP LOCKED query, no other transaction can return that row from its own SELECT FOR UPDATE until the first has either committed or rolled back. And yes, a connection closing without issuing a commit is treated as a rollback, and the lock will be released as soon as PG notices the connection is closed.
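One way to get the commit-or-rollback behavior described above deterministically: psycopg2 connections work as context managers, where `with conn:` commits on success and rolls back if the body raises, so the row lock is released either way. A sketch, reusing the answer's tt/x table (the `conn` argument is assumed to be whatever psycopg2.connect returned):

```python
def claim_and_process(conn, process):
    """Claim one row under SKIP LOCKED and release its lock deterministically.

    `with conn:` commits the transaction if `process` succeeds and rolls it
    back if `process` raises, so a failed worker never leaves the row locked.
    """
    with conn:
        with conn.cursor() as cur:
            cur.execute("SELECT x FROM tt ORDER BY x LIMIT 1"
                        " FOR UPDATE SKIP LOCKED")
            row = cur.fetchone()   # None if every row is locked or gone
            if row is not None:
                process(row)
            return row
```

Note that `with conn:` ends the transaction but does not close the connection, so the same `conn` can be passed in again for the next row.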
