I have a simple scenario: I want to atomically read and modify the state of a row, but the row may not exist yet.

For this example, I use the user_group_membership table:

user_id (pk) | group_id (pk) | state
-------------------------------------
1            | 3             | joined
  • User 1 is a member of Group 3 with state joined (the state can also be invited, left or banned).
  • User 2 is not a member of Group 3 and never has been, since there is no row for that pair in the table.

The state value works like a state machine. There's a limited set of transitions:

null (no row present) -> invited, banned
invited -> joined, banned
joined  -> left, banned
left    -> invited, banned
banned  -> invited, left
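
For illustration, the transition list above can be written down as a small lookup on the application side. This is only a sketch in Python (the application language is not stated in the question, so that is an assumption), and the names ALLOWED_TRANSITIONS and is_transition_allowed are made up for the example. None stands for "no row present".

```python
# Allowed target states per current state; None means "no row present".
ALLOWED_TRANSITIONS = {
    None: frozenset({"invited", "banned"}),
    "invited": frozenset({"joined", "banned"}),
    "joined": frozenset({"left", "banned"}),
    "left": frozenset({"invited", "banned"}),
    "banned": frozenset({"invited", "left"}),
}


def is_transition_allowed(current, target):
    """Return True if moving from `current` to `target` is in the list above."""
    return target in ALLOWED_TRANSITIONS.get(current, frozenset())
```

With this, is_transition_allowed(None, "invited") is true, while is_transition_allowed(None, "joined") is false because a user without a row cannot jump straight to joined.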

If a row is already present I can use a SELECT ... FOR UPDATE to get the current state, validate the transition, update the state and commit the transaction. All other concurrent transactions will "wait" for the lock to be released. That's fine. In this case all state transitions run sequentially.

But if there is no row in the table, there's nothing to lock. So all concurrent transactions will try to execute an INSERT. The first will succeed and the rest will fail because of the duplicate primary key.

At this point I could just "rerun" the whole code, because now I know that the row exists and the second pass will use SELECT ... FOR UPDATE for locking/waiting. But I don't want to execute the same code twice. I'm looking for a more elegant solution.
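
To make the "rerun" idea concrete, here is a rough Python sketch of what I would like to avoid. DuplicateKeyError and run_with_one_retry are hypothetical names, not part of any driver. The sketch assumes that attempt runs the whole read/validate/write sequence in its own transaction, because PostgreSQL aborts the current transaction after an error.

```python
class DuplicateKeyError(Exception):
    """Hypothetical stand-in for the driver's unique-violation error."""


def run_with_one_retry(attempt):
    """Run attempt(); if it loses the INSERT race, run it once more.

    `attempt` is assumed to open and commit (or roll back) its own
    transaction, so the second call starts from a clean state.
    """
    try:
        return attempt()
    except DuplicateKeyError:
        # The competing transaction has created the row, so this second
        # pass finds it and goes through the SELECT ... FOR UPDATE branch.
        return attempt()
```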

What I came up with so far

This is a replacement for the SELECT ... FOR UPDATE:

INSERT INTO user_group_membership (user_id, group_id, state)
VALUES (2, 3, 'DUMMY_FOR_THE_ROW_LOCK')
ON CONFLICT (user_id, group_id) DO UPDATE
SET user_id = EXCLUDED.user_id
RETURNING *;

-- application code for validating state transition

UPDATE user_group_membership 
SET state = 'invited'
WHERE user_id = 2 AND group_id = 3;
  • This should prevent concurrent transactions from racing on the INSERT and hitting a duplicate key error.

  • The DO UPDATE part is basically a no-op, but it seems to be necessary for RETURNING to return the existing row. This effectively replaces the SELECT.
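
Put together, the application side of this could look roughly like the Python sketch below. It assumes a DB-API style cursor with %s placeholders that returns rows as tuples (as psycopg does by default). change_state, LOCK_SENTINEL and the is_allowed callback are hypothetical names for this example, and transaction handling is left to the caller.

```python
# Hypothetical sentinel; it must be a value the state column accepts.
LOCK_SENTINEL = "DUMMY_FOR_THE_ROW_LOCK"

# Creates the row if it is missing, otherwise locks the existing one.
LOCK_SQL = """
INSERT INTO user_group_membership (user_id, group_id, state)
VALUES (%s, %s, %s)
ON CONFLICT (user_id, group_id) DO UPDATE
SET user_id = EXCLUDED.user_id
RETURNING state
"""

UPDATE_SQL = """
UPDATE user_group_membership
SET state = %s
WHERE user_id = %s AND group_id = %s
"""


def change_state(cur, user_id, group_id, target, is_allowed):
    """Lock (or create) the row, validate in the application, then update.

    Returns True if the transition was applied. On False the caller is
    expected to roll back, which also discards a freshly inserted
    sentinel row.
    """
    cur.execute(LOCK_SQL, (user_id, group_id, LOCK_SENTINEL))
    (state,) = cur.fetchone()
    # A returned sentinel means the row was just created by this statement.
    current = None if state == LOCK_SENTINEL else state
    if not is_allowed(current, target):
        return False
    cur.execute(UPDATE_SQL, (target, user_id, group_id))
    return True
```

The validation is passed in as a callback so that arbitrary application code can run between the two statements while the row stays locked.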

Questions

  • Is this the right way to handle this scenario?
  • Is it "safe"?
  • Is there a better / easier solution?

Followup questions

  • How to properly handle dummy values? The state column is not nullable and of type enum (invited, joined, left, banned). Introducing a new enum value, which should never be used outside of this locking mechanism feels wrong. But I need some value to create and lock the row. Any ideas?

1 Answer

How to properly handle dummy values? The state column is not nullable

Don't use them. If you do need a placeholder, make the column nullable and denote the lack of a user-group relation with a null state.

Is this the right way to handle this scenario?
Is it "safe"?
Is there a better / easier solution?

If it works, it works. It's not unsafe but magic/dummy/flag/trip/rogue/signal/sentinel values aren't very elegant.

Make the selection, validation and upsert a single operation:
demo at db<>fiddle

prepare find_validate_apply(int,int,text) as
with find as(
  select state
  from user_group_membership as f
  where $1=user_id and $2=group_id
  for update of f
  --limit 1 -- unnecessary given the uniqueness and non-nullability
),empty_as_null as(--`coalesce()` for rows
 (select state from find)
  union(select null)
  order by 1 nulls last limit 1
),validate as(
  select exists(select from allowed_transitions as t
                where t.source_state is not distinct from found.state
                and t.target_state is not distinct from $3) 
         as is_transition_allowed
  from empty_as_null as found
),apply as(
  insert into user_group_membership
  select $1,$2,$3
  from validate 
  where is_transition_allowed
  on conflict(user_id,group_id)do update
  set state=$3
  returning *)
select*from apply;

Now each worker can wait for others to finish their whole thing before they attempt to apply their desired transition. Note that this doesn't require a dummy value or sentinel row - they lock each other out either in the initial select or in the final upsert (concurrent writes to the unique pk index), but the whole thing is one atomic operation so it doesn't matter which one exactly.

The demo uses an audit trigger to track operations on the table and dblink to run parallel workers - at the end you can see how they queued up and applied their changes one after another:

create function trg1_f()returns trigger as $f$
begin
  create table if not exists user_group_membership_history
  as select TG_OP,transaction_timestamp(),clock_timestamp()
           ,OLD.user_id as OLD_user_id
           ,OLD.group_id as OLD_group_id
           ,OLD.state as OLD_state
           ,NEW.* 
  limit 0;

  insert into user_group_membership_history
  select TG_OP,now(),clock_timestamp(),OLD.*,NEW.*;
  
  return new;
end $f$ language plpgsql;

create trigger trg1 
after insert or update or delete
on user_group_membership
for each row execute function trg1_f();
create extension dblink;
select dblink_connect('another_worker1','');--defaults to localhost here
select dblink_connect('another_worker2','');
select dblink_send_query('another_worker1',
  $q$ begin;
      select pg_sleep(0.5);
      select find_validate_apply(9,9,'invited');
      select pg_sleep(1);
      commit;
  $q$);

select dblink_send_query('another_worker2',
  $q$ begin;
      select pg_sleep(0.7);
      select find_validate_apply(9,9,'banned');
      select pg_sleep(1);
      commit;
  $q$);
select pg_sleep(4);

table user_group_membership_history;
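user_id | group_id | state
--------------------------
9       | 9        | banned

tg_op  | transaction_timestamp         | clock_timestamp               | old_user_id | old_group_id | old_state | user_id | group_id | state
----------------------------------------------------------------------------------------------------------------------------------------------
INSERT | 2024-11-13 16:18:18.929716+00 | 2024-11-13 16:18:19.94236+00  | null        | null         | null      | 9       | 9        | invited
UPDATE | 2024-11-13 16:18:18.932864+00 | 2024-11-13 16:18:20.946128+00 | 9           | 9            | invited   | 9       | 9        | banned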

The allowed_transitions idea is self-explanatory:

create table allowed_transitions(source_state,target_state)
as values
 (null,'invited'),(null, 'banned')
,('invited','joined'),('invited','banned')
,('joined','left'),('joined','banned')
,('left','invited'),('left','banned')
,('banned','invited'),('banned','left');
alter table allowed_transitions 
  add constraint uniq unique(source_state,target_state);

2 Comments

Sadly, the validation involves some extra application code that can't be easily integrated into the SQL query itself. ... Still not sure about the dummy vs null thing. It's basically the same thing. It just needs a "nice explanation" like you said: "denote lack of user-group relation".
One of the main concepts behind null is that it's the go-to dummy, the value-unknown flag. Technically, it doesn't matter which specific value out of the allowed range you sacrifice, even less so in the case of a small, fairly constrained enum. What null gets you is a set of null-handling functions, assumptions and built-in behaviours that you would otherwise have to re-implement or do without. If the validation requires something external that can't be injected, referenced (postgres_fdw) or requested (pg_http) from within the statement, that's an important detail; be sure to underline it in the question.
