
I have a column tracking unique user_id by project (for external use).

I want to increment the user_id column when creating a new user, based on where the count currently stands for that project. There are many existing records already, but from here on out we want the user_id to increase by 1 for each new record within a given project_id.

Example User table

id   user_id   project_id
-------------------------
1    100       1
2    101       1
3    1000      2
4    1001      2
5    17        3
6    18        3
7    102       1

New row with project_id = 1 should use user_id = 103
New row with project_id = 2 should use user_id = 1002
New row with project_id = 3 should use user_id = 19

How can I construct the user_id column and/or INSERT query such that it will always increment the user_id based on the largest existing user_id within the corresponding project_id, and guarantee that no two users in the same project are assigned the same user_id upon concurrent inserts?

  • Any solution for this is going to be more expensive (and worse for concurrency) than a sequence-based counter. Is the requirement to have the same numbers for different project_ids so important for you? Commented Dec 10, 2019 at 3:22
  • @LaurenzAlbe I'm not sure what you're asking, but the user_ids between projects aren't related to each other at all Commented Dec 10, 2019 at 5:34
  • I mean, why not just have a sequence that generates user_id independent of project_id? Commented Dec 10, 2019 at 6:07
  • That's basically what id is, but each project has users and needs to have unique, sequential user_ids within the project. Commented Dec 10, 2019 at 6:48
  • If you use a sequence, they will still be unique and increasing, only there will be gaps. Unless you have a legal requirement that forbids such gaps, embrace them. It will make everything faster, simpler and better. Commented Dec 10, 2019 at 7:20

5 Answers

3
+50

The straightforward way to guarantee that no two users in the same project are assigned the same user_id upon concurrent inserts is to prevent concurrent activity.

One way to achieve that is to set the transaction isolation level to SERIALIZABLE.

BEGIN;

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- here I took the query from George Joseph's answer
-- (@project_id is a placeholder for the project you are inserting into)

insert into user_table
    (user_id, project_id)
select
    coalesce(max(user_id), 0) + 1 as user_id
    ,@project_id as project_id
from
    user_table
where
    project_id = @project_id;

COMMIT;

You can run this block from several sessions simultaneously and the engine will handle the concurrency behind the scenes. Note that PostgreSQL implements SERIALIZABLE with serializable snapshot isolation rather than blocking: if two of these transactions conflict, one of them is aborted with a serialization failure and has to be retried.

For this to work efficiently you'll need an index on (project_id, user_id). You also need to make it unique to enforce your constraint. The order of columns in this index is important.
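
For example, a possible definition (the index name is illustrative, not from the question):

CREATE UNIQUE INDEX user_table_project_user_uq
    ON user_table (project_id, user_id);
-- speeds up the MAX(user_id) lookup per project and guarantees
-- that no two rows share a user_id within the same project_id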


You also mentioned that you expect thousands of projects, and eventually up to millions of users per project. That adds up to a billion rows, which is quite a lot to run MAX against for every insert, even with an appropriate index.

You can create a separate table project_sequences to store the last value of user_id for each project_id. This table will have two columns, project_id and last_user_id, with a primary key on both of them (project_id, last_user_id). The order of columns in the index is important.
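
For reference, a table definition matching that description (with the two-column primary key as stated; a primary key on project_id alone would equally enforce a single counter row per project):

-- small per-project counter table
CREATE TABLE project_sequences (
    project_id   integer NOT NULL,
    last_user_id integer NOT NULL,
    PRIMARY KEY (project_id, last_user_id)
);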

Now you can query and update the small project_sequences table (only a few thousand rows) for each insert into the main, large table. I'm not familiar with the Postgres syntax for variables, so the block below is pseudo-code.

BEGIN;

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- read the last_user_id for the given project_id from the small table into a variable
-- and increment it
-- update the small table with the new last_user_id
-- use the freshly generated user_id to insert into the main table


-- or, without variables
-- increment the last_user_id
update project_sequences
set last_user_id =
    (
    select coalesce(max(last_user_id), 0) + 1
    from project_sequences
    where project_id = @project_id
    )
where
    project_id = @project_id;


-- use the new id to insert into the main table
insert into user_table
    (user_id, project_id)
select
    last_user_id
    ,@project_id as project_id
from
    project_sequences
where
    project_id = @project_id;


COMMIT;

With variables it will be a bit easier to handle the case where the given project_id is new and doesn't exist in the table yet, and to start the new user_id at 1 or whatever starting value you need.
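
For instance, a minimal PL/pgSQL sketch, assuming project_id alone is unique in project_sequences (e.g. its primary key) so ON CONFLICT can target it; the names add_user and p_project_id are illustrative, not from the original answer:

CREATE OR REPLACE FUNCTION add_user(p_project_id integer)
RETURNS integer AS $$
DECLARE
    v_user_id integer;
BEGIN
    -- create the counter row on first use, otherwise increment it;
    -- the row lock taken by the UPDATE serializes concurrent callers
    -- for the same project
    INSERT INTO project_sequences (project_id, last_user_id)
    VALUES (p_project_id, 1)
    ON CONFLICT (project_id)
    DO UPDATE SET last_user_id = project_sequences.last_user_id + 1
    RETURNING last_user_id INTO v_user_id;

    -- use the freshly generated user_id in the main table
    INSERT INTO user_table (user_id, project_id)
    VALUES (v_user_id, p_project_id);

    RETURN v_user_id;
END;
$$ LANGUAGE plpgsql;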


1 Comment

Proposing a slight change to accommodate a NULL max value when the 1st user is inserted: update project_sequences set last_user_id = ( select coalesce(max(last_user_id), 0) + 1 from project_sequences where project_id=@project_id ) where project_id=@project_id
2

You can use a WITH clause (CTE) for this.

Here is the implementation.

--PostgreSQL 9.6
create table tab
(
    id serial,
    user_id integer,
    project_id integer
);

INSERT INTO tab(user_id, project_id) VALUES (100, 1);
INSERT INTO tab(user_id, project_id) VALUES (101, 1);
INSERT INTO tab(user_id, project_id) VALUES (1000, 2);
INSERT INTO tab(user_id, project_id) VALUES (1001, 2);
INSERT INTO tab(user_id, project_id) VALUES (17, 3);
INSERT INTO tab(user_id, project_id) VALUES (18, 3);
INSERT INTO tab(user_id, project_id) VALUES (102, 1);

create table src
(
    project_id integer
);

insert into src values (1);
insert into src values (2);
insert into src values (3);

select * from src;
select * from tab;

with cur as
(
    select project_id, max(user_id) as max_user_id
    from tab
    group by project_id
)
INSERT INTO tab(user_id, project_id)
    SELECT cur.max_user_id + row_number() over (partition by src.project_id), src.project_id
    from src inner join cur on src.project_id = cur.project_id;

select * from tab order by project_id, user_id;

Result:

project_id
----------
1
2
3

id   user_id   project_id
-------------------------
1    100       1
2    101       1
3    1000      2
4    1001      2
5    17        3
6    18        3
7    102       1

id   user_id   project_id
-------------------------
1    100       1
2    101       1
7    102       1
8    103       1
3    1000      2
4    1001      2
9    1002      2
5    17        3
6    18        3
10   19        3

https://rextester.com/HREM53701

Read more about the WITH clause here:

https://www.tutorialspoint.com/postgresql/postgresql_with_clause.htm

Comments

0

You can do this by finding the max value of user_id for the given project_id and then incrementing it by 1. If you have a multi-user scenario, you will need some kind of serialization to ensure that concurrent users do not end up with the same number. E.g., assuming you are going to pass the project_id as a variable @project_id:

insert 
  into user_table
       (user_id
        ,project_id
       )
select 
       (select max(user_id)+1
          from user_table
         where project_id=@project_id) as user_id
       ,@project_id

3 Comments

We need to guarantee that 2 users will never receive the same user_id if in the same project, so your concurrency point means this won't work for us.
You could add a unique constraint/index on the (project_id, user_id) combination. In the case of concurrent inserts, one of the sessions would get the next number while the other one would get an error.
It seems like there should be a way to not get that error in the first place, which is what we're hoping to accomplish. Of course we could handle the error in the application logic, but it seems surprising to me that Postgres wouldn't have this case covered.
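
For reference, the retry-the-error approach mentioned in the comments above could look roughly like this (a sketch assuming the unique index on (project_id, user_id); the function and parameter names are illustrative):

CREATE OR REPLACE FUNCTION insert_user(p_project_id integer)
RETURNS integer AS $$
DECLARE
    v_user_id integer;
BEGIN
    LOOP
        BEGIN
            INSERT INTO user_table (user_id, project_id)
            SELECT coalesce(max(user_id), 0) + 1, p_project_id
            FROM user_table
            WHERE project_id = p_project_id
            RETURNING user_id INTO v_user_id;
            RETURN v_user_id;
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- another session grabbed the same number; loop and retry
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;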
0

I suggest using a trigger before the insert; that way you're 99.99% sure you won't have duplicates or holes in the sequence (101, 102, missing, 111, 112).

The problem with sequences is that, if not used carefully, you can lose control of the current number and end up with missing numbers in the database.

Just make the trigger take care of incrementing the number.

Also, this way you don't have to worry about complicated queries that consume a lot of memory and processing power.

The trigger is:

CREATE OR REPLACE FUNCTION set_user_id()
    RETURNS trigger AS $set_user_id$
BEGIN
    IF NEW.user_id IS NULL
    THEN
        NEW.user_id = COALESCE( ( SELECT MAX(user_id) FROM data WHERE project_id = NEW.project_id ), 0 ) + 1;
    END IF;
    RETURN NEW;
END $set_user_id$ LANGUAGE plpgsql;

CREATE TRIGGER table_user_id
    BEFORE INSERT ON data
    FOR EACH ROW EXECUTE PROCEDURE set_user_id();

NOTES:

  • It only generates the user_id when the insert supplies NULL for user_id (or omits the column). For example:

INSERT INTO data (project_id) VALUES (1);

or

INSERT INTO data (user_id,project_id) VALUES (NULL,1);

With your example data, this would insert

id   user_id   project_id
-------------------------
8    103       1
  • If there is no previous user_id, it sets the value to 1 (via the COALESCE)

INSERT INTO data (project_id) VALUES (4);

With your example data, this would insert

id   user_id   project_id
-------------------------
9    1         4
  • If you want to set a starting user_id, you just have to set it on the first insert of that project_id.

INSERT INTO data (user_id,project_id) VALUES (10,5);

With your example data, this would insert

id   user_id   project_id
-------------------------
10   10        5

4 Comments

Can you elaborate on "99.99% sure you won't have duplicates"? What is the scenario where we would?
Well, since in this scenario the insertion is delegated to the PostgreSQL engine, the handling of the inserts falls to the way PostgreSQL inserts data. It is extremely rare but possible that two queries execute at exactly the same time, and in that case you can get duplicates. For that, I recommend you add a UNIQUE constraint on the columns user_id, project_id (you need both). Here's the manual regarding concurrency. I believe it should be enough with the UNIQUE constraint.
Triggers are useful but very hard to manage and way too far under the layers for me to ever 'like' them in my last 15 years of database development experience. My recommendation is to stay away from triggers.
In PostgreSQL, triggers are different than in other DBMSs. If you don't over-complicate them, they work really well. If you need something that takes too much processing time and power, then you should change your approach, but for something as simple as an increment, triggers are just fine.
-1

For auto-incrementing your id you can use 3 methods (a PostgreSQL-flavored version is sketched after the list):

  1. Use an identity column when creating the table:

        create table a (key int identity(1,1))

     -- the first "1" is the initial value
     -- the second "1" is the value added for each next row

  2. Create a sequence:

        create sequence seq_name
        as data_type      -- e.g. bigint
        start with 1      -- initial value
        increment by 1    -- incrementing value

     ref - https://www.techonthenet.com/sql_server/sequences.php

  3. From application code, use select sum(col_name) from t_name, add one to the retrieved value, and use that value as the id for the newly created row.
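
In PostgreSQL the first two options look roughly like this (a sketch; note that, as the comments under the question point out, a plain identity column or sequence numbers users across all projects rather than per project):

-- option 1: identity column (PostgreSQL 10+)
create table a (key int generated always as identity);

-- option 2: standalone sequence
create sequence seq_name as bigint start with 1 increment by 1;
select nextval('seq_name');  -- next value to use as an id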

Comments
