
I have a column tracking unique user_id by project (for external use).

I want to increment the user_id column when creating a new user, based on where the count currently stands for that project. There are many existing records already, but from here on out we want the user_id to increase by 1 for each new record within a given project_id.

Example User table

id   user_id   project_id
-------------------------
1    100       1
2    101       1
3    1000      2
4    1001      2
5    17        3
6    18        3
7    102       1

New row with project_id = 1 should use user_id = 103
New row with project_id = 2 should use user_id = 1002
New row with project_id = 3 should use user_id = 19

How can I construct the user_id column and/or INSERT query such that it will always increment the user_id based on the largest existing user_id within the corresponding project_id, and guarantee that no two users in the same project are assigned the same user_id upon concurrent inserts?

  • Any solution for this is going to be more expensive (and worse for concurrency) than a sequence-based counter. Is the requirement to have the same numbers for different project_ids so important for you? Commented Dec 10, 2019 at 3:22
  • @LaurenzAlbe I'm not sure what you're asking, but the user_ids between projects aren't related to each other at all Commented Dec 10, 2019 at 5:34
  • I mean, why not just have a sequence that generates user_id independent of project_id? Commented Dec 10, 2019 at 6:07
  • That's basically what id is, but each project has users and needs to have unique, sequential user_ids within the project. Commented Dec 10, 2019 at 6:48
  • If you use a sequence, they will still be unique and increasing, only there will be gaps. Unless you have a legal requirement that forbids such gaps, embrace them. It will make everything faster, simpler and better. Commented Dec 10, 2019 at 7:20

5 Answers

3
+50

The straightforward way to guarantee that no two users in the same project are assigned the same user_id upon concurrent inserts is to prevent concurrent activity.

One way to achieve that is to set the transaction isolation level to SERIALIZABLE.

BEGIN;

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- here I took the query from George Joseph's answer
-- (@project_id is a placeholder for the project you are inserting into)

insert into user_table
    (user_id, project_id)
select
    coalesce(max(user_id), 0) + 1 as user_id
    ,@project_id as project_id
from
    user_table
where
    project_id = @project_id;

COMMIT;

You can run this block from several sessions simultaneously and the engine will handle the concurrency behind the scenes. Note that PostgreSQL implements SERIALIZABLE with serializable snapshot isolation rather than blocking: if two of these transactions conflict, one of them is aborted with a serialization failure and has to be retried.

For this to work efficiently you'll need an index on (project_id, user_id). You also need to make it unique to enforce your constraint. The order of columns in this index is important.
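
For example, a possible definition (the index name is illustrative, not from the question):

CREATE UNIQUE INDEX user_table_project_user_uq
    ON user_table (project_id, user_id);
-- speeds up the MAX(user_id) lookup per project and guarantees
-- that no two rows share a user_id within the same project_id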


You also mentioned that you expect thousands of projects, and eventually up to millions of users per project. That adds up to a billion rows, which is quite a lot to run MAX against for every insert, even with an appropriate index.

You can create a separate table project_sequences to store the last value of user_id for each project_id. This table will have two columns, project_id and last_user_id, with a primary key on both of them (project_id, last_user_id). The order of columns in the index is important.
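
For reference, a table definition matching that description (with the two-column primary key as stated; a primary key on project_id alone would equally enforce a single counter row per project):

-- small per-project counter table
CREATE TABLE project_sequences (
    project_id   integer NOT NULL,
    last_user_id integer NOT NULL,
    PRIMARY KEY (project_id, last_user_id)
);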

Now you can query and update the small project_sequences table (only a few thousand rows) for each insert into the main, large table. I'm not familiar with the Postgres syntax for variables, so the block below is pseudo-code.

BEGIN;

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- read the last_user_id for the given project_id from the small table into a variable
-- and increment it
-- update the small table with the new last_user_id
-- use the freshly generated user_id to insert into the main table


-- or, without variables
-- increment the last_user_id
update project_sequences
set last_user_id =
    (
    select coalesce(max(last_user_id), 0) + 1
    from project_sequences
    where project_id = @project_id
    )
where
    project_id = @project_id;


-- use the new id to insert into the main table
insert into user_table
    (user_id, project_id)
select
    last_user_id
    ,@project_id as project_id
from
    project_sequences
where
    project_id = @project_id;


COMMIT;

With variables it will be a bit easier to handle the case where the given project_id is new and doesn't exist in the table yet, and to start the new user_id at 1 or whatever starting value you need.
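
For instance, a minimal PL/pgSQL sketch, assuming project_id alone is unique in project_sequences (e.g. its primary key) so ON CONFLICT can target it; the names add_user and p_project_id are illustrative, not from the original answer:

CREATE OR REPLACE FUNCTION add_user(p_project_id integer)
RETURNS integer AS $$
DECLARE
    v_user_id integer;
BEGIN
    -- create the counter row on first use, otherwise increment it;
    -- the row lock taken by the UPDATE serializes concurrent callers
    -- for the same project
    INSERT INTO project_sequences (project_id, last_user_id)
    VALUES (p_project_id, 1)
    ON CONFLICT (project_id)
    DO UPDATE SET last_user_id = project_sequences.last_user_id + 1
    RETURNING last_user_id INTO v_user_id;

    -- use the freshly generated user_id in the main table
    INSERT INTO user_table (user_id, project_id)
    VALUES (v_user_id, p_project_id);

    RETURN v_user_id;
END;
$$ LANGUAGE plpgsql;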


1 Comment

Proposing a slight change to accommodate a NULL max value when the 1st user is inserted: update project_sequences set last_user_id = ( select coalesce(max(last_user_id), 0) + 1 from project_sequences where project_id=@project_id ) where project_id=@project_id
2

You can use a WITH clause (CTE) for this.

Here is the implementation.

--PostgreSQL 9.6
create table tab
(
    id serial,
    user_id integer,
    project_id integer
);

INSERT INTO tab(user_id, project_id) VALUES (100, 1);
INSERT INTO tab(user_id, project_id) VALUES (101, 1);
INSERT INTO tab(user_id, project_id) VALUES (1000, 2);
INSERT INTO tab(user_id, project_id) VALUES (1001, 2);
INSERT INTO tab(user_id, project_id) VALUES (17, 3);
INSERT INTO tab(user_id, project_id) VALUES (18, 3);
INSERT INTO tab(user_id, project_id) VALUES (102, 1);

create table src
(
    project_id integer
);

insert into src values (1);
insert into src values (2);
insert into src values (3);

select * from src;
select * from tab;

with cur as
(
    select project_id, max(user_id) as max_user_id
    from tab
    group by project_id
)
INSERT INTO tab(user_id, project_id)
    SELECT cur.max_user_id + row_number() over (partition by src.project_id), src.project_id
    from src inner join cur on src.project_id = cur.project_id;

select * from tab order by project_id, user_id;

Result:

project_id
----------
1
2
3

id   user_id   project_id
-------------------------
1    100       1
2    101       1
3    1000      2
4    1001      2
5    17        3
6    18        3
7    102       1

id   user_id   project_id
-------------------------
1    100       1
2    101       1
7    102       1
8    103       1
3    1000      2
4    1001      2
9    1002      2
5    17        3
6    18        3
10   19        3

https://rextester.com/HREM53701

Read more about the WITH clause here:

https://www.tutorialspoint.com/postgresql/postgresql_with_clause.htm

Comments

0

You can do this by finding the max value of user_id for the given project_id and then incrementing it by 1. If you have a multi-user scenario, you will need some kind of serialization to ensure that concurrent users do not end up with the same number. E.g., assuming you are going to pass the project_id as a variable @project_id:

insert 
  into user_table
       (user_id
        ,project_id
       )
select 
       (select max(user_id)+1
          from user_table
         where project_id=@project_id) as user_id
       ,@project_id

3 Comments

We need to guarantee that 2 users will never receive the same user_id if in the same project, so your concurrency point means this won't work for us.
You could add a unique constraint/index on the (project_id, user_id) combination. In the case of concurrent inserts, one of the sessions would get the next number while the other one would get an error.
It seems like there should be a way to not get that error in the first place, which is what we're hoping to accomplish. Of course we could handle the error in the application logic, but it seems surprising to me that Postgres wouldn't have this case covered.
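
For reference, the retry-the-error approach mentioned in the comments above could look roughly like this (a sketch assuming the unique index on (project_id, user_id); the function and parameter names are illustrative):

CREATE OR REPLACE FUNCTION insert_user(p_project_id integer)
RETURNS integer AS $$
DECLARE
    v_user_id integer;
BEGIN
    LOOP
        BEGIN
            INSERT INTO user_table (user_id, project_id)
            SELECT coalesce(max(user_id), 0) + 1, p_project_id
            FROM user_table
            WHERE project_id = p_project_id
            RETURNING user_id INTO v_user_id;
            RETURN v_user_id;
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- another session grabbed the same number; loop and retry
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;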
0

I suggest using a trigger before the insert; that way you're 99.99% sure you won't have duplicates or holes in the sequence (101, 102, missing, 111, 112).

The problem with sequences is that, if not used carefully, you can lose control of the current number and end up with missing numbers in the database.

Just make the trigger take care of incrementing the number.

Also, this way you don't have to worry about complicated queries that consume a lot of memory and processing power.

The trigger is:

CREATE OR REPLACE FUNCTION set_user_id()
    RETURNS trigger AS $set_user_id$
BEGIN
    IF NEW.user_id IS NULL
    THEN
        NEW.user_id = COALESCE( ( SELECT MAX(user_id) FROM data WHERE project_id = NEW.project_id ), 0 ) + 1;
    END IF;
    RETURN NEW;
END $set_user_id$ LANGUAGE plpgsql;

CREATE TRIGGER table_user_id
    BEFORE INSERT ON data
    FOR EACH ROW EXECUTE PROCEDURE set_user_id();

NOTES:

  • It only generates the user_id when the insert supplies NULL for user_id (or omits the column). For example:

INSERT INTO data (project_id) VALUES (1);

or

INSERT INTO data (user_id,project_id) VALUES (NULL,1);

With your example data, this would insert

id   user_id   project_id
-------------------------
8    103       1
  • If there is no previous user_id, it sets the value to 1 (via the COALESCE)

INSERT INTO data (project_id) VALUES (4);

With your example data, this would insert

id   user_id   project_id
-------------------------
9    1         4
  • If you want to set a starting user_id, you just have to set it on the first insert of that project_id.

INSERT INTO data (user_id,project_id) VALUES (10,5);

With your example data, this would insert

id   user_id   project_id
-------------------------
10   10        5

4 Comments

Can you elaborate on "99.99% sure you won't have duplicates"? What is the scenario where we would?
Well, since in this scenario the insertion is delegated to the PostgreSQL engine, the handling of the inserts falls to the way PostgreSQL inserts data. It is extremely rare but possible that two queries execute at exactly the same time, and in that case you can get duplicates. For that, I recommend you add a UNIQUE constraint on the columns user_id, project_id (you need both). Here's the manual regarding concurrency. I believe it should be enough with the UNIQUE constraint.
Triggers are useful but very hard to manage and way too far under the layers for me to ever 'like' them in my last 15 years of database development experience. My recommendation is to stay away from triggers.
In PostgreSQL, triggers are different than in other DBMSs. If you don't over-complicate them, they work really well. If you need something that takes too much processing time and power, then you should change your approach, but for something as simple as an increment, triggers are just fine.
-1

For auto-incrementing your id you can use 3 methods (a PostgreSQL-flavored version is sketched after the list):

  1. Use an identity column when creating the table:

        create table a (key int identity(1,1))

     -- the first "1" is the initial value
     -- the second "1" is the value added for each next row

  2. Create a sequence:

        create sequence seq_name
        as data_type      -- e.g. bigint
        start with 1      -- initial value
        increment by 1    -- incrementing value

     ref - https://www.techonthenet.com/sql_server/sequences.php

  3. From application code, use select sum(col_name) from t_name, add one to the retrieved value, and use that value as the id for the newly created row.
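
In PostgreSQL the first two options look roughly like this (a sketch; note that, as the comments under the question point out, a plain identity column or sequence numbers users across all projects rather than per project):

-- option 1: identity column (PostgreSQL 10+)
create table a (key int generated always as identity);

-- option 2: standalone sequence
create sequence seq_name as bigint start with 1 increment by 1;
select nextval('seq_name');  -- next value to use as an id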

Comments
