Running count in postgresql UPDATE

Question

I'm using PostgreSQL and I'm struggling to update the values in an existing column with a running count based on clientID.

The goal is to recognize when a client is 'NEW' (no previous row contains that customerID) or 'EXISTING' (at least one earlier transaction is associated with that customer).

Here is an image of what I got now

Here is an image of what I want to achieve

(customer or client are the same thing)

In my research I found that subqueries can lead to long execution times, so I looked into using the OVER and PARTITION BY clauses combined with CASE, but I still can't get a working solution (all my errors are basically syntax errors).

Sites I visited, without managing to solve the task:

https://www.sqlservercentral.com/articles/cumulative-sum-of-previous-rows

Running Count Total with PostgresQL

I looked at a lot of similar questions, but I was unable to transform the SELECT query into the UPDATE statement that I need.

Sample data is better presented as formatted text. See here for some tips on how to create nice looking tables.
– user330315, Commented Jul 1, 2020 at 9:10

score 1 · Accepted Answer · 2020-07-01 09:19:25Z

You can do that with a window function that computes a running count:

select transact_id, client_id, 
       case count(*) over (partition by client_id order by transact_id) 
          when 1 then 'NEW'
          else 'EXISTING'
        end as client_status
from my_table
order by transact_id;

The expression count(*) over (partition by client_id order by transact_id) counts the rows for each client_id up to and including the current row. If the count is 1, this is the first occurrence of that client_id, so NEW is displayed. For any count greater than 1, EXISTING is displayed.
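To make the running count visible on its own, here is a small sketch that selects it as a plain column. The inline rows in the VALUES list are made-up sample data, not the asker's table; only the column names follow the query above.

```sql
-- Hypothetical sample rows, supplied inline so the query runs without an existing table.
with my_table (transact_id, client_id) as (
  values (1, 10), (2, 20), (3, 10), (4, 10), (5, 20)
)
select transact_id,
       client_id,
       -- number of rows for this client_id up to and including the current row
       count(*) over (partition by client_id order by transact_id) as running_count
from my_table
order by transact_id;
-- running_count comes out as 1, 1, 2, 3, 2 for transact_id 1 to 5
```

Only the rows where running_count is 1 (transact_id 1 and 2 here) would be labelled NEW by the CASE expression.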

If you want to update the existing column, you can use the above query as the source for an UPDATE.

update my_table
  set client_status = t.client_status
from (
  select transact_id, client_id, 
         case  count(client_id) over (partition by client_id order by transact_id) 
            when 1 then 'NEW'
            else 'EXISTING'
          end as client_status
  from my_table
) t
where my_table.transact_id = t.transact_id;

The above assumes that transact_id is the primary key or unique in the table.
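If client_id can be missing, as a comment further down mentions, the same UPDATE can probably be extended with a third branch. This is only a sketch: it assumes a missing customer is stored as NULL, and the label 'NOT IDENTIFIED' is taken from that comment.

```sql
update my_table
  set client_status = t.client_status
from (
  select transact_id,
         case
           -- assumption: a missing customer is stored as NULL
           when client_id is null then 'NOT IDENTIFIED'
           -- first row of this client_id in transact_id order
           when count(*) over (partition by client_id order by transact_id) = 1 then 'NEW'
           else 'EXISTING'
         end as client_status
  from my_table
) t
where my_table.transact_id = t.transact_id;
```

The NULL check comes first because all rows with a NULL client_id fall into the same partition, so without it only the first of them would be counted as 1.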

Online example

Jack · Accepted Answer · 2020-07-01 08:52:49Z

0

With some aggregation you can achieve the result without using OVER/PARTITION (which I don't know how to use).

So you have this table with columns tid, cid, status.

First, write a query that, for each pair <tid,cid>, selects 1 if a row exists with the same cid and a lower tid, or 0 if no such row exists.

Then apply an aggregation to obtain rows like <tid, cid, sum()>, so that for each pair <tid,cid> you know the number of rows with the same cid but a lower tid (of course, more than one such row may exist).

Then do ~~your update, either using CASE WHEN, or make~~ two updates like in the following example:

with 
    data as (
        select
            t1.tid, t1.cid, 
            case when t2.tid is null then 0 else 1 end as cnt 
        from tab t1 
        left join tab t2 
            on t1.cid = t2.cid and t1.tid > t2.tid order by t1.tid
), 
    aggreg as (
        select tid, cid, sum(cnt) 
        from data group by tid, cid order by tid
)
update tab set status = 'EXISTING' 
where (tid,cid) in (select tid,cid from aggreg where sum > 0);

and

with 
    data as (
        select
            t1.tid, t1.cid, 
            case when t2.tid is null then 0 else 1 end as cnt 
        from tab t1 
        left join tab t2 
            on t1.cid = t2.cid and t1.tid > t2.tid order by t1.tid
), 
    aggreg as (
        select tid, cid, sum(cnt) 
        from data group by tid, cid order by tid
)
update tab set status = 'NEW' 
where (tid,cid) in (select tid,cid from aggreg where sum = 0);

Of course, you could instead run the second of these two queries as a simple update where status is null, which should run considerably faster.
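A sketch of that shortcut, under the assumption that status was NULL on every row before the first update ran (the answer does not state this explicitly):

```sql
-- Run after the 'EXISTING' update above.
-- Assumption: status was NULL on every row before that update,
-- so the rows still NULL are the ones with no earlier transaction.
update tab
   set status = 'NEW'
 where status is null;
```

Note that rows with a NULL cid would also end up as 'NEW' here; adding "and cid is not null" to the WHERE clause leaves them untouched.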

edited Jul 1, 2020 at 8:52

answered Jun 30, 2020 at 9:21

2 Comments

Ignacio Muñoz Over a year ago

First, thanks for your reply! I tried it and it definitely runs fast, but I can't work out how to turn it into a single statement with CASE WHEN, as you suggested. Also, I forgot to mention that sometimes customerID is missing, so there will be three options for the CASE (NEW, EXISTING and NOT IDENTIFIED).

Jack Over a year ago

Well.. you're right. I don't think with this approach you can use CASE/WHEN, I would just use 2 updates (3 to handle null cid)
