Postgres: get the next row with a bigger value (comparing different columns)

Question

Postgres version: 10

Table example:

CREATE TABLE log (
    group_id INTEGER,
    log_begin TIMESTAMP,
    log_end TIMESTAMP
);

My goal: for each row, within its group, I want the log_begin of the first log that begins after the current row's log ends, or NULL if no such log exists. Example: the log in row 1 ends at 2022-07-15 15:30:00 and the next log in its group begins at 2022-07-15 16:00:00, so 2022-07-15 16:00:00 is the answer. The log in row 4 ends at 2022-07-15 15:20:00 and the first log that begins after that starts at 2022-07-15 15:30:00, so that is the answer.

Example data:

group_id	log_begin	log_end
1	2022-07-15 15:00:00	2022-07-15 15:30:00
1	2022-07-15 16:00:00	2022-07-15 16:30:00
1	2022-07-15 17:00:00	2022-07-15 17:30:00
2	2022-07-15 15:00:00	2022-07-15 15:20:00
2	2022-07-15 15:15:00	2022-07-15 15:40:00
2	2022-07-15 15:30:00	2022-07-15 16:30:00
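
The same rows as INSERT statements, for anyone who wants to reproduce this. It is only a convenience sketch: it assumes the log table defined above and adds nothing beyond the six rows listed.

```sql
-- Rows copied from the example table above, in the same order (rows 1 to 6).
INSERT INTO log (group_id, log_begin, log_end) VALUES
    (1, '2022-07-15 15:00:00', '2022-07-15 15:30:00'),
    (1, '2022-07-15 16:00:00', '2022-07-15 16:30:00'),
    (1, '2022-07-15 17:00:00', '2022-07-15 17:30:00'),
    (2, '2022-07-15 15:00:00', '2022-07-15 15:20:00'),
    (2, '2022-07-15 15:15:00', '2022-07-15 15:40:00'),
    (2, '2022-07-15 15:30:00', '2022-07-15 16:30:00');
```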

My first solution was to use a sub-query that searches for the next value for every row. The result is correct, but the table is very big, so the query is very slow. Something like this:

SELECT *, ( SELECT _L.log_begin FROM log _L 
    WHERE _L.log_begin > L.log_end 
        AND _L.group_id = L.group_id 
    ORDER BY _L.log_begin ASC LIMIT 1 ) AS next_log_begin
FROM log L

My second solution was to use a window function like LEAD, as below:

SELECT *, LEAD( log_begin, 1 ) OVER ( PARTITION BY group_id ORDER BY log_begin ) AS next_log_begin
FROM log

but the result isn't correct:

group_id	log_begin	log_end	next_log_begin
1	2022-07-15 15:00:00	2022-07-15 15:30:00	2022-07-15 16:00:00
1	2022-07-15 16:00:00	2022-07-15 16:30:00	2022-07-15 17:00:00
1	2022-07-15 17:00:00	2022-07-15 17:30:00	NULL
2	2022-07-15 15:00:00	2022-07-15 15:20:00	2022-07-15 15:15:00
2	2022-07-15 15:15:00	2022-07-15 15:40:00	2022-07-15 15:30:00
2	2022-07-15 15:30:00	2022-07-15 16:30:00	NULL

Row 4 should get 2022-07-15 15:30:00 instead, and row 5 should be NULL.

Correct output:

group_id	log_begin	log_end	next_log_begin
1	2022-07-15 15:00:00	2022-07-15 15:30:00	2022-07-15 16:00:00
1	2022-07-15 16:00:00	2022-07-15 16:30:00	2022-07-15 17:00:00
1	2022-07-15 17:00:00	2022-07-15 17:30:00	NULL
2	2022-07-15 15:00:00	2022-07-15 15:20:00	2022-07-15 15:30:00
2	2022-07-15 15:15:00	2022-07-15 15:40:00	NULL
2	2022-07-15 15:30:00	2022-07-15 16:30:00	NULL

Is there any way to do this in Postgres 10? Window functions are preferable but not required.

It's a bit unclear what your expected result is. Could you write it out like you did your actual results?
– Schwern, Commented Jul 15, 2022 at 20:31
I do not see how you can do this with a window function because in group 2, the first interval overlaps the second and the second overlaps the third. Please see dbfiddle.uk/… and change out the lines in the insert statement to see what I mean. The self-join you did is probably necessary to handle this condition.
– Mike Organek, Commented Jul 15, 2022 at 22:07
@Schwern sorry, I've edited the question and added an example of a query that gets the right result.
– Marcelo Gonçalves, Commented Jul 18, 2022 at 12:23
@MikeOrganek I thought there might be a solution like RANGE BETWEEN INTERVAL '1 SECOND' FOLLOWING AND UNBOUNDED FOLLOWING in Postgres 11+, but it seems that doesn't work either. I'd like to avoid this self-join; if it's necessary, I think a "pre-processing" strategy that saves this data at insertion time is a better solution, but that would be A LOT of work here.
– Marcelo Gonçalves, Commented Jul 18, 2022 at 12:43

Hambone · Accepted Answer · 2022-07-19 01:45:09Z

0

The data and the results you expect to see don't appear to line up with the logic you've outlined, but I think I get what you are saying.

If I understand you correctly, you want to look at the "next log begin" for every record, sorted by group then log start. If this is the case, you want to omit the "partition by" because it will yield a null any time the group id changes. It executes the lead within groups of whatever value(s) you specify in partition by, in this case group_id. So, for starters:

select
  group_id, log_begin, log_end,
  lead (log_begin) over (order by group_id, log_begin) as x
from log

Which looks for the next record, independent of changes to the group.

There is no way I'm aware of to evaluate the result of a window function within the expression that invokes it, so to do this you essentially would need to wrap it in a CTE and then evaluate it:

with cte as (
  select
    group_id, log_begin, log_end,
    lead (log_begin) over (order by group_id, log_begin) as x
  from log
)
select
  group_id, log_begin, log_end,
  x
from cte

And now you can compare x to any other field. I think the new field you want would look like this:

case
  when log_end < x then x
end as next_log_begin

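But again, it does not match your desired results. On the sample data, for instance, the row ending at 2022-07-15 15:20:00 gets NULL rather than 2022-07-15 15:30:00, because the row immediately after it begins at 15:15:00. So either I misunderstood, your sample data might be off, or your assumptions might be off. All are equally possible.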

Full query example:

with cte as (
  select
    group_id, log_begin, log_end,
    lead (log_begin) over (order by group_id, log_begin) as x
  from log
)
select
  group_id, log_begin, log_end,
  x,
  case
    when log_end < x then x
  end as next_log_begin
from cte

-- EDIT 7/18/2022 --

I think I see now, based on your revised question. I can't promise this will be efficient, but if you implement a scalar subquery I think it will do what you want. Try this and let me know.

select
  group_id, log_begin, log_end,
  (select min (log_begin)
  from log l2
  where l1.group_id = l2.group_id
  and l2.log_begin > l1.log_end) as next_log_begin
from log l1
order by group_id, log_begin
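
If the scalar subquery turns out to be too slow, here is a rough sketch of a window-function alternative that should run on Postgres 10. It is a different technique from the subquery above: every row is split into a "begin" event and an "end" event, the events are sorted by timestamp within each group, and each end event takes the smallest begin timestamp among the rows that follow it. The names events, scanned, ts and kind are made up for this illustration, and I have only traced it by hand against the sample data in the question, so treat it as a starting point rather than a finished solution.

```sql
WITH events AS (
  -- kind = 0: a "begin" event, a candidate value for next_log_begin
  SELECT group_id, log_begin AS ts, 0 AS kind, log_begin, log_end
  FROM log
  UNION ALL
  -- kind = 1: an "end" event, the row we want an answer for
  SELECT group_id, log_end AS ts, 1 AS kind, log_begin, log_end
  FROM log
),
scanned AS (
  SELECT
    group_id, log_begin, log_end, kind,
    -- smallest begin timestamp among the events sorted after this one;
    -- end events contribute NULL, which min() ignores
    min(CASE WHEN kind = 0 THEN ts END) OVER (
      PARTITION BY group_id
      ORDER BY ts, kind
      ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING
    ) AS next_log_begin
  FROM events
)
SELECT group_id, log_begin, log_end, next_log_begin
FROM scanned
WHERE kind = 1  -- keep one output row per original log row
ORDER BY group_id, log_begin;
```

Ordering by ts, kind puts a begin event ahead of an end event with the same timestamp, so a log that begins exactly when another ends is not picked up, which matches the strict l2.log_begin > l1.log_end comparison in the subquery. For the subquery version, an index on (group_id, log_begin) is the usual thing to try first. Either way, I would compare the plans with EXPLAIN ANALYZE on the real table before deciding which one is faster.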

edited Jul 19, 2022 at 1:45

answered Jul 15, 2022 at 22:19

Hambone


4 Comments

Marcelo Gonçalves Over a year ago

Unfortunately it fails in the case where there is an "overlap" and the next row's log begins before the current log ends. Example: line 4 ends at 2022-07-15 15:20:00 and line 5 begins at 2022-07-15 15:15:00, so the right answer is the log of line 6, which begins at 2022-07-15 15:30:00.

Hambone Over a year ago

Do you think you can update your question with the exact specific output you desire?

Marcelo Gonçalves Over a year ago

Done, the 3rd table is the correct output.

Hambone Over a year ago

I believe I understand... second attempt and comments posted
