Postgres search missing date in hourly sequence

Question

I have a table period_of_hours in PostgreSQL. This table contains several column but for my case is important column timestamp (timestamp without time zone) and key(character varying). I have a scheduler which add a new row to the table with period of hour. I don't want to bother you why but sometimes it skip adding to the table a new row.

As you can see from this example period from 2024-05-06 10:00:00 to 2024-05-06 15:00:00 contains all lines consecutively, then one line is skipped 2024-05-06 16:00:00. From 2024-05-06 17:00:00 to 2024-05-06 21:00:00 is Ok and two rows is missed 2024-05-06 22:00:00 and 2024-05-06 23:00:00 than 2024-05-07 02:00:00 also missed.

Table "period_of_hours"
timestamp            key 

2024-05-06 10:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 11:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 12:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 13:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 14:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 15:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'

2024-05-06 17:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 18:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 19:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 20:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 21:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'

2024-05-07 00:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-07 01:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'

2024-05-07 03:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-07 04:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-07 05:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'


2024-05-06 10:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 11:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 12:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 13:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 14:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 15:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'

2024-05-06 17:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 18:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 19:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 20:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 21:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'

2024-05-07 00:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-07 01:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'

2024-05-07 03:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-07 04:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-07 05:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'

First of all i want to select all rows by key (for example '009eae73-0b55-3809-9938-4bc0a342e451') and filter result set to get all missed rows ? In real table it is more than 1000 rows.

Expected result for key column '009eae73-0b55-3809-9938-4bc0a342e451':

2024-05-06 16:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 22:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 23:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-07 02:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'

This solution is works for me.

with continuous("timestamp") as (
  select 
    '2024-05-06 10:00:00' :: timestamp + n * '1 hour' :: interval 
  FROM 
    generate_series(
      0, 
      (
        select 
          extract(
            epoch 
            from 
              max(timestamp)- min(
                '2024-05-06 10:00:00' :: timestamp
              )
          )/ 3600 
        from 
          period_of_hours 
        where 
          "key" = '0a3e1588-ad59-3586-b071-d5001f5ff9a7'
      ):: integer, 
      1
    ) as n
) 
select 
  c.timestamp 
from 
  continuous c 
  left join period_of_hours p on c.timestamp = p.timestamp 
  and p.key = '0a3e1588-ad59-3586-b071-d5001f5ff9a7' 
where 
  p.timestamp is null;

is it possible to combine key in construction and solution above because i can have a list of UUID ?

key in (
    'a63ffce5-1d86-3afc-8b5a-97452f935632', 
    '0a3e1588-ad59-3586-b071-d5001f5ff9a7',
    'e5524b9b-3aca-3b80-9aab-19bfec30fb9b')```

I dont understand whats is continuous("timestamp") ? It is a function or what ?

continuous is the name of the common table expression, ("timestamp") is the name of the column in it. You can add your condition with the list of keys to the last solution in my answer. Here's a demo. Note that it's not showing any results because your proposed list of keys doesn't match any of those you gave in the sample table. — Zegarek
– Zegarek, Commented Jun 6, 2024 at 20:26

Zegarek · Accepted Answer · 2024-06-06 10:01:53Z

0

You can generate a continuous version of your data set and run an anti-join against it: demo

with continuous("timestamp") as (
select '2024-05-06 10:00:00'::timestamp+n*'1 hour'::interval
from generate_series( 0
                     ,(select extract(epoch from max("timestamp")
                                                -min("timestamp"))/3600
                      from period_of_hours
                      where "key"='009eae73-0b55-3809-9938-4bc0a342e451')
                     ,1)as n)
select c."timestamp" from continuous c
left join period_of_hours p 
  on c."timestamp"=p."timestamp"
 and p."key"='009eae73-0b55-3809-9938-4bc0a342e451'
where p."timestamp" is null;

timestamp
2024-05-06 16:00:00
2024-05-06 22:00:00
2024-05-06 23:00:00
2024-05-07 02:00:00

An except, a not in or <>all() could achieve the same.

You could also treat this as gaps-and-islands problem and run a window function to spot the gaps and report their width, then generate a patch of rows using that:

select "timestamp"+n*'1h'::interval as "timestamp"
from (select *,extract(epoch from (lead("timestamp")over w1)-"timestamp")
               /3600 as "diff"
      from period_of_hours
      where "key"='009eae73-0b55-3809-9938-4bc0a342e451'
      window w1 as(order by "timestamp") )_
cross join lateral generate_series(1,"diff"-1,1) as n
where "diff">1;

If you want to list each key with all its gaps, it's enough to change the window definition a bit:

select "key","timestamp"+n*'1h'::interval as "timestamp"
from (select *,extract(epoch from (lead("timestamp")over w1)-"timestamp")
               /3600 as "diff"
      from period_of_hours
      window w1 as(partition by "key" order by "timestamp") )_
cross join lateral generate_series(1,"diff"-1,1) as n
where "diff">1;

key	timestamp
009eae73-0b55-3809-9938-4bc0a342e451	2024-05-06 16:00:00
009eae73-0b55-3809-9938-4bc0a342e451	2024-05-06 22:00:00
009eae73-0b55-3809-9938-4bc0a342e451	2024-05-06 23:00:00
009eae73-0b55-3809-9938-4bc0a342e451	2024-05-07 02:00:00
fc85b307-15d6-369b-86b5-a4aa19871dff	2024-05-06 16:00:00
fc85b307-15d6-369b-86b5-a4aa19871dff	2024-05-06 22:00:00
fc85b307-15d6-369b-86b5-a4aa19871dff	2024-05-06 23:00:00
fc85b307-15d6-369b-86b5-a4aa19871dff	2024-05-07 02:00:00

edited Jun 6, 2024 at 10:01

answered Jun 5, 2024 at 18:05

Zegarek

29.9k5 gold badges27 silver badges32 bronze badges

Sign up to request clarification or add additional context in comments.

6 Comments

Romillion Over a year ago

thank you for your reply. I understand that i confuse you by column "value". I need only time timemark. I updated my post

Zegarek Over a year ago

@Romillion That's a minor change. I've updated the answer and the demo, removing the "value" column.

Romillion Over a year ago

I did a final update to my post. Can you please update your SQL query according to my final update

Romillion Over a year ago

Please Notice that column name is not timemark . The real column name is timestamp.

Zegarek Over a year ago

@Romillion I've updated the answer and the demo to use the new column name and added an example if you want to list all keys with their gaps, not just a specific one.

|

Collectives™ on Stack Overflow

Postgres search missing date in hourly sequence

1 Answer 1

6 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

6 Comments

Your Answer

Sign up or log in

Post as a guest

Related