1

I have a table period_of_hours in PostgreSQL. This table contains several column but for my case is important column timestamp (timestamp without time zone) and key(character varying). I have a scheduler which add a new row to the table with period of hour. I don't want to bother you why but sometimes it skip adding to the table a new row.

As you can see from this example period from 2024-05-06 10:00:00 to 2024-05-06 15:00:00 contains all lines consecutively, then one line is skipped 2024-05-06 16:00:00. From 2024-05-06 17:00:00 to 2024-05-06 21:00:00 is Ok and two rows is missed 2024-05-06 22:00:00 and 2024-05-06 23:00:00 than 2024-05-07 02:00:00 also missed.

Table "period_of_hours"
timestamp            key 

2024-05-06 10:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 11:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 12:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 13:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 14:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 15:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'

2024-05-06 17:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 18:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 19:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 20:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 21:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'

2024-05-07 00:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-07 01:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'

2024-05-07 03:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-07 04:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-07 05:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'


2024-05-06 10:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 11:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 12:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 13:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 14:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 15:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'

2024-05-06 17:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 18:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 19:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 20:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 21:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'

2024-05-07 00:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-07 01:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'

2024-05-07 03:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-07 04:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-07 05:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'

First of all i want to select all rows by key (for example '009eae73-0b55-3809-9938-4bc0a342e451') and filter result set to get all missed rows ? In real table it is more than 1000 rows.

Expected result for key column '009eae73-0b55-3809-9938-4bc0a342e451':

2024-05-06 16:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 22:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 23:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-07 02:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'

This solution is works for me.

with continuous("timestamp") as (
  select 
    '2024-05-06 10:00:00' :: timestamp + n * '1 hour' :: interval 
  FROM 
    generate_series(
      0, 
      (
        select 
          extract(
            epoch 
            from 
              max(timestamp)- min(
                '2024-05-06 10:00:00' :: timestamp
              )
          )/ 3600 
        from 
          period_of_hours 
        where 
          "key" = '0a3e1588-ad59-3586-b071-d5001f5ff9a7'
      ):: integer, 
      1
    ) as n
) 
select 
  c.timestamp 
from 
  continuous c 
  left join period_of_hours p on c.timestamp = p.timestamp 
  and p.key = '0a3e1588-ad59-3586-b071-d5001f5ff9a7' 
where 
  p.timestamp is null;

is it possible to combine key in construction and solution above because i can have a list of UUID ?

key in (
    'a63ffce5-1d86-3afc-8b5a-97452f935632', 
    '0a3e1588-ad59-3586-b071-d5001f5ff9a7',
    'e5524b9b-3aca-3b80-9aab-19bfec30fb9b')```

I dont understand whats is continuous("timestamp") ? It is a function or what ?

1
  • continuous is the name of the common table expression, ("timestamp") is the name of the column in it. You can add your condition with the list of keys to the last solution in my answer. Here's a demo. Note that it's not showing any results because your proposed list of keys doesn't match any of those you gave in the sample table. Commented Jun 6, 2024 at 20:26

1 Answer 1

0

You can generate a continuous version of your data set and run an anti-join against it: demo

with continuous("timestamp") as (
select '2024-05-06 10:00:00'::timestamp+n*'1 hour'::interval
from generate_series( 0
                     ,(select extract(epoch from max("timestamp")
                                                -min("timestamp"))/3600
                      from period_of_hours
                      where "key"='009eae73-0b55-3809-9938-4bc0a342e451')
                     ,1)as n)
select c."timestamp" from continuous c
left join period_of_hours p 
  on c."timestamp"=p."timestamp"
 and p."key"='009eae73-0b55-3809-9938-4bc0a342e451'
where p."timestamp" is null;
timestamp
2024-05-06 16:00:00
2024-05-06 22:00:00
2024-05-06 23:00:00
2024-05-07 02:00:00

An except, a not in or <>all() could achieve the same.

You could also treat this as gaps-and-islands problem and run a window function to spot the gaps and report their width, then generate a patch of rows using that:

select "timestamp"+n*'1h'::interval as "timestamp"
from (select *,extract(epoch from (lead("timestamp")over w1)-"timestamp")
               /3600 as "diff"
      from period_of_hours
      where "key"='009eae73-0b55-3809-9938-4bc0a342e451'
      window w1 as(order by "timestamp") )_
cross join lateral generate_series(1,"diff"-1,1) as n
where "diff">1;

If you want to list each key with all its gaps, it's enough to change the window definition a bit:

select "key","timestamp"+n*'1h'::interval as "timestamp"
from (select *,extract(epoch from (lead("timestamp")over w1)-"timestamp")
               /3600 as "diff"
      from period_of_hours
      window w1 as(partition by "key" order by "timestamp") )_
cross join lateral generate_series(1,"diff"-1,1) as n
where "diff">1;
key timestamp
009eae73-0b55-3809-9938-4bc0a342e451 2024-05-06 16:00:00
009eae73-0b55-3809-9938-4bc0a342e451 2024-05-06 22:00:00
009eae73-0b55-3809-9938-4bc0a342e451 2024-05-06 23:00:00
009eae73-0b55-3809-9938-4bc0a342e451 2024-05-07 02:00:00
fc85b307-15d6-369b-86b5-a4aa19871dff 2024-05-06 16:00:00
fc85b307-15d6-369b-86b5-a4aa19871dff 2024-05-06 22:00:00
fc85b307-15d6-369b-86b5-a4aa19871dff 2024-05-06 23:00:00
fc85b307-15d6-369b-86b5-a4aa19871dff 2024-05-07 02:00:00
Sign up to request clarification or add additional context in comments.

6 Comments

thank you for your reply. I understand that i confuse you by column "value". I need only time timemark. I updated my post
@Romillion That's a minor change. I've updated the answer and the demo, removing the "value" column.
I did a final update to my post. Can you please update your SQL query according to my final update
Please Notice that column name is not timemark . The real column name is timestamp.
@Romillion I've updated the answer and the demo to use the new column name and added an example if you want to list all keys with their gaps, not just a specific one.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.