I have a table containing the fields: user_ip, datetime, year, month, day, hour, tag_id, country, device_type, brand. I need to check if given a IP was active for a continuous period of 4 or more hours and flag as true / false.
- A single IP can have multiple records within the same hour with different timestamp. I need to identify IPs that were continuously active for at least 4 hours.
- for hour (0,1,2) we need to check previous days last few hours to account for atleast 4 consecutive hours
- minutes are to be ignored (thus 21:59 is accounted as 21:00, while 22:01 is 22:00: those 2-minutes apart timestamps are considered covering 2 consecutive hours, as would 21:01 and 22:59)
Query tried:
WITH sample_data (user_ip, datetime, year, month, day, hour, tag_id, country, device_type, brand) AS
(
SELECT '192.168.0.101', TO_TIMESTAMP('20250525000000', 'YYYYMMDDHH24MISS'), 2025, 5, 25, 0, 1001, 'US', 1, 'Samsung' union all
SELECT '192.168.0.101', TO_TIMESTAMP('20250525010000', 'YYYYMMDDHH24MISS'), 2025, 5, 25, 1, 1001, 'US', 1, 'Samsung' union all
SELECT '192.168.0.101', TO_TIMESTAMP('20250525020000', 'YYYYMMDDHH24MISS'), 2025, 5, 25, 2, 1001, 'US', 1, 'Samsung' union all
SELECT '192.168.0.101', TO_TIMESTAMP('20250525030000', 'YYYYMMDDHH24MISS'), 2025, 5, 25, 3, 1001, 'US', 1, 'Samsung' union all
SELECT '192.168.0.101', TO_TIMESTAMP('20250525040000', 'YYYYMMDDHH24MISS'), 2025, 5, 25, 4, 1001, 'US', 2, 'Samsung' union all
SELECT '192.168.0.101', TO_TIMESTAMP('20250525050000', 'YYYYMMDDHH24MISS'), 2025, 5, 25, 5, 1002, 'US', 2, 'Samsung' union all
SELECT '192.168.0.101', TO_TIMESTAMP('20250525060000', 'YYYYMMDDHH24MISS'), 2025, 5, 25, 6, 1002, 'US', 2, 'Samsung' union all
SELECT '192.168.0.101', TO_TIMESTAMP('20250525070000', 'YYYYMMDDHH24MISS'), 2025, 5, 25, 7, 1002, 'US', 2, 'Samsung' union all
SELECT '192.168.0.101', TO_TIMESTAMP('20250525080000', 'YYYYMMDDHH24MISS'), 2025, 5, 25, 8, 1002, 'US', 3, 'Apple' union all
SELECT '192.168.0.101', TO_TIMESTAMP('20250525090000', 'YYYYMMDDHH24MISS'), 2025, 5, 25, 9, 1003, 'US', 3, 'Apple' union all
SELECT '192.168.0.101', TO_TIMESTAMP('20250525100000', 'YYYYMMDDHH24MISS'), 2025, 5, 25, 10, 1003, 'US', 3, 'Apple' union all
SELECT '192.168.0.101', TO_TIMESTAMP('20250525110000', 'YYYYMMDDHH24MISS'), 2025, 5, 25, 11, 1003, 'US', 3, 'Apple' union all
SELECT '192.168.0.101', TO_TIMESTAMP('20250525215100', 'YYYYMMDDHH24MISS'), 2025, 5, 25, 21, 1004, 'CA', 3, 'Apple' union all
SELECT '192.168.0.101', TO_TIMESTAMP('20250525215200', 'YYYYMMDDHH24MISS'), 2025, 5, 25, 21, 1005, 'CA', 3, 'Apple' union all
SELECT '192.168.0.101', TO_TIMESTAMP('20250525215300', 'YYYYMMDDHH24MISS'), 2025, 5, 25, 21, 1006, 'CA', 3, 'Apple' union all
SELECT '192.168.0.101', TO_TIMESTAMP('20250525215400', 'YYYYMMDDHH24MISS'), 2025, 5, 25, 21, 1007, 'CA', 3, 'Apple' union all
SELECT '192.168.0.101', TO_TIMESTAMP('20250525215500', 'YYYYMMDDHH24MISS'), 2025, 5, 25, 21, 1008, 'CA', 3, 'Apple' union all
SELECT '192.168.0.101', TO_TIMESTAMP('20250525220000', 'YYYYMMDDHH24MISS'), 2025, 5, 25, 22, 1004, 'CA', 3, 'Apple' union all
SELECT '192.168.0.101', TO_TIMESTAMP('20250525230000', 'YYYYMMDDHH24MISS'), 2025, 5, 25, 23, 1004, 'CA', 4, 'LG' union all
SELECT '192.168.0.101', TO_TIMESTAMP('20250526000000', 'YYYYMMDDHH24MISS'), 2025, 5, 26, 0, 1005, 'CA', 4, 'LG' union all
SELECT '192.168.0.101', TO_TIMESTAMP('20250526010000', 'YYYYMMDDHH24MISS'), 2025, 5, 26, 1, 1005, 'CA', 4, 'LG' union all
SELECT '192.168.0.101', TO_TIMESTAMP('20250526020000', 'YYYYMMDDHH24MISS'), 2025, 5, 26, 2, 1005, 'CA', 4, 'LG' union all
SELECT '192.168.0.101', TO_TIMESTAMP('20250526030000', 'YYYYMMDDHH24MISS'), 2025, 5, 26, 3, 1006, 'CA', 5, 'Sony' union all
SELECT '192.168.0.101', TO_TIMESTAMP('20250526040000', 'YYYYMMDDHH24MISS'), 2025, 5, 26, 4, 1006, 'CA', 5, 'Sony' union all
SELECT '192.168.0.101', TO_TIMESTAMP('20250526085100', 'YYYYMMDDHH24MISS'), 2025, 5, 26, 8, 1006, 'CA', 5, 'Sony' union all
SELECT '192.168.0.101', TO_TIMESTAMP('20250526085200', 'YYYYMMDDHH24MISS'), 2025, 5, 26, 8, 1007, 'CA', 5, 'Sony' union all
SELECT '192.168.0.101', TO_TIMESTAMP('20250526085300', 'YYYYMMDDHH24MISS'), 2025, 5, 26, 8, 1008, 'CA', 5, 'Sony' union all
SELECT '192.168.0.101', TO_TIMESTAMP('20250526085400', 'YYYYMMDDHH24MISS'), 2025, 5, 26, 8, 1009, 'CA', 5, 'Sony'
)
,
grid AS
(
SELECT user_ip,datetime, year,month,day,hour,
Coalesce( Case When LAG(hour) Over(Partition By user_ip Order By datetime, hour) IN (hour, hour - 1)
OR Min(hour) Over(Partition By user_ip) = hour
Then 'Y'
End, Cast(hour as Varchar(2))
) as consequtive_hour,
hour - LAG(hour) Over(Partition By user_ip Order By datetime, hour) as diff_hours,
Row_Number() Over(Partition By user_ip Order By datetime, hour
Rows Between Unbounded Preceding And Current Row) as rn,
tag_id,country,device_type,brand
from sample_data
)
Select user_ip,datetime, year,month,day,hour,
Case When Count(consequtive_hour)
Over(Partition By user_ip, consequtive_hour
Order By datetime, hour
Rows Between Unbounded Preceding And Current Row) >= 4
AND consequtive_hour = 'Y' And Sum(Case When consequtive_hour = 'Y' Then diff_hours Else 0 End) Over(Partition By user_ip, consequtive_hour
Order By datetime, hour
Rows Between Unbounded Preceding And Current Row) >= 3
Then 'TRUE'
Else 'FALSE'
End as is_active_min_4hr,
consequtive_hour, diff_hours,rn,tag_id,country,device_type,brand
From grid
Order By user_ip, datetime, hour, rn
fiddle : https://dbfiddle.uk/Q0vyCsJN
Expected result:
- is_active_min_4hr : should be 'False' for Record 14-17 (hour=21), given hour 21 is not continuous due to no activity for hour 18,19,20
- is_active_min_4hr : should be 'False' for Record 18 (hour=22) , due to no activity for hour 19,20
- is_active_min_4hr : should be 'False' for Record 19 (hour=23) , due to no activity for hour 20
- is_active_min_4hr : should be 'TRUE' for Record 20-24 (hour 0-4 next day) , due to continuous activity . for ex: hour 00 (2025-05-26) as continuous activity from prior ours (21,22,23 previous day 20250526)
tag_id, country, device_type, brandhave matter to the question(answer)?21:51:00is counted as21:00:00)? Because the intuitive meaning would be 4 "real" hours, which would prevent00:00:00from being flagged (because 4 hours after21:51:00would only have elapsed at00:51:00, thus only01:00:00would be the first flagged timestamp). Note that @d r's answer is right in that second meaning (see his results table:00:00:00is flaggedfalse).datetimeby adate_trunc('hour', datetime)(and this would work equally well for @d-r's answer). Note that to properly test for this subtlety, your test should stress this rule, by transforming2025-05-25 03:00to2025-05-25 03:33: for now your only test having minutes is21:51, which only has to compare to its successor22:00(always>= 22:00 - 1:00, be it complete with minutes or truncated to21:00), while3:33, having both a predecessor '2:00` and successor4:00, would test both a less-than-1 h interval and a more-than-1 h one.