I have a table looks like this
CREATE TABLE myTable
(
userid text,
webid text,
"timestamp" timestamp
);
INSERT INTO myTable
("userid", "webid", "timestamp")
VALUES ('A', '34', '2023-01-31 17:34:49.000'),
('A', '73', '2023-01-31 17:34:50.000'),
('A', '97', '2023-01-31 17:34:58.000'),
('A', '17', '2023-01-31 17:35:02.000'),
('A', '17', '2023-01-31 17:35:07.000'),
('A', '17', '2023-01-31 17:35:18.000'),
('A', '17', '2023-01-31 17:35:30.000'),
('A', '1', '2023-01-31 17:35:37.000'),
('A', '1', '2023-01-31 17:35:38.000'),
('A', '77', '2023-01-31 17:35:41.000'),
('A', '77', '2023-01-31 17:35:42.000'),
('A', '1', '2023-01-31 17:37:10.000'),
('A', '1', '2023-01-31 17:37:12.000'),
('A', '77', '2023-01-31 17:37:14.000'),
('A', '77', '2023-01-31 17:37:15.000'),
('B', '33', '2023-01-31 06:37:15.000'),
('B', '56', '2023-01-31 06:40:15.000')
;
| userid | webid | timestamp |
|---|---|---|
| A | 34 | 2023-01-31 17:34:49 |
| A | 73 | 2023-01-31 17:34:50 |
| A | 97 | 2023-01-31 17:34:58 |
| A | 17 | 2023-01-31 17:35:02 |
| A | 17 | 2023-01-31 17:35:07 |
| A | 17 | 2023-01-31 17:35:18 |
| A | 17 | 2023-01-31 17:35:30 |
| A | 1 | 2023-01-31 17:35:37 |
| A | 1 | 2023-01-31 17:35:38 |
| A | 77 | 2023-01-31 17:35:41 |
| A | 77 | 2023-01-31 17:35:42 |
| A | 1 | 2023-01-31 17:37:10 |
| A | 1 | 2023-01-31 17:37:12 |
| A | 77 | 2023-01-31 17:37:14 |
| A | 77 | 2023-01-31 17:37:15 |
| B | 33 | 2023-01-31 06:37:15 |
| B | 56 | 2023-01-31 06:40:15 |
The correct output I want is this:
| userid | webid | timestamp | first_ts |
|---|---|---|---|
| A | 34 | 2023-01-31 17:34:49 | 2023-01-31 17:34:49 |
| A | 73 | 2023-01-31 17:34:50 | 2023-01-31 17:34:50 |
| A | 97 | 2023-01-31 17:34:58 | 2023-01-31 17:34:58 |
| A | 17 | 2023-01-31 17:35:02 | 2023-01-31 17:35:02 |
| A | 17 | 2023-01-31 17:35:07 | 2023-01-31 17:35:02 |
| A | 17 | 2023-01-31 17:35:18 | 2023-01-31 17:35:02 |
| A | 17 | 2023-01-31 17:35:30 | 2023-01-31 17:35:02 |
| A | 1 | 2023-01-31 17:35:37 | 2023-01-31 17:35:37 |
| A | 1 | 2023-01-31 17:35:38 | 2023-01-31 17:35:37 |
| A | 77 | 2023-01-31 17:35:41 | 2023-01-31 17:35:41 |
| A | 77 | 2023-01-31 17:35:42 | 2023-01-31 17:35:41 |
| A | 1 | 2023-01-31 17:37:10 | 2023-01-31 17:37:10 |
| A | 1 | 2023-01-31 17:37:12 | 2023-01-31 17:37:10 |
| A | 77 | 2023-01-31 17:37:14 | 2023-01-31 17:37:14 |
| A | 77 | 2023-01-31 17:37:15 | 2023-01-31 17:37:14 |
| B | 33 | 2023-01-31 06:37:15 | 2023-01-31 06:37:15 |
| B | 56 | 2023-01-31 06:40:15 | 2023-01-31 06:40:15 |
first_ts means first timestamp. If a userid visits consecutive webid, their first_ts should be the first timestamp of this webid. If the userid visits webid let's say webid = 77 and jumps to another webid = 1 and then jumps back to webid = 77, the first_ts should be refreshed, it won't be the very first one. I used following code to do that, but unable to get correct first_ts for userid who jumps back to previous webid.
select userid, webid,
lag(webid,1) over(partition by userid order by timestamp desc) as next_webid,
lag(webid,1) over(partition by userid order by timestamp asc) as previous_webid,
timestamp,
case when webid = next_webid then first_value(timestamp) over(partition by userid, webid order by timestamp)
when webid = next_webid and webid = previous_webid then first_value(timestamp) over(partition by userid, webid order by timestamp)
when webid = previous_webid then first_value(timestamp) over(partition by userid, webid order by timestamp)
else timestamp
end as first_ts
from myTable
order by userid, timestamp;
this wrong query will return output
| userid | webid | timestamp | first_ts |
|---|---|---|---|
| A | 34 | 2023-01-31 17:34:49 | 2023-01-31 17:34:49 |
| A | 73 | 2023-01-31 17:34:50 | 2023-01-31 17:34:50 |
| A | 97 | 2023-01-31 17:34:58 | 2023-01-31 17:34:58 |
| A | 17 | 2023-01-31 17:35:02 | 2023-01-31 17:35:02 |
| A | 17 | 2023-01-31 17:35:07 | 2023-01-31 17:35:02 |
| A | 17 | 2023-01-31 17:35:18 | 2023-01-31 17:35:02 |
| A | 17 | 2023-01-31 17:35:30 | 2023-01-31 17:35:02 |
| A | 1 | 2023-01-31 17:35:37 | 2023-01-31 17:35:37 |
| A | 1 | 2023-01-31 17:35:38 | 2023-01-31 17:35:37 |
| A | 77 | 2023-01-31 17:35:41 | 2023-01-31 17:35:41 |
| A | 77 | 2023-01-31 17:35:42 | 2023-01-31 17:35:41 |
| A | 1 | 2023-01-31 17:37:10 | 2023-01-31 17:35:37 |
| A | 1 | 2023-01-31 17:37:12 | 2023-01-31 17:35:37 |
| A | 77 | 2023-01-31 17:37:14 | 2023-01-31 17:35:41 |
| A | 77 | 2023-01-31 17:37:15 | 2023-01-31 17:35:41 |
| B | 33 | 2023-01-31 06:37:15 | 2023-01-31 06:37:15 |
| B | 56 | 2023-01-31 06:40:15 | 2023-01-31 06:40:15 |
in row 12, 13, 14, 15, the first_ts is wrong. Anyone knows how to modify the query to get correct first_ts? Thanks!