0

Given a table 'points' like:

time  | session_id  | trail_id
------------------------------
  1   |      1      |    1
  2   |      1      |    1
  3   |      1      |    3
  4   |      1      |    3
  5   |      1      |    3
  6   |      1      |    1
  7   |      1      |    1
  8   |      1      |    1
  9   |      1      |    1
  10  |      1      |    1

How can these items be grouped so that I can use an aggregate function on 'session_id' with respect to trail_id, while ordering by time? i.e.) I want the groupings to be split when the trail_id changes over time.

A query like:

SELECT count(session_id), session_id, trail_id
FROM <?>

Would yield:

count  | session_id  | trail_id
-------------------------------
   2   |     1       |    1
   3   |     1       |    3
   5   |     1       |    1

I believe this can be done with a window function, but have been unsuccessful so far.

The following doesn't quite get where I need to be, as it groups all of the trail_ids regardless of time:

SELECT session_id, trail_id, 
  first_value(time) OVER (PARTITION BY session_id, trail_id ORDER BY time) as v
FROM points

Also, in my production use case the 'points' table will be the result of JOINs and consist of a few millions rows. These points will have a PostGIS geometry type and be aggregated with the ST_MakeLine() function. Performance wise, would this be better off attempted in PL/pgSQL?

1 Answer 1

2
with points(time  , session_id  , trail_id) as(
    select 1   ,      1      ,    1 union all
    select  2   ,      1      ,    1 union all
    select  3   ,      1      ,    3 union all
    select  4   ,      1      ,    3 union all
    select  5   ,      1      ,    3 union all
    select  6   ,      1      ,    1 union all
    select  7   ,      1      ,    1 union all
    select  8   ,      1      ,    1 union all
    select  11   ,      1      ,    1 union all
    select  12  ,      1      ,    1  
)

select count(*), session_id, trail_id
from (
    select time, session_id, trail_id,
    row_number() over(order by time) -
    row_number() over(partition by session_id, trail_id order by time) as grp
    from points
)t
group by grp, session_id, trail_id
order by min(time)

Well, this should give result what you need, but if

'points' table will be the result of JOINs and consist of a few millions rows

then may be performance will not so desirable. Just try

Sign up to request clarification or add additional context in comments.

1 Comment

This is clever! And works well. Unfortunately this is for a proof of concept that I won't be able to really test out for a few weeks, so I'll have to see how the performance is then.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.