How to group aggregates in PostgreSQL based on ordering of multiple columns?

Question

Given a table 'points' like:

time  | session_id  | trail_id
------------------------------
  1   |      1      |    1
  2   |      1      |    1
  3   |      1      |    3
  4   |      1      |    3
  5   |      1      |    3
  6   |      1      |    1
  7   |      1      |    1
  8   |      1      |    1
  9   |      1      |    1
  10  |      1      |    1

How can these items be grouped so that I can use an aggregate function on 'session_id' with respect to trail_id, while ordering by time? i.e.) I want the groupings to be split when the trail_id changes over time.

A query like:

SELECT count(session_id), session_id, trail_id
FROM <?>

Would yield:

count  | session_id  | trail_id
-------------------------------
   2   |     1       |    1
   3   |     1       |    3
   5   |     1       |    1

I believe this can be done with a window function, but have been unsuccessful so far.

The following doesn't quite get where I need to be, as it groups all of the trail_ids regardless of time:

SELECT session_id, trail_id, 
  first_value(time) OVER (PARTITION BY session_id, trail_id ORDER BY time) as v
FROM points

Also, in my production use case the 'points' table will be the result of JOINs and consist of a few millions rows. These points will have a PostGIS geometry type and be aggregated with the ST_MakeLine() function. Performance wise, would this be better off attempted in PL/pgSQL?

Oto Shavadze · Accepted Answer · 2017-06-21 19:40:41Z

2

with points(time  , session_id  , trail_id) as(
    select 1   ,      1      ,    1 union all
    select  2   ,      1      ,    1 union all
    select  3   ,      1      ,    3 union all
    select  4   ,      1      ,    3 union all
    select  5   ,      1      ,    3 union all
    select  6   ,      1      ,    1 union all
    select  7   ,      1      ,    1 union all
    select  8   ,      1      ,    1 union all
    select  11   ,      1      ,    1 union all
    select  12  ,      1      ,    1  
)

select count(*), session_id, trail_id
from (
    select time, session_id, trail_id,
    row_number() over(order by time) -
    row_number() over(partition by session_id, trail_id order by time) as grp
    from points
)t
group by grp, session_id, trail_id
order by min(time)

Well, this should give result what you need, but if

'points' table will be the result of JOINs and consist of a few millions rows

then may be performance will not so desirable. Just try

answered Jun 21, 2017 at 19:40

Oto Shavadze

43.3k56 gold badges168 silver badges248 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

David Vail Over a year ago

This is clever! And works well. Unfortunately this is for a proof of concept that I won't be able to really test out for a few weeks, so I'll have to see how the performance is then.

Collectives™ on Stack Overflow

How to group aggregates in PostgreSQL based on ordering of multiple columns?

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related