I have a table storing activity information for my application's users.

| username | day |
|----------|-----|
|   u1     |   1 |
|   u1     |   2 |
|   u1     |   3 |
|     u2   |   2 |
|       u3 |   1 |
|       u3 |   4 |

I'd like to be able to get historical data regarding unique and recent users for each day.

  • Unique users for day N are all the distinct users that had any activity between day 0 and day N.
  • Recent users for day N are all the distinct users that had any activity on day N-1 or day N. In the actual application this will be between day N-30 and N.

I'm able to get the list of the users that were active on each specific day, but I'm not sure how I can aggregate this data to get unique or recent users.

SELECT
    day,
    array_agg(username) AS day_users
FROM myTable
GROUP BY day
ORDER BY day;

| day | day_users |
|-----|-----------|
|   1 |  u1,   u3 |
|   2 |  u1,u2    |
|   3 |  u1       |
|   4 |        u3 |

For the sample data above, the expected output would be (spacing not required):

| day | unique_users | recent_users |
|-----|--------------|--------------|
|   1 |     u1,   u3 |     u1,   u3 |
|   2 |     u1,u2,u3 |     u1,u2,u3 |
|   3 |     u1,u2,u3 |     u1,u2    |
|   4 |     u1,u2,u3 |     u1,   u3 |

Relevant SQL Fiddle: http://sqlfiddle.com/#!17/b793f/1

  • hint: use 'lag' Commented Mar 29, 2018 at 16:22
  • Please always provide your version of Postgres and the table definition (CREATE TABLE statement) showing data types and constraints. Commented Mar 29, 2018 at 23:58

1 Answer

You can do this with a custom aggregate function that merges arrays and removes duplicates:

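-- Sorted, de-duplicated union of two arrays (UNION discards duplicates).
create or replace function array_union(anyarray, anyarray)
returns anyarray language sql
as $$
    select
        array(
            select unnest($1)
            union
            select unnest($2)
            order by unnest
        )
$$;

-- Folds array_union over all input arrays. There is no initcond, so the
-- state starts as NULL; array_union is not strict, so it is still called,
-- and unnest(NULL) simply contributes no elements.
create aggregate array_union_agg (anyarray)
(
    sfunc = array_union,
    stype = anyarray
);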

Use the aggregate as a window function over your per-day query:

select
    day,
    day_users,
    array_union_agg(day_users) over (order by day) as unique_users,
    -- ROWS counts rows (one per active day): the previous row is day N-1 only if no day is missing
    array_union_agg(day_users) over (order by day rows between 1 preceding and current row) as recent_users
from (
    select day, array_agg(username) as day_users
    from my_table
    group by day
    ) s
order by day;

 day | day_users | unique_users | recent_users 
-----+-----------+--------------+--------------
   1 | {u1,u3}   | {u1,u3}      | {u1,u3}
   2 | {u1,u2}   | {u1,u2,u3}   | {u1,u2,u3}
   3 | {u1}      | {u1,u2,u3}   | {u1,u2}
   4 | {u3}      | {u1,u2,u3}   | {u1,u3}
(4 rows)
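A minimal sketch of an alternative that needs no custom aggregate, assuming a table named my_table with a text username column and an integer day column (the question gives no CREATE TABLE, so these types are a guess). It also sidesteps a subtlety of rows between 1 preceding and current row: that frame counts rows, and the subquery produces one row per day that has activity, so if some day has no activity at all, the "previous row" is no longer day N-1. Filtering on day values directly avoids this, and changing the 1 to 30 gives the N-30 window from the question. (On PostgreSQL 11 and later, the window version can be framed by value instead, with range between 1 preceding and current row.)

```sql
-- Hypothetical table definition: the column types (text, integer) are assumptions.
create temp table my_table (
    username text    not null,
    day      integer not null
);

-- Sample data from the question.
insert into my_table (username, day) values
    ('u1', 1), ('u1', 2), ('u1', 3),
    ('u2', 2),
    ('u3', 1), ('u3', 4);

-- One output row per day that has activity; each array is built by a
-- correlated subquery over the whole table.
select
    d.day,
    array(
        select distinct t.username
        from my_table t
        where t.day <= d.day                      -- every day up to N
        order by 1
    ) as unique_users,
    array(
        select distinct t.username
        from my_table t
        where t.day between d.day - 1 and d.day   -- days N-1..N; use d.day - 30 for the real window
        order by 1
    ) as recent_users
from (select distinct day from my_table) d
order by d.day;
```

Each correlated subquery rescans my_table once per output day, so on a large table an index on (day, username) is likely worth having. Days with no activity get no row here; driving the outer query from generate_series over the day range instead of select distinct day would produce one for every day.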