I'm trying to print out a list of all activity for a random group of 500 users who starting using my app after Jan 1st.
with random_users as (select distinct id, min(timestamp) as first_event
from log
group by id
having first_event >= '2019-01-01'
order by random()
limit 500)
select random_users.id, log.timestamp, log.event
from random_users left join log on log.id = random_users.id
Getting a random selection of users is easily done using PostgreSQL's random(), but when I try to combine this with the condition of having first_event >= '2019-01-01' I'm getting some problems. Namely, timestamp is actually showing as prior to 2019-01-01 for many users in the final results, something like this:
id timestamp event
5 2018-11-12 click
2 2018-12-27 purchase
7 2019-01-03 click
I am wondering if this is something to do with how the random() function works, as similar queries without this give expected results How can I successfully limit the random() function to groups of users who've used the app after 2019-01-01?