I'm trying to print out a list of all activity for a random group of 500 users who started using my app after Jan 1st.

with random_users as (select distinct id, min(timestamp) as first_event
from log
group by id 
having first_event >= '2019-01-01'
order by random() 
limit 500)

select random_users.id, log.timestamp, log.event
from random_users left join log on log.id = random_users.id

Getting a random selection of users is easily done using PostgreSQL's random(), but when I try to combine this with the condition of having first_event >= '2019-01-01' I'm getting some problems. Namely, timestamp is actually showing as prior to 2019-01-01 for many users in the final results, something like this:

id    timestamp   event
5     2018-11-12  click
2     2018-12-27  purchase
7     2019-01-03  click

I am wondering if this has something to do with how the random() function works, as similar queries without it give the expected results. How can I restrict the random() selection to users who first used the app on or after 2019-01-01?

1 Answer

Re-thinking this now that I fully understand what you're after. PostgreSQL has DISTINCT ON which you can use to select the first row matching certain conditions:

with user_first_events as (SELECT DISTINCT ON (id) id, timestamp, event
FROM log
WHERE timestamp >= '2019-01-01'
ORDER BY id, timestamp ASC)

SELECT * FROM user_first_events ORDER BY random() LIMIT 500
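
Note that the WHERE clause above keeps each user's first event on or after 2019-01-01, which can still include users whose very first event is older. If the goal is 500 random users whose first event ever falls on or after that date, and then all of their activity, something along these lines may be closer. This is only a sketch: it assumes the log(id, timestamp, event) table from the question, and the aliases u and l are just illustrative names.

```sql
-- Sketch only: assumes log(id, timestamp, event) as described in the question.
WITH random_users AS (
    SELECT id
    FROM log
    GROUP BY id
    -- Repeat the aggregate here: PostgreSQL does not let HAVING refer to an
    -- output-column alias such as first_event.
    HAVING min(timestamp) >= '2019-01-01'
    ORDER BY random()
    LIMIT 500
)
-- Pull every logged event for the sampled users.
SELECT u.id, l.timestamp, l.event
FROM random_users AS u
JOIN log AS l ON l.id = u.id
ORDER BY u.id, l.timestamp;
```

Because the date filter is applied per user in HAVING before the random sample is taken, every joined row should belong to a user with no activity before 2019-01-01. The final ORDER BY is optional and only there to make the output easier to read.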

5 Comments

That gets 500 random users and then filters out those that didn't have their first event after January 1st, rather than getting 500 random users that did, as the OP wants.
@WouterVerhelst not exactly (since we don't really know what log represents), but I see your point. I've removed the unnecessary scope from the join
yes, and now you've ended up with the OP's original query (apart from the with clause)...
Thanks both for your help! As the IDs in random_users are selected from a group that has a min(timestamp) on or after Jan 1st, it's still weird that the timestamps pulled from log go back before this :-(
that looks much more interesting :-) upvoted rather than downvoted now
