I'm trying to do the following. Let's say I want to split a table into two partitions based on a given condition:
SELECT
userid,
ARRAY_AGG(userid) OVER (
PARTITION BY userid > 100
) arr,
AVG(userid) OVER (
PARTITION BY userid > 100
) avg
FROM users;
I'll get this:
userid | arr | avg
--------+-----------------------------------------------------------+----------------------
46 | {46,23,69,92} | 57.5000000000000000
23 | {46,23,69,92} | 57.5000000000000000
69 | {46,23,69,92} | 57.5000000000000000
92 | {46,23,69,92} | 57.5000000000000000
552 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 629.2142857142857143
... | ... | ...
529 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 629.2142857142857143
All good, but what if instead, for each userid <= 100, I wanted the array and average to cover that single userid together with all the ones > 100:
SELECT
userid,
CASE WHEN userid > 100
THEN ARRAY_AGG(userid) OVER (
PARTITION BY userid > 100
)
ELSE ARRAY_AGG(userid) OVER (
PARTITION BY userid -- OR userid > 100
-- PARTITION BY userid > 100 OR CURRENT_ROW
-- PARTITION BY userid > 100 OR userid = LAG(userid, 0) OVER ()
)
END arr,
CASE WHEN userid > 100
THEN AVG(userid) OVER (
PARTITION BY userid > 100
)
ELSE AVG(userid) OVER (
PARTITION BY userid -- OR userid > 100
-- PARTITION BY userid > 100 OR CURRENT_ROW
-- PARTITION BY userid > 100 OR userid = LAG(userid, 0) OVER ()
)
END avg
FROM users;
The commented-out lines above are the various attempts I've made. The best I've managed is either just the userid on its own, without the ones > 100, or all userids together:
userid | arr | avg
--------+-----------------------------------------------------------+----------------------
23 | {23} | 23.0000000000000000
46 | {46} | 46.0000000000000000
69 | {69} | 69.0000000000000000
92 | {92} | 92.0000000000000000
552 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 629.2142857142857143
... | ... | ...
529 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 629.2142857142857143
Is there any way to do what I'm looking for? I'm also trying to avoid a CTE as much as possible, because the actual code has so much technical debt that adapting it to use a WITH would take a very long time.
To be clear, here is the expected result:
userid | arr | avg
--------+--------------------------------------------------------------+----------------------
23 | {23,552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 588.8000000000000000
46 | {46,552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 590.3333333333333333
69 | {69,552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 591.8666666666666667
92 | {92,552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 593.4000000000000000
552 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 629.2142857142857143
... | ... | ...
529 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 629.2142857142857143
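For this specific example, a minimal sketch of one direction that might avoid a CTE: compute the "> 100" aggregates over the whole table with a FILTER clause, then combine them with the current row's own userid. This assumes userid is an integer column and PostgreSQL 9.4 or later (for FILTER on aggregates used as window functions). It only works because ARRAY_AGG and AVG can be rebuilt from the filtered array, sum and count, so it would not carry over to a non-aggregate window function.

```sql
-- Sketch only: assumes users(userid integer) and PostgreSQL >= 9.4.
SELECT
    userid,
    CASE WHEN userid > 100
         -- rows above the threshold: array of all userids > 100
         THEN ARRAY_AGG(userid) FILTER (WHERE userid > 100) OVER ()
         -- other rows: the current userid prepended to that same array
         ELSE array_prepend(
                  userid,
                  ARRAY_AGG(userid) FILTER (WHERE userid > 100) OVER ()
              )
    END AS arr,
    CASE WHEN userid > 100
         THEN AVG(userid) FILTER (WHERE userid > 100) OVER ()
         -- rebuild the average from the filtered sum and count plus the current row
         ELSE (userid + SUM(userid) FILTER (WHERE userid > 100) OVER ())::numeric
              / (1 + COUNT(*) FILTER (WHERE userid > 100) OVER ())
    END AS avg
FROM users;
```

Without an ORDER BY inside ARRAY_AGG the element order is not guaranteed, and if no row satisfies the condition the average for the remaining rows comes out as NULL.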
For future reference, here is what I've been looking at: nested window functions (not implemented as of PostgreSQL 11).
EDIT: Last but not least, the condition is a placeholder! It may or may not be tied to userids; it is only used here for the sake of the example. It could just as well have been
CUME_DIST() OVER (
PARTITION BY x -- OR CURRENT_USERID
)