How to include current row in PARTITION BY of Postgresql's window function

Question

I'm trying to do the following. Let's say I want to split a table into two partitions based on a condition:

SELECT
    userid,
    ARRAY_AGG(userid) OVER (
        PARTITION BY userid > 100
    ) arr,
    AVG(userid) OVER (
        PARTITION BY userid > 100
    ) avg
FROM users;

I'll get this:

 userid |                            arr                            |         avg          
--------+-----------------------------------------------------------+----------------------
     46 | {46,23,69,92}                                             |  57.5000000000000000
     23 | {46,23,69,92}                                             |  57.5000000000000000
     69 | {46,23,69,92}                                             |  57.5000000000000000
     92 | {46,23,69,92}                                             |  57.5000000000000000
    552 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 629.2142857142857143
    ... | ...                                                       | ...
    529 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 629.2142857142857143

All good, but what if instead, for each userid < 100, I wanted to include that userid together with the ones > 100:

SELECT
    userid,
    CASE WHEN userid > 100
    THEN ARRAY_AGG(userid) OVER (
        PARTITION BY userid > 100
    )
    ELSE ARRAY_AGG(userid) OVER (
        PARTITION BY userid -- OR userid > 100
        -- PARTITION BY userid > 100 OR CURRENT_ROW
        -- PARTITION BY userid > 100 OR userid = LAG(userid, 0) OVER ()
    )
    END arr,
    CASE WHEN userid > 100
    THEN AVG(userid) OVER (
        PARTITION BY userid > 100
    )
    ELSE AVG(userid) OVER (
        PARTITION BY userid -- OR userid > 100
        -- PARTITION BY userid > 100 OR CURRENT_ROW
        -- PARTITION BY userid > 100 OR userid = LAG(userid, 0) OVER ()
    )
    END avg
FROM users;

The commented-out lines above are the various attempts I've made. The best I've got is either just the userid without the ones > 100, or all userids:

 userid |                            arr                            |         avg          
--------+-----------------------------------------------------------+----------------------
     23 | {23}                                                      |  23.0000000000000000
     46 | {46}                                                      |  46.0000000000000000
     69 | {69}                                                      |  69.0000000000000000
     92 | {92}                                                      |  92.0000000000000000
    552 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 629.2142857142857143
    ... | ...                                                       | ...
    529 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 629.2142857142857143

Is there any way to do what I'm looking for? I'm also trying to avoid CTEs as much as possible, because the actual code has so much technical debt that it would take a long time just to adapt it to use a WITH.

To be clear, here is the expected result:

 userid |                             arr                              |         avg
--------+--------------------------------------------------------------+----------------------
     23 | {23,552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 588.8000000000000000
     46 | {46,552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 590.3333333333333333
     69 | {69,552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 591.8666666666666667
     92 | {92,552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 593.4000000000000000
    552 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529}    | 629.2142857142857143
    ... | ...                                                          | ...
    529 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529}    | 629.2142857142857143

For reference, here is a potential future feature I've been looking at: nested window functions (not implemented as of PostgreSQL 11).

EDIT: Last but not least, the condition is a placeholder! It may or may not be tied to userids; it is only used here for the sake of the example. It could just as well have been:

CUME_DIST() OVER (
    PARTITION BY x -- OR CURRENT_USERID
)

Gordon Linoff · Accepted Answer · 2019-10-11 12:09:59Z

This answers the original version of the question.

You seem to want:

select userid,
       (case when userid < 100
             -- prepend the current userid to the array of all userids > 100
             then array_cat(array[userid],
                            array_agg(userid) filter (where userid > 100) over ())
             else array_agg(userid) filter (where userid > 100) over ()
        end) as arr
from users;
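
The same FILTER ... OVER () idea can be carried over to the average the question also asks about. The following is only a sketch, assuming userid is an integer column and that at least one row has userid > 100 (otherwise the filtered SUM is NULL and so is the result). The cast to numeric is there to avoid integer division.

```sql
SELECT userid,
       CASE WHEN userid < 100
            -- average of (all userids > 100) plus the current row's userid
            THEN (SUM(userid) FILTER (WHERE userid > 100) OVER () + userid)::numeric
                 / (COUNT(*) FILTER (WHERE userid > 100) OVER () + 1)
            -- rows that are already in the "> 100" group keep the plain average
            ELSE AVG(userid) FILTER (WHERE userid > 100) OVER ()
       END AS avg
FROM users;
```

This only works because AVG can be rebuilt from SUM and COUNT; it does not generalize to window functions such as CUME_DIST.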

edited Oct 11, 2019 at 12:09

answered Oct 10, 2019 at 1:58


3 Comments

BusyBeingDelicious Over a year ago

While I did use array_agg for the example, it was just a convenient way to show what is going on. What I'm looking to do is CUME_DIST() OVER (PARTITION BY condition OR userid ORDER BY sortorder)

Gordon Linoff Over a year ago

@BusyBeingDelicious . . . It is only possible to answer the question that you actually ask. If you have a different question, then ask it as a new question, with appropriate explanation, sample data, and desired results.

BusyBeingDelicious Over a year ago

It's actually exactly the same, just not array manipulation. I used an array because it's easier to see what's going on than with avg or cume_dist; also, the title is descriptive enough. I'll update the question to include avg too.
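
For the more general form discussed in these comments ("rows matching the condition, or the current row"), one possible approach that avoids a CTE is a LATERAL subquery, available in PostgreSQL 9.3 and later. This is a sketch only, not something proposed in the thread: it assumes userid is unique in users, uses userid > 100 as the placeholder condition, and rescans the table once per outer row, so it may be slow on large tables.

```sql
SELECT u.userid, s.arr, s.avg
FROM users u
CROSS JOIN LATERAL (
    -- for each outer row, aggregate over the rows that satisfy the
    -- condition plus the outer row itself
    SELECT ARRAY_AGG(u2.userid) AS arr,
           AVG(u2.userid)       AS avg
    FROM users u2
    WHERE u2.userid > 100           -- placeholder condition
       OR u2.userid = u.userid      -- "current row"
) s;
```

The element order inside arr is not guaranteed without an ORDER BY inside ARRAY_AGG. A window function such as CUME_DIST could presumably be handled the same way, by computing it in an inner query over the same WHERE clause and then keeping only the row whose userid matches the outer one.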
