I'm trying to do the following: let's say I want to split a table into two partitions based on a given condition:

SELECT
    userid,
    ARRAY_AGG(userid) OVER (
        PARTITION BY userid > 100
    ) arr,
    AVG(userid) OVER (
        PARTITION BY userid > 100
    ) avg
FROM users;

I'll get this:

 userid |                            arr                            |         avg          
--------+-----------------------------------------------------------+----------------------
     46 | {46,23,69,92}                                             |  57.5000000000000000
     23 | {46,23,69,92}                                             |  57.5000000000000000
     69 | {46,23,69,92}                                             |  57.5000000000000000
     92 | {46,23,69,92}                                             |  57.5000000000000000
    552 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 629.2142857142857143
    ... | ...                                                       | ...
    529 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 629.2142857142857143
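
For anyone who wants to reproduce the output above, here is a minimal setup sketch. It is an assumption: the table is reduced to a single `userid` column holding exactly the values visible in the arrays, whereas the real table presumably has more columns.

```sql
-- Hypothetical minimal table; only the column used in the example is modelled.
CREATE TABLE users (userid integer PRIMARY KEY);

-- The 18 values that appear in the two arrays shown above.
INSERT INTO users (userid) VALUES
    (46), (23), (69), (92),
    (552), (506), (575), (621), (644), (667), (690),
    (759), (713), (782), (828), (460), (483), (529);
```

Without an ORDER BY inside the window, the order of the array elements and of the result rows is not guaranteed, so it may differ from the listing above.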

All good, but what if, for the userids < 100, I instead wanted each userid to be grouped together with the ones > 100:

SELECT
    userid,
    CASE WHEN userid > 100
    THEN ARRAY_AGG(userid) OVER (
        PARTITION BY userid > 100
    )
    ELSE ARRAY_AGG(userid) OVER (
        PARTITION BY userid -- OR userid > 100
        -- PARTITION BY userid > 100 OR CURRENT_ROW
        -- PARTITION BY userid > 100 OR userid = LAG(userid, 0) OVER ()
    )
    END arr,
    CASE WHEN userid > 100
    THEN AVG(userid) OVER (
        PARTITION BY userid > 100
    )
    ELSE AVG(userid) OVER (
        PARTITION BY userid -- OR userid > 100
        -- PARTITION BY userid > 100 OR CURRENT_ROW
        -- PARTITION BY userid > 100 OR userid = LAG(userid, 0) OVER ()
    )
    END avg
FROM users;

All the commented-out code above shows the various things I've tried. The best I've managed is either just the userid on its own, without the ones > 100, or all userids:

 userid |                            arr                            |         avg          
--------+-----------------------------------------------------------+----------------------
     23 | {23}                                                      |  23.0000000000000000
     46 | {46}                                                      |  46.0000000000000000
     69 | {69}                                                      |  69.0000000000000000
     92 | {92}                                                      |  92.0000000000000000
    552 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 629.2142857142857143
    ... | ...                                                       | ...
    529 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 629.2142857142857143

Is there any way to do what I'm looking for? I'm also trying to avoid CTEs as much as possible, because the actual code has so much technical debt that it would take a pretty long time just to adapt it to a WITH.

To be clear, here is the expected result:

 userid |                             arr                              |         avg
--------+--------------------------------------------------------------+----------------------
     23 | {23,552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 588.8000000000000000
     46 | {46,552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 590.3333333333333333
     69 | {69,552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 591.8666666666666667
     92 | {92,552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 593.4000000000000000
    552 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529}    | 629.2142857142857143
    ... | ...                                                          | ...
    529 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529}    | 629.2142857142857143

For reference, something I've been looking at that may help in the future: nested window functions (but they aren't implemented at the moment, as of PostgreSQL 11).

EDIT: Last but not least, the condition is a placeholder! It may or may not be tied to userids; it is just used here for the sake of the example. It could just as well have been:

CUME_DIST() OVER (
    PARTITION BY x -- OR CURRENT_USERID
)

1 Answer

This answers the original version of the question.

You seem to want:

select userid,
       (case when userid < 100
             then array_cat(array[userid],
                            array_agg(userid) filter (where userid > 100) over ())
             else array_agg(userid) filter (where userid > 100) over ()
        end) as arr
from users;
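
The same FILTER idea can probably be extended to the AVG column from the question. The following is only a sketch, assuming the `users` table and the `userid > 100` placeholder condition from the question: for rows outside the shared group, the current row's value is added to the group's sum and count by hand.

```sql
SELECT
    userid,
    CASE WHEN userid > 100
         -- Rows inside the shared group: plain average over that group.
         THEN AVG(userid) FILTER (WHERE userid > 100) OVER ()
         -- Other rows: (group sum + own value) / (group count + 1).
         -- Yields NULL if no row satisfies userid > 100.
         ELSE (SUM(userid) FILTER (WHERE userid > 100) OVER () + userid)::numeric
              / (COUNT(*) FILTER (WHERE userid > 100) OVER () + 1)
    END AS avg
FROM users;
```

This only works for aggregates that can be recomposed from parts like this. In PostgreSQL, FILTER is accepted only for aggregate functions used as window functions, so it does not carry over to pure window functions such as CUME_DIST.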

3 Comments

While I did use array_agg for the example, it was just a convenient way to show what's going on. What I'm actually looking to do is CUME_DIST() OVER (PARTITION BY condition OR userid ORDER BY sortorder)
@BusyBeingDelicious . . . It is only possible to answer the question that you actually ask. If you have a different question, then ask it as a new question, with appropriate explanation, sample data, and desired results.
It's actually exactly the same, just without the array manipulation. I used an array because it's easier to see what's going on than with avg or cume_dist; the title is also descriptive enough. I'll update the question to include avg too.
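
For the CUME_DIST case raised in these comments, one possible workaround that avoids both a CTE and nested window functions is a LATERAL subquery. This is a sketch under assumptions: it reuses the `users` table and the `userid > 100` placeholder from the question, assumes `userid` is unique, and orders by `userid` because the `sortorder` column is not shown.

```sql
SELECT u.userid, w.cd
FROM users AS u
CROSS JOIN LATERAL (
    -- Rebuild, for each outer row, the set "shared group + current row"
    -- and evaluate the window function over that set.
    SELECT v.userid,
           CUME_DIST() OVER (ORDER BY v.userid) AS cd
    FROM users AS v
    WHERE v.userid > 100           -- the shared group (placeholder condition)
       OR v.userid = u.userid      -- plus the current outer row
) AS w
WHERE w.userid = u.userid;         -- keep only the current row's own value
```

The subquery is re-evaluated for every outer row, so the cost grows roughly quadratically with the table size. That may be acceptable for small tables but is worth checking with EXPLAIN ANALYZE on real data.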
