
I've seen a lot of questions about this general error, but I don't get why I have it, maybe because of nested window functions...


With the query below, I get the error for Col_C, Col_D, ... and almost everything else I tried:

SQL compilation error: [eachColumn] is not a valid group by expression

SELECT
    Col_A,
    Col_B,
    FIRST_VALUE(Col_C) IGNORE NULLS OVER (PARTITION BY Col_A, Col_B
                                    ORDER BY Col_TimeStamp ASC 
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
    MAX(Col_D)                      OVER (PARTITION BY Col_A, Col_B
                                    ORDER BY Col_TimeStamp ASC
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
    FIRST_VALUE(CASE WHEN Col_T = 'testvalue'
                THEN LAST_VALUE(Col_E) IGNORE NULLS OVER (PARTITION BY Col_A, Col_B
                                                    ORDER BY Col_TimeStamp DESC 
                                                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
                ELSE NULL END) IGNORE NULLS 
                                    OVER (PARTITION BY Col_A, Col_B
                                    ORDER BY Col_TimeStamp ASC
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM mytable

So, is there a way to use nested window functions in Snowflake (with CASE WHEN ...), and if so, how? What am I doing wrong?

  • Select from select? Commented Apr 8, 2021 at 21:45
  • @astentx : yes, is that the issue ? Commented Apr 8, 2021 at 21:46
  • I've seen only one database (Exasol) that can reuse local calculations on the same level (my colleague told me that Teradata can also do this). In most DBMSes this requires rewriting the query step by step with a CTE (WITH clause) or a subquery in FROM, where the analytic functions do not use analytic functions inside, so at each new SELECT statement you use the result of the analytic functions of the previous level. Commented Apr 8, 2021 at 21:54
  • You should provide sample data, desired results, and a clear explanation of what you want to do. Nested window functions don't make sense to the compiler. They don't make sense to me either. Commented Apr 8, 2021 at 22:57
  • It makes perfect sense to ME (this is a small version of what I spend my days writing in Snowflake). The context would be murky if there was a GROUP BY, at which point the second layer of window functions needs the context to know "over what 'what'". But here the same number of rows is produced. AKA this is just perfectly lovely functional programming, which SF allows. Commented Apr 8, 2021 at 23:56

2 Answers


Deconstructing your logic shows that it's the second FIRST_VALUE that causes the problem:

WITH data(Col_A,Col_B,Col_c,col_d, Col_TimeStamp, col_t,col_e) AS (
    SELECT * FROM VALUES
        (1,1,1,1,1,'testvalue',10),
        (1,1,2,3,2,'value',11)
)
SELECT
    Col_A,
    Col_B,
    FIRST_VALUE(Col_C) IGNORE NULLS OVER (PARTITION BY Col_A, Col_B 
                                    ORDER BY Col_TimeStamp ASC 
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as first_c,
    MAX(Col_D)                      OVER (PARTITION BY Col_A, Col_B
                                    ORDER BY Col_TimeStamp ASC
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
    LAST_VALUE(Col_E) IGNORE NULLS OVER (PARTITION BY Col_A, Col_B
                                    ORDER BY Col_TimeStamp DESC 
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as last_e,   
    IFF(Col_T = 'testvalue', last_e, NULL) as if_test_last_e
    /*,FIRST_VALUE(if_test_last_e) IGNORE NULLS OVER (PARTITION BY Col_A, Col_B 
                                    ORDER BY Col_TimeStamp ASC 
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as the_problem*/
FROM data
ORDER BY Col_A,Col_B, col_timestamp
;

If we uncomment the_problem, we get the error. Compared to PostgreSQL (my background), just getting to reuse so many prior results/steps is a gift, so here I just bust out another SELECT layer:

WITH data(Col_A,Col_B,Col_c,col_d, Col_TimeStamp, col_t,col_e) AS (
    SELECT * FROM VALUES
        (1,1,1,1,1,'testvalue',10),
        (1,1,2,3,2,'value',11)
)
SELECT *,
    FIRST_VALUE(if_test_last_e) IGNORE NULLS OVER (PARTITION BY Col_A, Col_B 
                                    ORDER BY Col_TimeStamp ASC 
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as not_a_problem
FROM (
    SELECT
        Col_A,
        Col_B,
        FIRST_VALUE(Col_C) IGNORE NULLS OVER (PARTITION BY Col_A, Col_B 
                                        ORDER BY Col_TimeStamp ASC 
                                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as first_c,
        MAX(Col_D)                      OVER (PARTITION BY Col_A, Col_B
                                        ORDER BY Col_TimeStamp ASC
                                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
        LAST_VALUE(Col_E) IGNORE NULLS OVER (PARTITION BY Col_A, Col_B
                                        ORDER BY Col_TimeStamp DESC 
                                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as last_e,   
        IFF(Col_T = 'testvalue', last_e, NULL) as if_test_last_e
        ,Col_TimeStamp
    FROM data
)
ORDER BY Col_A,Col_B, Col_TimeStamp

And then it all works. The same thing happens if you LAG, then IFF/FIRST_VALUE the result, and then LAG that second result; a sketch of that variant is below.
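
Here is a minimal sketch of that LAG variant (the prev_e / if_prev_e / lag_of_lag aliases are just made up for illustration), reusing the same throw-away data CTE: the first LAG and the IFF live in the derived table, and the second LAG is applied one level up.

WITH data(Col_A, Col_B, Col_TimeStamp, Col_T, Col_E) AS (
    SELECT * FROM VALUES
        (1, 1, 1, 'testvalue', 10),
        (1, 1, 2, 'value',     11)
)
SELECT *,
    -- windowing over if_prev_e is fine here, because it comes from the derived table
    LAG(if_prev_e) OVER (PARTITION BY Col_A, Col_B
                         ORDER BY Col_TimeStamp) AS lag_of_lag
FROM (
    SELECT
        Col_A,
        Col_B,
        Col_TimeStamp,
        LAG(Col_E) OVER (PARTITION BY Col_A, Col_B
                         ORDER BY Col_TimeStamp) AS prev_e,
        -- a lateral alias reference to prev_e in a plain expression is fine
        IFF(Col_T = 'testvalue', prev_e, NULL)   AS if_prev_e
    FROM data
)
ORDER BY Col_A, Col_B, Col_TimeStamp;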


3 Comments

Thanks for your input, but I went with lateral references, as CTE seemed kind of clumsy and not so readable.
@R3uK the CTE is only there to provide the "test data"; it's not part of the actual solution. But I like to provide solutions that you can just copy into Snowflake and run as they are, so you can play with them to see how they work.
also the "lateral reference" is just the techincal name for what you code, my code and Lukasz code are all doing witch, is referring to an output column in the SELECT section while in the same SELECT section, which is not valid ANSI SQL, but insanely helpful to keep the code clean.

"I've seen a lot of questions about this general error, but I don't get why I have it, maybe because of nested window functions..."

Snowflake supports reusing expressions at the same level (sometimes called a "lateral column alias reference").

It is perfectly fine to write:

SELECT 1+1 AS col1,
       col1 *2 AS col2,
       CASE WHEN col1 > col2 THEN 'Y' ELSE 'NO' END AS col3
       ...

In standard SQL you either have to use multiple levels of query (CTE/derived table) or use a LATERAL join. Related: PostgreSQL: using a calculated column in the same query.
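
For example, a rough PostgreSQL-flavoured sketch of the LATERAL approach (tab, val and the x/y aliases are made-up names, just mirroring the col1/col2 example above):

SELECT t.val,
       x.col1,
       y.col2
FROM tab AS t
CROSS JOIN LATERAL (SELECT t.val + 1  AS col1) AS x
CROSS JOIN LATERAL (SELECT x.col1 * 2 AS col2) AS y;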


Unfortunately the same syntax will not work for analytic functions (and I am not aware of any RDBMS that supports it):

SELECT ROW_NUMBER() OVER(PARTITION BY ... ORDER BY ...) AS rn
      ,MAX(rn) OVER(PARTITION BY <different than prev>) AS m
FROM tab;

In the SQL:2016 standard there is an optional feature: T619, "Nested window functions".

Here is an article showing how a nested analytic function query could look: Nested window functions in SQL.

This means that the current way to nest windowed functions is to use a derived table/CTE:

WITH cte AS (
    SELECT ROW_NUMBER() OVER(PARTITION BY ... ORDER BY ...) AS rn
           ,*
    FROM tab
)
SELECT *, MAX(rn) OVER(PARTITION BY <different than prev>) AS m
FROM cte

3 Comments

And a CTE is just a sub-select, which is to say, at some point you have to be explicit about the data scope you are using to help the compiler know what you mean.
That's the way I went; as the CTE seemed clumsy and not so readable, I managed to do what I need with lateral references, and it was still pretty readable and efficient ;)
Teradata does support nested Window Functions (i.e. you can use the result of one window function inside of another window function in the same SELECT clause).
