Full table scan in nested query in Snowflake

Question

After spending some time on a task and reviewing the Snowflake documentation, I noticed a potential improvement to a query, both for readability and possibly for performance. The query uses a correlated subquery to check whether there are any updates for the main table in a separate table of changes. Neither table has an explicit primary key or any other constraints on allowed values. Here is a simplified example of the query:

SELECT a.*
  FROM tableA a
 WHERE EXISTS (
               SELECT 1
                 FROM tableA_CDC a_cdc
                WHERE a.column1 = a_cdc.column1
                  AND a.column2 = a_cdc.column2
                  AND (a.column3 = a_cdc.column3 OR (a.column3 IS NULL AND a_cdc.column3 IS NULL))
              )

I was interested in the last predicate, (a.column3 = a_cdc.column3 OR (a.column3 IS NULL AND a_cdc.column3 IS NULL)). The value of column3 can be NULL, and we still want to fetch those rows from the main table when both sides are NULL. column1 and column2 cannot be NULL, so no null handling is needed for them.

The problem turned out to be not only readability but also performance. If I compare only with '=' or only check that both columns are NULL, everything works fine according to the query profile, and the sum of the row counts from the two predicates gives the correct result. But when the two conditions are combined with OR, the change count is still correct, yet the query profile shows that a full table scan was performed.
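For anyone who wants to compare the variants without opening the query profile UI, here is a rough sketch of reading the same pruning numbers in SQL. It assumes the query under test was the last statement run in the current session, and that the scan operators expose the pruning fields under the names shown; treat those field names as an assumption to check against your account's output.

```sql
-- Run the query under test first, then run this in the same session.
-- Compares partitions scanned against partitions total for each table scan.
SELECT operator_id,
       operator_type,
       operator_statistics:pruning:partitions_scanned AS partitions_scanned,
       operator_statistics:pruning:partitions_total   AS partitions_total
  FROM TABLE(GET_QUERY_OPERATOR_STATS(LAST_QUERY_ID()))
 WHERE operator_type = 'TableScan'
 ORDER BY operator_id;
```

If partitions_scanned equals partitions_total for a table scan, that scan read every micro-partition, which is what the profile reports as a full scan.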

In the documentation I found a function called EQUAL_NULL, which compares two expressions in a null-safe way. If I replace the last grouped predicate with EQUAL_NULL, the result is correct and there is no full table scan:

SELECT a.*
  FROM tableA a
 WHERE EXISTS (
               SELECT 1
                 FROM tableA_CDC a_cdc
                WHERE a.column1 = a_cdc.column1
                  AND a.column2 = a_cdc.column2
                  AND EQUAL_NULL(a.column3, a_cdc.column3)
              )
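As a quick sanity check on the null semantics involved, the following sketch compares plain equality with the null-safe forms on literal values. The casts and column aliases are only for illustration; IS NOT DISTINCT FROM is shown as the standard-SQL spelling of the same null-safe comparison.

```sql
-- Plain '=' yields NULL when either side is NULL, so a WHERE clause drops the row.
-- The null-safe forms treat two NULLs as equal and never return NULL.
SELECT CAST(NULL AS NUMBER) = CAST(NULL AS NUMBER)                    AS plain_equals,  -- NULL
       EQUAL_NULL(CAST(NULL AS NUMBER), CAST(NULL AS NUMBER))         AS equal_null,    -- TRUE
       CAST(NULL AS NUMBER) IS NOT DISTINCT FROM CAST(NULL AS NUMBER) AS not_distinct,  -- TRUE
       EQUAL_NULL(1, CAST(NULL AS NUMBER))                            AS one_vs_null;   -- FALSE
```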

Any ideas why there is a full table scan in the first case?

The OR can break filtering optimization.

– Simeon Pilgrim, Commented Oct 15, 2024 at 1:26

Bryan Crystal-Thurston · Accepted Answer · 2024-10-16 17:25:21Z

2

I had this problem too. It's hard to know exactly what's going on with your query without some information from the query profile. However, when I ran into this, the OR in my WHERE clause was acting almost like a Cartesian join. Because I was comparing columns from two different tables to each other (much like a join), the query appeared to compare each value from one table against all the values in the other (similar to the issues experienced in a query like the one in this post). I solved this by splitting the OR into two queries and unioning them together, like this:

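-- Fragment: both branches are correlated to the outer tableA alias "a".
-- Branch 1: column3 matches on a non-NULL value.
SELECT 1
  FROM tableA_CDC a_cdc
 WHERE a.column1 = a_cdc.column1
   AND a.column2 = a_cdc.column2
   AND a.column3 = a_cdc.column3

UNION ALL

-- Branch 2: column3 is NULL on both sides.
SELECT 1
  FROM tableA_CDC a_cdc
 WHERE a.column1 = a_cdc.column1
   AND a.column2 = a_cdc.column2
   AND a.column3 IS NULL
   AND a_cdc.column3 IS NULL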

This drastically improved my own query, and it mirrors the answer from the link I've shared above (plus a lot of others out there).
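Applied to the query from the question, one possible reading of this approach is to split the outer query rather than the subquery, so each branch carries a single, simple correlation predicate. This is a sketch of that interpretation, not necessarily the exact query I ran; it relies on the two branches being mutually exclusive (a.column3 is either non-NULL or NULL), so UNION ALL does not return any row of tableA twice.

```sql
-- Branch 1: rows whose column3 matches a CDC row on a non-NULL value.
SELECT a.*
  FROM tableA a
 WHERE EXISTS (
               SELECT 1
                 FROM tableA_CDC a_cdc
                WHERE a.column1 = a_cdc.column1
                  AND a.column2 = a_cdc.column2
                  AND a.column3 = a_cdc.column3
              )

UNION ALL

-- Branch 2: rows whose column3 is NULL and that match a CDC row with NULL column3.
SELECT a.*
  FROM tableA a
 WHERE a.column3 IS NULL
   AND EXISTS (
               SELECT 1
                 FROM tableA_CDC a_cdc
                WHERE a.column1 = a_cdc.column1
                  AND a.column2 = a_cdc.column2
                  AND a_cdc.column3 IS NULL
              )
```

Whether this prunes better than the EQUAL_NULL version depends on the data and clustering, so it is worth comparing both in the query profile.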

Also, just thinking about the function of this table, I always think of streams when I think of checking for table updates. Not sure how familiar you are with them, but depending on what you're going to do with these updates, I highly recommend streams and tasks.
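To give a rough idea of what that looks like, here is a minimal sketch. The names tableA_stream, process_tableA_changes, my_wh and tableA_changes_log are hypothetical, the schedule is arbitrary, and the target table is assumed to have five columns matching the SELECT list.

```sql
-- Track row-level changes on the main table (stream name is hypothetical).
CREATE OR REPLACE STREAM tableA_stream ON TABLE tableA;

-- Periodically consume the stream, but only when it actually has data.
CREATE OR REPLACE TASK process_tableA_changes
  WAREHOUSE = my_wh            -- hypothetical warehouse
  SCHEDULE  = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('TABLEA_STREAM')
AS
  INSERT INTO tableA_changes_log   -- hypothetical target table
  SELECT column1, column2, column3, METADATA$ACTION, METADATA$ISUPDATE
    FROM tableA_stream;

-- Tasks are created suspended, so resume it to start the schedule.
ALTER TASK process_tableA_changes RESUME;
```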


1 Comment

Arthur Shtypuliak Over a year ago

I've found an article on the Snowflake community site that is closely related to this problem. As I understand it, because of the OR predicate the Snowflake query planner decides to check all partitions and therefore performs a full scan. The article explains the cause in a bit more detail, and yes, your solution and the article's are similar; it seems enough people have faced the same problem. Thanks for the response.
