Automatically add partition conditions to WHERE clause

Question

I have a columnar table that is partitioned by day and hour. It is stored on S3 in parquet files to be queried by Athena. Here is the CREATE TABLE:

CREATE EXTERNAL TABLE foo (
  -- other columns here
  dt timestamp,
  day string,
  hour string
)
PARTITIONED BY (day string, hour string)
STORED AS parquet
LOCATION 's3://foo/foo'

And the layout on S3 is like:

s3://foo/foo/day=2021-10-10/hh=00/*.parquet
s3://foo/foo/day=2021-10-10/hh=01/*.parquet
...etc
s3://foo/foo/day=2021-10-10/hh=23/*.parquet

So a query like the following will be fast because it only scans over one hour of parquet files because the partition columns are being used to filter it:

-- fast, easy to write
SELECT * FROM foo WHERE day = '2021-10-10' AND hour = '00'

However, the table also includes the full datetime dt. Usually we want to write queries for ranges that don't align to a day/hour boundary, and/or are in a different timezone.

For example, this will scan ALL parquet files and be very slow:

-- slow, easy to write
SELECT * FROM foo WHERE dt > '2021-10-09 23:05:00' AND dt < '2021-10-11 01:00:00'

It can be improved by manually calculating the day and hour that minimally enclose the time period:

-- fast, painful to write
SELECT * FROM foo
WHERE
  ((day, hh) IN (('2021-10-09', '23'), ('2021-10-11', '00')) OR day = '2021-10-10')
AND
  dt > '2021-10-09 23:05:00' AND dt < '2021-10-11 01:00:00'

Ideally this extra condition could be added transparently by the database so as to avoid having to manually add the ((day,hh) IN (...)).

Is this possible somehow with Athena?

I can't visualize the table structure and its column. Can you add a few rows of sample data? — The Impaler
– The Impaler, Commented Oct 23, 2021 at 14:04
I think you may find better answers to this question at dba.stackexchange.com — The Impaler
– The Impaler, Commented Oct 24, 2021 at 1:35

Theo · Accepted Answer · 2021-10-25 08:40:11Z

0

I've wished for this feature many times, but unfortunately Athena does not support it. You have to include both the predicate for the dt column and the day and hour partition keys.

answered Oct 25, 2021 at 8:40

Theo

133k22 gold badges170 silver badges213 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

ajajaja1 Over a year ago

I don't have enough reputation to upvote, but thanks!

Collectives™ on Stack Overflow

Automatically add partition conditions to WHERE clause

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related