1

I have a columnar table that is partitioned by day and hour. It is stored on S3 in parquet files to be queried by Athena. Here is the CREATE TABLE:

CREATE EXTERNAL TABLE foo (
  -- other columns here
  dt timestamp,
  day string,
  hour string
)
PARTITIONED BY (day string, hour string)
STORED AS parquet
LOCATION 's3://foo/foo'

And the layout on S3 is like:

s3://foo/foo/day=2021-10-10/hh=00/*.parquet
s3://foo/foo/day=2021-10-10/hh=01/*.parquet
...etc
s3://foo/foo/day=2021-10-10/hh=23/*.parquet

So a query like the following will be fast because it only scans over one hour of parquet files because the partition columns are being used to filter it:

-- fast, easy to write
SELECT * FROM foo WHERE day = '2021-10-10' AND hour = '00'

However, the table also includes the full datetime dt. Usually we want to write queries for ranges that don't align to a day/hour boundary, and/or are in a different timezone.

For example, this will scan ALL parquet files and be very slow:

-- slow, easy to write
SELECT * FROM foo WHERE dt > '2021-10-09 23:05:00' AND dt < '2021-10-11 01:00:00'

It can be improved by manually calculating the day and hour that minimally enclose the time period:

-- fast, painful to write
SELECT * FROM foo
WHERE
  ((day, hh) IN (('2021-10-09', '23'), ('2021-10-11', '00')) OR day = '2021-10-10')
AND
  dt > '2021-10-09 23:05:00' AND dt < '2021-10-11 01:00:00'

Ideally this extra condition could be added transparently by the database so as to avoid having to manually add the ((day,hh) IN (...)).

Is this possible somehow with Athena?

3
  • I can't visualize the table structure and its column. Can you add a few rows of sample data? Commented Oct 23, 2021 at 14:04
  • @TheImpaler I made an edit to hopefully clarify that! Commented Oct 23, 2021 at 14:25
  • I think you may find better answers to this question at dba.stackexchange.com Commented Oct 24, 2021 at 1:35

1 Answer 1

0

I've wished for this feature many times, but unfortunately Athena does not support it. You have to include both the predicate for the dt column and the day and hour partition keys.

Sign up to request clarification or add additional context in comments.

1 Comment

I don't have enough reputation to upvote, but thanks!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.