3,630 questions
Advice
0
votes
1
reply
34
views
Query S3 from Athena without a table
Is there a way to query data from an S3 bucket through Athena without creating a table?
Something like this:
with
table1 as (select * from real_table),
table2 as (create table from s3 location)...
0
votes
0
answers
38
views
Query Timeout in Athena while querying partition key names
I have an Athena table in my AWS account, and I am using an S3 bucket as a database for which I have defined the schema in AWS Glue. I currently have Hive-style partitioning (manual partitioning) of ...
1
vote
0
answers
49
views
DynamoDB to Athena - dynamic schema detection for evolving "map" field
In AWS, I have a DynamoDB table with one of the fields as a map (my_map). This table is exposed to Athena, so data can be queried there (and eventually in Grafana).
The Athena table shows a schema ...
1
vote
4
answers
137
views
Get previous latest if required latest not available
I have some data like this:
Id   timestamp
100  2025-01-27 10:00:00
100  2025-01-26 10:00:00
100  2025-01-25 10:00:00
100  2024-04-20 10:00:00
100  2024-03-25 10:00:00
100  2023-05-05 10:00:00
100  2022-08-01 ...
0
votes
1
answer
94
views
AWS Athena cross-account and cross-region output location - "Unable to verify/create output bucket" error
Description:
I'm trying to configure AWS Athena to write query results to an S3 bucket that is in a different AWS account AND different region, but I'm getting the "Unable to verify/create output ...
0
votes
0
answers
107
views
How to avoid a full table scan with Iceberg "merge"
I am currently running into an issue with Athena's Iceberg "merge", where it ends up scanning the entire source and target tables.
For example, I have a source table and a target table.
...
0
votes
0
answers
76
views
Error fetching data catalog from AWS Athena through boto3 in python: botocore.errorfactory.InvalidRequestException
I'm trying to work with the AWS client API in Python using boto3. I've been trying get_data_catalog(), but it throws an error:
File ".../.venv/lib/python3.13/site-packages/botocore/client.py", ...
0
votes
0
answers
82
views
Query the size of files per year of an S3 bucket (since its creation) using Athena
I am looking to compute the total size of my files per year inside a given S3 bucket. I have been trying multiple methods:
I tried to use a boto3 script but I constantly ran into issues or ...
1
vote
1
answer
55
views
AWS Athena - Create Table As - default storage format
When creating a table in AWS Athena with a CTAS statement, I'm trying to understand how the table is stored in S3 and how I can convert the output to CSV.
create table1 as
select col1, col2 from ...
1
vote
1
answer
84
views
Duplicate Records in Parquet (Processed) Table after AWS Glue Job execution
We have an AWS Glue pipeline where:
A crawler populates a raw database table from partitioned JSON files in S3.
S3 structure:
raw/
├── org=21/
│   └── 221.json
└── org=23/
    └── 654.json
...
1
vote
0
answers
439
views
How can I repair or recreate an Iceberg table in Athena without losing data?
I am currently working with an Iceberg table in Athena. However, I am facing an issue where some of the data files have become corrupted or deleted due to incorrect lifecycle policies in S3. When I ...
1
vote
1
answer
390
views
How to query cross-account, cross-region S3 data in Athena via PrivateLink? [closed]
I'm setting up AWS Athena to query data stored in an S3 bucket that is:
in a different AWS account
in a different region
and accessible via a VPC endpoint (PrivateLink) to S3
Here's what I've done ...
0
votes
1
answer
124
views
AWS Athena is not processing any data from Glue table if partition projection is enabled
I have a Glue table that is fed by partitioned data in S3. The issue in Athena is that if partition projection is turned off, and I run MSCK REPAIR TABLE <my table>; and SELECT * ...
0
votes
1
answer
71
views
Tune Spark SQL due to skewed data
The SQL in question:
WITH
first_cte AS (
    SELECT
        s.two_id,
        s.d_month,
        COUNT(*) AS total,
        COUNT(DISTINCT one_id) AS unique_one_id_count
    FROM ...
1
vote
0
answers
24
views
Drop / hide fields from S3-based data lake consumable Athena table
I have a data lake implemented using AWS S3.
The bronze and silver layers are implemented, with data in the silver layer exposed for access via API and also via a JDBC/ODBC-based SQL client.
We have a ...
0
votes
1
answer
75
views
Can't Query AWS Athena Presto Table Because of Dash Character in Column name
I have a file in S3 with the following contents:
{"foo-bar": {"name":"Mercury","distanceFromSun":0.39,"orbitalPeriod":0.24,"dayLength":58.65}...
1
vote
2
answers
112
views
Amazon Athena - SQL Query to Return all rows for ID where one row meets a condition and does not meet a condition
I am trying to write a query to return ALL rows for an ID where a condition is met and a condition is not met for each ID on the Order table.
The conditions I want are to return all rows where the ID'...
1
vote
1
answer
288
views
Is it possible to create Iceberg tables through Athena using Flyway
We are planning to use Iceberg tables instead of PostgreSQL. For PostgreSQL, we were using Flyway for database migrations. So, I wonder if it is possible to do it for Iceberg tables, as well. ChatGPT ...
0
votes
2
answers
86
views
Remove duplicate record using Unnest | Aws Athena
I am facing an issue while filtering data with an array.
I have columns userid,event_name,attributes,ti
The attributes column has values like this:
{"bool_sample":true,"array_int":[10,20,25,38]...
0
votes
0
answers
50
views
Athena SQL Mismatched Input
I get the following error
Failed to start Athena query: An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 1:748: mismatched input '.'. Expecting: '='
...
0
votes
1
answer
200
views
AWS Redshift Spectrum not able to return data for external table where data type is timestamp
I'm trying to query data through Redshift Spectrum using an external schema from the Glue catalog but encountering an issue with a column that has a timestamp data type. When I run the query SELECT * ...
1
vote
0
answers
26
views
Dynamically load all subpartitions within a specific partition present in S3 with AWS Athena
CREATE EXTERNAL TABLE testpart
(id bigint,
eventday bigint,
eventhour bigint
PARTITIONED BY (eventday smallint,eventhour bigint)
ROW FORMAT SERDE'org.apache.hadoop.hive.ql.io.parquet.serde....
0
votes
2
answers
99
views
Amazon Athena: SQL Query for Column values with different character lengths
I have two tables, an "Enrolled Table" and a "Customer Table".
I only want to show all records on the Customer Table where the enrolled_no on the Customer table matches the ...
0
votes
0
answers
109
views
SQLAlchemy and Athena
I am trying to connect to an Athena database using the SQLAlchemy create_engine API. The datasource name is Athena-xxxx, the database name is amazon_security_lake_glue_db_ca_central_1 and the primary ...
2
votes
0
answers
206
views
ICEBERG_BAD_DATA with Firehose Iceberg table destination
We have been trying Firehose for Iceberg Tables. The source is a Kinesis stream attached to DynamoDB tables with some Lambda processing in between.
The table has been successfully filled by Firehose, but ...