Newest 'aws-glue' Questions

0 votes

0 answers

11 views

Can I output Salesforce object data as csv to S3 bucket using AWS Glue zero ETL?

I've been looking at better ways to extract Salesforce data for our organization and found the announcement that AWS Glue zero-ETL now uses the Salesforce Bulk API, and the performance results sound ...

user3165854

1,637

asked yesterday

Advice

0 votes

0 replies

19 views

Applying a Single AWS Glue Data Quality Ruleset to Multiple Glue Jobs with Dynamic Column Input

Team, we are implementing a new requirement to integrate Data Quality (DQ) rules within AWS Glue Studio. We have successfully created DQ rules using the DQDL builder, leveraging built-in rulesets, and ...

Prainika

15

asked Nov 13 at 3:35

3 votes

0 answers

87 views

How to convert epoch to datetime in Datadog dashboard?

I have a Datadog dashboard displaying the metrics we get for our AWS Glue Zero-ETL integrations. One of those is lastSyncTimestamp, the epoch timestamp until which source has been synced to target. I ...

dan

31

asked Nov 1 at 19:16

0 votes

0 answers

50 views

Is it possible to update script section for AWS Glue ETL or Glue streaming Jobs using AWS CLI?

Version my Python script for each change and push it to S3 with the new version: aws s3 cp aws_glue_script_v1.0.3_1.py s3://mytestcicdglue/glue-scripts/aws_glue_script_v1.0.3_1.py I have a skeleton JSON of ...

aparnagottumukkala

3

asked Oct 15 at 8:51

1 vote

0 answers

32 views

Version control Athena queries

I'm running a data pipeline through Glue notebooks that references Athena saved queries and runs them sequentially. The pipeline is working well, but there is no version control for the Athena queries. ...

user1783504

475

asked Aug 29 at 20:12

0 votes

0 answers

43 views

AWS Glue and IAM conditional access

How to write an AWS IAM Policy document such that it does the following: { "Version": "2012-10-17", "Statement": [ { "Action": "ec2:...

Kojimba

125

asked Aug 26 at 11:24

0 votes

0 answers

24 views

Unable to connect my S3 bucket to my data source for my AWS glue ETL job

I am trying to start an ETL job in the AWS Glue visual editor, which is fairly intuitive. However, in my very first step, I wanted to connect to my S3 bucket as my data source. So my first step was to ...

Mind Yours

1

asked Aug 23 at 14:07

0 votes

0 answers

115 views

How to configure AWS Glue to trust custom SSL certificate for SAP OData connection?

Code I’m running: connection_type="sapodata", connection_options={ "ENABLE_CDC": "false", "connectionName": "sapodata-connection...

Lintang Gilang Pratama

97

asked Aug 22 at 8:11

1 vote

1 answer

73 views

How to avoid full table scan in Glue's create_dynamic_frame.from_options for dynamodb

My DynamoDB table has both a PK and an SK, and it holds a huge data set (500 GB). I'm using the syntax below to query data based on the PK in Glue, but it does a full table scan, leading to the Glue timeout. Have ...

mariz

541

asked Aug 11 at 17:12

0 votes

0 answers

147 views

Athena is appending UTC to an iceberg timestamp results, why? how to fix it?

I am storing a simple datetime value (e.g. 2025-01-24 13:58:14.000) from SQL to an Iceberg table using the Glue catalog. I don't want anything with timezones. We only work in EST so all our datetimes don't ...

Chaitanya Kulkarni

23

asked Aug 4 at 11:28

0 votes

1 answer

49 views

Unable to register database/table in aws glue when hudi job is submitted from emrserverless

I am using EMR 6.15 and Hudi 0.14. I submitted the following Hudi job, which should create a database and a table in AWS Glue. The IAM role assigned to EMR Serverless has all necessary permissions for S3 and ...

Roobal Jindal

294

asked Jul 9 at 7:00

0 votes

0 answers

48 views

Using AWS Glue, how can I process different file types in a folder to their own Glue table

I am new to AWS Glue, Apache Spark and all things big data. I have files being delivered to S3 with the following structure. s3://raw-data/dd-mm-yyyy/<source>/<product>/<reportType>/[...

jambit

313

asked Jun 28 at 4:48

1 vote

0 answers

43 views

Manually add Oracle Procedures as a 'data job' nodes in DataHub lineage models

We're trialling DataHub for the first time, and have used AWS Glue Data Catalog to connect to our Oracle database, and then connected DataHub to our Glue Data Catalog to pull the table/column metadata ...

Jon295087

751

asked Jun 18 at 12:14

0 votes

0 answers

56 views

Return value from Glue job to Xcom

Is there any way to return a value from a Glue ETL job to the Airflow task (XCom) that triggers that Glue job? Thanks

Kate

285

asked May 18 at 16:13

0 votes

0 answers

22 views

Glue Job Fails When Exporting from DocumentDB to Azure Blob Storage Due to Mongo Spark Connector Schema Inference

I'm using AWS Glue 4.0 to export data from AWS DocumentDB to Azure Blob Storage. The job is written in PySpark and uses the MongoDB Spark Connector. Below are the jars added to the Glue job: mongo-...

SONIA_29

58

asked May 16 at 9:03

-1 votes

1 answer

214 views

AnalysisException: This Delta operation requires the SparkSession to be configured [closed]

I have a PySpark script using Glue 4.0 which reads Parquet and writes Delta Lake. It works well. Here is my PySpark script: import logging import os import sys from awsglue.context import GlueContext ...

Hongbo Miao

50.7k

asked May 13 at 22:06

0 votes

1 answer

44 views

duplicate removal from grouped and merged data frame fails generating duplicates in final JSON

I have two dataframes as below:

DataFrame 1: df1

UniqueId  VendorId  Fname  LName  VendorAccNo
001       12        ABC    XYZ    8787888
002       13        XYZ    FFF    8787888
003       14        PQR    ZZZ    8787888
005       16        MMM    TTT    5432100
006       17        BBB    XXX    ...

Virendra Wadekar

35

asked May 11 at 4:57

0 votes

1 answer

44 views

Data transformation in AWS

I have two years of IoT telemetry data in an S3 bucket (JSON format). I want to transform it with Glue, as described below, into another S3 bucket in the data lake. The structure is: year, month, day, hour, ...

Ramanarao

1

asked May 3 at 4:42

0 votes

1 answer

62 views

Write partitioned col in s3 file too

I’m writing to a Glue table that has country and state as partition columns. But if I read directly from the S3 bucket (the base of the Athena table), I’m not seeing these partition columns (country ...

Ashish Jangra

35

asked Apr 25 at 11:41

0 votes

0 answers

81 views

Unable to use pyarrow optimization in AWS Glue

In AWS Glue 4.0 (which supports Spark 3.3), I am trying to optimize by using this: spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true") but it gives me a warning /...

Karim Baig

1

asked Apr 23 at 22:32

2 votes

1 answer

653 views

Read incremental data from iceberg tables using Spark SQL

I am trying to read incremental data between two snapshots. I have the last processed snapshot (my day0 load), and below is my code snippet to read incremental data: incremental_df = spark.read.format("...

Abhi5421

33

asked Apr 16 at 8:26

1 vote

1 answer

91 views

Is Data catalog and Crawler mandatory for Glue

I am reading about the use of AWS Glue for ETL. https://docs.aws.amazon.com/glue/latest/dg/what-is-glue.html In Data Discovery and cataloging, AWS talks about creating a Crawler for Data cataloging. ...

Kul

119

asked Apr 14 at 3:42

0 votes

0 answers

29 views

aws list_findings parameters changed in request

I am currently using boto3 list findings to return all findings for various aws accounts. I am getting the following error sporadically (Service: MandoFindings, Status Code: 400,) Pagination token ...

em456

441

asked Apr 10 at 9:12

0 votes

1 answer

124 views

AWS Athena is not processing any data from glue table if partition projection is enabled

I have a glue table that is fed by partitioned data in s3. The issue at hand is in Athena that if the partition projection is turned off, and I run MSCK REPAIR TABLE <my table>; and SELECT * ...

Raisin

21

asked Apr 3 at 12:43

0 votes

0 answers

309 views

AWS Glue 5.0 "Installation of Python modules timed out after 10 minutes"

I have an AWS Glue 5.0 job where I am specifying --additional-python-modules s3://my-dev/other-dependencies/MyPackage-0.1.1-py3-none-any.whl in my job options. My Glue job itself is just a print(...

Martin

1,582

asked Mar 27 at 19:27
