0 votes
0 answers
11 views

I've been looking at better ways to extract Salesforce data for our organization and found the announcement that AWS Glue Zero-ETL now uses the Salesforce Bulk API, and the performance results sound ...
user3165854's user avatar
  • 1,637
Advice
0 votes
0 replies
19 views

Team, we are implementing a new requirement to integrate Data Quality (DQ) rules within AWS Glue Studio. We have successfully created DQ rules using the DQDL builder, leveraging built-in rulesets, and ...
Prainika's user avatar
3 votes
0 answers
87 views

I have a Datadog dashboard displaying the metrics we get for our AWS Glue Zero-ETL integrations. One of those is lastSyncTimestamp, the epoch timestamp up to which the source has been synced to the target. I ...
dan's user avatar
  • 31
0 votes
0 answers
50 views

I version my Python script for each change and push it to S3 under the new version: aws s3 cp aws_glue_script_v1.0.3_1.py s3://mytestcicdglue/glue-scripts/aws_glue_script_v1.0.3_1.py I have a skeleton JSON of ...
aparnagottumukkala's user avatar
1 vote
0 answers
32 views

I'm running a data pipeline through Glue notebooks that references Athena saved queries and runs them sequentially. The pipeline is working well, but there is no version control for the Athena queries. ...
user1783504's user avatar
0 votes
0 answers
43 views

How to write an AWS IAM Policy document such that it does the following: { "Version": "2012-10-17", "Statement": [ { "Action": "ec2:...
Kojimba's user avatar
  • 125
0 votes
0 answers
24 views

I am trying to start an ETL job in the AWS Glue visual editor, which is fairly intuitive. However, for my very first step, I wanted to connect to my S3 bucket as my data source. So my first step was to ...
Mind Yours's user avatar
0 votes
0 answers
115 views

Code I’m running: connection_type="sapodata", connection_options={ "ENABLE_CDC": "false", "connectionName": "sapodata-connection"...
Lintang Gilang Pratama's user avatar
1 vote
1 answer
73 views

My DynamoDB table has both a PK and an SK, and it holds a huge data set (500 GB). I'm using the syntax below to query data by PK in Glue, but it does a full table scan, leading to a Glue timeout. Have ...
mariz's user avatar
  • 541
0 votes
0 answers
147 views

I am storing a simple datetime value (e.g. 2025-01-24 13:58:14.000) from SQL to an Iceberg table using the Glue catalog. I don't want anything with timezones. We only work in EST so all our datetimes don't ...
Chaitanya Kulkarni's user avatar
0 votes
1 answer
49 views

I am using EMR 6.15 and Hudi 0.14. I submitted the following Hudi job, which should create a database and a table in AWS Glue. The IAM role assigned to EMR Serverless has all necessary permissions for S3 and ...
Roobal Jindal's user avatar
0 votes
0 answers
48 views

I am new to AWS Glue, Apache Spark and all things big data. I have files being delivered to S3 with the following structure. s3://raw-data/dd-mm-yyyy/<source>/<product>/<reportType>/[...
jambit's user avatar
  • 313
1 vote
0 answers
43 views

We're trialling Datahub for the first time, and have used AWS Glue Data Catalog to connect to our Oracle database, and then connected Datahub to our Glue Data Catalog to pull the table/column metadata ...
Jon295087's user avatar
  • 751
0 votes
0 answers
56 views

Is there any way to return a value from a Glue ETL job to the Airflow task (XCom) that triggers that Glue job? Thanks
Kate's user avatar
  • 285
0 votes
0 answers
22 views

I'm using AWS Glue 4.0 to export data from AWS DocumentDB to Azure Blob Storage. The job is written in PySpark and uses the MongoDB Spark Connector. Below are the jars added to the Glue job: mongo-...
SONIA_29's user avatar
-1 votes
1 answer
214 views

I have a PySpark script using Glue 4.0 which reads Parquet and writes to Delta Lake. It works well. Here is my PySpark script: import logging import os import sys from awsglue.context import GlueContext ...
Hongbo Miao's user avatar
  • 50.7k
0 votes
1 answer
44 views

I have two dataframes as below:
DataFrame 1: df1
UniqueId  VendorId  Fname  LName  VendorAccNo
001       12        ABC    XYZ    8787888
002       13        XYZ    FFF    8787888
003       14        PQR    ZZZ    8787888
005       16        MMM    TTT    5432100
006       17        BBB    XXX    ...
Virendra Wadekar's user avatar
0 votes
1 answer
44 views

I have two years of IoT telemetry data in an S3 bucket (JSON format). I want to transform it with Glue, as described below, and write it to another S3 bucket in the data lake. The structure is: year, month, day, hour, ...
Ramanarao's user avatar
0 votes
1 answer
62 views

I’m writing to a Glue table that has country and state as partition columns. But if I read directly from the S3 bucket (the base of the Athena table), I don't see these partition columns ( country ...
Ashish Jangra's user avatar
0 votes
0 answers
81 views

In my AWS Glue job (Glue 4.0, which supports Spark 3.3), I am trying to optimize by using this: spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true") but it gives me a warning /...
Karim Baig's user avatar
2 votes
1 answer
653 views

I am trying to read incremental data between two snapshots. I have the last processed snapshot (my day0 load), and below is my code snippet to read incremental data: incremental_df = spark.read.format("...
Abhi5421's user avatar
1 vote
1 answer
91 views

I am reading about the use of AWS Glue for ETL. https://docs.aws.amazon.com/glue/latest/dg/what-is-glue.html In Data Discovery and cataloging, AWS talks about creating a Crawler for Data cataloging. ...
Kul's user avatar
  • 119
0 votes
0 answers
29 views

I am currently using boto3 list findings to return all findings for various aws accounts. I am getting the following error sporadically (Service: MandoFindings, Status Code: 400,) Pagination token ...
em456's user avatar
  • 441
0 votes
1 answer
124 views

I have a Glue table that is fed by partitioned data in S3. The issue is that in Athena, if partition projection is turned off and I run MSCK REPAIR TABLE <my table>; and SELECT * ...
Raisin's user avatar
  • 21
0 votes
0 answers
309 views

I have an AWS Glue 5.0 job where I am specifying --additional-python-modules s3://my-dev/other-dependencies/MyPackage-0.1.1-py3-none-any.whl in my job options. My Glue job itself is just a print("...
Martin's user avatar
  • 1,582
