44 questions
0 votes, 1 answer, 49 views
Unable to register database/table in AWS Glue when Hudi job is submitted from EMR Serverless
I am using EMR 6.15 and Hudi 0.14.
I submitted the following Hudi job, which should create a database and a table in AWS Glue. The IAM role assigned to EMR Serverless has all necessary permissions for S3 and ...
0 votes, 0 answers, 42 views
Setting JAR in EMR workspace using EMR Serverless application
I have since switched from EMR on EC2 to EMR serverless. I used to use interactive notebooks with EMR on EC2.
I am trying to use the EMR Studio workspace (notebooks) with an EMR Serverless application ...
0 votes, 0 answers, 99 views
How to get the EMR Serverless Job URL with EMR Studio Information Missing from the Event?
I'm working with AWS EMR Serverless, and I need to construct a job URL for an EMR Serverless job to be sent in a message notification in case of state change. The desired URL includes the associated ...
1 vote, 0 answers, 70 views
Optimizing PySpark Feature Engineering with Over a Billion Rows on EMR
I’m working with a large transaction dataset (~1 billion rows) in PySpark on AWS EMR. My goal is to perform feature engineering where I compute statistics like sum, mean, standard deviation, and ...
0 votes, 1 answer, 206 views
Is it possible to submit a Docker image as a Spark job to EMR Serverless?
I have a Docker image that contains some application code that interacts with Spark.
Is it possible to submit this image to a Spark cluster for execution?
If so, how?
# Not a real command
$ aws emr-...
1 vote, 1 answer, 624 views
Does EMR Serverless support bootstrap actions?
When an EMR cluster is created, bootstrap actions can be provided as shown below.
aws emr create-cluster --name "Test cluster" --release-label emr-7.1.0
--use-default-roles --...
0 votes, 1 answer, 843 views
How do I pass xcom to traditional operator from task created using @task decorator?
I'm trying to get my head around the TaskFlow API & XCom in airflow and am getting stuck, hoping someone here can help. I'm using EmrServerlessCreateApplicationOperator and I want to pass a value ...
0 votes, 1 answer, 356 views
Inexplicable PySpark SQL array indexing error: Index 1 out of bounds for length 1
I'm seeing an inexplicable array index reference error,
Index 1 out of bounds for length 1
... which I can't explain because I don't see any relevant arrays being referenced in my context of an AWS ...
0 votes, 1 answer, 1k views
EMR Serverless allocates executors about half the memory we define in Spark jobs
When I set a Spark executor's memory to 12 GB, it actually allocates about half of that, around 6.7 GB.
I also tried 20 GB; it then allocated close to 11 GB, again about half.
I have defined ...
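A hedged note on the numbers above: one plausible explanation is that the per-executor figure shown in the Spark UI is Spark's unified (execution plus storage) memory pool rather than the full heap. Spark's `UnifiedMemoryManager` sets aside 300 MB of reserved memory and then applies `spark.memory.fraction` (0.6 by default) to the remainder. The sketch below, with a hypothetical helper `unified_memory_mb`, only reproduces that arithmetic; it does not inspect EMR Serverless itself.

```python
RESERVED_MB = 300  # Spark's fixed reserved system memory


def unified_memory_mb(heap_mb: float, memory_fraction: float = 0.6) -> float:
    """Approximate size of Spark's unified (execution + storage) memory pool.

    Hypothetical helper. Assumes the default spark.memory.fraction of 0.6;
    the JVM usually reports a max heap slightly below -Xmx, so the Spark UI
    may show a bit less than this value.
    """
    if heap_mb <= RESERVED_MB:
        raise ValueError("heap must exceed the reserved 300 MB")
    return (heap_mb - RESERVED_MB) * memory_fraction


for heap_gb in (12, 20):
    pool_gb = unified_memory_mb(heap_gb * 1024) / 1024
    print(f"{heap_gb} GB heap -> ~{pool_gb:.1f} GB unified memory")
```

For a 12 GB heap this gives about 7.0 GB, and for 20 GB about 11.8 GB. Because the JVM typically reports a maximum heap slightly below `-Xmx`, the values shown in the UI would likely be a little lower still, in the range reported above.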
0 votes, 0 answers, 396 views
(spark jdbc) SQLRecoverableException: I/O Exception: Connection reset
I have been working on a request that extracts data from an Oracle 19c instance and processes it on AWS EMR Serverless using Spark JDBC connections.
The big picture is that I can't connect to ...
0 votes, 1 answer, 240 views
EmrServerlessCreateApplicationOperator networkConfiguration with multiple subnetIds
If I pass more than one subnet Id to EmrServerlessCreateApplicationOperator via the networkConfiguration attribute, I receive an error.
If I use a single subnet Id the operator works fine. This is the ...
1 vote, 2 answers, 2k views
botocore.exceptions.NoRegionError: You must specify a region for EmrServerlessCreateApplicationOperator
I am trying to create an EMR Serverless application through the EmrServerlessCreateApplicationOperator, but I keep getting the error botocore.exceptions.NoRegionError: You must specify a region.
I am ...
1 vote, 1 answer, 2k views
EMR Serverless: pass JARs in the console
I'm new to EMR Serverless and want to know how to pass JARs and packages to a Spark application, for example:
spark-submit --deploy-mode client --jars /usr/lib/hudi/hudi-spark3.3-bundle_2.12-0....
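As a hedged sketch, assuming the job is submitted through the StartJobRun API rather than the console: EMR Serverless takes spark-submit style options as a single string in `jobDriver.sparkSubmit.sparkSubmitParameters`, and spark-submit's `--jars` and `--packages` options each expect one comma-separated list. The helper `build_spark_submit_parameters` and the S3 paths are hypothetical placeholders.

```python
from typing import Dict, Iterable, Optional


def build_spark_submit_parameters(
    jars: Iterable[str] = (),
    packages: Iterable[str] = (),
    conf: Optional[Dict[str, str]] = None,
) -> str:
    """Assemble a sparkSubmitParameters string (hypothetical helper)."""
    parts = []
    jars = list(jars)
    packages = list(packages)
    if jars:
        # spark-submit expects one comma-separated list, not repeated --jars flags
        parts += ["--jars", ",".join(jars)]
    if packages:
        # Maven coordinates (group:artifact:version), also comma-separated
        parts += ["--packages", ",".join(packages)]
    for key, value in (conf or {}).items():
        parts += ["--conf", f"{key}={value}"]
    # Assumption: values contain no whitespace, since the result is one space-separated string
    if any(any(ch.isspace() for ch in p) for p in parts):
        raise ValueError("parameter values must not contain whitespace")
    return " ".join(parts)


params = build_spark_submit_parameters(
    jars=["s3://my-bucket/jars/first.jar", "s3://my-bucket/jars/second.jar"],
    conf={"spark.executor.memory": "4g"},
)
print(params)
# The string would then go into the job driver, e.g.:
# jobDriver={"sparkSubmit": {"entryPoint": "s3://my-bucket/main.py",
#                            "sparkSubmitParameters": params}}
```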
1 vote, 1 answer, 3k views
AWS EMR Serverless: how to submit PySpark jobs (using the console) with multiple files?
Hi, I am new to EMR Serverless and trying to learn. I have a PySpark project that I want to run on EMR Serverless. I tried the console, but it does not let me provide a folder location as input. I ...
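One common approach, sketched here under assumptions: package the project's modules into a zip, upload it to S3, and reference it with `--py-files` (equivalently the `spark.submit.pyFiles` property) in the job's Spark submit parameters, keeping a single entry-point script. The helper `zip_package` and all paths are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List


def zip_package(src_dir: str, zip_path: str) -> List[str]:
    """Zip every .py file under src_dir (hypothetical helper).

    Archive paths are kept relative to src_dir's parent, so the package
    directory name stays at the top level and `import <package>` works
    once the zip is on the Python path.
    """
    src = Path(src_dir).resolve()
    base = src.parent
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*.py")):
            arcname = path.relative_to(base).as_posix()
            zf.write(path, arcname)
            names.append(arcname)
    return names
```

For example, zipping `my_project/mypkg` yields entries such as `mypkg/__init__.py`, so `import mypkg` should resolve in the entry-point script once the archive is passed via `--py-files`. Third-party libraries with compiled extensions usually cannot be shipped this way.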
0 votes, 1 answer, 281 views
How to delete AWSServiceRoleForAmazonEMRServerless?
I am new to AWS. My account got hacked, and to secure it I have been advised to delete IAM roles. There is one role called AWSServiceRoleForAmazonEMRServerless that I am unable to ...
1 vote, 1 answer, 874 views
EMR Serverless Airflow Operator not allowing EMR custom images
I want to launch a Spark job on EMR Serverless from Airflow. I want to use Spark 3.3.0 and Scala 2.13, but the EMR 6.9.0 release ships with Scala 2.12. I created a fat JAR including all Spark ...
0 votes, 2 answers, 1k views
How to run existing EMR serverless job with boto3?
From the boto3 docs for start_job_run, it seems I have to create a new job run every time I want to trigger a job. Does it really have to work that way? Can't I take the ID of the existing job, which ...
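As far as I know there is no separate "re-run" call, so each trigger starts a new job run. A hedged sketch of a workaround: read an earlier run's configuration with `get_job_run` and feed it back into `start_job_run`. The helper `rerun_kwargs` is hypothetical, and the field names (`executionRole` in the response versus `executionRoleArn` in the request) reflect my reading of the boto3 `emr-serverless` client docs, so check them against your boto3 version.

```python
from typing import Any, Dict


def rerun_kwargs(job_run: Dict[str, Any], name_suffix: str = "-rerun") -> Dict[str, Any]:
    """Turn the 'jobRun' dict from get_job_run into start_job_run kwargs.

    Hypothetical helper: copies the application, role, driver, and any
    configuration overrides so an earlier run's setup can be submitted again.
    """
    kwargs: Dict[str, Any] = {
        "applicationId": job_run["applicationId"],
        # Assumption: GetJobRun returns the role ARN under 'executionRole'
        "executionRoleArn": job_run["executionRole"],
        "jobDriver": job_run["jobDriver"],
    }
    if job_run.get("configurationOverrides"):
        kwargs["configurationOverrides"] = job_run["configurationOverrides"]
    if job_run.get("name"):
        kwargs["name"] = job_run["name"] + name_suffix
    return kwargs


# Usage with a real client (needs AWS credentials, so not executed here):
# import boto3
# client = boto3.client("emr-serverless", region_name="us-east-1")
# previous = client.get_job_run(applicationId=APP_ID, jobRunId=OLD_RUN_ID)["jobRun"]
# new_run = client.start_job_run(**rerun_kwargs(previous))
# print(new_run["jobRunId"])
```

Storing the kwargs yourself (for example in a config file) and calling `start_job_run` with them each time works the same way; reading them back from a previous run just avoids keeping a second copy.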
1 vote, 0 answers, 2k views
AWS EMR serverless connect to jdbc SQL Server
I have been connecting to SQL Server using an EMR Serverless application (release 6.8.0) for Spark.
I tested the code on my local machine as well as on EC2, but when I ran it on this serverless application I got an ...
1 vote, 2 answers, 4k views
EMR Serverless Spark Executors Timeout
I have an EMR Serverless application that is getting stuck in executions timeouts for some reason. I have tested all s3 connections and it's working. The problem is happening during the execution of a ...
1 vote, 1 answer, 4k views
Virtualenv in AWS EMR Serverless
I'm trying to run some jobs through the AWS CLI using a virtual environment in which I installed some libraries. I followed this guide; the same is here.
But when I run the job I have this error:
Job execution ...
3 votes, 0 answers, 2k views
AWS EMR Serverless Spark properties delimiter
I'm trying to run a spark job using EMR Serverless but the issue is I cannot pass the list of jars and archives to the spark job.
The spark properties section does not seem to allow passing in a comma ...
0 votes, 1 answer, 444 views
How to run MapReduce jobs on EMR Serverless?
Based on the documentation, Amazon EMR Serverless seems to accept only Spark and Hive as job drivers. Is there any support for custom Hadoop JARs for MapReduce jobs on Serverless, similar to EMR?