
I’m trying to run a Spark job on YARN in client mode. I have two nodes, each with the following configuration: [node configuration screenshot; per the answer below, roughly 11G of memory and 16 cores per node]

I’m getting “ExecutorLostFailure (executor 1 lost)”.

I have tried most of the Spark tuning configurations. Initially I was getting around six executor failures; after tuning I’m down to losing a single executor.

This is my configuration (my spark-submit command):

HADOOP_USER_NAME=hdfs spark-submit \
  --class genkvs.CreateFieldMappings \
  --master yarn-client \
  --driver-memory 11g \
  --executor-memory 11G \
  --total-executor-cores 16 \
  --num-executors 15 \
  --conf "spark.executor.extraJavaOptions=-XX:+UseCompressedOops -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  --conf spark.akka.frameSize=1000 \
  --conf spark.shuffle.memoryFraction=1 \
  --conf spark.rdd.compress=true \
  --conf spark.core.connection.ack.wait.timeout=800 \
  my-data/lookup_cache_spark-assembly-1.0-SNAPSHOT.jar \
  -h hdfs://hdp-node-1.zone24x7.lk:8020 -p 800

My data size is 6GB and I’m doing a groupBy in my job.

import org.apache.spark.rdd.RDD

// Groups the records by their fourth field (a String key); this is a wide
// transformation, so it shuffles the full dataset across executors.
def process(in: RDD[(String, String, Int, String)]) = {
    in.groupBy(_._4)
}
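From what I’ve read, groupBy has to materialize every value for a key in memory on a single executor, so I assume it adds to the heap pressure. If my job only needed a per-key aggregate, I gather something like the following sketch would be the lighter option (hypothetical: it computes counts, which is not what my job does; it’s just to illustrate the difference):

import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext._  // pair-RDD implicits, needed on Spark < 1.3

// Sketch only: reduceByKey combines values map-side before the shuffle,
// so no executor ever holds all values for one key in memory at once.
def processCounts(in: RDD[(String, String, Int, String)]): RDD[(String, Int)] =
  in.map(t => (t._4, 1)).reduceByKey(_ + _)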

I’m new to Spark; please help me find my mistake. I’ve been struggling with this for at least a week now.

Thank you very much in advance.

1 Answer


Two issues pop out:

  • spark.shuffle.memoryFraction is set to 1. Why did you choose that instead of leaving it at the default of 0.2? That may starve the other, non-shuffle operations.

  • You only have 11G available for 16 cores. With only 11G I would set the number of executors in your job to no more than 3, and initially (to get past the executors-lost issue) just try 1; a config sketch follows this list. With 16 executors each one gets roughly 700 MB, so it is no surprise they are hitting OutOfMemoryErrors and losing executors.
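To make that concrete, here is a minimal sketch of a configuration along those lines, expressed as a SparkConf (the same settings can be passed to spark-submit as --conf flags). The memory and core numbers are assumptions; adjust them to what YARN actually grants your containers:

import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch of the advice above. The memory figure is an assumption:
// leave some of the node's 11G to YARN/JVM overhead instead of claiming it all.
val conf = new SparkConf()
  .setAppName("genkvs.CreateFieldMappings")
  .setMaster("yarn-client")
  .set("spark.executor.instances", "1")  // a single executor first, to get past the failures
  .set("spark.executor.memory", "8g")    // comfortably under the 11G per node
  .set("spark.executor.cores", "4")
// spark.shuffle.memoryFraction is deliberately left at its 0.2 default
val sc = new SparkContext(conf)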


Comments

Initially I got the following error: "Missing an output location for shuffle 0". That is how I concluded I needed to increase my shuffle memory fraction. Thanks for your suggestions; I will try them and get back to you.
It still fails - Spark: Executor Lost Failure
When I go through the stack trace it shows a java.lang.OutOfMemoryError: Java heap space exception. I think that is why it's losing the executors. Any suggestions?
I mentioned in the answer to try a single executor. You don't have enough memory available in the cluster in general.
Yes, I tried it. Even with a single executor I get the same issue.
