
I am deploying a Spark job on AWS EMR and packaging all my dependencies in a Docker image. My spark-submit command, built in Python, looks like this:

    ...
    cmd = (
        f"spark-submit --deploy-mode {deploy_mode} "
        f"--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker "
        f"--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE={docker_image} "
        f"--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG={config} "
        f"--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=/etc/passwd:/etc/passwd:ro "
        f"--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker "
        f"--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE={docker_image} "
        f"--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG={config} "
        f"--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=/etc/passwd:/etc/passwd:ro "
        f"{path}"
    )
    ...

It works as expected when deploy_mode is cluster, but none of my Docker dependencies are visible when deploy_mode is client. Can anyone explain why this happens, and whether it is normal?

1 Answer


The Docker containers are managed by YARN on EMR: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-docker.html

In client mode, your Spark driver doesn't run in a Docker container, because that process is not managed by YARN: it is executed directly on the node that runs the spark-submit command. In cluster mode, the driver is managed by YARN and is therefore executed inside a Docker container.
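
If you want the same command builder to behave sensibly in both modes, one option is to emit the appMasterEnv Docker settings only for cluster mode, since in client mode they would at most dockerize the (driver-less) application master. Here is a minimal sketch reusing the variable names from the question; the build_submit_cmd function itself is hypothetical, not from the original post:

    def build_submit_cmd(deploy_mode, docker_image, config, path):
        # Executor containers are launched by YARN in both deploy modes,
        # so the executorEnv settings always take effect.
        confs = [
            "spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker",
            f"spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE={docker_image}",
            f"spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG={config}",
            "spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=/etc/passwd:/etc/passwd:ro",
        ]
        if deploy_mode == "cluster":
            # Only in cluster mode does YARN run the driver (inside the
            # application master), so only here do these settings put the
            # driver in a Docker container.
            confs += [
                "spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker",
                f"spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE={docker_image}",
                f"spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG={config}",
                "spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=/etc/passwd:/etc/passwd:ro",
            ]
        conf_flags = " ".join(f"--conf {c}" for c in confs)
        return f"spark-submit --deploy-mode {deploy_mode} {conf_flags} {path}"

In client mode, the driver's dependencies then have to be available on the node running spark-submit itself, for example by installing them there or by running spark-submit from inside the same Docker image.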
