
I am initializing PySpark from within a Jupyter Notebook as follows:

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

conf = SparkConf().setAppName("PySpark-testing-app").setMaster("yarn")
conf = (conf.set("deploy-mode","client")
       .set("spark.driver.memory","20g")
       .set("spark.executor.memory","20g")
       .set("spark.driver.cores","4")
       .set("spark.num.executors","6")
       .set("spark.executor.cores","4"))

sc = SparkContext(conf=conf)
sqlContext = SQLContext.getOrCreate(sc)

However, when I open the YARN GUI and look at "RUNNING Applications", I see my session allocated 1 container, 1 vCPU, and 1 GB of RAM, i.e. the default values! How can I get the desired allocation by passing the values listed above?

2 Answers


Jupyter notebook launches PySpark in yarn-client mode, so the driver memory and some other configs cannot be set through the SparkConf class; you must set them on the command line.

Take a look at the official docs' explanation of the memory setting:

Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-memory command line option or in your default properties file.
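The quote also mentions the default properties file; a minimal sketch of that route, assuming a standard install where the file lives at $SPARK_HOME/conf/spark-defaults.conf:

    # spark-defaults.conf: each line is a property name and value
    # separated by whitespace; read before the driver JVM starts,
    # so it works in client mode.
    spark.driver.memory    20g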

There is another way to achieve this:

import os

# Must be set before the SparkContext is created, so the flag is
# applied when the driver JVM launches.
memory = '20g'
pyspark_submit_args = ' --driver-memory ' + memory + ' pyspark-shell'
os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args

Other configs of this kind can be set the same way; see the sketch below.
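For example, a minimal sketch extending the same approach to the other settings from the question; the flags are standard spark-submit options, and the values are only the ones the question asked for:

    import os

    # These flags are parsed when the driver JVM launches, so they take
    # effect even for settings that SparkConf cannot change in client mode.
    # The trailing 'pyspark-shell' token is required.
    os.environ["PYSPARK_SUBMIT_ARGS"] = (
        "--driver-memory 20g "
        "--executor-memory 20g "
        "--num-executors 6 "
        "--executor-cores 4 "
        "pyspark-shell"
    )

    from pyspark import SparkContext, SparkConf

    conf = SparkConf().setAppName("PySpark-testing-app").setMaster("yarn")
    sc = SparkContext(conf=conf)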


Execute the following at the top of the notebook, before Spark initializes:

    %%configure -f
    {
        "driverMemory": "20G",
        "executorMemory": "20G"
    }
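Note that %%configure is a sparkmagic cell magic backed by Apache Livy, so it only works in such kernels. Assuming one, the same JSON can also carry the executor settings from the question; the field names below follow Livy's session-creation request body:

    %%configure -f
    {
        "driverMemory": "20G",
        "executorMemory": "20G",
        "executorCores": 4,
        "numExecutors": 6
    }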
