
I am initializing PySpark from within a Jupyter Notebook as follows:

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

conf = SparkConf().setAppName("PySpark-testing-app").setMaster("yarn")
conf = (conf.set("deploy-mode","client")
       .set("spark.driver.memory","20g")
       .set("spark.executor.memory","20g")
       .set("spark.driver.cores","4")
       .set("spark.num.executors","6")
       .set("spark.executor.cores","4"))

sc = SparkContext(conf=conf)
sqlContext = SQLContext.getOrCreate(sc)

However, when I open the YARN GUI and look at "RUNNING Applications", I see my session allocated 1 container, 1 vCPU, and 1 GB of RAM, i.e. the default values! How can I get the desired allocation by passing the values listed above?

2 Answers


Jupyter notebook launches PySpark in yarn-client mode, so the driver memory and some other configs cannot be set through the SparkConf class; you must set them on the command line.

Take a look at the official docs' explanation of the memory setting:

Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-memory command line option or in your default properties file.
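The quote also mentions the default properties file; a minimal sketch of that route, assuming a standard install where the file lives at $SPARK_HOME/conf/spark-defaults.conf:

    # spark-defaults.conf: each line is a property name and value
    # separated by whitespace; read before the driver JVM starts,
    # so it works in client mode.
    spark.driver.memory    20g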

There is another way to achieve this:

import os

# Must be set before the SparkContext is created, so the flag is
# applied when the driver JVM launches.
memory = '20g'
pyspark_submit_args = ' --driver-memory ' + memory + ' pyspark-shell'
os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args

Other configs of this kind can be set the same way; see the sketch below.
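For example, a minimal sketch extending the same approach to the other settings from the question; the flags are standard spark-submit options, and the values are only the ones the question asked for:

    import os

    # These flags are parsed when the driver JVM launches, so they take
    # effect even for settings that SparkConf cannot change in client mode.
    # The trailing 'pyspark-shell' token is required.
    os.environ["PYSPARK_SUBMIT_ARGS"] = (
        "--driver-memory 20g "
        "--executor-memory 20g "
        "--num-executors 6 "
        "--executor-cores 4 "
        "pyspark-shell"
    )

    from pyspark import SparkContext, SparkConf

    conf = SparkConf().setAppName("PySpark-testing-app").setMaster("yarn")
    sc = SparkContext(conf=conf)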


Execute the following at the top of the notebook, before Spark initializes:

    %%configure -f
    {
        "driverMemory": "20G",
        "executorMemory": "20G"
    }
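Note that %%configure is a sparkmagic cell magic backed by Apache Livy, so it only works in such kernels. Assuming one, the same JSON can also carry the executor settings from the question; the field names below follow Livy's session-creation request body:

    %%configure -f
    {
        "driverMemory": "20G",
        "executorMemory": "20G",
        "executorCores": 4,
        "numExecutors": 6
    }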
