
I am getting this error when running locally in PyCharm, and I have tried every option I could find:

Caused by: java.io.IOException: Cannot run program "/usr/local/Cellar/apache-spark/3.0.1/libexec/bin": error=13, Permission denied
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:209)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:132)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:105)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:119)
    at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:131)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:127)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
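
From what I can tell, error=13 on a path ending in bin means Spark tried to execute a directory as the Python worker: PythonWorkerFactory.startDaemon launches whatever executable PYSPARK_PYTHON resolves to, so that variable must name an interpreter binary, not a folder. A minimal diagnostic sketch (not from the original post) to confirm this:

import os

path = "/usr/local/Cellar/apache-spark/3.0.1/libexec/bin"
print(os.path.isdir(path))               # True -> Spark was handed a directory
print(os.environ.get("PYSPARK_PYTHON"))  # should be an interpreter, e.g. /usr/local/bin/python3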

.bash_profile:

export SPARK_HOME=/usr/local/opt/apache-spark/libexec/
export PYTHONPATH=/usr/local/opt/apache-spark/libexec/python/lib/py4j-0.10.9-src.zip:/usr/local/opt/apache-spark/libexec/python/:/usr/local/lib/python3.9:$PYTHONPATH
export PATH=$SPARK_HOME/bin/:$SPARK_HOME/python/lib/py4j-0.10.9-src.zip:$PATH
#export PATH=$SPARK_HOME/python:$PATH

ls -lrt /usr/local/opt/apache-spark:

/usr/local/opt/apache-spark -> ../Cellar/apache-spark/3.0.1

Python interpreter in PyCharm: /usr/local/bin/python3
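
To double-check which interpreter and environment the PyCharm run configuration actually sees, a quick sketch (assumed names, not from the original post):

import os
import sys

print(sys.executable)                # the interpreter PyCharm is running
print(os.environ.get("SPARK_HOME"))  # should be /usr/local/opt/apache-spark/libexec/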

In my code:

from pyspark.sql import SparkSession

if __name__ == '__main__':
    # the commented-out lines below are earlier attempts
    #import os
    #import sys
    #os.environ['SPARK_HOME'] = "/usr/local/opt/apache-spark/libexec/"
    #sys.path.append("/usr/local/opt/apache-spark/libexec/python")
    #sys.path.append("/usr/local/opt/apache-spark/libexec/python/lib/py4j-0.10.9-src.zip")
    #findspark.init()
    #conf = SparkConf()
    #conf.set("fs.defaultFS", "file:///")
    spark = SparkSession.builder.master("local").appName("SyslogMaskUtility").getOrCreate()
    sc = spark.sparkContext
    #sc.setLogLevel("WARN")
    rdd_raw = sc.textFile('/Users/abcd/PycharmProjects/SyslogToJson/SyslogParser/syslog_event.txt')
    print(rdd_raw.count())
    spark.stop()
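
One more thing worth trying (a sketch under the assumption that the worker Python is mis-set, not something verified in the post): pin PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON to the interpreter running the script, before building the session, so Spark cannot pick up a bad value from the run configuration:

import os
import sys
from pyspark.sql import SparkSession

# Force worker and driver Python to the interpreter executing this script.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

spark = SparkSession.builder.master("local").appName("SyslogMaskUtility").getOrCreate()
print(spark.sparkContext.parallelize([1, 2, 3]).count())  # quick smoke test
spark.stop()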

I followed this guide: https://medium.com/beeranddiapers/installing-apache-spark-on-mac-os-ce416007d79f

and referred to this question: "Spark installation seems ok but when running program I'm having issues with environment variables. Is this .bash_profile correct?"

All directories and files under /usr/local/opt/apache-spark/libexec/ have full permissions:

drwxrwxrwx   13 abcd  admin   416 Oct 29 17:34 bin

Any help would be appreciated, since I am struggling with this. The same code works when I run it from the pyspark command line.

Thanks.

2 Comments
  • Try installing Spark without Homebrew. Commented Apr 26, 2021 at 6:52
  • Okay. I will follow this: medium.com/luckspark/… Commented Apr 26, 2021 at 8:00

1 Answer


On my Mac, I install Spark and Hadoop separately:

# install PySpark
pip3 install pyspark

# download and extract Hadoop 3.2.2
wget https://downloads.apache.org/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz
tar -xzf hadoop-3.2.2.tar.gz

# setup environment variables
export JAVA_HOME='/Library/Java/JavaVirtualMachines/zulu-11.jdk/Contents/Home/'
export SPARK_DIST_CLASSPATH="hadoop-3.2.2/share/hadoop/tools/lib/*"

# run Python
python3
from pyspark.sql import SparkSession
from pyspark.sql import types as T

spark = (SparkSession
    .builder
    .master('local[*]')
    .appName('SO')
    .getOrCreate()
)
# <pyspark.sql.session.SparkSession object at 0x10f9e7220>
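
To mirror the question, a quick follow-up in the same session (file path taken from the question; a sketch, not verified on the asker's machine):

rdd_raw = spark.sparkContext.textFile('/Users/abcd/PycharmProjects/SyslogToJson/SyslogParser/syslog_event.txt')
print(rdd_raw.count())
spark.stop()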

4 Comments

I ran through the steps as above. From the terminal I am able to run, but from PyCharm I get the same error: Caused by: java.io.IOException: Cannot run program "/usr/local/Cellar/apache-spark/3.0.1/libexec/bin": error=13, Permission denied
export JAVA_HOME=/Library/Java/JavaVirtualMachi
export SPARK_HOME=/usr/local/opt/apache-spark/libexec/
export SPARK_DIST_CLASSPATH="/Users/rm185431/hadoop-3.2.2/share/hadoop/tools/lib/*"
export PYTHONPATH=/usr/local/opt/apache-spark/libexec/python/lib/py4j-0.10.9-src.zip:/usr/local/opt/apache-spark/libexec/python/:/usr/local/lib/python3.9:$PYTHONPATH
export SPARK_HOME=/usr/local/opt/apache-spark/libexec/
export PATH=$SPARK_HOME/bin/:$SPARK_HOME/python/lib/py4j-0.10.9-src.zip:$PATH
export PATH=$SPARK_HOME/python:$PATH
The above are the paths I have added to my .bash_profile.
Environment variables in the PyCharm run configuration (Edit Configurations):

PYTHONUNBUFFERED=1
PYSPARK_PYTHON=/usr/local/Cellar/apache-spark/3.0.1/libexec/bin
PYSPARK_DRIVER_PYTHON=/usr/local/Cellar/apache-spark/3.0.1/libexec/bin
PYTHONPATH=/usr/local/Cellar/apache-spark/3.0.1/libexec/python/lib/py4j-0.10.9-src.zip:/usr/local/Cellar/apache-spark/3.0.1/libexec/python/
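
Looking at those values, PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON point at /usr/local/Cellar/apache-spark/3.0.1/libexec/bin, a directory, and that is exactly the path in the "error=13, Permission denied" message. Both variables should point at a Python executable instead; a sketch of corrected values, assuming the interpreter named in the question:

PYTHONUNBUFFERED=1
PYSPARK_PYTHON=/usr/local/bin/python3
PYSPARK_DRIVER_PYTHON=/usr/local/bin/python3
PYTHONPATH=/usr/local/Cellar/apache-spark/3.0.1/libexec/python/lib/py4j-0.10.9-src.zip:/usr/local/Cellar/apache-spark/3.0.1/libexec/python/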
