I am new to PySpark.
I installed Java 17 and confirmed that it works:
C:\Windows\System32>java -version
java version "17.0.12" 2024-07-16 LTS
I installed Python 3.9 and confirmed that it works:
C:\Windows\System32>python --version
Python 3.9.13
I copied winutils.exe into the folder C:\winutils\bin
and set HADOOP_HOME to C:\winutils.
Then I ran:
C:\Windows\System32>pip install pyspark
C:\Windows\System32>pip install "pyspark[sql]"
C:\Windows\System32>pip install findspark
Then I started the PySpark shell:
C:\Windows\System32>pyspark
and got a Spark session going:
Python 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
25/11/15 14:01:27 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 4.0.1
      /_/
Using Python version 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022 16:36:42)
Spark context Web UI available at http://LAPTOP-FE5VVC1N:4040
Spark context available as 'sc' (master = local[*], app id = local-1763244089545).
SparkSession available as 'spark'.
>>>
At the prompt I ran the following:
>>> import findspark
>>> findspark.init()
>>> data = [("Alice", 25), ("Bob", 30), ("Cathy", 29)]
>>> columns = ["name", "age"]
>>> df = spark.createDataFrame(data, columns)
>>>
Everything is fine up to this point.
However, if I then run either df.show() or df.count(),
I get a py4j.protocol.Py4JJavaError.
My environment variables are as follows:
HADOOP_HOME=C:\winutils
JAVA_HOME=C:\Program Files\Java\jdk-17
PYSPARK_DRIVER_PYTHON=python
PYSPARK_PYTHON=pythonC:\Program Files\Python39\Scripts\
My PATH variable has the following entries:
C:\Program Files\Python39\Scripts\
C:\Program Files\Python39\
C:\winutils\bin
C:\Program Files\Java\jdk-17\bin
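In case it helps with diagnosing this, here is a minimal sketch of a script that checks whether the variables listed above resolve to real locations. The function name check_spark_env is hypothetical (it is not part of PySpark or findspark). It only assumes that HADOOP_HOME and JAVA_HOME should be existing directories and that PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON should each name an executable, either as a full path or as a command found on PATH. It does not start Spark.

```python
import os
import shutil
from pathlib import Path


def check_spark_env(env=None):
    """Return {variable: (value, looks_ok)} for Spark-related variables.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    report = {}

    # Assumption: these two should point at existing directories.
    for name in ("HADOOP_HOME", "JAVA_HOME"):
        value = env.get(name)
        report[name] = (value, bool(value) and Path(value).is_dir())

    # Assumption: these should name an executable, given either as a
    # full path or as a command that can be found on PATH.
    for name in ("PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON"):
        value = env.get(name)
        resolved = shutil.which(value, path=env.get("PATH")) if value else None
        report[name] = (value, resolved is not None)

    return report


if __name__ == "__main__":
    for name, (value, ok) in check_spark_env().items():
        print(f"{name}={value!r} -> {'ok' if ok else 'NOT FOUND'}")
```

Any variable reported as NOT FOUND would be a value worth double-checking before looking further into the Py4JJavaError.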
Any help will be appreciated.