
I am able to run a simple Hello World program through Spark on a standalone machine. But when I run a word count program using SparkContext via pyspark, I get the following error: ERROR SparkContext: Error initializing SparkContext. java.io.FileNotFoundException: Added file file:/Users/tanyagupta/Documents/Internship/Zyudly%20Labs/Tanya-Programs/word_count.py does not exist. I am on Mac OS X, and I installed Spark through Homebrew with brew install apache-spark. Any ideas what's going wrong?

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/07/19 23:18:45 INFO SparkContext: Running Spark version 1.6.2
16/07/19 23:18:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/07/19 23:18:45 INFO SecurityManager: Changing view acls to: tanyagupta
16/07/19 23:18:45 INFO SecurityManager: Changing modify acls to: tanyagupta
16/07/19 23:18:45 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(tanyagupta); users with modify permissions: Set(tanyagupta)
16/07/19 23:18:46 INFO Utils: Successfully started service 'sparkDriver' on port 59226.
16/07/19 23:18:46 INFO Slf4jLogger: Slf4jLogger started
16/07/19 23:18:46 INFO Remoting: Starting remoting
16/07/19 23:18:46 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:59227]
16/07/19 23:18:46 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 59227.
16/07/19 23:18:46 INFO SparkEnv: Registering MapOutputTracker
16/07/19 23:18:46 INFO SparkEnv: Registering BlockManagerMaster
16/07/19 23:18:46 INFO DiskBlockManager: Created local directory at /private/var/folders/2f/fltslxd54f5961xsc2wg1w680000gn/T/blockmgr-812de6f9-3e3d-4885-a7de-fc9c2e181c64
16/07/19 23:18:46 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
16/07/19 23:18:46 INFO SparkEnv: Registering OutputCommitCoordinator
16/07/19 23:18:46 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/07/19 23:18:46 INFO SparkUI: Started SparkUI at http://192.168.0.5:4040
16/07/19 23:18:46 ERROR SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: Added file  file:/Users/tanyagupta/Documents/Internship/Zyudly%20Labs/Tanya-Programs/word_count.py does not exist.
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1364)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1340)
at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:491)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:214)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
16/07/19 23:18:47 INFO SparkUI: Stopped Spark web UI at http://192.168.0.5:4040
16/07/19 23:18:47 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/07/19 23:18:47 INFO MemoryStore: MemoryStore cleared
16/07/19 23:18:47 INFO BlockManager: BlockManager stopped
16/07/19 23:18:47 INFO BlockManagerMaster: BlockManagerMaster stopped
16/07/19 23:18:47 WARN MetricsSystem: Stopping a MetricsSystem that is not running
16/07/19 23:18:47 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/07/19 23:18:47 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/07/19 23:18:47 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/07/19 23:18:47 INFO SparkContext: Successfully stopped SparkContext

Traceback (most recent call last): 
File "/Users/tanyagupta/Documents/Internship/Zyudly Labs/Tanya-Programs/word_count.py", line 7, in <module>
sc=SparkContext(appName="WordCount_Tanya")
File "/usr/local/Cellar/apache-spark/1.6.2/libexec/python/lib/pyspark.zip/pyspark/context.py", line 115, in __init__
File "/usr/local/Cellar/apache-spark/1.6.2/libexec/python/lib/pyspark.zip/pyspark/context.py", line 172, in _do_init
File "/usr/local/Cellar/apache-spark/1.6.2/libexec/python/lib/pyspark.zip/pyspark/context.py", line 235, in _initialize_context
File "/usr/local/Cellar/apache-spark/1.6.2/libexec/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 1064, in __call__
File "/usr/local/Cellar/apache-spark/1.6.2/libexec/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value

py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.io.FileNotFoundException: Added file     file:/Users/tanyagupta/Documents/Internship/Zyudly%20Labs/Tanya-Programs/word_count.py does not exist.
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1364)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1340)
at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:491)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:214)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)

16/07/19 23:18:47 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
16/07/19 23:18:47 INFO ShutdownHookManager: Shutdown hook called
16/07/19 23:18:47 INFO ShutdownHookManager: Deleting directory /private/var/folders/2f/fltslxd54f5961xsc2wg1w680000gn/T/spark-f69e5dfc-6561-4677-9ec0-03594eabc991
  • See this question, which has exactly the same error; according to it, you have to add the Tanya-Programs directory to your PYTHONPATH variable. Commented Jul 20, 2016 at 9:26

2 Answers


Adding an __init__.py file in my folder worked for me!

Thanks!
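A minimal sketch of that fix, using a scratch directory in place of the real Tanya-Programs path:

```shell
# Create an empty __init__.py alongside the Spark script so the
# directory is treated as a Python package (scratch path for illustration).
mkdir -p /tmp/Tanya-Programs
touch /tmp/Tanya-Programs/__init__.py
ls /tmp/Tanya-Programs
```

After that, re-run the script with pyspark from the same directory.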


1 Comment

Can you please elaborate on how to do this?

This happens because of the space in the path: Zyudly Labs gets URL-encoded to Zyudly%20Labs, and no file exists at that encoded path. I was able to resolve this by removing the space from the path. I hope it helps.

Remove the space (the /Zyudly%20Labs/ part of the path) and try again.
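To see why the space matters: Spark turns the added file path into a URI, so the space in /Zyudly Labs/ is percent-encoded to %20, and the resulting string no longer matches any real path on disk. A small illustration in plain Python (the paths below are made up for the demo):

```python
import os
import pathlib
import tempfile
import urllib.parse

# Recreate the situation: a script in a directory whose name contains a space.
base = pathlib.Path(tempfile.mkdtemp()) / "Zyudly Labs"
base.mkdir()
script = base / "word_count.py"
script.write_text("# dummy script for the demo\n")

# Percent-encode the path the way a file: URI would.
encoded = urllib.parse.quote(str(script))

print(os.path.exists(str(script)))  # True: the real path, space and all
print("%20" in encoded)             # True: the space became %20
print(os.path.exists(encoded))      # False: no file exists at the encoded path
```

Moving word_count.py to a directory without spaces sidesteps the mismatch entirely.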

