I'm developing a web application that retrieves data from a data lake. The data is stored in HDFS and I want to use PySpark to perform some analysis. In other words, we have a script in an IPython notebook and we want to use it with Django. I see that pyspark is also available on PyPI, so I installed it with pip. The same script, exported from the notebook as a .py file, works fine when I run it as python myscript.py, so I assume it should also work if I import that script within Django. Is this the correct method, or will I have to run spark-submit myscript.py? I want to use Spark in cluster mode.
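For reference, a minimal sketch of what the spark-submit route could look like when triggered from Python code such as a Django view or background task. Note that a SparkSession created directly inside the Django process runs the driver in that process (client mode), so this sketch shells out instead. The helper names build_spark_submit_cmd and run_job are hypothetical, and YARN as the cluster manager is an assumption, since the setup above only mentions HDFS.

```python
import subprocess


def build_spark_submit_cmd(script, master="yarn", deploy_mode="cluster"):
    """Return the argument list for launching `script` through spark-submit.

    master="yarn" is an assumption; substitute your own cluster manager URL.
    """
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", deploy_mode,
        script,
    ]


def run_job(script):
    """Launch the job and block until it exits.

    Requires spark-submit on PATH; raises CalledProcessError on a
    non-zero exit code.
    """
    return subprocess.run(build_spark_submit_cmd(script), check=True)
```

Calling run_job("myscript.py") blocks until spark-submit returns, so in a web application it would probably belong in a background worker rather than in the request handler itself.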

  • Did you find a way to run it? I'm stuck on the same problem. Commented Mar 16, 2018 at 7:10
  • @AshrithGande Use findspark: github.com/minrk/findspark Commented Mar 16, 2018 at 9:10
  • @AshrithGande stackoverflow.com/a/34763240/2214674 Commented Mar 16, 2018 at 9:10
  • I'm using findspark, but I cannot load my model with model = RandomForestRegressionModel.load('model/'). What is that spark-submit you mentioned? Commented Jul 20, 2019 at 16:58
