
I created a set of algorithms and helpers in Scala for Spark that work with different formats of measured data. They are all based on Hadoop's FileInputFormat. I also created some helpers to ease working with time series data from a Cassandra database. I now need some advanced functions that are already present in Thunder, and some of the colleagues who will work with these helper functions want to use Python. Is it possible to use these helper functions from Python, or do I have to reimplement them?

I read through a lot of docs and only found that you can load extra jars with pyspark, but not how to use the functions.

  • It is actually possible. Commented Feb 24, 2016 at 13:12
  • @eliasah It depends, doesn't it? You can trigger high-level transformations, but it is not possible to do the same thing from the worker. Commented Feb 24, 2016 at 13:19
  • That's true! I was thinking of the other way around, like what I did here. Commented Feb 24, 2016 at 13:26
  • So, if I created the "sc.coolMeasuringDataFile" via an implicit class, can I use that from pyspark and if yes, how do I do that? Commented Feb 24, 2016 at 14:05

1 Answer

"By accident" I found the solution: It is the "Java Gateway". This is not documented in the Spark documentation (at least I didn't find it).

Here is how it works, using a "GregorianCalendar" as an example:

j = sc._gateway.jvm
cal = j.java.util.GregorianCalendar()
print(cal.getTimeInMillis())

However, passing the Python SparkContext to a JVM object does not work directly, because the Java SparkContext is held in its _jsc field. For example:

ref = j.java.util.concurrent.atomic.AtomicReference()
ref.set(sc)

This fails. However:

ref = j.java.util.concurrent.atomic.AtomicReference()
ref.set(sc._jsc)

This works.

Note, however, that sc._jsc is the Java-based Spark context, i.e., a JavaSparkContext. To get the original Scala SparkContext, you have to use:

sc._jsc.sc()
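
Putting the pieces above together, here is a minimal sketch of how a Scala helper might be reached from PySpark. It only relies on the sc._gateway.jvm and sc._jsc.sc() accesses shown above; the function names jvm_member and scala_context, the class name com.example.measure.Helpers, and its method coolMeasuringDataFile are hypothetical placeholders. It also assumes the helper is exposed as a plain method on a Scala object that takes the SparkContext as an argument, since a method added through an implicit class is not visible as sc.coolMeasuringDataFile from Python.

```python
def jvm_member(sc, dotted_name):
    """Resolve a fully qualified JVM name through the Py4J gateway of `sc`."""
    obj = sc._gateway.jvm
    # Walk the package path one attribute at a time, e.g. java -> util -> GregorianCalendar.
    for part in dotted_name.split("."):
        obj = getattr(obj, part)
    return obj


def scala_context(sc):
    """Return the Scala SparkContext behind a PySpark SparkContext."""
    return sc._jsc.sc()


# Hypothetical usage, assuming the jar with the Scala helpers is on the classpath:
# helpers = jvm_member(sc, "com.example.measure.Helpers")  # hypothetical Scala object
# data = helpers.coolMeasuringDataFile(scala_context(sc), "/path/to/data")
```

Whatever such a call returns is a Py4J proxy for the JVM object, not a PySpark RDD, so turning it into something usable from Python is a separate step.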

1 Comment

Good one! Nevertheless, it isn't documented in Spark because it's not Spark-related but rather related to Java/Python interoperability.
