I have been able to load this MongoDB database before, but am now receiving an error I haven't been able to figure out.

Here is how I start my Spark session:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
        .master("local[*]") \
        .appName("collab_rec") \
        .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/example.collection") \
        .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/example.collection") \
        .getOrCreate()

I run the following script so that I can interact with Spark through IPython; the script also loads the mongo-spark connector package:

#!/bin/bash
export PYSPARK_DRIVER_PYTHON=ipython

# Note: repeating --packages overrides earlier occurrences, so all
# packages are listed in a single comma-separated flag.
${SPARK_HOME}/bin/pyspark \
--master local[4] \
--executor-memory 1G \
--driver-memory 1G \
--conf spark.sql.warehouse.dir="file:///tmp/spark-warehouse" \
--packages com.databricks:spark-csv_2.11:1.5.0,com.amazonaws:aws-java-sdk-pom:1.10.34,org.apache.hadoop:hadoop-aws:2.7.3,org.mongodb.spark:mongo-spark-connector_2.11:2.0.0

Spark loads fine, and the package appears to load correctly as well.

Here is how I attempt to load that database into a dataframe:

df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()

However, on that line, I receive the following error:

Py4JJavaError: An error occurred while calling o46.load.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.analysis.TypeCoercion$.findTightestCommonTypeOfTwo()Lscala/Function2;
    at com.mongodb.spark.sql.MongoInferSchema$.com$mongodb$spark$sql$MongoInferSchema$$compatibleType(MongoInferSchema.scala:132)
    at com.mongodb.spark.sql.MongoInferSchema$$anonfun$3.apply(MongoInferSchema.scala:76)
    at com.mongodb.spark.sql.MongoInferSchema$$anonfun$3.apply(MongoInferSchema.scala:76)

As far as I can tell from the following documentation/tutorial, I am loading the dataframe correctly:

https://docs.mongodb.com/spark-connector/master/python-api/
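
For reference, those docs also show passing the connection URI explicitly when reading. A minimal sketch of that variant (the URI is the same one from my session config above):

# Same read, with the URI passed explicitly instead of relying on
# the spark.mongodb.input.uri session config.
df = spark.read.format("com.mongodb.spark.sql.DefaultSource") \
        .option("uri", "mongodb://127.0.0.1/example.collection") \
        .load()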

I am using Spark 2.2.0. Note that I have been able to reproduce this error both on my Mac and on a Linux machine on AWS.

2 Answers

I figured out the answer to my question. This was a compatibility issue between the Mongo-Spark connector and the version of Spark I had upgraded to. Specifically, findTightestCommonTypeOfTwo was renamed in this PR:

https://github.com/apache/spark/pull/16786/files

For Spark 2.2.0, the compatible Mongo-Spark connector is also version 2.2.0, so in my example the package should be loaded like this:

--packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.0

This could change in the future, so when using the connector you should check that its version is compatible with the version of Spark being used.
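
With matching versions, relaunching the shell and re-running the original read should succeed; a quick sketch of how to verify the fix:

# After relaunching pyspark with mongo-spark-connector_2.11:2.2.0,
# the original read should no longer hit the NoSuchMethodError.
df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()
df.printSchema()  # schema inferred from the MongoDB collection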

The mongo-spark-connector JAR file needs to be compatible with the version of Spark being used.

Different versions of the JAR can be downloaded from the connector's download page, and the POM (XML) file there lists the required versions of

  • Apache-Spark
  • Scala
  • and other dependency packages

Make sure this matches what is installed in the local environment.

To check which version of Spark is installed, run spark-submit --version (assuming Spark's bin directory is on your PATH).
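
You can also check the version from inside a running PySpark session, which is a quick way to confirm what the connector JAR must match (a minimal sketch):

# The _2.11 suffix in the connector coordinate must also match the
# Scala binary version Spark was built with.
print(spark.version)  # e.g. '2.2.0' -> use mongo-spark-connector 2.2.0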
