I am using the Python code below to read data from MongoDB in Spark and convert it into a DataFrame:
from pyspark.sql import SparkSession
# Initialize a Spark session
spark = SparkSession.builder \
    .appName("MongoDB Spark Connector Example") \
    .config("spark.mongodb.read.connection.uri", "mongodb://localhost:23017/") \
    .config("spark.mongodb.read.database", "db_name") \
    .config("spark.mongodb.read.collection", "coll_name") \
    .config("spark.sql.debug.maxToStringFields", 1000) \
    .getOrCreate()
df = spark.read.format("mongodb").load()
df.createOrReplaceTempView("temp")
sqlDf = spark.sql("SELECT id from temp")
sqlDf.show()
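For reference, the same read configuration can also be passed directly on the DataFrameReader rather than on the session builder. This is only a minimal sketch, assuming connector 10.x accepts the option keys without the spark.mongodb.read. prefix when set at read time (the URI, database, and collection values are just the placeholders from the snippet above):

# Same placeholders as above; option keys here assume the unprefixed read-option form
df = (
    spark.read.format("mongodb")
    .option("connection.uri", "mongodb://localhost:23017/")
    .option("database", "db_name")
    .option("collection", "coll_name")
    .load()
)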
I am using:
- Spark Version: 3.2.4
- MongoDB Version: 6
- Scala Version: 2.12.15
- Java 1.8
- Python 3.8
- Ubuntu 20.04
- mongo-spark-connector_2.12:10.2.0
I am using the command below to run the above code:
spark-submit --packages org.mongodb.spark:mongo-spark-connector_2.12:10.2.0 test.py
I am getting the following error:
py4j.protocol.Py4JJavaError: An error occurred while calling o67.showString.
: java.lang.NoSuchMethodError: org.apache.spark.sql.types.StructType.toAttributes()Lscala/collection/immutable/Seq;
at com.mongodb.spark.sql.connector.schema.InternalRowToRowFunction.<init>(InternalRowToRowFunction.java:46)
at com.mongodb.spark.sql.connector.schema.RowToBsonDocumentConverter.<init>(RowToBsonDocumentConverter.java:84)
at com.mongodb.spark.sql.connector.read.MongoScanBuilder.<clinit>(MongoScanBuilder.java:72)
at com.mongodb.spark.sql.connector.MongoTable.newScanBuilder(MongoTable.java:121)