I am using pyspark for the first time. I am trying to pull data from an RDS MySQL database using the code below. I have referred to the following links: pyspark mysql jdbc load An error occurred while calling o23.load No suitable driver, https://www.supergloo.com/fieldnotes/spark-sql-mysql-python-example-jdbc/ and many more, but with no luck.

from pyspark import SparkConf
from pyspark.sql import SparkSession
from pyspark.sql import SQLContext

spark = SparkSession.builder.config(conf=SparkConf()).getOrCreate()
sqlContext = SQLContext(spark)

hostname='abc.rds.amazonaws.com'
jdbcPort=3306
dbname='mydb'
username='user'
password='password'

jdbc_url = "jdbc:mysql://{0}:{1}/{2}".format(hostname, jdbcPort, dbname)


connectionProperties = {
  "user" : username,
  "password" : password
}

df = spark.read.jdbc(url=jdbc_url, table='test', properties=connectionProperties)
df.show()

But I am getting the below error:

    ---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-16-319dff08eefb> in <module>()
     21 #pushdown_query = "(select * from gwdd_data) test"
     22 #df = spark.read.jdbc(url=url,table=pushdown_query, properties=connectionProperties)
---> 23 df=spark.read.jdbc(url=jdbc_url, table='test', properties= connectionProperties)
     24 df.limit(10).show()

~\opt\spark\spark-2.1.0-bin-hadoop2.7\python\pyspark\sql\readwriter.py in jdbc(self, url, table, column, lowerBound, upperBound, numPartitions, predicates, properties)
    438             jpredicates = utils.toJArray(gateway, gateway.jvm.java.lang.String, predicates)
    439             return self._df(self._jreader.jdbc(url, table, jpredicates, jprop))
--> 440         return self._df(self._jreader.jdbc(url, table, jprop))
    441 
    442 

~\opt\spark\spark-2.1.0-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip\py4j\java_gateway.py in __call__(self, *args)
   1131         answer = self.gateway_client.send_command(command)
   1132         return_value = get_return_value(
-> 1133             answer, self.gateway_client, self.target_id, self.name)
   1134 
   1135         for temp_arg in temp_args:

~\opt\spark\spark-2.1.0-bin-hadoop2.7\python\pyspark\sql\utils.py in deco(*a, **kw)
     61     def deco(*a, **kw):
     62         try:
---> 63             return f(*a, **kw)
     64         except py4j.protocol.Py4JJavaError as e:
     65             s = e.java_exception.toString()

~\opt\spark\spark-2.1.0-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip\py4j\protocol.py in get_return_value(answer, gateway_client, target_id, name)
    317                 raise Py4JJavaError(
    318                     "An error occurred while calling {0}{1}{2}.\n".
--> 319                     format(target_id, ".", name), value)
    320             else:
    321                 raise Py4JError(

Py4JJavaError: An error occurred while calling o326.jdbc.
: java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.hive.execution.HiveFileFormat not found
    at java.util.ServiceLoader.fail(ServiceLoader.java:239)
    at java.util.ServiceLoader.access$300(ServiceLoader.java:185)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:372)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
    at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
    at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
    at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:550)
    at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
    at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:325)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
    at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:166)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)

I don't know how to resolve this error. I have checked that mysql-connector-java-5.1.45-bin.jar is present in SPARK_HOME/jars.

I also tried implementing the above code using SQLContext, but I get an error there too.

Can anyone help me with this?

Thanks

1 Answer

Your error is a parameter error in the JDBC connection: the table 'test' you passed doesn't exist in your database (the table parameter must reference an existing table). First, take a look at how the JDBC connector for Spark is used.

After that, connect with the correct table name, like this:

my_df = spark.read.jdbc(url=jdbc_url, table='gwdd_data', properties=connectionProperties)
my_df.limit(10).show()

This should work for you.
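As a side note, if the driver itself ever fails to resolve, the MySQL driver class can be passed explicitly in the connection properties. Below is a minimal sketch of that setup, assuming Connector/J 5.x (class name `com.mysql.jdbc.Driver`) is on Spark's classpath; the hostname, database, and credentials are placeholders, and everything except the commented-out `spark.read.jdbc` call is plain Python:

```python
# Sketch: build the JDBC URL and connection properties, naming the
# MySQL driver class explicitly. Values below are placeholders.
hostname = 'abc.rds.amazonaws.com'
jdbcPort = 3306
dbname = 'mydb'

jdbc_url = "jdbc:mysql://{0}:{1}/{2}".format(hostname, jdbcPort, dbname)

connectionProperties = {
    "user": "user",
    "password": "password",
    # Explicit driver class for Connector/J 5.x; for Connector/J 8.x
    # this would be "com.mysql.cj.jdbc.Driver" (assumption: the jar
    # is already in SPARK_HOME/jars, as in the question).
    "driver": "com.mysql.jdbc.Driver",
}

# With a live SparkSession, the read would then target an existing table:
# df = spark.read.jdbc(url=jdbc_url, table='gwdd_data',
#                      properties=connectionProperties)
print(jdbc_url)
```

Passing "driver" explicitly just removes the guesswork from JDBC's driver auto-discovery; it does not change anything else about the read.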


1 Comment

Thanks for correcting me. I implemented that, but now I am getting a different error. I have updated the question; please have a look.
