I have a simple PySpark script, but I can't get it to run. I'm running it on Ubuntu from the PyCharm IDE. I want to connect to an Oracle XE database and print my test table.

Here is my Spark Python code:

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()

sqlContext = SQLContext(sc)

demoDf = sqlContext.read.format("jdbc").options(
    url="jdbc:oracle:thin:@10.10.10.10:1521:XE",
    driver="oracle.jdbc.driver.OracleDriver",
    table="tst_table",
    user="xxx",
    password="xxx").load()

demoDf.show()

And this is my trace:

Traceback (most recent call last):
  File "/home/kebodev/PycharmProjects/spark_tst/cucc_spark.py", line 13, in <module>
    password="xxx").load()
  File "/home/kebodev/spark-2.0.1/python/pyspark/sql/readwriter.py", line 153, in load
    return self._df(self._jreader.load())
  File "/home/kebodev/spark-2.0.1/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/home/kebodev/spark-2.0.1/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/home/kebodev/spark-2.0.1/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o27.load.
: java.lang.RuntimeException: Option 'dbtable' not specified
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$2.apply(JDBCOptions.scala:30)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$2.apply(JDBCOptions.scala:30)
    at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
    at org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.getOrElse(ddl.scala:117)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:30)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:33)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)


Process finished with exit code 1

Can anybody help me?

2 Answers

1

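The trace says "Option 'dbtable' not specified": the JDBC source expects the table name under the option dbtable, not table. Rename the option like this: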

demoDf = sqlContext.read.format("jdbc").options(
    url="jdbc:oracle:thin:@10.10.10.10:1521:XE",
    driver="oracle.jdbc.driver.OracleDriver",
    dbtable="tst_table",
    user="xxx",
    password="xxx").load()
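
To catch this kind of mistake before Spark raises a Java exception, the options can be collected in a dict and checked first. This is only a minimal sketch: jdbc_options and REQUIRED_JDBC_KEYS are hypothetical names, and the list of required keys is assumed from the trace above (Spark 2.0.1), not taken from the Spark documentation.

```python
# Hypothetical helper: build the JDBC options as a dict and fail early,
# in Python, if a key the JDBC source is assumed to require is missing.
REQUIRED_JDBC_KEYS = ("url", "dbtable")  # assumption based on the trace above


def jdbc_options(url, dbtable, user, password,
                 driver="oracle.jdbc.driver.OracleDriver"):
    opts = {
        "url": url,
        "dbtable": dbtable,
        "user": user,
        "password": password,
        "driver": driver,
    }
    # Treat empty strings and None as missing values.
    missing = [key for key in REQUIRED_JDBC_KEYS if not opts.get(key)]
    if missing:
        raise ValueError("Missing JDBC option(s): " + ", ".join(missing))
    return opts


opts = jdbc_options(
    url="jdbc:oracle:thin:@10.10.10.10:1521:XE",
    dbtable="tst_table",
    user="xxx",
    password="xxx",
)

# With a live SQLContext, the dict is passed as keyword arguments:
# demoDf = sqlContext.read.format("jdbc").options(**opts).load()
# demoDf.show()
```

Because dbtable is a named parameter of the helper, a misspelling such as table= fails immediately with a Python TypeError instead of a Py4JJavaError.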

5 Comments

Oh, thank you. Now I get: java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver. I'm going to apply your answer, but can you help me with this too? Where should I put the Oracle driver? :) Thank you!
You have to add the Oracle JDBC driver to your project.
stackoverflow.com/a/33831421/5673997 @solarenqu, can you check this answer?
Thank you, I read that and added this line to my spark-defaults.conf: spark.driver.extraClassPath /Users/gabor_dev/Documents/ojdbc/ojdbc6.jar but I still get this error.
If I run it this way: sh spark-submit --jars /Users/gabor_dev/Documents/ojdbc/ojdbc6.jar /Users/gabor_dev/PycharmProjects/spark_new_test/load.py it works. :) But it doesn't work from PyCharm. :(
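
One way to get the effect of spark-submit --jars when the script is started directly (for example from PyCharm) is to set the PYSPARK_SUBMIT_ARGS environment variable before the SparkContext is created, since PySpark reads it when it launches the JVM. The following is a sketch, not a confirmed fix for the setup in this thread: build_submit_args and create_context are hypothetical helper names, and the jar path is the one from the comment above.

```python
import os


def build_submit_args(jar_paths):
    """Return a PYSPARK_SUBMIT_ARGS value that passes --jars to the JVM.

    The value has to end with "pyspark-shell".
    """
    return "--jars {} pyspark-shell".format(",".join(jar_paths))


def create_context(jar_paths):
    """Set the submit args, then create the SparkContext and SQLContext."""
    # Must happen before the first SparkContext is created, because the
    # variable is only read when PySpark starts the JVM.
    os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args(jar_paths)

    # Imported here so the helper above stays usable without Spark installed.
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext()
    return sc, SQLContext(sc)


# Usage (requires a Spark installation and the driver jar at this path):
# sc, sqlContext = create_context(
#     ["/Users/gabor_dev/Documents/ojdbc/ojdbc6.jar"])
```

The same variable can also be set in the PyCharm run configuration under its environment variables instead of in code.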
0

Try something like this:

def testQuery(query):
    # Pass the query as a parenthesized subquery with an alias.
    # Oracle does not accept the AS keyword before a table alias.
    df = sqlContext.read.format("jdbc").options(
        url="jdbc:oracle:thin:@10.10.10.10:1521:XE",
        driver="oracle.jdbc.driver.OracleDriver",
        dbtable="(" + query + ") temp",
        user="xxx",
        password="xxx").load()
    return df
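
The idea here is that dbtable can hold a parenthesized subquery with an alias instead of a plain table name. On Oracle the alias has to be written without the AS keyword, and a trailing semicolon inside the subquery would be rejected as well. A small sketch of the string handling, where as_dbtable is a hypothetical helper name and the default alias temp is taken from the answer:

```python
def as_dbtable(query, alias="temp"):
    """Wrap a SELECT statement so it can be used as the dbtable option.

    Drops a trailing semicolon and writes the alias without AS,
    because Oracle does not allow AS before a table alias.
    """
    cleaned = query.strip().rstrip(";").strip()
    return "({}) {}".format(cleaned, alias)


dbtable_value = as_dbtable("SELECT * FROM tst_table;")

# With a live SQLContext:
# df = sqlContext.read.format("jdbc").options(
#     url="jdbc:oracle:thin:@10.10.10.10:1521:XE",
#     driver="oracle.jdbc.driver.OracleDriver",
#     dbtable=dbtable_value,
#     user="xxx",
#     password="xxx").load()
```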
