3

I have PySpark code that writes to a SQL Server database like this:

 df.write.jdbc(url=url, table="AdventureWorks2012.dbo.people", properties=properties)

However, I want to keep writing to the table people even if it already exists. The Spark documentation lists the possible values for mode as error, append, overwrite and ignore, but all of them throw an "object already exists" error if the table already exists in the database.

Spark throws the following error: py4j.protocol.Py4JJavaError: An error occurred while calling o43.jdbc. com.microsoft.sqlserver.jdbc.SQLServerException: There is already an object named 'people' in the database

Is there a way to write data into the table even if the table already exists? Please let me know if you need more explanation.

2 Answers

1

For me the issue was with Spark 1.5.2. It checks whether the table exists by running SELECT 1 FROM $table LIMIT 1. If the query fails, Spark assumes the table doesn't exist. That query failed even when the table was there.

This was changed to SELECT * FROM $table WHERE 1=0 in 1.6.0.
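
To illustrate why the probe query matters, here is a minimal sketch of the "run a query and treat any failure as table-missing" approach. This is not Spark's actual code: table_exists is a hypothetical helper, and sqlite3 stands in for the database only so the example runs without a server. SQL Server's T-SQL has no LIMIT clause (it uses TOP), so a LIMIT probe fails there even when the table exists. SQLite does accept LIMIT, so the sketch mirrors the mismatch the other way round with a TOP probe, which SQLite rejects.

```python
import sqlite3


def table_exists(conn, probe_sql):
    """Hypothetical helper: report the table as present only if the probe runs."""
    try:
        conn.execute(probe_sql)
        return True
    except sqlite3.Error:
        # Any failure, including a syntax error, is read as "table missing".
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")

# Portable probe: valid in most SQL dialects, returns no rows.
portable = table_exists(conn, "SELECT * FROM people WHERE 1=0")

# Dialect-specific probe: the table exists, but this engine rejects the syntax,
# so the check reports a false "missing".
dialect_specific = table_exists(conn, "SELECT TOP 1 1 FROM people")

# A table that really is missing fails with either probe.
missing = table_exists(conn, "SELECT * FROM missing_table WHERE 1=0")

print(portable, dialect_specific, missing)
```

A false "missing" result would lead the writer to issue CREATE TABLE against a table that is already there, which matches the "There is already an object named 'people'" error in the question.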

0

Append and overwrite mode will not throw an error when the table already exists. From the Spark documentation ( http://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes ), for SaveMode.Append: "When saving a DataFrame to a data source, if data/table already exists, contents of the DataFrame are expected to be appended to existing data." And for SaveMode.Overwrite: "Overwrite mode means that when saving a DataFrame to a data source, if data/table already exists, existing data is expected to be overwritten by the contents of the DataFrame." Depending on how you want to handle the existing table, one of these two should meet your needs.
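
A minimal sketch of passing the mode explicitly, assuming url and properties are defined as in the question. write_people and its mode check are hypothetical conveniences, not part of Spark; only df.write.jdbc is the real call. On Spark 1.5.x against SQL Server this may still fail for the reason given in the other answer.

```python
# Mode names as listed in the question.
VALID_MODES = {"error", "append", "overwrite", "ignore"}


def write_people(df, url, properties, mode="append"):
    """Hypothetical wrapper: write df to the people table with an explicit save mode."""
    if mode not in VALID_MODES:
        raise ValueError("unknown save mode: %r" % (mode,))
    # "append" keeps existing rows; "overwrite" replaces the table contents.
    df.write.jdbc(
        url=url,
        table="AdventureWorks2012.dbo.people",
        mode=mode,
        properties=properties,
    )
```

Calling write_people(df, url, properties) appends to the existing table; pass mode="overwrite" to replace its contents instead.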

3 Comments

@Holden I am loading data like this: df.write.jdbc(url=url, table="AdventureWorks2012.dbo.people", mode="overwrite", properties=properties). Is there something wrong? It still gives the error com.microsoft.sqlserver.jdbc.SQLServerException: There is already an object named 'people' in the database. I am using Spark 1.5.
In save mode overwrite, Spark will drop the table if it exists. Does the user running the query have permission to drop the table?
@Holden Yes, the user has permission to drop the table. I even tested the permission by dropping the table with the same username that I use to connect.
