How to write into a Microsoft SQL Server table even if the table exists using PySpark

Question

I have PySpark code that writes to a SQL Server database like this:

 df.write.jdbc(url=url, table="AdventureWorks2012.dbo.people", properties=properties)

However, I want to keep writing to the people table even if it already exists. The Spark documentation lists four possible values for mode: error, append, overwrite and ignore. All of them throw an "object already exists" error if the table already exists in the database.

Spark throws the following error: py4j.protocol.Py4JJavaError: An error occurred while calling o43.jdbc. com.microsoft.sqlserver.jdbc.SQLServerException: There is already an object named 'people' in the database

Is there a way to write data into the table even if the table already exists? Please let me know if you need more explanation.

RihardsPo · Accepted Answer · 2016-02-10 10:00:47Z

1

For me the issue was with Spark 1.5.2. It checks whether the table exists (here) by running SELECT 1 FROM $table LIMIT 1. If the query fails, Spark assumes the table doesn't exist. That query failed even when the table was there.

This was changed to SELECT * FROM $table WHERE 1=0 in 1.6.0 (here).
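A small sketch of the difference between the two probes, for illustration only. The helper name existence_probe is hypothetical and is not part of Spark; it just reproduces the two query strings quoted above. A likely reason the older probe fails on SQL Server is that T-SQL has no LIMIT clause (it uses TOP instead), so the statement is rejected whether or not the table exists. Spark then concludes the table is missing and tries to create it, which would match the "already an object named 'people'" error.

```python
def existence_probe(table, spark_version):
    """Return the probe query text described in this answer.

    Hypothetical helper for illustration; Spark builds this query
    internally and does not expose a function with this name.
    """
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    if (major, minor) < (1, 6):
        # LIMIT is not valid T-SQL, so SQL Server rejects this statement
        # even when the table exists.
        return "SELECT 1 FROM {} LIMIT 1".format(table)
    # Portable form: returns zero rows and fails only if the table is missing.
    return "SELECT * FROM {} WHERE 1=0".format(table)


if __name__ == "__main__":
    for version in ("1.5.2", "1.6.0"):
        print(version, "->", existence_probe("AdventureWorks2012.dbo.people", version))
```

If this is the cause, upgrading to Spark 1.6.0 or later should let the existing save modes behave as documented.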



Holden · Accepted Answer · 2015-10-12 18:21:06Z

0

The append and overwrite modes will not throw an error when the table already exists. From the Spark documentation ( http://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes ), for SaveMode.Append: "When saving a DataFrame to a data source, if data/table already exists, contents of the DataFrame are expected to be appended to existing data." And for SaveMode.Overwrite: "Overwrite mode means that when saving a DataFrame to a data source, if data/table already exists, existing data is expected to be overwritten by the contents of the DataFrame." Depending on how you want to handle the existing table, one of these two should meet your needs.
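A minimal sketch of passing the save mode explicitly, assuming df is an existing DataFrame and that url and properties are the same values used in the question. The wrapper name write_people and the VALID_MODES set are hypothetical conveniences; the only Spark call is df.write.jdbc with its mode argument.

```python
# The four save modes named in the question.
VALID_MODES = {"error", "append", "overwrite", "ignore"}


def write_people(df, url, properties, mode="append"):
    """Write df to the people table over JDBC with an explicit save mode.

    Hypothetical wrapper: df, url and properties are supplied by the caller.
    """
    if mode not in VALID_MODES:
        raise ValueError("unsupported save mode: {!r}".format(mode))
    # "append" adds rows to an existing table; "overwrite" replaces its contents.
    df.write.jdbc(
        url=url,
        table="AdventureWorks2012.dbo.people",
        mode=mode,
        properties=properties,
    )
```

Calling write_people(df, url, properties) appends to the existing table, and write_people(df, url, properties, mode="overwrite") replaces it.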



3 Comments

van Over a year ago

@Holden I am loading data like this: df.write.jdbc(url=url, table="AdventureWorks2012.dbo.people", mode="overwrite", properties=properties). Is there something wrong? It still gives the error com.microsoft.sqlserver.jdbc.SQLServerException: There is already an object named 'people' in the database. I am using Spark 1.5.

Holden Over a year ago

In overwrite save mode Spark will drop the table if it exists. Does the user running the query have permission to drop the table?

van Over a year ago

@Holden Yes, the user has permission to drop the table. I tested the permission by dropping the table with the same username I use to connect.
