I'm using Azure Databricks and pyspark to process data using dataframes, and I use Azure SQL Database to store the data after it's been processed. I created the output tables with ordinary CREATE TABLE scripts in SQL, but I realized that the dataframe write method overwrites the table definition; for example, all the string columns become nvarchar(max). Is there any way to keep the column types as specified in the CREATE TABLE script?

Example of my write statement in pyspark:

(df.write
  .mode("overwrite")
  .format("jdbc")
  .option("url", f"jdbc:sqlserver://myserver.database.windows.net;databaseName=mydatabase;")
  .option("dbtable", "mytable")
  .option("user", jdbcUsername)
  .option("password", jdbcPassword)
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .save())
  • Change the mode to append instead of overwrite. See here: Save modes Commented May 24, 2020 at 8:22
  • Thanks, but I want to replace the data already in the table. If I'm using append, is there some way to truncate the table first? Commented May 24, 2020 at 10:18
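For the truncate-first idea raised in the second comment, here is a minimal sketch, assuming the Spark JDBC writer's truncate option: with mode("overwrite") plus truncate set to true, Spark issues TRUNCATE TABLE instead of dropping and recreating the table, so the column types from the CREATE TABLE script are kept. The connection details are the placeholders from the question:

(df.write
  .mode("overwrite")
  .format("jdbc")
  .option("url", "jdbc:sqlserver://myserver.database.windows.net;databaseName=mydatabase;")
  .option("dbtable", "mytable")
  .option("user", jdbcUsername)
  .option("password", jdbcPassword)
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .option("truncate", "true")  # truncate the existing table instead of dropping it
  .save())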

1 Answer


You can tell Spark which column types to use when it creates the table with the createTableColumnTypes write option (customSchema only applies to JDBC reads).

Below is an example:

(df.write
  .mode("overwrite")
  .format("jdbc")
  .option("url", f"jdbc:sqlserver://myserver.database.windows.net;databaseName=mydatabase;")
  .option("dbtable", "mytable")
  .option("user", jdbcUsername)
  .option("password", jdbcPassword)
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .option("createTableColumnTypes", "your_str_col_here VARCHAR(255)")
  .save())
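
Note that createTableColumnTypes only takes effect when Spark itself creates the table (which overwrite does by default, dropping and recreating it), and the types have to be valid Spark SQL types such as VARCHAR(255). If the goal is to keep the table exactly as defined by the original CREATE TABLE script, the truncate sketch under the comments above avoids recreating the table altogether.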
