I am trying to load roughly 20 million records from a Delta table in Databricks into an Azure SQL database using the Apache Spark connector for SQL Server & Azure SQL (the version that supports the Python API and Spark 3.0).
Below is the code I am using. Do you think I am missing something here? The same code executes fine if I use the write format "jdbc" (a rough sketch of that JDBC version is included after the snippet for reference).
df.write \
    .format("com.microsoft.sqlserver.jdbc.spark") \
    .mode("overwrite") \
    .option("truncate", "true") \
    .option("url", url) \
    .option("dbtable", Tablenamewithschema) \
    .option("user", user) \
    .option("password", password) \
    .option("reliabilityLevel", "BEST_EFFORT") \
    .option("tableLock", "True") \
    .option("isolationLevel", "True") \
    .option("batchsize", "100000") \
    .option("schemaCheckEnabled", "false") \
    .save()
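For reference, the plain JDBC version that executes fine looks roughly like this (a sketch only; it reuses the same url, user, password and Tablenamewithschema variables, and the explicit driver option may or may not be needed depending on the cluster setup):

# Plain JDBC write path that works for me (sketch; same variables as above)
df.write \
    .format("jdbc") \
    .mode("overwrite") \
    .option("truncate", "true") \
    .option("url", url) \
    .option("dbtable", Tablenamewithschema) \
    .option("user", user) \
    .option("password", password) \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .option("batchsize", "100000") \
    .save()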
I am getting the below error when using the mentioned connector.
Error while Loading the data into a table -
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 28 in stage 2.0 failed 4 times, most recent failure: Lost task 28.3 in stage 2.0 (TID 54):
com.microsoft.sqlserver.jdbc.SQLServerException: The connection is closed.
I have created a new cluster with the below configuration, and only one library is installed on that cluster, from the Maven coordinate com.microsoft.azure:spark-mssql-connector_2.12:1.2.0:
- 8 worker nodes with 14 GB memory and 4 cores each.
- Databricks Runtime Version: 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12).
Is there something else I can do to improve the performance? For around 100 million records, the old JDBC driver takes around 1 hour.
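To frame the performance part of the question, this is the kind of tuning I have been considering for the connector write once the connection issue is resolved (purely a sketch; the repartition count of 32 is only a guess matching 8 workers x 4 cores, and nothing here is validated):

# Hypothetical tuning sketch: set the write parallelism explicitly before the bulk insert.
# 32 partitions is just a guess based on 8 worker nodes x 4 cores each.
df.repartition(32).write \
    .format("com.microsoft.sqlserver.jdbc.spark") \
    .mode("overwrite") \
    .option("truncate", "true") \
    .option("url", url) \
    .option("dbtable", Tablenamewithschema) \
    .option("user", user) \
    .option("password", password) \
    .option("reliabilityLevel", "BEST_EFFORT") \
    .option("tableLock", "true") \
    .option("batchsize", "100000") \
    .save()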