
I would like to understand why, when working with Apache Spark, we don't explicitly close JDBC connections.

See: https://learn.microsoft.com/en-us/azure/sql-database/sql-database-spark-connector or https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html

Is this due to the fact that when we do

val collection = sqlContext.read.sqlDB(config)

or

jdbcDF.write
  .format("jdbc")
  (...)
  .save()

we don't really open the connection but merely specify a DAG stage? And then, under the hood, Spark establishes the connection and closes it?
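
To make concrete what I mean, here is a minimal sketch of a plain Spark JDBC read where no java.sql.Connection ever appears in my code (the URL, table name and credentials are just placeholders):

import org.apache.spark.sql.SparkSession

object JdbcLazinessSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("jdbc-laziness-sketch")
      .master("local[*]")
      .getOrCreate()

    // Building the DataFrame only describes the source; no rows are fetched yet
    // (Spark may briefly connect here just to resolve the schema).
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/mydb") // placeholder URL
      .option("dbtable", "public.orders")                  // placeholder table
      .option("user", "spark")
      .option("password", "secret")
      .load()

    // Only when an action runs do the executor tasks open connections,
    // read their partitions, and close the connections again.
    println(df.count())

    spark.stop()
  }
}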

1 Answer


That's correct: Spark takes care of opening and closing JDBC connections to relational data sources during the plan execution phase. This allows it to maintain the level of abstraction required to support a multitude of data source types. You can check the source code of JdbcRelationProvider (for reads) or JdbcUtils (for saves) to review that logic.
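
If you want to see the shape of that logic in user-level code, the sketch below mimics the per-partition pattern: each task opens its own connection, writes its rows as a batch, and closes the connection in a finally block. This is not the actual Spark source, and the table, columns and connection details are hypothetical.

import java.sql.DriverManager
import org.apache.spark.sql.DataFrame

object ManualJdbcWriteSketch {
  // Rough shape of a manual per-partition JDBC write: open, write, close.
  def write(df: DataFrame, url: String, user: String, password: String): Unit = {
    df.rdd.foreachPartition { rows =>
      val conn = DriverManager.getConnection(url, user, password)
      try {
        conn.setAutoCommit(false)
        val stmt = conn.prepareStatement("INSERT INTO orders (id, amount) VALUES (?, ?)")
        try {
          rows.foreach { row =>
            stmt.setLong(1, row.getLong(0))
            stmt.setDouble(2, row.getDouble(1))
            stmt.addBatch()
          }
          stmt.executeBatch()
          conn.commit()
        } finally {
          stmt.close()
        }
      } finally {
        conn.close() // the connection's lifetime is the task, not your driver program
      }
    }
  }
}

That open/write/close-in-finally lifecycle is what jdbcDF.write.format("jdbc").save() hides from you, which is why there is nothing for your own code to close.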
