I would like to load data from a CSV file into MySQL as a batch job. I can find tutorials and logic for inserting data from CSV into a Hive database, but not into MySQL. Could anyone kindly help me achieve this integration in Spark using Scala?
2 Answers
There is a reason why those tutorials don't exist: the task is very straightforward. Here is a minimal working example:
import java.util.Properties

val dbStr = "jdbc:mysql://[host1][:port1][,[host2][:port2]]...[/[database]]"
val tablename = "mytable"
val props = new Properties()
props.setProperty("user", "username")
props.setProperty("password", "password")

spark
  .read
  .format("csv")
  .option("header", "true")
  .load("some/path/to/file.csv")
  .write
  .mode("overwrite")
  .jdbc(dbStr, tablename, props)
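Since the question asks specifically about batch loading: Spark's JDBC writer already batches inserts (controlled by the `batchsize` write option, default 1000), and on the MySQL side you can append `rewriteBatchedStatements=true` to the JDBC URL so the Connector/J driver coalesces each batch into multi-row INSERT statements. A minimal sketch of the connection setup, where the host, database, credentials, and batch size are all placeholders you would replace:

```scala
import java.util.Properties

// Placeholder host/database -- replace with your own.
// rewriteBatchedStatements=true lets the MySQL driver rewrite
// each batch into multi-row INSERT statements.
val dbStr = "jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true"

val props = new Properties()
props.setProperty("user", "myuser")         // placeholder
props.setProperty("password", "mypassword") // placeholder
// Spark JDBC writer option: rows per INSERT batch (default 1000).
props.setProperty("batchsize", "10000")

// Then write as in the answer above:
// df.write.mode("overwrite").jdbc(dbStr, "mytable", props)
```

You will also need the MySQL JDBC driver (mysql-connector-java) on the classpath, e.g. via `--jars` or `--packages` when submitting the job.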
Comments
Create the DataFrame by reading the CSV with the Spark session, then write it with the jdbc method, passing the MySQL connection properties:
import java.util.Properties

val url = "jdbc:mysql://[host][:port][/[database]]"
val table = "mytable"
val property = new Properties()
property.setProperty("user", "username")
property.setProperty("password", "password")

spark
  .read
  .option("header", "true")
  .csv("some/path/to/file.csv")
  .write
  .jdbc(url, table, property)
Call write on a Dataset (df.write) to write it into a new source; jdbc is the format method. Give it your database options.
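The comment above is pointing at the equivalent format-based API: instead of the jdbc(...) shorthand, you can call df.write.format("jdbc") and supply the same settings as options. A sketch of that option map, where the keys are Spark's documented JDBC data source options and the values are placeholders:

```scala
// The same settings expressed as options for df.write.format("jdbc").
// All values below are placeholders -- replace with your own.
val jdbcOptions = Map(
  "url"      -> "jdbc:mysql://localhost:3306/mydb",
  "dbtable"  -> "mytable",
  "user"     -> "myuser",
  "password" -> "mypassword"
)

// Then: df.write.format("jdbc").options(jdbcOptions).mode("overwrite").save()
```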