
I would like to load data from a CSV file into MySQL as a batch. However, I could only find tutorials/logic for inserting data from CSV into a Hive database. Could anyone kindly help me achieve this integration in Spark using Scala?

  • What problems have you had doing this? Are you able to make a JDBC connection to mysql? Then you can write("jdbc") on a dataset... Commented Oct 27, 2017 at 5:27
  • Duplicate stackoverflow.com/questions/36169319/… Commented Oct 27, 2017 at 5:29
  • And you can find much documentation... docs.databricks.com/spark/latest/data-sources/… Commented Oct 27, 2017 at 5:30
  • @cricket_007 Now I am able to load the entire CSV as a DataFrame, but I am a bit confused about how to insert that DataFrame into a MySQL database. Commented Oct 27, 2017 at 5:42
  • 1
    As shown twice. You df.write into a new source. jdbc is the format method. Give it your database options Commented Oct 27, 2017 at 5:43
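
A minimal sketch of what these comments describe, assuming df is the DataFrame already loaded from the CSV; the URL, table name, and credentials below are all placeholders:

val df = spark.read.option("header", "true").csv("some/path/to/file.csv")

// Write through the generic "jdbc" format; every option value here is a placeholder
df.write
  .format("jdbc")
  .option("url", "jdbc:mysql://localhost:3306/mydb")
  .option("dbtable", "mytable")
  .option("user", "username")
  .option("password", "password")
  .save()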

2 Answers


There is a reason why those tutorials don't exist: this task is very straightforward. Here is a minimal working example:

import java.util.Properties

// MySQL connection string: replace the bracketed parts with your host(s), port(s), and database
val dbStr = "jdbc:mysql://[host1][:port1][,[host2][:port2]]...[/[database]]"

// Target table and connection credentials (placeholder values)
val tablename = "mytable"
val props = new Properties()
props.setProperty("user", "username")
props.setProperty("password", "password")

spark
  .read
    .format("csv")
    .option("header", "true")   // first line of the CSV holds the column names
    .load("some/path/to/file.csv")
  .write
    .mode("overwrite")          // drop and recreate the table if it already exists
    .jdbc(dbStr, tablename, props)
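
Since the question is about batch loading: the JDBC writer already sends the inserts in batches, and the batch size can be tuned through the same connection properties. A sketch reusing the props object above; 10000 is an arbitrary value:

// Rows per INSERT batch; Spark's default is 1000
props.setProperty("batchsize", "10000")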

Create the DataFrame by reading the CSV with the Spark session, then write it using the jdbc method with MySQL connection properties:

import java.util.Properties

val url = "jdbc:mysql://[host][:port][/[database]]"
val table = "mytable"
val property = new Properties()
property.setProperty("user", "username")      // placeholder credentials
property.setProperty("password", "password")

spark
  .read
    .csv("some/path/to/file.csv")
  .write
    .jdbc(url, table, property)
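
One caveat: without an explicit save mode, the write fails if the table already exists (the default is SaveMode.ErrorIfExists). A sketch reusing url, table, and property from above:

import org.apache.spark.sql.SaveMode

spark
  .read
    .csv("some/path/to/file.csv")
  .write
    .mode(SaveMode.Append)   // add rows to the existing table instead of failing
    .jdbc(url, table, property)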
