
I ran into a problem using Spark Datasets! I keep getting an exception about encoders when I try to use a case class. The code is a simple one, below:

case class OrderDataType (orderId: String, customerId: String, orderDate: String)
import spark.implicits._

val ds = spark.read.option("header", "true").csv("data\\orders.csv").as[OrderDataType]

I get this exception at compile time:

Unable to find encoder for type OrderDataType. An implicit Encoder[OrderDataType] is needed to store OrderDataType instances in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.

I have already added import spark.implicits._, but it doesn't solve the problem!

According to the Spark and Scala documentation, the encoder should be resolved implicitly by Scala!

What is wrong with this code, and what should I do to fix it?


2 Answers


Define your case class outside of the main method; then, inside main, read the CSV file and convert it to a Dataset.

Example:

case class OrderDataType(orderId: String, customerId: String, orderDate: String)

def main(args: Array[String]): Unit = {
  val ds = spark.read.option("header", "true").csv("data\\orders.csv").as[OrderDataType]
}

// or read the rows as tuples instead of a case class:

def main(args: Array[String]): Unit = {
  val ds = spark.read.option("header", "true").csv("data\\orders.csv").as[(String, String, String)]
}

2 Comments

Thanks! Why couldn't I use the case class inside a method?
I faced this too and don't totally understand it; as far as I understand, spark.implicits._ derives encoders via Scala reflection (a TypeTag), which is not available for a class defined inside a method. Here are some useful links: stackoverflow.com/questions/36648128/… jaceklaskowski.gitbooks.io/mastering-spark-sql/…

Another way is to put everything inside object Orders extends App. The body of an App object is object-level code, so the case class is not local to a method and the encoder can be derived even though there is no def main.

mydata/Orders.csv

orderId,customerId,orderDate
1,2,21/08/1977
1,2,21/08/1978

Example code:

package examples

import org.apache.log4j.Level
import org.apache.spark.sql._

object Orders extends App {
  // Reduce Spark's console noise
  val logger = org.apache.log4j.Logger.getLogger("org")
  logger.setLevel(Level.WARN)

  val spark = SparkSession.builder.appName(getClass.getName)
    .master("local[*]").getOrCreate

  // Object-level (not method-local), so an Encoder can be derived for it
  case class OrderDataType(orderId: String, customerId: String, orderDate: String)

  import spark.implicits._

  val ds1 = spark.read.option("header", "true").csv("mydata/Orders.csv").as[OrderDataType]
  ds1.show
}

Result:

+-------+----------+----------+
|orderId|customerId| orderDate|
+-------+----------+----------+
|      1|         2|21/08/1977|
|      1|         2|21/08/1978|
+-------+----------+----------+

Why does the case class have to live outside of def main?

This seems to be by design of Encoder, judging from its @implicitNotFound annotation, quoted below.
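The annotation on the org.apache.spark.sql.Encoder trait in the Spark source reads roughly as follows (exact wording may vary slightly between Spark versions):

import scala.annotation.implicitNotFound

@implicitNotFound("Unable to find encoder for type ${T}. An implicit Encoder[${T}] is needed to " +
  "store ${T} instances in a Dataset. Primitive types (Int, String, etc) and Product types (case " +
  "classes) are supported by importing spark.implicits._  Support for serializing other types " +
  "will be added in future releases.")
trait Encoder[T] extends Serializable

This is exactly the message from the question: at the .as[OrderDataType] call, the compiler searched for an implicit Encoder[OrderDataType] and could not find or derive one.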

