
I ran into a problem using Spark Datasets! I keep getting an exception about encoders when I try to use a case class. The code is a simple one, below:

case class OrderDataType (orderId: String, customerId: String, orderDate: String)
import spark.implicits._

val ds = spark.read.option("header", "true").csv("data\\orders.csv").as[OrderDataType]

I get this exception at compile time:

Unable to find encoder for type OrderDataType. An implicit Encoder[OrderDataType] is needed to store OrderDataType instances in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.

I have already added import spark.implicits._, but it doesn't solve the problem!

According to the Spark and Scala documentation, the encoder should be resolved implicitly by Scala!

What is wrong with this code, and what should I do to fix it?


2 Answers


Define your case class outside of the main method; then, inside main, read the CSV file and convert it to a Dataset.

Example:

case class OrderDataType(orderId: String, customerId: String, orderDate: String)

def main(args: Array[String]): Unit = {
  val ds = spark.read.option("header", "true").csv("data\\orders.csv").as[OrderDataType]
}

// or read the rows as tuples instead of a case class:

def main(args: Array[String]): Unit = {
  val ds = spark.read.option("header", "true").csv("data\\orders.csv").as[(String, String, String)]
}

2 Comments

Thanks! Why couldn't I use the case class inside a method?
I faced this too and don't totally understand it; as far as I understand, spark.implicits._ derives encoders via Scala reflection (a TypeTag), which is not available for a class defined inside a method. Here are some useful links: stackoverflow.com/questions/36648128/… jaceklaskowski.gitbooks.io/mastering-spark-sql/…

Another way is to put everything inside object Orders extends App. The body of an App object is object-level code, so the case class is not local to a method and the encoder can be derived even though there is no def main.

mydata/Orders.csv

orderId,customerId,orderDate
1,2,21/08/1977
1,2,21/08/1978

Example code:

package examples

import org.apache.log4j.Level
import org.apache.spark.sql._

object Orders extends App {
  // Reduce Spark's console noise
  val logger = org.apache.log4j.Logger.getLogger("org")
  logger.setLevel(Level.WARN)

  val spark = SparkSession.builder.appName(getClass.getName)
    .master("local[*]").getOrCreate

  // Object-level (not method-local), so an Encoder can be derived for it
  case class OrderDataType(orderId: String, customerId: String, orderDate: String)

  import spark.implicits._

  val ds1 = spark.read.option("header", "true").csv("mydata/Orders.csv").as[OrderDataType]
  ds1.show
}

Result:

+-------+----------+----------+
|orderId|customerId| orderDate|
+-------+----------+----------+
|      1|         2|21/08/1977|
|      1|         2|21/08/1978|
+-------+----------+----------+

Why does the case class have to live outside of def main?

This seems to be by design of Encoder, judging from its @implicitNotFound annotation, quoted below.
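The annotation on the org.apache.spark.sql.Encoder trait in the Spark source reads roughly as follows (exact wording may vary slightly between Spark versions):

import scala.annotation.implicitNotFound

@implicitNotFound("Unable to find encoder for type ${T}. An implicit Encoder[${T}] is needed to " +
  "store ${T} instances in a Dataset. Primitive types (Int, String, etc) and Product types (case " +
  "classes) are supported by importing spark.implicits._  Support for serializing other types " +
  "will be added in future releases.")
trait Encoder[T] extends Serializable

This is exactly the message from the question: at the .as[OrderDataType] call, the compiler searched for an implicit Encoder[OrderDataType] and could not find or derive one.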

