I am trying to create a DataFrame from an RDD.
First, I create the RDD with the code below:
val account = sc.parallelize(Seq(
  (1, null, 2, "F"),
  (2, 2, 4, "F"),
  (3, 3, 6, "N"),
  (4, null, 8, "F")))
This works fine:
account: org.apache.spark.rdd.RDD[(Int, Any, Int, String)] = ParallelCollectionRDD[0] at parallelize at <console>:27
But when I try to create a DataFrame from the RDD with the code below:
account.toDF("ACCT_ID", "M_CD", "C_CD", "IND")
I get the following error:
java.lang.UnsupportedOperationException: Schema for type Any is not supported
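From what I can tell, the error occurs only when the Seq contains a null value. The RDD's element type above shows the second field inferred as Any, which matches the type named in the error.
Is there any way to include null values in the data?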
(1, null: Integer, 2,"F")
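
The type ascription above points at the cause: mixing a bare null with Int literals makes the compiler infer Any for that column, and Spark has no schema mapping for Any. As far as I can tell, ascribing only the null rows is not enough on its own, because Int and java.lang.Integer still widen to Any when the other rows keep plain Int literals. Below is a minimal sketch of two ways to pin the column type for the whole Seq. It assumes Spark 2.x or later (SparkSession); the app name is a made-up placeholder, and in spark-shell the session and implicits already exist, so the setup lines can be skipped and sc.parallelize used as in the question.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.x+; getOrCreate() reuses a session if one already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("nullable-column-sketch") // hypothetical name, for illustration only
  .getOrCreate()
import spark.implicits._

// Option 1: declare the element type up front with java.lang.Integer,
// which can hold null (Scala's Int cannot). With the expected type given,
// the plain Int literals in the second column are boxed implicitly.
val account = spark.sparkContext.parallelize(Seq[(Int, Integer, Int, String)](
  (1, null, 2, "F"),
  (2, 2, 4, "F"),
  (3, 3, 6, "N"),
  (4, null, 8, "F")))
val df = account.toDF("ACCT_ID", "M_CD", "C_CD", "IND")

// Option 2: model the missing value as Option[Int]; None is stored as null.
val accountOpt = spark.sparkContext.parallelize(Seq[(Int, Option[Int], Int, String)](
  (1, None, 2, "F"),
  (2, Some(2), 4, "F"),
  (3, Some(3), 6, "N"),
  (4, None, 8, "F")))
val dfOpt = accountOpt.toDF("ACCT_ID", "M_CD", "C_CD", "IND")

df.printSchema() // M_CD should be reported as a nullable integer column
dfOpt.show()
```

Either form should give M_CD a nullable integer type. Option[Int] is usually the more idiomatic choice in Scala code, while java.lang.Integer stays closer to the original data literal.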