I want to convert an Array[org.apache.spark.sql.Row] to a DataFrame.
Could anyone suggest a better way?
I first converted it into an RDD and then converted that RDD into a DataFrame, but whenever I perform any operation on the DataFrame, exceptions are thrown.
val arrayOfRows = myDataFrame.collect().map(t => myfun(t))
val distDataRDD = sc.parallelize(arrayOfRows)
val newDataframe = sqlContext.createDataFrame(distDataRDD, myschema)
Here, myfun() is a function that returns a Row (org.apache.spark.sql.Row).
The contents of the array are correct, and I can print them without any problem.
But when I tried to count the records in the RDD, it returned the count along with a warning that one of the stages contains a task of very large size. I guess I am doing something wrong. Please help.
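The actual exception isn't shown, so this is an assumption: one frequent cause of errors that appear only when an operation runs is a mismatch between the values inside each Row and the schema passed to `createDataFrame`. That call is lazy, so the values are checked against the schema only when an action such as `count()` or `show()` converts them. The sketch below assumes Spark 2.x or later (`SparkSession`); the `schema` object and the sample rows are hypothetical stand-ins for `myschema` and the output of `myfun`.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object ArrayOfRowsToDataFrame {
  // Hypothetical stand-in for `myschema`: two columns, id and name.
  val schema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = false),
    StructField("name", StringType, nullable = true)
  ))

  def build(spark: SparkSession): DataFrame = {
    // Hypothetical stand-in for the output of `myfun`. Values must appear in
    // schema order, and each runtime type must match its field's data type
    // (Int for IntegerType, String for StringType). A mismatch, e.g. a Long
    // in the "id" position, only fails once an action runs.
    val arrayOfRows: Array[Row] = Array(Row(1, "a"), Row(2, "b"), Row(3, null))
    val rowRDD = spark.sparkContext.parallelize(arrayOfRows)
    spark.createDataFrame(rowRDD, schema)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration; a real job would reuse its own session.
    val spark = SparkSession.builder()
      .appName("array-of-rows-to-df")
      .master("local[*]")
      .getOrCreate()
    val df = build(spark)
    df.show()
    println(df.count()) // 3
    spark.stop()
  }
}
```

Comparing `df.printSchema()` against what `myfun` really returns (for example, by printing `row.get(i).getClass` for one Row) is a quick way to spot such a mismatch.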
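On the warning: `sc.parallelize` on a collected array stores that data inside the RDD's partitions, so it is serialized and shipped with each task, which is what triggers the "task of very large size" message. If `myfun` only needs one input Row at a time, a likely better way is to skip `collect()` altogether and map over the DataFrame's underlying `RDD[Row]`, so the rows never leave the executors. This is a minimal sketch assuming Spark 2.x or later (`SparkSession`); the body of `myfun`, `mySchema`, and the sample input are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object MapRowsWithoutCollect {
  // Hypothetical stand-in for `myfun`: builds a new Row from an input Row.
  def myfun(r: Row): Row = Row(r.getInt(0) * 10, r.getString(1).toUpperCase)

  // Hypothetical stand-in for `myschema`; it must describe what myfun returns.
  val mySchema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = false),
    StructField("name", StringType, nullable = true)
  ))

  def transform(spark: SparkSession, input: DataFrame): DataFrame = {
    // input.rdd is an RDD[Row]; map runs myfun on the executors, so nothing
    // is collected to the driver or embedded in the serialized tasks.
    val rowRDD = input.rdd.map(myfun)
    spark.createDataFrame(rowRDD, mySchema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("map-rows-without-collect")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical input playing the role of `myDataFrame`.
    val myDataFrame = Seq((1, "a"), (2, "b")).toDF("id", "name")
    val newDataframe = transform(spark, myDataFrame)
    newDataframe.show()
    spark.stop()
  }
}
```

If the logic in `myfun` can be expressed with built-in column expressions (`select`, `withColumn`), staying in the DataFrame API avoids the Row round-trip entirely and lets the optimizer see the transformation.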