I am trying to write a UDF that works on DataFrames in Spark SQL.
Here is the code:
def Timeformat(timecol1: Int) = {
  if (timecol1 >= 1440)
    "%02d:%02d".format((timecol1 - 1440) / 60, (timecol1 - 1440) % 60)
  else
    "%02d:%02d".format(timecol1 / 60, timecol1 % 60)
}
sqlContext.udf.register("Timeformat", Timeformat _)
This method works perfectly when the UDF is called through the SQLContext:
val dataa = sqlContext.sql("""select Timeformat(abc.time_band) from abc""")
Using the DataFrame API, however, gets an error:
val fcstdataa = abc.select(Timeformat(abc("time_band_start")))
This call throws a type mismatch error, because abc("time_band_start") is a Column while the method expects an Int:
<console>:41: error: type mismatch;
found : org.apache.spark.sql.Column
required: Int
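While searching, I came across callUDF in org.apache.spark.sql.functions, which seems to let the DataFrame API call the name registered above. This is only a sketch based on the docs (callUDF exists from Spark 1.5; older releases spell it callUdf), and I am not sure it is the right approach:

import org.apache.spark.sql.functions.callUDF

// Invoke the SQL-registered "Timeformat" UDF from the DataFrame API.
val fcstdataa = abc.select(callUDF("Timeformat", abc("time_band_start")))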
When I rewrote the UDF as below, it works perfectly for the DataFrame but no longer works through the SQLContext. Is there any way to solve this issue without creating multiple UDFs that do the same thing?
import org.apache.spark.sql.functions.udf

val Timeformat = udf((timecol1: Int) =>
  if (timecol1 >= 1440)
    "%02d:%02d".format((timecol1 - 1440) / 60, (timecol1 - 1440) % 60)
  else
    "%02d:%02d".format(timecol1 / 60, timecol1 % 60)
)
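One pattern I have been experimenting with, to avoid duplicating the logic, is to keep a single plain function and derive both forms from it. This is only a sketch and I am not sure it is idiomatic:

// A single plain function holds the conversion logic:
// minutes past midnight -> "HH:mm", folding values >= 1440 back into one day.
def timeformat(timecol1: Int): String =
  if (timecol1 >= 1440)
    "%02d:%02d".format((timecol1 - 1440) / 60, (timecol1 - 1440) % 60)
  else
    "%02d:%02d".format(timecol1 / 60, timecol1 % 60)

// Register it once for SQL queries...
sqlContext.udf.register("Timeformat", timeformat _)
// ...and wrap the same function once for the DataFrame API
// (uses the udf import from above).
val TimeformatUdf = udf(timeformat _)

val dataa = sqlContext.sql("select Timeformat(abc.time_band) from abc")
val fcstdataa = abc.select(TimeformatUdf(abc("time_band_start")))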
I am pretty new to Scala and Spark. What is the difference between the two declarations? Is one method better than the other?
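I also noticed that sqlContext.udf.register itself appears to return a UserDefinedFunction, so perhaps a single registration could serve both APIs. Again, this is only a sketch based on my reading of the API docs, not something I have confirmed is the preferred way:

// register returns a UserDefinedFunction that can be applied to Columns directly.
val TimeformatUdf = sqlContext.udf.register("Timeformat", (timecol1: Int) =>
  if (timecol1 >= 1440)
    "%02d:%02d".format((timecol1 - 1440) / 60, (timecol1 - 1440) % 60)
  else
    "%02d:%02d".format(timecol1 / 60, timecol1 % 60)
)

val fromSql = sqlContext.sql("select Timeformat(abc.time_band) from abc")
val fromDf = abc.select(TimeformatUdf(abc("time_band_start")))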