I am a beginner in Scala and want to learn about UDFs in Spark Scala. I will use the following example to demonstrate my problem. I am running Spark Scala on Databricks.
Let's say I have the following DataFrame:
val someDF = Seq(
(1, "bat"),
(4, "mouse"),
(3, "horse")
).toDF("number", "word")
someDF.show()
+------+-----+
|number| word|
+------+-----+
| 1| bat|
| 4|mouse|
| 3|horse|
+------+-----+
I need to create a function that calculates a new column by applying some operations to the number column.
For example, I created the following function to calculate 25/(number+1), and it worked:
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.sql.functions._
val caldf = udf { (df: Double) => (25/(df+1)) }
someDF.select($"number", $"word", caldf(col("number")) as "newc").show()
+------+-----+----+
|number| word|newc|
+------+-----+----+
| 1| bat|12.5|
| 4|mouse| 5.0|
| 3|horse|6.25|
+------+-----+----+
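The lambda passed to udf is an ordinary Scala function of type Double => Double; Spark only wraps it and feeds it one value per row. As a minimal sketch (the object name UdfBodySketch is hypothetical, not from the original code), the same body can be run on plain Double values without Spark, and it produces the same numbers as the newc column above:

```scala
object UdfBodySketch {
  // Same arithmetic as the working UDF body: 25 / (number + 1), on a plain Double.
  // Int 25 divided by a Double yields a Double, so no integer division happens.
  val calc: Double => Double = (x: Double) => 25 / (x + 1)

  def main(args: Array[String]): Unit = {
    // Inputs mirror the number column (1, 4, 3), widened to Double.
    Seq(1.0, 4.0, 3.0).foreach(x => println(s"$x -> ${calc(x)}"))
  }
}
```

Because the body is plain Scala, anything called inside it must accept a Scala Double, not a Spark Column.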
But when I tried the same thing with the log function, it failed with the following error:
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.sql.functions._
val caldf = udf { (df: Double) => log(25/(df+1)) }
command-3140852555505238:3: error: overloaded method value log with alternatives:
(columnName: String)org.apache.spark.sql.Column <and>
(e: org.apache.spark.sql.Column)org.apache.spark.sql.Column
cannot be applied to (Double)
val caldf = udf { (df: Double) => log(25/(df+1)) }
^
Can anyone help me figure out the reason? Thank you.
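A likely explanation, offered as a sketch rather than a definitive answer: inside the UDF, df is a Scala Double, but the only log in scope is org.apache.spark.sql.functions.log (brought in by the wildcard import), and its one-argument overloads accept a Column or a column name, which is exactly what the error lists. The sketch below assumes a local SparkSession (on Databricks you would reuse the notebook's spark and someDF instead); the object LogUdfSketch and the helpers withUdf and withBuiltin are hypothetical names. It shows two options: calling scala.math.log inside the UDF, or dropping the UDF and using Spark's built-in log on a Column expression.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, log, udf}

object LogUdfSketch {
  // Option 1: inside a UDF the argument is a plain Scala Double, so call
  // scala.math.log (natural log) instead of Spark's Column-based functions.log.
  val caldf = udf { (x: Double) => math.log(25 / (x + 1)) }

  // Hypothetical helper: adds newc using the UDF above.
  def withUdf(df: DataFrame): DataFrame =
    df.select(col("number"), col("word"), caldf(col("number")) as "newc")

  // Hypothetical helper: no UDF; the expression is built from Column
  // arithmetic and Spark's built-in functions.log, which takes a Column.
  def withBuiltin(df: DataFrame): DataFrame =
    df.select(col("number"), col("word"), log(lit(25) / (col("number") + 1)) as "newc")

  def main(args: Array[String]): Unit = {
    // Assumption: a local session; on Databricks, reuse the notebook's spark.
    val spark = SparkSession.builder().master("local[*]").appName("log-udf").getOrCreate()
    import spark.implicits._

    val someDF = Seq((1, "bat"), (4, "mouse"), (3, "horse")).toDF("number", "word")
    withUdf(someDF).show()
    withBuiltin(someDF).show()

    spark.stop()
  }
}
```

When an equivalent built-in Column function exists, the second form is generally preferable: Spark's Catalyst optimizer can inspect built-in expressions, whereas a UDF is opaque to it.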