2

I'm writing a UDF in spark SQL and I'm wondering whether there is a place I can read documentation about exactly what is versus what isn't possible in this regard? Or a tutorial? I'm using SQLContext, not HiveContext.

The examples I've seen typically involve passing in a string, transforming it, and then outputting some transformed string of other object which I've managed to do successfully. But what if one wanted to pass in an input that was really some kind of Spark SQL Row object, i.e., or a List of Row objects, each of which have fields with key-value pairs, etc. In my case I'm passing in a List of Row objects by telling the UDF the input is List[Map[String, Any]]. I think the issue is partly that its really some kind of GenericRowWithSchema object not a List or Array.

Also, I noticed the LATERAL VIEW with explode option. I think this would in theory work for my case, but it didn't work for me. I think it may be because I am not using a HiveContext but I can't change that.

2
  • I have never used HiveContext, and I have no issues using explode to create a LATERAL VIEW. Can you post the actual code you are trying? Commented May 28, 2015 at 17:40
  • the question is what are you trying to achieve. Your question is little vague to understand (and answer) Commented May 29, 2015 at 1:26

1 Answer 1

6

What I got from question is first you want to read a row in UDF

Define the UDF

def compare(r:Row) = {r.get(0)==r.get(1)} 

Register the UDF

sqlContext.udf.register("compare", compare _)

Create the dataFrame

val TestDoc = sqlContext.createDataFrame(Seq(("sachin", "sachin"), ("aggarwal", "aggarwal1"))).toDF("text", "text2")

Use the UDF

scala> TestDoc.select($"text", $"text2",callUdf("compare",struct($"text",$"text2")).as("comparedOutput")).show

Result:

+--------+---------+--------------+
|    text|    text2|comparedOutput|
+--------+---------+--------------+
|  sachin|   sachin|          true|
|aggarwal|aggarwal1|         false|
+--------+---------+--------------+

Second question is about LATERAL VIEW with explode option, better to use HiveContext

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.