
I am just a little confused about how to create a Spark UDF. I currently have a function parse_xml and do the following:

spark.udf.register("parse_xml_udf", parse_xml)
parsed_df = xml_df.withColumn("parsed_xml", parse_xml_udf(xml_df["raw_xml"]))

where xml_df is the original Spark DataFrame and raw_xml is the column I want to apply the function to.

I have seen in a few places a line like spark_udf = udf(parse_xml, StringType()) -- what is the difference between this and the spark.udf.register line? Additionally, if I apply the function to that one column, is it applied to each row? In other words, should my UDF return the output for a single row?

1 Answer

  • Use spark.udf.register("squaredWithPython", squared) when you want to call the function from SQL, like this: %sql select id, squaredWithPython(id) as id_squared from test

  • Use squared_udf = udf(squared, LongType()) when you only need it in the DataFrame API, like this: display(df.select("id", squared_udf("id").alias("id_squared")))

That's all, but these things are not always clearly explained in the manuals.


Comments

So if I want to use it like this: xml_df.withColumn('parsed_xml', parse_xml_udf(xml_df['raw_xml'])), I should create it with udf(__)?
Yes, withColumn will apply to all rows unless you filter them.
You can do both and see which works, and learn from that. udf() should be enough, but it depends on how you call it; Spark has all sorts of quirks.
Yep, from this distance.
Thanks for the help.
