I have a simple function that takes some XML in a field, parses the values, and returns a list:

<data>
   <datas a="1" b="2" c="3"/>
   <datas a="2" b="3" c="2"/>
</data>

becomes the nested list [[1, 2, 3], [2, 3, 2]].

I've wrapped this function in a UDF, and I call it on my DataFrame like this:

from pyspark.sql.functions import udf

myudf = udf(myparser)
df2 = df1.withColumn("newDataColumn", myudf(df1["xmldatafield"]))

This works, except that newDataColumn has type STRING instead of Array, so I can't use any of the SQL array functions to access or work with individual elements.

I've confirmed in Python that the function returns a list.

Any idea what I'm doing wrong, or how I can get this to be an array column type?

1 Answer

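A friend of mine just pointed out the fix: pass the return data type to udf() as its second argument. When no return type is given, udf() defaults to StringType, which is why the column comes back as a string even though the Python function returns a list.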
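
For anyone landing here, a minimal sketch of what that can look like. The question does not show myparser, so the version below is a hypothetical stand-in built on xml.etree.ElementTree, and add_parsed_column is just an illustrative wrapper name. I'm also assuming the attribute values are integers, which is why the declared type is ArrayType(ArrayType(IntegerType())).

```python
import xml.etree.ElementTree as ET


def myparser(xml_text):
    """Hypothetical stand-in for the question's parser.

    Returns one [a, b, c] list of ints per <datas> element.
    """
    if xml_text is None:
        return None
    root = ET.fromstring(xml_text)
    return [[int(row.get(key)) for key in ("a", "b", "c")]
            for row in root.iter("datas")]


def add_parsed_column(df1):
    """Illustrative wrapper: adds newDataColumn as an array column."""
    # pyspark is imported here only so the parser above can be tried
    # without a Spark installation.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, IntegerType

    # Declare the return type explicitly; udf() defaults to StringType.
    myudf = udf(myparser, ArrayType(ArrayType(IntegerType())))
    return df1.withColumn("newDataColumn", myudf(df1["xmldatafield"]))
```

With this, df2 = add_parsed_column(df1) followed by df2.printSchema() should report newDataColumn as an array of arrays of integers, and the SQL array functions can be used on it. The declared element type needs to match what the Python function actually returns; if the parser returned strings instead of ints, the declared type would have to be StringType for the elements as well.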
