
My UDF returns a JSON object array as a string. How can I expand that array into DataFrame rows?

If that isn't possible, is there another way (for example, using a struct) to achieve this?

Here is my JSON data:

sample json
{
"items":[ {"Name":"test", Id:"1"}, {"Name":"sample", Id:"2"}]
}

And here is how I want it to end up:

test, 1
sample, 2
  • Wait, so you want it to output the JSON in dataframe rows right? So are you outputting it on an HTML website where the data is handled by JavaScript/jQuery or where do you want to output it? Commented Nov 5, 2019 at 23:59

1 Answer


The idea is that Spark can read any parallelized collection, so we take the string, wrap it in a single-element Dataset, and let Spark parse it as JSON.

Code =>

import org.apache.spark.sql.functions._
import spark.implicits._ // needed for .toDS; pre-imported inside the spark-shell

val sampleJsonStr = """
{
"items":[ {"Name":"test", "Id":"1"}, {"Name":"sample", "Id":"2"}]
}"""

// Wrap the string in a Dataset[String] and let Spark infer the schema
val jsonDf = spark.read.option("multiLine", "true").json(Seq(sampleJsonStr).toDS)
// jsonDf: org.apache.spark.sql.DataFrame = [items: array<struct<Id:string,Name:string>>]

// Finally we explode the json array into one row per element
val explodedDf = jsonDf.
  select("items").
  withColumn("exploded_items", explode(col("items"))).
  select(col("exploded_items.Id"), col("exploded_items.Name"))

Output =>

scala> explodedDf.show(false)
+---+------+
|Id |Name  |
+---+------+
|1  |test  |
|2  |sample|
+---+------+
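Since the question mentions a UDF that already produces the JSON string as a DataFrame column, here is a minimal sketch of an alternative that parses the column in place with from_json and a hand-written StructType schema, then explodes the array. The column name json_str and the sample DataFrame are assumptions for illustration, not part of the original answer.

import org.apache.spark.sql.functions.{col, explode, from_json}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}
import spark.implicits._ // pre-imported inside the spark-shell

// Hypothetical starting point: a string column holding the UDF output
val rawDf = Seq(
  """{"items":[{"Name":"test","Id":"1"},{"Name":"sample","Id":"2"}]}"""
).toDF("json_str")

// Declare the expected structure up front instead of letting Spark infer it
val schema = StructType(Seq(
  StructField("items", ArrayType(StructType(Seq(
    StructField("Name", StringType),
    StructField("Id", StringType)
  ))))
))

val parsedDf = rawDf.
  withColumn("parsed", from_json(col("json_str"), schema)).
  withColumn("item", explode(col("parsed.items"))).
  select(col("item.Id"), col("item.Name"))

parsedDf.show(false)

This avoids the round trip through spark.read and keeps the parsing as a column expression, which matters if the JSON strings live alongside other columns you want to keep.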

