
I have the following JSON objects:

{
    "user_id": "123",
    "data": {
        "city": "New York"
    },
    "timestamp": "1563188698.31",
    "session_id": "6a793439-6535-4162-b333-647a6761636b"
}
{
    "user_id": "123",
    "data": {
        "name": "some_name",
        "age": "23",
        "occupation": "teacher"
    },
    "timestamp": "1563188698.31",
    "session_id": "6a793439-6535-4162-b333-647a6761636b"
}

I'm using val df = sqlContext.read.json("json") to read the file into a DataFrame.

This merges the data attributes from all objects into a single data struct, like so:

root
 |-- data: struct (nullable = true)
 |    |-- age: string (nullable = true)
 |    |-- city: string (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- occupation: string (nullable = true)
 |-- session_id: string (nullable = true)
 |-- timestamp: string (nullable = true)
 |-- user_id: string (nullable = true)

Is it possible to transform the data field to a Map[String, String] type, so that each row keeps only the attributes present in its original JSON object?

  • Hi! Transforming a Spark DataFrame Row into a Scala Map is not a straightforward task. I can help you with it, but you need to give more details about your use case. What do you want to do with the Map objects? What kind of operations do you want to perform on the nested data? Commented Jul 15, 2019 at 19:47
  • Hi @ÁlvaroValencia, I'm looking to generate Parquet files from JSON. I'm using Athena on AWS and need to match the table format to make the data queryable. Thank you Commented Jul 15, 2019 at 20:14

2 Answers


Yes, you can achieve that by extracting a Map[String, String] from the JSON data, as shown next:

import org.apache.spark.sql.types.{MapType, StringType}
import org.apache.spark.sql.functions.{to_json, from_json}
import spark.implicits._ // for .toDS and the $ column syntax (already in scope in spark-shell)

val jsonStr = """{
    "user_id": "123",
    "data": {
        "name": "some_name",
        "age": "23",
        "occupation": "teacher"
    },
    "timestamp": "1563188698.31",
    "session_id": "6a793439-6535-4162-b333-647a6761636b"
}"""

val df = spark.read.json(Seq(jsonStr).toDS)

val mappingSchema = MapType(StringType, StringType)

df.select(from_json(to_json($"data"), mappingSchema).as("map_data")).show(false)

//Output
// +-----------------------------------------------------+
// |map_data                                             |
// +-----------------------------------------------------+
// |[age -> 23, name -> some_name, occupation -> teacher]|
// +-----------------------------------------------------+

First we serialize the data struct into a JSON string with to_json($"data"), then we parse that string and extract the Map with from_json(to_json($"data"), mappingSchema).
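
A follow-up, since the comments on the question mention that the end goal is Parquet files queryable from Athena: as a minimal sketch (the output path and write mode below are my assumptions, not part of this answer), you can swap the struct column for the map column in place and write the result:

import org.apache.spark.sql.types.{MapType, StringType}
import org.apache.spark.sql.functions.{to_json, from_json}

// Replace the struct column with a Map[String, String] column in place,
// keeping all other columns, then write Parquet.
// "output/users" is a hypothetical path.
val mapped = df.withColumn("data",
  from_json(to_json($"data"), MapType(StringType, StringType)))

mapped.write.mode("overwrite").parquet("output/users")

In Athena, the data column can then be declared as map<string,string> in the table DDL.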


2 Comments

This works! I just need to append the column to df and work with one JSON at a time. Thank you
Yes exactly @stepandel

I'm not sure what you mean by converting it to a Map of (String, String), but see if the below can help.

val dataDF = spark.read.option("multiline", "true").json("madhu/user.json").select("data")

dataDF
  .withColumn("age", $"data"("age"))
  .withColumn("city", $"data"("city"))
  .withColumn("name", $"data"("name"))
  .withColumn("occupation", $"data"("occupation"))
  .drop("data")
  .show
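
If you'd rather not hardcode the nested field names, here is a sketch (my addition, not part of the original answer) that reads them from the struct's schema and flattens the struct generically:

import org.apache.spark.sql.types.StructType
import spark.implicits._

// Read the field names from the nested "data" struct's schema,
// then promote each one to a top-level column.
val fields = dataDF.schema("data").dataType.asInstanceOf[StructType].fieldNames

val flatDF = fields
  .foldLeft(dataDF) { (acc, f) => acc.withColumn(f, $"data"(f)) }
  .drop("data")

flatDF.show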

