I have a JSON source data file like the one below, and I need the results in the quite different format shown under Expected Results. Is there a way to achieve this using Spark Scala? I'd appreciate your help on this.

JSON source data file:

{
  "APP": [
    {
      "E": 1566799999225,
      "V": 44.0
    },
    {
      "E": 1566800002758,
      "V": 61.0
    }
  ],
  "ASP": [
    {
      "E": 1566800009446,
      "V": 23.399999618530273
    }
  ],
  "TT": 0,
  "TVD": [
    {
      "E": 1566799964040,
      "V": 50876515
    }
  ],
  "VIN": "FU74HZ501740XXXXX"
}

Expected Results:

[Expected Results were posted as an image, which is not included here]

JSON Schema:

|-- APP: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- E: long (nullable = true)
|    |    |-- V: double (nullable = true)
|-- ASP: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- E: long (nullable = true)
|    |    |-- V: double (nullable = true)
|-- ATO: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- E: long (nullable = true)
|    |    |-- V: double (nullable = true)
|-- MSG_TYPE: string (nullable = true)
|-- RPM: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- E: long (nullable = true)
|    |    |-- V: double (nullable = true)
|-- TT: long (nullable = true)
|-- TVD: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- E: long (nullable = true)
|    |    |-- V: long (nullable = true)
|-- VIN: string (nullable = true)

2 Answers


You can start by reading your json file:

  val inputDataFrame: DataFrame = sparkSession
    .read
    .option("multiline", true)
    .json(yourJsonPath)

Then you can collect the names of the event columns (APP, ASP, ATO, RPM and TVD in your schema), since they are the only fields in the input whose datatype is an array of structs:

import org.apache.spark.sql.types.ArrayType

// Names of every array-typed column (e.g. APP, ASP, ATO, RPM, TVD); no null slots for other columns
val snColumn: Array[String] = inputDataFrame.schema.fields
  .filter(_.dataType.isInstanceOf[ArrayType])
  .map(_.name)

Then create your empty DataFrame as follows and populate it:

  import org.apache.spark.sql.Row
  import org.apache.spark.sql.types._

  val outputSchema = StructType(
    List(
      StructField("VIN", StringType, true),
      StructField(
        "EVENTS",
        ArrayType(
          StructType(Array(
            StructField("SN", StringType, true),
            // E holds epoch milliseconds (e.g. 1566799999225), which overflow IntegerType
            StructField("E", LongType, true),
            StructField("V", DoubleType, true)
          )))),
      StructField("TT", StringType, true)
    )
  )

  val outputDataFrame = sparkSession.createDataFrame(sparkSession.sparkContext.emptyRDD[Row], outputSchema)

Then you need to create some UDFs to parse your input and do the correct mapping.
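As a hedged sketch of that mapping step, the reshaping can also be done without UDFs, using Spark's built-in `explode`, `struct`, `collect_list` and `sort_array` functions: each array column is exploded into rows tagged with its column name as `SN`, the per-column results are unioned, and the rows are regrouped into one `EVENTS` array per input record, which `toJSON` then serializes. This assumes the target is the `VIN` / `EVENTS(SN, E, V)` / `TT` layout of `outputSchema` above (the Expected Results image is not available, so the exact shape is a guess). `RECORD_ID`, `eventColumns`, `perEvent` and `output` are hypothetical names, `TT` is left as the inferred long, and the snippet is written script-style for spark-shell or a Scala script with Spark on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.ArrayType

// Local session for the sketch; in a real job reuse your existing sparkSession.
val spark = SparkSession.builder().master("local[*]").appName("events-sketch").getOrCreate()
import spark.implicits._

// The sample record from the question, on one line.
val json = """{"APP":[{"E":1566799999225,"V":44.0},{"E":1566800002758,"V":61.0}],"ASP":[{"E":1566800009446,"V":23.399999618530273}],"TT":0,"TVD":[{"E":1566799964040,"V":50876515}],"VIN":"FU74HZ501740XXXXX"}"""

// Hypothetical RECORD_ID column: tags every input record so its events can be regrouped later.
val input: DataFrame = spark.read.json(Seq(json).toDS())
  .withColumn("RECORD_ID", monotonically_increasing_id())

// Array-typed columns present in this input (ATO, RPM, ... are picked up only if they exist).
val eventColumns: Seq[String] =
  input.schema.fields.filter(_.dataType.isInstanceOf[ArrayType]).map(_.name).toSeq

// One row per event, tagged with its source column name as SN.
// V is cast to double because TVD.V is inferred as long while the others are double,
// so every branch of the union has the same struct type.
val perEvent: DataFrame = eventColumns
  .map { c =>
    input
      .select(col("RECORD_ID"), col("VIN"), col("TT"), explode(col(c)).as("ev"))
      .select(
        col("RECORD_ID"), col("VIN"), col("TT"),
        struct(lit(c).as("SN"), col("ev.E").as("E"), col("ev.V").cast("double").as("V")).as("EVENT"))
  }
  .reduce(_ union _)

// Regroup into one row per record; sort_array orders events by SN, then E.
val output: DataFrame = perEvent
  .groupBy("RECORD_ID", "VIN", "TT")
  .agg(sort_array(collect_list("EVENT")).as("EVENTS"))
  .drop("RECORD_ID")

// Each row becomes one JSON document with a nested EVENTS array.
output.toJSON.show(truncate = false)
```

For the real file, replace the inline string with `sparkSession.read.option("multiline", true).json(yourJsonPath)`. `sort_array` is only there to make the event order deterministic; drop it if order does not matter.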

Hope this helps


Here is a solution that parses the JSON into a Spark DataFrame, adapted to your data:

    import org.apache.spark.sql.functions.{col, explode}
    import sparkSession.implicits._

    val input = "{\"APP\":[{\"E\":1566799999225,\"V\":44.0},{\"E\":1566800002758,\"V\":61.0}],\"ASP\":[{\"E\":1566800009446,\"V\":23.399999618530273}],\"TT\":0,\"TVD\":[{\"E\":1566799964040,\"V\":50876515}],\"VIN\":\"FU74HZ501740XXXXX\"}"

    // Read the JSON string, explode each array into one row per element, then flatten the structs.
    val outputDataFrame = sparkSession.read
      .option("multiline", true)
      .option("mode", "PERMISSIVE")
      .json(Seq(input).toDS)
      .withColumn("APP", explode(col("APP")))
      .withColumn("ASP", explode(col("ASP")))
      .withColumn("TVD", explode(col("TVD")))
      .select(
        col("VIN"), col("TT"),
        col("APP").getItem("E").as("APP_E"),
        col("APP").getItem("V").as("APP_V"),
        col("ASP").getItem("E").as("ASP_E"),
        col("ASP").getItem("V").as("ASP_V"),
        col("TVD").getItem("E").as("TVD_E"),
        col("TVD").getItem("V").as("TVD_V")
      )

    outputDataFrame.show(truncate = false)

    /*
+-----------------+---+-------------+-----+-------------+------------------+-------------+--------+
|VIN              |TT |APP_E        |APP_V|ASP_E        |ASP_V             |TVD_E        |TVD_V   |
+-----------------+---+-------------+-----+-------------+------------------+-------------+--------+
|FU74HZ501740XXXXX|0  |1566799999225|44.0 |1566800009446|23.399999618530273|1566799964040|50876515|
|FU74HZ501740XXXXX|0  |1566800002758|61.0 |1566800009446|23.399999618530273|1566799964040|50876515|
+-----------------+---+-------------+-----+-------------+------------------+-------------+--------+
     */
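One caveat with chaining `explode` like this: each call multiplies the row count, so the result is the cross product of the arrays, and each row pairs an event from one array with unrelated events from the others. It only looks harmless on the sample because ASP and TVD hold a single element each. A minimal sketch with a hypothetical record (ASP given a second reading, timestamps shortened for readability) makes the growth visible:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

val spark = SparkSession.builder().master("local[*]").appName("explode-sketch").getOrCreate()
import spark.implicits._

// Hypothetical record: like the question's sample, but ASP has two readings instead of one.
val json = """{"APP":[{"E":1,"V":44.0},{"E":2,"V":61.0}],"ASP":[{"E":3,"V":23.4},{"E":4,"V":24.0}],"TT":0,"TVD":[{"E":5,"V":50876515}],"VIN":"FU74HZ501740XXXXX"}"""
val df = spark.read.json(Seq(json).toDS())

// Each chained explode multiplies the rows: 2 APP x 2 ASP x 1 TVD = 4 rows,
// although the record holds only 5 events in total.
val chained = df
  .withColumn("APP", explode(col("APP")))
  .withColumn("ASP", explode(col("ASP")))
  .withColumn("TVD", explode(col("TVD")))

println(chained.count()) // 4
```

Note also that `explode` emits no rows for a null or empty array, so a record missing, say, `ATO` would disappear entirely if `ATO` were exploded the same way; `explode_outer` keeps such rows with nulls instead.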

1 Comment

Hello @SimbaPK, I don't need the data in a structured format. I need it in JSON format, as shown in the Expected Results. Any help would be much appreciated.
