
I have a DataFrame that I want to convert into a JSON array. Please find the example below.

DataFrame:

+------------+----------------+----------+----------------+------------------+
|        Name|              id|request_id|create_timestamp|deadline_timestamp|
+------------+----------------+----------+----------------+------------------+
|    Freeform|59bbe3ad-f487-44| htvjiwmfe|   1589155200000|     1591272659556|
|         D23|59bbe3ad-f487-44| htvjiwmfe|   1589155200000|     1591272659556|
|      Stores|59bbe3ad-f487-44| htvjiwmfe|   1589155200000|     1591272659556|
|VacationClub|59bbe3ad-f487-44| htvjiwmfe|   1589155200000|     1591272659556|
+------------+----------------+----------+----------------+------------------+

I want the JSON to look like this:


[
   {
      "testname":"xyz",
      "systemResponse":[
         {
            "name":"FGH",
            "id":"59bbe3ad-f487-44",
            "request_id":1590791280,
            "create_timestamp":1590799280
         },
         {
            "name":"FGH",
            "id":"59bbe3ad-f487-44",
            "request_id":1590791280,
            "create_timestamp":1590799280
         }
      ]
   }
]

  • Where is the testname column in the DataFrame? Commented Jun 4, 2020 at 14:55
  • It is not in the DataFrame; it is added on the fly. Commented Jun 4, 2020 at 14:57

2 Answers

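  • Define two beans (case classes): an inner one for a row and a parent one for the wrapper object.
  • Map the first DataFrame to the inner bean and collect the result as an Array.
  • Build the parent bean from testname and that Array (the systemResponse field).

See the inline comments in the code: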

object DataToJsonArray {

  def main(args: Array[String]): Unit = {

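    // Constant.getSparkSess is the answerer's own helper (not shown) that returns a SparkSession;
    // SparkSession.builder().getOrCreate() would serve the same purpose.
    val spark = Constant.getSparkSess

    import spark.implicits._

    // Load your DataFrame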
    val requestDetailArray = List(
      ("Freeform", "59bbe3ad-f487-44", "htvjiwmfe", "1589155200000", "1591272659556"),
      ("D23", "59bbe3ad-f487-44", "htvjiwmfe", "1589155200000", "1591272659556"),
      ("Stores", "59bbe3ad-f487-44", "htvjiwmfe", "1589155200000", "1591272659556"),
      ("VacationClub", "59bbe3ad-f487-44", "htvjiwmfe", "1589155200000", "1591272659556")
    ).toDF
      //Map your Dataframe to RequestDetails bean
      .map(row => RequestDetails(row.getString(0), row.getString(1), row.getString(2), row.getString(3), row.getString(4)))
      //Collect it as Array
      .collect() 

    // Create another DataFrame from List[BaseClass], setting (testname, Array[RequestDetails])
    List(BaseClass("xyz", requestDetailArray)).toDF()
      .write
      //Output your Dataframe as JSON
      .json("/json/output/path")
  }

}

case class RequestDetails(Name: String, id: String, request_id: String, create_timestamp: String, deadline_timestamp: String)

case class BaseClass(testname: String = "xyz", systemResponse: Array[RequestDetails])

Check the code below.

import org.apache.spark.sql.functions._

df.withColumn("systemResponse",
     array(
           struct("id","request_id","create_timestamp","deadline_timestamp").as("data")
         )
)
.select("systemResponse")
.toJSON
.select(col("value").as("json_data"))
.show(false)

+-----------------------------------------------------------------------------------------------------------------------------------------------+
|json_data                                                                                                                                      |
+-----------------------------------------------------------------------------------------------------------------------------------------------+
|{"systemResponse":[{"id":"59bbe3ad-f487-44","request_id":"htvjiwmfe","create_timestamp":"1589155200000","deadline_timestamp":"1591272659556"}]}|
|{"systemResponse":[{"id":"59bbe3ad-f487-44","request_id":"htvjiwmfe","create_timestamp":"1589155200000","deadline_timestamp":"1591272659556"}]}|
|{"systemResponse":[{"id":"59bbe3ad-f487-44","request_id":"htvjiwmfe","create_timestamp":"1589155200000","deadline_timestamp":"1591272659556"}]}|
|{"systemResponse":[{"id":"59bbe3ad-f487-44","request_id":"htvjiwmfe","create_timestamp":"1589155200000","deadline_timestamp":"1591272659556"}]}|
+-----------------------------------------------------------------------------------------------------------------------------------------------+

Update, adding testname alongside systemResponse:

scala> :paste
// Entering paste mode (ctrl-D to finish)

df.withColumn("systemResponse",
     array(
           struct("id","request_id","create_timestamp","deadline_timestamp").as("data")
         )
)
.withColumn("testname",lit("xyz"))
.select("testname","systemResponse")
.toJSON
.select(col("value").as("json_data"))
.show(false)

// Exiting paste mode, now interpreting.

+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
|json_data                                                                                                                                                       |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{"testname":"xyz","systemResponse":[{"id":"59bbe3ad-f487-44","request_id":"htvjiwmfe","create_timestamp":"1589155200000","deadline_timestamp":"1591272659556"}]}|
|{"testname":"xyz","systemResponse":[{"id":"59bbe3ad-f487-44","request_id":"htvjiwmfe","create_timestamp":"1589155200000","deadline_timestamp":"1591272659556"}]}|
|{"testname":"xyz","systemResponse":[{"id":"59bbe3ad-f487-44","request_id":"htvjiwmfe","create_timestamp":"1589155200000","deadline_timestamp":"1591272659556"}]}|
|{"testname":"xyz","systemResponse":[{"id":"59bbe3ad-f487-44","request_id":"htvjiwmfe","create_timestamp":"1589155200000","deadline_timestamp":"1591272659556"}]}|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
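
Note that the output above contains one JSON object per input row, each with a single-element systemResponse, whereas the question asks for one object whose systemResponse holds every row. A minimal sketch of one way to get that shape without collecting to the driver first is to aggregate with collect_list and serialize with to_json. The object name SingleJsonArray, the helper toJsonArray and the local[*] master are hypothetical choices for illustration, and the sketch assumes a Spark version whose to_json accepts an array of structs.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, collect_list, lit, struct, to_json}

object SingleJsonArray {

  // Hypothetical helper (not part of the answers above): folds all rows of `df`
  // into one systemResponse array and wraps the result in a JSON array.
  def toJsonArray(df: DataFrame, testname: String): String =
    df
      // One output row whose systemResponse holds one struct per input row
      .agg(
        collect_list(
          struct(col("Name").as("name"), col("id"), col("request_id"), col("create_timestamp"))
        ).as("systemResponse"))
      // Serialize [ { testname, systemResponse } ] as a single JSON string
      .select(
        to_json(array(struct(lit(testname).as("testname"), col("systemResponse")))).as("json_data"))
      .head()
      .getString(0)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for illustration only
    val spark = SparkSession.builder().master("local[*]").appName("SingleJsonArray").getOrCreate()
    import spark.implicits._

    val df = List(
      ("Freeform", "59bbe3ad-f487-44", "htvjiwmfe", "1589155200000", "1591272659556"),
      ("D23", "59bbe3ad-f487-44", "htvjiwmfe", "1589155200000", "1591272659556")
    ).toDF("Name", "id", "request_id", "create_timestamp", "deadline_timestamp")

    println(toJsonArray(df, "xyz"))
    spark.stop()
  }
}
```

collect_list does not guarantee element order on a distributed DataFrame, so sort beforehand or carry an explicit ordering column if the order of systemResponse matters. The values stay strings here because the columns are strings; casting a column (for example col("create_timestamp").cast("long")) before building the struct should yield JSON numbers instead.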

4 Comments

How can I add an extra column parallel to systemResponse?
Do you want to add "testname":"xyz"?
I have updated the code, please check.
Perfect, thanks for your help.
