0

I tried different ways to get this done, but unable to do so. Please help here. I am trying to explode a nested json list using python pd.json_normalize. After exploding the output is coming in two rows causing duplicate PKeys. My data is below and i am trying to get a single record with "id" as pkey and values in the nested list

{
"data": [
    {
        "id": "123456",
        "created_by": "[email protected]",
        "created_at": "2023-12-11",
        "modified_by": "[email protected]",
        "modified_at": "2023-12-11",
        "rg": {
            "id": "xyz234",
            "parent_id": "cpqrst",
            "account_id": "ntb1we"
        },
        "values": [
            {
                "value1": 100,
                "value2": 101,
                "value10": {
                    "valueid": "btqns"
                }
            },
            {
                "value1": 200,
                "value2": 201,
                "value10": {
                    "valueid": "apqns"
                }
            }
        ],
        "page_info": {
            "page": 1,
            "per_page": 100,
            "total_pages": 286,
            "total_count": 28580
        }
    }
]

}

my code as below to normalize and explode

response = requests.get(getURL, params={"page": page}, headers=headers)
    if response.status_code == 200:
        json_data = response.json()
        target_schema  = StructType([StructField(.................. StructField('version', LongType(), True)])

        normalized_data = pd.json_normalize(json_data,["data"],"page_info"]).explode("values").reset_index(drop=True)
        
        JSONdf = json.dumps(json.loads(normalized_data.to_json(orient="records")))
        results.append(JSONdf)

###########################modified code that worked################################
       data_df = json_normalize(json_data['data'],sep='_')
        data_df.columns = [f"{col}_data" for col in data_df.columns]

        df = 
        json_normalize(json_data['data'],'values',sep='_')
        df.columns = [f"{col}_tran" for col in df.columns]
        
        result_df = pd.concat([data_df,df],axis=1)
         ###############################################

This provides data in the format as below with id value 123456 repeating twice for values

Current output with id 123456 repeating in 2 rows for values.value1 and values.value2

current_output

desired output is to have a single row with id 123456 for values column having the entire list

desired_output

Any help greatly appreciated in getting the desired output. I tried all possible ways but could not get the desired output.

10
  • Can you create valid json? Commented Dec 20, 2023 at 6:53
  • Apologize there is an error in the Json, i modified it. The rg ends and values start as a separete list. So values is the list inside the data with a unique id - 123456 Commented Dec 20, 2023 at 15:40
  • Unfortunately json is still invalid Commented Dec 21, 2023 at 6:47
  • Here you go, validated the json.. Its correct now Commented Dec 21, 2023 at 13:55
  • Super, tested my solution and working well Commented Dec 21, 2023 at 13:58

1 Answer 1

0

IIUC use:

out = pd.json_normalize(json_data["data"])

print(out)
       id     created_by  created_at    modified_by modified_at   rg.id  \
0  123456  [email protected]  2023-12-11  [email protected]  2023-12-11  xyz234   

  rg.parent_id rg.account_id  \
0       cpqrst        ntb1we   

                                           rg.values  page_info.page  \
0  [{'value1': 100, 'value2': 101, 'value10': {'v...               1   

   page_info.per_page  page_info.total_pages  page_info.total_count  
0                 100                    286                  28580  

If need separator _:

out = pd.json_normalize(json_data["data"], sep='_')

print(out)
       id     created_by  created_at    modified_by modified_at   rg_id  \
0  123456  [email protected]  2023-12-11  [email protected]  2023-12-11  xyz234   

  rg_parent_id rg_account_id  \
0       cpqrst        ntb1we   

                                           rg_values  page_info_page  \
0  [{'value1': 100, 'value2': 101, 'value10': {'v...               1   

   page_info_per_page  page_info_total_pages  page_info_total_count  
0                 100                    286                  28580  
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.