I tried different ways to get this done, but unable to do so. Please help here. I am trying to explode a nested json list using python pd.json_normalize. After exploding the output is coming in two rows causing duplicate PKeys. My data is below and i am trying to get a single record with "id" as pkey and values in the nested list
{
"data": [
{
"id": "123456",
"created_by": "[email protected]",
"created_at": "2023-12-11",
"modified_by": "[email protected]",
"modified_at": "2023-12-11",
"rg": {
"id": "xyz234",
"parent_id": "cpqrst",
"account_id": "ntb1we"
},
"values": [
{
"value1": 100,
"value2": 101,
"value10": {
"valueid": "btqns"
}
},
{
"value1": 200,
"value2": 201,
"value10": {
"valueid": "apqns"
}
}
],
"page_info": {
"page": 1,
"per_page": 100,
"total_pages": 286,
"total_count": 28580
}
}
]
}
my code as below to normalize and explode
response = requests.get(getURL, params={"page": page}, headers=headers)
if response.status_code == 200:
json_data = response.json()
target_schema = StructType([StructField(.................. StructField('version', LongType(), True)])
normalized_data = pd.json_normalize(json_data,["data"],"page_info"]).explode("values").reset_index(drop=True)
JSONdf = json.dumps(json.loads(normalized_data.to_json(orient="records")))
results.append(JSONdf)
###########################modified code that worked################################
data_df = json_normalize(json_data['data'],sep='_')
data_df.columns = [f"{col}_data" for col in data_df.columns]
df =
json_normalize(json_data['data'],'values',sep='_')
df.columns = [f"{col}_tran" for col in df.columns]
result_df = pd.concat([data_df,df],axis=1)
###############################################
This provides data in the format as below with id value 123456 repeating twice for values
Current output with id 123456 repeating in 2 rows for values.value1 and values.value2
desired output is to have a single row with id 123456 for values column having the entire list
Any help greatly appreciated in getting the desired output. I tried all possible ways but could not get the desired output.

