Converting CSV to Hierarchical JSON output

Question

I am trying to convert a CSV file into a hierarchical JSON file. The CSV input is shown below; it contains two columns, gene and disease.

gene,disease
A1BG,Adenocarcinoma
A1BG,apnea
A1BG,Athritis
A2M,Asthma
A2M,Astrocytoma
A2M,Diabetes
NAT1,polyps
NAT1,lymphoma
NAT1,neoplasms

The expected output should be in the following format:

{
     "name": "A1BG",
     "children": [
      {"name": "Adenocarcinoma"},
      {"name": "apnea"},
      {"name": "Athritis"}
      ]
    },

{
     "name": "A2M",
     "children": [
      {"name": "Asthma"},
      {"name": "Astrocytoma"},
      {"name": "Diabetes"}
      ]
    },


{
     "name": "NAT1",
     "children": [
      {"name": "polyps"},
      {"name": "lymphoma"},
      {"name": "neoplasms"}
      ]
    }

The Python code I have written is below. Please let me know what I need to change to get the desired output.

import json
finalList = []
finalDict = {}
grouped = df.groupby(['gene'])

for key, value in grouped:

    dictionary = {}
    dictList = []
    anotherDict = {}

    j = grouped.get_group(key).reset_index(drop=True)
    dictionary['name'] = j.at[0, 'gene']

    for i in j.index:    
        anotherDict['disease'] = j.at[i, 'disease']
        dictList.append(anotherDict)

    dictionary['children'] = dictList
    finalList.append(dictionary)

with open('outputresult3.json', "w") as out:
    json.dump(finalList,out)

Cameron Riddell · Accepted Answer · 2020-09-15 06:21:56Z

import json

json_data = []

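# group the data by each unique gene; pass the column name as a plain string,
# because groupby(["gene"]) yields 1-tuple keys such as ("A1BG",) in pandas 2.x
for gene, data in df.groupby("gene"):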

    # obtain a list of diseases for the current gene
    diseases = data["disease"].tolist()

    # create a new list of dictionaries to satisfy json requirements
    children = [{"name": disease} for disease in diseases]
    
    entry = {"name": gene, "children": children}
    json_data.append(entry)
    
with open('outputresult3.json', "w") as out:
    json.dump(json_data, out)
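
As for why the code in the question does not produce the desired output: it appears to create anotherDict once per gene and then append that same object on every pass of the inner loop, so every entry in the list ends up showing the last disease. It also uses the key 'disease' where the expected output uses 'name'. A minimal sketch of the aliasing effect, using plain lists rather than the DataFrame (the variable names here are illustrative, not from the question):

```python
diseases = ["Asthma", "Astrocytoma", "Diabetes"]

# One dict reused across iterations: every list slot refers to the same object,
# so each assignment overwrites what the earlier slots appear to hold.
shared = {}
aliased = []
for disease in diseases:
    shared["name"] = disease
    aliased.append(shared)

# A fresh dict per iteration keeps each value separate.
fresh = []
for disease in diseases:
    fresh.append({"name": disease})

print(aliased)
print(fresh)
```

Moving the dictionary creation inside the inner loop, as the list comprehension above effectively does, avoids the problem.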


jezrael · Accepted Answer · 2020-09-15 06:47:53Z

Use DataFrame.groupby with a custom lambda function that converts each group's values to a list of dictionaries with DataFrame.to_dict:

L = (df.rename(columns={'disease':'name'})
       .groupby('gene')
       .apply(lambda x: x[['name']].to_dict('records'))
       .reset_index(name='children')
       .rename(columns={'gene':'name'})
       .to_dict('records')
       )
print (L)
[{'name': 'A1BG', 'children': [{'name': 'Adenocarcinoma'},
                               {'name': 'apnea'}, 
                               {'name': 'Athritis'}]}, 
 {'name': 'A2M', 'children': [{'name': 'Asthma'}, 
                              {'name': 'Astrocytoma'}, 
                              {'name': 'Diabetes'}]}, 
 {'name': 'NAT1', 'children': [{'name': 'polyps'},
                               {'name': 'lymphoma'}, 
                               {'name': 'neoplasms'}]}]

with open('outputresult3.json', "w") as out:
    json.dump(L,out)
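
If pandas is not otherwise needed, the same grouping can probably be done with the standard library alone. The following is a sketch under assumptions, not part of either answer: the helper name csv_to_hierarchy and the inline CSV_TEXT sample (a shortened copy of the question's data) are hypothetical, and a real script would open the CSV file instead of using io.StringIO.

```python
import csv
import io
import json

# Shortened sample of the question's data; replace with open("your_file.csv").
CSV_TEXT = """gene,disease
A1BG,Adenocarcinoma
A1BG,apnea
A2M,Asthma
NAT1,polyps
"""


def csv_to_hierarchy(lines):
    """Group disease rows under their gene as {"name", "children"} entries."""
    groups = {}
    for row in csv.DictReader(lines):
        # setdefault creates the child list the first time a gene is seen
        groups.setdefault(row["gene"], []).append({"name": row["disease"]})
    return [{"name": gene, "children": children}
            for gene, children in groups.items()]


result = csv_to_hierarchy(io.StringIO(CSV_TEXT))
print(json.dumps(result, indent=2))
```

Unlike groupby, which sorts the group keys by default, this keeps genes in the order they first appear in the file.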
