I am trying to convert a CSV file into a hierarchical JSON file. The CSV input is shown below; it contains two columns, gene and disease.

gene,disease
A1BG,Adenocarcinoma
A1BG,apnea
A1BG,Athritis
A2M,Asthma
A2M,Astrocytoma
A2M,Diabetes
NAT1,polyps
NAT1,lymphoma
NAT1,neoplasms

The expected output should be in the following format:

{
     "name": "A1BG",
     "children": [
      {"name": "Adenocarcinoma"},
      {"name": "apnea"},
      {"name": "Athritis"}
      ]
    },

{
     "name": "A2M",
     "children": [
      {"name": "Asthma"},
      {"name": "Astrocytoma"},
      {"name": "Diabetes"}
      ]
    },


{
     "name": "NAT1",
     "children": [
      {"name": "polyps"},
      {"name": "lymphoma"},
      {"name": "neoplasms"}
      ]
    }
   

The Python code I have written is below. Please let me know what I need to change to get the desired output.

import json
finalList = []
finalDict = {}
grouped = df.groupby(['gene'])

for key, value in grouped:

    dictionary = {}
    dictList = []
    anotherDict = {}

    j = grouped.get_group(key).reset_index(drop=True)
    dictionary['name'] = j.at[0, 'gene']

    for i in j.index:    
        anotherDict['disease'] = j.at[i, 'disease']
        dictList.append(anotherDict)

    dictionary['children'] = dictList
    finalList.append(dictionary)

with open('outputresult3.json', "w") as out:
    json.dump(finalList,out)

2 Answers

import json

json_data = []

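# group the data by each unique gene; a plain string key (not a one-element
# list) keeps `gene` a str rather than a 1-tuple on recent pandas versions
for gene, data in df.groupby("gene"):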

    # obtain a list of diseases for the current gene
    diseases = data["disease"].tolist()

    # create a new list of dictionaries to satisfy json requirements
    children = [{"name": disease} for disease in diseases]
    
    entry = {"name": gene, "children": children}
    json_data.append(entry)
    
with open('outputresult3.json', "w") as out:
    json.dump(json_data, out)
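
As for why the loop in the question does not give the desired output: anotherDict is created once per gene and the same object is appended on every pass, so every child ends up showing the last disease. It also stores the value under the key 'disease' rather than 'name'. A minimal sketch of the aliasing effect, using a hard-coded list of diseases (the variable names here are illustrative, not from the question):

```python
diseases = ["Adenocarcinoma", "apnea", "Athritis"]

# Reusing one dict: every list element refers to the same object,
# so each assignment overwrites what all earlier entries show.
shared = {}
aliased = []
for d in diseases:
    shared["name"] = d
    aliased.append(shared)

# Building a fresh dict per iteration keeps the entries independent.
fresh = [{"name": d} for d in diseases]

print(aliased)
print(fresh)
```

Moving the `anotherDict = {}` line inside the inner loop and using 'name' as the key should be enough to repair the original code.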

Use DataFrame.groupby with a custom lambda function that converts each group's values to a list of dictionaries via DataFrame.to_dict:

L = (df.rename(columns={'disease':'name'})
       .groupby('gene')
       .apply(lambda x: x[['name']].to_dict('records'))
       .reset_index(name='children')
       .rename(columns={'gene':'name'})
       .to_dict('records')
       )
print(L)
[{'name': 'A1BG', 'children': [{'name': 'Adenocarcinoma'},
                               {'name': 'apnea'}, 
                               {'name': 'Athritis'}]}, 
 {'name': 'A2M', 'children': [{'name': 'Asthma'}, 
                              {'name': 'Astrocytoma'}, 
                              {'name': 'Diabetes'}]}, 
 {'name': 'NAT1', 'children': [{'name': 'polyps'},
                               {'name': 'lymphoma'}, 
                               {'name': 'neoplasms'}]}]

with open('outputresult3.json', "w") as out:
    json.dump(L,out)
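
Neither snippet in this thread shows how df is built, so here is a self-contained sketch that can be run as-is. It deliberately swaps pandas for the standard-library csv module, and the sample rows are copied from the question; build_hierarchy and CSV_TEXT are names made up for this example.

```python
import csv
import io
import json

# Sample rows copied from the question; a real script would open the CSV file instead.
CSV_TEXT = """gene,disease
A1BG,Adenocarcinoma
A1BG,apnea
A1BG,Athritis
A2M,Asthma
A2M,Astrocytoma
A2M,Diabetes
NAT1,polyps
NAT1,lymphoma
NAT1,neoplasms
"""


def build_hierarchy(rows):
    """Group rows by gene, keeping the first-seen order of genes and diseases."""
    children_by_gene = {}
    for row in rows:
        children_by_gene.setdefault(row["gene"], []).append({"name": row["disease"]})
    return [{"name": gene, "children": kids} for gene, kids in children_by_gene.items()]


hierarchy = build_hierarchy(csv.DictReader(io.StringIO(CSV_TEXT)))
print(json.dumps(hierarchy, indent=2))
```

One difference from the groupby versions: this keeps genes in the order they appear in the file, whereas groupby sorts the group keys by default.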
