
I have JSON data that represents a tree. Each node has four attributes: name, id, child, and pid (the parent ID). A leaf node has only three: id, pid, and name.

{'child': [{'id': '', 'child': [{'id': '', 'child': [{'name': '', 'id': '', 'pid': ''}], 'name': '', 'pid': ''}], 'name': '', 'pid': ''}], 'name': '', 'pid': '', 'id': ''}

I want to convert it to a dataframe with three columns like:

    id, pid, name
1   .., ..., ....
2   .., ..., ....

It should contain the id, pid, and name of every node, from all levels of the tree.

I tried pandas.read_json with the default parameters, but it does not descend through the nested levels. The output looks like this:

    id, pid, name, child
1   .., ..., ...., {'id':'','pid': '','name': '', 'child':[{...}]}
2   .., ..., ...., {'id':'','pid': '','name': '', 'child':[{...}]}

Is there an easy way to solve this problem, with or without pandas?

  • Try using the json_normalize() function or, depending on the complexity of your data, have a look at the flatten library (blog post). Commented May 23, 2017 at 8:45
  • Thank you for your reply. It seems that json_normalize() does not work for me (maybe I set the wrong parameters), and flatten just returns too many columns. Commented May 23, 2017 at 9:05

1 Answer

I used recursion to do this, and I have checked that it works on my data.

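import json
import pandas as pd


def test_iterate(nodes, frames):
    # Frame one level of the tree: one row per node dict in `nodes`.
    df = pd.DataFrame(nodes)
    frames.append(df[['id', 'pid', 'name']])
    if 'child' not in df.columns:
        return  # every node at this level is a leaf
    # Leaf nodes have no 'child' entry, which shows up as NaN here.
    for children in df['child'].dropna():
        if children:  # skip empty child lists
            test_iterate(children, frames)


if __name__ == '__main__':
    with open('test.json') as f:
        loaddata = json.load(f)
    frames = []
    test_iterate([loaddata], frames)
    # Combine the per-level frames into a single table.
    total_data = pd.concat(frames, ignore_index=True)
    total_data.to_csv('test.csv', index=False)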
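
Since the question also asks about doing this without pandas, here is a minimal sketch using only the standard library. It assumes every node is a dict and that 'child', when present, is a list of such dicts. The function name flatten_tree and the sample tree are made up for illustration. It uses an explicit stack rather than recursion, so a very deep tree should not hit Python's recursion limit.

```python
import csv
import io


def flatten_tree(root):
    """Return one {id, pid, name} dict per node, parents before children."""
    rows = []
    stack = [root]  # explicit stack instead of recursive calls
    while stack:
        node = stack.pop()
        rows.append({key: node.get(key) for key in ('id', 'pid', 'name')})
        # Leaf nodes have no 'child' key; reversed() keeps siblings in file order.
        stack.extend(reversed(node.get('child') or []))
    return rows


# Hypothetical sample tree in the shape described in the question.
sample = {
    'id': '1', 'pid': '', 'name': 'root',
    'child': [
        {'id': '2', 'pid': '1', 'name': 'a',
         'child': [{'id': '4', 'pid': '2', 'name': 'a1'}]},
        {'id': '3', 'pid': '1', 'name': 'b'},
    ],
}

rows = flatten_tree(sample)

# Write the rows as CSV text with the csv module (a file object works the same way).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['id', 'pid', 'name'])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

If a DataFrame is still wanted, passing the list to pd.DataFrame(rows) should give the three-column table directly.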

1 Comment

I had a similar problem and was able to get the output using your answer. If I want to add a column holding the parent ID to every child data frame, how do I do it?
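
One possible way to get a parent ID column when the nodes do not store one themselves is to pass the parent's id down as the tree is walked. The sketch below is not the answer's code. It assumes each node is a dict with an 'id' and an optional 'child' list, and the names iter_rows and parent_id are made up for illustration.

```python
def iter_rows(node, parent_id=None):
    """Yield one row per node, taking the parent ID from the traversal itself."""
    yield {'id': node.get('id'), 'name': node.get('name'), 'parent_id': parent_id}
    for child in node.get('child') or []:
        # Hand this node's id down so each child records where it came from.
        yield from iter_rows(child, parent_id=node.get('id'))


# Hypothetical tree whose nodes carry no 'pid' field of their own.
tree = {
    'id': 'r', 'name': 'root',
    'child': [
        {'id': 'a', 'name': 'first', 'child': [{'id': 'a1', 'name': 'leaf'}]},
        {'id': 'b', 'name': 'second'},
    ],
}

rows = list(iter_rows(tree))
for row in rows:
    print(row)
```

The root gets a parent_id of None. The resulting list of dicts can be handed to pd.DataFrame(rows) if a data frame is needed.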
