How to convert JSON tree data into a DataFrame in Python?

Question

I have JSON data that represents a tree. Each node has four attributes: name, id, child, and pid (the parent's id). Leaf nodes have only three: id, pid, and name.

{'child': [{'id': '', 'child': [{'id': '', 'child': [{'name': '', 'id': '', 'pid': ''}], 'name': '', 'pid': ''}], 'name': '', 'pid': ''}], 'name': '', 'pid': '', 'id': ''}

I want to convert it to a dataframe with three columns like:

    id, pid, name
1   .., ..., ....
2   .., ..., ....

The rows should cover the nodes from every level of the tree, keeping only the three attributes (id, pid, name).

I have tried pandas.read_json with the default parameters, but it does not descend into the nested levels, and the output looks like this:

    id, pid, name, child
1   .., ..., ...., {'id':'','pid': '','name': '', 'child':[{...}]}
2   .., ..., ...., {'id':'','pid': '','name': '', 'child':[{...}]}

Is there an easy way to solve this problem, with or without pandas?

Try using the json_normalize() function or, depending on the complexity of your data, have a look at the flatten library (blog post).
– DocZerø, Commented May 23, 2017 at 8:45
Thank you for your reply. It seems that json_normalize() does not work for me (maybe I set the wrong parameters), and flatten just returns too many columns.
– natsuapo, Commented May 23, 2017 at 9:05

natsuapo · Accepted Answer · 2017-05-23 13:13:05Z

I solved it with recursion, and I have confirmed that it works on my data.

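import json
import pandas as pd


def test_iterate(df, frames):
    # Record id/pid/name for every node at this level.
    frames.append(df.reindex(columns=['id', 'pid', 'name']))
    if 'child' not in df.columns:
        return  # every node at this level is a leaf
    for children in df['child']:
        # Leaves that sit beside non-leaf siblings have NaN (not a list) here.
        if isinstance(children, list) and children:
            test_iterate(pd.DataFrame(children), frames)

if __name__ == '__main__':
    with open('test.json') as f:
        loaddata = json.load(f)
    # Wrap a single root dict in a list so it becomes a one-row frame.
    roots = loaddata if isinstance(loaddata, list) else [loaddata]
    frames = []
    test_iterate(pd.DataFrame(roots), frames)
    # DataFrame.append was removed in pandas 2.0, so concatenate once at the end.
    total_data = pd.concat(frames, ignore_index=True)
    total_data.to_csv('test.csv', index=False)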
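
Since the question also asks about a solution without pandas, the same traversal can be sketched with the standard library only. This is a minimal sketch, not part of the original answer: the function name flatten_tree and the sample ids and names are made up for illustration, and it assumes each node is a dict whose optional 'child' key holds a list of child dicts.

```python
def flatten_tree(root):
    """Return one {'id', 'pid', 'name'} dict per node, parents before children."""
    rows = []
    stack = [root]  # explicit stack, so deep trees do not hit the recursion limit
    while stack:
        node = stack.pop()
        rows.append({key: node.get(key) for key in ('id', 'pid', 'name')})
        # Leaves have no 'child' key; push in reverse to keep sibling order.
        stack.extend(reversed(node.get('child') or []))
    return rows


# Hypothetical sample tree; the ids and names are invented for this example.
sample = {
    'id': '1', 'pid': '', 'name': 'root',
    'child': [
        {'id': '2', 'pid': '1', 'name': 'a',
         'child': [{'id': '4', 'pid': '2', 'name': 'a1'}]},
        {'id': '3', 'pid': '1', 'name': 'b'},
    ],
}
rows = flatten_tree(sample)
```

If pandas is available, passing the result to pd.DataFrame(rows, columns=['id', 'pid', 'name']) should give the three-column frame described in the question.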

1 Comment

Jennifer Therese Over a year ago

I had a similar problem and was able to get the output using your answer. How can I add a column to each child data frame that holds its parent ID? Any help would be appreciated.
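
One possible way to approach the comment's question is to pass the parent's id down while walking the tree, so every row gets a parent column even if the data has no pid field. This is only a sketch under assumptions: iter_rows and the parent_id column name are hypothetical, the sample data is invented, and the nodes are assumed to be dicts with an optional 'child' list as in the question.

```python
def iter_rows(node, parent_id=None):
    """Yield one row per node, tagging each with the id of its parent."""
    yield {'id': node.get('id'), 'name': node.get('name'), 'parent_id': parent_id}
    # Leaves have no 'child' key, so fall back to an empty list.
    for child in node.get('child') or []:
        yield from iter_rows(child, parent_id=node.get('id'))


# Hypothetical sample tree without any pid fields.
sample = {
    'id': 'r', 'name': 'root',
    'child': [
        {'id': 'a', 'name': 'first', 'child': [{'id': 'a1', 'name': 'leaf'}]},
        {'id': 'b', 'name': 'second'},
    ],
}
rows = list(iter_rows(sample))
```

The root row has parent_id set to None; the resulting list of dicts can be handed to pd.DataFrame(rows) if a data frame is needed.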
