How do I convert this nested JSON to CSV using Python and pandas?

Question

I am fairly new to Python and pandas, and I am trying to convert these JSON files to CSV.

The JSON file I have managed to flatten:

    "empnum1244": {
        "user_name": "keane@a",
        "name": "Keane",
        "flag": true,
        "list": x
    },
    "empnum1255": {
        "user_name": "julia@a",
        "name": "Julia",
        "flag": true,
        "list": x
    },

I have no trouble converting these with reset_index:

import pandas as pd

df = pd.read_json('dups3.json', orient='index')
df.reset_index(level=0, inplace=True)  # move the employee-number index into a column

The doubly nested JSON I am having trouble with:

{
    "[email protected]": {
        "employeenumber5566": {
            "user_name": "1234sysidAaron",
            "name": "Aaron",
            "flag": true,
            "list": x
        },
        "employeenumber6677": {
            "user_name": "[email protected]",
            "name": "Aaron",
            "flag": true,
            "list": x
        }
    },
    "[email protected]": {
        "employeenumber890": {
            "user_name": "144sysidAhmish",
            "name": "Ahmish",
            "flag": true,
            "list": x
        },
        "employeenumber23457": {
            "user_name": "[email protected]",
            "name": "ahmish",
            "flag": true,
            "list": x
        }
    }
}

How do I flatten this into a table with both levels of keys as columns? My desired output is:

Email            |   EmpID             |       User_name     |      Name       |   flag    | list
[email protected]   employeenumber890     144sysidAhmish          Ahmish           True         x
[email protected]   employeenumber23457   [email protected]        ahmish           True         x

Shubham Sharma · Accepted Answer · 2021-11-17 03:42:17Z

We can use a list comprehension to flatten the nested data:

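import json
import pandas as pd

# `data` holds the JSON text; when reading from an open file f, use json.load(f) instead
d = json.loads(data)
# One row per (email, employee) pair; **y unpacks that employee's fields into columns
df = pd.DataFrame([{'Email': email, 'EmpID': empid, **y}
                   for email, v in d.items() for empid, y in v.items()])

print(df)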

              Email                EmpID         user_name    name  flag list
0   [email protected]   employeenumber5566    1234sysidAaron   Aaron  True    x
1   [email protected]   employeenumber6677   [email protected]   Aaron  True    x
2  [email protected]    employeenumber890    144sysidAhmish  Ahmish  True    x
3  [email protected]  employeenumber23457  [email protected]  ahmish  True    x
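To finish the conversion to CSV that the question asks for, the DataFrame can be written out with DataFrame.to_csv. Below is a minimal sketch. The example.com email keys and the "x" string standing in for the elided list value are placeholders, not the original data, and the data is trimmed to three records.

```python
import json

import pandas as pd

# Hypothetical sample data: the email keys and the "list" values are placeholders.
data = '''
{
    "aaron@example.com": {
        "employeenumber5566": {"user_name": "1234sysidAaron", "name": "Aaron", "flag": true, "list": "x"},
        "employeenumber6677": {"user_name": "aaron@example.com", "name": "Aaron", "flag": true, "list": "x"}
    },
    "ahmish@example.com": {
        "employeenumber890": {"user_name": "144sysidAhmish", "name": "Ahmish", "flag": true, "list": "x"}
    }
}
'''

d = json.loads(data)
df = pd.DataFrame([{'Email': email, 'EmpID': empid, **y}
                   for email, v in d.items() for empid, y in v.items()])

# With no path argument, to_csv returns the CSV as a string.
# index=False keeps the 0, 1, 2, ... row labels out of the output.
csv_text = df.to_csv(index=False)
print(csv_text)

# To write a file instead (hypothetical filename):
# df.to_csv('output.csv', index=False)
```

If a two-level index is wanted rather than two ordinary columns, df.set_index(['Email', 'EmpID']) should give one, and in that case to_csv would need the default index=True to keep those columns in the file.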



2 Comments

hightide_fatwaves Over a year ago

Hey, I really appreciate the input! I have a question about the line df = pd.DataFrame([{'Email': email, 'EmpID': empid, **y} ... I suppose we are labeling email as "Email" and empid as "EmpID", but how are we doing that? The JSON file doesn't have a column name for "[email protected]", "employeenumber5566", or "employeenumber6677"; they are all unique values for each of those JSON objects.

Shubham Sharma Over a year ago

We are not labeling email as Email. Here, email is a variable whose value is obtained by iterating over the key-value pairs of the top-level dictionary. So for the first dictionary, the key is "[email protected]", and therefore the value of email is that key.
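It may help to see the comprehension unrolled into ordinary loops. The sketch below is an equivalent rewrite, not a different method. The example.com keys and the "x" list values are placeholders, and the names employees, fields, and rows are introduced here only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the parsed JSON (placeholder emails and "list" values).
d = {
    "aaron@example.com": {
        "employeenumber5566": {"user_name": "1234sysidAaron", "name": "Aaron", "flag": True, "list": "x"},
        "employeenumber6677": {"user_name": "aaron@example.com", "name": "Aaron", "flag": True, "list": "x"},
    },
    "ahmish@example.com": {
        "employeenumber890": {"user_name": "144sysidAhmish", "name": "Ahmish", "flag": True, "list": "x"},
    },
}

rows = []
for email, employees in d.items():           # email takes each top-level key in turn
    for empid, fields in employees.items():  # empid takes each second-level key
        # 'Email' and 'EmpID' are column names chosen here; the keys become the cell values.
        row = {'Email': email, 'EmpID': empid}
        row.update(fields)                   # same effect as **fields in the comprehension
        rows.append(row)

df = pd.DataFrame(rows)
print(df)
```

Each dictionary in rows becomes one row of the DataFrame, and its keys become the column names, which is why the JSON itself does not need to contain "Email" or "EmpID".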
