I am fairly new to Python and pandas, and I am trying to convert these JSON files to CSV.

The JSON file I have managed to flatten:

    "empnum1244": {
        "user_name": "keane@a",
        "name": "Keane",
        "flag": true,
        "list": x
    },
    "empnum1255": {
        "user_name": "julia@a",
        "name": "Julia",
        "flag": true,
        "list": x
    },

I have no trouble converting these using reset_index:

import pandas as pd

df = pd.read_json('dups3.json', orient='index')
df.reset_index(level=0, inplace=True)
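
For reference, here is a minimal, self-contained sketch of that single-level case. It assumes the placeholder x stands for a plain string and reads the JSON from an in-memory string rather than from dups3.json. The EmpID column name is a hypothetical label chosen for illustration.

```python
import io
import pandas as pd

# Sample data modeled on the snippet above; "x" is a stand-in string (assumption)
raw = '''
{
    "empnum1244": {"user_name": "keane@a", "name": "Keane", "flag": true, "list": "x"},
    "empnum1255": {"user_name": "julia@a", "name": "Julia", "flag": true, "list": "x"}
}
'''

# orient='index' turns each top-level key into a row label
df = pd.read_json(io.StringIO(raw), orient='index')

# Move the row labels into an ordinary column and give it a readable name
df = df.reset_index().rename(columns={'index': 'EmpID'})
```

This only works for one level of nesting, because each row's value must already be a flat record.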

The doubly nested JSON I am having trouble with:

{
    "[email protected]": {
        "employeenumber5566": {
            "user_name": "1234sysidAaron",
            "name": "Aaron",
            "flag": true,
            "list": x
        },
        "employeenumber6677": {
            "user_name": "[email protected]",
            "name": "Aaron",
            "flag": true,
            "list": x
        }
    },
    "[email protected]": {
        "employeenumber890": {
            "user_name": "144sysidAhmish",
            "name": "Ahmish",
            "flag": true,
            "list": x
        },
        "employeenumber23457": {
            "user_name": "[email protected]",
            "name": "ahmish",
            "flag": true,
            "list": x
        }
    }
    
}

How do I flatten these out when there are two levels of keys? My desired output is:

Email            |   EmpID             |       User_name     |      Name       |   flag    | list
[email protected]   employeenumber890     144sysidAhmish          Ahmish           True         x
[email protected]   employeenumber23457   [email protected]        ahmish           True         x

1 Answer

We can use a list comprehension to flatten the nested data:

import json
import pandas as pd
d = json.loads(data)  # data holds the JSON text from the question (assumed to be valid JSON)
df = pd.DataFrame([{'Email': email, 'EmpID': empid, **y}
                   for email, v in d.items() for empid, y in v.items()])

print(df)

              Email                EmpID         user_name    name  flag list
0   [email protected]   employeenumber5566    1234sysidAaron   Aaron  True    x
1   [email protected]   employeenumber6677   [email protected]   Aaron  True    x
2  [email protected]    employeenumber890    144sysidAhmish  Ahmish  True    x
3  [email protected]  employeenumber23457  [email protected]  ahmish  True    x
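
Since the goal in the question is a CSV file, the same comprehension can be wrapped in a small helper that reads the JSON from disk and writes the result out. This is only a sketch: the function name nested_json_to_csv and both path arguments are hypothetical, and it assumes the file contains valid JSON with exactly two levels of keys above the records.

```python
import json
import pandas as pd

def nested_json_to_csv(json_path, csv_path):
    """Flatten {email: {empid: {fields}}} into rows and save them as CSV."""
    with open(json_path) as f:
        d = json.load(f)

    # One row per (email, employee ID) pair; **fields adds the innermost keys as columns
    rows = [{'Email': email, 'EmpID': empid, **fields}
            for email, employees in d.items()
            for empid, fields in employees.items()]

    df = pd.DataFrame(rows)
    # index=False keeps the 0, 1, 2, ... row labels out of the file
    df.to_csv(csv_path, index=False)
    return df
```

Calling nested_json_to_csv('input.json', 'output.csv') with your own file names would then produce the flattened table as a CSV.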

2 Comments

Hey, I really appreciate the input! I have a question about line 4, df = pd.DataFrame([{'Email': email, 'EmpID': empid, **y} . I suppose we are labeling email as "Email" and empid as "EmpID", but how does that work? The JSON file doesn't have column names for "[email protected]", "employeenumber5566" and "employeenumber6677"; they are all unique keys for each of those JSON objects.
We are not labeling email as Email. Here email is a variable whose value is obtained by iterating over the key-value pairs of the top-level dictionary. For the first entry the key is "[email protected]", so email takes that value. 'Email' is just the column name chosen to hold it.
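
To make that explanation concrete, the comprehension can be unrolled into ordinary nested loops. The sketch below is an equivalent rewrite, not the answer's original code. The sample dictionary uses a made-up address (aaron@example.com) and the string "x" as stand-ins for the values in the question.

```python
import pandas as pd

# Hypothetical sample with the same shape as the question's data
d = {
    "aaron@example.com": {
        "employeenumber5566": {"user_name": "1234sysidAaron", "name": "Aaron",
                               "flag": True, "list": "x"},
        "employeenumber6677": {"user_name": "aaron@example.com", "name": "Aaron",
                               "flag": True, "list": "x"},
    },
}

rows = []
for email, employees in d.items():           # email is the top-level key
    for empid, fields in employees.items():  # empid is the second-level key
        row = {'Email': email, 'EmpID': empid}  # our own column names for the two keys
        row.update(fields)                      # same effect as **fields in the comprehension
        rows.append(row)

df = pd.DataFrame(rows)
```

Each dictionary in rows becomes one row of the DataFrame, and its keys become the column names.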
