Python Pandas - Json to DataFrame

Question

I have a complicated JSON file that looks like this:

{
  "User A" : {
     "Obj1" : {
        "key1": "val1",
        "key2": "val2",
        "key3": "val3"
     },
     "Obj2" : {
        "key1": "val1",
        "key2": "val2",
        "key3": "val3"
     }
  },
  "User B" : {
     "Obj1" : {
        "key1": "val1",
        "key2": "val2",
        "key3": "val3",
        "key4": "val4"
     }
  }
}

And I want to turn it into a dataframe that looks like this:

                key1   key2   key3   key4
User A   Obj1   val1   val2   val3    NaN
         Obj2   val1   val2   val3    NaN
User B   Obj1   val1   val2   val3    val4

Is this possible with pandas? If so, how can I manage to do it?

If it's easier, I don't mind dropping the first two columns (User and Obj) and keeping only the key columns.

jezrael · Accepted Answer · 2016-12-14 06:29:06Z

First, read the file into a dict:

import json
import pandas as pd

with open('file.json') as data_file:
    dd = json.load(data_file)

print(dd)
{'User B': {'Obj1': {'key2': 'val2', 'key4': 'val4', 'key1': 'val1', 'key3': 'val3'}}, 
'User A': {'Obj1': {'key2': 'val2', 'key1': 'val1', 'key3': 'val3'}, 
'Obj2': {'key2': 'val2', 'key1': 'val1', 'key3': 'val3'}}}

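Then use a dict comprehension with concat. Each user's dict becomes a small DataFrame, transposed so that the Obj labels are rows, and the dict keys become the outer index level:

df = pd.concat({key: pd.DataFrame(objs).T for key, objs in dd.items()})
print(df)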
             key1  key2  key3  key4
User A Obj1  val1  val2  val3   NaN
       Obj2  val1  val2  val3   NaN
User B Obj1  val1  val2  val3  val4
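
To try the concat approach without a `file.json` on disk, the same steps can be run on an in-memory dict. This is a minimal sketch: the dict literal mirrors the data in the question, and the loop variable names `user` and `objs` are illustrative only.

```python
import pandas as pd

# Nested dict as json.load would return it for the question's data.
dd = {
    "User A": {
        "Obj1": {"key1": "val1", "key2": "val2", "key3": "val3"},
        "Obj2": {"key1": "val1", "key2": "val2", "key3": "val3"},
    },
    "User B": {
        "Obj1": {"key1": "val1", "key2": "val2", "key3": "val3", "key4": "val4"},
    },
}

# pd.DataFrame(objs) puts the Obj labels in the columns, so .T moves them
# to the rows. Passing a dict to concat uses its keys as the outer level
# of a MultiIndex; keys missing for a row are filled with NaN.
df = pd.concat({user: pd.DataFrame(objs).T for user, objs in dd.items()})
print(df)
```

Rows can then be selected by the (user, object) pair, for example `df.loc[("User B", "Obj1")]`.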

Another solution uses read_json, but you first need to reshape with unstack and remove the NaN rows with dropna. Finally, use DataFrame.from_records:

df = pd.read_json('file.json').unstack().dropna()
print(df)
User A  Obj1     {'key2': 'val2', 'key1': 'val1', 'key3': 'val3'}
        Obj2     {'key2': 'val2', 'key1': 'val1', 'key3': 'val3'}
User B  Obj1    {'key2': 'val2', 'key4': 'val4', 'key1': 'val1...
dtype: object

# Without the User/Obj index (only the key columns):
df1 = pd.DataFrame.from_records(df.values.tolist())
print(df1)
   key1  key2  key3  key4
0  val1  val2  val3   NaN
1  val1  val2  val3   NaN
2  val1  val2  val3  val4

# Keeping the User/Obj MultiIndex:
df1 = pd.DataFrame.from_records(df.values.tolist(), index=df.index)
print(df1)
             key1  key2  key3  key4
User A Obj1  val1  val2  val3   NaN
       Obj2  val1  val2  val3   NaN
User B Obj1  val1  val2  val3  val4
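
The read_json route can also be tried without a file by wrapping a JSON string in `io.StringIO`. The sketch below makes two assumptions: the JSON text is a syntactically valid version of the question's data, and the final step uses the plain `pd.DataFrame` constructor on the list of dicts as a stand-in for `DataFrame.from_records`. The names `wide` and `s` are illustrative.

```python
import io
import pandas as pd

json_text = """
{
  "User A": {
    "Obj1": {"key1": "val1", "key2": "val2", "key3": "val3"},
    "Obj2": {"key1": "val1", "key2": "val2", "key3": "val3"}
  },
  "User B": {
    "Obj1": {"key1": "val1", "key2": "val2", "key3": "val3", "key4": "val4"}
  }
}
"""

# read_json gives one column per user and one row per Obj label; each cell
# holds the inner dict, or NaN where a user has no such Obj.
wide = pd.read_json(io.StringIO(json_text))

# unstack() reshapes the frame into a Series indexed by (user, obj);
# dropna() discards the missing (User B, Obj2) combination.
s = wide.unstack().dropna()

# Expand each dict into columns while keeping the (user, obj) MultiIndex.
df1 = pd.DataFrame(s.tolist(), index=s.index)
print(df1)
```

Dropping the `index=s.index` argument gives the variant with a plain 0..n-1 index and only the key columns.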

edited Dec 14, 2016 at 6:29

answered Dec 14, 2016 at 6:19

2 Comments

TheDaJon Over a year ago

You are so helpful, thank you! I can't believe I spent an hour on something that can be done in two lines of code. So elegant... Is there a simple way to also save this df as an Excel file?

jezrael Over a year ago

Thank you for accepting! Sure, use to_excel: df1.to_excel('file.xlsx'), or df1.to_excel('file.xlsx', index=False) if you need to drop the index.
