
I am curious how I can use pandas to read nested JSON of the following structure:

{
    "number": "",
    "date": "01.10.2016",
    "name": "R 3932",
    "locations": [
        {
            "depTimeDiffMin": "0",
            "name": "Spital am Pyhrn Bahnhof",
            "arrTime": "",
            "depTime": "06:32",
            "platform": "2",
            "stationIdx": "0",
            "arrTimeDiffMin": "",
            "track": "R 3932"
        },
        {
            "depTimeDiffMin": "0",
            "name": "Windischgarsten Bahnhof",
            "arrTime": "06:37",
            "depTime": "06:40",
            "platform": "2",
            "stationIdx": "1",
            "arrTimeDiffMin": "1",
            "track": ""
        },
        {
            "depTimeDiffMin": "",
            "name": "Linz/Donau Hbf",
            "arrTime": "08:24",
            "depTime": "",
            "platform": "1A-B",
            "stationIdx": "22",
            "arrTimeDiffMin": "1",
            "track": ""
        }
    ]
}

The following keeps the array as JSON; I would rather have it expanded into columns.

pd.read_json("/myJson.json", orient='records')

EDIT:

Thanks for the first answers. I should refine my question: flattening the nested attributes in the array is not mandatory. It would be OK to just concatenate the names from df.locations into a single value like [A, B, C].

My file contains multiple JSON objects (one per line). I would like to keep the number, date, name, and locations columns; however, I would need to join the locations.

allLocations = ""
isFirst = True
for location in result.locations:
    if isFirst:
        isFirst = False
        allLocations = location['name']
    else:
        allLocations += "; " + location['name']
allLocations

My approach here does not seem efficient or very pandas-like.
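
For comparison, the same joined string can be built with str.join; a minimal sketch, assuming result.locations is the parsed list of location dicts shown above:

# join the location names with "; " in a single pass
allLocations = "; ".join(location['name'] for location in result.locations)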


4 Answers


You can use json_normalize:

import pandas as pd
import json

with open('myJson.json') as data_file:
    data = json.load(data_file)

df = pd.json_normalize(data, 'locations', ['date', 'number', 'name'],
                       record_prefix='locations_')
print(df)
  locations_arrTime locations_arrTimeDiffMin locations_depTime  \
0                                                        06:32   
1             06:37                        1             06:40   
2             08:24                        1                     

  locations_depTimeDiffMin           locations_name locations_platform  \
0                        0  Spital am Pyhrn Bahnhof                  2   
1                        0  Windischgarsten Bahnhof                  2   
2                                    Linz/Donau Hbf               1A-B   

  locations_stationIdx locations_track number    name        date  
0                    0          R 3932         R 3932  01.10.2016  
1                    1                         R 3932  01.10.2016  
2                   22                         R 3932  01.10.2016 
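
Since the file in the question holds one JSON object per line, the same call also works on a list of parsed lines; a minimal sketch, assuming every line has the structure shown above:

import json
import pandas as pd

with open('myJson.json') as data_file:
    records = [json.loads(line) for line in data_file if line.strip()]

df = pd.json_normalize(records, 'locations', ['date', 'number', 'name'],
                       record_prefix='locations_')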

EDIT:

You can also use read_json, extract the name from each locations dict with the DataFrame constructor, and finally groupby with apply and join:

df = pd.read_json("myJson.json")
# keep only the 'name' from each dict in the 'locations' column
df.locations = pd.DataFrame(df.locations.values.tolist())['name']
# collapse back to one row per train, joining the station names
df = df.groupby(['date','name','number'])['locations'].apply(','.join).reset_index()
print(df)
        date    name number                                          locations
0 2016-01-10  R 3932         Spital am Pyhrn Bahnhof,Windischgarsten Bahnho... 

Another option, in case anyone finds this while working through a notebook: read the file in as a DataFrame with

df = pd.read_json('filename.json')
df2 = pd.DataFrame.from_records(df['nest_level_1']['nest_level_2'])
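
Applied to the JSON from the question, that pattern would look roughly like this (a sketch that passes a plain list of dicts to from_records via tolist()):

import pandas as pd

df = pd.read_json('myJson.json')                            # 'locations' becomes a column of dicts
df2 = pd.DataFrame.from_records(df['locations'].tolist())   # one column per location field
print(df2[['name', 'arrTime', 'depTime']])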

Happy coding


A possible alternative to pandas.json_normalize is to build your own DataFrame by extracting only the selected keys and values from the nested dictionary. The main reason for doing this is that json_normalize gets slow for very large JSON files (and might not always produce the output you want).

So, here is an alternative way to flatten the nested dictionary in pandas using glom. The aim is to extract the selected keys and values from the nested dictionary and save them in a separate column of the pandas DataFrame:

Here is a step by step guide: https://medium.com/@enrico.alemani/flatten-nested-dictionaries-in-pandas-using-glom-7948345c88f5

import pandas as pd
from glom import glom
from ast import literal_eval


target = {
    "number": "",
    "date": "01.10.2016",
    "name": "R 3932",
    "locations":
        {
            "depTimeDiffMin": "0",
            "name": "Spital am Pyhrn Bahnhof",
            "arrTime": "",
            "depTime": "06:32",
            "platform": "2",
            "stationIdx": "0",
            "arrTimeDiffMin": "",
            "track": "R 3932"
        }
}

# Import data
df = pd.DataFrame([str(target)], columns=['target'])

# Extract id keys and save value into a separate pandas column
df['id'] = df['target'].apply(lambda row: glom(literal_eval(row), 'locations.name'))
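
For the target above, df['id'] then holds the single value 'Spital am Pyhrn Bahnhof'.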


I have a multiline JSON file with one JSON object per line:

{"a": "b", "scope": {"eid": 123213}}
{"a": "d", "scope": {"eid": 1343213}}

There are no comma separators; each line is independent.

I used the following logic to read the nested structure:

import pandas as pd

threshold = pd.read_json(r"/content/data.json", lines=True)
# pull the nested 'eid' value out of each 'scope' dict into its own column
threshold['eid'] = pd.DataFrame.from_records(threshold['scope'])['eid']
threshold.head()
