Python: parse JSON with 2 arrays via json_normalize

Question

Could you please help me parse JSON that contains two arrays using pandas' json_normalize?

Here is the code:

import pandas as pd
from pandas import json_normalize


data5 = {
    "id": "0001",
    "type": "donut",
    "name": "Cake",
    "ppu": 0.55,
    "batters":
        {
            "batter":
                [
                    { "id": "1001", "type": "Regular" },
                    { "id": "1002", "type": "Chocolate" },
                    { "id": "1003", "type": "Blueberry" },
                    { "id": "1004", "type": "Devil's Food" }
                ]
        },
    "topping":
        [
            { "id": "5001", "type": "None" },
            { "id": "5002", "type": "Glazed" },
            { "id": "5005", "type": "Sugar" },
            { "id": "5007", "type": "Powdered Sugar" },
            { "id": "5006", "type": "Chocolate with Sprinkles" },
            { "id": "5003", "type": "Chocolate" },
            { "id": "5004", "type": "Maple" }
        ]
} 


df2 = json_normalize(
    data5,
    record_path=['topping'],
    meta=['id', 'type', 'name', 'ppu', 'batters'],
    record_prefix='_',
    errors='ignore',
)

This parses the "topping" array but not "batters". To parse "batters" as well, the following code can be used:

# parse the "batters" part of the JSON into another dataframe
df3 = json_normalize(data5, record_path=['batters', 'batter'])

# cross join 2 dataframes
df2['key_'] = 1
df3['key_'] = 1

result = pd.merge(df2, df3, on='key_').drop(columns='key_')

But this looks complicated. Is it possible to combine the two steps above into a single call? E.g.:

df2 = json_normalize(
    data5,
    record_path=['topping', ['batters', 'batter']],
    meta=['id', 'type', 'name', 'ppu'],
    record_prefix='_',
    errors='ignore',
)

Thank you.

Tranbi · Accepted Answer · 2023-01-23 13:08:15Z

1

I don't think you can specify that within json_normalize. However, you can avoid creating the key_ column by specifying how="cross" in pd.merge (available since pandas 1.2). There is also no need to keep batters in df2:

import pandas as pd

df2 = pd.json_normalize(
    data5,
    record_path=['topping'],
    meta=['id', 'type', 'name', 'ppu'],
    record_prefix='_',
)
df3 = pd.json_normalize(data5, record_path=['batters', 'batter'])

pd.merge(df2, df3, how="cross")
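If the two-step pattern is needed repeatedly, it could be wrapped in a small helper. The sketch below is only an illustration: cross_normalize and its record_specs parameter are hypothetical names (not part of pandas), the data is a trimmed-down copy of data5, and it assumes pandas 1.2 or newer for how="cross". Giving each record path its own prefix also avoids the id_x / id_y suffixes that appear when both frames have id and type columns.

```python
from functools import reduce

import pandas as pd

# Trimmed-down version of the question's data5, for illustration only.
data = {
    "id": "0001",
    "type": "donut",
    "name": "Cake",
    "ppu": 0.55,
    "batters": {
        "batter": [
            {"id": "1001", "type": "Regular"},
            {"id": "1002", "type": "Chocolate"},
        ]
    },
    "topping": [
        {"id": "5001", "type": "None"},
        {"id": "5002", "type": "Glazed"},
        {"id": "5005", "type": "Sugar"},
    ],
}


def cross_normalize(doc, record_specs, meta=None):
    """Hypothetical helper: normalize each record path, then cross join them.

    record_specs is a list of (record_path, record_prefix) pairs.
    Top-level meta fields are attached to the first frame only.
    """
    frames = []
    for i, (path, prefix) in enumerate(record_specs):
        frames.append(
            pd.json_normalize(
                doc,
                record_path=path,
                meta=meta if i == 0 else None,
                record_prefix=prefix,
            )
        )
    # how="cross" builds the Cartesian product; it needs pandas 1.2 or newer.
    return reduce(lambda left, right: pd.merge(left, right, how="cross"), frames)


result = cross_normalize(
    data,
    record_specs=[(["topping"], "topping_"), (["batters", "batter"], "batter_")],
    meta=["id", "type", "name", "ppu"],
)
print(result.shape)
print(list(result.columns))
```

With 3 toppings and 2 batters in this trimmed data, the result has 3 × 2 = 6 rows. The same helper should extend to more than two record paths, since reduce simply chains the cross merges.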

1 Comment

Semyon-coder Over a year ago

Tranbi, thank you. Your solution looks much lighter and easier.
