
I'm trying to extract metadata from JSON using pandas' json_normalize, but it does not work as expected.

I have a JSON file with the following structure:

data = [
    {
        'a': 'aa',
        'b': {'b1': 'bb1', 'b2': 'bb2'},
        'c': [
            {'ca': [{'ca1': 'caa1'}]}
        ],
    }
]

I'd like to get the following

ca1   a   b.b1
caa1  aa  bb1

I would expect this to work

pd.json_normalize(data, record_path=['c','ca'], meta = ['a',['b','b1']])

but it fails to find the key b1. Strangely enough, if record_path is just 'c', it does find the key. I feel I'm missing something here, but I can't figure out what. I appreciate any help!

2 Answers

1

Pass meta as a list of the top-level keys you want to keep, and pass record_path as a list of the keys to descend through to reach the records. Column b then holds a dict, so you can expand it with apply(pd.Series), concatenate the result back onto df, and drop the original dict column.

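import pandas as pd

# 'data' is the list defined in the question
df = pd.json_normalize(
    data=data,
    meta=['a', 'b'],
    record_path=['c', 'ca']
)
# Expand the dict in column 'b' into its own columns and drop the original
df = pd.concat([df.drop(['b'], axis=1), df['b'].apply(pd.Series)], axis=1)
print(df)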

Output:

     ca1   a   b1   b2
0  caa1  aa  bb1  bb2
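
As a variation on the above, here is a minimal sketch that uses pop to remove the dict column and prefixes the expanded columns, so the result matches the b.b1 naming asked for in the question. The names b_cols and result are only illustrative, and the sketch assumes every value in column b is a dict.

```python
import pandas as pd

data = [
    {
        'a': 'aa',
        'b': {'b1': 'bb1', 'b2': 'bb2'},
        'c': [{'ca': [{'ca1': 'caa1'}]}],
    }
]

df = pd.json_normalize(data, record_path=['c', 'ca'], meta=['a', 'b'])

# pop removes column 'b' from df and returns it as a Series of dicts
# (assumption: every entry in 'b' is a dict)
b_cols = pd.DataFrame(df.pop('b').tolist(), index=df.index).add_prefix('b.')

# Join the expanded columns back and keep only the ones asked for
result = df.join(b_cols)[['ca1', 'a', 'b.b1']]
print(result)
```

Selecting the columns at the end is optional; leave it out if you also want b.b2.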

0

This is the workaround I eventually used:

data=[
    {'a':'aa',
    'b':{'b1':'bb1','b2':'bb2'},
    'c':[{
        'ca':[{'ca1':'caa1'
              }]
    }]
    }]
  

df = pd.json_normalize(data, record_path=['c', 'ca'], meta=['a', ['b']])

# Expand the dict stored in column 'b' into separate columns, then drop 'b'
df = pd.concat([df, pd.json_normalize(df['b'])], axis=1)
df.drop(columns='b', inplace=True)

I still think there should be a better way, but it works
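
If a plain-Python step is acceptable, one possible alternative is to build the rows by hand and skip json_normalize altogether. This is only a sketch, not a pandas feature: the names rows and flat are made up for illustration, and it assumes every record has the keys a, b.b1, c and ca (a missing key would raise a KeyError).

```python
import pandas as pd

data = [
    {
        'a': 'aa',
        'b': {'b1': 'bb1', 'b2': 'bb2'},
        'c': [{'ca': [{'ca1': 'caa1'}]}],
    }
]

# One row per innermost 'ca' record, carrying the parent fields along.
# Assumption: all of these keys exist in every record.
rows = [
    {**leaf, 'a': rec['a'], 'b.b1': rec['b']['b1']}
    for rec in data
    for c_item in rec['c']
    for leaf in c_item['ca']
]

flat = pd.DataFrame(rows)
print(flat)
```

The trade-off is that the nesting is spelled out explicitly, which is easy to read for a fixed structure like this one but has to be rewritten if the shape of the JSON changes.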
