Python: json_normalize a pandas series gives TypeError

Question

I have tens of thousands of rows of JSON snippets like this in a pandas Series, df["json"]:

[{
    'IDs': [{
        'lotId': '1',
        'Id': '123456'
    }],
    'date': '2009-04-17',
    'bidsCount': 2,
}, {
    'IDs': [{
        'lotId': '2',
        'Id': '123456'
    }],
    'date': '2009-04-17',
    'bidsCount': 4,
}, {
    'IDs': [{
         'lotId': '3',
         'Id': '123456'
    }],
    'date': '2009-04-17',
    'bidsCount': 8,
}]

Sample of the original file:

{"type": "OPEN","title": "rainbow","json": [{"IDs": [{"lotId": "1","Id": "123456"}],"date": "2009-04-17","bidsCount": 2}, {"IDs": [{"lotId": "2","Id": "123456"}],"date": "2009-04-17","bidsCount": 4}, {"IDs": [{"lotId": "3","Id": "123456"}],"date": "2009-04-17","bidsCount": 8}]}
{"type": "CLOSED","title": "clouds","json": [{"IDs": [{"lotId": "1","Id": "23345"}],"date": "2009-05-17","bidsCount": 2}, {"IDs": [{"lotId": "2","Id": "23345"}],"date": "2009-05-17","bidsCount": 4}, {"IDs": [{"lotId": "3","Id": "23345"}],"date": "2009-05-17","bidsCount": 8}]}


df = pd.read_json("file.json", lines=True)

I am trying to make them into a data frame, something like

Id      lotId      bidsCount    date
123456  1          2            2009-04-17
123456  2          4            2009-04-17
123456  3          8            2009-04-17

by using

json_normalize(df["json"])

However I get

AttributeError: 'list' object has no attribute 'values'

I guess each JSON snippet is seen as a list, but I cannot figure out how to make it work otherwise. Help appreciated!

Please paste your data frame's head here. Is your json column a string?
– cs95, Commented Jul 26, 2017 at 11:28
@zufanka First of all, as the documentation says, the input to json_normalize should be a dict or a list of dicts. Then you could do result = json_normalize(data, 'IDs', ['date', 'bidsCount']) to get your desired result. I did the same in my answer; I don't know why people like to downvote. Hope this helps.
– user2906838, Commented Jul 26, 2017 at 11:54
I create the df from an enormous JSON file through pd.read_json("file.json", lines=True). The json column is one of the file's nested parts, not a string. I can try to recreate the file, as the data is confidential, if that would help.
– zufanka, Commented Jul 26, 2017 at 11:55
@zufanka, yes. Just check type(df['json']) to make sure that it is a dict or a list of dicts that json_normalize() can work with. If you could tell how you're creating df['json'], that would help. You don't need to recreate the whole data; just a sample would be great.
– user2906838, Commented Jul 26, 2017 at 11:59
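
A note on the type check suggested above: type(df['json']) reports the Series itself, so it is usually more informative to look at the type of each cell. The following is a minimal sketch, assuming pandas 1.0 or later; the small DataFrame is a hypothetical stand-in for the one built by pd.read_json, not the asker's real data.

```python
import pandas as pd

# Hypothetical stand-in for the "json" column produced by
# pd.read_json("file.json", lines=True): each cell is a list of dicts.
df = pd.DataFrame({
    "json": [
        [{"IDs": [{"lotId": "1", "Id": "123456"}], "date": "2009-04-17", "bidsCount": 2}],
        [{"IDs": [{"lotId": "1", "Id": "23345"}], "date": "2009-05-17", "bidsCount": 2}],
    ]
})

# Count how many cells hold each Python type (list, dict, str, float for NaN, ...).
cell_types = df["json"].map(type).value_counts()
print(cell_types)
```

If every cell turns out to be a list, the column is a Series of lists rather than a single list of dicts, which would be consistent with the error in the question.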

Bharath M Shetty · Accepted Answer · 2017-07-26 13:10:13Z

18

I think each cell of df['json'] holds a list of dicts, so the column as a whole is a nested list. You can loop over the cells, normalize each one, and concatenate the results into one big DataFrame:

Data:

{"type": "OPEN","title": "rainbow","json": [{"IDs": [{"lotId": "1","Id": "123456"}],"date": "2009-04-17","bidsCount": 2}, {"IDs": [{"lotId": "2","Id": "123456"}],"date": "2009-04-17","bidsCount": 4}, {"IDs": [{"lotId": "3","Id": "123456"}],"date": "2009-04-17","bidsCount": 8}]}
{"type": "CLOSED","title": "clouds","json": [{"IDs": [{"lotId": "1","Id": "23345"}],"date": "2009-05-17","bidsCount": 2}, {"IDs": [{"lotId": "2","Id": "23345"}],"date": "2009-05-17","bidsCount": 4}, {"IDs": [{"lotId": "3","Id": "23345"}],"date": "2009-05-17","bidsCount": 8}]}

df = pd.read_json("file.json", lines=True)

DataFrame:

from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize
new_df = pd.concat([json_normalize(x) for x in df['json']], ignore_index=True)

Output:

                                IDs  bidsCount        date
0  [{'Id': '123456', 'lotId': '1'}]          2  2009-04-17
1  [{'Id': '123456', 'lotId': '2'}]          4  2009-04-17
2  [{'Id': '123456', 'lotId': '3'}]          8  2009-04-17
3   [{'Id': '23345', 'lotId': '1'}]          2  2009-05-17
4   [{'Id': '23345', 'lotId': '2'}]          4  2009-05-17
5   [{'Id': '23345', 'lotId': '3'}]          8  2009-05-17

If you want the keys of IDs as columns, you can use

new_df['lotId'] = [x[0]['lotId'] for x in new_df['IDs']]
new_df['IDs'] = [x[0]['Id'] for x in new_df['IDs']]

      IDs  bidsCount        date lotId
0  123456          2  2009-04-17     1
1  123456          4  2009-04-17     2
2  123456          8  2009-04-17     3
3   23345          2  2009-05-17     1
4   23345          4  2009-05-17     2
5   23345          8  2009-05-17     3
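
As an alternative to indexing into IDs by hand, the record_path and meta arguments of json_normalize (the approach suggested in the comments on the question) can expand the nested list directly. This is a minimal sketch, assuming pandas 1.0 or later where pd.json_normalize is available at the top level; the records list is a hypothetical copy of one cell of df['json'].

```python
import pandas as pd

# Hypothetical copy of a single cell of df['json']: a list of dicts,
# each with a nested "IDs" list.
records = [
    {"IDs": [{"lotId": "1", "Id": "123456"}], "date": "2009-04-17", "bidsCount": 2},
    {"IDs": [{"lotId": "2", "Id": "123456"}], "date": "2009-04-17", "bidsCount": 4},
    {"IDs": [{"lotId": "3", "Id": "123456"}], "date": "2009-04-17", "bidsCount": 8},
]

# record_path names the nested list to expand into rows;
# meta lists the parent-level keys to repeat on each expanded row.
flat = pd.json_normalize(records, record_path="IDs", meta=["date", "bidsCount"])
print(flat)
```

Each entry of IDs becomes its own row here, so this also covers records whose IDs list has more than one element, which the x[0] indexing above would silently truncate.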

edited Jul 26, 2017 at 13:10

answered Jul 26, 2017 at 11:50


3 Comments

zufanka Over a year ago

Does exactly what I need, many thanks! I just needed to add df['json'].dropna(), as some of the data is missing.
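
The dropna() adjustment mentioned here might look like the following. It is a minimal sketch with a hypothetical three-row DataFrame in which one cell is missing, assuming pandas 1.0 or later.

```python
import pandas as pd

# Hypothetical data: the middle row has no "json" value.
df = pd.DataFrame({
    "json": [
        [{"IDs": [{"lotId": "1", "Id": "123456"}], "date": "2009-04-17", "bidsCount": 2}],
        None,
        [{"IDs": [{"lotId": "1", "Id": "23345"}], "date": "2009-05-17", "bidsCount": 4}],
    ]
})

# Skip missing cells before normalizing; json_normalize cannot process None/NaN.
frames = [pd.json_normalize(cell) for cell in df["json"].dropna()]
new_df = pd.concat(frames, ignore_index=True)
print(new_df)
```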

Bharath M Shetty Over a year ago

Glad it helped!

Moh Over a year ago

Any more efficient approaches to this?
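
One possible approach, offered as an untimed sketch rather than a measured improvement: flatten all the per-row lists into one list first and call json_normalize once, instead of building and concatenating one DataFrame per row. It assumes pandas 1.0 or later, and the DataFrame below is a hypothetical stand-in for the real data.

```python
from itertools import chain

import pandas as pd

# Hypothetical data: each cell is a list of dicts; one cell is missing.
df = pd.DataFrame({
    "json": [
        [
            {"IDs": [{"lotId": "1", "Id": "123456"}], "date": "2009-04-17", "bidsCount": 2},
            {"IDs": [{"lotId": "2", "Id": "123456"}], "date": "2009-04-17", "bidsCount": 4},
        ],
        None,
        [
            {"IDs": [{"lotId": "1", "Id": "23345"}], "date": "2009-05-17", "bidsCount": 2},
            {"IDs": [{"lotId": "2", "Id": "23345"}], "date": "2009-05-17", "bidsCount": 4},
        ],
    ]
})

# Flatten the Series of lists into a single list of dicts, skipping missing cells.
all_records = list(chain.from_iterable(df["json"].dropna()))

# Normalize everything in one call instead of once per row.
flat = pd.json_normalize(all_records, record_path="IDs", meta=["date", "bidsCount"])
print(flat)
```

Whether this is actually faster depends on the data and the pandas version, so it is worth timing both variants on a sample of the real file.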
