How to split a column with json string into their own columns

Question

I have a dataframe like (with one example row):

import pandas as pd

raw_data = [{'id': 1, 'name': 'FRANK', 'attributes': '{"deleted": false, "rejected": true, "handled": true, "order": "37"}'}]
raw_df = pd.DataFrame(raw_data)

I would like to split the JSON in the attributes column into separate columns, one per key, each holding that key's value, so that the resulting dataframe looks like:

new_data = [{'id': 1, 'name': 'FRANK', 'deleted': 'false', 'rejected': 'true', 'handled': 'true', 'order': 37}]
new_df = pd.DataFrame(new_data)

Is there a way I can break up the json to achieve this? Thanks!

How did you end up with raw_df? Maybe read_json can help. Also json_normalize.
– Quang Hoang, Commented Feb 22, 2023 at 16:25
This is an example line from a CSV that I am reading using pd.read_csv.
– Angie, Commented Feb 22, 2023 at 16:41

Corralien · Accepted Answer · 2023-02-22 20:33:04Z

You can parse each JSON string with json.loads, then use pd.json_normalize to expand the resulting dicts into a dataframe.
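As a minimal sketch of those two steps on the sample row from the question, assuming no missing values in the attributes column (the intermediate names parsed and attr_df are chosen here for illustration):

```python
import json
import pandas as pd

raw_df = pd.DataFrame([
    {'id': 1, 'name': 'FRANK',
     'attributes': '{"deleted": false, "rejected": true, "handled": true, "order": "37"}'},
])

# Step 1: parse each JSON string into a Python dict.
parsed = raw_df['attributes'].map(json.loads)

# Step 2: expand the dicts into one column per key.
attr_df = pd.json_normalize(parsed.tolist())

# Step 3: attach the new columns in place of the original string column.
new_df = pd.concat([raw_df.drop(columns='attributes'), attr_df], axis=1)
print(new_df)
```

Note that json.loads turns the JSON literals false/true into Python booleans, while "37" stays a string because it is quoted in the JSON.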

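If the attributes column contains NaN values (in your case, df['attributes'].isna().sum() gives back 66), json.loads receives a float instead of a string and raises TypeError: the JSON object must be str, bytes or bytearray, not float. Rows with NaN cannot be parsed as they are.

To fix this, replace NaN with the empty-dict string '{}' before parsing: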

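import json
import pandas as pd

# Fill missing values with an empty JSON object, parse each string, then expand the dicts into columns
attr_df = pd.json_normalize(raw_df.pop('attributes').fillna('{}').map(json.loads))
# Attach the expanded columns to the remaining columns
new_df = pd.concat([raw_df, attr_df], axis=1)
print(new_df)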

# Output
   id   name  deleted  rejected  handled order
0   1  FRANK    False      True     True    37
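Two details may matter on real data. pd.concat with axis=1 aligns rows on the index, and json_normalize returns a fresh 0..n-1 index, so a frame with a different index (for example after filtering) could end up misaligned. The order values also remain strings, whereas the question's expected output shows a number. The following is a sketch under those assumptions, not part of the original answer; the second row and the custom index are hypothetical sample data:

```python
import json
import pandas as pd

# Hypothetical sample: the second row has a missing attributes value,
# and the index is deliberately not the default 0..n-1.
raw_df = pd.DataFrame(
    {
        'id': [1, 2],
        'name': ['FRANK', 'ALICE'],
        'attributes': [
            '{"deleted": false, "rejected": true, "handled": true, "order": "37"}',
            None,
        ],
    },
    index=[10, 20],
)

# Missing values become empty JSON objects, which parse to empty dicts.
parsed = raw_df['attributes'].fillna('{}').map(json.loads)

attr_df = pd.json_normalize(parsed.tolist())
attr_df.index = raw_df.index  # reuse the original index so concat aligns rows

new_df = pd.concat([raw_df.drop(columns='attributes'), attr_df], axis=1)

# Convert the quoted "37" to a number; rows without attributes stay NaN.
new_df['order'] = pd.to_numeric(new_df['order'])
print(new_df)
```

Rows that had no attributes keep NaN in every expanded column, so the order column becomes a float column rather than an integer one.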

edited Feb 22, 2023 at 20:33

answered Feb 22, 2023 at 16:27


5 Comments

Angie Over a year ago

I am getting this error: TypeError: the JSON object must be str, bytes or bytearray, not float. Do I have to convert the column to a string somehow? My raw data is formatted this way:

raw_data = [{'id': 1, 'name': 'FRANK', 'attributes': '{\n "deleted": false,\n "rejected": true,\n "handled": true,\n "order": "37"}'}]

and then I turned it into a dataframe using pd.DataFrame(raw_data).

Corralien Over a year ago

I think you have NaN values in the attributes column. What is the output of print(df['attributes'].isna().sum())?

Angie Over a year ago

Ah yes, it gives back 66 for the entire column.

Angie Over a year ago

Am I not able to split the JSON if some of the rows in that column are NaN?

Corralien Over a year ago

No, you can't. A possible solution is to change the code to attr_df = pd.json_normalize(raw_df.pop('attributes').fillna('{}').map(json.loads)), which replaces NaN with the empty-dict string '{}'.
