I have a DataFrame like this (one example row shown):

import pandas as pd

raw_data = [{'id': 1, 'name': 'FRANK', 'attributes': '{"deleted": false, "rejected": true, "handled": true, "order": "37"}'}]
raw_df = pd.DataFrame(raw_data)

I would like to split the JSON in the attributes column into separate columns, one per key, each holding the corresponding value, so that the resulting DataFrame looks like this:

new_data = [{'id': 1, 'name': 'FRANK', 'deleted': 'false', 'rejected': 'true', 'handled': 'true', 'order': 37}]
new_df = pd.DataFrame(new_data)

Is there a way to break up the JSON to achieve this? Thanks!

  • How did you end up with raw_df? Maybe read_json can help. Also json_normalize. Commented Feb 22, 2023 at 16:25
  • This is an example line from a csv that I am reading using pd.read_csv Commented Feb 22, 2023 at 16:41
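
Since the data comes from pd.read_csv, one option worth trying is to parse the JSON while reading, through the converters argument. The following is a minimal sketch under assumptions, not a tested recipe for the real file: the in-memory CSV, the second row (ALICE) and the helper name parse_attributes are made up for illustration. It relies on converters receiving the raw cell text, so an empty cell arrives as an empty string rather than NaN.

```python
import io
import json

import pandas as pd


def parse_attributes(cell):
    """Hypothetical helper: parse a JSON cell, treating an empty cell as {}."""
    return json.loads(cell) if cell else {}


# Build a small in-memory CSV so the example is self-contained.
# The second row has no attributes value, so its cell is written as empty.
sample = pd.DataFrame([
    {"id": 1, "name": "FRANK",
     "attributes": '{"deleted": false, "rejected": true, "handled": true, "order": "37"}'},
    {"id": 2, "name": "ALICE", "attributes": None},
])
csv_text = sample.to_csv(index=False)

# converters applies parse_attributes to every cell of the attributes column.
df = pd.read_csv(io.StringIO(csv_text), converters={"attributes": parse_attributes})

# Expand the parsed dicts into columns and attach them to the remaining columns.
attr_df = pd.json_normalize(df.pop("attributes").tolist())
result = pd.concat([df, attr_df], axis=1)
print(result)
```

With this approach the attributes column never contains NaN, so no fillna step is needed afterwards.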

1 Answer

You can parse each JSON string with json.loads, then use pd.json_normalize to expand the resulting dicts into a DataFrame.

If the attributes column contains NaN values (here, 66 rows), json.loads fails with a TypeError, because NaN is a float and not a string. To fix this, replace NaN with the empty-dict string '{}' before parsing:

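import json
import pandas as pd

# Replace NaN with '{}', parse each JSON string, then expand the dicts into columns
attr_df = pd.json_normalize(raw_df.pop('attributes').fillna('{}').map(json.loads))
# Put the new columns next to the remaining original columns
new_df = pd.concat([raw_df, attr_df], axis=1)
print(new_df)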

# Output
   id   name  deleted  rejected  handled order
0   1  FRANK    False      True     True    37
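
To see how the fillna step behaves when some rows are missing, here is a small self-contained sketch. The second row (ALICE) is made up for illustration, and the parsed values are passed to pd.json_normalize as a plain list, which is a choice made here rather than something the code above requires.

```python
import json

import numpy as np
import pandas as pd

# Hypothetical sample: the second row has no attributes value.
raw_df = pd.DataFrame([
    {"id": 1, "name": "FRANK",
     "attributes": '{"deleted": false, "rejected": true, "handled": true, "order": "37"}'},
    {"id": 2, "name": "ALICE", "attributes": np.nan},
])

# Replace NaN with '{}' so that json.loads always receives a string.
parsed = raw_df["attributes"].fillna("{}").map(json.loads)

# Expand the dicts into columns; an empty dict becomes a row of missing values.
attr_df = pd.json_normalize(parsed.tolist())

# Both frames use the default 0..n-1 index, so concat lines the rows up by position.
new_df = pd.concat([raw_df.drop(columns="attributes"), attr_df], axis=1)
print(new_df)
```

Rows without attributes end up with missing values in every expanded column. Note that pd.concat aligns on the index, so if raw_df has a non-default index it is probably safest to call reset_index(drop=True) first.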

5 Comments

I am getting this error: TypeError: the JSON object must be str, bytes or bytearray, not float. Do I have to convert the column to a string somehow? My raw data is formatted like this: raw_data = [{'id': 1, 'name': 'FRANK', 'attributes': '{\n "deleted": false,\n "rejected": true,\n "handled": true,\n "order": "37"}'}], and I then turned it into a DataFrame using pd.DataFrame(raw_data).
I think you have NaN values in the attributes column. What is the output of print(df['attributes'].isna().sum())?
Ah yes, it gives back 66 for the entire column.
Am I not able to split the JSON if some of the rows in that column are NaN?
No, you can't. A possible solution is to change the code to attr_df = pd.json_normalize(raw_df.pop('attributes').fillna('{}').map(json.loads)), which replaces NaN with the empty-dict string '{}'.
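
An alternative to fillna, sketched here under assumptions, is to guard the parsing step itself so that any non-string cell (such as NaN) yields an empty dict. The helper name parse_attributes and the second sample row are hypothetical. The last step converts order to a nullable integer column, since the question's target output shows the number 37 rather than the string "37"; that conversion is an optional extra and not part of the answer above.

```python
import json

import pandas as pd


def parse_attributes(value):
    """Hypothetical helper: parse a JSON string, or return {} for non-strings such as NaN."""
    return json.loads(value) if isinstance(value, str) else {}


# Sample data using the multi-line JSON layout from the comment, plus a made-up NaN row.
raw_df = pd.DataFrame([
    {"id": 1, "name": "FRANK",
     "attributes": '{\n "deleted": false,\n "rejected": true,\n "handled": true,\n "order": "37"}'},
    {"id": 2, "name": "ALICE", "attributes": float("nan")},
])

# map passes NaN through to the helper, which turns it into {}.
attr_df = pd.json_normalize(raw_df["attributes"].map(parse_attributes).tolist())
new_df = pd.concat([raw_df.drop(columns="attributes"), attr_df], axis=1)

# Optional: turn the "37" strings into integers, keeping missing rows as <NA>.
new_df["order"] = pd.to_numeric(new_df["order"]).astype("Int64")
print(new_df)
```

The newlines inside the JSON string are plain whitespace to json.loads, so the multi-line layout needs no special handling.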
