I have a pandas DataFrame column that contains JSON strings in the following format:

{"events": null, "game": "yes", "catch": "yes", "throw": null}

{"events": null, "game": "yes", "catch": "no", "throw": null}

print(df_merge['PDH_Value'].head().to_dict())

{0: '{"roth": null, "pretax": null, "catchup": "false", "aftertax": null}', 1: '{"roth": null, "pretax": "true", "catchup": "true", "aftertax": null}', 2: '', 3: '{"roth": null, "pretax": "true", "catchup": "true", "aftertax": null}', 4: '{"roth": "true", "pretax": "true", "catchup": "true", "aftertax": "true"}'}

I want to extract only the catch value (i.e. "yes" or "no") from each JSON string and store it in the same DataFrame column.

Desired output:

df_merge['PDH_Value']

true

false

I tried the following code and got the error below:

pd.json_normalize(df_merge['PDH_Value'].apply(json.loads))['catch']

raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
  • Hi. Could you please tell us the name of the column holding the JSON data? I tried your version of the code and everything works fine: df = pd.DataFrame({"col": ['{"events": null, "game": "yes", "catch": "yes", "throw": null}', '{"events": null, "game": "yes", "catch": "no", "throw": null}']}) df['col2'] = pd.json_normalize(df['col'].apply(json.loads))['catch'] After that, df['col2'] has yes and no values. Commented Jun 30, 2021 at 13:04
  • df_merge is my dataframe name. df_merge['PDH_Value'] is the column name in the dataframe. I have posted the head of the column as a dict in the question as well. Commented Jun 30, 2021 at 13:21
  • As far as I can see, your JSON does not have a catch property but has catchup. Is that OK? Commented Jun 30, 2021 at 13:33
  • Also, it looks like not all rows in the dataframe have valid JSON strings in that column. Please check: maybe some of them are empty? Commented Jun 30, 2021 at 13:36
  • Yes, you are right: catch and catchup are the same. Some rows may be empty. Commented Jun 30, 2021 at 13:53
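
The empty rows mentioned in the comments are enough to explain the error. As a minimal sketch using only the standard library: json.loads rejects an empty string with exactly the message from the question, while the string 'null' is valid JSON and parses to None.

```python
import json

# An empty string is not a valid JSON document
try:
    json.loads('')
    error_message = None
except json.JSONDecodeError as exc:
    error_message = str(exc)

print(error_message)       # Expecting value: line 1 column 1 (char 0)
print(json.loads('null'))  # None
```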

2 Answers

from ast import literal_eval

Let's start from here:

data=df_merge['PDH_Value'].to_dict()
data={k:v.replace('null','"null"') for k,v in data.items()} 
df_merge['PDH_Value']=pd.Series(data)

Explanation:

  • The code above converts df_merge['PDH_Value'] to a dictionary whose values are strings.
  • It then replaces null with "null" inside each string, because null is not a Python literal and literal_eval() cannot convert the string to a real dict otherwise.
  • Finally, it builds a Series from that dictionary and assigns it back to df_merge['PDH_Value'].
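
To see why the replacement matters, here is a minimal sketch on a single made-up string (the sample value is an assumption, not taken from the question's data). literal_eval() raises ValueError on the bare null and succeeds once it is quoted.

```python
from ast import literal_eval

raw = '{"catchup": "true", "roth": null}'  # hypothetical sample string

# literal_eval only understands Python literals; JSON's null is not one
try:
    literal_eval(raw)
    failed = False
except ValueError:
    failed = True

# Quote null so the string becomes a valid Python dict literal
patched = raw.replace('null', '"null"')
parsed = literal_eval(patched)

print(failed)  # True
print(parsed)  # {'catchup': 'true', 'roth': 'null'}
```

Note that a plain substring replace would also rewrite any string value that happens to contain the text null, so this is only safe when the data is known not to contain such values.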

Then:

df_merge['PDH_Value']=df_merge['PDH_Value'].where(df_merge['PDH_Value'].str.startswith('{'),"{'catchup':'None'}")   
    
df_merge['PDH_Value']=df_merge['PDH_Value'].astype(str).map(lambda x:literal_eval(x) if x!='nan' else float('NaN'))
    
df_merge['PDH_Value']=df_merge['PDH_Value'].map(lambda x:x['catchup'])

Explanation:

  • The values of df_merge['PDH_Value'] are still strings, so we check whether each one starts with {. If it does, it is left unchanged; if it does not, it is replaced with "{'catchup':'None'}". In other words, the empty string '' becomes "{'catchup':'None'}", because you are only interested in 'catchup'.

  • astype(str) then ensures every value is a string, and map() passes each one to literal_eval(), so the strings in df_merge['PDH_Value'] become actual dictionaries.

  • Now that each value is a dictionary, a second map() grabs the value of the 'catchup' key.
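
The steps so far can be sketched end to end on three rows copied from the question's sample data. This is an illustration rather than a drop-in replacement: it uses Series.str.replace for the null substitution instead of the dictionary round trip shown earlier, and it assumes every cell is a string.

```python
import pandas as pd
from ast import literal_eval

s = pd.Series([
    '{"roth": null, "pretax": null, "catchup": "false", "aftertax": null}',
    '',
    '{"roth": "true", "pretax": "true", "catchup": "true", "aftertax": "true"}',
])

# Quote null so literal_eval can parse the strings
s = s.str.replace('null', '"null"', regex=False)

# Anything that does not look like a dict gets a placeholder dict
s = s.where(s.str.startswith('{'), "{'catchup':'None'}")

# Turn each string into a real dict, then pull out 'catchup'
catchup = s.map(literal_eval).map(lambda d: d['catchup'])

print(catchup.tolist())  # ['false', 'None', 'true']
```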

Finally:

df_merge['PDH_Value']=df_merge['PDH_Value'].str.title().map({'True':'yes','False':'no','None':float('nan')})

Explanation:

  • str.title() capitalizes the first letter and lowercases the rest, which is why the mapping keys are 'True' and 'False' with an uppercase T and F.

  • Since you just want yes and no, you can drop str.title() if you are sure all values inside the dicts are lowercase, so it becomes:

df_merge['PDH_Value']=df_merge['PDH_Value'].map({'true':'yes','false':'no','None':float('nan')})

  • Finally, map() translates the values, much like replace() (which you can use instead): true becomes yes, false becomes no, and None becomes NaN.
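
One difference between the two is worth keeping in mind when choosing. In this small sketch (the value 'maybe' is a made-up example of unexpected data), map() turns anything missing from the dictionary into NaN, whereas replace() leaves unmatched values untouched.

```python
import pandas as pd

s = pd.Series(['true', 'false', 'None', 'maybe'])  # 'maybe' is hypothetical

# map: values not found in the dict become NaN
mapped = s.map({'true': 'yes', 'false': 'no', 'None': float('nan')})

# replace: values not found in the dict are kept as they are
replaced = s.replace({'true': 'yes', 'false': 'no'})

print(mapped.isna().tolist())  # [False, False, True, True]
print(replaced.tolist())       # ['yes', 'no', 'None', 'maybe']
```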

18 Comments

I tried it via map and got an error: df_merge['PDH_Value']=df_merge['PDH_Value'].map(lambda x:x['catchup']) TypeError: string indices must be integers
I have added the real data to the question.
Can you please explain what we are doing?
data={k:v.replace(null,"'null'") for k,v in data.items()} NameError: name 'null' is not defined
@Karthikchengalvarayan Then skip this line, proceed further, and see whether you still get an error.

My approach is similar to @Anurag Dabas's, but uses json.loads:

import pandas as pd
import numpy as np
import json

# sample data
# note: '' is not valid JSON (json.loads raises JSONDecodeError),
# while 'null' is valid JSON and parses to None
json_data = {
    "PDH_Value": [
        '{"roth": null, "pretax": null, "catchup": "false", "aftertax": null}', 
        '{"roth": null, "pretax": "true", "catchup": "true", "aftertax": null}',
        '',
        '{"roth": null, "pretax": "true", "catchup": "true", "aftertax": null}',
        '{"roth": "true", "pretax": "true", "catchup": "true", "aftertax": "true"}',
        '{"roth": "true", "pretax": "true", "aftertax": "true"}',
        'null'
    ]}
df_merge = pd.DataFrame(json_data)

# replace empty strings or whitespace strings with NaN...
df_merge['PDH_Value'] = df_merge['PDH_Value'].replace(r'^\s*$', np.nan, regex=True)

# replace NaNs with a valid JSON object whose "catchup" is null
df_merge['PDH_Value'] = df_merge['PDH_Value'].fillna('{"catchup": null}')

# parse json values in the columns
df_merge['PDH_Value'] = df_merge['PDH_Value'].apply(json.loads)

# select only the "catchup" property if `x` is a dict that has it, else None
df_merge['PDH_Value'] = df_merge['PDH_Value'].apply(lambda x: x['catchup'] if isinstance(x, dict) and 'catchup' in x else None)

print(df_merge)

>>>          PDH_Value
>>>   0      false
>>>   1      true
>>>   2      None
>>>   3      true
>>>   4      true
>>>   5      None
>>>   6      None
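
If the column may also hold non-string cells (None, NaN) or malformed JSON, the same idea can be folded into one function. This is only a sketch under that assumption; extract_catchup is a hypothetical helper name, not part of pandas or json.

```python
import json
import pandas as pd

def extract_catchup(raw):
    """Return the 'catchup' value of a JSON string, or None if unavailable."""
    if not isinstance(raw, str):
        return None  # None, NaN or any other non-string cell
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None  # '' or otherwise invalid JSON
    if isinstance(parsed, dict):
        return parsed.get('catchup')  # None when the key is missing
    return None  # valid JSON that is not an object, e.g. 'null'

s = pd.Series([
    '{"roth": null, "pretax": null, "catchup": "false", "aftertax": null}',
    '',
    '{"roth": "true", "pretax": "true", "aftertax": "true"}',
    'null',
    None,
])
result = s.map(extract_catchup)
print(result)
```

Only the first row yields a value here ("false"); the other four rows come back as missing.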

3 Comments

Is the following code right? I already have the data in the dataframe column df_merge['PDH_Value'] rather than in json_data, so how should I proceed in my case? df_merge['PDH_Value'] = pd.DataFrame(df_merge['PDH_Value']), followed by the rest of your code.
@Karthikchengalvarayan No, just skip the first lines and use the code starting from the # replace empty strings or whitespace strings with NaN... comment.
TypeError: object of type 'NoneType' has no len()
