I have a dataset loaded from a JSON file, in this format:

data = {'data': {'content': [{'gender': 'Female',
    'id': 'covid-1004200003256',
    'state_code': '3272',
    'district_code': '3272040',
    'subdistrict_code': '3272040004',
    'latitude': -6.906,
    'longitude': 106.923,
    'state_name': 'KOTA SUKABUMI',
    'district_name': 'Gunungpuyuh',
    'subdistrict_name': 'Karamat',
    'stage': 'Isolated',
    'status': 'SUSPECT'},
   {'gender': 'Female',
    'id': 'covid-1004200003255',
    'state_code': '3272',
    'district_code': '3272040',
    'subdistrict_code': '3272040004',
    'latitude': -6.906,
    'longitude': 106.923,
    'state_name': 'KOTA SUKABUMI',
    'district_name': 'Gunungpuyuh',
    'subdistrict_name': 'Karamat',
    'stage': 'Isolated',
    'status': 'SUSPECT',
    }]}}

I want to build a DataFrame from it using json_normalize:

df = pd.json_normalize(data, 'content')
df.head(10)

But it raises:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-36-4d8ad8c8743a> in <module>()
----> 1 df = pd.json_normalize(data, 'content')
      2 df.head(10)

3 frames
/usr/local/lib/python3.6/dist-packages/pandas/io/json/_normalize.py in _json_normalize(data, record_path, meta, meta_prefix, record_prefix, errors, sep, max_level)
    334                 records.extend(recs)
    335 
--> 336     _recursive_extract(data, record_path, {}, level=0)
    337 
    338     result = DataFrame(records)

/usr/local/lib/python3.6/dist-packages/pandas/io/json/_normalize.py in _recursive_extract(data, path, seen_meta, level)
    307         else:
    308             for obj in data:
--> 309                 recs = _pull_records(obj, path[0])
    310                 recs = [
    311                     nested_to_record(r, sep=sep, max_level=max_level)

/usr/local/lib/python3.6/dist-packages/pandas/io/json/_normalize.py in _pull_records(js, spec)
    246         if has non iterable value.
    247         """
--> 248         result = _pull_field(js, spec)
    249 
    250         # GH 31507 GH 30145, GH 26284 if result is not list, raise TypeError if not

/usr/local/lib/python3.6/dist-packages/pandas/io/json/_normalize.py in _pull_field(js, spec)
    237                 result = result[field]
    238         else:
--> 239             result = result[spec]
    240         return result
    241 

KeyError: 'content'

Any ideas how to fix this?

2 Answers

Your command fails because content is not a top-level key of data; it is nested one level down, under data['data']. A string record_path is looked up directly on the object you pass in, so pandas raises KeyError: 'content'.

Pass data['data'] instead, like below:

In [934]: df = pd.json_normalize(data['data'], 'content')

In [934]: df
Out[934]: 
   gender                   id state_code district_code subdistrict_code  latitude  longitude     state_name district_name subdistrict_name     stage   status
0  Female  covid-1004200003256       3272       3272040       3272040004    -6.906    106.923  KOTA SUKABUMI   Gunungpuyuh          Karamat  Isolated  SUSPECT
1  Female  covid-1004200003255       3272       3272040       3272040004    -6.906    106.923  KOTA SUKABUMI   Gunungpuyuh          Karamat  Isolated  SUSPECT
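
As an alternative sketch, record_path can also be given as a list of keys, which pandas follows one level at a time starting from the top of the object. The example below uses a trimmed copy of the question's data (same nesting, fewer fields, which is an assumption made for brevity) and assumes a pandas version that provides pd.json_normalize (1.0 or later):

```python
import pandas as pd

# Trimmed copy of the question's data: same nesting, fewer fields.
data = {
    "data": {
        "content": [
            {"id": "covid-1004200003256", "gender": "Female", "status": "SUSPECT"},
            {"id": "covid-1004200003255", "gender": "Female", "status": "SUSPECT"},
        ]
    }
}

# A list-valued record_path is walked key by key: data -> content.
df = pd.json_normalize(data, record_path=["data", "content"])

# Same records as indexing into the outer dict first.
df_indexed = pd.json_normalize(data["data"], "content")

print(df)
```

Both calls should produce the same frame; the list form is mainly useful when you also want to pull meta fields from the outer levels.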

Try passing the array of records in directly:

df = pd.json_normalize(data['data']['content'])
