I am trying to convert a nested json array to a pandas data frame.

The data looks something like this in list format:

 [{u'analysis': {u'active': u'Y',
  u'dpv_cmra': u'N',
  u'dpv_footnotes': u'AAN1',
  u'dpv_match_code': u'D',
  u'dpv_vacant': u'N',
  u'footnotes': u'H#'},
  u'candidate_index': 0,
  u'components': 
    {u'city_name': u'City',
     u'delivery_point': u'Variable',
     u'delivery_point_check_digit': u'8',
     u'plus4_code': u'Variable',
     u'primary_number': u'Variable',
     u'state_abbreviation': u'Variable',
     u'street_name': u'Variable',
     u'street_predirection': u'Variable',
     u'street_suffix': u'Variable',
     u'zipcode': u'Variable'},
  u'delivery_line_1': u'Variable',
  u'delivery_point_barcode': u'Variable',
  u'input_id': u'Variable',
  u'input_index': Variable,
  u'last_line': u'Variable',
  u'metadata': 
    {u'building_default_indicator': u'Variable',
     u'carrier_route': u'Variable',
     u'congressional_district': u'Variable',
     u'county_fips': u'Variable',
     u'county_name': u'Variable',
     u'dst': True,
     u'zip_type': u'Variable'}}],

Any suggestions on how I can convert this to a data frame and take care of empty values? I've tried using try / except to handle the missing values, but my data frame then ends up made of tuples.

Thank You

1 Answer

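There is a json_normalize function inside pd.io.json (exposed as pd.json_normalize since pandas 1.0) that flattens nested dictionaries into dot-separated columns.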

d = {u'analysis': {u'active': u'Y', u'dpv_cmra': u'N', u'dpv_footnotes': u'AAN1', u'dpv_match_code': u'D', u'dpv_vacant': u'N', u'footnotes': u'H#'}, u'candidate_index': 0, u'components': {u'city_name': u'City', u'delivery_point': u'Variable', u'delivery_point_check_digit': u'8', u'plus4_code': u'Variable', u'primary_number': u'Variable', u'state_abbreviation': u'Variable', u'street_name': u'Variable', u'street_predirection': u'Variable', u'street_suffix': u'Variable', u'zipcode': u'Variable'}, u'delivery_line_1': u'Variable', u'delivery_point_barcode': u'Variable', u'input_id': u'Variable', u'input_index': u'Variable', u'last_line': u'Variable', u'metadata': {u'building_default_indicator': u'Variable', u'carrier_route': u'Variable', u'congressional_district': u'Variable', u'county_fips': u'Variable', u'county_name': u'Variable', u'dst': True, u'zip_type': u'Variable'}}

>>> import pandas as pd
>>> pd.io.json.json_normalize(d)
  analysis.active analysis.dpv_cmra analysis.dpv_footnotes analysis.dpv_match_code analysis.dpv_vacant analysis.footnotes  candidate_index components.city_name components.delivery_point components.delivery_point_check_digit        ...         \
0               Y                 N                   AAN1                       D                   N                 H#                0                 City                  Variable                                     8        ...          

   input_id input_index last_line metadata.building_default_indicator metadata.carrier_route metadata.congressional_district metadata.county_fips metadata.county_name metadata.dst metadata.zip_type  
0  Variable    Variable  Variable                            Variable               Variable                        Variable             Variable             Variable         True          Variable  

[1 rows x 29 columns]
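
Since the question has a list of records with some values missing, here is a minimal sketch of how that case might look. The records and their values are made up for illustration (they are not the asker's data), and it assumes pandas 1.0 or later, where the function is available as pd.json_normalize. Keys that are absent from a record should come out as NaN, which can then be filled if a default is wanted.

```python
import pandas as pd

# Hypothetical records shaped like the question's data; the second one
# lacks the "metadata" dict and one "analysis" key to mimic missing values.
records = [
    {
        "analysis": {"active": "Y", "dpv_vacant": "N"},
        "candidate_index": 0,
        "components": {"city_name": "City", "zipcode": "00000"},
        "metadata": {"county_name": "County", "dst": True},
    },
    {
        "analysis": {"active": "N"},
        "candidate_index": 1,
        "components": {"city_name": "Town", "zipcode": "11111"},
    },
]

# pd.json_normalize is the top-level name in pandas >= 1.0; older
# versions expose the same function as pd.io.json.json_normalize.
df = pd.json_normalize(records)

# Keys missing from a record become NaN; fill them if a default is wanted.
filled = df.fillna({"analysis.dpv_vacant": "unknown"})

print(sorted(df.columns))
print(filled)
```

Passing the whole list in one call gives one row per record, so there is no need for a try / except loop around each item.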

4 Comments

very cool ... once again pandas io is miles ahead of anything else
this seems to work, but I'm getting list index out of range?
never mind, straightened it out. thank you for the assistance!
too bad it can't handle arrays/lists
