Pandas: Retrieving nested data from JSON File

Question

I am parsing nested JSON data from here. Some of the files in this dataset have more than one committee_id associated with them, and I need all of the committees associated with each file. I'm not sure, but I imagine that means writing a new row for each committee_id. My code follows:

import os.path
import csv
import json

path = '/home/jayaramdas/anaconda3/Thesis/govtrack/bills109/hr'
dirs = os.listdir(path)
outputfile = open('df/h109_s_b', 'w', newline='')                            
outputwriter = csv.writer(outputfile)

for dir in dirs:
    with open(path + "/" + dir + "/data.json", "r") as f:
        data = json.load(f)

        a = data['introduced_at']
        b = data['bill_id']
        c = data['sponsor']['thomas_id']
        d = data['sponsor']['state']
        e = data['sponsor']['name']
        f = data['sponsor']['type']
        i = data['subjects_top_term']   
        j = data['official_title']               

        if data['committees']:
            g = data['committees'][0]['committee_id']
        else:
            g = "None"                      
    outputwriter.writerow([a, b, c, d, e, f, g, i, j])
outputfile.close()

The problem I am having is that my code is only collecting the first committee_id listed. For example, file hr145 looks like this:

 "committees": [
{
  "activity": [
    "referral", 
    "in committee"
  ], 
  "committee": "House Transportation and Infrastructure", 
  "committee_id": "HSPW"
}, 
{
  "activity": [
    "referral"
  ], 
  "committee": "House Transportation and Infrastructure", 
  "committee_id": "HSPW", 
  "subcommittee": "Subcommittee on Economic Development, Public Buildings and Emergency Management", 
  "subcommittee_id": "13"
}, 
{
  "activity": [
    "referral", 
    "in committee"
  ], 
  "committee": "House Financial Services", 
  "committee_id": "HSBA"
}, 
{
  "activity": [
    "referral"
  ], 
  "committee": "House Financial Services", 
  "committee_id": "HSBA", 
  "subcommittee": "Subcommittee on Domestic and International Monetary Policy, Trade, and Technology", 
  "subcommittee_id": "19"
}

This is where it is a little bit tricky because I also want the subcommittee_id associated with the committee_id when the bill gets passed to a subcommittee:

bill_id     committee   subcommittee   introduced at   Thomas_id   state   name
hr145-109   HSPW        na             "2005-01-4"     73          NY      "McHugh, John M."
hr145-109   HSPW        13             "2005-01-4"     73          NY      "McHugh, John M."
hr145-109   HSBA        na             "2005-01-4"     73          NY      "McHugh, John M."
hr145-109   HSBA        19             "2005-01-4"     73          NY      "McHugh, John M."

Any ideas?

MaxU - stand with Ukraine · Accepted Answer · 2016-04-04 14:44:30Z


You can do it this way, using json_normalize to flatten the committees list into one row per entry:

In [111]: with open(fn) as f:
   .....:     data = ujson.load(f)
   .....:

In [112]: committees = pd.io.json.json_normalize(data, 'committees')

In [113]: committees
Out[113]:
             activity                                committee committee_id                            subcommittee subcommittee_id
0          [referral]                House Energy and Commerce         HSIF                                     NaN             NaN
1          [referral]                House Energy and Commerce         HSIF  Subcommittee on Energy and Air Quality              03
2          [referral]        House Education and the Workforce         HSED                                     NaN             NaN
3          [referral]                 House Financial Services         HSBA                                     NaN             NaN
4          [referral]                        House Agriculture         HSAG                                     NaN             NaN
5  [referral, markup]                          House Resources         HSII                                     NaN             NaN
6          [referral]                            House Science         HSSY                                     NaN             NaN
7          [referral]                     House Ways and Means         HSWM                                     NaN             NaN
8          [referral]  House Transportation and Infrastructure         HSPW                                     NaN             NaN
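
To carry the bill-level fields onto each committee row, as in the table the question asks for, json_normalize also takes `record_path` and `meta` arguments. The following is a minimal sketch, not the answer author's code: it assumes a recent pandas where the function is exposed as `pd.json_normalize`, and the `data` dictionary is an abbreviated, illustrative record modeled on the hr145 excerpt in the question rather than the real file.

```python
import pandas as pd

# Illustrative record shaped like one data.json file (abbreviated; the
# values are taken from the question's excerpt, not from the real file).
data = {
    "bill_id": "hr145-109",
    "introduced_at": "2005-01-4",
    "sponsor": {"thomas_id": "73", "state": "NY", "name": "McHugh, John M."},
    "committees": [
        {"activity": ["referral", "in committee"],
         "committee": "House Transportation and Infrastructure",
         "committee_id": "HSPW"},
        {"activity": ["referral"],
         "committee": "House Transportation and Infrastructure",
         "committee_id": "HSPW",
         "subcommittee_id": "13"},
        {"activity": ["referral", "in committee"],
         "committee": "House Financial Services",
         "committee_id": "HSBA"},
        {"activity": ["referral"],
         "committee": "House Financial Services",
         "committee_id": "HSBA",
         "subcommittee_id": "19"},
    ],
}

# One row per entry in "committees"; the meta fields are repeated on each row.
committees = pd.json_normalize(
    data,
    record_path="committees",
    meta=[
        "bill_id",
        "introduced_at",
        ["sponsor", "thomas_id"],
        ["sponsor", "state"],
        ["sponsor", "name"],
    ],
)

# Keep only the columns from the desired output; reindex tolerates a
# missing column (for example, when no entry has a subcommittee_id).
cols = ["bill_id", "committee_id", "subcommittee_id", "introduced_at",
        "sponsor.thomas_id", "sponsor.state", "sponsor.name"]
result = committees.reindex(columns=cols)
print(result)
```

Entries without a subcommittee get NaN in the subcommittee_id column, which can be replaced afterwards with `fillna` if a literal placeholder is preferred.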

UPDATE:

if you want to have all your data in one DF you can do it this way:

import os
import ujson
import pandas as pd

start_path = '/home/jayaramdas/anaconda3/Thesis/govtrack/bills109/hr'

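def get_merged_json(start_path):
    # Walk the directory tree and load every .json file into a list of dicts
    return [ujson.load(open(os.path.join(p, f)))
            for p, _, files in os.walk(start_path)
            for f in files
            if f.endswith('.json')
           ]

# Build the list of parsed files, then load it into a single DataFrame
data = get_merged_json(start_path)
df = pd.read_json(ujson.dumps(data))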

PS: this puts all committees for a bill into a single column as JSON data, though.
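
If one row per committee across all files is the goal, rather than a JSON column, one option is to flatten each file in plain Python before building the DataFrame. This is a sketch under assumptions, not the answer author's method: the helper name `load_committee_rows` and the column list are hypothetical, it uses the standard-library `json` module instead of ujson, and it assumes every bill sits in its own folder as `data.json`, as in the question.

```python
import json
import os

import pandas as pd

COLUMNS = ["bill_id", "introduced_at", "committee_id", "subcommittee_id"]


def load_committee_rows(start_path):
    """Return a DataFrame with one row per committee entry per data.json file.

    Hypothetical helper for illustration; bills with no committees still
    get a single row with empty committee fields.
    """
    rows = []
    for folder, _, files in os.walk(start_path):
        if "data.json" not in files:
            continue
        with open(os.path.join(folder, "data.json")) as fh:
            bill = json.load(fh)
        # Fall back to one empty entry so committee-less bills are kept
        for entry in bill.get("committees") or [{}]:
            rows.append({
                "bill_id": bill.get("bill_id"),
                "introduced_at": bill.get("introduced_at"),
                "committee_id": entry.get("committee_id"),
                "subcommittee_id": entry.get("subcommittee_id"),
            })
    return pd.DataFrame(rows, columns=COLUMNS)


# Usage (path taken from the question):
# df = load_committee_rows('/home/jayaramdas/anaconda3/Thesis/govtrack/bills109/hr')
# df.to_csv('df/h109_s_b', index=False)
```

Sponsor fields can be added to each row dictionary in the same way; they are left out here to keep the sketch short.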

edited Apr 4, 2016 at 14:44

answered Apr 4, 2016 at 12:16


6 Comments

Collective Action Over a year ago

Thanks again MaxU! I have a small question: what should fn be pointing to? Wait, I think I got it. fn = filename.

MaxU - stand with Ukraine Over a year ago

@MichaelPerdue, yes, it should be the full or relative path to your file, including its name

Collective Action Over a year ago

I have applied your code with one exception: I substituted json for ujson, because I was getting NameError: name 'ujson' is not defined. However, it is only returning one row. As fn I am using (path + "/" + dir + "/data.json", "r"). I can probably tinker with it to get it working, but do you have an idea of what is causing that?

MaxU - stand with Ukraine Over a year ago

@MichaelPerdue, the number of rows will vary depending on the number of elements in the committees list in each file

MaxU - stand with Ukraine Over a year ago

@MichaelPerdue, I've updated my answer, please check. I would also open a new question about how to expand a JSON column into multiple columns, because it might be tricky
