I am parsing nested JSON data from here. Some of the files in this data set have more than one committee_id associated with them. I need all of the committees associated with each file. I'm not sure, but I imagine that means writing a new row for each committee_id. My code follows:
import os
import csv
import json

path = '/home/jayaramdas/anaconda3/Thesis/govtrack/bills109/hr'
dirs = os.listdir(path)
outputfile = open('df/h109_s_b', 'w', newline='')
outputwriter = csv.writer(outputfile)

for bill_dir in dirs:
    # Each bill directory holds a single data.json file.
    with open(os.path.join(path, bill_dir, "data.json"), "r") as fh:
        data = json.load(fh)
    a = data['introduced_at']
    b = data['bill_id']
    c = data['sponsor']['thomas_id']
    d = data['sponsor']['state']
    e = data['sponsor']['name']
    f = data['sponsor']['type']
    i = data['subjects_top_term']
    j = data['official_title']
    if data['committees']:
        # Only the first entry in the committees list is read here.
        g = data['committees'][0]['committee_id']
    else:
        g = "None"
    outputwriter.writerow([a, b, c, d, e, f, g, i, j])

outputfile.close()
The problem I am having is that my code is only collecting the first committee_id listed. For example, file hr145 looks like this:
"committees": [
    {
        "activity": [
            "referral",
            "in committee"
        ],
        "committee": "House Transportation and Infrastructure",
        "committee_id": "HSPW"
    },
    {
        "activity": [
            "referral"
        ],
        "committee": "House Transportation and Infrastructure",
        "committee_id": "HSPW",
        "subcommittee": "Subcommittee on Economic Development, Public Buildings and Emergency Management",
        "subcommittee_id": "13"
    },
    {
        "activity": [
            "referral",
            "in committee"
        ],
        "committee": "House Financial Services",
        "committee_id": "HSBA"
    },
    {
        "activity": [
            "referral"
        ],
        "committee": "House Financial Services",
        "committee_id": "HSBA",
        "subcommittee": "Subcommittee on Domestic and International Monetary Policy, Trade, and Technology",
        "subcommittee_id": "19"
    }
]
This is where it gets a little tricky: I also want the subcommittee_id associated with each committee_id when the bill is referred to a subcommittee. The output I am after looks like this:
bill_id    committee  subcommittee  introduced_at  thomas_id  state  name
hr145-109  HSPW       na            "2005-01-4"    73         NY     "McHugh, John M."
hr145-109  HSPW       13            "2005-01-4"    73         NY     "McHugh, John M."
hr145-109  HSBA       na            "2005-01-4"    73         NY     "McHugh, John M."
hr145-109  HSBA       19            "2005-01-4"    73         NY     "McHugh, John M."
Any ideas?
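One possible approach, as a rough sketch rather than a tested solution: loop over every entry in data['committees'] instead of indexing [0], and write one row per entry. The helper name bill_rows is hypothetical, the sample dict is trimmed down from the hr145 excerpt and the desired output above, and using 'na' for a missing subcommittee_id is an assumption taken from that desired output.

```python
import csv
import sys


def bill_rows(data):
    """Yield one row per committee entry of a single bill (hypothetical helper)."""
    sponsor = data['sponsor']
    shared = [data['introduced_at'], sponsor['thomas_id'],
              sponsor['state'], sponsor['name']]
    # Assumption: a bill with no committees still gets one row of placeholders.
    committees = data.get('committees') or [{}]
    for entry in committees:
        yield [data['bill_id'],
               entry.get('committee_id', 'None'),
               # Entries without a subcommittee have no 'subcommittee_id' key.
               entry.get('subcommittee_id', 'na')] + shared


# Sample input trimmed from the hr145 excerpt; values are illustrative.
sample = {
    'bill_id': 'hr145-109',
    'introduced_at': '2005-01-4',
    'sponsor': {'thomas_id': '73', 'state': 'NY', 'name': 'McHugh, John M.'},
    'committees': [
        {'committee_id': 'HSPW'},
        {'committee_id': 'HSPW', 'subcommittee_id': '13'},
        {'committee_id': 'HSBA'},
        {'committee_id': 'HSBA', 'subcommittee_id': '19'},
    ],
}

rows = list(bill_rows(sample))
csv.writer(sys.stdout).writerows(rows)
```

In the original loop this would presumably replace the single writerow call with outputwriter.writerows(bill_rows(data)), with the remaining columns added to the shared list as needed.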