I'm trying to convert a flat CSV to a nested JSON format. This is my data:

# data.csv
company_id,company_name,income_type,income_amt
1,"Foobar Inc","royalties",5000000
2,"ACME Corp","sales",3000000
2,"ACME Corp","rent",1000000

And need to convert to the following JSON structure:

{"data": [{
            "company_id": 1,
            "company_name": "Foobar Inc",
            "income": {"royalties": 5000000}
        },
        {
            "company_id": 2,
            "company_name": "ACME Corp",
            "income": {
                "sales": 3000000,
                "rent": 1000000
            }
        }]
}

But my current code (based on this and using Python and the pandas library):

# script.py
import json
import pandas as pd

df = pd.read_csv('data.csv')

def get_nested_rec(key, grp):
    rec = {}

    rec['company_id'] = key[0]
    rec['company_name'] = key[1]

    for field in ['income_type']:
        income_types = list(grp[field].unique())
        rec['income'] = income_types

    return rec

records = []

for key, grp in df.groupby(['company_id','company_name','income_type','income_amt']):
    rec = get_nested_rec(key, grp)
    records.append(rec)

records = dict(data = records)

print(json.dumps(records, indent=4))

Outputs this format:

{"data": [
        {
            "company_id": 1,
            "company_name": "Foobar Inc", 
            "income": [
                "royalties"
            ]
        }, 
        {
            "company_id": 2,
            "company_name": "ACME Corp",
            "income": [
                "sales"
            ]
        }, 
        {
            "company_id": 2,
            "company_name": "ACME Corp",
            "income": [
                "rent"
            ]
        }
    ]}

I'm hitting a wall figuring out how to combine rows with the same company_id into a single object and include the income_amt values.

1 Answer

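Group by company_id alone, so that each company yields one record (grouping by all four columns puts every row in its own group), and build the income dict from the rows of each group:

records = []

for key, grp in df.groupby('company_id'):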
    records.append({
        "company_id": key,
        "company_name": grp.company_name.iloc[0],
        "income": {
            row.income_type: row.income_amt for row in grp.itertuples()
        }})

That gives you:

[{'company_id': 1,
  'company_name': 'Foobar Inc',
  'income': {'royalties': 5000000}},
 {'company_id': 2,
  'company_name': 'ACME Corp',
  'income': {'rent': 1000000, 'sales': 3000000}}]
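
To get the {"data": [...]} wrapper from the question as JSON text, the loop can be wrapped in a function and its result passed to json.dumps. This is a minimal sketch, not a definitive implementation: the CSV is inlined through io.StringIO so the example runs on its own, build_nested is a hypothetical helper name, and the int() casts are a precaution in case pandas hands back NumPy integers, which json.dumps does not serialize.

```python
import io
import json

import pandas as pd

# Same rows as data.csv, inlined here so the sketch is self-contained.
CSV_TEXT = '''company_id,company_name,income_type,income_amt
1,"Foobar Inc","royalties",5000000
2,"ACME Corp","sales",3000000
2,"ACME Corp","rent",1000000
'''


def build_nested(df):
    """Hypothetical helper: one record per company_id, wrapped under "data"."""
    records = []
    for key, grp in df.groupby('company_id'):
        records.append({
            # int() guards against NumPy integer types (assumed precaution).
            "company_id": int(key),
            # The name is the same on every row of the group, so take the first.
            "company_name": grp.company_name.iloc[0],
            # Map each income_type to its income_amt within the group.
            "income": {
                row.income_type: int(row.income_amt)
                for row in grp.itertuples()
            },
        })
    return {"data": records}


df = pd.read_csv(io.StringIO(CSV_TEXT))
nested = build_nested(df)
print(json.dumps(nested, indent=4))
```

With a real file, pd.read_csv('data.csv') would replace the io.StringIO call. Note that if a company lists the same income_type twice, the later row overwrites the earlier one in the dict.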