Conversion from nested json to csv with pandas

Question

I am trying to convert a nested JSON file into a CSV file, but I am struggling with the logic needed for the structure of my file: the JSON has two top-level keys, and I only want to convert one of them, which is a list of nested objects.

I found a very helpful explanation of JSON "flattening" in this blog post and have basically been adapting it to my problem, but it is still not working for me.

My json file looks like this:

{
  "tickets":[
    {
      "Name": "Liam",
      "Location": {
        "City": "Los Angeles",
        "State": "CA"
      },
      "hobbies": [
        "Piano",
        "Sports"
      ],
      "year" : 1985,
      "teamId" : "ATL",
      "playerId" : "barkele01",
      "salary" : 870000
    },
    {
      "Name": "John",
      "Location": {
        "City": "Los Angeles",
        "State": "CA"
      },
      "hobbies": [
        "Music",
        "Running"
      ],
      "year" : 1985,
      "teamId" : "ATL",
      "playerId" : "bedrost01",
      "salary" : 550000
    }
  ],
  "count": 2
}

My code, so far, looks like this:

import json
from pandas.io.json import json_normalize
import argparse


def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x
    flatten(y)
    return out


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Converting json files into csv for Tableau processing')
    parser.add_argument(
        "-j", "--json", dest="json_file", help="PATH/TO/json file to convert", metavar="FILE", required=True)

    args = parser.parse_args()

    with open(args.json_file, "r") as inputFile:  # open json file
        json_data = json.loads(inputFile.read())  # load json content
    flat_json = flatten_json(json_data)
    # normalizing flat json
    final_data = json_normalize(flat_json)

    with open(args.json_file.replace(".json", ".csv"), "w") as outputFile:  # open csv file

        # saving DataFrame to csv
        final_data.to_csv(outputFile, encoding='utf8', index=False)

What I would like to obtain is 1 line per ticket in the csv, with headings:

Name,Location_City,Location_State,Hobbies_0,Hobbies_1,Year,TeamId,PlayerId,Salary.

I would really appreciate anything that can do the trick! Thank you!

big-o · Accepted Answer · 2019-07-16 19:19:15Z

6

I actually wrote a package called cherrypicker recently to deal with this exact sort of thing since I had to do it so often!

I think the following code would give you exactly what you're after:

from cherrypicker import CherryPicker
import json
import pandas as pd

with open('file.json') as file:
    data = json.load(file)

picker = CherryPicker(data)
flat = picker['tickets'].flatten().get()
df = pd.DataFrame(flat)
print(df)

This gave me the output:

  Location_City Location_State  Name hobbies_0 hobbies_1   playerId  salary teamId  year
0   Los Angeles             CA  Liam     Piano    Sports  barkele01  870000    ATL  1985
1   Los Angeles             CA  John     Music   Running  bedrost01  550000    ATL  1985

You can install the package with:

pip install cherrypicker

...and there's more docs and guidance at https://cherrypicker.readthedocs.io.

answered Jul 16, 2019 at 19:19

big-o

2 Comments

user10033434 Over a year ago

I was researching the subject and came across your answer. Is there a way to reverse the process, from flattened CSV back to JSON? Thanks

big-o Over a year ago

Well JSON is just a way of formatting objects of various structures so it's really just a case of taking the parsed CSV and restructuring the contents of each row into the structure you want with lists, dicts etc. Then use the json library to encode it as JSON.
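
For what it's worth, the restructuring described in the comment above could be sketched roughly as follows. The helper names unflatten and _lists are made up for this illustration, and the sketch assumes the column names follow the Location_City / hobbies_0 pattern from the answer and that no original key contains an underscore itself (otherwise the split would be ambiguous). Values read back from a CSV are strings, so numbers such as year would need converting separately.

```python
import csv
import io
import json


def unflatten(flat, sep='_'):
    """Rebuild nested dicts/lists from keys like 'Location_City' or 'hobbies_0'."""
    nested = {}
    for key, value in flat.items():
        parts = key.split(sep)
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})  # descend, creating levels as needed
        node[parts[-1]] = value
    return _lists(nested)


def _lists(node):
    """Recursively turn dicts whose keys are all digits into lists."""
    if not isinstance(node, dict):
        return node
    converted = {k: _lists(v) for k, v in node.items()}
    if converted and all(k.isdigit() for k in converted):
        return [converted[k] for k in sorted(converted, key=int)]
    return converted


# Illustrative CSV text (assumption: produced by the flattening shown in the answer)
csv_text = (
    "Name,Location_City,Location_State,hobbies_0,hobbies_1,year\n"
    "Liam,Los Angeles,CA,Piano,Sports,1985\n"
)

rows = [unflatten(row) for row in csv.DictReader(io.StringIO(csv_text))]
print(json.dumps({"tickets": rows, "count": len(rows)}, indent=2))
```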

Serge Ballesta · Accepted Answer · 2019-07-04 09:46:47Z

2

As you already have a function to flatten a JSON object, you just have to flatten each of the tickets (this also needs import pandas as pd at the top of your script):

...
with open(args.json_file, "r") as inputFile:  # open json file
    json_data = json.loads(inputFile.read())  # load json content
final_data = pd.DataFrame([flatten_json(elt) for elt in json_data['tickets']])
...

With your sample data, final_data is as expected:

  Location_City Location_State  Name hobbies_0 hobbies_1   playerId  salary teamId  year
0   Los Angeles             CA  Liam     Piano    Sports  barkele01  870000    ATL  1985
1   Los Angeles             CA  John     Music   Running  bedrost01  550000    ATL  1985
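
Put together with the flatten_json function from the question, the whole conversion might look like the sketch below. The sample data is inlined and the CSV is written to an in-memory buffer purely so the example is self-contained; in the real script the data would come from args.json_file and the output would go to the .csv path. Column order may differ between pandas versions.

```python
import io
import json

import pandas as pd


def flatten_json(y):
    """Flatten nested dicts/lists into one dict with '_'-joined keys."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key, value in x.items():
                flatten(value, name + key + '_')
        elif isinstance(x, list):
            for i, value in enumerate(x):
                flatten(value, name + str(i) + '_')
        else:
            out[name[:-1]] = x  # drop the trailing '_'

    flatten(y)
    return out


# Inlined sample with the same shape as the file in the question
json_data = json.loads("""
{"tickets": [
  {"Name": "Liam", "Location": {"City": "Los Angeles", "State": "CA"},
   "hobbies": ["Piano", "Sports"], "year": 1985, "teamId": "ATL",
   "playerId": "barkele01", "salary": 870000},
  {"Name": "John", "Location": {"City": "Los Angeles", "State": "CA"},
   "hobbies": ["Music", "Running"], "year": 1985, "teamId": "ATL",
   "playerId": "bedrost01", "salary": 550000}
], "count": 2}
""")

# One flattened dict per ticket gives one row per ticket
final_data = pd.DataFrame([flatten_json(elt) for elt in json_data['tickets']])

# Write to a buffer here; pass a file path to to_csv in the real script
buffer = io.StringIO()
final_data.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```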

answered Jul 4, 2019 at 9:46

Serge Ballesta

Sachin Prabhu · Accepted Answer · 2019-07-04 09:19:34Z

1

There may be a simpler solution for this. But this should work!

import json
import pandas as pd

with open('file.json') as file:
    data = json.load(file)

df = pd.DataFrame(data['tickets'])

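# Pull each row's own City/State out of its Location dict
df['location_city'] = [loc['City'] for loc in df['Location']]
df['location_state'] = [loc['State'] for loc in df['Location']]

# One hobbies_<i> column per list position
# (assumes every ticket lists the same number of hobbies)
for i in range(len(df['hobbies'][0])):
    df['hobbies_{}'.format(i)] = [hobbies[i] for hobbies in df['hobbies']]

df = df.drop(columns=['Location', 'hobbies'])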

print(df)
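
As a possibly simpler variant, pandas can flatten the nested Location dict itself. The sketch below assumes pandas 1.0 or later, where json_normalize is available as pd.json_normalize; it leaves lists untouched, so the hobbies column is still expanded by hand. The tickets list is inlined here only to keep the example self-contained.

```python
import pandas as pd

# Inlined sample: the "tickets" list from the question
tickets = [
    {"Name": "Liam", "Location": {"City": "Los Angeles", "State": "CA"},
     "hobbies": ["Piano", "Sports"], "year": 1985, "teamId": "ATL",
     "playerId": "barkele01", "salary": 870000},
    {"Name": "John", "Location": {"City": "Los Angeles", "State": "CA"},
     "hobbies": ["Music", "Running"], "year": 1985, "teamId": "ATL",
     "playerId": "bedrost01", "salary": 550000},
]

# Nested dicts become columns; sep='_' gives Location_City, Location_State
df = pd.json_normalize(tickets, sep='_')

# Lists are kept as-is, so spread hobbies into hobbies_0, hobbies_1, ...
hobbies = pd.DataFrame(df['hobbies'].tolist(), index=df.index).add_prefix('hobbies_')
df = df.drop(columns='hobbies').join(hobbies)

print(df)
```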

edited Jul 4, 2019 at 9:19

answered Jul 4, 2019 at 9:13

Sachin Prabhu

1 Comment

monachus Over a year ago

Thanks - I needed this exact tool for extracting nested documents from a mongo datastore. Perfect.
