viewing nested JSON data into a pandas dataframe

Question

I have now added the current problem onto GitHib. Please find the URL for the repo. I have included a Jupyter notebook that also explains the problem. Thanks guys.

https://github.com/simongraham/dataExplore.git

I am currently working with nutritional data for a project, where the data is in raw JSON format, and I want to use python and pandas to obtain an understandable data frame. I understand that this is an easy task when the JSON is not nested. Here I would use:

nutrition = pd.read_json('data')

However I have nested information and I am finding it very difficult to get it into a reasonable data frame. The JSON format is as follows, where the nutritionNutrients element itself is a nested element. The nest for this element will describe the nutritional contents for a variety of different things such as alcohol and bcfa, as included. I have only included a sample because this is a large data file.

  [
        {
            "vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
            "vcNutritionId": "2476378b-79ee-4857-a81d-489661a039a1",
            "vcUserId": "cc51145b-5a70-4344-9b55-1a4455f0a9d2",
            "vcPortionId": "1",
            "vcPortionName": "1 average pepper",
            "vcPortionSize": "20",
            "ftEnergyKcal": 5.2,
            "vcPortionUnit": "g",
            "dtConsumedDate": "2016-05-04T00:00:00",
            "nutritionNutrients": [
                {
                    "vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
                    "vcNutrient": "alcohol",
                    "ftValue": 0,
                    "vcUnit": "g",
                    "nPercentRI": 0,
                    "vcTrafficLight": ""
                },
                {
                    "vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
                    "vcNutrient": "bcfa",
                    "ftValue": 0,
                    "vcUnit": "g",
                    "nPercentRI": 0,
                    "vcTrafficLight": ""
                },
                {
                    "vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
                    "vcNutrient": "biotin",
                    "ftValue": 0,
                    "vcUnit": "µg",
                    "nPercentRI": 0,
                    "vcTrafficLight": ""
                },
                ...
            ]
        }
    ]

Any help would be appreciated.

Thanks.

.... ....

Now that I have found out how to solve this problem using json_normalize, I return the same problem, but this time my code is nested twice. Ie:

[
{
...
}
[,
"nutritionPortions": [
    {
        "vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
        "vcNutritionId": "2476378b-79ee-4857-a81d-489661a039a1",
        "vcUserId": "cc51145b-5a70-4344-9b55-1a4455f0a9d2",
        "vcPortionId": "1",
        "vcPortionName": "1 average pepper",
        "vcPortionSize": "20",
        "ftEnergyKcal": 5.2,
        "vcPortionUnit": "g",
        "dtConsumedDate": "2016-05-04T00:00:00",
        "nutritionNutrients": [
            {
                "vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
                "vcNutrient": "alcohol",
                "ftValue": 0,
                "vcUnit": "g",
                "nPercentRI": 0,
                "vcTrafficLight": ""
            },
            {
                "vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
                "vcNutrient": "bcfa",
                "ftValue": 0,
                "vcUnit": "g",
                "nPercentRI": 0,
                "vcTrafficLight": ""
            },
            {
                "vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
                "vcNutrient": "biotin",
                "ftValue": 0,
                "vcUnit": "µg",
                "nPercentRI": 0,
                "vcTrafficLight": ""
            },
            ...
           }
          ]
        }
      ]

When I have a JSON consisting of only nutrition data I can use:

nutrition = (pd.io
   .json
   .json_normalize((data, ['nutritionPortions']), 'nutritionNutrients',
        ['vcNutritionId','vcUserId','vcPortionId','vcPortionName','vcPortionSize',
         'ftEnergyKcal','vcPortionUnit','dtConsumedDate'])
)

However, my data does not only contain nutrition information. For example it will contain activity information, and therefore the nutrition information is nested with "nutrtitionPortions" at the start. Let's assume that all other columns are not nested and they are represented by "Activity" and "Wellbeing".

If I use the code:

nutrition = (pd.io
   .json
   .json_normalize(data, ['nutritionPortions'])
)

I will return to the original problem where "nutritionNutrients" is nested, but I am having no success then obtaining the corresponding data frame.

Thanks

MaxU - stand with Ukraine · Accepted Answer · 2016-06-25 22:16:22Z

4

UPDATE: this should work for your kaidoData.json file:

df = (pd.io
        .json
        .json_normalize(data[0]['ionPortions'], 'nutritionNutrients',
            ['vcNutritionId','vcUserId','vcPortionId','vcPortionName','vcPortionSize',
             'dtCreatedDate','dtUpdatedDate','nProcessingStatus',
             'vcPortionUnit','dtConsumedDate'
            ]
        )
)

PS i don't know what is wrong with the 'ftEnergyKcal' - it throws me:

KeyError: 'ftEnergyKcal'

maybe it's missing in some sections

OLD answer:

use json_normalize():

(pd.io
   .json
   .json_normalize(l, 'nutritionNutrients',
        ['vcNutritionId','vcUserId','vcPortionId','vcPortionName','vcPortionSize',
         'ftEnergyKcal','vcPortionUnit','dtConsumedDate'])
)

demo:

In [107]: (pd.io
   .....:    .json
   .....:    .json_normalize(l, 'nutritionNutrients',
   .....:         ['vcNutritionId','vcUserId','vcPortionId','vcPortionName','vcPortionSize',
   .....:          'ftEnergyKcal','vcPortionUnit','dtConsumedDate'])
   .....: )
Out[107]:
   ftValue  nPercentRI vcNutrient vcNutritionPortionId vcTrafficLight        ...        vcPortionSize  \
0        0           0    alcohol  478d1905-f264-4d...                       ...                   20
1        0           0       bcfa  478d1905-f264-4d...                       ...                   20
2        0           0     biotin  478d1905-f264-4d...                       ...                   20

         vcNutritionId vcPortionId ftEnergyKcal     vcPortionName
0  2476378b-79ee-48...           1          5.2  1 average pepper
1  2476378b-79ee-48...           1          5.2  1 average pepper
2  2476378b-79ee-48...           1          5.2  1 average pepper

[3 rows x 14 columns]

where l is your list (parsed JSON)

edited Jun 25, 2016 at 22:16

answered Jun 23, 2016 at 22:18

MaxU - stand with Ukraine

212k37 gold badges402 silver badges436 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

simongraham Over a year ago

I don't suppose you can give me a hand with my latest edit of the current problem. Thanks :)

MaxU - stand with Ukraine Over a year ago

@simongraham, i think you would have to pass something like data['nutritionPortions'] instead of data to the json_normalize

simongraham Over a year ago

I'm afraid I already tried this, but with no luck unfortunately @MaxU. I guess I need a way of gaining the subset of the raw JSON data

MaxU - stand with Ukraine Over a year ago

@simongraham, i would recommend you to open another question and put there a working sample of your data. For example: data = [...], which can be executed in Python without errors. Alternatively you can upload your data as file somewhere and post a link to this file in your question...

simongraham Over a year ago

I have added the relevant data to github.com/simongraham/dataExplore.git . I have also added a Jupiter notebook. Feel free to play around :) @MaxU

Collectives™ on Stack Overflow

viewing nested JSON data into a pandas dataframe

1 Answer 1

5 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

5 Comments

Your Answer

Sign up or log in

Post as a guest

Related