2

I am trying to normalize json function using pandas but not able to flatten the nested arrays inside json

I have tried this reading one of the example but it only gives me one record using this code

from pandas.io.json import json_normalize
data ={
"id":"001",
    "counties" : [ {"name":"y"},{"name":"X"}], 
"extendedDescriptionFrench" : "Fromage Brick Petit Gaspesien",
"brand" : "PETIT GASPESIEN",
"brandFrench" : "PETIT GASPESIEN",
"productLife" : "90",
"digitalAssetFoodservice" : [ {
"digitalAssetFormatFoodservice" : "JPG",
"digitalAssetGDTIFoodservice" : "754000000016500000000002167445",
"digitalAssetImageVersionDateTimeFoodservice" : "2016-06-28T20:06:06.000-04:00",
"digitalAssetStateFoodservice" : "P"
}, {
"digitalAssetFormatFoodservice" : "JPG",
"digitalAssetGDTIFoodservice" : "754000000016500000000002167597",
"digitalAssetImageTypeFoodservice" : "M",
"digitalAssetImageVersionDateTimeFoodservice" : "2016-06-28T20:06:06.000-04:00"
}, {
"digitalAssetFormatFoodservice" : "JPG",
"digitalAssetGDTIFoodservice" : "754000000016500000000002167687",
"digitalAssetImageTypeFoodservice" : "C",
"digitalAssetImageVersionDateTimeFoodservice" : "2016-06-28T20:06:06.000-04:00",
"digitalAssetStateFoodservice" : "C"
} ]
}

a=json_normalize(data)
    print(a)

Is there anyway to flatten "digitalAssetFoodservice" array into columns.

Here is the output that i am getting Current ouput

And Also what if i have more than one nested array fields

6
  • What do you intend to do, what is not working? What about the output of a=json_normalize(data["digitalAssetFoodservice"]) ? Commented Sep 19, 2019 at 15:21
  • can we normalize and output of this field + the other scalar fields, because i want the whole json to be normalized Commented Sep 19, 2019 at 15:36
  • Can you add an example of more than one nested array fields, because it wouldn't be proper to append them to a table if they are not matching the same index (from what I understand you mean). Commented Sep 19, 2019 at 17:04
  • @CeliusStingher Edited my JSON : included "counties" filed Commented Sep 19, 2019 at 17:20
  • However, what is your expected output? Everyting duplicated, once for name: x and one for name: y ? Commented Sep 19, 2019 at 17:48

1 Answer 1

1

I believe that you need to pass the key of the nested array together with the non-nested ones.

a=json_normalize(data,'digitalAssetFoodservice',['id','extendedDescriptionFrench','brand','productLife'])

print(a)
print(a.columns)

Out:

  digitalAssetFormatFoodservice     digitalAssetGDTIFoodservice digitalAssetImageVersionDateTimeFoodservice  ...      extendedDescriptionFrench            brand productLife
0                           JPG  754000000016500000000002167445               2016-06-28T20:06:06.000-04:00  ...  Fromage Brick Petit Gaspesien  PETIT GASPESIEN          90
1                           JPG  754000000016500000000002167597               2016-06-28T20:06:06.000-04:00  ...  Fromage Brick Petit Gaspesien  PETIT GASPESIEN          90
2                           JPG  754000000016500000000002167687               2016-06-28T20:06:06.000-04:00  ...  Fromage Brick Petit Gaspesien  PETIT GASPESIEN          90

[3 rows x 9 columns]
Index(['digitalAssetFormatFoodservice', 'digitalAssetGDTIFoodservice',
       'digitalAssetImageVersionDateTimeFoodservice',
       'digitalAssetStateFoodservice', 'digitalAssetImageTypeFoodservice',
       'id', 'extendedDescriptionFrench', 'brand', 'productLife'],
      dtype='object')

When trying for two cases:

This is what I came up with given I couldn't not find a way to generate in one line your expected output. I also changed the script a bit to make it easier. (Based on: "Question: What is your expected output? Everyting duplicated, once for name: x and one for name: y ? OP: yes , duplicated"

from pandas.io.json import json_normalize
data ={
"id":"001",
"counties" : [ {"name":"y"},{"name":"X"}], 
"eDF" : "Fromage Brick Petit Gaspesien",
"brand" : "PETIT GASPESIEN",
"brandFrench" : "PETIT GASPESIEN",
"productLife" : "90",
"dAF" : [ {
"dAFF" : "JPG",
"dAGDTIF" : "75401652167445",
"dAIVDTF" : "2016-06-28",
"dASF" : "P"
}, {
"dAFF" : "JPG",
"dAGDTIF" : "75401652167597",
"dAITFa" : "M",
"dAIVDTF" : "2016-06-28"
}, {
"dAFF" : "JPG",
"dAGDTIF" : "7540162167687",
"dAITF" : "C",
"dAIVDTF" : "2016-06-28",
"dASF" : "C"
} ]
}

repetitive = ['id','eDF','brand','brandFrench','productLife']
a=json_normalize(data,'counties',repetitive)
b=json_normalize(data,'dAF',repetitive)
c = a.merge(b,how='inner',left_on=repetitive,right_on=repetitive)
print(a)

Output:

 name   id                            eDF            brand      brandFrench productLife
0    y  001  Fromage Brick Petit Gaspesien  PETIT GASPESIEN  PETIT GASPESIEN          90
1    X  001  Fromage Brick Petit Gaspesien  PETIT GASPESIEN  PETIT GASPESIEN          90

Now the other b:

print(b)

Output:

  dAFF         dAGDTIF     dAIVDTF dASF dAITFa dAITF   id                            eDF            brand      brandFrench productLife
0  JPG  75401652167445  2016-06-28    P    NaN   NaN  001  Fromage Brick Petit Gaspesien  PETIT GASPESIEN  PETIT GASPESIEN          90
1  JPG  75401652167597  2016-06-28  NaN      M   NaN  001  Fromage Brick Petit Gaspesien  PETIT GASPESIEN  PETIT GASPESIEN          90
2  JPG   7540162167687  2016-06-28    C    NaN     C  001  Fromage Brick Petit Gaspesien  PETIT GASPESIEN  PETIT GASPESIEN          90

Finally c:

print(c)

Output:

  name   id                            eDF            brand      brandFrench productLife dAFF         dAGDTIF     dAIVDTF dASF dAITFa dAITF
0    y  001  Fromage Brick Petit Gaspesien  PETIT GASPESIEN  PETIT GASPESIEN          90  JPG  75401652167445  2016-06-28    P    NaN   NaN
1    y  001  Fromage Brick Petit Gaspesien  PETIT GASPESIEN  PETIT GASPESIEN          90  JPG  75401652167597  2016-06-28  NaN      M   NaN
2    y  001  Fromage Brick Petit Gaspesien  PETIT GASPESIEN  PETIT GASPESIEN          90  JPG   7540162167687  2016-06-28    C    NaN     C
3    X  001  Fromage Brick Petit Gaspesien  PETIT GASPESIEN  PETIT GASPESIEN          90  JPG  75401652167445  2016-06-28    P    NaN   NaN
4    X  001  Fromage Brick Petit Gaspesien  PETIT GASPESIEN  PETIT GASPESIEN          90  JPG  75401652167597  2016-06-28  NaN      M   NaN
5    X  001  Fromage Brick Petit Gaspesien  PETIT GASPESIEN  PETIT GASPESIEN          90  JPG   7540162167687  2016-06-28    C    NaN     C
Sign up to request clarification or add additional context in comments.

5 Comments

Can i also pass other nested array fields in json_normalize function?
Yes, edit your question and I'll edit my asnwer accordingly.
Okay So for every nested field we have to create another variable and then merge altogether right using inner Right?
I just couldn't find a way to work it in one line, I've been trying to search for some examples and also couldn't make lostCode's answer work. You might try here, maybe you can find something stackoverflow.com/questions/47242845/… However it is not the same, I hope my answer was helpful enough to help you solve your problem.
Yes it was useful , Thank you :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.