5

I've a requirement to convert CSV Data File to Nested JSON in Python for an application. My below current Python code is working fine for 1 Customer/Accounts document but somehow not able to create json dump for all the customers in the CSV file.

I am providing the Python code below which will give you some insight on what I am trying to achieve. Please let me know if any existing resolutions on this.

Sample Python CODE:

import pandas as pd
from itertools import groupby 
from collections import OrderedDict
import json    

df = pd.read_csv('cust.csv', dtype={
        "ClientID" : str,
        "ClientName" : str,
        "AcctID" : str,
        "AcctNbr" : str,
        "AcctTyp" : str
    })

results = []

for (ClientID, ClientName), bag in df.groupby(["ClientID", "ClientName"]):
contents_df = bag.drop(["ClientID", "ClientName"], axis=1)
subset = [OrderedDict(row) for i,row in contents_df.iterrows()]
results.append(OrderedDict([("ClientID", ClientID),("ClientName", ClientName),("subset", subset)]))

print json.dumps(results[0], indent=4)

with open('ExpectedJsonFile.json', 'w') as outfile:
outfile.write(json.dumps(results[0], indent=4))

Sample INPUT CSV:

ClientID,ClientName,AcctID,AcctNbr,AcctTyp
----------------------------------------------------------
00001,John George,812001,812001095,DDA
00001,John George,813002,813002096,SAV
00001,John George,814003,814003097,AFS
00024,Richard Polado,512987,512987085,ML
00024,Richard Polado,512983,512983086,IL
00345,John Cruze,1230,123001567,SAV
00345,John Cruze,5145,514502096,CD
00345,John Cruze,7890,7890033527,SGD

Desired Output JSON:

{  
   "clientId":00001,
   "ClientName":"John George",
   "subset":[  
      {  
         "AcctID":812001,
         "AcctNbr":"812001095",
         "AcctTyp":"DDA",
      },
      {  
         "AcctID":813002,
         "AcctNbr":"813002096",
         "AcctTyp":"SAV",
      },
      {  
         "AcctID":814003,
         "AcctNbr":"814003097",
         "AcctTyp":"AFS",
      }
   ]
},
{  
   "clientId":00024,
   "ClientName":"Richard Polado",
   "subset":[  
      {  
         "AcctID":512987,
         "AcctNbr":"512987085",
         "AcctTyp":"ML",
      },
      {  
         "AcctID":512983,
         "AcctNbr":"512983086",
         "AcctTyp":"IL",
      }
   ]
}

and these documents should continue creating for other thousands of clients.

7
  • What do you mean by "somehow not able to" - what problem are you having specifically? What does it not do? Commented Jun 29, 2018 at 6:40
  • It only creates one document for the first ClientID only. { "clientId":00001, "ClientName":"John George", "subset":[ { "AcctID":812001, "AcctNbr":"812001095", "AcctTyp":"DDA", }, { "AcctID":813002, "AcctNbr":"813002096", "AcctTyp":"SAV", }, { "AcctID":814003, "AcctNbr":"814003097", "AcctTyp":"AFS", } ] } Commented Jun 29, 2018 at 6:43
  • Sounds like a debugging problem, then. Write some code to detect if it gets to the file writing part on the second pass of the loop. Does it not get there? Or, maybe your file writing fails because it is using the same filename? Commented Jun 29, 2018 at 6:44
  • I'm not familiar with Python, but don't loops need indentation? You do not seem to have any in this case. Commented Jun 29, 2018 at 6:45
  • @ShivKumar Does my solution work? Commented Jun 29, 2018 at 7:25

2 Answers 2

2

The solution groups by every 'ClientID','ClientName' pair

Your DataFrame

df = pd.DataFrame([['00001','John George','812001','812001095','DDA'],
['00001','John George','813002','813002096','SAV'],
['00001','John George','814003','814003097','AFS'],
['00024','Richard Polado','512987','512987085','ML'],
['00024','Richard Polado','512983','512983086','IL'],
['00345','John Cruze','1230','123001567','SAV'],
['00345','John Cruze','5145','514502096','CD'],
['00345','John Cruze','7890','7890033527','SGD']])

df.columns = ['ClientID','ClientName','AcctID','AcctNbr','AcctTyp'] 

and now

finalList = []
finalDict = {}
grouped = df.groupby(['ClientID', 'ClientName'])
for key, value in grouped:


    dictionary = {}

    j = grouped.get_group(key).reset_index(drop=True)
    dictionary['ClientID'] = j.at[0, 'ClientID']
    dictionary['ClientName'] = j.at[0, 'ClientName']


    dictList = []
    anotherDict = {}
    for i in j.index:

        anotherDict['AcctID'] = j.at[i, 'AcctID']
        anotherDict['AcctNbr'] = j.at[i, 'AcctNbr']
        anotherDict['AcctTyp'] = j.at[i, 'AcctTyp']

        dictList.append(anotherDict)

    dictionary['subset'] = dictList


    finalList.append(dictionary)

import json
json.dumps(finalList)

gives:

'[
   {"ClientID": "00001", 
    "ClientName": "John George", 
    "subset": 
            [{"AcctID": "814003", 
              "AcctNbr": "814003097", 
              "AcctTyp": "AFS"}, 

             {"AcctID": "814003", 
              "AcctNbr": "814003097", 
              "AcctTyp": "AFS"}, 

             {"AcctID": "814003", 
              "AcctNbr": "814003097", 
              "AcctTyp": "AFS"}]

   }, 

  {
   "ClientID": "00024", 
   "ClientName": "Richard Polado", 
   "subset": 
            [{"AcctID": "512983", 
              "AcctNbr": "512983086", 
              "AcctTyp": "IL"}, 

             {"AcctID": "512983", 
              "AcctNbr": "512983086", 
              "AcctTyp": "IL"}]
   }, 

  {
   "ClientID": "00345", 
   "ClientName": "John Cruze", 
   "subset": 
            [{"AcctID": "7890", 
              "AcctNbr": "7890033527", 
              "AcctTyp": "SGD"}, 

             {"AcctID": "7890", 
              "AcctNbr": "7890033527", 
              "AcctTyp": "SGD"}, 

             {"AcctID": "7890", 
              "AcctNbr": "7890033527", 
              "AcctTyp": "SGD"}]
   }

]'

Is this how you seem to want it?

Sign up to request clarification or add additional context in comments.

3 Comments

Hi Abhishek, Just a correction, I looked now with more data and actually its not working as expected ... when more records inner dictionary always brings first value
Hi @imperialgendarme, it works for me... But can you help me I had a bit of doubt? What if I want another nest inside the subset with another name?? How can I insert my info inside the subset? Thanks in advance for your help!!!
Comment by Mubeen Ghafoor: The another_dict = { } should be inside the for loop.
2

Use dictList.append(anotherDict.copy()) otherwise you will get the same dict object in the list.

More details in this question: Create List of Dictionary Python

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.