0

I have the following nested dictionary that contains the information that a person publicly reported to the organization.

`person_json={'basicInformation': {'individualId': 5429958, 
'firstName': 'BARBARA','middleName': 'JEAN','lastName': 'ABADI',
'sanctions': {'permanentBar': 'Y',
'sanctionDetails': [{'category': 'BAR',
 'regulator': 'FINRA',
 'messages': ['FINRA_BAR_ALL_MESSAGE'],
 'detail': [],
 'capacity': ['ALL']}]},
 'otherNames': [],
 'bcScope': 'InActive',
 'iaScope': 'NotInScope',
 'daysInIndustry': 3185},
 'currentEmployments': [],
 'currentIAEmployments': [],
 'previousEmployments': [{'iaOnly': 'N',
 'bdSECNumber': '44376',
 'firmId': 29705,
 'firmName': 'ACGM, INC.',
 'street1': '590 MADISON AVENUE, 41ST FLOOR, SUITE 4103',
 'city': 'NEW YORK',
 'state': 'NY',
 'zipCode': '10022',
 'registrationBeginDate': '6/1/2011',
 'registrationEndDate': '2/1/2017',
 'firmBCScope': 'INACTIVE',
 'firmIAScope': 'NOTINSCOPE'},
 {'iaOnly': 'N',
 'bdSECNumber': '49900',
 'firmId': 42619,
 'firmName': 'ACTIVA CAPITAL MARKETS, INC.',
 'street1': '590 MADISON AVENUE',
 'street2': '41ST FLOOR',
 'city': 'NEW YORK',
 'state': 'NY',
 'zipCode': '10022',
 'registrationBeginDate': '2/1/2008',
 'registrationEndDate': '2/16/2011',
 'firmBCScope': 'INACTIVE',
 'firmIAScope': 'NOTINSCOPE'}],
 'previousIAEmployments': [],
 'disclosureFlag': 'Y',
 'iaDisclosureFlag': 'N',
 'disclosures': [{'eventDate': '4/4/2017',
 'disclosureType': 'Regulatory',
 'disclosureResolution': 'Final',
 'isIapdExcludedCCFlag': 'N',
 'isBcExcludedCCFlag': 'N',
 'bcCtgryType': 11,
 'disclosureDetail': {'DocketNumberFDA': '2015044587502',
 'DocketNumberAAO': '2015044587502',
 'Initiated By': 'FINRA',
 'Allegations': 'Without admitting or denying the findings, Abadi consented to the 
 sanctions and to the entry of findings that she refused to produce documents and information and to appear for on-the-record testimony requested by FINRA in connection with its investigation concerning certain suspicious fund transfers involving her member firm and affiliates thereof indirectly owned and controlled by Abadi.',
'Resolution': 'Acceptance, Waiver & Consent(AWC)',
'SanctionDetails': [{'Sanctions': 'Bar (Permanent)',
  'SanctionDetails': [{'Registration Capacities Affected': 'All Capacities',
    'Duration': 'Indefinite',
    'Start Date': '4/4/2017'}]}]}}],
'examsCount': {'stateExamCount': 1,
'principalExamCount': 1,
'productExamCount': 2},
'stateExamCategory': [{'examCategory': 'Series 63',
'examName': 'Uniform Securities Agent State Law Examination',
'examTakenDate': '3/13/2008',
'examScope': 'BC'}],
'principalExamCategory': [{'examCategory': 'Series 24',
'examName': 'General Securities Principal Examination',
'examTakenDate': '9/26/2008',
'examScope': 'BC'}],
'productExamCategory': [{'examCategory': 'SIE',
'examName': 'Securities Industry Essentials Examination',
'examTakenDate': '2/1/2017',
'examScope': 'BC'},
{'examCategory': 'Series 7',
'examName': 'General Securities Representative Examination',
'examTakenDate': '1/31/2008',
'examScope': 'BC'}],
'registrationCount': {'approvedSRORegistrationCount': 0,
'approvedFinraRegistrationCount': 0,
'approvedStateRegistrationCount': 0},
'registeredStates': [],
'registeredSROs': [],
'brokerDetails': {'hasBCComments': 'N',
'hasIAComments': 'N',
'legacyReportStatusDescription': 'Not Requested'}}`

What I try to do is to tabulate the information in the above nested dictionary. This is the code I used. The table is named 'test1'.

`test1=pds.json_normalize(person_json)`

What I got from 'test1'is the following table. I captured the photo of some part of the table.

enter image description here

As you may see, there are still a list in "previousEmployments", "disclosures", "stateExamCategory", "principalExamCategory", "productExamCategory", and I am trying to flatten these data and joining them together in the table format.

The expected outcome should be a dataframe/a table that contain all information from all keys and all 'sub' keys of person_json.

How could I do this?

Thank you,

1 Answer 1

1

Assuming you ignore how should look like your final table/df, here is a proposition with flatten_json/json_normalize to make a DataFrame where the columns are the outer keys of person_json :

#pip install flatten_json
from flatten_json import flatten
​
tmp = pd.json_normalize(flatten(person_json))
​
df = (
    tmp.set_axis(tmp.columns.str.split("_", n=1, expand=True), axis=1)
            .stack(1).droplevel(0)
)

​ Output :

print(df)
​
                                             basicInformation  ... stateExamCategory
NaN                                                       NaN  ...               NaN
0_bcCtgryType                                             NaN  ...               NaN
0_bdSECNumber                                             NaN  ...               NaN
...                                                       ...  ...               ...
sanctions_sanctionDetails_0_messages_0  FINRA_BAR_ALL_MESSAGE  ...               NaN
sanctions_sanctionDetails_0_regulator                   FINRA  ...               NaN
stateExamCount                                            NaN  ...               NaN

[72 rows x 16 columns]

If you wanna retrieve a specific portion/slice, you can use something like :

query = df.filter(regex=".*ExamCount.*", axis=0).dropna(how="all", axis=1)
# query = df.filter(regex=".*ExamCount.*", axis=0).loc[:, "examsCount"] #variant
​
print(query) 

                    examsCount
principalExamCount        1.00
productExamCount          2.00
stateExamCount            1.00
Sign up to request clarification or add additional context in comments.

1 Comment

Thank you very much. It fixes my issue and the last part I may have to do is to construct the data frame.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.