I have a DataFrame with 8 columns, as follows:
Bank_Acct Firstname | Bank_Acct Lastname | Bank_AcctNumber | Firstname | Lastname | ID | Date1 | Date2
B1 | Last1 | 123 | ABC | EFG | 12 | Somedate | Somedate
B2 | Last2 | 245 | ABC | EFG | 12 | Somedate | Somedate
B1 | Last1 | 123 | DEF | EFG | 12 | Somedate | Somedate
B3 | Last3 | 356 | ABC | GHI | 13 | Somedate | Somedate
B4 | Last4 | 478 | XYZ | FHJ | 13 | Somedate | Somedate
B5 | Last5 | 599 | XYZ | DFI | 13 | Somedate | Somedate
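To make the example reproducible, the sample table can be rebuilt roughly like this. It assumes pandas is installed and keeps `Somedate` as a placeholder string, since the real date values aren't given:

```python
import pandas as pd

# Sample rows from the table above; Date1/Date2 keep the 'Somedate' placeholder
df = pd.DataFrame({
    'Bank_Acct Firstname': ['B1', 'B2', 'B1', 'B3', 'B4', 'B5'],
    'Bank_Acct Lastname': ['Last1', 'Last2', 'Last1', 'Last3', 'Last4', 'Last5'],
    'Bank_AcctNumber': [123, 245, 123, 356, 478, 599],
    'Firstname': ['ABC', 'ABC', 'DEF', 'ABC', 'XYZ', 'XYZ'],
    'Lastname': ['EFG', 'EFG', 'EFG', 'GHI', 'FHJ', 'DFI'],
    'ID': [12, 12, 12, 13, 13, 13],
    'Date1': ['Somedate'] * 6,
    'Date2': ['Somedate'] * 6,
})
print(df.shape)  # (6, 8)
```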
I want to create a dictionary of the form:
{ID1: (Count of distinct Bank_Acct Firstname, Count of distinct Bank_Acct Lastname,
{Bank_AcctNumber1: its count, Bank_AcctNumber2: its count, ...},
Count of distinct Firstname, Count of distinct Lastname),
ID2: (...), ...}
For the above example:
{12: (2, 2, {123: 2, 245: 1}, 2, 1), 13: (3, 3, {356: 1, 478: 1, 599: 1}, 2, 3)}
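A minimal sketch of one way to build this structure, assuming the column names from the header above. `summarize` is a hypothetical helper name, and iterating over `df.groupby('ID')` with a dict comprehension is just one option. It avoids `groupby.apply`, so each tuple can hold a nested dict without pandas trying to reshape it:

```python
import pandas as pd

# Sample data from the question (Date1/Date2 omitted because they are not summarised)
df = pd.DataFrame({
    'Bank_Acct Firstname': ['B1', 'B2', 'B1', 'B3', 'B4', 'B5'],
    'Bank_Acct Lastname': ['Last1', 'Last2', 'Last1', 'Last3', 'Last4', 'Last5'],
    'Bank_AcctNumber': [123, 245, 123, 356, 478, 599],
    'Firstname': ['ABC', 'ABC', 'DEF', 'ABC', 'XYZ', 'XYZ'],
    'Lastname': ['EFG', 'EFG', 'EFG', 'GHI', 'FHJ', 'DFI'],
    'ID': [12, 12, 12, 13, 13, 13],
})


def summarize(group):
    """Build the 5-tuple for one ID group (hypothetical helper)."""
    return (
        group['Bank_Acct Firstname'].nunique(),
        group['Bank_Acct Lastname'].nunique(),
        # value_counts() gives one count per account number instead of a single total
        group['Bank_AcctNumber'].value_counts().to_dict(),
        group['Firstname'].nunique(),
        group['Lastname'].nunique(),
    )


# Iterate over the groups directly; each key is one ID value
result = {key: summarize(group) for key, group in df.groupby('ID')}
print(result)
```

The order of keys inside each inner dict follows `value_counts()` (most frequent first), which doesn't matter for dictionary lookups.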
This is the code I have so far:
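import pandas as pd
# Columns summarised per ID (names must match the DataFrame header exactly)
cols = ['Bank_Acct Firstname', 'Bank_Acct Lastname', 'Bank_AcctNumber', 'Firstname', 'Lastname']
# nunique() returns one distinct count per column, so each ID maps to a tuple of 5 integers
df1 = df.groupby('ID')[cols].apply(lambda x: tuple(x[c].nunique() for c in cols))
d = df1.to_dict()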
But this code only gives the following output:
{12: (2, 2, 2, 2, 1), 13: (3, 3, 3, 2, 3)}
which contains the count of distinct Bank_AcctNumber values instead of the inner dictionary.
How can I get the required dictionary instead? Thanks!
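The difference between the two outputs comes down to `nunique()` versus `value_counts()` on the account column. Here is a small illustration using only the three ID 12 account numbers from the sample table (assuming pandas is available):

```python
import pandas as pd

# Account numbers for the ID 12 rows of the sample table
accts = pd.Series([123, 245, 123], name='Bank_AcctNumber')

# nunique() collapses the column to a single number: how many distinct values exist
distinct = accts.nunique()

# value_counts() keeps one count per distinct value, which maps onto the inner dict
per_value = accts.value_counts().to_dict()

print(distinct)
print(per_value)
```

So swapping `nunique()` for `value_counts().to_dict()` on `Bank_AcctNumber` only, while keeping `nunique()` for the other columns, should produce the nested structure.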