I have a dataframe with 8 columns, as follows:

    Bank_Acct Firstname | Bank_Acct Lastname | Bank_AcctNumber | Firstname | Lastname | ID | Date1    | Date2
    B1                  | Last1              | 123             | ABC       | EFG      | 12 | Somedate | Somedate
    B2                  | Last2              | 245             | ABC       | EFG      | 12 | Somedate | Somedate
    B1                  | Last1              | 123             | DEF       | EFG      | 12 | Somedate | Somedate
    B3                  | Last3              | 356             | ABC       | GHI      | 13 | Somedate | Somedate
    B4                  | Last4              | 478             | XYZ       | FHJ      | 13 | Somedate | Somedate
    B5                  | Last5              | 599             | XYZ       | DFI      | 13 | Somedate | Somedate

I want to create a dictionary keyed by ID, of the form:

 {ID1: (Count of distinct Bank_Acct Firstname, Count of distinct Bank_Acct Lastname,
        {Bank_AcctNumber1: ItsCount, Bank_AcctNumber2: ItsCount},
        Count of distinct Firstname, Count of distinct Lastname),
  ID2: (...)}

For the above example:

{12: (2, 2, {123: 2, 245: 1}, 2, 1), 13: (3, 3, {356: 1, 478: 1, 599: 1}, 2, 3)}

This is the code I have so far:

cols = ['Bank_Acct Firstname', 'Bank_Acct Lastname', 'Bank_AcctNumber', 'Firstname', 'Lastname']
# One nunique() per column for each ID group, collected into a tuple
df1 = df.groupby('ID').apply(lambda x: tuple(x[c].nunique() for c in cols))
d = df1.to_dict()

But this code only gives:

 {12: (2, 2, 2, 2, 1), 13: (3, 3, 3, 2, 3)}

That is, the third element is the count of distinct Bank_AcctNumber values instead of the inner dictionary of per-account counts.

How can I get the required dictionary instead? Thanks!

1 Answer

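You could define each column together with the function to apply to it in a list. Use nunique for the distinct counts and value_counts().to_dict() for Bank_AcctNumber, which produces the inner dictionary: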

In [15]: cols = [
     ...:     {'col': 'Bank_Acct Firstname', 'func': pd.Series.nunique},
     ...:     {'col': 'Bank_Acct Lastname', 'func': pd.Series.nunique},
     ...:     {'col': 'Bank_AcctNumber', 'func': lambda x: x.value_counts().to_dict()},
     ...:     {'col': 'Firstname', 'func': pd.Series.nunique},
     ...:     {'col': 'Lastname', 'func': pd.Series.nunique}
     ...:     ]

In [16]: df.groupby('ID').apply(lambda x: tuple(c['func'](x[c['col']]) for c in cols))
Out[16]:
ID
12            (2, 2, {123: 2, 245: 1}, 2, 1)
13    (3, 3, {356: 1, 478: 1, 599: 1}, 2, 3)
dtype: object

In [17]: (df.groupby('ID')
     ...:    .apply(lambda x: tuple(c['func'](x[c['col']]) for c in cols))
     ...:    .to_dict())
Out[17]:
{12: (2, 2, {123: 2, 245: 1}, 2, 1),
 13: (3, 3, {356: 1, 478: 1, 599: 1}, 2, 3)}
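
For anyone who wants to reproduce this outside an IPython session, here is a minimal self-contained sketch. The sample frame is rebuilt from the table in the question, with the two date columns left out because they are not used. The names spec and result are only illustrative. As a small deviation from the snippet above, it iterates over the groups with a dict comprehension instead of calling apply, which should give the same dictionary for this data.

```python
import pandas as pd

# Sample data rebuilt from the question's table (date columns omitted).
df = pd.DataFrame({
    'Bank_Acct Firstname': ['B1', 'B2', 'B1', 'B3', 'B4', 'B5'],
    'Bank_Acct Lastname': ['Last1', 'Last2', 'Last1', 'Last3', 'Last4', 'Last5'],
    'Bank_AcctNumber': [123, 245, 123, 356, 478, 599],
    'Firstname': ['ABC', 'ABC', 'DEF', 'ABC', 'XYZ', 'XYZ'],
    'Lastname': ['EFG', 'EFG', 'EFG', 'GHI', 'FHJ', 'DFI'],
    'ID': [12, 12, 12, 13, 13, 13],
})

# (column, function) pairs, mirroring the list of dicts used in the answer.
spec = [
    ('Bank_Acct Firstname', pd.Series.nunique),
    ('Bank_Acct Lastname', pd.Series.nunique),
    ('Bank_AcctNumber', lambda s: s.value_counts().to_dict()),
    ('Firstname', pd.Series.nunique),
    ('Lastname', pd.Series.nunique),
]

# Build one tuple per ID by applying each function to its column within the group.
result = {
    int(key): tuple(func(group[col]) for col, func in spec)
    for key, group in df.groupby('ID')
}
```

The int(key) cast assumes the IDs are integers, as in the sample; drop it if your IDs are strings or another type.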

1 Comment

This works but is extremely slow. Any way of making this faster? I have a huge dataframe.
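
One possible way to speed this up, offered as a sketch rather than a benchmarked solution: the apply version calls a Python function per column for every group, which tends to get expensive when there are many IDs. The version below instead asks pandas for all distinct counts in one grouped nunique call and for all per-account counts in one grouped size call, then assembles the tuples in plain Python. The function name summarize_fast and the variable names are hypothetical, and whether it is actually faster on your data is something to measure.

```python
import pandas as pd

def summarize_fast(df):
    """Build {ID: (n_bank_first, n_bank_last, {acct: count}, n_first, n_last)}."""
    nunique_cols = ['Bank_Acct Firstname', 'Bank_Acct Lastname',
                    'Firstname', 'Lastname']

    # One grouped pass for all distinct counts: a DataFrame indexed by ID.
    distinct = df.groupby('ID')[nunique_cols].nunique()

    # One grouped pass for per-account row counts: a Series indexed by (ID, account).
    counts = df.groupby(['ID', 'Bank_AcctNumber']).size()

    # Fold the (ID, account) -> count Series into one inner dict per ID.
    acct = {}
    for (id_, number), n in counts.items():
        acct.setdefault(id_, {})[number] = int(n)

    # Assemble the tuples in the order requested in the question.
    result = {}
    for id_, bank_first, bank_last, first, last in distinct.itertuples(name=None):
        result[id_] = (int(bank_first), int(bank_last),
                       acct.get(id_, {}), int(first), int(last))
    return result

# Small check on the sample data from the question (date columns omitted).
sample = pd.DataFrame({
    'Bank_Acct Firstname': ['B1', 'B2', 'B1', 'B3', 'B4', 'B5'],
    'Bank_Acct Lastname': ['Last1', 'Last2', 'Last1', 'Last3', 'Last4', 'Last5'],
    'Bank_AcctNumber': [123, 245, 123, 356, 478, 599],
    'Firstname': ['ABC', 'ABC', 'DEF', 'ABC', 'XYZ', 'XYZ'],
    'Lastname': ['EFG', 'EFG', 'EFG', 'GHI', 'FHJ', 'DFI'],
    'ID': [12, 12, 12, 13, 13, 13],
})
fast_result = summarize_fast(sample)
```

Note that the inner dictionaries come out ordered by account number rather than by descending count as with value_counts, which does not matter for dictionary equality. Rows with a missing ID or a missing account number are skipped by the default groupby behavior.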
