Create a dictionary by grouping by values from a dataframe column in Python

Question

I have a dataframe with 8 columns, as follows:

    Bank_Acct Firstname | Bank_Acct Lastname | Bank_AcctNumber   | Firstname | Lastname | ID | Date1    | Date2
    B1                  | Last1              | 123               | ABC       | EFG      | 12 | Somedate | Somedate
    B2                  | Last2              | 245               | ABC       | EFG      | 12 | Somedate | Somedate
    B1                  | Last1              | 123               | DEF       | EFG      | 12 | Somedate | Somedate
    B3                  | Last3              | 356               | ABC       | GHI      | 13 | Somedate | Somedate
    B4                  | Last4              | 478               | XYZ       | FHJ      | 13 | Somedate | Somedate
    B5                  | Last5              | 599               | XYZ       | DFI      | 13 | Somedate | Somedate
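To reproduce the examples, the sample table can be loaded into a DataFrame as follows. This is a minimal sketch: the real dates are not given, so the `Somedate` cells are kept as placeholder strings.

```python
import pandas as pd

# Sample data copied from the table above; "Somedate" is a placeholder string
df = pd.DataFrame({
    'Bank_Acct Firstname': ['B1', 'B2', 'B1', 'B3', 'B4', 'B5'],
    'Bank_Acct Lastname': ['Last1', 'Last2', 'Last1', 'Last3', 'Last4', 'Last5'],
    'Bank_AcctNumber': [123, 245, 123, 356, 478, 599],
    'Firstname': ['ABC', 'ABC', 'DEF', 'ABC', 'XYZ', 'XYZ'],
    'Lastname': ['EFG', 'EFG', 'EFG', 'GHI', 'FHJ', 'DFI'],
    'ID': [12, 12, 12, 13, 13, 13],
    'Date1': ['Somedate'] * 6,
    'Date2': ['Somedate'] * 6,
})

print(df.shape)  # (6, 8)
```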

I want to create a dictionary with:

 {ID1: (Count of distinct Bank_Acct Firstname, Count of distinct Bank_Acct Lastname,
        {Bank_AcctNumber1: ItsCount, Bank_AcctNumber2: ItsCount},
        Count of distinct Firstname, Count of distinct Lastname),
  ID2: (...), ...}

For the example above, the expected result is:

{12: (2, 2, {123: 2, 245: 1}, 2, 1), 13: (3, 3, {356: 1, 478: 1, 599: 1}, 2, 3)}

This is the code I tried:

cols = ['Bank_Acct Firstname', 'Bank_Acct Lastname', 'Bank_AcctNumber', 'Firstname', 'Lastname']
# nunique() is applied to every column, including Bank_AcctNumber
df1 = df.groupby('ID').apply(lambda x: tuple(x[c].nunique() for c in cols))
d = df1.to_dict()

However, this code only gives:

{12: (2, 2, 2, 2, 1), 13: (3, 3, 3, 2, 3)}

It returns the count of distinct Bank_AcctNumber values instead of the inner dictionary.
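The difference comes down to what each function returns for a single column. A small illustration, assuming the three account numbers that belong to ID 12 in the sample table: `nunique()` collapses the column to one number, while `value_counts().to_dict()` keeps one count per distinct value.

```python
import pandas as pd

# Account numbers of the three rows with ID 12 in the sample table
accts = pd.Series([123, 245, 123], name='Bank_AcctNumber')

# nunique() returns a single number: how many distinct values there are
print(accts.nunique())  # 2

# value_counts() keeps one count per distinct value; to_dict() turns it into {value: count}
# (here: 123 -> 2, 245 -> 1)
print(accts.value_counts().to_dict())
```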

How can I get the required dictionary instead? Thanks!

Zero · Accepted Answer · 2017-08-14 18:44:51Z


You could define each column, together with the function to apply to it, in a list:

In [15]: cols = [
     ...:     {'col': 'Bank_Acct Firstname', 'func': pd.Series.nunique},
     ...:     {'col': 'Bank_Acct Lastname', 'func': pd.Series.nunique},
     ...:     {'col': 'Bank_AcctNumber', 'func': lambda x: x.value_counts().to_dict()},
     ...:     {'col': 'Firstname', 'func': pd.Series.nunique},
     ...:     {'col': 'Lastname', 'func': pd.Series.nunique}
     ...:     ]

In [16]: df.groupby('ID').apply(lambda x: tuple(c['func'](x[c['col']]) for c in cols))
Out[16]:
ID
12            (2, 2, {123: 2, 245: 1}, 2, 1)
13    (3, 3, {356: 1, 478: 1, 599: 1}, 2, 3)
dtype: object

In [17]: (df.groupby('ID')
            .apply(lambda x: tuple(c['func'](x[c['col']]) for c in cols))
            .to_dict())
Out[17]:
{12: (2, 2, {123: 2, 245: 1}, 2, 1),
 13: (3, 3, {356: 1, 478: 1, 599: 1}, 2, 3)}
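To run the same idea outside IPython, here is a plain-script sketch. It rebuilds the sample data from the question, stores the (column, function) pairs as tuples instead of dicts, and uses a helper named `summarize` (a hypothetical name, not from the answer). It also selects the five columns before `apply`, so the grouping column `ID` is not passed to the function; recent pandas versions emit a deprecation warning when `apply` operates on the grouping columns.

```python
import pandas as pd

# Sample data from the question ("Somedate" kept as a placeholder)
df = pd.DataFrame({
    'Bank_Acct Firstname': ['B1', 'B2', 'B1', 'B3', 'B4', 'B5'],
    'Bank_Acct Lastname': ['Last1', 'Last2', 'Last1', 'Last3', 'Last4', 'Last5'],
    'Bank_AcctNumber': [123, 245, 123, 356, 478, 599],
    'Firstname': ['ABC', 'ABC', 'DEF', 'ABC', 'XYZ', 'XYZ'],
    'Lastname': ['EFG', 'EFG', 'EFG', 'GHI', 'FHJ', 'DFI'],
    'ID': [12, 12, 12, 13, 13, 13],
    'Date1': ['Somedate'] * 6,
    'Date2': ['Somedate'] * 6,
})

# (column, function) pairs, one per position in the output tuple
cols = [
    ('Bank_Acct Firstname', pd.Series.nunique),
    ('Bank_Acct Lastname', pd.Series.nunique),
    ('Bank_AcctNumber', lambda s: s.value_counts().to_dict()),  # inner {account: count}
    ('Firstname', pd.Series.nunique),
    ('Lastname', pd.Series.nunique),
]

def summarize(group):
    """Apply each function to its column within one ID group and return a tuple."""
    return tuple(func(group[col]) for col, func in cols)

# Only the five needed columns are passed to summarize, not the grouping column 'ID'
result = (df.groupby('ID')[[col for col, _ in cols]]
            .apply(summarize)
            .to_dict())

print(result)
```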

answered Aug 14, 2017 at 18:39 by Zero, edited Aug 14, 2017 at 18:44


1 Comment

akrama81 Over a year ago

This works, but it is extremely slow. Is there any way to make it faster? I have a huge dataframe.
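One possible way to speed this up (an untested sketch, not benchmarked against the accepted answer) is to let pandas compute the distinct counts and the per-account counts with built-in group-by aggregations, and use plain Python only to assemble the final dictionary. The names `distinct`, `acct_counts`, and `inner` are illustrative.

```python
import pandas as pd

# Sample data from the question ("Somedate" kept as a placeholder)
df = pd.DataFrame({
    'Bank_Acct Firstname': ['B1', 'B2', 'B1', 'B3', 'B4', 'B5'],
    'Bank_Acct Lastname': ['Last1', 'Last2', 'Last1', 'Last3', 'Last4', 'Last5'],
    'Bank_AcctNumber': [123, 245, 123, 356, 478, 599],
    'Firstname': ['ABC', 'ABC', 'DEF', 'ABC', 'XYZ', 'XYZ'],
    'Lastname': ['EFG', 'EFG', 'EFG', 'GHI', 'FHJ', 'DFI'],
    'ID': [12, 12, 12, 13, 13, 13],
    'Date1': ['Somedate'] * 6,
    'Date2': ['Somedate'] * 6,
})

# 1) Distinct counts for the four "count distinct" columns in one built-in aggregation
distinct = df.groupby('ID')[['Bank_Acct Firstname', 'Bank_Acct Lastname',
                             'Firstname', 'Lastname']].nunique()

# 2) Number of rows per (ID, account number) pair
acct_counts = df.groupby(['ID', 'Bank_AcctNumber']).size()

# 3) Turn the (ID, account) -> count table into {ID: {account: count}}
inner = {}
for (id_, acct), n in acct_counts.items():
    inner.setdefault(int(id_), {})[int(acct)] = int(n)

# 4) Assemble the tuples in the order the question asks for
result = {
    int(id_): (int(bank_first), int(bank_last), inner[int(id_)], int(first), int(last))
    for id_, bank_first, bank_last, first, last in distinct.itertuples(name=None)
}

print(result)
```

The remaining Python loop runs once per (ID, account number) pair instead of calling a Python function for every ID group through `apply`, which is usually where most of the time goes on large frames.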
