I have a table like this:

import pandas as pd

df = pd.DataFrame(
        [
            ['john', 'rdgsdr', 2, 'A'],
            ['ann',  'dsdfds', 3, 'A'],
            ['john', 'jkfgdj', 1, 'B'],
            ['bob',  'xcxfcd', 5, 'A'],
            ['john', 'uityuu', 3, 'C'],
            ['ann',  'werwwe', 2, 'C'],
        ],
        columns=['name', 'stuff', 'orders', 'store']
    )

# df
#    name   stuff  orders store
# 0  john  rdgsdr       2     A
# 1   ann  dsdfds       3     A
# 2  john  jkfgdj       1     B
# 3   bob  xcxfcd       5     A
# 4  john  uityuu       3     C
# 5   ann  werwwe       2     C

For each name, I need to extract the row with the maximum number of orders, and also compute the list of all stores for that name. Like this:

grouped = df.groupby('name')

for name, group in grouped:
    print('-'*5, name, '-'*5)
    print(group)

# ----- ann -----
#   name   stuff  orders store
# 1  ann  dsdfds       3     A  <- max(orders) for ann
# 5  ann  werwwe       2     C
# ----- bob -----
#   name   stuff  orders store
# 3  bob  xcxfcd       5     A  <- max(orders) for bob
# ----- john -----
#    name   stuff  orders store
# 0  john  rdgsdr       2     A
# 2  john  jkfgdj       1     B
# 4  john  uityuu       3     C  <- max(orders) for john

# ##########################
# This is what I want to get
# ##########################
>>> result
   name   stuff  max orders  all stores
1  ann   dsdfds           3         A,C
3  bob   xcxfcd           5           A
4  john  uityuu           3       A,B,C

I tried this:

result = grouped.agg(
        **{
            # 'stuff': 'stuff',
            'max orders': pd.NamedAgg('orders', max),
            'all stores': pd.NamedAgg('store', lambda s: s.str.join(',')),
        }
    )

But I don't know how to include the 'stuff' column in the result (in my real app I have many such additional columns, maybe dozens). And also, the join gives me lists instead of strings:

>>> result
   name  max orders all stores
0   ann           3     [A, C]
1   bob           5          A
2  john           3  [A, B, C]

4 Answers

2

Set 'stuff' as the index, then use idxmax to pick the 'stuff' value of the row with the most orders:

out = df.set_index('stuff').groupby('name').agg(stuff = ('orders' , 'idxmax'),
                                          max_orders = ('orders' , 'max'),
                                          all_stores = ('store',','.join))#.reset_index()
Out[200]: 
       stuff  max_orders all_stores
name                               
ann   dsdfds           3        A,C
bob   xcxfcd           5          A
john  uityuu           3      A,B,C
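On the second part of the question: the aggregation above uses the built-in `','.join`, while the attempt in the question used `s.str.join(',')`. A minimal sketch of how the two differ, using made-up sample values:

```python
import pandas as pd

stores = pd.Series(['A', 'C'])

# Built-in str.join reduces the whole Series to a single string.
joined = ','.join(stores)

# Series.str.join works element by element: it joins the characters
# (or list items) inside each value, so the result is still a Series.
elementwise = pd.Series(['AB', 'CD']).str.join(',')

print(joined)                # A,C
print(elementwise.tolist())  # ['A,B', 'C,D']
```

Because `Series.str.join` never collapses the group into one value, it is not a reducing aggregation, which is likely why the attempt in the question did not produce plain strings.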
4 Comments

It says TypeError: Must provide 'func' or tuples of '(column, aggfunc).
Also, I think I need to tell it somehow that I need the stuff value corresponding to max(orders), not just any (or the first) row in the group.
@Amenhotep check the update
As I said, I have dozens of "stuff" columns, that one was just an example.
1

You can do this by combining a sort-and-drop-duplicates step, which keeps the row with the most orders for each name, with a groupby that collects the stores each person has worked at.

# Get stores that each person works at
stores_for_each_name = df.groupby('name')['store'].apply(','.join)

# Get row with largest order value for each name
df = df.sort_values('orders', ascending=False).drop_duplicates('name').rename({'orders': 'max_orders'}, axis=1)

# Replace store column with comma-separated list of stores they have worked at
df = df.drop('store', axis=1)
df = df.join(stores_for_each_name, on='name')

Output:

   name   stuff  max_orders  store
3   bob  xcxfcd           5      A
1   ann  dsdfds           3    A,C
4  john  uityuu           3  A,B,C
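The same steps can be wrapped in a function so the original `df` is not overwritten. This is only a sketch: the helper name `top_row_with_stores`, its parameters, and the `all_stores` column name are made up for illustration, and `kind='stable'` is an added assumption so that rows with equal order counts keep a well-defined relative order.

```python
import pandas as pd

def top_row_with_stores(df, key='name', value='orders', store='store'):
    """Hypothetical helper: one row per key with the largest value, plus all stores."""
    # Comma-separated stores per key, computed from the full table.
    stores = df.groupby(key)[store].agg(','.join).rename('all_stores')
    # Sort by value (largest first), keep the first row seen for each key,
    # and drop the per-row store column, which is replaced below.
    top = (
        df.sort_values(value, ascending=False, kind='stable')
          .drop_duplicates(key)
          .drop(columns=store)
    )
    # Attach the per-key store list; every other column is carried along as-is.
    return top.join(stores, on=key)

df = pd.DataFrame(
    [
        ['john', 'rdgsdr', 2, 'A'],
        ['ann',  'dsdfds', 3, 'A'],
        ['john', 'jkfgdj', 1, 'B'],
        ['bob',  'xcxfcd', 5, 'A'],
        ['john', 'uityuu', 3, 'C'],
        ['ann',  'werwwe', 2, 'C'],
    ],
    columns=['name', 'stuff', 'orders', 'store'],
)

result = top_row_with_stores(df)
print(result)
```

Since only the `store` column is dropped by name, any number of additional "stuff" columns should pass through unchanged.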

2 Comments

Thank you, it worked.
Also, I think your solution is very straightforward and efficient, since you don't recalculate things. Congratulations!
0

If you want to stay within pandas, try this:

import pandas as pd
import numpy as np

df = pd.DataFrame([
    ['john', 'rdgsdr', 2, 'A'],
    ['ann',  'dsdfds', 3, 'A'],
    ['john', 'jkfgdj', 1, 'B'],
    ['bob',  'xcxfcd', 5, 'A'],
    ['john', 'uityuu', 3, 'C'],
    ['ann',  'werwwe', 2, 'C'],
], columns=['name', 'stuff', 'orders', 'store'])

idx = df.groupby('name')['orders'].idxmax()
df_max = df.loc[idx].reset_index(drop = True)

all_stores = df.groupby('name')['store'].apply(
    lambda x: ','.join(np.sort(x.unique()))
)

df_max['all_stores'] = df_max['name'].map(all_stores)
'''
   name   stuff  orders store all_stores
0   ann  dsdfds       3     A        A,C
1   bob  xcxfcd       5     A          A
2  john  uityuu       3     C      A,B,C
'''
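One detail worth knowing about the `idxmax` step: when several rows in a group share the maximum, it returns the label of the first one. A small sketch with made-up data (the `ties` frame below is not from the question):

```python
import pandas as pd

# 'ann' has two rows with the same maximum number of orders.
ties = pd.DataFrame({
    'name':   ['ann', 'ann', 'bob'],
    'stuff':  ['first', 'second', 'only'],
    'orders': [3, 3, 5],
})

# idxmax returns the index label of the first maximum in each group.
idx = ties.groupby('name')['orders'].idxmax()
picked = ties.loc[idx]

print(picked)
```

Here the row labelled 0 is picked for 'ann', so the earlier of the two tied rows wins.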

0

For huge datasets, try this hybrid version using Numba and NumPy:

import numpy as np
import pandas as pd
from numba import njit

df = pd.DataFrame([
    ['john', 'rdgsdr', 2, 'A'],
    ['ann',  'dsdfds', 3, 'A'],
    ['john', 'jkfgdj', 1, 'B'],
    ['bob',  'xcxfcd', 5, 'A'],
    ['john', 'uityuu', 3, 'C'],
    ['ann',  'werwwe', 2, 'C'],
], columns=['name', 'stuff', 'orders', 'store'])
print(df)

# Factorize 'name' for use in the Numba function
df['name_cat'], name_labels = pd.factorize(df['name'])
nameCat = df['name_cat'].to_numpy()
orders = df['orders'].to_numpy()

@njit
def getMaxOrderIdx(nameCat, orders):
    """
    Finds the index of the row with the maximum order for each name category.
    """
    
    seen = {}
    for i in range(len(nameCat)):
        key = nameCat[i]
        if key not in seen or orders[i] > orders[seen[key]]:
            seen[key] = i
    
    return np.array(list(seen.values()))

# Get the indices of the rows with the maximum order for each name
idx_max_order = getMaxOrderIdx(nameCat, orders)

# Select the rows from the original DataFrame using the obtained indices
res = df.iloc[idx_max_order].copy()

# Aggregate all unique stores visited by each name
allStores = df.groupby('name')['store'].agg(lambda x: ','.join(sorted(x.unique())))

# Map the aggregated stores back to the result DataFrame
res['allStores'] = res['name'].map(allStores)

# Reset index and drop temporary/unnecessary columns
res.reset_index(drop=True, inplace=True)

res_final = res.drop(columns=['name_cat', 'store'])

'''
Final Result:
   name   stuff  orders allStores
0  john  uityuu       3     A,B,C
1   ann  dsdfds       3       A,C
2   bob  xcxfcd       5         A
'''
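To sanity-check the single-pass loop without installing Numba, the same logic can be run as plain Python and compared against `groupby(...).idxmax()`. This is a sketch only; `max_order_idx` is a hypothetical stand-in for the jitted function above, and the comparison relies on the default RangeIndex, where index labels and row positions coincide.

```python
import pandas as pd

df = pd.DataFrame([
    ['john', 'rdgsdr', 2, 'A'],
    ['ann',  'dsdfds', 3, 'A'],
    ['john', 'jkfgdj', 1, 'B'],
    ['bob',  'xcxfcd', 5, 'A'],
    ['john', 'uityuu', 3, 'C'],
    ['ann',  'werwwe', 2, 'C'],
], columns=['name', 'stuff', 'orders', 'store'])

def max_order_idx(codes, orders):
    """Plain-Python version of the loop: best row position per name code."""
    best = {}
    for i in range(len(codes)):
        key = codes[i]
        # Strict '>' means the first row wins when orders are tied.
        if key not in best or orders[i] > orders[best[key]]:
            best[key] = i
    return list(best.values())

codes, labels = pd.factorize(df['name'])
positions = max_order_idx(codes, df['orders'].to_numpy())

# Same rows as the pandas built-in, though in order of first appearance.
expected = df.groupby('name')['orders'].idxmax()
assert sorted(positions) == sorted(expected.tolist())

print(positions)  # [4, 1, 3]
```

The positions come out in order of first appearance of each name (john, ann, bob), which matches the row order of the final result shown above.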
