Groupby column name and add results as additional columns

Question

I have a dataframe like this:

import pandas as pd

df = pd.DataFrame({
    'stuff_1_var_1': range(5),
    'stuff_1_var_2': range(2, 7),
    'stuff_2_var_1': range(3, 8),
    'stuff_2_var_2': range(5, 10)
})

   stuff_1_var_1  stuff_1_var_2  stuff_2_var_1  stuff_2_var_2
0              0              2              3              5
1              1              3              4              6

I would like to group the columns by their headers and then add the mean and median of each group as new columns. My expected output looks like this:

   stuff_1_var_mean  stuff_1_var_median  stuff_2_var_mean  stuff_2_var_median
0                 1                   1                 4                   4
1                 2                   2                 5                   5

Brief explanation: there are two groups, stuff_1_var_ and stuff_2_var_, for which I would like to calculate the mean and median per row. For example, for stuff_1_var_ the mean would be:

# values from stuff_1_var_1 and stuff_1_var_2
(0 + 2) / 2 = 1 and
(1 + 3) / 2 = 2

The values are then added as a new column stuff_1_var_mean; the median and the stuff_2_var_ group are handled analogously.
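The per-row arithmetic described above can be checked directly for one group. This is only a minimal sketch of the expected values, not the grouped solution being asked for; the names group_1, row_mean and row_median are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'stuff_1_var_1': range(5),
    'stuff_1_var_2': range(2, 7),
    'stuff_2_var_1': range(3, 8),
    'stuff_2_var_2': range(5, 10)
})

# Select the two columns that belong to the stuff_1_var_ group by hand.
group_1 = df[['stuff_1_var_1', 'stuff_1_var_2']]

# axis=1 computes the statistic across the columns, i.e. once per row.
row_mean = group_1.mean(axis=1)
row_median = group_1.median(axis=1)

print(row_mean.tolist())    # [1.0, 2.0, 3.0, 4.0, 5.0]
print(row_median.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

With only two columns per group, the mean and the median coincide, which is why both expected columns show the same numbers.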

I got as far as:

df = df.T

pattern = df.index.str.extract(r'(^stuff_\d_var_)', expand=False)

dfgb = df.groupby(pattern).agg(['mean', 'median']).T

          stuff_1_var_  stuff_2_var_
0 mean               1             4
  median             1             4
1 mean               2             5
  median             2             5

How can I do the final step(s)?

What would the output dataframe look like? Will the mean/median values be duplicated across rows?
– YOLO, Commented Feb 24, 2020 at 13:08
@YOLO: The desired output is included in the question. I added a bit more explanation; I hope it is clearer now.
– Cleb, Commented Feb 24, 2020 at 13:44
@JoshFriedlander: I added a bit more explanation; I hope it is clearer now.
– Cleb, Commented Feb 24, 2020 at 13:44
It doesn't mean you can't change it before proceeding... It can seem like a lot of work at first, but it's really rewarding in the long run.
– godot, Commented Feb 24, 2020 at 14:12

jezrael · Accepted Answer · 2020-02-24 14:07:23Z

1

Your solution needs a small change: add unstack() after the transpose and then flatten the resulting MultiIndex columns:

df = df.T

pattern = df.index.str.extract(r'(^stuff_\d_var_)', expand=False)

dfgb = df.groupby(pattern).agg(['mean', 'median']).T.unstack()
dfgb.columns = dfgb.columns.map(lambda x: f'{x[0]}{x[1]}')

print(dfgb)
   stuff_1_var_mean  stuff_1_var_median  stuff_2_var_mean  stuff_2_var_median
0                 1                   1                 4                   4
1                 2                   2                 5                   5
2                 3                   3                 6                   6
3                 4                   4                 7                   7
4                 5                   5                 8                   8
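The column-flattening step can be looked at in isolation. The sketch below builds a MultiIndex by hand that mirrors the (group, statistic) pairs produced by unstack() and joins each pair into one string. The variable names cols and flat are illustrative only.

```python
import pandas as pd

# Hand-built stand-in for the columns that unstack() produces:
# one (group prefix, statistic) tuple per column.
cols = pd.MultiIndex.from_product(
    [['stuff_1_var_', 'stuff_2_var_'], ['mean', 'median']]
)

# Each element passed to the function is a tuple such as
# ('stuff_1_var_', 'mean'); concatenating its parts gives a flat name.
flat = cols.map(lambda x: f'{x[0]}{x[1]}')

print(list(flat))
# ['stuff_1_var_mean', 'stuff_1_var_median',
#  'stuff_2_var_mean', 'stuff_2_var_median']
```

Because the prefix already ends in an underscore, no separator is needed when joining the two levels.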

Unfortunately, agg is not implemented for axis=1, so a possible solution is to compute the mean and median separately on the original (untransposed) df and then concat the results:

dfgb = df.groupby(pattern, axis=1).agg(['mean','median'])

NotImplementedError: axis other than 0 is not supported

pattern = df.columns.str.extract(r'(^stuff_\d_var_)', expand=False)
g = df.groupby(pattern, axis=1)

dfgb = pd.concat([g.mean().add_suffix('mean'), 
                  g.median().add_suffix('median')], axis=1)
dfgb = dfgb.iloc[:, [0,2,1,3]]
print(dfgb)
   stuff_1_var_mean  stuff_1_var_median  stuff_2_var_mean  stuff_2_var_median
0                 1                   1                 4                   4
1                 2                   2                 5                   5
2                 3                   3                 6                   6
3                 4                   4                 7                   7
4                 5                   5                 8                   8
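As far as I know, newer pandas releases (2.1 and later) emit a deprecation warning for groupby(..., axis=1), so the call above may not work unchanged on a current installation. A minimal sketch that avoids axis=1 on groupby altogether is to select the columns of each prefix with a boolean mask and collect the row-wise statistics in a dict. The names prefixes and parts are made up for this example, and this is an alternative to the answer's method, not a restatement of it.

```python
import pandas as pd

df = pd.DataFrame({
    'stuff_1_var_1': range(5),
    'stuff_1_var_2': range(2, 7),
    'stuff_2_var_1': range(3, 8),
    'stuff_2_var_2': range(5, 10)
})

# One prefix per column, e.g. 'stuff_1_var_' for 'stuff_1_var_2'.
prefixes = df.columns.str.extract(r'(^stuff_\d_var_)', expand=False)

parts = {}
for prefix in prefixes.unique():
    # Boolean mask picks the columns that share this prefix.
    cols = df.loc[:, prefixes == prefix]
    parts[f'{prefix}mean'] = cols.mean(axis=1)
    parts[f'{prefix}median'] = cols.median(axis=1)

# Dict insertion order becomes the column order: mean, median per group.
dfgb = pd.DataFrame(parts)
print(dfgb)
```

Since the columns are created group by group, the iloc reordering step is not needed here.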

edited Feb 24, 2020 at 14:07

answered Feb 24, 2020 at 13:55

jezrael


4 Comments

Cleb Over a year ago

That seems to do the trick, thanks again! Will wait with acceptance for a bit to check whether anything else shows up.

jezrael Over a year ago

@Cleb - Is the order of the columns important?

jezrael Over a year ago

@Cleb - Would it be acceptable to have all the mean columns first and then all the median columns?

Cleb Over a year ago

The order of columns is not important, no.

YOLO · Accepted Answer · 2020-02-24 13:51:56Z

1

Here's one way to do it:

col = 'stuff_1_var_'
use_col = [x for x in df.columns if 'stuff_1' in x]

# axis=1 computes the statistic across columns, once per row
df[f'{col}mean'] = df[use_col].mean(axis=1)
df[f'{col}median'] = df[use_col].median(axis=1)

col2 = 'stuff_2_var_'
use_col = [x for x in df.columns if 'stuff_2' in x]

df[f'{col2}mean'] = df[use_col].mean(axis=1)
df[f'{col2}median'] = df[use_col].median(axis=1)

print(df.iloc[:,-4:]) # showing last four new columns

  stuff_1_var_mean  stuff_1_var_median  stuff_2_var_mean  stuff_2_var_median  
0               1.0                 1.0               4.0                 4.0  
1               2.0                 2.0               5.0                 5.0  
2               3.0                 3.0               6.0                 6.0  
3               4.0                 4.0               7.0                 7.0  
4               5.0                 5.0               8.0                 8.0

Of course, you can put this in a function to avoid repeating the same code.
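One possible shape for such a function is sketched below. The name add_row_stats and its signature are hypothetical, chosen only for illustration. It matches columns with startswith rather than a substring test, which is a small deviation from the code above.

```python
import pandas as pd

def add_row_stats(df, prefix):
    """Append <prefix>mean and <prefix>median columns, computed per row.

    Hypothetical helper: modifies df in place and returns it.
    """
    # Collect the source columns before adding anything, so the new
    # columns are not included in their own calculation.
    use_col = [c for c in df.columns if c.startswith(prefix)]
    df[f'{prefix}mean'] = df[use_col].mean(axis=1)
    df[f'{prefix}median'] = df[use_col].median(axis=1)
    return df

df = pd.DataFrame({
    'stuff_1_var_1': range(5),
    'stuff_1_var_2': range(2, 7),
    'stuff_2_var_1': range(3, 8),
    'stuff_2_var_2': range(5, 10)
})

for prefix in ['stuff_1_var_', 'stuff_2_var_']:
    add_row_stats(df, prefix)

print(df.iloc[:, -4:])  # the four new columns
```

Note that calling the helper twice with the same prefix would pick up the previously added mean and median columns as inputs, so it is meant to be run once per group.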

answered Feb 24, 2020 at 13:51

YOLO


1 Comment

Cleb Over a year ago

Not as straightforward as @jezrael's answer, but still worth an upvote :)
