I have a dataframe like this:
import pandas as pd

df = pd.DataFrame({
    'stuff_1_var_1': range(5),
    'stuff_1_var_2': range(2, 7),
    'stuff_2_var_1': range(3, 8),
    'stuff_2_var_2': range(5, 10),
})
stuff_1_var_1 stuff_1_var_2 stuff_2_var_1 stuff_2_var_2
0 0 2 3 5
1 1 3 4 6
I would like to group the columns by their header prefix and then add the row-wise mean and median of each group as new columns. My expected output looks like this:
stuff_1_var_mean stuff_1_var_median stuff_2_var_mean stuff_2_var_median
0 1 1 4 4
1 2 2 5 5
Brief explanation:
We have two groups, stuff_1_var_ and stuff_2_var_, for which I would like to calculate the mean and median per row. For example, for stuff_1_var_ the mean would be:
# values from stuff_1_var_1 and stuff_1_var_2
(0 + 2) / 2 = 1 and
(1 + 3) / 2 = 2
The values are then added as a new column stuff_1_var_mean; the same applies to the median and to stuff_2_var_.
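To make the arithmetic concrete, here is a minimal sketch that computes both statistics for the stuff_1_var_ group only. Selecting the columns with DataFrame.filter and the names group, stuff_1_var_mean and stuff_1_var_median are illustrative choices, not something the final solution has to use:

```python
import pandas as pd

df = pd.DataFrame({
    'stuff_1_var_1': range(5),
    'stuff_1_var_2': range(2, 7),
    'stuff_2_var_1': range(3, 8),
    'stuff_2_var_2': range(5, 10),
})

# Select the columns of one group by their shared name prefix.
group = df.filter(regex=r'^stuff_1_var_')

# axis=1 aggregates across the columns, giving one value per row.
stuff_1_var_mean = group.mean(axis=1)
stuff_1_var_median = group.median(axis=1)
```

With only two columns per group the mean and median coincide, which is why the expected output shows the same value in both columns.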
I got this far:
df = df.T
pattern = df.index.str.extract(r'(^stuff_\d_var_)', expand=False)
dfgb = df.groupby(pattern).agg(['mean', 'median']).T
stuff_1_var_ stuff_2_var_
0 mean 1 4
median 1 4
1 mean 2 5
median 2 5
How can I do the final step(s)?
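One possible way to finish, sketched under the assumption that the attempt above is kept as is: unstack the statistic level of the row index into the columns, then join each (prefix, statistic) pair into a flat column name. The names df_t and result are hypothetical, and the regex is written as a raw string to avoid an invalid-escape warning:

```python
import pandas as pd

df = pd.DataFrame({
    'stuff_1_var_1': range(5),
    'stuff_1_var_2': range(2, 7),
    'stuff_2_var_1': range(3, 8),
    'stuff_2_var_2': range(5, 10),
})

# Same steps as in the attempt: transpose, group rows by name prefix, aggregate.
df_t = df.T
pattern = df_t.index.str.extract(r'(^stuff_\d_var_)', expand=False)
dfgb = df_t.groupby(pattern).agg(['mean', 'median']).T

# dfgb has a (row, statistic) MultiIndex; unstack() moves the statistic
# level into the columns, giving (prefix, statistic) column pairs.
result = dfgb.unstack()

# Flatten the two column levels into names like 'stuff_1_var_mean'.
result.columns = [prefix + stat for prefix, stat in result.columns]
```

The values in result are floats rather than the integers shown in the expected output; if integers are needed, a cast such as result.astype(int) could be applied when all values are whole numbers.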