I am working on some proofs of concept for ML and want to try an unusual scaling method. I would like to group my data, scale it within each group, and then binarize the scaled values. Basically, I want every value above its group mean to be 1 and the rest to be 0.
I can get the scaling to work, but binarize needs a reshape, and I want to do both in the same groupby step.
import pandas as pd
from sklearn import preprocessing
df = pd.DataFrame({'group': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'column_to_scale': [4, 2, 6, 4, 9, 6]})
print(df)
df['column_to_scale'] = df.groupby("group")['column_to_scale'].transform(lambda x: preprocessing.scale(x))
print(df)
I expect the output to look like:
# initial
group column_to_scale
0 A 4
1 A 2
2 A 6
3 B 4
4 B 9
5 B 6
# preprocessing.scale()
group column_to_scale
0 A 0.000000
1 A -1.224745
2 A 1.224745
3 B -1.135550
4 B 1.297771
5 B -0.162221
# preprocessing.binarize()
group column_to_scale
0 A 0
1 A 0
2 A 1
3 B 0
4 B 1
5 B 0
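For reference, here is a minimal sketch of the kind of single-step transform I have in mind. It assumes that reshaping each group into one column for preprocessing.binarize and flattening it back afterwards is acceptable. The helper name scale_then_binarize is hypothetical (my own, not part of pandas or sklearn), and the pandas-only comparison against the group mean is only there as a cross-check, not as the sklearn approach.

```python
import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({'group': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'column_to_scale': [4, 2, 6, 4, 9, 6]})


def scale_then_binarize(x):
    # Hypothetical helper: standardize one group, then threshold at 0.
    scaled = preprocessing.scale(x.to_numpy(dtype=float))
    # binarize expects a 2D array, so reshape to a single column first.
    flags = preprocessing.binarize(scaled.reshape(-1, 1), threshold=0.0)
    # Flatten back to 1D so transform can align it with the group rows.
    return flags.ravel().astype(int)


binarized = df.groupby('group')['column_to_scale'].transform(scale_then_binarize)

# Cross-check without sklearn: 1 where the value is above its group mean.
mean_check = df.groupby('group')['column_to_scale'].transform(
    lambda x: (x > x.mean()).astype(int))

print(binarized.tolist())
print(mean_check.tolist())
```

Both versions treat a value exactly equal to its group mean (row 0 here) as 0, because the comparison is strictly greater than.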