I am working on some proofs of concept for ML and want to try an unusual scaling method. I would like to group my data, scale it within each group, and then binarize the scaled values. Basically, I want every value above its group's mean to be 1 and every other value to be 0.

I can get the scaling to work, but binarize needs a reshape, and I want to do both in the groupby step.

    import pandas as pd
    from sklearn import preprocessing


    df = pd.DataFrame({'group': ['A', 'A', 'A', 'B','B', 'B'],
                        'column_to_scale': [4, 2, 6, 4, 9, 6]})
    print(df)
    df['column_to_scale'] = df.groupby("group")['column_to_scale'].transform(lambda x: preprocessing.scale(x))
    print(df)

I expect the output to look like:

    # initial
      group  column_to_scale
    0     A                4
    1     A                2
    2     A                6
    3     B                4
    4     B                9
    5     B                6

    # preprocessing.scale()
      group  column_to_scale
    0     A         0.000000
    1     A        -1.224745
    2     A         1.224745
    3     B        -1.135550
    4     B         1.297771
    5     B        -0.162221

    # preprocessing.binarize()
      group  column_to_scale
    0     A                0
    1     A                0
    2     A                1
    3     B                0
    4     B                1
    5     B                0
  • You can chain both steps inside a single groupby().transform() using sklearn tools. The tricky part is that preprocessing.binarize expects a 2D array, so you need to reshape the scaled series inside the lambda. Since you’re doing it per group, this works nicely. Commented Jun 16 at 8:07
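
A minimal sketch of what the comment describes, assuming the DataFrame from the question. The helper name `scale_then_binarize` and the output columns `binarized` and `binarized_plain` are made up for illustration and are not part of pandas or sklearn. The scaled values are reshaped to a single column for `preprocessing.binarize` and then flattened back so that `transform` gets one value per row.

```python
import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({'group': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'column_to_scale': [4, 2, 6, 4, 9, 6]})


def scale_then_binarize(x):
    """Hypothetical helper: standardize one group, then binarize at 0."""
    # scale() accepts a 1D array and returns the standardized values
    scaled = preprocessing.scale(x.to_numpy(dtype=float))
    # binarize() requires a 2D array, so reshape to a single column;
    # values strictly greater than the threshold become 1, the rest 0
    binary = preprocessing.binarize(scaled.reshape(-1, 1), threshold=0.0)
    # flatten back to 1D and keep the group's original index
    return pd.Series(binary.ravel().astype(int), index=x.index)


df['binarized'] = (
    df.groupby('group')['column_to_scale'].transform(scale_then_binarize)
)

# Assumption: only the 0/1 result is needed. A scaled value is above 0
# exactly when the raw value is above its group mean, so a plain pandas
# comparison should give the same column without sklearn.
df['binarized_plain'] = (
    df.groupby('group')['column_to_scale']
      .transform(lambda x: (x > x.mean()).astype(int))
)

print(df)
```

Note that a value exactly equal to its group mean (the 4 in group A) maps to 0 in both variants, which matches the expected output in the question.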
