Preprocessing Data with Scale and then Binarize in Python

I am working on some proofs of concept for ML and want to try an unusual scaling method. I would like to group my data, scale it within each group, and then binarize the scaled values. Basically, I want every value above its group mean to become 1 and every other value to become 0.

I can get the scaling to work, but binarize needs a reshape, and I want to do both steps in a single groupby step.

    import pandas as pd
    from sklearn import preprocessing

    df = pd.DataFrame({'group': ['A', 'A', 'A', 'B', 'B', 'B'],
                       'column_to_scale': [4, 2, 6, 4, 9, 6]})
    print(df)

    # Standardize each group separately (zero mean, unit variance per group).
    df['column_to_scale'] = df.groupby("group")['column_to_scale'].transform(lambda x: preprocessing.scale(x))
    print(df)

I expect the output to look like:

    # initial
      group  column_to_scale
    0     A                4
    1     A                2
    2     A                6
    3     B                4
    4     B                9
    5     B                6

    # preprocessing.scale()
      group  column_to_scale
    0     A         0.000000
    1     A        -1.224745
    2     A         1.224745
    3     B        -1.135550
    4     B         1.297771
    5     B        -0.162221

    # preprocessing.binarize()
      group  column_to_scale
    0     A                0
    1     A                0
    2     A                1
    3     B                0
    4     B                1
    5     B                0

asked Jun 16 at 6:46

Tim Romero


You can chain both steps inside a single groupby().transform() using sklearn tools. The tricky part is that preprocessing.binarize expects a 2D array, so you need to reshape the scaled series inside the lambda. Because transform runs once per group, each group is scaled and binarized against its own mean.

Omprakash S
2025-06-16 08:07:11 +00:00
Commented Jun 16 at 8:07
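
The suggestion in the comment could be sketched roughly as follows. This is one possible reading of it, not code from the commenter: the helper name `scale_then_binarize` and the output columns `binarized` and `above_mean` are hypothetical names introduced here. It assumes a threshold of 0.0 is what is wanted, since the scaled values are centered on each group's mean.

```python
import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({'group': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'column_to_scale': [4, 2, 6, 4, 9, 6]})


def scale_then_binarize(x):
    """Scale one group's values, then binarize them (hypothetical helper)."""
    # preprocessing.scale accepts a 1D array and returns a 1D array.
    scaled = preprocessing.scale(x.to_numpy(dtype=float))
    # preprocessing.binarize requires 2D input, so reshape to a single column.
    # Values strictly greater than the threshold become 1, all others 0.
    binary = preprocessing.binarize(scaled.reshape(-1, 1), threshold=0.0)
    # Flatten back to 1D so transform can align the result with the group.
    return binary.ravel().astype(int)


# Both steps run inside one groupby().transform() call.
df['binarized'] = (
    df.groupby('group')['column_to_scale'].transform(scale_then_binarize)
)

# Alternative without sklearn (a different technique, shown for comparison):
# compare each value directly with its group mean.
group_mean = df.groupby('group')['column_to_scale'].transform('mean')
df['above_mean'] = (df['column_to_scale'] > group_mean).astype(int)

print(df)
```

Scaling divides by a positive standard deviation and so does not change the sign of a centered value, which is why the direct comparison against the group mean should give the same flags on this data. A value exactly equal to its group mean ends up as 0 in both versions.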
