I have to preprocess a feature that is essentially a list of numeric codes encoded as a string, and I want to encode it so that the output is an array indicating the frequency of each of these numbers. Missing values should also be imputed as part of the preprocessing.

Here is what I did:

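import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer

s_data = pd.Series(['123 342 789', '12 34 56', np.nan, '1 2 3 123'])
s_data = s_data.str.split(" ")  # each entry becomes a list of code strings; NaN stays NaN
pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('mlb', MultiLabelBinarizer())
])
encoded_data = pipeline.fit_transform(s_data)  # this call fails
encoded_df = pd.DataFrame(encoded_data, columns=pipeline.named_steps['mlb'].classes_)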

The output that I'm expecting is something like this:

   1  12  123  2  3  34  342  56  789
0  0   0    1  0  0   0    1   0    1
1  0   1    0  0  0   1    0   1    0
2  1   0    1  1  1   0    0   0    0

However, SimpleImputer wouldn't accept the input, saying that the input contains lists. When I tried changing the input to a NumPy array, it was rejected by MultiLabelBinarizer, which said it expects only 2 inputs but 3 were given.

  • What is the input that you want to turn into the expected output? Commented Jul 27, 2024 at 19:37
  • The input has been given in the code. It's: s_data = pd.Series(['123 342 789', '12 34 56', np.nan, '1 2 3 123']) Commented Jul 28, 2024 at 13:40
  • Welcome to SO; please post a minimal reproducible example. Commented Jul 29, 2024 at 18:47
  • The "expects 2 inputs" issue arises because the binarizer is not compatible with a pipeline: it is designed to be used on targets, not features. And as @desertnaut said, please make sure the code you give actually runs and gives the error you are experiencing (e.g. str_data is not defined, you need to import numpy, etc.). Commented Aug 5, 2024 at 9:56
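
One possible workaround, following the last comment: a rough sketch that imputes the raw strings first (before splitting, so SimpleImputer never sees lists) and then wraps MultiLabelBinarizer in a small transformer with the fit(X, y=None) signature that Pipeline expects. The class name SplitAndBinarize and the column name 'codes' are made up for this illustration; they are not part of scikit-learn or of the original code.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer


class SplitAndBinarize(TransformerMixin, BaseEstimator):
    """Hypothetical wrapper: splits each string into codes, then binarizes.

    Expects a 2D, single-column array of strings (what SimpleImputer returns).
    """

    @staticmethod
    def _split(X):
        # Take the single column and split each string on whitespace.
        return [str(row[0]).split() for row in np.asarray(X, dtype=object)]

    def fit(self, X, y=None):
        # y is accepted (and ignored) so the step works inside a Pipeline.
        self.mlb_ = MultiLabelBinarizer().fit(self._split(X))
        self.classes_ = self.mlb_.classes_
        return self

    def transform(self, X):
        return self.mlb_.transform(self._split(X))


s_data = pd.Series(['123 342 789', '12 34 56', np.nan, '1 2 3 123'])

pipeline = Pipeline([
    # Impute on the unsplit strings; SimpleImputer needs 2D input.
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('mlb', SplitAndBinarize()),
])

encoded = pipeline.fit_transform(s_data.to_frame(name='codes'))
encoded_df = pd.DataFrame(encoded, columns=pipeline.named_steps['mlb'].classes_)
print(encoded_df)
```

This keeps all four rows: the missing entry is filled with whichever string SimpleImputer selects as the most frequent, so the result has one more row than the expected output in the question. Note also that MultiLabelBinarizer produces 0/1 indicators rather than counts; if a code can repeat within one string and real frequencies are needed, something like CountVectorizer may be a better fit.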
