Can't fix ValueError: Invalid parameter criterion for estimator for MultiOutputClassifier and GridSearchCV

Question

I want to write code for a MultiOutputClassifier in Python using scikit-learn. My features are text values, so I used CountVectorizer(). To find the best parameters for my model, I used GridSearchCV and model.best_params_. I want the best parameters for both the decision tree and the MultiOutputClassifier.

I get the following error and do not know how to fix it, even after searching extensively:

ValueError: Invalid parameter criterion for estimator MultiOutputClassifier(estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best'),
           n_jobs=None). Check the list of available parameters with `estimator.get_params().keys()`.

How can I fix this error? This is the full code:

import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.feature_extraction.text import CountVectorizer

from sklearn import tree
from sklearn.multioutput import MultiOutputClassifier
from sklearn.metrics import accuracy_score

df = pd.DataFrame({"first":["yes", "no", "yes", "yes", "no"],
                  "second":["yes", "no", "no", "yes", "yes"],
                  "third":["true","true", "false", "true", "false"]})

#print(df)

features = df.iloc[:,-1]
results = df.iloc[:,:-1]

cv = CountVectorizer()  
features = cv.fit_transform(features)

features_train, features_test, result_train, result_test = train_test_split(features, results, test_size = 0.3, random_state = 42)

tuned_tree = {'criterion':['entropy','gini'], 'random_state':[1,2,3,4,5,6,7,8,9,10,11,12,13]}

cls = GridSearchCV(MultiOutputClassifier(tree.DecisionTreeClassifier()), tuned_tree)
model = cls.fit(features_train, result_train)

acc_prediction  = model.predict(features_test)
accuracy_test = accuracy_score(result_test, acc_prediction)

print(accuracy_test, model.best_params_)

Venkatachalam · Accepted Answer · 2021-07-21 06:38:08Z

1

You need to set the parameters of the estimator wrapped by MultiOutputClassifier using the estimator__ prefix.

Try this:

{'estimator__criterion':['entropy','gini']}
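The error message itself suggests checking `estimator.get_params().keys()`. A minimal sketch of that check, assuming a default DecisionTreeClassifier inside the wrapper (the variable names here are illustrative only), shows that the tree's settings are exposed only under the `estimator__` prefix:

```python
from sklearn.multioutput import MultiOutputClassifier
from sklearn.tree import DecisionTreeClassifier

# Wrap a decision tree the same way the question does
wrapped = MultiOutputClassifier(DecisionTreeClassifier())

# These are the only names a GridSearchCV parameter grid may use for this estimator
param_names = sorted(wrapped.get_params().keys())

# Parameters that belong to the wrapped tree carry the estimator__ prefix
tree_param_names = [name for name in param_names if name.startswith("estimator__")]
print(tree_param_names)
```

The bare name `criterion` does not appear in that list, which is why the grid in the question is rejected.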

Note: You should not tune random_state for any reason. Just use it for reproducibility.

You need to binarize the labels (target variable) to compute metrics in a multi-label setting.

Stratified train-test splitting is not defined in sklearn for the multi-label format. Hence, you have to do a random train-test split and then apply binarization.

sklearn has many metrics available for multi-label tasks; see the sklearn.metrics documentation.

import pandas as pd  

from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.feature_extraction.text import CountVectorizer

from sklearn import tree
from sklearn.multioutput import MultiOutputClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn import preprocessing


df = pd.DataFrame({"first":["yes", "no", "yes", "yes", "no"],
                  "second":["yes", "no", "no", "yes", "yes"],
                  "third":["true","true", "false", "true", "false"]})

train, test = train_test_split(
    df, test_size = 0.3, random_state = 42)

# vectorization
cv = CountVectorizer()  
# always fit the vectorizer on the train data alone
# fitting on complete data leads to data leakage

features_train_vect = cv.fit_transform(train.iloc[:,-1])

# label binarization
mlb = preprocessing.MultiLabelBinarizer()
result_train = mlb.fit_transform(train.iloc[:,:-1].values) 

# apply the fitted transforms to the test data
result_test = mlb.transform(test.iloc[:,:-1].values)
features_test_vect = cv.transform(test.iloc[:,-1])


params_range = {'estimator__criterion':['entropy','gini']}


cls = GridSearchCV(MultiOutputClassifier(tree.DecisionTreeClassifier(random_state=1)),
                   params_range, cv=3)
model = cls.fit(features_train_vect, result_train)

# f1_score expects (y_true, y_pred), in that order
f1_score(result_test, cls.predict(features_test_vect), average='weighted')
# 0.6666666666666666
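Beyond the weighted F1 used above, a few other sklearn metrics accept binarized multi-label targets directly. The sketch below uses small hypothetical arrays (not the output of the code above) to show subset accuracy, Hamming loss, and per-label F1:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss

# Hypothetical binarized targets: rows are samples, columns are labels
y_true = np.array([[1, 0], [1, 1]])
y_pred = np.array([[0, 1], [1, 1]])

# Subset accuracy: a sample counts only if every label matches
subset_accuracy = accuracy_score(y_true, y_pred)

# Hamming loss: fraction of individual label entries that are wrong
label_error_rate = hamming_loss(y_true, y_pred)

# average=None returns one F1 value per label column
per_label_f1 = f1_score(y_true, y_pred, average=None)

print(subset_accuracy, label_error_rate, per_label_f1)
```

Which metric is appropriate depends on whether a partially correct prediction should earn partial credit.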

edited Jul 21, 2021 at 6:38

answered Jul 22, 2019 at 5:53


5 Comments

taga Over a year ago

OK, but do I need preprocessing.MultiLabelBinarizer()? I just wanted to find the best parameters for my decision tree. I used GridSearchCV(tree.DecisionTreeClassifier(), tuned_parameters) when my code did not have MultiOutputClassifier.

Venkatachalam Over a year ago

MultiLabelBinarizer is required when working with a multi-label problem. GridSearchCV can work without MultiLabelBinarizer only when there is one target variable.

taga Over a year ago

@ai_learning OK, but I want to get the best parameters for my decision tree model and for my multi-output classifier. I want to use `model.best_params_`; that's why I used `tuned_tree`. I see that you set the decision tree parameters yourself.

Venkatachalam Over a year ago

The code that I suggested tunes the parameters of the decision tree only. There is no parameter to tune for MultiOutputClassifier; it's just an extension of the estimator to multiple target variables.

Venkatachalam Over a year ago

I have just changed your variable name from tuned_tree to params_range for better understanding.

lte__ · Accepted Answer · 2019-07-22 09:31:52Z

0

You're passing the DecisionTreeClassifier() constructor function to the MultiOutputClassifier. Try instantiating a decision tree estimator object and passing that to the function:

dtc = tree.DecisionTreeClassifier()
cls = GridSearchCV(MultiOutputClassifier(dtc), tuned_tree)

answered Jul 22, 2019 at 9:31


1 Comment

taga Over a year ago

Still the same error. Also, I do not think this approach will give me the parameters for the decision tree.

Fenil · Accepted Answer · 2019-07-22 13:00:41Z

0

The dictionary passed should look like this:

tuned_tree = {'estimator__criterion':['entropy','gini'], 'estimator__random_state':[1,2,3,4,5,6,7,8,9,10,11,12,13]}

The estimator__ prefix is required for all the parameters.
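As a rough end-to-end sketch of the prefixed grid, the example below runs GridSearchCV on synthetic multi-label data. The data from `make_multilabel_classification` and the extra `estimator__max_depth` entry are assumptions for illustration, not part of the question; `random_state` is fixed rather than searched.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption): 60 samples, 3 binary target columns
X, Y = make_multilabel_classification(
    n_samples=60, n_features=8, n_classes=3, random_state=0
)

# Every decision tree parameter is addressed through the estimator__ prefix
param_grid = {
    "estimator__criterion": ["entropy", "gini"],
    "estimator__max_depth": [2, 4],  # illustrative extra parameter
}

search = GridSearchCV(
    MultiOutputClassifier(DecisionTreeClassifier(random_state=0)),
    param_grid,
    cv=3,
)
search.fit(X, Y)

# best_params_ reports the winning tree settings under the prefixed names
print(search.best_params_)

# The refitted wrapper holds one fitted tree per target column
fitted_trees = search.best_estimator_.estimators_
print(len(fitted_trees))
```

The same settings are applied to every target column, so `best_params_` contains one set of decision tree parameters rather than one per output.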

answered Jul 22, 2019 at 13:00
