
I'm trying to use a random forest with grid search, but this error shows up:

ValueError: Invalid parameter classifier for estimator Pipeline(steps=[('tfidf_vectorizer', TfidfVectorizer()),
                ('rf_classifier', RandomForestClassifier())]). 
Check the list of available parameters with `estimator.get_params().keys()`.
import numpy as np # linear algebra
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn import pipeline,ensemble,preprocessing,feature_extraction,metrics
train=pd.read_json('cleaned_data1')
#split dataset into X , Y
X=train.iloc[:,0]
Y=train.iloc[:,2]

estimators=pipeline.Pipeline([
        ('tfidf_vectorizer', feature_extraction.text.TfidfVectorizer(lowercase=True)),
        ('rf_classifier', ensemble.RandomForestClassifier())
    ])

print(estimators.get_params().keys())

params = {"classifier__max_depth": [3, None],
              "classifier__max_features": [1, 3, 10],
              "classifier__min_samples_split": [1, 3, 10],
              "classifier__min_samples_leaf": [1, 3, 10],
              # "bootstrap": [True, False],
              "classifier__criterion": ["gini", "entropy"]}

X_train,X_test,y_train,y_test=train_test_split(X,Y, test_size=0.2)

rf_classifier=GridSearchCV(estimators,params, cv=10 , n_jobs=-1 ,scoring='accuracy',iid=True)

rf_classifier.fit(X_train,y_train)

y_pred=rf_classifier.predict(X_test)

metrics.confusion_matrix(y_test,y_pred)
print(metrics.accuracy_score(y_test,y_pred))

I've also tried this parameter grid:

param_grid = {
    'n_estimators': [200, 500],
    'max_features': ['auto', 'sqrt', 'log2'],
    'max_depth' : [4,5,6,7,8],
    'criterion' :['gini', 'entropy']
}

But I still get the same error.

  • There is no classifier step in your pipeline; there is an rf_classifier step. Commented Feb 24, 2021 at 20:45
  • You can try training the RF with AutoML (github.com/mljar/mljar-supervised). You can restrict AutoML to optimizing Random Forest only with AutoML(algorithms=['Random Forest'], mode='Compete') and then just fit it to run the hyperparameter search: automl.fit(X, y). Commented Feb 25, 2021 at 10:03
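The first comment's diagnosis can be checked directly with `get_params()`, which the question already prints. A minimal sketch, reusing the question's step names (nothing is fitted, so no data is needed): pipeline parameters are exposed as `<step name>__<parameter>`, so only keys with the `rf_classifier__` prefix exist.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Same step names as in the question.
pipe = Pipeline([
    ("tfidf_vectorizer", TfidfVectorizer(lowercase=True)),
    ("rf_classifier", RandomForestClassifier()),
])

keys = set(pipe.get_params().keys())

# Step parameters are exposed as "<step name>__<parameter name>".
print("rf_classifier__max_depth" in keys)  # True
print("classifier__max_depth" in keys)     # False: no step is named "classifier"
```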

2 Answers


Make sure that the step names you use in the parameter grid match the names you gave the steps when you built the pipeline.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV


# Define a pipeline to search for the best combination of PCA truncation
# and classifier regularization.
pca = PCA()
# set the tolerance to a large value to make the example faster
logistic = LogisticRegression(max_iter=10000, tol=0.1)
pipe = Pipeline(steps=[('pca', pca), ('logistic', logistic)])

X_digits, y_digits = datasets.load_digits(return_X_y=True)

# Parameters of pipelines can be set using ‘__’ separated parameter names:
param_grid = {
    'pca__n_components': [5, 15, 30, 45, 64],
    'logistic__C': np.logspace(-4, 4, 4),
}
search = GridSearchCV(pipe, param_grid, n_jobs=-1)
search.fit(X_digits, y_digits)
print("Best parameter (CV score=%0.3f):" % search.best_score_)
print(search.best_params_)

In this example, the LogisticRegression step is named 'logistic', so its parameters are referenced as 'logistic__C' in the grid. On a side note, min_samples_split = 1 is not a valid value for RandomForestClassifier and will result in an error; an integer value must be at least 2.

This example is taken from the scikit-learn documentation.
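Applied to the question's own pipeline, the two points in this answer (prefix every key with the actual step name, and start min_samples_split at 2) give a grid like the one below. This is a minimal sketch, not from the documentation: the tiny corpus and labels are made up to stand in for `cleaned_data1`, and `cv=3` and `n_estimators=10` are chosen only to keep it fast. The same rule applies to the asker's second grid: keys like `n_estimators` with no step prefix are rejected for the same reason.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for the question's 'cleaned_data1'.
texts = ["good movie", "great film", "loved it", "excellent acting",
         "bad movie", "terrible film", "hated it", "awful acting"] * 3
labels = [1, 1, 1, 1, 0, 0, 0, 0] * 3

pipe = Pipeline([
    ("tfidf_vectorizer", TfidfVectorizer(lowercase=True)),
    ("rf_classifier", RandomForestClassifier(n_estimators=10, random_state=0)),
])

# Every key uses the step name "rf_classifier" as its prefix, and
# min_samples_split starts at 2 because 1 is not a valid integer value.
params = {
    "rf_classifier__max_depth": [3, None],
    "rf_classifier__min_samples_split": [2, 3],
    "rf_classifier__criterion": ["gini", "entropy"],
}

search = GridSearchCV(pipe, params, cv=3, scoring="accuracy")
search.fit(texts, labels)
print(search.best_params_)
```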



In the pipeline you named the random forest step 'rf_classifier'; rename it to 'classifier' and that should solve the issue.

The params look for a step named 'classifier' in the pipeline so they can be applied to it, but currently no step has that name, which is why this error is thrown. Alternatively, you could change the "classifier__" prefix in the params to "rf_classifier__" so that the params refer to the step you already have.
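Both fixes can be sketched as follows. `make_pipe` is a hypothetical helper that builds the question's pipeline with a configurable name for the forest step. `GridSearchCV` applies each candidate through `set_params`, so calling it yourself is a cheap way to check a grid's keys before running a full search.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline


def make_pipe(step_name):
    """Hypothetical helper: the question's pipeline with a configurable forest step name."""
    return Pipeline([
        ("tfidf_vectorizer", TfidfVectorizer(lowercase=True)),
        (step_name, RandomForestClassifier()),
    ])


# Fix 1: rename the step so the existing "classifier__" keys match it.
renamed = make_pipe("classifier").set_params(classifier__max_depth=3)
print(renamed.named_steps["classifier"].max_depth)  # 3

# Fix 2: keep the step name "rf_classifier" and change the prefix instead.
original = make_pipe("rf_classifier").set_params(rf_classifier__max_depth=3)
print(original.named_steps["rf_classifier"].max_depth)  # 3
```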

1 Comment

I changed the step name to 'classifier', but I'm still having the same problem.
