Curious edge-case behavior: in the example below, 'KNN exists' is printed, but 'Random Forest exists' is not.

I discovered it while checking whether a model was present: the body of if model: ... did not run when the model was a Random Forest.

from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

if KNeighborsClassifier(4):
    print('KNN exists')

if RandomForestClassifier(n_estimators=10, max_depth=4):
    print('Random Forest exists')

Why does this happen?

  • weiiiird. This could be a consequence of RandomForestClassifier implementing __len__. (Commented Oct 27, 2017 at 21:53)

1 Answer

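Aha! It's because RandomForestClassifier implements __len__ (inherited from BaseEnsemble), while KNeighborsClassifier defines neither __len__ nor __bool__: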

In [1]: from sklearn.ensemble import RandomForestClassifier
   ...: from sklearn.neighbors import KNeighborsClassifier
   ...:

In [2]: knn = KNeighborsClassifier(4)

In [3]: forest = RandomForestClassifier(n_estimators=10, max_depth=4)

In [4]: knn.__bool__
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-ef1cfe16be77> in <module>()
----> 1 knn.__bool__

AttributeError: 'KNeighborsClassifier' object has no attribute '__bool__'

In [5]: knn.__len__
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-dc98bf8c50e0> in <module>()
----> 1 knn.__len__

AttributeError: 'KNeighborsClassifier' object has no attribute '__len__'

In [6]: forest.__bool__
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-fbdd7f01e843> in <module>()
----> 1 forest.__bool__

AttributeError: 'RandomForestClassifier' object has no attribute '__bool__'

In [7]: forest.__len__
Out[7]:
<bound method BaseEnsemble.__len__ of RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=4, max_features='auto', max_leaf_nodes=None,
            min_impurity_split=1e-07, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            n_estimators=10, n_jobs=1, oob_score=False, random_state=None,
            verbose=0, warm_start=False)>

In [8]: len(forest)
Out[8]: 0

And, according to the Python Data Model:

object.__bool__(self)

Called to implement truth value testing and the built-in operation bool(); should return False or True. When this method is not defined, __len__() is called, if it is defined, and the object is considered true if its result is nonzero. If a class defines neither __len__() nor __bool__(), all its instances are considered true.
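
The fallback order quoted above can be reproduced without scikit-learn. This is only a minimal sketch: the class names NoHooks, LenOnly, and ExplicitBool are made up for illustration and are not part of any library.

```python
class NoHooks:
    """Defines neither __bool__ nor __len__, so instances are always truthy."""


class LenOnly:
    """Defines only __len__; truthiness falls back to len(self) != 0."""

    def __init__(self, items=()):
        self.items = list(items)

    def __len__(self):
        return len(self.items)


class ExplicitBool(LenOnly):
    """Defines both; __bool__ takes precedence over __len__."""

    def __bool__(self):
        return True


results = {
    "no_hooks": bool(NoHooks()),            # neither hook defined
    "len_zero": bool(LenOnly()),            # __len__ returns 0
    "len_nonzero": bool(LenOnly([1])),      # __len__ returns 1
    "explicit_bool": bool(ExplicitBool()),  # empty, but __bool__ wins
}
print(results)
```

In this sketch, KNeighborsClassifier corresponds to the NoHooks case and an unfitted RandomForestClassifier to LenOnly with no items.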

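As one might expect, len() of a RandomForestClassifier is the number of fitted estimators. In the session above it is 0 before .fit is called, which is why the unfitted forest is falsy, and it becomes n_estimators afterwards: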

In [9]: from sklearn.datasets import make_classification
   ...: X, y = make_classification(n_samples=1000, n_features=4,
   ...:             n_informative=2, n_redundant=0,
   ...:             random_state=0, shuffle=False)
   ...:

In [10]: X.shape
Out[10]: (1000, 4)

In [11]: y.shape
Out[11]: (1000,)

In [12]: forest.fit(X,y)
Out[12]:
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=4, max_features='auto', max_leaf_nodes=None,
            min_impurity_split=1e-07, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            n_estimators=10, n_jobs=1, oob_score=False, random_state=None,
            verbose=0, warm_start=False)

In [13]: len(forest)
Out[13]: 10
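
If the goal is only to check that a model object is present, comparing against None avoids the __len__ fallback entirely. The sketch below uses a hypothetical stand-in class, UnfittedEnsemble, and a hypothetical helper, model_is_present, so that it runs without scikit-learn; it assumes the real estimator reports a length of 0 before fitting, as in the session above.

```python
class UnfittedEnsemble:
    """Hypothetical stand-in for an ensemble whose len() is 0 before fitting."""

    def __init__(self):
        self.estimators_ = []

    def __len__(self):
        return len(self.estimators_)


def model_is_present(model):
    """Return True for any object other than None, regardless of its length."""
    return model is not None


model = UnfittedEnsemble()
truthy_check = bool(model)                # what `if model:` evaluates
identity_check = model_is_present(model)  # what `if model is not None:` evaluates
missing_check = model_is_present(None)
print(truthy_check, identity_check, missing_check)
```

The identity check answers "was a model supplied?", whereas the truthiness check answers "does the model currently contain any estimators?", which is a different question.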