
I have applied a random forest classifier to find the features that contributed to the prediction for a specific row of a dataset. However, I get 2 values for each feature instead of one, and I am not sure why. Here is my code.

import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from treeinterpreter import treeinterpreter as ti

X, y = make_classification(n_samples=1000,
                           n_features=6,
                           n_informative=3,
                           n_classes=2,
                           random_state=0,
                           shuffle=False)

# Creating a dataFrame
df = pd.DataFrame({'Feature 1': X[:, 0],
                   'Feature 2': X[:, 1],
                   'Feature 3': X[:, 2],
                   'Feature 4': X[:, 3],
                   'Feature 5': X[:, 4],
                   'Feature 6': X[:, 5],
                   'Class': y})


y_train = df['Class']
X_train = df.drop('Class',axis = 1)

rf = RandomForestClassifier(n_estimators=50,
                               random_state=0)

rf.fit(X_train, y_train)

print ("-"*20) 

importances = rf.feature_importances_

indices = X_train.columns

instances = X_train.loc[[60]]

print(rf.predict(instances))

print ("-"*20) 

prediction, biases, contributions = ti.predict(rf, instances)


for i in range(len(instances)):
    print ("Instance", i)
    print ("-"*20) 
    print ("Bias (trainset mean)", biases[i])
    print ("-"*20) 
    print ("Feature contributions:")
    print ("-"*20) 

    for c, feature in sorted(zip(contributions[i],
                                 indices),
                             key=lambda x: -abs(x[0][0])):  # sort by contribution magnitude, descending

        print (feature, np.round(c, 3))

    print ("-"*20) 

This is the output of my code. Can someone explain why the bias and each feature contribution output 2 values instead of one?

--------------------
[0]
--------------------
Instance 0
--------------------
Bias (trainset mean) [ 0.49854  0.50146]
--------------------
Feature contributions:
--------------------
Feature 4 [ 0.172 -0.172]
Feature 1 [ 0.16 -0.16]
Feature 3 [-0.154  0.154]
Feature 5 [ 0.029 -0.029]
Feature 2 [-0.024  0.024]
Feature 6 [ 0.019 -0.019]

1 Answer

You are getting arrays of length 2 for bias and feature contributions for the very simple reason that you have a 2-class classification problem.
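The two columns mirror what scikit-learn itself returns for a binary classifier: `predict_proba` gives one probability per class, and the pair sums to 1. Here is a minimal sketch using the same `make_classification` setup as in your question (treeinterpreter is not needed for this part):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Same 2-class setup as in the question
X, y = make_classification(n_samples=1000, n_features=6, n_informative=3,
                           n_classes=2, random_state=0, shuffle=False)
rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf.fit(X, y)

# One row per sample, one column per class -- same layout as ti.predict's bias
proba = rf.predict_proba(X[60:61])
print(proba.shape)                    # (1, 2)
print(np.isclose(proba.sum(), 1.0))  # True: the two columns are complementary
```

Since the two class probabilities must sum to 1, every per-feature contribution toward class 0 is mirrored by an equal and opposite contribution toward class 1, which is exactly the `[c, -c]` pattern in your output.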

As explained clearly in this blog post by the package creators, in the 3-class case of the iris dataset you get arrays of length 3 (i.e. one array element for each class):

from treeinterpreter import treeinterpreter as ti
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
iris = load_iris()

rf = RandomForestClassifier(max_depth=4)
idx = np.arange(len(iris.target))  # range() cannot be shuffled in Python 3
np.random.shuffle(idx)

rf.fit(iris.data[idx][:100], iris.target[idx][:100])

instance = iris.data[idx][100:101]  # a single held-out sample

prediction, bias, contributions = ti.predict(rf, instance)
print("Prediction", prediction)
print("Bias (trainset prior)", bias)
print("Feature contributions:")
for c, feature in zip(contributions[0],
                      iris.feature_names):
    print(feature, c)

which gives:

Prediction [[ 0. 0.9 0.1]]
Bias (trainset prior) [[ 0.36 0.262 0.378]]
Feature contributions:
sepal length (cm) [-0.1228614 0.07971035 0.04315104]
sepal width (cm) [ 0. -0.01352012 0.01352012]
petal length (cm) [-0.11716058 0.24709886 -0.12993828]
petal width (cm) [-0.11997802 0.32471091 -0.20473289]

The formula

prediction = bias + feature_1_contribution + ... + feature_n_contribution

from TreeInterpreter applies per class in classification problems; so, for a k-class classification problem, the respective arrays will be of length k (in your example k=2, while for the iris dataset above k=3).
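As a quick sanity check, you can verify this identity per class by plugging the rounded numbers printed in your question back in (plain Python; the contributions are rounded to 3 decimals, so the per-class sums are approximate):

```python
bias = [0.49854, 0.50146]   # one entry per class
contribs = [                # rows: features 1..6; columns: class 0, class 1
    [ 0.160, -0.160],
    [-0.024,  0.024],
    [-0.154,  0.154],
    [ 0.172, -0.172],
    [ 0.029, -0.029],
    [ 0.019, -0.019],
]

# prediction[k] = bias[k] + sum of all feature contributions for class k
pred = [bias[k] + sum(row[k] for row in contribs) for k in range(2)]
print(pred)       # ~[0.70054, 0.29946]: class 0 wins, matching predict() == [0]
print(sum(pred))  # ~1.0: each feature's two contributions cancel out
```

Note that each feature's pair of contributions sums to zero, so the two reconstructed class probabilities always sum to 1, consistent with the bias pair summing to 1.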
