
When trying to fit a scikit-learn DecisionTreeClassifier on my data, I am observing some weird behavior.

x[54] (a boolean feature) is used to split the 19 samples into 2 and 17 at the top-left node. Then the same feature, with the exact same condition, appears again in its True branch.

This time, both its True and False branches lead to leaf nodes.

I am using Gini as the split criterion.
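For reference, my understanding of the impurity value scikit-learn reports at a node: it is computed from the class labels alone, not from any feature (the class counts below are made up for illustration):

```python
import numpy as np

# Gini impurity of a node is 1 - sum(p_k^2) over its class proportions p_k.
counts = np.array([12, 7])        # hypothetical class counts in a 19-sample node
p = counts / counts.sum()
print(1.0 - np.sum(p ** 2))       # non-zero whenever the node holds a mix of classes
```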

My question is: since we are in the True branch, how can the same boolean feature produce any further split, or any impurity reduction, at all? After all, the new subset can only have 0s for that feature, so there should not be any possibility of a split.
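That intuition seems to hold in a tiny sanity check with made-up labels: a column that is constant within a node gives the tree nothing to split on, even though the node's own impurity is non-zero:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

X = np.zeros((6, 1))              # the "True branch" situation: the feature is 0 everywhere
y = np.array([0, 0, 0, 1, 1, 1])  # labels still mixed, so node impurity is non-zero
clf = DecisionTreeClassifier(criterion="gini").fit(X, y)

print(clf.tree_.impurity[0])      # 0.5: the root's Gini impurity
print(export_text(clf))           # yet the tree is a single leaf: no split on a constant column
```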

What am I missing?

[Decision tree plot ("D-Tree issue") showing x[54] used at the top-left node and again in its True branch]

  • Does the feature have missing values? I think those are now supported by the model, but not shown by the display. (Commented May 2 at 2:58)
  • Maybe you should show the code and data that produce this result. (Commented May 2 at 9:41)
  • Maybe you could ask on similar portals such as Data Science, Cross Validated, or Artificial Intelligence, or on the Kaggle forums; they may have more experience with ML. (Commented May 2 at 9:42)
  • About a reproducible setup: it's reasonably big corporate data that I can't expose directly. Let me try to trim it down without losing the bug; I suspect that's easier said than done (a rough synthetic sketch of the missing-values idea follows below). (Commented May 2 at 12:40)
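Following up on the missing-values hypothesis from the first comment, here is a minimal synthetic sketch (the data are invented, and it assumes scikit-learn >= 1.3, where DecisionTreeClassifier accepts NaN during fit) of how a boolean feature could legitimately show up twice along one branch once NaNs are involved:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# One "boolean" column that also contains NaNs; the NaN rows carry mixed labels,
# so no single split can isolate them.
X = np.array([0, 0, 0, 0, 1, 1, 1, 1,
              np.nan, np.nan, np.nan, np.nan]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1])

clf = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)
print(export_text(clf, feature_names=["x54"]))
# If the splitter finds it worthwhile, x54 can be chosen again below its own split purely
# to route the NaN rows away from the finite ones, even though every finite value in that
# node is identical. As the first comment notes, the display does not show which side the
# missing values are sent to, which makes such a split look impossible.
```

A quick np.isnan(X[:, 54]).any() check on the real data would confirm or rule this out.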
