
I'm learning to use xgboost, and I have read through the documentation. However, I don't understand why my script's output falls between 0 and 2. At first I thought it should be either 0 or 1, since this is binary classification; then I read that it comes out as the probability of class 0 or 1. But some outputs are above 1.5 (at least in the CSV), which doesn't make sense to me!

I'm unsure whether the problem is in the xgboost parameters or in the CSV creation. In this line, np.expm1(preds), I'm not sure np.expm1 is right, but I don't know what I could change it to.
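For reference, here is what np.expm1 (which computes exp(x) - 1) does to values in the [0, 1] range, which matches the 0~2 range I'm seeing:

```python
import numpy as np

# expm1(x) = exp(x) - 1, so a probability in [0, 1] maps to [0, e - 1] ≈ [0, 1.718]
probs = np.array([0.0, 0.5, 1.0])
print(np.expm1(probs))  # values range from 0.0 up to about 1.718
```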

In conclusion, my question is:

Why is the output not 0 or 1, but instead values like 0.0xxx and 1.xxx?

Here is my script:

import numpy as np
import xgboost as xgb
import pandas as pd

train = pd.read_csv('../dataset/train.csv')
train = train.drop('ID', axis=1)

y = train['TARGET']

train = train.drop('TARGET', axis=1)
x = train

dtrain = xgb.DMatrix(x.values, label=y.values)

test = pd.read_csv('../dataset/test.csv')

test = test.drop('ID', axis=1)
dtest = xgb.DMatrix(test.values)


# XGBoost params:
def get_params():
    #
    params = {}
    params["objective"] = "binary:logistic"
    params["booster"] = "gbtree"
    params["eval_metric"] = "auc"
    params["eta"] = 0.3  #
    params["subsample"] = 0.50
    params["colsample_bytree"] = 1.0
    params["max_depth"] = 20
    params["nthread"] = 4
    plst = list(params.items())
    #
    return plst


bst = xgb.train(get_params(), dtrain, 1000)

preds = bst.predict(dtest)

print(np.max(preds))
print(np.min(preds))
print(np.average(preds))

# Make Submission
test_aux = pd.read_csv('../dataset/test.csv')
result = pd.DataFrame({"Id": test_aux["ID"], 'TARGET': np.expm1(preds)})  # <-- the np.expm1 I'm unsure about

result.to_csv("xgboost_submission.csv", index=False)

2 Answers


You just need to do this:

from xgboost import XGBClassifier

Call predict and the output will be 0 or 1; if you call predict_proba, the output will be the probabilities of the classes.

Sorry for my English.


1 Comment

Thank you! This didn't exactly answer the original question, but it solved a related problem of mine.

When you run an xgb model with the objective binary:logistic, you get probabilities for each sample. Each probability is the chance that the sample belongs to class i.
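For binary:logistic specifically, the Booster's predict returns a single probability per sample, so a common way to get hard 0/1 labels is to threshold at 0.5 (a sketch with made-up numbers):

```python
import numpy as np

# Hypothetical binary:logistic output: one P(class 1) per sample
preds = np.array([0.03, 0.91, 0.47])
labels = (preds > 0.5).astype(int)  # threshold at 0.5 for hard 0/1 labels
print(labels)  # [0 1 0]
```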

Let's say you have 3 classes [A, B, C]. An output like [0.2, 0.6, 0.2] for sample y indicates that this sample most probably belongs to class B.

If you want just the most probable class, take the index of the maximum element of the probability array, for example with numpy's argmax function.
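For instance, with a hypothetical multi-class output of one row of class probabilities per sample:

```python
import numpy as np

# Made-up example: two samples, three class probabilities each
preds = np.array([[0.2, 0.6, 0.2],
                  [0.7, 0.1, 0.2]])
labels = np.argmax(preds, axis=1)  # index of the most probable class per row
print(labels)  # [1 0]
```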

You can find more info in the xgb package's parameter documentation.

7 Comments

Like this? result = pd.DataFrame({"Id": test_aux["ID"], 'TARGET': np.argmax(preds)})
Notice np.argmax can take an axis argument. If you want the label prediction per sample, try np.argmax(preds, axis=1).
It didn't work: axis=1 fails because there is just one axis, and with axis=0 it just fills everything with 66390.
What is the shape of your preds variable?
Can you give us a sample of your preds array?
