2,814 questions
-1
votes
0
answers
29
views
Impact of nulls in xgboost [closed]
I have a data set in which state is a one-hot encoded variable. Some states are allowed to use all predictors, some states are not allowed to use certain predictors. If I null out those variables as ...
0
votes
0
answers
50
views
Why does XGBoost training (with DMatrix) write heavily to disk instead of using RAM?
I am training an XGBoost model in Python on a dataset with approximately 20k features and 30M records.
The features are sparse, and I am using xgboost.DMatrix for training.
Problem
During training, ...
2
votes
1
answer
264
views
How to make a Python package that can have two different versions of a dependency?
It is now not uncommon to have a python package that is distributed in a multitude of different "flavors". This happens often with machine learning packages, e.g. onnxruntime has many "...
0
votes
1
answer
83
views
None default values in XGBoost regressor model [closed]
I am encountering a problem with the XGBoost regressor. It produces 'NONE' default values, as shown in the figure below. What could be the reason for getting 'NONE' default values for XGBoost ...
0
votes
1
answer
62
views
Time series patient visits for XGBoost classifier
I’m developing a tree-based model classifier (XGBoost) using some healthcare (patient visits) data. The data has a time dimension, and I want to observe if there is a longitudinal effect for the ...
0
votes
1
answer
102
views
How to tune the hyperparameter early-stopping of xgboost in mlr3 with auto_tune()?
I want to train an XGBoost model and tune some hyperparameters which are used to preprocess the data. (I reduce the noise of some spectrometry data by applying the Savitzky-Golay filter.) When training the ...
2
votes
1
answer
106
views
Why don't shap's explainer.model.predict() and model.predict() match?
I have a machine learning model and I calculated SHAP values on it using the following code:
import shap
background = shap.kmeans(X_dev, k=100)
explainer = shap.TreeExplainer(model, feature_perturbation="...
0
votes
1
answer
114
views
Training a small XGBoost model on Databricks crashes with a memory issue, but a similar table works well
I have hit a bug that has blocked me for a few days.
I have a Spark DataFrame with 66 columns and 100K rows. I want to train an XGBoost model on the Databricks platform, but it always crashes.
I generated a similar ...
0
votes
1
answer
97
views
Time Series Forecasting Model with XGBoost and Dask Large Datasets Crashing
I'm building a time series forecasting model in Python to predict hourly kWh loads for different customer types at a utility company. The dataset contains ~81 million rows, with hourly load data for ~...
-1
votes
1
answer
59
views
generatePartialDependenceData function returns Error when used for multiclass classification model
I have built an XGBoost multiclass classification model using mlr and I want to visualize the partial dependence for some features. However, if I try to do so using generatePartialDependenceData() I ...
0
votes
1
answer
176
views
XGBoost does not predict properly on input that's equal to training data [closed]
Why does this quite simple XGBoost example produce all nulls even on input that is equivalent to the training data? This looks like a trivial case of input which should not require any fine-tuning of ML,...
1
vote
0
answers
48
views
XGBoost signature converts categorical variables to strings; need to keep categorical variables throughout the process
As part of model logging, I observed an issue: infer_signature is converting categorical variables into object.
I need to call log_model and register the model with the variables kept as categorical. This is causing model ...
0
votes
1
answer
82
views
Process hangs when multiprocessing with XGBoost model batch prediction
Here's a batch prediction case using multiprocessing. Steps:
After with mp.Pool(processes=num_processes) as pool, there's a with Dataset(dataset_code) as data in the main process using websocket to ...
0
votes
0
answers
84
views
XGBoost bst.predict() output not matching with manual calculation from the (text) tree model for binary:logistic case
I am trying to validate the XGBoost output (booster.predict) for logistic regression wrt my understanding of output calculation via the trees built. I see a difference of around -1.58 factor in all my ...
0
votes
0
answers
129
views
Predict day-ahead hourly electricity prices after having trained a model using historical data
I have trained an XGBoost model with historical data from 2015 to 2024. I have added some features like weather data, electricity consumption, generation from different sources like nuclear, and other ...
32
votes
4
answers
36k
views
'super' object has no attribute '__sklearn_tags__'
I am encountering an AttributeError while fitting an XGBRegressor using RandomizedSearchCV from Scikit-learn. The error message states:
'super' object has no attribute '__sklearn_tags__'.
This occurs ...
0
votes
0
answers
91
views
ImportError from Pandas while running XGBoost model on python
I am trying to run a basic XGBoost model in Python (v3.8.5), but I am getting an error that I cannot resolve. I would appreciate your help, thanks!
My code is as below:
import seaborn as sns
import pandas ...
3
votes
1
answer
604
views
XGBoost/ XGBRanker to produce probabilities instead of ranking scores
I have a dataset of the performance of students in exams which looks like:
Class_ID Class_size Student_Number IQ Hours_Studied Score
1 3 3 101 10 ...
0
votes
0
answers
134
views
XGBoost and LGBM model size depends on training data size for a given set of params, whereas CatBoost's doesn't
I am comparing models in a walk-forward cross-validation setup, under Python 3.11. For a given set of hyperparameters, the size of XGBoost and LGBM models (when pickled or saved using the library saving ...
1
vote
1
answer
127
views
Holdout validation set: hyperparameter tuning
I have a large dataset and I have split it into:
training set (80%)
validation set (10%)
test set (10%)
On each set, I performed missing values imputation and feature selection (trained on the ...
0
votes
0
answers
63
views
Cannot import name XGBRegressor from xgboost (unknown location)
XGBoost error
Unable to import XGBRegressor
I have created an env in VS Code to implement an end-to-end pipeline for a machine learning project. Most of my code has been saved in GitHub. I used a ...
0
votes
2
answers
607
views
XGBoost Early Stopping Rounds
My code below keeps failing and I can't work out what is going on:
import optuna
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import ...
0
votes
1
answer
180
views
How to make shap.plots.scatter with xgboost.DMatrix holding missing data?
I have a dataset with missing data. They are encoded as NaN. This is fine for model fitting with XGBoost. When I want to understand the model, analyzing model importance with SHAP scatter plots, I am ...
0
votes
1
answer
243
views
More efficient way to stream data to AWS Batch Transform Job
I have a SageMaker process for training and running inference on data in SageMaker:
processing job: read input CSV files from S3 and clean up the data, output CSV files to S3
processing job: read in ...
0
votes
1
answer
270
views
Error when calculating SHAP value in xgboost model - feature names are different?
I have trained an XGBoost model using caret and now, I am calculating the mean SHAP value of each predictor using the package SHAPforxgboost, using the following code:
library(SHAPforxgboost)
...