xgboost.dask.DaskXGBClassifier implementation of .predict() does not adhere to the sklearn API specification #5985

jameskrach · 2020-08-06T01:49:56Z

The .predict() method of xgboost.dask.DaskXGBClassifier currently returns probabilities. Per the scikit-learn API specification, a classifier's .predict() method is supposed to return class labels. The current behavior is also inconsistent with the .predict() method of xgboost.XGBClassifier, which properly returns class labels.

Other functionality in the Dask ecosystem (specifically in dask_ml.model_selection) depends on .predict() returning class labels.

Example of correct behavior in xgboost.XGBClassifier:

import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_informative=5, n_classes=2, random_state=1234)
clf = xgb.XGBClassifier(objective="binary:logistic")
clf.fit(X, y)
print(clf.predict(X)[:5])
# [0 0 1 1 1]

Example of incorrect behavior in xgboost.dask.DaskXGBClassifier:

import xgboost as xgb
import dask.dataframe as dd
import dask.distributed
from sklearn.datasets import make_classification


cluster = dask.distributed.LocalCluster(n_workers=2, threads_per_worker=1)
client = dask.distributed.Client(cluster)
X, y = make_classification(n_samples=1000, n_informative=5, n_classes=2, random_state=1234)
X_ = dd.from_array(X, chunksize=500)
y_ = dd.from_array(y, chunksize=500)
clf = xgb.dask.DaskXGBClassifier(objective="binary:logistic")
clf.fit(X_, y_)
print(clf.predict(X_).compute()[:5])
# [0.03111755 0.00773133 0.99876463 0.99792993 0.9944484 ]
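
Until .predict() returns labels, one possible stopgap is to convert the probabilities by hand. The sketch below is only an illustration: proba_to_labels is a hypothetical helper, not part of xgboost or dask, and the 0.5 cutoff is an assumption about how a binary:logistic output would be mapped to labels.

```python
import numpy as np


def proba_to_labels(proba, threshold=0.5):
    """Hypothetical helper: map predicted probabilities to class labels."""
    proba = np.asarray(proba)
    if proba.ndim == 1:
        # Binary case: one positive-class probability per row.
        return (proba > threshold).astype(int)
    # Multiclass case: one column per class, so pick the most likely class.
    return proba.argmax(axis=1)


# Probabilities printed by the DaskXGBClassifier example above.
proba = np.array([0.03111755, 0.00773133, 0.99876463, 0.99792993, 0.9944484])
labels = proba_to_labels(proba)
print(labels)
# [0 0 1 1 1]
```

For these five rows the result matches the labels printed by xgboost.XGBClassifier in the first example. The helper is applied here to an in-memory NumPy array, i.e. to the output of .compute().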


jameskrach mentioned this issue Aug 6, 2020

[Breaking] Fix .predict() method and add .predict_proba() in xgboost.dask.DaskXGBClassifier #5986

Merged

trivialfis closed this as completed in #5986 Aug 11, 2020
