enable_categorical=True results in error code 0xc0000409 #7851

Closed
ernesto-dreier opened this issue Apr 29, 2022 · 2 comments · Fixed by #7853

ernesto-dreier commented Apr 29, 2022

Fitting an XGBRegressor with enable_categorical=True on a Windows 10 machine crashes the Python process. See below for a minimal example:

from random import choice
from string import ascii_lowercase

import pandas as pd
from xgboost.sklearn import XGBRegressor


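n = 5        # number of rows kept in X
n_cat = 100  # number of random labels generated (upper bound on the categories in the dtype)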


X = pd.Series(
    [''.join(choice(ascii_lowercase) for _ in range(3)) for _ in range(n_cat)],
    dtype="category"
)[:n].to_frame()
y = pd.Series(range(n))

model = XGBRegressor(
    enable_categorical=True,
    tree_method="approx",
)

model.fit(X=X, y=y)

Error:
Process finished with exit code -1073740791 (0xC0000409)
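
For readers matching the two numbers in the message above: the negative exit code and the hexadecimal value are the same 32-bit pattern, read once as signed and once as unsigned. A minimal sketch of the conversion follows; the helper name `to_signed_32` is hypothetical and only for illustration. On Windows, 0xC0000409 is the NTSTATUS value STATUS_STACK_BUFFER_OVERRUN, which points to a failure in native code rather than a Python exception.

```python
def to_signed_32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    # Values with the top bit set are negative in two's complement.
    return value - (1 << 32) if value >= (1 << 31) else value


unsigned_code = 0xC0000409
signed_code = to_signed_32(unsigned_code)
print(hex(unsigned_code), signed_code)
```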

conda list:

# Name                    Version                   Build  Channel
ca-certificates           2021.10.8            h5b45459_0    conda-forge
joblib                    1.1.0                    pypi_0    pypi
numpy                     1.22.3                   pypi_0    pypi
openssl                   1.1.1n               h8ffe710_0    conda-forge
pandas                    1.4.2                    pypi_0    pypi
pip                       22.0.4             pyhd8ed1ab_0    conda-forge
python                    3.9.1           h7840368_5_cpython    conda-forge
python-dateutil           2.8.2                    pypi_0    pypi
python_abi                3.9                      2_cp39    conda-forge
pytz                      2022.1                   pypi_0    pypi
scikit-learn              1.0.2                    pypi_0    pypi
scipy                     1.8.0                    pypi_0    pypi
setuptools                62.1.0           py39hcbf5309_0    conda-forge
six                       1.16.0                   pypi_0    pypi
sklearn                   0.0                      pypi_0    pypi
sqlite                    3.38.3               h8ffe710_0    conda-forge
threadpoolctl             3.1.0                    pypi_0    pypi
tzdata                    2022a                h191b570_0    conda-forge
ucrt                      10.0.20348.0         h57928b3_0    conda-forge
vc                        14.2                 hb210afc_6    conda-forge
vs2015_runtime            14.29.30037          h902a5da_6    conda-forge
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
xgboost                   1.6.0                    pypi_0    pypi

Member

trivialfis commented Apr 29, 2022

We calculate the number of categories as the number of distinct values in the data, which can't handle the case where the number of categories is greater than the total number of entries. We should find a better way to handle this.
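
The mismatch described here can be seen from the pandas side alone. The sketch below is an illustration, not XGBoost's internal code: it uses deterministic labels instead of the random ones in the report (an assumption made so the numbers are reproducible) and shows that slicing a categorical Series keeps every declared category, so the integer codes can exceed the count of distinct values actually present. Dropping unused categories with `remove_unused_categories()` makes the two counts agree; whether that avoids the crash on xgboost 1.6.0 is not confirmed in this thread.

```python
import pandas as pd

n, n_cat = 5, 100

# Deterministic labels in reverse order, so the first n rows receive the
# highest category codes (pandas sorts inferred categories).
labels = [f"c{i:03d}" for i in reversed(range(n_cat))]
col = pd.Series(labels, dtype="category")[:n]

n_declared = len(col.cat.categories)  # categories recorded in the dtype
n_observed = col.nunique()            # distinct values present in the rows
max_code = int(col.cat.codes.max())   # largest integer code in the data

print(n_declared, n_observed, max_code)

# Hypothetical workaround: drop categories that no row uses, which
# renumbers the codes to the range [0, n_observed).
trimmed = col.cat.remove_unused_categories()
trimmed_max_code = int(trimmed.cat.codes.max())
print(len(trimmed.cat.categories), trimmed_max_code)
```

Here the largest code is far above the number of observed values, which is the situation a count based on distinct values cannot represent.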

@ernesto-dreier
Author

Thanks for the incredibly quick fix.
