Categorical data support. #6503
Comments
As of posting, I have prototypes of a new GPU evaluation function and of refactored CPU hist/approx/local tree methods. Dart support is on the way.
Great idea! Will this be compatible with
Yes, since we only care whether two features appear together or not.
No, since categorical values cannot be sorted in increasing order.
Probably, given that LightGBM also supports computing SHAP with categorical features. But we need to test it.
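To make the reasoning in the first two answers concrete, here is a small, library-independent sketch. It reads those answers as referring to feature interaction constraints and monotonic constraints, which is an assumption since the answers do not name them. The helper names (`split_allowed`, `is_monotone_increasing`, `encode`) and the toy numbers are hypothetical and are not XGBoost's implementation.

```python
from itertools import permutations


def split_allowed(feature, features_on_path, allowed_groups):
    """Simplified interaction check: a split is allowed when one group
    contains the new feature together with every feature already used
    on the path. Note that the feature *type* never enters the check."""
    needed = set(features_on_path) | {feature}
    return any(needed <= set(group) for group in allowed_groups)


def is_monotone_increasing(mean_by_code):
    """True if the per-code mean target never decreases as the code grows."""
    codes = sorted(mean_by_code)
    return all(mean_by_code[a] <= mean_by_code[b]
               for a, b in zip(codes, codes[1:]))


# Hypothetical per-category mean targets.
means = {"red": 1.0, "green": 3.0, "blue": 2.0}


def encode(order):
    """Assign integer codes 0..n-1 following `order`, keyed by code."""
    return {code: means[name] for code, name in enumerate(order)}


# The same data is "monotone" or not depending only on the arbitrary
# choice of integer codes, so the constraint has no stable meaning.
results = {order: is_monotone_increasing(encode(order))
           for order in permutations(means)}

if __name__ == "__main__":
    print(split_allowed(1, [0], [[0, 1], [2, 3]]))
    print(split_allowed(2, [0], [[0, 1], [2, 3]]))
    for order, ok in results.items():
        print(order, ok)
```

The interaction check only looks at which features co-occur, so it carries over to categorical features unchanged, while the monotonicity check gives different verdicts for different, equally valid encodings of the same categories.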
Status update: Here are the few big items remaining for feature completeness:
Closing; initial support is complete. We will continue to add optimizations and new features in the future.
We started initial experimental support for categorical data in xgboost 1.3. The initial target is to make the one-hot-encoding-based tree split available for all xgboost components. Here is a list of to-do items:
One Hot
Recognize categorical data in DMatrix.
Remove min values.
Device DMatrix.
GPU Sketching for categorical data.
GPU Evaluate splits based on feature types.
Tree model (RegTree).
GPU Hist Updater.
Prediction. (Predictor for dart. #6693)
Model IO.
CPU Sketching.
CPU Hist.
Migrate the implementation of global approx to hist's codebase.
GHistIndexMatrix used by hist. (Move GHistIndex into DMatrix. #7064, Initial support for external memory in gradient index. #7183)
Global approx. (Rewrite approx #7214)
Local approx: unsupported.
Sketch maker.
Exact: unsupported.
Sklearn interfaces. (Add enable_categorical to sklearn. #7011)
Dask
Documents
Partitioning Based (LGB)
General
GPU Hist
Approx
CPU Hist
Documents
Feature items can only be marked as completed if there's a corresponding (unit) test. Please let me know if there are missing items or if you want to help accelerate the progress. ;-)
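For readers unfamiliar with the two split styles named in the list, the following is a minimal sketch of how they differ, not the code in xgboost or LightGBM. It assumes per-category gradient and hessian sums are already available, uses the usual second-order gain with an L2 term `lam` (constant factors and the complexity penalty dropped), and orders categories by gradient/hessian ratio for the partitioning variant. All function names and the toy statistics are hypothetical.

```python
def score(g, h, lam=1.0):
    """Structure score of a node with gradient sum g and hessian sum h."""
    return g * g / (h + lam)


def gain(gl, hl, g, h, lam=1.0):
    """Gain of sending (gl, hl) left and the remainder right."""
    return score(gl, hl, lam) + score(g - gl, h - hl, lam) - score(g, h, lam)


def totals(stats):
    """Sum gradients and hessians over all categories."""
    return (sum(s[0] for s in stats.values()),
            sum(s[1] for s in stats.values()))


def best_one_hot_split(stats, lam=1.0):
    """One-hot style: exactly one category goes left, all others go right.
    stats maps category -> (grad_sum, hess_sum)."""
    g, h = totals(stats)
    candidates = ((gain(gc, hc, g, h, lam), {c})
                  for c, (gc, hc) in stats.items())
    return max(candidates, key=lambda t: t[0])


def best_partition_split(stats, lam=1.0):
    """Partitioning style: sort categories by grad/hess ratio, then scan
    prefixes of that order as the left set. Assumes hess_sum > 0."""
    g, h = totals(stats)
    order = sorted(stats, key=lambda c: stats[c][0] / stats[c][1])
    best = (float("-inf"), set())
    gl = hl = 0.0
    left = []
    for c in order[:-1]:  # keep at least one category on the right
        gl += stats[c][0]
        hl += stats[c][1]
        left.append(c)
        cand = gain(gl, hl, g, h, lam)
        if cand > best[0]:
            best = (cand, set(left))
    return best


# Toy statistics: two categories pull the prediction down, two pull it up.
stats = {"a": (-4.0, 2.0), "b": (-3.0, 2.0), "c": (3.0, 2.0), "d": (4.0, 2.0)}
oh_gain, oh_left = best_one_hot_split(stats)
part_gain, part_left = best_partition_split(stats)

if __name__ == "__main__":
    print("one-hot  :", oh_left, round(oh_gain, 3))
    print("partition:", part_left, round(part_gain, 3))
```

On this toy input the one-hot split can only isolate a single category, while the partitioning split groups the two negative-gradient categories together, which is the kind of case that motivates the second half of the list.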