Categorical split support for SHAP #7053

Merged: 1 commit merged into dmlc:master on Jun 25, 2021.


trivialfis (Member) opened this pull request.

trivialfis mentioned this pull request on Jun 23, 2021.
Three review threads on src/predictor/gpu_predictor.cu (outdated, resolved).
trivialfis (Member, Author) commented:

Please do not merge this until I can switch the submodule back to rapidsai.

Review thread on src/predictor/gpu_predictor.cu (outdated, resolved).
* Add CPU implementation.
* Update GPUTreeSHAP.
* Add GPU implementation by defining a custom split condition.
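The last commit mentions a custom split condition. The sketch below is a hedged illustration of one plausible way to model such a condition for a categorical feature; the class `CategoricalCondition` and its `evaluate`/`merge` methods are hypothetical names, not GPUTreeShap's or XGBoost's actual types. The condition stores the set of categories that follow a path through a split. When the same feature is split on twice along one root-to-leaf path, the two conditions combine by set intersection, the categorical analogue of intersecting the ranges of two numerical splits on one feature.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class CategoricalCondition:
    """Hypothetical split condition: the set of categories that follow this path."""
    allowed: FrozenSet[str]

    def evaluate(self, category: str) -> bool:
        # True if an instance with this category goes down the path at this split.
        return category in self.allowed

    def merge(self, other: "CategoricalCondition") -> "CategoricalCondition":
        # If the same feature is split on twice along one path, an instance must
        # satisfy both conditions, so the allowed categories are intersected.
        return CategoricalCondition(self.allowed & other.allowed)


# A root split sends {orange, melon} down the path; a deeper split keeps only {orange}.
root = CategoricalCondition(frozenset({"orange", "melon"}))
deeper = CategoricalCondition(frozenset({"orange"}))
combined = root.merge(deeper)
print(sorted(combined.allowed))                                  # ['orange']
print(combined.evaluate("orange"), combined.evaluate("melon"))   # True False
```

Merging keeps one condition per feature on a path, which matches the idea that a repeated feature is treated as a single player rather than as independent splits.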
trivialfis changed the title from "[WIP] Categorical split support for SHAP" to "Categorical split support for SHAP" on Jun 25, 2021.
trivialfis merged commit 8fa32fd into dmlc:master on Jun 25, 2021.
trivialfis deleted the cat-shap branch on June 25, 2021, 11:02.
talieh-tabatabaei commented on Nov 24, 2021:

How are the SHAP values for each one-hot-encoded category aggregated to produce the final value for the categorical variable?

oOTWK (Contributor) commented on Feb 11, 2022:

@talieh-tabatabaei With `enable_categorical=True`, XGBoost does not compute SHAP values for individual one-hot-encoded columns, so there is nothing to aggregate. For this to apply to the SHAP computation, the tree model must also be trained with the option enabled. The model then contains no one-hot-encoded variables, only the categorical variable itself, so the split variable at each tree node is that categorical variable.

For example, say we have a Fruit feature whose categories are Apple, Orange, and Melon. A tree node could split on Fruit == Orange. In a one-hot-encoded model, the equivalent split is Is_Orange == 1, but there Is_Orange, Is_Apple, and Is_Melon are distinct variables, so in TreeSHAP they never trigger Unwind for one another. If we instead use the categorical variable Fruit directly, the splits Fruit == Orange, Fruit == Apple, and Fruit == Melon all use the same variable, Fruit, so Unwind is invoked whenever Fruit appears more than once on the current path. As a result, a single SHAP value is computed for the Fruit variable.
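To make the contrast concrete, here is a brute-force sketch. It is not TreeSHAP and not XGBoost code: it enumerates every feature coalition and uses, as the value function, the path-dependent conditional expectation from the TreeSHAP paper (follow the instance on known features, otherwise average the children weighted by cover). TreeSHAP computes exact Shapley values for this same value function in polynomial time, using Extend/Unwind to handle repeated features. The trees, covers, leaf values, and feature names (`fruit`, `size`, `is_apple`, ...) are invented for illustration.

```python
from itertools import combinations
from math import factorial


def leaf(value, cover):
    return {"leaf": value, "cover": cover}


def goes_yes(node, x):
    """Apply one split to instance x."""
    value = x[node["feature"]]
    if "cats" in node:                    # categorical split: set membership
        return value in node["cats"]
    return value < node["threshold"]      # numerical split


def expected_value(node, x, known):
    """Path-dependent E[f(x) | x_known]: follow x on known features,
    otherwise average both children weighted by their cover."""
    if "leaf" in node:
        return node["leaf"]
    if node["feature"] in known:
        return expected_value(node["yes"] if goes_yes(node, x) else node["no"], x, known)
    yes, no = node["yes"], node["no"]
    return (yes["cover"] * expected_value(yes, x, known)
            + no["cover"] * expected_value(no, x, known)) / node["cover"]


def shapley_values(tree, x):
    """Exact Shapley values by enumerating every coalition (exponential; toy use only)."""
    features = list(x)
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (expected_value(tree, x, s | {f})
                                   - expected_value(tree, x, s))
        phi[f] = total
    return phi


# Native categorical tree: "fruit" is split on twice along the orange path.
cat_tree = {
    "feature": "fruit", "cats": {"orange", "melon"}, "cover": 100,
    "no": leaf(1.0, 40),
    "yes": {
        "feature": "size", "threshold": 5.0, "cover": 60,
        "no": leaf(4.0, 30),
        "yes": {
            "feature": "fruit", "cats": {"orange"}, "cover": 30,
            "yes": leaf(3.0, 20),
            "no": leaf(2.0, 10),
        },
    },
}

# The same tree over one-hot columns: each split uses a different indicator.
onehot_tree = {
    "feature": "is_apple", "threshold": 0.5, "cover": 100,   # yes: not an apple
    "no": leaf(1.0, 40),
    "yes": {
        "feature": "size", "threshold": 5.0, "cover": 60,
        "no": leaf(4.0, 30),
        "yes": {
            "feature": "is_orange", "threshold": 0.5, "cover": 30,  # yes: not an orange
            "yes": leaf(2.0, 10),
            "no": leaf(3.0, 20),
        },
    },
}

phi_cat = shapley_values(cat_tree, {"fruit": "orange", "size": 3.0})
phi_onehot = shapley_values(
    onehot_tree, {"is_apple": 0, "is_orange": 1, "is_melon": 0, "size": 3.0})
print({k: round(v, 4) for k, v in phi_cat.items()})
print({k: round(v, 4) for k, v in phi_onehot.items()})
```

In this toy example both models predict 3.0 against a baseline of 2.4, and in each case the attributions sum to 0.6. The categorical model gives Fruit a single value of about 1.05, even though Fruit appears twice on the path. The one-hot model splits the credit between `is_apple` (about 0.856) and `is_orange` (about 0.206), and `is_melon` gets 0. Their sum (about 1.061) does not exactly reproduce the Fruit value, so adding one-hot attributions after the fact is not the same as treating Fruit as one feature.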
