Fix GPU categorical split memory allocation. #9529

Merged 1 commit on Aug 29, 2023

@trivialfis (Member) commented on Aug 28, 2023

Closes #9521.

  • Remove split_cats from split candidate.
  • Fetch bits directly from the evaluator.
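
To illustrate the two bullets above, here is a minimal host-side sketch of the general pattern: the split candidate keeps only a node index, and the category bits are looked up in the evaluator's storage at the moment they are needed. `ToyEvaluator`, `ToyCandidate`, and `IsSplitCategory` are hypothetical names made up for this example; they are not XGBoost's actual classes, and the real code operates on device memory rather than a `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the evaluator: it owns one growing buffer of
// category bits, with a fixed number of 32-bit words per tree node.
class ToyEvaluator {
 public:
  explicit ToyEvaluator(std::size_t words_per_node) : words_per_node_{words_per_node} {
    assert(words_per_node_ > 0);
  }

  // Reserve storage for one more node. This may reallocate the whole buffer,
  // so any pointer or view into the old buffer would become invalid.
  std::size_t AddNode() {
    bits_.resize(bits_.size() + words_per_node_, 0u);
    return bits_.size() / words_per_node_ - 1;
  }

  void SetCategory(std::size_t nidx, std::size_t cat) {
    bits_[WordIndex(nidx, cat)] |= (std::uint32_t{1} << (cat % 32));
  }

  bool HasCategory(std::size_t nidx, std::size_t cat) const {
    return ((bits_[WordIndex(nidx, cat)] >> (cat % 32)) & 1u) != 0;
  }

 private:
  std::size_t WordIndex(std::size_t nidx, std::size_t cat) const {
    assert(cat / 32 < words_per_node_);
    assert((nidx + 1) * words_per_node_ <= bits_.size());
    return nidx * words_per_node_ + cat / 32;
  }

  std::size_t words_per_node_;
  std::vector<std::uint32_t> bits_;
};

// Hypothetical candidate: it stores the node index only, never a view into
// the evaluator's buffer.
struct ToyCandidate {
  std::size_t nidx;
};

// Fetch the bits from the evaluator at the point of use.
bool IsSplitCategory(ToyEvaluator const& evaluator, ToyCandidate const& candidate,
                     std::size_t cat) {
  return evaluator.HasCategory(candidate.nidx, cat);
}
```

Because the candidate holds an index instead of a pointer, it stays valid however many nodes are added after it was created.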

@@ -64,20 +64,13 @@ struct DeviceSplitCandidate {
// split.
bst_cat_t thresh{-1};

common::CatBitField split_cats;
Collaborator

Trying to understand what this PR does.
Does the reference split_cats get invalidated as more categories are added? Why is it better to get the categories from the evaluator directly?

Member Author

Correct. It's invalidated when more nodes are added.

Collaborator

So same reason as why you shouldn't carry around references to vector elements? Got it.
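
The vector analogy can be made concrete with a small sketch. It assumes nothing about XGBoost: `GrowthReport` and `GrowAndReport` are hypothetical names for this example. The function records the buffer address as an integer before the vector grows, then checks it again afterwards, so it never dereferences a stale pointer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct GrowthReport {
  bool storage_moved;   // did the buffer address change after growth?
  int value_via_index;  // the element read back through its index
};

GrowthReport GrowAndReport() {
  std::vector<int> v;
  v.reserve(1);
  v.push_back(42);
  std::size_t const idx = 0;  // an index survives reallocation

  // Save the address as an integer; the old pointer is never used again.
  auto const before = reinterpret_cast<std::uintptr_t>(v.data());

  // Push elements until the capacity changes, which forces a reallocation.
  std::size_t const old_capacity = v.capacity();
  while (v.capacity() == old_capacity) {
    v.push_back(0);
  }

  auto const after = reinterpret_cast<std::uintptr_t>(v.data());
  return GrowthReport{before != after, v[idx]};
}
```

A reference or pointer taken before the loop would now point at freed storage, while `v[idx]` still reads the right element.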

Member Author

I think the bug was introduced during the optimization and refactoring in 1.7, but it should be fixed now. I will run some more tests myself. The error was only reproducible with a sanitizer.

@trivialfis merged commit 942b957 into dmlc:master on Aug 29, 2023 (25 checks passed)
@trivialfis deleted the fix-gpu-partition-cat branch on August 29, 2023 02:06
@trivialfis mentioned this pull request on Aug 29, 2023
trivialfis added a commit to trivialfis/xgboost that referenced this pull request on Aug 29, 2023
Projects
Status: 2.0 Done
Development

Successfully merging this pull request may close these issues.

cudaErrorIllegalAddress: an illegal memory access was encountered
2 participants