-
Notifications
You must be signed in to change notification settings - Fork 532
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Avoid unnecessary split for degenerate case where all labels are identical #3243
[REVIEW] Avoid unnecessary split for degenerate case where all labels are identical #3243
Conversation
Please update the changelog in order to start CI tests. View the gpuCI docs here. |
Example from #3188 (comment)Before (max_depth=3): [
{
"nodeid": 0,
"split_feature": 1,
"split_threshold": 2.28741693,
"gain": 0.540000081,
"yes": 1,
"no": 2,
"children": [
{
"nodeid": 1,
"split_feature": 0,
"split_threshold": -2.97261524,
"gain": 0.25,
"yes": 3,
"no": 4,
"children": [
{
"nodeid": 3,
"split_feature": 1,
"split_threshold": -8.30485821,
"gain": 2.38418579e-07,
"yes": 5,
"no": 6,
"children": [
{
"nodeid": 5,
"leaf_value": 2
},
{
"nodeid": 6,
"leaf_value": 2
}
]
},
{
"nodeid": 4,
"split_feature": 0,
"split_threshold": 2.9149611,
"gain": 5.96046448e-08,
"yes": 7,
"no": 8,
"children": [
{
"nodeid": 7,
"leaf_value": 1
},
{
"nodeid": 8,
"leaf_value": 1
}
]
}
]
},
{
"nodeid": 2,
"leaf_value": 0
}
]
}
] After this PR (max_depth=3): [
{
"nodeid": 0,
"split_feature": 1,
"split_threshold": 2.28741693,
"gain": 0.539999962,
"yes": 1,
"no": 2,
"children": [
{
"nodeid": 1,
"split_feature": 0,
"split_threshold": -2.97261524,
"gain": 0.25,
"yes": 3,
"no": 4,
"children": [
{
"nodeid": 3,
"leaf_value": 2
},
{
"nodeid": 4,
"leaf_value": 1
}
]
},
{
"nodeid": 2,
"leaf_value": 0
}
]
}
] Example from #3188 (comment)Before: [
{
"nodeid": 0,
"split_feature": 1,
"split_threshold": 0,
"gain": 2.98023224e-08,
"yes": 1,
"no": 2,
"children": [
{
"nodeid": 1,
"leaf_value": 1
},
{
"nodeid": 2,
"leaf_value": 1
}
]
}
] After: [
{
"nodeid": 0,
"leaf_value": 1
}
] Example from #3128
Before:
After:
|
Codecov Report
@@ Coverage Diff @@
## branch-0.17 #3243 +/- ##
===============================================
+ Coverage 71.48% 71.49% +0.01%
===============================================
Files 206 206
Lines 16648 16648
===============================================
+ Hits 11900 11902 +2
+ Misses 4748 4746 -2
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks isolated enough to make the release. I found it hard to follow what spred vs. spredP vs. spred2 vs pred are all supposed to store and think that a comment describing the variables (and more descriptive names) would be very helpful - however, I think the renaming should wait until 0.18.
gs.sync(); | ||
// now, compute the mean value to be used for metric update | ||
for (IdxT i = threadIdx.x; i < nbins; i += blockDim.x) { | ||
scount[i] = count[gcOffset + i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the next revision, I would love to have these variable names be a bit more explicit. They all look similar and it was a bit hard to parse the formulas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I had to read the code very carefully to deduce their meaning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with John. When this was written initially, readability was not kept as a priority item! @hcho3 can you please file an issue so that we don't forget about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @hcho3 for the fix! Change LGTM.
Closes #3231
Closes #3128
Partially addresses #3188
The degenerate case (labels all identical in a node) is now robustly handled, by computing the MSE metric separately for each of the three nodes (the parent node, the left child node, and the right child node). Doing so ensures that the gain is 0 for the degenerate case.
The degenerate case may occur in some real-world regression problems, e.g. house price data where the price label is rounded up to nearest 100k.
As a result, the MSE gain is computed very similarly as the MAE gain.
Disadvantage: now we always make two passes over data to compute the gain.
cc @teju85 @vinaydes @JohnZed