Support slicing tree model #6302
Conversation
Currently I treat the out-of-bound error specially and raise an exception.
Codecov Report
@@            Coverage Diff             @@
##           master    #6302      +/-   ##
==========================================
+ Coverage   80.75%   81.32%   +0.56%
==========================================
  Files          12       12
  Lines        3372     3421      +49
==========================================
+ Hits         2723     2782      +59
+ Misses        649      639      -10
Pasting the offline conversation with @hcho3 here. The trees in xgboost can be considered a 3-dimensional tensor: the first dimension is the number of boosting rounds, the second is the number of classes, and the last is the size of the forest (num_parallel_tree). This PR supports slicing only the first dimension (boosted rounds); it is possible to support slicing the other dimensions, but to us that seems like over-engineering.
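For illustration, here is a minimal Python sketch of that first-dimension slice, assuming the interface this PR adds exposes Python slice syntax on the Booster; the dataset and parameters are made up for the example.

import numpy as np
import xgboost as xgb

# Toy 3-class dataset.
X = np.random.randn(256, 10)
y = np.random.randint(0, 3, size=256)
dtrain = xgb.DMatrix(X, label=y)

# Each boosting round ("layer") fits num_class * num_parallel_tree trees.
booster = xgb.train(
    {"objective": "multi:softprob", "num_class": 3, "num_parallel_tree": 4},
    dtrain,
    num_boost_round=16,
)

# Slicing addresses only the first dimension (boosting rounds); every
# sliced layer keeps all of its classes and parallel trees.
middle = booster[3:7]        # rounds 3, 4, 5, 6
every_other = booster[::2]   # rounds 0, 2, 4, ...
preds = middle.predict(dtrain)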
This is a great new feature. Any plans to deprecate ntree_limit throughout the code base in favour of the new terminology?
There are other language bindings out there. I need to go over them to deprecate the parameter.
doc/python/model.rst
Outdated
dtrain = xgb.DMatrix(data=X, label=y)
num_parallel_tree = 4
num_boost_round = 16
total_trees = num_parallel_tree * num_classes * num_boost_round
This variable is not used anywhere in the code snippet.
total_trees = num_parallel_tree * num_classes * num_boost_round
Converted into a comment.
include/xgboost/c_api.h
Outdated
 * \brief Slice a model according to layers.
 *
 * \param handle Booster to be sliced.
 * \param begin_layer start of the slice
 * \param end_layer end of the slice
 * \param step step size of the slice
 * \param out Sliced booster.
Suggested change:

 * \brief Slice a model using boosting index. The slice m:n indicates taking all trees
 *        that were fit during the boosting rounds m, (m+1), (m+2), ..., (n-1).
 *
 * \param handle Booster to be sliced.
 * \param begin_layer start of the slice
 * \param end_layer end of the slice; end_layer=0 is equivalent to
 *                  end_layer=num_boost_round
 * \param step step size of the slice
 * \param out Sliced booster.
Added comments.
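To make the suggested wording concrete, here is a hedged Python-side sketch of the same semantics. It assumes a slice of k layers carries k * num_class * num_parallel_tree trees and that an open-ended slice corresponds to end_layer=0, i.e. num_boost_round; the counts via get_dump() are an illustration, not the test shipped with this PR.

import numpy as np
import xgboost as xgb

X = np.random.randn(256, 10)
y = np.random.randint(0, 3, size=256)
dtrain = xgb.DMatrix(X, label=y)

num_class, num_parallel_tree, num_boost_round = 3, 4, 16
booster = xgb.train(
    {"objective": "multi:softprob", "num_class": num_class,
     "num_parallel_tree": num_parallel_tree},
    dtrain,
    num_boost_round=num_boost_round,
)

# The slice m:n keeps the trees fit during rounds m, m+1, ..., n-1, so a
# slice of 5 layers should expose 5 * num_class * num_parallel_tree trees.
sliced = booster[5:10]
assert len(sliced.get_dump()) == 5 * num_class * num_parallel_tree

# An open end plays the role of end_layer=0 in the C API: slice up to
# num_boost_round.
tail = booster[5:]
assert len(tail.get_dump()) == (num_boost_round - 5) * num_class * num_parallel_tree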
include/xgboost/gbm.h
Outdated
/*!
 * \brief Slice the model.
 * \param layer_begin Beginning of boosted tree layer used for prediction.
 * \param layer_end End of booster layer. 0 means do not limit trees.
 * \param out Output gradient booster
 */
Suggested change:

/*!
 * \brief Slice a model using boosting index. The slice m:n indicates taking all trees
 *        that were fit during the boosting rounds m, (m+1), (m+2), ..., (n-1).
 * \param layer_begin Beginning of boosted tree layer used for prediction.
 * \param layer_end End of booster layer. 0 means do not limit trees.
 * \param out Output gradient booster
 */
Added comments.
tests/python/test_basic_models.py
Outdated
def test_slice(self):
    self.run_slice('gbtree')
    self.run_slice('dart')
Can we use @pytest.mark.parametrize instead?

@pytest.mark.parametrize(booster, ['gbtree', 'dart'])
def test_slice(self, booster):
    # Body of test

See the examples at Parametrizing tests in the pytest documentation.
It doesn't seem to be compatible with class methods:
TypeError: test_slice() missing 1 required positional argument: 'booster'
Try @pytest.mark.parametrize('booster', ['gbtree', 'dart']) (note the quotes around booster).
@trivialfis Also, TestModels should not be a subclass of unittest.TestCase (see https://stackoverflow.com/a/35562401). Try making the class a subclass of object:

class TestModels(object):
Thanks for the suggestion, done.
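For reference, a sketch of what the test could look like with both suggestions applied (a plain class plus pytest.mark.parametrize). run_slice below only stands in for the existing helper and is assumed, not copied from the PR.

import pytest


class TestModels:  # a plain class, not a unittest.TestCase subclass
    def run_slice(self, booster):
        # Placeholder for the existing slicing checks (assumed helper).
        assert booster in ('gbtree', 'dart')

    @pytest.mark.parametrize('booster', ['gbtree', 'dart'])
    def test_slice(self, booster):
        self.run_slice(booster)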
layer_begin, layer_end, step, this->model_, tparam_, layer_trees,
[&](auto const &in_it, auto const &out_it) {
  auto new_tree =
      std::make_unique<RegTree>(*this->model_.trees.at(in_it));
Do we have assurance that the implicitly generated copy constructor RegTree(const RegTree&) behaves correctly?
Added tests with prediction.
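A minimal sketch of such a prediction-based check, assuming a slice that keeps every layer copies every tree and therefore reproduces the original booster's output; the actual test added in the PR may differ.

import numpy as np
import xgboost as xgb

X = np.random.randn(256, 10)
y = np.random.randint(0, 2, size=256)
dtrain = xgb.DMatrix(X, label=y)

num_boost_round = 8
booster = xgb.train({"objective": "binary:logistic"}, dtrain,
                    num_boost_round=num_boost_round)

# If the copied RegTree objects are intact, a full slice should predict
# exactly the same values as the original model.
full_copy = booster[0:num_boost_round]
np.testing.assert_allclose(full_copy.predict(dtrain), booster.predict(dtrain))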
The parametrization looks quite nice and has clear benefits.
I'd like to submit a follow-up PR to introduce more test parametrization where it's appropriate. For example, the following snippet can be made more compact using test parametrization: xgboost/tests/python/test_with_dask.py, lines 431 to 444 (at 29745c6).
This PR is meant to end the confusion around best_ntree_limit and unify model slicing. With multi-class models and random forests, asking users to understand how to set ntree_limit is difficult and error prone. Close #5531, close #4052.
Related: the save_best option in early stopping.
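A hedged sketch of how slicing could support a save_best-style workflow, assuming early stopping sets best_iteration on the booster and that keeping rounds 0 through best_iteration is the intended behaviour; the dataset and parameters are illustrative.

import numpy as np
import xgboost as xgb

X = np.random.randn(512, 10)
y = np.random.randint(0, 2, size=512)
dtrain = xgb.DMatrix(X[:400], label=y[:400])
dvalid = xgb.DMatrix(X[400:], label=y[400:])

booster = xgb.train(
    {"objective": "binary:logistic"},
    dtrain,
    num_boost_round=200,
    evals=[(dvalid, "validation")],
    early_stopping_rounds=5,
)

# Instead of carrying best_ntree_limit around at prediction time, slice
# the model once so only the rounds up to the best iteration are kept.
best = booster[: booster.best_iteration + 1]
preds = best.predict(dvalid)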