[New Feature] Fast Histogram Optimized Grower, 8x to 10x Speedup #1950
Comments
Do you have time to also do a performance test on LightGBM to compare with the new feature?
@wxchan I will get to it soon.
@hcho3 ok, thx, look forward to it.
@hetong007 @slundberg @terrytangyuan @CodingCat @khotilov @phunterlau we could use more test cases, examples, and tutorials on language bindings.
This is exciting! However, I will be traveling in January, so I may not start working on this until Feb. It would be great if you guys @khotilov and @terrytangyuan have time in the meantime.
Awesome! Will start working on it over the weekend.
Very cool, @hcho3! I won't be able to get to it soon but will post here when I start the work.
Brilliant explanation! Thanks @hcho3!
@wxchan We've added performance numbers for LightGBM. Parallel performance for our new updater is not so good, so we are working to improve that.
@wxchan Good point. The data in the table represents the first dozen iterations or so. Let us post a graph of cumulative runtime vs. number of iterations.
A reminder: PR #1940 is now merged.
@wxchan We've added performance results over 300 iterations.
I ran a quick speed comparison of the old colmaker updater against the new fast_histmaker one using one of my datasets. It was dense data with some missingness, about 100K x 1000 in size, with a rare and noisy signal. I ran 1000 simple decision stumps with 0.5 subsampling on a 16-core server. In addition to the elapsed-time speedup factors, the table also showed ratios of user times relative to the single-thread user time, as well as ratios of the single-thread elapsed time to elapsed times. The significantly worse scaling of hist with nthread indicates that its parallel implementation might indeed have some room for improvement.
@khotilov Thanks a lot for the detailed report. We are currently working hard to improve the parallel implementation. Stay tuned for more updates.
Does fast histogram really work with Rabit? I didn't find any Rabit operations in the current implementation.
@CodingCat I double-checked; we will need to add an allreduce call at the end of the BuildHist function. @hcho3 https://github.com/dmlc/xgboost/blob/master/src/common/hist_util.cc
Has the optimization of the parallel version been completed?
@holyglenn I've submitted a patch to improve multithreaded scaling.
This issue is to document a series of recent improvements in the tree grower. Related: #1673.
Faster training via feature binning
Histogram aggregation is a major computational bottleneck in tree growing. We introduce a new tree growing method `hist` where only a subset of possible split values are considered. As in FastBDT and LightGBM, continuous features are bucketed into discrete bins. Histogram accumulation becomes faster due to fewer index manipulations.

How does the new method differ from `tree_method=approx`? The existing approximate splitting method in xgboost also buckets continuous features into discrete bins to speed up training. The `approx` method generates a new set of bins for each iteration, whereas the `hist` method re-uses the bins over multiple iterations. The latter approach enables additional optimizations that are not possible for the former, as follows:
Besides the above improvements, some highlights:
How to use
Simply set `tree_method` to `hist`. You may also want to set `max_bin`, which represents the (maximum) number of discrete bins used to bucket continuous features. By default, `max_bin` is set to 256. Increasing this number improves the optimality of splits at the cost of higher computation time.
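As a quick illustration, here is a minimal sketch using the xgboost Python package; the synthetic dataset and all parameter values other than `tree_method` and `max_bin` are placeholders, not recommendations.

```python
import numpy as np
import xgboost as xgb

# Synthetic data, for illustration only
rng = np.random.RandomState(42)
X = rng.rand(10000, 20)
y = (X[:, 0] + 0.1 * rng.randn(10000) > 0.5).astype(int)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "binary:logistic",
    "tree_method": "hist",  # enable the fast histogram-optimized grower
    "max_bin": 256,         # default; raise for finer-grained splits at higher cost
    "eta": 0.1,
}
bst = xgb.train(params, dtrain, num_boost_round=100)
```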
Flexible tree growing policies
The existing tree grower in xgboost grows a tree in a depth-wise fashion, executing all splits at the first level before splits at the second, and so forth. The new grower lets you control the way new nodes are added to the tree:

- `grow_policy=depthwise` (default): split at nodes closest to the root, i.e. grow depth-wise.
- `grow_policy=lossguide`: split at nodes with the highest loss change. This behavior mimics that of LightGBM.

It has been reported that the `lossguide` policy often results in faster convergence in loss, though there is also a risk of over-fitting (see the preliminary results).

How to use
For now, only the `hist` grower supports multiple growing policies; we may extend the support to other growers in the future. So you should set `tree_method` to `hist`. The `grow_policy` parameter lets you select the growing policy. If unspecified, `grow_policy` will be set to `depthwise` by default. Here are two parameters that are relevant (see the sketch after this list):

- `max_leaves`: maximum number of nodes to be added. Only relevant for the `lossguide` policy.
- `max_depth`: maximum depth. Behaves as usual when `grow_policy` is set to `depthwise`. If `grow_policy` is `lossguide`, `max_depth` can be set to zero, indicating no limit on depth.
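Here is a minimal sketch of the `lossguide` policy, again with a placeholder synthetic dataset; the specific values of `max_leaves` and `eta` are illustrative only.

```python
import numpy as np
import xgboost as xgb

# Synthetic data, for illustration only
rng = np.random.RandomState(42)
X = rng.rand(10000, 20)
y = (X[:, 0] + 0.1 * rng.randn(10000) > 0.5).astype(int)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "binary:logistic",
    "tree_method": "hist",
    "grow_policy": "lossguide",  # split at nodes with the highest loss change
    "max_leaves": 255,           # cap tree size by leaf count instead of depth
    "max_depth": 0,              # zero means no depth limit under lossguide
    "eta": 0.1,
}
bst = xgb.train(params, dtrain, num_boost_round=100)
```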
Benchmark
Configurations

- xgboost, exact: tree_method=exact, eta=0.1, gamma=1.0, max_depth=8, min_child_weight=100
- xgboost, approx: tree_method=approx, eta=0.1, gamma=1.0, max_depth=8, min_child_weight=100
- xgboost, hist + depthwise: tree_method=hist, grow_policy=depthwise, eta=0.1, gamma=1.0, max_depth=8, min_child_weight=100
- xgboost, hist + lossguide: tree_method=hist, grow_policy=lossguide, eta=0.1, gamma=1.0, max_depth=0, max_leaves=255, min_child_weight=100
- LightGBM: learning_rate=0.1, max_bin=255, num_leaves=255, min_sum_hessian_in_leaf=100, min_data_in_leaf=0
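For reference, here is one way the four xgboost configurations above might be expressed as parameter dicts for the Python API; this is a convenience sketch, not part of the original benchmark scripts.

```python
# The four xgboost benchmark configurations, as Python parameter dicts.
base = {"eta": 0.1, "gamma": 1.0, "min_child_weight": 100}

configs = {
    "exact":            dict(base, tree_method="exact", max_depth=8),
    "approx":           dict(base, tree_method="approx", max_depth=8),
    "hist + depthwise": dict(base, tree_method="hist", grow_policy="depthwise",
                             max_depth=8),
    "hist + lossguide": dict(base, tree_method="hist", grow_policy="lossguide",
                             max_depth=0, max_leaves=255),
}
```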
Model accuracy
For `allstate` and `higgs`, we plot AUC (Area Under Curve) on the validation set. For `yahoo`, we plot NDCG (Normalized Discounted Cumulative Gain) at 10.

Notice that favoring splits with the highest loss change leads to overfitting in the case of `allstate`, which is more like a real-world insurance dataset, while the validation curve is more consistent on `higgs`, which consists of simulated data points. We tried different options such as `min_child_weight`, but they did not move the curve much. This seems to suggest that `depthwise` is more conservative and more robust to overfitting, and should be used when the training and validation distributions might not be identical, or when overfitting is a problem in the dataset.
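To produce curves like these, one could track per-iteration validation metrics as in the following sketch; the data split and metric choice are placeholders, and the benchmark's actual evaluation scripts are not shown here.

```python
import numpy as np
import xgboost as xgb

# Synthetic stand-in for a real train/validation split
rng = np.random.RandomState(42)
X = rng.rand(20000, 20)
y = (X[:, 0] + 0.1 * rng.randn(20000) > 0.5).astype(int)
dtrain = xgb.DMatrix(X[:15000], label=y[:15000])
dvalid = xgb.DMatrix(X[15000:], label=y[15000:])

params = {
    "objective": "binary:logistic",
    "tree_method": "hist",
    "grow_policy": "depthwise",
    "eta": 0.1,
    "eval_metric": "auc",  # use "ndcg@10" for ranking tasks such as yahoo
}
evals_result = {}
bst = xgb.train(
    params, dtrain, num_boost_round=300,
    evals=[(dtrain, "train"), (dvalid, "valid")],
    evals_result=evals_result,  # collects per-iteration AUC for plotting
)
```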
Single-threaded performance
Keep in mind that the tree growing policy of LightGBM is equivalent to setting `grow_policy=lossguide` in xgboost, so the `LightGBM` column should be compared with the `hist + lossguide` column.

Per-iteration runtime (seconds)
The average was taken over 300 iterations.
[Table: per-iteration runtime for exact, approx, hist + depthwise, hist + lossguide, and LightGBM on the allstate, higgs, and yahoo datasets]
Cumulative runtime over 300 iterations
Multi-threaded performance (not yet fully optimized)
The parallel implementation of `tree_method=hist` needs more work for the time being.

Per-iteration runtime (seconds)
[Table: per-iteration runtime for exact, approx, hist + depthwise, hist + lossguide, and LightGBM on the allstate, higgs, and yahoo datasets]
Cumulative runtime over 300 iterations