Split FIL infer_k into phases to speed up compilation (when a patch is applied) #4148
Conversation
3m50s to compile with the rest, 1m30s to compile alone.
The performance impact is within measurement noise: Titan V measurements on
Approved provided that the review comments are addressed.
rerun tests
Codecov Report
@@            Coverage Diff             @@
##           branch-21.10    #4148   +/-   ##
===============================================
  Coverage          ?       85.96%
===============================================
  Files             ?          232
  Lines             ?        18500
  Branches          ?            0
===============================================
  Hits              ?        15904
  Misses            ?         2596
  Partials          ?            0
Flags with carried forward coverage won't be shown.
@gpucibot merge
Split FIL infer_k into phases to speed up compilation (when a patch is applied) (rapidsai#4148)
Authors:
- https://github.com/levsnv
Approvers:
- Andy Adinets (https://github.com/canonizer)
- Dante Gama Dessavre (https://github.com/dantegd)
URL: rapidsai#4148
FIL takes several minutes to compile every time, even on release (up to 15 minutes on debug). When combined with an occasional larger recompile, it makes iterating on the code much slower. This change reduces the release compile time of `infer.cu` to 18s, with probably a similar speedup on debug builds.

Some phases depend on fewer template parameters than the whole `infer_k`. If we merely avoid inlining those pieces, a lot of the code will no longer be duplicated 240 times (3 storage_type x 2 cols_in_shmem x 4 NITEMS x 5 leaf_algo x 2 branch_can_be_categorical). Since those functions are called once per `rows_per_block` (and once per whole forest within that), the runtime overhead should be low enough; an empirical test confirms this.

We are keeping the default compilation joint (inlined) due to the theoretical uncertainty around function call overhead.