-
-
Notifications
You must be signed in to change notification settings - Fork 8.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bound the size of the histogram cache. #9440
Conversation
- A new histogram collection with limit in size. - Unify histogram building logic between hist, multi-hist, and approx.
9adac94
to
59f1703
Compare
BlockedSpace2d(std::size_t dim1, std::function<std::size_t(std::size_t)> getter_size_dim2, | ||
std::size_t grain_size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use std::function<>
instead of a template arg? Doesn't it add an extra pointer indirection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wanted the compiler to do checks for me. ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And more self-explanatory API.
inline void ParallelFor2d(BlockedSpace2d const& space, std::int32_t n_threads, | ||
std::function<void(std::size_t, Range1d)> func) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using std::function
in a tight loop may have a performance implication, since func
is no longer inlined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not used in a tight loop, it's used to create parallel loops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My fault, it's used in a tight loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But some benchmarks with higgs dataset show no difference before and after the PR. I can revert the change if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just revert this line. The constructor can remain the same since it's not in a tight loop.
Second attempt for #9432 . Less intrusive and smaller overhead when the size doesn't overflow.
Close #9342
Close #9372
Aside from the raised issue, this also helps multi-target tree as memory usage can be significant when the number of targets is large.
todos:
notes
max_depth=20
, the CPU utilization is low and consistently under 60 percent. Further profiling is needed in the future.