List lexicographic comparator #11129
Conversation
Many iterations already happened. I just realized late that I should commit
sliced no longer works
early return and remove unnecessary statements
I'm starting to review this PR. The statement above makes me a bit worried. There are many cases in Spark that compare non-nested structs (structs of basic types) using the flattening approach plus the basic lexicographic comparator. Switching to the new lexicographic comparator for non-nested structs could cause significant performance regressions. The Spark team may scream in panic if they see their benchmarks slow down because of it.
As per my argument above, if this PR changes the performance of the existing APIs (like binary search and sorting), then I would suggest not merging it until it has been tested with the performance benchmarks in spark-rapids and we have confirmation that it is safe to merge.
```cpp
    auto const d_comparator = comparator.less<true>(nullate::DYNAMIC{has_nulls});
    if (find_first) {
      thrust::lower_bound(rmm::exec_policy(stream),
                          haystack_it,
                          haystack_it + haystack.num_rows(),
                          needles_it,
                          needles_it + needles.num_rows(),
                          out_it,
                          d_comparator);
    } else {
      thrust::upper_bound(rmm::exec_policy(stream),
                          haystack_it,
                          haystack_it + haystack.num_rows(),
                          needles_it,
                          needles_it + needles.num_rows(),
                          out_it,
                          d_comparator);
    }
  } else {
    auto const d_comparator = comparator.less<false>(nullate::DYNAMIC{has_nulls});
    if (find_first) {
      thrust::lower_bound(rmm::exec_policy(stream),
                          haystack_it,
                          haystack_it + haystack.num_rows(),
                          needles_it,
                          needles_it + needles.num_rows(),
                          out_it,
                          d_comparator);
    } else {
      thrust::upper_bound(rmm::exec_policy(stream),
                          haystack_it,
                          haystack_it + haystack.num_rows(),
                          needles_it,
                          needles_it + needles.num_rows(),
                          out_it,
                          d_comparator);
    }
  }
```
Previously I tried to use a lambda to simplify the code:
```cpp
auto const do_search = [find_first](auto&&... args) {
  if (find_first) {
    thrust::lower_bound(std::forward<decltype(args)>(args)...);
  } else {
    thrust::upper_bound(std::forward<decltype(args)>(args)...);
  }
};
do_search(rmm::exec_policy(stream),
          count_it,
          count_it + haystack.num_rows(),
          count_it,
          count_it + needles.num_rows(),
          out_it,
          comp);
```
Unfortunately, using such a variadic template causes false compiler warnings. I want to bring it back here in case you have a new idea for improving this.
Even if not, we can still reduce the code by half by writing a lambda for the `lower_bound`/`upper_bound` block and calling it twice.
See #11667 for discussion on how we can improve this. The plan is to do that in a follow-up so that we don't hold up this feature any longer for refactoring-like tasks.
This makes me sad, but it is a good point. I agree that we probably want to get this tested by spark-rapids. Can you follow up with the appropriate person (perhaps @abellina?) to get those benchmarks run?

The old comparator will eventually be phased out entirely in favor of the new comparator. If there are cases where the new comparator's performance measures up poorly against the old one, then we'll have to consider ways to optimize it. I take your point about the new comparator perhaps slowing down the case of structs that don't contain other structs, but I'm honestly not sure how far we want to go here.

Truly providing peak performance for every possible combination would probably require us to have many different template overloads of this type, each with different operator overloads enabled or disabled from compilation (lists without structs, structs without lists, no lists or structs, different types of mixed structs/lists), and perhaps even based on whether or not nulls are present. Going down this route will, I suspect, very quickly start to incur a much higher maintenance burden than what we currently have, since these performance characteristics are also likely to change with each compiler version. In my view it is expected that enabling more features for comparing nested types could have performance costs, and to some extent that is acceptable.
@gpucibot merge
Contributes to #10186
This PR enables lexicographic comparisons between list columns. The comparison is robust to arbitrary levels of nesting, but does not yet support lists of (lists of...) structs. The comparison is based on the Dremel encoding already in use in the Parquet file format. To assist reviewers, here is a reasonably complete list of the changes in this PR:
- Implemented `lexicographic::device_row_comparator::operator()` for lists. This function is the heart of the change to enable comparisons between list columns.