-
Notifications
You must be signed in to change notification settings - Fork 889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support nan_equality
in cudf::distinct
#11118
Support nan_equality
in cudf::distinct
#11118
Conversation
# Conflicts: # cpp/benchmarks/stream_compaction/distinct.cpp # cpp/include/cudf/stream_compaction.hpp # cpp/src/dictionary/detail/concatenate.cu # cpp/src/dictionary/set_keys.cu # cpp/src/stream_compaction/distinct.cu # cpp/src/stream_compaction/distinct_reduce.cu # cpp/src/stream_compaction/distinct_reduce.cuh # cpp/src/transform/encode.cu # cpp/tests/stream_compaction/distinct_tests.cpp
97a6cb6
to
be3b2fe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few small suggestions. Otherwise this is a straightforward extension of the comparator API to forward the NaN policy through. Nice work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. LGTM!
map.get_device_view(), key_hasher, key_equal, keep, reduction_results.begin()}); | ||
auto const row_comp = cudf::experimental::row::equality::self_comparator(preprocessed_input); | ||
|
||
auto const reduce_by_row = [&](auto const value_comp) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Along the lines of what @PointKernel was suggesting -- one alternative I considered was making this lambda actually allocate and return the output vector, rather than binding in reduction_results
as an output iterator and returning void
. That felt like an implicit "output parameter" in the lambda. If the lambda were a real function, we'd avoid the output parameter and return the rmm::device_uvector
directly from the function. I don't have strong feelings on this, so feel free to keep it as-is.
edit: Initially wrote "IIFE" where I meant lambda. Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we did that, however, almost the entire body of hash_reduce_by_row
becomes a lambda and the part at the end is just a nans_equal
dispatcher. 😛
Thanks all. I really appreciate your help with polishing everything 🍬 |
@gpucibot merge |
This PR adds the following APIs for set operations: * `lists::have_overlap` * `lists::intersect_distinct` * `lists::union_distinct` * `lists::difference_distinct` ### Name Convention Except for the first API (`lists::have_overlap`) that returns a boolean column, the suffix `_distinct` of the rest APIs denotes that their results will be lists columns in which all list rows have been post-processed to remove duplicates. As such, their results are actually "set" columns in which each row is a "set" of distinct elements. --- Depends on: * #10945 * #11017 * NVIDIA/cuCollections#175 * #11052 * #11118 * #11100 * #11149 Closes #10409. Authors: - Nghia Truong (https://github.com/ttnghia) - Yunsong Wang (https://github.com/PointKernel) Approvers: - Michael Wang (https://github.com/isVoid) - AJ Schmidt (https://github.com/ajschmidt8) - Bradley Dice (https://github.com/bdice) - Yunsong Wang (https://github.com/PointKernel) URL: #11043
This adds
nan_equality
parameter tocudf::distinct
, allowing to specify the desired behavior when dealing with floating-point data:NaN
should be compared equally to otherNaN
or not.Depends on #11052 (built on top of it).
Closes #11092.
This is a blocker for set-like operations (#11043) and also the last blocker for #11053.