Two-table comparators with strong index types #10730

Merged
merged 35 commits into from
May 18, 2022
Changes from 4 commits
Commits
50b8891
Add strong index type.
bdice Apr 16, 2022
b9ed4d7
Revert changes to non-experimental row operators.
bdice Apr 20, 2022
d67f17e
Use enum for strongly typed index.
bdice May 3, 2022
464ed2b
Add two table comparator and adapter.
bdice May 3, 2022
b26b318
Add friends. :)
bdice May 3, 2022
1fd199d
Apply two-table comparator to search algorithms.
bdice May 3, 2022
18bd9f0
Move shared lhs/rhs logic into launch_search.
bdice May 3, 2022
b5b8b39
Improve comments, remove old code.
bdice May 3, 2022
4060b4f
Merge remote-tracking branch 'upstream/branch-22.06' into strong-inde…
bdice May 11, 2022
73c4b27
Move strong typing code into cudf::experimental::row::lexicographic.
bdice May 11, 2022
9cdbe27
Merge remote-tracking branch 'upstream/branch-22.06' into strong-inde…
bdice May 13, 2022
c8a38fe
Improve comment.
bdice May 13, 2022
8b5ef34
Fix docstrings.
bdice May 13, 2022
77f85b4
Enable weak ordering machinery (weak_ordering_comparator_impl) to wra…
bdice May 13, 2022
529e944
Remove template template parameters.
bdice May 13, 2022
fb0e192
Use references.
bdice May 13, 2022
56d99ba
Use Ts const...
bdice May 13, 2022
c5998b7
Move strong typing to cudf::experimental::row.
bdice May 13, 2022
b78d978
Use constexpr.
bdice May 13, 2022
3aea8d4
Use custom iterator class.
bdice May 14, 2022
bbaf360
Use __device__ only.
bdice May 14, 2022
4a1d7aa
Add comment.
bdice May 14, 2022
09c5661
Use symmetry of comparator (now possible with weak ordering) to avoid…
bdice May 14, 2022
290323f
Add constexpr to two_table_device_row_comparator_adapter.
bdice May 14, 2022
4c69edd
Remove forward (always accepts lvalues).
bdice May 14, 2022
fbd5b90
Indicate reversed signature.
bdice May 16, 2022
3db6484
Move constructor to implementation, add shape compatibility check.
bdice May 16, 2022
3e81b53
Improve docstrings.
bdice May 16, 2022
1834095
Use thrust::iterator_facade.
bdice May 16, 2022
ff26024
Use const for struct members.
bdice May 17, 2022
f779bff
Slim down the strong index layer by using a templated struct.
bdice May 17, 2022
157abbc
Simplify construction.
bdice May 17, 2022
a2ac19d
Use size_type const where possible.
bdice May 17, 2022
75249e8
Require weakly or strongly typed values for lhs_index and rhs_index.
bdice May 17, 2022
bed1162
Unconstrain template typenames.
bdice May 18, 2022
26 changes: 26 additions & 0 deletions cpp/include/cudf/detail/utilities/strong_index.hpp
@@ -0,0 +1,26 @@
/*
* Copyright (c) 2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#pragma once

#include <cudf/types.hpp>

namespace cudf {

enum class lhs_index_type : cudf::size_type {};
enum class rhs_index_type : cudf::size_type {};

} // namespace cudf
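
The two empty scoped enums above behave as distinct integer-like types that never convert implicitly. The following is a minimal host-side sketch of that property, not code from this PR: `size_type` is a local stand-in for `cudf::size_type`, and `raw`, `side_of`, and `check_strong_index_sketch` are hypothetical names introduced only for illustration. It assumes C++17.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Stand-in for cudf::size_type so the sketch compiles without cudf.
using size_type = std::int32_t;

enum class lhs_index_type : size_type {};
enum class rhs_index_type : size_type {};

// The index types are distinct and never convert implicitly.
static_assert(!std::is_convertible_v<lhs_index_type, rhs_index_type>);
static_assert(!std::is_convertible_v<size_type, lhs_index_type>);
static_assert(!std::is_convertible_v<lhs_index_type, size_type>);
static_assert(std::is_same_v<std::underlying_type_t<rhs_index_type>, size_type>);

// Hypothetical helpers (not part of the PR): recover the raw row index.
constexpr size_type raw(lhs_index_type i) noexcept { return static_cast<size_type>(i); }
constexpr size_type raw(rhs_index_type i) noexcept { return static_cast<size_type>(i); }

// Hypothetical overload set: the table side is selected by the type alone.
constexpr char side_of(lhs_index_type) noexcept { return 'L'; }
constexpr char side_of(rhs_index_type) noexcept { return 'R'; }

// A few runtime checks; call this from main() to run the sketch.
inline void check_strong_index_sketch()
{
  lhs_index_type const l{3};                      // C++17 list-initialization
  auto const r = static_cast<rhs_index_type>(7);  // explicit cast also works
  assert(raw(l) == 3 && raw(r) == 7);
  assert(side_of(l) == 'L' && side_of(r) == 'R');
  // side_of(3);  // would not compile: int converts to neither index type
}
```

Because a plain integer cannot be passed where a strong index is expected, swapping the left and right row indices becomes a compile-time error instead of a silent bug.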
160 changes: 159 additions & 1 deletion cpp/include/cudf/table/experimental/row_operators.cuh
@@ -22,6 +22,7 @@
#include <cudf/detail/utilities/algorithm.cuh>
#include <cudf/detail/utilities/assert.cuh>
#include <cudf/detail/utilities/hash_functions.cuh>
#include <cudf/detail/utilities/strong_index.hpp>
#include <cudf/lists/list_device_view.cuh>
#include <cudf/lists/lists_column_device_view.cuh>
#include <cudf/sorting.hpp>
@@ -89,6 +90,7 @@ namespace lexicographic {
template <typename Nullate>
class device_row_comparator {
friend class self_comparator;
// friend class two_table_device_row_comparator_adapter<Nullate>;

/**
* @brief Construct a function object for performing a lexicographic
@@ -277,7 +279,7 @@ struct preprocessed_table {
* @brief Preprocess table for use with lexicographical comparison
*
* Sets up the table for use with lexicographical comparison. The resulting preprocessed table can
- * be passed to the constructor of `lex::self_comparator` to avoid preprocessing again.
+ * be passed to the constructor of `lexicographic::self_comparator` to avoid preprocessing again.
*
* @param table The table to preprocess
* @param column_order Optional, host array the same length as a row that indicates the desired
@@ -427,6 +429,162 @@ class self_comparator {
std::shared_ptr<preprocessed_table> d_t;
};

template <typename Nullate>
class two_table_device_row_comparator_adapter {
friend class two_table_comparator;

public:
/**
* @brief Checks whether the row at `lhs_index` in the `lhs` table compares
* lexicographically less than the row at `rhs_index` in the `rhs` table.
*
* @param lhs_index The index of the row in the `lhs` table to examine
* @param rhs_index The index of the row in the `rhs` table to examine
* @return `true` if the row from the `lhs` table compares less than the row in the `rhs` table
*/
__device__ bool operator()(lhs_index_type const lhs_index,
rhs_index_type const rhs_index) const noexcept
Contributor

I think we'll need overloads for (lhs_index_type, lhs_index_type) and (rhs_index_type, rhs_index_type) as well.

Contributor Author

Possible — but I haven’t found an algorithm that needs those yet. For example, merge requires sorted inputs, so the lhs/lhs and rhs/rhs overloads are never needed.

Contributor Author

Also, overloads for (lhs_index_type, lhs_index_type) and (rhs_index_type, rhs_index_type) would require this class to own two additional comparators (self comparators for left and right tables).

Contributor

@ttnghia ttnghia May 19, 2022

Hey guys! I stumbled on to this: using self_comparator with strong index types! So we are going to need (lhs_index_type, lhs_index_type) and (rhs_index_type, rhs_index_type) overloads for self_comparator.

These overloads just convert lhs_index_type or rhs_index_type into size_type and call the normal operator()(size_type).

Update: I may not need this as I have another idea to avoid using strong index type, but this is still a relevant need.
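
The same-side overloads described in the last comment of this thread could look roughly like the host-side sketch below. It is an illustration under assumptions, not code from the PR: `self_comparator_sketch` is a hypothetical stand-in for a device self comparator, a `std::vector<int>` plays the role of a single-column table, and `size_type` is a local alias standing in for `cudf::size_type`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using size_type = std::int32_t;  // stand-in for cudf::size_type
enum class lhs_index_type : size_type {};
enum class rhs_index_type : size_type {};

// Hypothetical host-side stand-in for a self comparator over one table.
struct self_comparator_sketch {
  std::vector<int> rows;  // a single int column stands in for the table

  // The "normal" weakly typed comparison: row i < row j.
  bool operator()(size_type i, size_type j) const noexcept
  {
    return rows[static_cast<std::size_t>(i)] < rows[static_cast<std::size_t>(j)];
  }

  // Same-side strong overloads: unwrap the indices and forward.
  bool operator()(lhs_index_type i, lhs_index_type j) const noexcept
  {
    return (*this)(static_cast<size_type>(i), static_cast<size_type>(j));
  }
  bool operator()(rhs_index_type i, rhs_index_type j) const noexcept
  {
    return (*this)(static_cast<size_type>(i), static_cast<size_type>(j));
  }
};
```

Mixed calls such as `comp(lhs_index_type{0}, rhs_index_type{1})` still fail to compile with this sketch, which keeps cross-table comparisons confined to the two-table comparator.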

{
return comp(static_cast<cudf::size_type>(lhs_index), static_cast<cudf::size_type>(rhs_index));
}

/**
* @brief Checks whether the row at `rhs_index` in the `rhs` table compares
* lexicographically less than the row at `lhs_index` in the `lhs` table.
*
* @param rhs_index The index of the row in the `rhs` table to examine
* @param lhs_index The index of the row in the `lhs` table to examine
* @return `true` if the row from the `rhs` table compares less than the row in the `lhs` table
*/
__device__ bool operator()(rhs_index_type const rhs_index,
lhs_index_type const lhs_index) const noexcept
{
// TODO: "not lhs < rhs" isn't quite the same as "rhs < lhs". The case of
// equality returns true for operator(rhs, lhs), while operator(lhs, rhs)
// returns false. This would have to be handled at a lower level, if it
// matters. Do we just document that this means "rhs <= lhs"?
return not comp(static_cast<cudf::size_type>(lhs_index),
static_cast<cudf::size_type>(rhs_index));
}

private:
/**
* @brief Construct a function object for performing a lexicographic
* comparison between the rows of two tables with strongly typed table index
* types.
*
* @param check_nulls Indicates if either input table contains columns with nulls.
* @param lhs The first table
* @param rhs The second table (may be the same table as `lhs`)
* @param depth Optional, device array the same length as a row that contains starting depths of
* columns if they're nested, and 0 otherwise.
* @param column_order Optional, device array the same length as a row that indicates the desired
* ascending/descending order of each column in a row. If `nullopt`, it is assumed all columns are
* sorted in ascending order.
* @param null_precedence Optional, device array the same length as a row that indicates how null
* values compare to all other values in each column. If `nullopt`, null precedence is
* `null_order::BEFORE` for all columns.
*/
two_table_device_row_comparator_adapter(
Nullate check_nulls,
table_device_view lhs,
table_device_view rhs,
std::optional<device_span<int const>> depth = std::nullopt,
std::optional<device_span<order const>> column_order = std::nullopt,
std::optional<device_span<null_order const>> null_precedence = std::nullopt)
: comp{check_nulls, lhs, rhs, depth, column_order, null_precedence}
{
}

device_row_comparator<Nullate> comp;
};

/**
* @brief An owning object that can be used to lexicographically compare rows of two different
* tables
*
* This class takes two table_views and preprocesses certain columns to allow for lexicographical
* comparison. The preprocessed table and temporary data required for the comparison are created and
* owned by this class.
*
* Alternatively, `two_table_comparator` can be constructed from two existing
* `shared_ptr<preprocessed_table>`s when sharing the same tables among multiple comparators.
*
* This class can then provide a functor object that can be used on the device.
* The object of this class must outlive the usage of the device functor.
*/
class two_table_comparator {
public:
/**
* @brief Construct an owning object for performing a lexicographic comparison between rows of
* two different tables.
*
* The left and right table are expected to have the same number of columns
* and data types for each column.
*
* @param left The left table to compare
* @param right The right table to compare
* @param column_order Optional, host array the same length as a row that indicates the desired
* ascending/descending order of each column in a row. If empty, it is assumed all columns are
* sorted in ascending order.
* @param null_precedence Optional, host array the same length as a row that indicates how null
* values compare to all other values in each column. If empty, null precedence is
* `null_order::BEFORE` for all columns.
* @param stream The stream to construct this object on. Not the stream that will be used for
* comparisons using this object.
*/
two_table_comparator(table_view const& left,
table_view const& right,
host_span<order const> column_order = {},
host_span<null_order const> null_precedence = {},
rmm::cuda_stream_view stream = rmm::cuda_stream_default)
: d_left_table{preprocessed_table::create(left, column_order, null_precedence, stream)},
d_right_table{preprocessed_table::create(right, column_order, null_precedence, stream)}
{
}

/**
* @brief Construct an owning object for performing a lexicographic comparison between rows of
* two different preprocessed tables.
*
* This constructor allows independently constructing a `preprocessed_table` and sharing it among
* multiple comparators.
*
* @param left A table preprocessed for lexicographic comparison
* @param right A table preprocessed for lexicographic comparison
*/
two_table_comparator(std::shared_ptr<preprocessed_table> left,
std::shared_ptr<preprocessed_table> right)
: d_left_table{std::move(left)}, d_right_table{std::move(right)}
{
}

/**
* @brief Return the binary operator for comparing rows in the two tables.
*
* Returns a binary callable, `F`, with signature `bool F(lhs_index_type, rhs_index_type)`.
*
* `F(i,j)` returns true if and only if row `i` of the left table compares
* lexicographically less than row `j` of the right table.
*
* @tparam Nullate A cudf::nullate type describing whether to check for nulls.
*/
template <typename Nullate>
two_table_device_row_comparator_adapter<Nullate> device_comparator(Nullate nullate = {}) const
{
return two_table_device_row_comparator_adapter<Nullate>(nullate,
*d_left_table,
*d_right_table,
d_left_table->depths(),
d_left_table->column_order(),
d_left_table->null_precedence());
}

private:
std::shared_ptr<preprocessed_table> d_left_table;
std::shared_ptr<preprocessed_table> d_right_table;
};

} // namespace lexicographic
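
One reason a two-table comparator needs both argument orders is that standard binary searches call the comparator differently: `std::lower_bound` evaluates `comp(element, value)` and `std::upper_bound` evaluates `comp(value, element)`. The sketch below shows this on the host with `std::vector<int>` tables. It is not the PR's search implementation, which runs on the device. `two_table_less_sketch`, `make_lhs_indices`, `lower_bound_row`, and `upper_bound_row` are hypothetical names, and a plain vector of indices stands in for the strongly typed index iterator mentioned in the commit list.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using size_type = std::int32_t;  // stand-in for cudf::size_type
enum class lhs_index_type : size_type {};
enum class rhs_index_type : size_type {};

// Hypothetical host-side comparator: haystack is the lhs table, needles the rhs table.
struct two_table_less_sketch {
  std::vector<int> haystack;  // must be sorted ascending
  std::vector<int> needles;

  bool operator()(lhs_index_type l, rhs_index_type r) const noexcept
  {
    return haystack[static_cast<std::size_t>(l)] < needles[static_cast<std::size_t>(r)];
  }
  bool operator()(rhs_index_type r, lhs_index_type l) const noexcept
  {
    return needles[static_cast<std::size_t>(r)] < haystack[static_cast<std::size_t>(l)];
  }
};

// Materialize lhs indices 0..n-1 (a vector stands in for a counting iterator).
inline std::vector<lhs_index_type> make_lhs_indices(std::size_t n)
{
  std::vector<lhs_index_type> out;
  out.reserve(n);
  for (std::size_t i = 0; i < n; ++i) { out.push_back(static_cast<lhs_index_type>(i)); }
  return out;
}

// First haystack position whose row is not less than the needle row.
inline size_type lower_bound_row(two_table_less_sketch const& comp, rhs_index_type needle)
{
  auto const idx = make_lhs_indices(comp.haystack.size());
  // std::lower_bound calls comp(lhs_index_type, rhs_index_type).
  auto const it = std::lower_bound(idx.begin(), idx.end(), needle, comp);
  return static_cast<size_type>(it - idx.begin());
}

// First haystack position whose row is greater than the needle row.
inline size_type upper_bound_row(two_table_less_sketch const& comp, rhs_index_type needle)
{
  auto const idx = make_lhs_indices(comp.haystack.size());
  // std::upper_bound calls comp(rhs_index_type, lhs_index_type).
  auto const it = std::upper_bound(idx.begin(), idx.end(), needle, comp);
  return static_cast<size_type>(it - idx.begin());
}
```

With only the `(lhs_index_type, rhs_index_type)` overload, the `std::upper_bound` call in this sketch would not compile, so the strong index types surface the missing direction at build time.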

namespace hash {