Expose mixed and conditional joins in pylibcudf (#17235)
Expose these join types to pylibcudf; they will be useful for implementing inequality joins in cudf polars.

Authors:
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Yunsong Wang (https://github.com/PointKernel)

URL: #17235
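
As a rough illustration of what an inequality (conditional) join computes, here is a pure-Python reference model. It does not call pylibcudf; the function name `conditional_inner_join_ref`, the row representation, and the sample inputs are assumptions made for this sketch. It only mimics the "pair of gather maps" result shape.

```python
# Pure-Python reference model of a conditional (inequality) inner join.
# Illustrative only: hypothetical helper, not the pylibcudf API.
from typing import Callable, List, Sequence, Tuple

Row = Tuple[int, ...]


def conditional_inner_join_ref(
    left: Sequence[Row],
    right: Sequence[Row],
    predicate: Callable[[Row, Row], bool],
) -> Tuple[List[int], List[int]]:
    """Return (left_indices, right_indices) for every row pair where predicate holds."""
    left_map: List[int] = []
    right_map: List[int] = []
    for i, lrow in enumerate(left):
        for j, rrow in enumerate(right):
            if predicate(lrow, rrow):
                left_map.append(i)
                right_map.append(j)
    return left_map, right_map


left_rows = [(0,), (1,), (2,)]
right_rows = [(1,), (2,), (3,)]
# Inequality predicate: left column 0 > right column 0.
lmap, rmap = conditional_inner_join_ref(
    left_rows, right_rows, lambda l, r: l[0] > r[0]
)
print(lmap, rmap)
```

The nested loop is quadratic and is meant only to pin down the semantics; it says nothing about how libcudf evaluates the predicate on the GPU.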
wence- authored Nov 4, 2024
1 parent e6f5c0e commit 076ad58
Showing 7 changed files with 771 additions and 24 deletions.
32 changes: 16 additions & 16 deletions cpp/include/cudf/join.hpp
@@ -573,7 +573,7 @@ class distinct_hash_join {
  * Result: {{1}, {0}}
  * @endcode
  *
- * @throw cudf::logic_error if the binary predicate outputs a non-boolean result.
+ * @throw cudf::data_type_error if the binary predicate outputs a non-boolean result.
  *
  * @param left The left table
  * @param right The right table
@@ -620,7 +620,7 @@ conditional_inner_join(table_view const& left,
  * Result: {{0, 1, 2}, {None, 0, None}}
  * @endcode
  *
- * @throw cudf::logic_error if the binary predicate outputs a non-boolean result.
+ * @throw cudf::data_type_error if the binary predicate outputs a non-boolean result.
  *
  * @param left The left table
  * @param right The right table
@@ -666,7 +666,7 @@ conditional_left_join(table_view const& left,
  * Result: {{0, 1, 2, None, None}, {None, 0, None, 1, 2}}
  * @endcode
  *
- * @throw cudf::logic_error if the binary predicate outputs a non-boolean result.
+ * @throw cudf::data_type_error if the binary predicate outputs a non-boolean result.
  *
  * @param left The left table
  * @param right The right table
@@ -705,7 +705,7 @@ conditional_full_join(table_view const& left,
  * Result: {1}
  * @endcode
  *
- * @throw cudf::logic_error if the binary predicate outputs a non-boolean result.
+ * @throw cudf::data_type_error if the binary predicate outputs a non-boolean result.
  *
  * @param left The left table
  * @param right The right table
@@ -746,7 +746,7 @@ std::unique_ptr<rmm::device_uvector<size_type>> conditional_left_semi_join(
  * Result: {0, 2}
  * @endcode
  *
- * @throw cudf::logic_error if the binary predicate outputs a non-boolean result.
+ * @throw cudf::data_type_error if the binary predicate outputs a non-boolean result.
  *
  * @param left The left table
  * @param right The right table
@@ -793,7 +793,7 @@ std::unique_ptr<rmm::device_uvector<size_type>> conditional_left_anti_join(
  * Result: {{1}, {0}}
  * @endcode
  *
- * @throw cudf::logic_error If the binary predicate outputs a non-boolean result.
+ * @throw cudf::data_type_error If the binary predicate outputs a non-boolean result.
  * @throw cudf::logic_error If the number of rows in left_equality and left_conditional do not
  * match.
  * @throw cudf::logic_error If the number of rows in right_equality and right_conditional do not
@@ -855,7 +855,7 @@ mixed_inner_join(
  * Result: {{0, 1, 2}, {None, 0, None}}
  * @endcode
  *
- * @throw cudf::logic_error If the binary predicate outputs a non-boolean result.
+ * @throw cudf::data_type_error If the binary predicate outputs a non-boolean result.
  * @throw cudf::logic_error If the number of rows in left_equality and left_conditional do not
  * match.
  * @throw cudf::logic_error If the number of rows in right_equality and right_conditional do not
@@ -917,7 +917,7 @@ mixed_left_join(
  * Result: {{0, 1, 2, None, None}, {None, 0, None, 1, 2}}
  * @endcode
  *
- * @throw cudf::logic_error If the binary predicate outputs a non-boolean result.
+ * @throw cudf::data_type_error If the binary predicate outputs a non-boolean result.
  * @throw cudf::logic_error If the number of rows in left_equality and left_conditional do not
  * match.
  * @throw cudf::logic_error If the number of rows in right_equality and right_conditional do not
@@ -972,7 +972,7 @@ mixed_full_join(
  * Result: {1}
  * @endcode
  *
- * @throw cudf::logic_error If the binary predicate outputs a non-boolean result.
+ * @throw cudf::data_type_error If the binary predicate outputs a non-boolean result.
  * @throw cudf::logic_error If the number of rows in left_equality and left_conditional do not
  * match.
  * @throw cudf::logic_error If the number of rows in right_equality and right_conditional do not
@@ -1022,7 +1022,7 @@ std::unique_ptr<rmm::device_uvector<size_type>> mixed_left_semi_join(
  * Result: {0, 2}
  * @endcode
  *
- * @throw cudf::logic_error If the binary predicate outputs a non-boolean result.
+ * @throw cudf::data_type_error If the binary predicate outputs a non-boolean result.
  * @throw cudf::logic_error If the number of rows in left_equality and left_conditional do not
  * match.
  * @throw cudf::logic_error If the number of rows in right_equality and right_conditional do not
@@ -1061,7 +1061,7 @@ std::unique_ptr<rmm::device_uvector<size_type>> mixed_left_anti_join(
  * choose a suitable compare_nulls value AND use appropriate null-safe
  * operators in the expression.
  *
- * @throw cudf::logic_error If the binary predicate outputs a non-boolean result.
+ * @throw cudf::data_type_error If the binary predicate outputs a non-boolean result.
  * @throw cudf::logic_error If the number of rows in left_equality and left_conditional do not
  * match.
  * @throw cudf::logic_error If the number of rows in right_equality and right_conditional do not
@@ -1103,7 +1103,7 @@ std::pair<std::size_t, std::unique_ptr<rmm::device_uvector<size_type>>> mixed_in
  * choose a suitable compare_nulls value AND use appropriate null-safe
  * operators in the expression.
  *
- * @throw cudf::logic_error If the binary predicate outputs a non-boolean result.
+ * @throw cudf::data_type_error If the binary predicate outputs a non-boolean result.
  * @throw cudf::logic_error If the number of rows in left_equality and left_conditional do not
  * match.
  * @throw cudf::logic_error If the number of rows in right_equality and right_conditional do not
@@ -1142,7 +1142,7 @@ std::pair<std::size_t, std::unique_ptr<rmm::device_uvector<size_type>>> mixed_le
  * If the provided predicate returns NULL for a pair of rows
  * (left, right), that pair is not included in the output.
  *
- * @throw cudf::logic_error if the binary predicate outputs a non-boolean result.
+ * @throw cudf::data_type_error if the binary predicate outputs a non-boolean result.
  *
  * @param left The left table
  * @param right The right table
@@ -1167,7 +1167,7 @@ std::size_t conditional_inner_join_size(
  * If the provided predicate returns NULL for a pair of rows
  * (left, right), that pair is not included in the output.
  *
- * @throw cudf::logic_error if the binary predicate outputs a non-boolean result.
+ * @throw cudf::data_type_error if the binary predicate outputs a non-boolean result.
  *
  * @param left The left table
  * @param right The right table
@@ -1192,7 +1192,7 @@ std::size_t conditional_left_join_size(
  * If the provided predicate returns NULL for a pair of rows
  * (left, right), that pair is not included in the output.
  *
- * @throw cudf::logic_error if the binary predicate outputs a non-boolean result.
+ * @throw cudf::data_type_error if the binary predicate outputs a non-boolean result.
  *
  * @param left The left table
  * @param right The right table
@@ -1217,7 +1217,7 @@ std::size_t conditional_left_semi_join_size(
  * If the provided predicate returns NULL for a pair of rows
  * (left, right), that pair is not included in the output.
  *
- * @throw cudf::logic_error if the binary predicate outputs a non-boolean result.
+ * @throw cudf::data_type_error if the binary predicate outputs a non-boolean result.
  *
  * @param left The left table
  * @param right The right table
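
The doc comments above state two rules: a predicate result of NULL excludes the row pair, and a non-boolean predicate is an error. A minimal pure-Python sketch of both rules follows. The names `null_aware_greater` and `conditional_inner_join_size_ref` are hypothetical, `None` stands in for NULL, and `TypeError` is only an assumed stand-in for `cudf::data_type_error`. The model checks the result type per row for simplicity, which is not necessarily how libcudf performs the check.

```python
# Pure-Python model of the documented NULL and non-boolean rules.
# Hypothetical helpers; None models NULL, TypeError models cudf::data_type_error.
from typing import Callable, Optional, Sequence, Tuple

Row = Tuple[Optional[int], ...]


def null_aware_greater(lrow: Row, rrow: Row) -> Optional[bool]:
    """Return None (NULL) if either operand is NULL, else lrow[0] > rrow[0]."""
    if lrow[0] is None or rrow[0] is None:
        return None
    return lrow[0] > rrow[0]


def conditional_inner_join_size_ref(
    left: Sequence[Row],
    right: Sequence[Row],
    predicate: Callable[[Row, Row], object],
) -> int:
    """Count row pairs for which the predicate is True; NULL results are skipped."""
    size = 0
    for lrow in left:
        for rrow in right:
            result = predicate(lrow, rrow)
            if result is not None and not isinstance(result, bool):
                raise TypeError("The expression must produce a boolean output.")
            if result:
                size += 1
    return size


left_rows = [(3,), (None,), (5,)]
right_rows = [(4,), (None,)]
size = conditional_inner_join_size_ref(left_rows, right_rows, null_aware_greater)
print(size)
```

Only the pair (5, 4) satisfies the predicate here; every pair involving a NULL evaluates to NULL and is dropped rather than counted.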
7 changes: 5 additions & 2 deletions cpp/src/join/conditional_join.cu
@@ -28,6 +28,7 @@
 #include <cudf/table/table_device_view.cuh>
 #include <cudf/table/table_view.hpp>
 #include <cudf/types.hpp>
+#include <cudf/utilities/error.hpp>
 #include <cudf/utilities/memory_resource.hpp>

 #include <rmm/cuda_stream_view.hpp>
@@ -178,7 +179,8 @@ conditional_join(table_view const& left,
   auto const parser =
     ast::detail::expression_parser{binary_predicate, left, right, has_nulls, stream, mr};
   CUDF_EXPECTS(parser.output_type().id() == type_id::BOOL8,
-               "The expression must produce a boolean output.");
+               "The expression must produce a boolean output.",
+               cudf::data_type_error);

   auto left_table = table_device_view::create(left, stream);
   auto right_table = table_device_view::create(right, stream);
@@ -330,7 +332,8 @@ std::size_t compute_conditional_join_output_size(table_view const& left,
   auto const parser =
     ast::detail::expression_parser{binary_predicate, left, right, has_nulls, stream, mr};
   CUDF_EXPECTS(parser.output_type().id() == type_id::BOOL8,
-               "The expression must produce a boolean output.");
+               "The expression must produce a boolean output.",
+               cudf::data_type_error);

   auto left_table = table_device_view::create(left, stream);
   auto right_table = table_device_view::create(right, stream);
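
The change in these hunks is the extra third argument to `CUDF_EXPECTS`, which selects the exception type raised when the check fails. The pattern can be sketched in Python as follows. The `expects` helper and the `LogicError` / `DataTypeError` classes are hypothetical stand-ins written for this sketch, and treating the two classes as unrelated is an assumption about the C++ hierarchy rather than something this diff shows.

```python
# Python model of an "expects" check with a selectable exception type.
# All names are hypothetical stand-ins for the C++ macro and exception classes.


class LogicError(Exception):
    """Stand-in for cudf::logic_error (the assumed default)."""


class DataTypeError(Exception):
    """Stand-in for cudf::data_type_error."""


def expects(condition: bool, reason: str, exception_type: type = LogicError) -> None:
    """Raise exception_type(reason) when condition is false."""
    if not condition:
        raise exception_type(reason)


def check_predicate_output(output_type_id: str) -> None:
    """Mirror the check in the diff: the predicate must produce BOOL8."""
    expects(
        output_type_id == "BOOL8",
        "The expression must produce a boolean output.",
        DataTypeError,
    )


check_predicate_output("BOOL8")  # passes silently
try:
    check_predicate_output("INT32")
except DataTypeError as err:
    print(f"rejected: {err}")
```

Under this model, a caller that previously caught only `LogicError` would no longer catch the failure, which is the kind of behaviour change the updated `@throw` documentation records.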
7 changes: 5 additions & 2 deletions cpp/src/join/mixed_join.cu
@@ -28,6 +28,7 @@
 #include <cudf/table/table_device_view.cuh>
 #include <cudf/table/table_view.hpp>
 #include <cudf/types.hpp>
+#include <cudf/utilities/error.hpp>
 #include <cudf/utilities/memory_resource.hpp>
 #include <cudf/utilities/span.hpp>

@@ -115,7 +116,8 @@ mixed_join(
   auto const parser = ast::detail::expression_parser{
     binary_predicate, left_conditional, right_conditional, has_nulls, stream, mr};
   CUDF_EXPECTS(parser.output_type().id() == type_id::BOOL8,
-               "The expression must produce a boolean output.");
+               "The expression must produce a boolean output.",
+               cudf::data_type_error);

   // TODO: The non-conditional join impls start with a dictionary matching,
   // figure out what that is and what it's needed for (and if conditional joins
@@ -381,7 +383,8 @@ compute_mixed_join_output_size(table_view const& left_equality,
   auto const parser = ast::detail::expression_parser{
     binary_predicate, left_conditional, right_conditional, has_nulls, stream, mr};
   CUDF_EXPECTS(parser.output_type().id() == type_id::BOOL8,
-               "The expression must produce a boolean output.");
+               "The expression must produce a boolean output.",
+               cudf::data_type_error);

   // TODO: The non-conditional join impls start with a dictionary matching,
   // figure out what that is and what it's needed for (and if conditional joins
76 changes: 76 additions & 0 deletions python/pylibcudf/pylibcudf/join.pxd
@@ -3,6 +3,7 @@
 from pylibcudf.libcudf.types cimport null_equality

 from .column cimport Column
+from .expressions cimport Expression
 from .table cimport Table


@@ -37,3 +38,78 @@ cpdef Column left_anti_join(
 )

 cpdef Table cross_join(Table left, Table right)
+
+cpdef tuple conditional_inner_join(
+    Table left,
+    Table right,
+    Expression binary_predicate,
+)
+
+cpdef tuple conditional_left_join(
+    Table left,
+    Table right,
+    Expression binary_predicate,
+)
+
+cpdef tuple conditional_full_join(
+    Table left,
+    Table right,
+    Expression binary_predicate,
+)
+
+cpdef Column conditional_left_semi_join(
+    Table left,
+    Table right,
+    Expression binary_predicate,
+)
+
+cpdef Column conditional_left_anti_join(
+    Table left,
+    Table right,
+    Expression binary_predicate,
+)
+
+cpdef tuple mixed_inner_join(
+    Table left_keys,
+    Table right_keys,
+    Table left_conditional,
+    Table right_conditional,
+    Expression binary_predicate,
+    null_equality nulls_equal
+)
+
+cpdef tuple mixed_left_join(
+    Table left_keys,
+    Table right_keys,
+    Table left_conditional,
+    Table right_conditional,
+    Expression binary_predicate,
+    null_equality nulls_equal
+)
+
+cpdef tuple mixed_full_join(
+    Table left_keys,
+    Table right_keys,
+    Table left_conditional,
+    Table right_conditional,
+    Expression binary_predicate,
+    null_equality nulls_equal
+)
+
+cpdef Column mixed_left_semi_join(
+    Table left_keys,
+    Table right_keys,
+    Table left_conditional,
+    Table right_conditional,
+    Expression binary_predicate,
+    null_equality nulls_equal
+)
+
+cpdef Column mixed_left_anti_join(
+    Table left_keys,
+    Table right_keys,
+    Table left_conditional,
+    Table right_conditional,
+    Expression binary_predicate,
+    null_equality nulls_equal
+)
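
The mixed-join declarations take separate key tables and conditional tables plus a predicate. A pure-Python sketch of the semantics they suggest: a row pair matches when the equality keys agree and the predicate holds on the conditional rows. The `*_ref` functions, the boolean `nulls_equal` flag, the `RuntimeError` stand-in for `cudf::logic_error`, and the sample inputs are all assumptions of this sketch. Lists of integers stand in for the `tuple` of gather maps (inner/left/full) and the single `Column` (semi/anti) that the `.pxd` declares.

```python
# Pure-Python reference model of mixed (equality + conditional) joins.
# Hypothetical helpers for illustration; this is not the pylibcudf API.
from typing import Callable, List, Optional, Sequence, Tuple

Row = Tuple[Optional[int], ...]
Predicate = Callable[[Row, Row], Optional[bool]]


def _keys_equal(lkey: Row, rkey: Row, nulls_equal: bool) -> bool:
    """Compare key rows column by column; None models NULL."""
    for a, b in zip(lkey, rkey):
        if a is None or b is None:
            if not (nulls_equal and a is None and b is None):
                return False
        elif a != b:
            return False
    return True


def mixed_inner_join_ref(
    left_keys: Sequence[Row],
    right_keys: Sequence[Row],
    left_conditional: Sequence[Row],
    right_conditional: Sequence[Row],
    predicate: Predicate,
    nulls_equal: bool = True,
) -> Tuple[List[int], List[int]]:
    """Return gather maps for pairs whose keys match and whose predicate is True."""
    if len(left_keys) != len(left_conditional) or len(right_keys) != len(right_conditional):
        # Stand-in for the documented cudf::logic_error on mismatched row counts.
        raise RuntimeError("equality and conditional tables must have equal row counts")
    left_map: List[int] = []
    right_map: List[int] = []
    for i in range(len(left_keys)):
        for j in range(len(right_keys)):
            if _keys_equal(left_keys[i], right_keys[j], nulls_equal) and (
                predicate(left_conditional[i], right_conditional[j]) is True
            ):
                left_map.append(i)
                right_map.append(j)
    return left_map, right_map


def mixed_left_semi_join_ref(
    left_keys: Sequence[Row],
    right_keys: Sequence[Row],
    left_conditional: Sequence[Row],
    right_conditional: Sequence[Row],
    predicate: Predicate,
    nulls_equal: bool = True,
) -> List[int]:
    """Return the left row indices that have at least one match."""
    left_map, _ = mixed_inner_join_ref(
        left_keys, right_keys, left_conditional, right_conditional, predicate, nulls_equal
    )
    return sorted(set(left_map))


lk = [(0,), (1,), (2,)]
rk = [(1,), (2,), (3,)]
lc = [(4,), (4,), (4,)]
rc = [(3,), (4,), (5,)]
greater = lambda l, r: l[0] > r[0]

inner = mixed_inner_join_ref(lk, rk, lc, rc, greater)
semi = mixed_left_semi_join_ref(lk, rk, lc, rc, greater)
print(inner, semi)
```

With these illustrative inputs, keys match for the pairs (1, 0) and (2, 1), and only the first also satisfies `4 > 3`, so the inner join yields one pair and the semi join one left index.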