GH-32763: [C++] Add FromProto for fetch & sort #34651
Force-pushed: d77339c to 167d4a3
@vibhatha @rtpsw are you able to take a look, as suggested by @westonpace?
Force-pushed: 167d4a3 to 87451a6
Rebased.
I think @rtpsw is out this week, but maybe @icexelloss can take a look.
@amol- @westonpace I will take a look.
compute::SortOrder SortOrderFromDirection(
    const substrait::SortField::SortDirection& direction) {
  if (direction < 3) {
    return compute::SortOrder::Ascending;
Do we consider UNSPECIFIED as Ascending?
Here:
enum SortDirection {
SORT_DIRECTION_UNSPECIFIED = 0;
SORT_DIRECTION_ASC_NULLS_FIRST = 1;
SORT_DIRECTION_ASC_NULLS_LAST = 2;
SORT_DIRECTION_DESC_NULLS_FIRST = 3;
SORT_DIRECTION_DESC_NULLS_LAST = 4;
SORT_DIRECTION_CLUSTERED = 5;
}
I assume that, since 1 and 2 are Ascending, the <3 check is meant to pick the first three values as Ascending.
Hmm, technically unspecified would be an invalid plan I think. So it would probably be better to reject.
This is now an explicit rejection.
case substrait::SortField::SortDirection::
SortField_SortDirection_SORT_DIRECTION_UNSPECIFIED:
return Status::Invalid("The substrait plan does not specify a sort direction");
namespace {

bool IsSortNullsFirst(const substrait::SortField::SortDirection& direction) {
  return direction % 2 == 1;
This is efficient, but shall we leave a comment?
But CLUSTERED has no null preference though.
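To make the concern concrete, here is a minimal sketch of what the two arithmetic shortcuts quoted in this thread (direction < 3 and direction % 2 == 1) do with the enum values listed above. The enum is a local stand-in with the same numeric values rather than the real substrait type, and the helper names IsAscendingByArithmetic and IsNullsFirstByArithmetic are hypothetical.

```cpp
// Local stand-in for substrait::SortField::SortDirection; the numeric values
// are copied from the enum quoted earlier in this thread.
enum SortDirection {
  SORT_DIRECTION_UNSPECIFIED = 0,
  SORT_DIRECTION_ASC_NULLS_FIRST = 1,
  SORT_DIRECTION_ASC_NULLS_LAST = 2,
  SORT_DIRECTION_DESC_NULLS_FIRST = 3,
  SORT_DIRECTION_DESC_NULLS_LAST = 4,
  SORT_DIRECTION_CLUSTERED = 5,
};

// Hypothetical helpers that reproduce the two shortcuts under review.
constexpr bool IsAscendingByArithmetic(SortDirection d) { return d < 3; }
constexpr bool IsNullsFirstByArithmetic(SortDirection d) { return d % 2 == 1; }

// UNSPECIFIED (0) is silently treated as ascending.
static_assert(IsAscendingByArithmetic(SORT_DIRECTION_UNSPECIFIED),
              "0 < 3, so UNSPECIFIED passes as ascending");

// CLUSTERED (5) is silently treated as nulls-first, although it carries no
// null preference at all.
static_assert(IsNullsFirstByArithmetic(SORT_DIRECTION_CLUSTERED),
              "5 % 2 == 1, so CLUSTERED passes as nulls-first");
```

Neither shortcut has a way to signal "this value has no meaningful answer", which is the gap the explicit handling discussed in this thread closes.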
We now throw an error on CLUSTERED:
case substrait::SortField::SortDirection::
SortField_SortDirection_SORT_DIRECTION_CLUSTERED:
default:
return Status::NotImplemented(
"Acero does not support the specified sort direction: ", dir);
NamedTableProvider table_provider = [&](const std::vector<std::string>& names,
                                        const Schema&) {
  std::shared_ptr<acero::ExecNodeOptions> options =
      std::make_shared<acero::TableSourceNodeOptions>(input_table);
  return acero::Declaration("table_source", {}, options, "mock_source");
};
Instead could we use the following?
NamedTableProvider table_provider = AlwaysProvideSameTable(std::move(input_table));
Good cleanup, thanks. I've switched to this.
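For readers unfamiliar with the helper, the following is a rough sketch of the pattern such a function could follow: a factory that returns a provider capturing the table by value, instead of a lambda capturing locals by reference. Table, Schema, and Declaration here are simplified stand-ins and not the Arrow types, and the provider signature is reduced for illustration (the real one is not reproduced in this thread), so treat this as an assumption about the shape rather than the actual implementation.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins so the sketch compiles without Arrow headers.
struct Table { int num_rows; };
struct Schema {};
struct Declaration {
  std::shared_ptr<Table> table;
  std::string label;
};

// Reduced, hypothetical provider signature for illustration only.
using NamedTableProvider =
    std::function<Declaration(const std::vector<std::string>&, const Schema&)>;

// Returns a provider that ignores the requested names and schema and always
// hands back the same table. The shared_ptr is moved into the closure, so the
// provider stays valid after the caller's local variable goes out of scope.
NamedTableProvider AlwaysProvideSameTable(std::shared_ptr<Table> table) {
  return [table = std::move(table)](const std::vector<std::string>&,
                                    const Schema&) {
    return Declaration{table, "mock_source"};
  };
}
```

Besides being shorter at the call site, capturing by value avoids the lifetime hazard of a [&] capture if the provider outlives the enclosing scope.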
@westonpace the PR looks good to me. I have added a few suggestions.
Left a few questions. Mostly looks good.
Force-pushed: 87451a6 to 1eb683a
I'll merge this when green but I don't think there is any significant rush to get this into 12.0.0?
With Arrow v12 released (🎉) you should be all clear to merge at your leisure!
Force-pushed: 1eb683a to 4c457e8
I've rebased just in case and will let CI run one more time.
Also, I see I missed some feedback from @vibhatha so I will try and get to that today.
Just a few comments, but I'm not competent enough to comment on the core logic.
namespace {

bool IsSortNullsFirst(const substrait::SortField::SortDirection& direction) {
  return direction % 2 == 1;
Hmm, rather than arithmetic on enum values, I'd rather see a proper switch/case statement to make the code more readable and maintainable. This is not so performance-sensitive that it must be optimized to the last nanosecond.
I've switched to proper handling of these cases and I return NullPlacement and SortOrder. I agree the code is more readable now.
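A switch-based mapping along the lines described in this reply might look like the sketch below. It is not the code from the PR: every type is a local stand-in (the real code works with compute::SortOrder, compute::NullPlacement, and arrow::Status), and the MappedDirection struct and MapDirection function are hypothetical names. The Outcome enum stands in for the Status::Invalid and Status::NotImplemented results quoted earlier in the thread.

```cpp
// Local stand-ins so the sketch compiles without Arrow or Substrait headers.
enum SortDirection {
  SORT_DIRECTION_UNSPECIFIED = 0,
  SORT_DIRECTION_ASC_NULLS_FIRST = 1,
  SORT_DIRECTION_ASC_NULLS_LAST = 2,
  SORT_DIRECTION_DESC_NULLS_FIRST = 3,
  SORT_DIRECTION_DESC_NULLS_LAST = 4,
  SORT_DIRECTION_CLUSTERED = 5,
};
enum class SortOrder { Ascending, Descending };
enum class NullPlacement { AtStart, AtEnd };
enum class Outcome { Ok, Invalid, NotImplemented };

// Hypothetical result type: order and null_placement are only meaningful
// when outcome == Outcome::Ok.
struct MappedDirection {
  Outcome outcome;
  SortOrder order;
  NullPlacement null_placement;
};

// One explicit case per enum value, so no value is classified by accident.
constexpr MappedDirection MapDirection(SortDirection dir) {
  switch (dir) {
    case SORT_DIRECTION_ASC_NULLS_FIRST:
      return {Outcome::Ok, SortOrder::Ascending, NullPlacement::AtStart};
    case SORT_DIRECTION_ASC_NULLS_LAST:
      return {Outcome::Ok, SortOrder::Ascending, NullPlacement::AtEnd};
    case SORT_DIRECTION_DESC_NULLS_FIRST:
      return {Outcome::Ok, SortOrder::Descending, NullPlacement::AtStart};
    case SORT_DIRECTION_DESC_NULLS_LAST:
      return {Outcome::Ok, SortOrder::Descending, NullPlacement::AtEnd};
    case SORT_DIRECTION_UNSPECIFIED:
      // The plan does not say how to sort: reject it as invalid.
      return {Outcome::Invalid, SortOrder::Ascending, NullPlacement::AtEnd};
    case SORT_DIRECTION_CLUSTERED:
    default:
      // Valid in a plan, but not supported by this sketch.
      return {Outcome::NotImplemented, SortOrder::Ascending,
              NullPlacement::AtEnd};
  }
}
```

Compared with the arithmetic version, adding a new enum value later cannot silently land in the ascending or nulls-first bucket; it falls through to the default branch and is reported as unsupported.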
compute::SortOrder SortOrderFromDirection(
    const substrait::SortField::SortDirection& direction) {
  if (direction < 3) {
Same comment here.
Done.
if ((null_placement == compute::NullPlacement::AtStart &&
     !IsSortNullsFirst(sort.direction())) ||
    (null_placement == compute::NullPlacement::AtEnd &&
     IsSortNullsFirst(sort.direction()))) {
You can change IsSortNullsFirst to return a compute::NullPlacement directly and this will make these lines a bit simpler (just compare the old null placement with the new one).
I've cleaned this up.
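The simplification suggested above, comparing one null placement with another instead of combining two boolean tests, can be sketched as follows. All types are local stand-ins, the names PlacementResult, NullPlacementFromDirection, and CheckNullPlacement are hypothetical, and reporting a mismatch as "not implemented" is an assumption of this sketch, not something the thread confirms.

```cpp
#include <array>
#include <cstddef>

// Local stand-ins so the sketch compiles without Arrow or Substrait headers.
enum SortDirection {
  SORT_DIRECTION_UNSPECIFIED = 0,
  SORT_DIRECTION_ASC_NULLS_FIRST = 1,
  SORT_DIRECTION_ASC_NULLS_LAST = 2,
  SORT_DIRECTION_DESC_NULLS_FIRST = 3,
  SORT_DIRECTION_DESC_NULLS_LAST = 4,
  SORT_DIRECTION_CLUSTERED = 5,
};
enum class NullPlacement { AtStart, AtEnd };
enum class Outcome { Ok, Invalid, NotImplemented };

// placement is only meaningful when outcome == Outcome::Ok.
struct PlacementResult {
  Outcome outcome;
  NullPlacement placement;
};

constexpr PlacementResult NullPlacementFromDirection(SortDirection dir) {
  switch (dir) {
    case SORT_DIRECTION_ASC_NULLS_FIRST:
    case SORT_DIRECTION_DESC_NULLS_FIRST:
      return {Outcome::Ok, NullPlacement::AtStart};
    case SORT_DIRECTION_ASC_NULLS_LAST:
    case SORT_DIRECTION_DESC_NULLS_LAST:
      return {Outcome::Ok, NullPlacement::AtEnd};
    case SORT_DIRECTION_UNSPECIFIED:
      return {Outcome::Invalid, NullPlacement::AtEnd};
    default:  // CLUSTERED and anything unknown
      return {Outcome::NotImplemented, NullPlacement::AtEnd};
  }
}

// Walks the sort keys and compares each key's placement with the one
// established by the first key. A mismatch means mixed null placement.
template <std::size_t N>
constexpr Outcome CheckNullPlacement(const std::array<SortDirection, N>& keys) {
  bool seen = false;
  NullPlacement established = NullPlacement::AtEnd;
  for (std::size_t i = 0; i < N; ++i) {
    const PlacementResult next = NullPlacementFromDirection(keys[i]);
    if (next.outcome != Outcome::Ok) return next.outcome;
    if (seen && next.placement != established) return Outcome::NotImplemented;
    established = next.placement;
    seen = true;
  }
  return Outcome::Ok;
}
```

With the placement returned directly, the four-line boolean condition quoted above collapses into the single comparison next.placement != established.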
… correctly recognizing mixed null placement.
Force-pushed: 4c457e8 to 1e3b763
+1, thanks for the update @westonpace
Benchmark runs are scheduled for baseline = 2a6848c and contender = fbe0d5f. fbe0d5f is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
This does not support the clustered sort direction, custom sort functions, or complex (non-reference) expressions as sort keys.