Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: define sideband optimization hints #705

Merged
merged 13 commits into from
Oct 8, 2024

Conversation

EpsilonPrime
Copy link
Member

No description provided.

// The local system can use any numbering system it wants but for better compatibility
// it is suggested to refer to computations in order of the input that they are derived
// from. Computation numbers start at 1.
int32 number = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I struggle with the use of this. Anchor seems sufficient to have referencing between locations and if we're trying to identify engine meaningful, I think better to just allow a hint any value on the computation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This defines a sharing group. Because there is a potentially one to many relationship for every computation we need some way of defining which group a particular reference is from.

Copy link
Contributor

@jacques-n jacques-n Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it.

I think you're saying that a single rel could save, let's say, two hashtables. And five operators could consume all/some/none of those two hash tables. I think anchor is more than enough for that. Why do we also need number?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you know if the first hashtable or the second hashtable was supposed to be output for that computation group? The same goes for the consuming side -- how do you know if the that computation is destined for the first or the second hashtable slot? Anchor (or computation group id) tells you which relations are consuming it and not where in the relation it is used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main thought is the "how the number maps to an internal structure" is opaque to consumers, whether it is one number or two. Whether we have anchor1 and anchor2 as the left and right or anchor1,number1 and anchor1,number2 doesn't impact that the mapping is hidden/implicit.

For now I suggest that we just eliminate the second number.

proto/substrait/algebra.proto Show resolved Hide resolved
proto/substrait/algebra.proto Outdated Show resolved Hide resolved
@vbarua
Copy link
Member

vbarua commented Sep 20, 2024

It would be helpful to have example plans for this feature. In the abstract I understand the intent of what we're trying to accomplish, but I think it would make a lot more sense in context.

@EpsilonPrime
Copy link
Member Author

It would be helpful to have example plans for this feature. In the abstract I understand the intent of what we're trying to accomplish, but I think it would make a lot more sense in context.

I've added some documentation that I hope addresses this concern.

// The local system can use any numbering system it wants but for better compatibility
// it is suggested to refer to computations in order of the input that they are derived
// from. Computation numbers start at 1.
int32 number = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main thought is the "how the number maps to an internal structure" is opaque to consumers, whether it is one number or two. Whether we have anchor1 and anchor2 as the left and right or anchor1,number1 and anchor1,number2 doesn't impact that the mapping is hidden/implicit.

For now I suggest that we just eliminate the second number.

proto/substrait/algebra.proto Show resolved Hide resolved
site/docs/relations/common_fields.md Outdated Show resolved Hide resolved
site/docs/relations/common_fields.md Outdated Show resolved Hide resolved
site/docs/relations/common_fields.md Outdated Show resolved Hide resolved
Copy link
Contributor

@jacques-n jacques-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @EpsilonPrime !

@jacques-n
Copy link
Contributor

Since this isn't breaking, I think we can merge with two SMC+1, right? So @EpsilonPrime and myself should be enough. Right?

@jacques-n
Copy link
Contributor

Since this isn't breaking, I think we can merge with two SMC+1, right? So @EpsilonPrime and myself should be enough. Right?

@EpsilonPrime , pinging on my question here.

Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved in the sense of "let's get something out there and see how it works." We can probably refine slightly as we go.

@westonpace
Copy link
Member

Since this isn't breaking, I think we can merge with two SMC+1

I suppose that is technically true since we didn't explicitly mark (not including proposer) on the SMC changes. Either way, added my +1 to remove the ambiguity.

@jacques-n jacques-n merged commit e386a29 into substrait-io:main Oct 8, 2024
13 checks passed
@EpsilonPrime EpsilonPrime deleted the sideband_optimization branch October 12, 2024 23:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants