[EPIC] Add support for Substrait #32

Open
3 tasks
andygrove opened this issue May 21, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@andygrove
Member

andygrove commented May 21, 2022

[EDIT: Updated this on 2/25/23]

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

The Substrait standard is gaining adoption, and I would like to add support for it to Ballista. There are three areas where we could potentially support Substrait:

  • ExecuteQueryParams currently accepts either a LogicalPlan or a SQL string. We could add Substrait here as well, represented as a byte array. This would allow clients such as Ibis to submit queries directly to Ballista's gRPC service.
  • The executor currently receives tasks containing DataFusion physical plans. These plans could be serialized to Substrait and passed to other execution engines, such as DuckDB, Polars, and cuDF, making Ballista a general-purpose distributed query scheduler.
  • We currently use a proprietary protobuf format for representing plans. We could adopt Substrait here as well, or maybe just add a wrapper for Substrait plans.
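
To make the first bullet concrete, here is a minimal sketch of a query payload that accepts a Substrait plan as a byte array alongside the two existing options. All names in it (`QueryPayload`, `PlanningPath`, `planning_path`) are hypothetical and for illustration only; they are not Ballista's actual protobuf definitions or scheduler code, and the real decoding steps are left out.

```rust
/// Hypothetical stand-in for the query field of ExecuteQueryParams.
/// Not Ballista's real type; the variants only mirror the options in the text.
#[derive(Debug, Clone, PartialEq)]
pub enum QueryPayload {
    /// Serialized DataFusion logical plan (accepted today).
    LogicalPlan(Vec<u8>),
    /// SQL text (accepted today).
    Sql(String),
    /// Serialized Substrait plan as a byte array (the proposed addition).
    SubstraitPlan(Vec<u8>),
}

/// Hypothetical label for the planning step the scheduler would run.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PlanningPath {
    DecodeLogicalPlan,
    ParseSql,
    ConsumeSubstrait,
}

/// Pick a planning path for a payload, rejecting empty input up front.
/// Actual decoding (protobuf, SQL parsing, Substrait consumption) is omitted.
pub fn planning_path(payload: &QueryPayload) -> Result<PlanningPath, String> {
    match payload {
        QueryPayload::LogicalPlan(bytes) if bytes.is_empty() => {
            Err("empty logical plan".to_string())
        }
        QueryPayload::LogicalPlan(_) => Ok(PlanningPath::DecodeLogicalPlan),
        QueryPayload::Sql(sql) if sql.trim().is_empty() => {
            Err("empty SQL string".to_string())
        }
        QueryPayload::Sql(_) => Ok(PlanningPath::ParseSql),
        QueryPayload::SubstraitPlan(bytes) if bytes.is_empty() => {
            Err("empty Substrait plan".to_string())
        }
        QueryPayload::SubstraitPlan(_) => Ok(PlanningPath::ConsumeSubstrait),
    }
}
```

Under this sketch, a client such as Ibis would send the `SubstraitPlan` variant, and the existing two variants would keep working unchanged, so the addition would be backwards compatible.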

Original description:

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Ballista (and DataFusion) has a proprietary protobuf-based format for serializing query plans. This really ties Ballista to DataFusion and does not allow other query engines and/or compute kernels to be used easily.

Describe the solution you'd like
There is now an emerging standard for query plan serialization at https://substrait.io/ and this is also protobuf-based. It would be good to move towards this over time.

Describe alternatives you've considered
None

Additional context
None

@andygrove andygrove added the enhancement New feature or request label May 21, 2022
@andygrove
Member Author

Substrait support is now in DataFusion, so I plan on working on this soon.

@andygrove andygrove changed the title from "Adopt substrait.io for serializing query plans" to "[EPIC] Add support for Substrait" on Feb 25, 2023