Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

With Order Support for Memory Tables #1401

Draft
wants to merge 25 commits into
base: main
Choose a base branch
from

Conversation

mertak-synnada
Copy link

Add support for WITH ORDER syntax for Memory Tables

@@ -124,6 +124,7 @@ pub struct CreateTable {
pub default_charset: Option<String>,
pub collation: Option<String>,
pub on_commit: Option<OnCommit>,
pub with_order: Vec<Vec<OrderByExpr>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we include here a link to the docs where this syntax comes from? (e.g. similar to the on_cluster below)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was inspired by Datafusion's WITH ORDER statement, so I've added the link and tests, thank you!

Copy link
Contributor

@alamb alamb Sep 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I did some research and could not find any existing databases that use the WITH ORDER syntax.

ClickHouse does seem to have a way to specify order as part of a CREATE TABLE statement: https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree

@mertak-synnada rather than introducing special DataFusion only syntax support, what do you think about extending DataFusion to use the existing ClickHouse syntax?

CREATE TABLE ....
ORDER BY <...>

We might have to change the GenericDialect or add some feature to the Dialect trait to permit other dialects to parse such syntax, but it would be nice to align to some other existing syntax rather than creating something new

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see two possible ways forward:

  1. We can change DF to support ORDER BY after CREATE and deprecate WITH ORDER. This way we would have Clickhouse-like syntax for both memory and external tables, and WITH ORDER syntax only for external tables. The latter asymmetry is a little weird but we can explain it away by deprecating WITH ORDER.
  2. We can support both ORDER BY and WITH ORDER for both memory and external tables in DF (they would be synonyms). We'd need to add WITH ORDER for ordinary CREATEs here though.

I am OK with both, with a slight preference to 2 because DF has had WITH ORDER for a while now. What do you think?

Copy link
Contributor

@alamb alamb Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in this case we should distinguish between DataFusion's needs and sqlparser-rs's needs

If we are going to introduce WITH ORDER to sqlparser-rs I think it should follow the existing pattern of being connected to a specific dialect (in this case DatafusionDialect).

Thus I would personally vote for option 1 (change DF) so we avoid creating a new SQL dialect (both in this crate and in general) as much as possible

  1. We can change DF to support ORDER BY after CREATE and deprecate WITH ORDER. This way we would have Clickhouse-like syntax for both memory and external tables, and WITH ORDER syntax only for external tables. The latter asymmetry is a little weird but we can explain it away by deprecating WITH ORDER.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense - let's do it

@@ -124,6 +124,7 @@ pub struct CreateTable {
pub default_charset: Option<String>,
pub collation: Option<String>,
pub on_commit: Option<OnCommit>,
pub with_order: Vec<Vec<OrderByExpr>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we include test cases demonstrating the behavior introduced by the changes?

add unit test for WITH ORDER parsing in create table
Comment on lines 5594 to 5595
if self.parse_keyword(Keyword::WITH) {
if self.parse_keyword(Keyword::ORDER) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if self.parse_keyword(Keyword::WITH) {
if self.parse_keyword(Keyword::ORDER) {
if self.parse_keywords(&[Keyword::WITH, Keyword::ORDER]) {

Also I think we can keep the postgres path separate since this feature doesn't apply to that dialect? - so that this part remains the same

        let with_options = self.parse_options(Keyword::WITH)?;

We can make the fuctionality conditional with e.g. if self.dialect.supports_with_order() or similar method

Comment on lines 5597 to 5603
loop {
order_exprs.push(vec![self.parse_order_by_expr()?]);
if !self.consume_token(&Token::Comma) {
self.expect_token(&Token::RParen)?;
break;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the parse_comma_separated() method be reused for this functionality?

@@ -12364,6 +12385,43 @@ mod tests {
);
}

#[test]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I think this would probably not be the ideal place for the test to live - if the feature isn't bound to any dialect, or if it uses a dialect method to guard the feature, we can add the tests to common e.g. https://github.com/sqlparser-rs/sqlparser-rs/blob/main/tests/sqlparser_common.rs#L4405

@coveralls
Copy link

Pull Request Test Coverage Report for Build 10678509735

Details

  • 50 of 55 (90.91%) changed or added relevant lines in 8 files are covered.
  • 2 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.001%) to 89.25%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/ast/helpers/stmt_create_table.rs 6 7 85.71%
src/dialect/generic.rs 1 2 50.0%
src/dialect/mod.rs 1 2 50.0%
tests/sqlparser_common.rs 24 26 92.31%
Files with Coverage Reduction New Missed Lines %
tests/sqlparser_postgres.rs 1 88.56%
tests/sqlparser_bigquery.rs 1 90.27%
Totals Coverage Status
Change from base Build 10653671802: -0.001%
Covered Lines: 28785
Relevant Lines: 32252

💛 - Coveralls

@alamb
Copy link
Contributor

alamb commented Sep 10, 2024

Marking as draft as i think this is still under discussion and not in need of more review. Please mark it as review if I got that wrong or when it is ready

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants