Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spark 3.5: Iceberg parser should passthrough unsupported procedure to delegate #11480

Merged
merged 11 commits into from
Nov 12, 2024

Conversation

pan3793
Copy link
Member

@pan3793 pan3793 commented Nov 6, 2024

Spark allows users to configure multiple extensions and each extension is allowed to inject its own SQL parser, there is a chance that the user configures multiple extensions that support CALL syntax, ideally, each extension should only intercept its supported procedure and leave others to the delegate.

Currently, all Iceberg builtin procedures are under system namespace, which is guard by

so I propose to add a simple rule to check the SQL before parsing

@github-actions github-actions bot added the spark label Nov 6, 2024
@pan3793 pan3793 changed the title Iceberg parser should passthrough unsupported procedure to delegate Spark 3.5: Iceberg parser should passthrough unsupported procedure to delegate Nov 6, 2024
@pan3793 pan3793 marked this pull request as draft November 6, 2024 18:24
@pan3793 pan3793 marked this pull request as ready for review November 6, 2024 19:05
@pan3793
Copy link
Member Author

pan3793 commented Nov 7, 2024

cc @RussellSpitzer @Fokko @aokolnychyi could you please take a look?

public void testDelegateUnsupportedProcedure() {
assertThatThrownBy(() -> parser.parsePlan("CALL cat.d.t()"))
.isInstanceOf(ParseException.class)
.hasMessageContaining("[PARSE_SYNTAX_ERROR] Syntax error at or near 'CALL'.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 4.0 we won't need this correct? One of the things I don't like about this (but I don't think there is a fix in 3.5) is that Parse errors are very uninformative.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spark already has error class in 3.5. Shall we do something like

    assertThatThrownBy(() -> parser.parsePlan("CALL cat.d.t()"))
        .isInstanceOf(ParseException.class)
        .satisfies(exception -> {
            ParseException parseException = (ParseException) exception;
            assertThat(parseException.getErrorClass()).isEqualTo("PARSE_SYNTAX_ERROR");
            assertThat(parseException.getMessageParameters().get("error")).isEqualTo("'CALL'");
         });

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 4.0 we won't need this correct? One of the things I don't like about this (but I don't think there is a fix in 3.5) is that Parse errors are very uninformative.

@RussellSpitzer Yes, with SPARK-44167, Spark supports to parse the CALL statement, and a FAILED_TO_LOAD_ROUTINE error class will be thrown for this case.

@huaxingao thank you for information, will change soon.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@huaxingao test assertions are updated as suggested, please take another look, thank you in advance

@@ -37,6 +38,10 @@ public static ProcedureBuilder newBuilder(String name) {
return builderSupplier != null ? builderSupplier.get() : null;
}

public static Set<String> names() {
return BUILDERS.keySet();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not add the "system" here? do we want access to the list of names without the "system" prefix?

Copy link
Member Author

@pan3793 pan3793 Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to make it consistent with the existing method - the parameter name does not contain the system. prefix

public static ProcedureBuilder newBuilder(String name)

@RussellSpitzer
Copy link
Member

Spark allows users to configure multiple extensions and each extension is allowed to inject its own SQL parser, there is a chance that the user configures multiple extensions that support CALL syntax, ideally, each extension should only intercept its supported procedure and leave others to the delegate.

Currently, all Iceberg builtin procedures are under system namespace, which is guard by

so I propose to add a simple rule to check the SQL before parsing

Do we have an example of someone else using the CALL syntax? Just wondering since we invented it for Iceberg specifically. I think this is fine either way, just a little disappointing we are losing a good error message for a bad one.

@pan3793
Copy link
Member Author

pan3793 commented Nov 12, 2024

Do we have an example of someone else using the CALL syntax?

@RussellSpitzer for now I have seen the following projects support CALL syntax too.

@RussellSpitzer RussellSpitzer merged commit 5054578 into apache:main Nov 12, 2024
31 checks passed
@RussellSpitzer
Copy link
Member

Thanks @pan3793 for making sure things remain more compatible across the ecosystem. Thanks @huaxingao for reviewing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants