[SPARK-5213] [SQL] Pluggable SQL Parser Support #4015

Closed
wants to merge 4 commits into from

chenghao-intel
Contributor

This PR aims to make the SQL parser pluggable, so that users can register their own parser via the Spark SQL CLI.

# add the jar into the classpath
$hcheng@mydesktop:spark>bin/spark-sql --jars sql99.jar

-- switch to "hiveql" dialect
   spark-sql>SET spark.sql.dialect=hiveql;
   spark-sql>SELECT * FROM src LIMIT 1;

-- switch to "sql" dialect
   spark-sql>SET spark.sql.dialect=sql;
   spark-sql>SELECT * FROM src LIMIT 1;

-- switch to a custom dialect
   spark-sql>SET spark.sql.dialect=com.xxx.xxx.SQL99Dialect;
   spark-sql>SELECT * FROM src LIMIT 1;

-- register a non-existent SQL dialect
   spark-sql> SET spark.sql.dialect=NotExistedClass;
   spark-sql> SELECT * FROM src LIMIT 1;
-- An exception is thrown and the dialect switches back to the default ("sql" for SQLContext and "hiveql" for HiveContext)
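
To make the session above more concrete, here is a minimal, Spark-free sketch of name-based dialect resolution with a switch back to the default on failure. `Plan`, `DefaultDialect`, `DialectResolver` and `Session` are hypothetical stand-ins invented for illustration; this is not the code in this patch, and the actual lookup in `SQLContext` may well differ.

```scala
// Stand-in for Catalyst's LogicalPlan (hypothetical, illustration only).
final case class Plan(description: String)

// Hypothetical minimal dialect interface: turn SQL text into a plan.
abstract class Dialect {
  def parse(sqlText: String): Plan
}

class DefaultDialect extends Dialect {
  def parse(sqlText: String): Plan = Plan(s"sql: $sqlText")
}

class HiveQLDialect extends Dialect {
  def parse(sqlText: String): Plan = Plan(s"hiveql: $sqlText")
}

class DialectException(msg: String, cause: Throwable) extends Exception(msg, cause)

object DialectResolver {
  val defaultName = "sql"

  // Short names map to built-in classes; anything else is treated as a class name.
  private def classFor(name: String): Class[_] = name match {
    case "sql"    => classOf[DefaultDialect]
    case "hiveql" => classOf[HiveQLDialect]
    case other    => Class.forName(other)
  }

  // Instantiate the dialect reflectively; wrap every failure in DialectException.
  def instantiate(name: String): Dialect =
    try {
      classFor(name).getDeclaredConstructor().newInstance() match {
        case d: Dialect => d
        case other =>
          throw new ClassCastException(s"${other.getClass.getName} is not a Dialect")
      }
    } catch {
      case scala.util.control.NonFatal(e) =>
        throw new DialectException(s"Cannot load dialect '$name'", e)
    }
}

// Tiny stand-in for a context that stores the dialect setting.
class Session {
  private var dialectName: String = DialectResolver.defaultName

  def setDialect(name: String): Unit = dialectName = name
  def currentDialect: String = dialectName

  def sql(text: String): Plan =
    try DialectResolver.instantiate(dialectName).parse(text)
    catch {
      case e: DialectException =>
        // Switch back to the default, then surface the error to the caller.
        dialectName = DialectResolver.defaultName
        throw e
    }
}
```

In this sketch, setting the dialect to `NotExistedClass` makes the next `sql(...)` call throw a `DialectException`, after which `currentDialect` reports `sql` again.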

@SparkQA
SparkQA commented Jan 13, 2015

Test build #25465 has started for PR 4015 at commit 2fe7d99.

  • This patch merges cleanly.

@OopsOutOfMemory
Contributor

nice feature. 👍

@SparkQA
SparkQA commented Jan 13, 2015

Test build #25465 timed out for PR 4015 at commit 2fe7d99 after a configured wait of 120m.

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25465/
Test FAILed.

@AlphaComponent
abstract class SQLDialect {
/**
* We assume the DDLParser has higher priority than any of the other SQL Parsers,

Contributor

This assumption may lead to problems; see the example in #3935 (comment).

Contributor

I agree with @scwf.
Since our goal is to support a variety of SQL dialects, we cannot expect them all to behave the same way, so a fixed parser priority is a problem.
What about leaving this to each dialect, by adding an abstract method to SQLDialect that lets each dialect implement its own parsing order?
Sorry if I'm wrong.

Contributor

I think the difference in describe table between Hive and Spark SQL is a known issue; we added the affected cases to the blacklist in HiveCompatibilitySuite.

Contributor Author

Well, even if we moved the extended parser first, I don't think we want to skip the DDLParser, right? In that case we would have to handle the parsing fallback (on failure, resort to the DDLParser) for EVERY extended parser. So why NOT just do the fallback in the DDLParser by putting it first? That's exactly the current implementation!

And I don't think the issue @scwf described is a reason to update the code here. A better solution is probably to define a unified DescribeCommand logical node that can be translated into a different execution depending on the context (HiveContext / SQLContext).
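
A rough sketch of the ordering argument above, with made-up names: a front parser recognises DDL and hands everything else to whatever the active dialect supplies, so no extended parser needs fallback logic of its own. `Plan`, `DdlFirstParser` and `ParserOrderDemo` are hypothetical, and the regex stands in for a real grammar; this only illustrates the shape of the idea, not the DDLParser in the patch.

```scala
// Stand-in for a logical plan (hypothetical, illustration only).
final case class Plan(description: String)

// Hypothetical DDL parser placed in front: it recognises one DDL statement
// and hands everything else to the fallback supplied by the active dialect.
class DdlFirstParser(fallback: String => Plan) {
  private val CreateTempTable = """(?i)\s*CREATE\s+TEMPORARY\s+TABLE\s+(\w+).*""".r

  def parse(sqlText: String): Plan = sqlText match {
    case CreateTempTable(name) => Plan(s"ddl: create temporary table $name")
    case _                     => fallback(sqlText)
  }
}

object ParserOrderDemo {
  // Two "extended" parsers; neither knows anything about DDL.
  val sqlDialect: String => Plan  = text => Plan(s"sql: $text")
  val hiveDialect: String => Plan = text => Plan(s"hiveql: $text")

  def main(args: Array[String]): Unit = {
    val viaSql  = new DdlFirstParser(sqlDialect)
    val viaHive = new DdlFirstParser(hiveDialect)
    println(viaSql.parse("CREATE TEMPORARY TABLE t USING json"))
    println(viaHive.parse("SELECT * FROM src LIMIT 1"))
  }
}
```

The fallback lives in one place (`DdlFirstParser`), which is the point being made: adding a third dialect means supplying one more `String => Plan`, nothing else.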

Contributor

Agreed, @chenghao-intel: we can define a unified DescribeCommand for that issue. The order of DDLParser and sqlParser is not a big deal, since they cover different ranges of SQL syntax.
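
One way the unified DescribeCommand idea could look, as a sketch only: every parser emits the same logical node, and each context decides how to execute it. Apart from the name `DescribeCommand`, everything here (`Execution`, `NativeDescribe`, `HiveDescribe`, `PlainContext`, `HiveLikeContext`, and the fields) is a hypothetical stand-in, not an existing Spark API.

```scala
// Logical node shared by every dialect (hypothetical shape).
final case class DescribeCommand(table: String, isExtended: Boolean)

// Physical execution differs per context.
sealed trait Execution {
  def run(): Seq[String]
}

final case class NativeDescribe(table: String) extends Execution {
  def run(): Seq[String] = Seq(s"native schema of $table")
}

final case class HiveDescribe(table: String, isExtended: Boolean) extends Execution {
  def run(): Seq[String] = {
    val mode = if (isExtended) "extended " else ""
    Seq(s"hive ${mode}description of $table")
  }
}

// Each context translates the same logical node into its own execution.
trait Context {
  def plan(cmd: DescribeCommand): Execution
}

object PlainContext extends Context {
  def plan(cmd: DescribeCommand): Execution = NativeDescribe(cmd.table)
}

object HiveLikeContext extends Context {
  def plan(cmd: DescribeCommand): Execution = HiveDescribe(cmd.table, cmd.isExtended)
}
```

With this split, the parsers no longer need to agree on output format; only the planning step in each context does.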

@SparkQA
SparkQA commented Jan 14, 2015

Test build #25519 has started for PR 4015 at commit 336cd89.

  • This patch merges cleanly.

@SparkQA
SparkQA commented Jan 14, 2015

Test build #25519 has finished for PR 4015 at commit 336cd89.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class SQLDialect
    • class DefaultSQLDialect extends SQLDialect
    • sys.error(s"$clazz is not the subclass of $
    • class HiveQLDialect extends SQLDialect

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25519/
Test PASSed.

@@ -71,7 +178,7 @@ class SQLContext(@transient val sparkContext: SparkContext)
def getConf(key: String): String = conf.getConf(key)

/**
- * Return the value of Spark SQL configuration property for the given key. If the key is not set
+ * Return the value of Sparkf SQL configuration property for the given key. If the key is not set

Contributor

typo?

Contributor

Typo?

@SparkQA
SparkQA commented Jan 20, 2015

Test build #25803 has started for PR 4015 at commit 983d53c.

  • This patch merges cleanly.

@SparkQA
SparkQA commented Jan 20, 2015

Test build #25803 has finished for PR 4015 at commit 983d53c.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25803/
Test FAILed.

@SparkQA
SparkQA commented Jan 20, 2015

Test build #25807 has started for PR 4015 at commit d958589.

  • This patch merges cleanly.

@SparkQA
SparkQA commented Jan 20, 2015

Test build #25807 has finished for PR 4015 at commit d958589.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class SQLDialect
    • class DefaultSQLDialect extends SQLDialect
    • class HiveQLDialect extends SQLDialect

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25807/
Test FAILed.

@SparkQA
SparkQA commented Jan 20, 2015

Test build #25808 has started for PR 4015 at commit 1c6edfa.

  • This patch merges cleanly.

@SparkQA
SparkQA commented Jan 20, 2015

Test build #25808 has finished for PR 4015 at commit 1c6edfa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class SQLDialect
    • class DefaultSQLDialect extends SQLDialect
    • class HiveQLDialect extends SQLDialect

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25808/
Test FAILed.

@SparkQA
SparkQA commented Jan 21, 2015

Test build #25866 has started for PR 4015 at commit c8f154d.

  • This patch merges cleanly.

@SparkQA
SparkQA commented Jan 21, 2015

Test build #25866 has finished for PR 4015 at commit c8f154d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class SQLDialect
    • class DefaultSQLDialect extends SQLDialect
    • class HiveQLDialect extends SQLDialect

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25866/
Test PASSed.

chenghao-intel changed the title from "[SPARK-5213] [SQL] [WIP] Sql Parser dialect support" to "[SPARK-5213] [SQL] Sql Parser dialect support" Jan 22, 2015
@SparkQA
SparkQA commented Jan 22, 2015

Test build #25947 has started for PR 4015 at commit b0e8084.

  • This patch merges cleanly.

@SparkQA
SparkQA commented Jan 22, 2015

Test build #25947 has finished for PR 4015 at commit b0e8084.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • protected case class Keyword(str: String)
    • class SqlLexical extends StdLexical
    • abstract class SQLDialect
    • class DefaultSQLDialect extends SQLDialect
    • class HiveQLDialect extends SQLDialect

* interface for advanced user.
*
*/
abstract class Dialect {

Contributor

An abstract interface for adding a new SQL dialect.  A `Dialect` is responsible for creating a logical plan from a string representation of a query.  Since the `LogicalPlan` interface is not a public stable API, custom dialects will likely be tied to specific Spark releases.

Explicitly annotate this as `@DeveloperApi`.
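
For illustration, a Spark-free sketch of what such an interface and a custom implementation might look like. The local `DeveloperApi` class is a stand-in for `org.apache.spark.annotation.DeveloperApi` so the snippet compiles on its own; `LogicalPlan` here is a toy replacement for Catalyst's class, and `SelectLiteral` and `TinyDialect` are hypothetical names. The `parse` signature is an assumption based on the description above.

```scala
import scala.annotation.StaticAnnotation

// Stand-in for org.apache.spark.annotation.DeveloperApi, so the sketch compiles without Spark.
class DeveloperApi extends StaticAnnotation

// Toy replacement for Catalyst's LogicalPlan (hypothetical, illustration only).
sealed trait LogicalPlan
final case class SelectLiteral(value: Int) extends LogicalPlan

/**
 * An abstract interface for adding a new SQL dialect. A dialect turns the
 * string representation of a query into a logical plan.
 */
@DeveloperApi
abstract class Dialect {
  def parse(sqlText: String): LogicalPlan
}

// Hypothetical custom dialect that understands only `SELECT <integer>`.
class TinyDialect extends Dialect {
  private val SelectInt = """(?i)\s*SELECT\s+(-?\d+)\s*;?\s*""".r

  def parse(sqlText: String): LogicalPlan = sqlText match {
    case SelectInt(n) => SelectLiteral(n.toInt)
    case _            => throw new IllegalArgumentException(s"Unsupported statement: $sqlText")
  }
}
```

Because a real dialect returns Catalyst plan nodes rather than a toy type, an implementation like this would be compiled against one specific Spark release, which is the caveat the suggested doc comment spells out.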

@marmbrus
Contributor

Final comments to improve user documentation. Otherwise LGTM.

@chenghao-intel
Contributor Author

test this please

@chenghao-intel
Contributor Author

retest this please

@chenghao-intel
Contributor Author

@liancheng @rxin @marmbrus can you trigger the unit test for me?

Thanks.

@rxin
Contributor

rxin commented Apr 27, 2015

I think Jenkins is having some trouble right now.

@rxin
Contributor

rxin commented Apr 27, 2015

Jenkins, retest this please.

@chenghao-intel
Contributor Author

Jenkins, retest this please.

@SparkQA
SparkQA commented Apr 28, 2015

Test build #31088 has started for PR 4015 at commit 493775c.

@SparkQA
SparkQA commented Apr 28, 2015

Test build #31088 has finished for PR 4015 at commit 493775c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class Dialect
    • class DialectException(msg: String, cause: Throwable) extends Exception(msg, cause)
  • This patch adds the following new dependencies:
    • tachyon-0.6.4.jar
    • tachyon-client-0.6.4.jar
  • This patch removes the following dependencies:
    • tachyon-0.5.0.jar
    • tachyon-client-0.5.0.jar

@chenghao-intel
Contributor Author

cc @marmbrus

asfgit closed this in 3ba5aaa May 1, 2015
@marmbrus
Contributor

marmbrus commented May 1, 2015

Thanks, merged to master.

@scwf
Contributor

scwf commented May 1, 2015

@marmbrus #5727 has been merged, so this PR actually fails the MiMa test; the master branch now fails the MiMa check since this PR was merged.

asfgit pushed a commit that referenced this pull request May 2, 2015
Based on #4015, we should not delete `sqlParser` from SQLContext, since that makes the MiMa check fail. Users implement a dialect to provide a fallback for `sqlParser`, and we should construct `sqlParser` in SQLContext according to the dialect:
`protected[sql] val sqlParser = new SparkSQLParser(getSQLDialect().parse(_))`
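
A small sketch of why passing `getSQLDialect().parse(_)` works as a fallback: the placeholder syntax expands to a function that looks the dialect up again on every call, so a later dialect switch is picked up without rebuilding `sqlParser`. `Plan`, `TagDialect`, `FrontParser` and `MiniContext` are hypothetical stand-ins (with `FrontParser` playing the role of `SparkSQLParser`), and the regex is only a toy replacement for the real grammar.

```scala
// Stand-in for a logical plan (hypothetical, illustration only).
final case class Plan(description: String)

abstract class Dialect {
  def parse(sqlText: String): Plan
}

// Hypothetical dialect that just tags the statement with its name.
class TagDialect(tag: String) extends Dialect {
  def parse(sqlText: String): Plan = Plan(s"$tag: $sqlText")
}

// Hypothetical stand-in for SparkSQLParser: handles SET itself, delegates the rest.
class FrontParser(fallback: String => Plan) {
  private val SetCommand = """(?i)\s*SET\s+(\S+?)\s*=\s*(\S+?)\s*;?\s*""".r

  def parse(sqlText: String): Plan = sqlText match {
    case SetCommand(key, value) => Plan(s"set $key=$value")
    case _                      => fallback(sqlText)
  }
}

class MiniContext {
  private var dialect: Dialect = new TagDialect("sql")

  def setDialect(d: Dialect): Unit = dialect = d
  def getSQLDialect(): Dialect = dialect

  // `getSQLDialect().parse(_)` expands to `text => getSQLDialect().parse(text)`,
  // so the current dialect is looked up again on every call.
  protected val sqlParser = new FrontParser(getSQLDialect().parse(_))

  def sql(text: String): Plan = sqlParser.parse(text)
}
```

In this sketch the `sqlParser` member is built once, yet calling `setDialect` afterwards changes which dialect handles the next non-SET statement.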

Author: Cheng Hao
Author: scwf

Closes #5827 from scwf/sqlparser1 and squashes the following commits:

81b9737 [scwf] comment fix
0878bd1 [scwf] remove comments
c19780b [scwf] fix mima tests
c2895cf [scwf] Merge branch 'master' of https://github.com/apache/spark into sqlparser1
493775c [Cheng Hao] update the code as feedback
81a731f [Cheng Hao] remove the unecessary comment
aab0b0b [Cheng Hao] polish the code a little bit
49b9d81 [Cheng Hao] shrink the comment for rebasing
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
Author: Cheng Hao

Closes apache#4015 from chenghao-intel/sqlparser and squashes the following commits:

493775c [Cheng Hao] update the code as feedback
81a731f [Cheng Hao] remove the unecessary comment
aab0b0b [Cheng Hao] polish the code a little bit
49b9d81 [Cheng Hao] shrink the comment for rebasing
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
chenghao-intel deleted the sqlparser branch July 2, 2015 08:34