[SPARK-48824][SQL] Add Identity Column SQL syntax #47614
Conversation
@dtenedor Hi Daniel, thanks for reviewing the PR! We changed the PR a bit to allow both IntegerType and LongType in the SQL interface to define an identity column, and it is up to the underlying framework that extends Spark to throw its own error if it doesn't want to support more than LongType. Are you fine with this change?
    "DataType <dataType> is not supported for IDENTITY columns."
  ],
  "sqlState" : "428H2"
},
+1 Always nice to see thoughtful pick for SQLSTATE :-)
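As discussed above, the SQL interface accepts both IntegerType and LongType for identity columns and rejects other types with the error message shown in the diff. A minimal sketch of such a check (the class and method names here are illustrative, not Spark's actual API):

```java
// Hypothetical sketch of a data-type check for identity columns: the SQL
// interface accepts IntegerType ("int") and LongType ("bigint"); any other
// type is rejected. Names are illustrative, not Spark's actual API.
public class IdentityColumnTypeCheck {
    public static boolean isSupported(String typeName) {
        return "int".equals(typeName) || "bigint".equals(typeName);
    }

    public static void validate(String typeName) {
        if (!isSupported(typeName)) {
            // Mirrors the error message shown in the diff above.
            throw new IllegalArgumentException(
                "DataType " + typeName + " is not supported for IDENTITY columns.");
        }
    }
}
```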
import org.apache.spark.sql.errors.QueryCompilationErrors
import org.apache.spark.sql.types.{StructField, StructType}

case class IdentityColumnSpec(start: Long, step: Long, allowExplicitInsert: Boolean)
This is a public API, and we shouldn't put it in `org.apache.spark.sql.catalyst.util`. How about making it a Java class in `org.apache.spark.sql.connector.catalog`?
@@ -51,14 +56,22 @@ case class ColumnDefinition(
  }

  def toV2Column(statement: String): V2Column = {
    val finalMetadata = if (identityColumnSpec.isDefined) {
Why do we need this when the v2 column already has an explicit field for the identity column spec?
  }
} else {
  throw SparkException
    .internalError(s"Invalid identity column sequence generator option: ${option.getText}")
ParseException?
We also use internalError for unrecognized actions here, in sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (line 761 in 5533c81):

throw SparkException.internalError(

Otherwise we need to add a new error class for ParseException.
parameters = Map("error" -> "'a'", "hint" -> ": missing '('")
parameters = Map("error" -> "'a'", "hint" -> "")
Hm, did you change the results for generated columns?
It's because now after `GENERATED ALWAYS AS` there can be either `(` or `IDENTITY`, so the hint for the parse exception changes.
 * Returns the identity column specification of this table column. Null means no identity column.
 */
@Nullable
IdentityColumnSpec identityColumnSpec();
To double check: only one of `defaultValue`, `generationExpression` and `identityColumnSpec` can be non-null, right?
Great catch! Yes, only one of `defaultValue`, `generationExpression` and `identityColumnSpec` can be non-null. We blocked a generation expression from being specified together with an identity column, but we did not block `defaultValue` from being specified with one. Will provide a fix in this PR.
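The invariant discussed here can be sketched as a small validation helper. This is illustrative only (not the PR's actual code), using plain `Object` placeholders for the three column properties:

```java
// Illustrative sketch of the invariant discussed above: at most one of
// defaultValue, generationExpression and identityColumnSpec may be non-null
// on a column definition. Not Spark's actual implementation.
public class ColumnDefCheck {
    public static int specifiedCount(Object defaultValue,
                                     Object generationExpression,
                                     Object identityColumnSpec) {
        int n = 0;
        if (defaultValue != null) n++;
        if (generationExpression != null) n++;
        if (identityColumnSpec != null) n++;
        return n;
    }

    public static void validate(Object defaultValue,
                                Object generationExpression,
                                Object identityColumnSpec) {
        if (specifiedCount(defaultValue, generationExpression, identityColumnSpec) > 1) {
            throw new IllegalArgumentException(
                "Only one of a default value, a generation expression and an "
                + "identity column spec may be specified for a column.");
        }
    }
}
```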
/**
 * Identity column specification.
 */
public class IdentityColumnSpec {
let's add @Evolving
thanks, merging to master!
### What changes were proposed in this pull request?

Add SQL support for creating identity columns. Users can specify a column `GENERATED ALWAYS AS IDENTITY(identityColumnSpec)`, where identity values are **always** generated by the system, or `GENERATED BY DEFAULT AS IDENTITY(identityColumnSpec)`, where users can specify the identity values.

Users can optionally specify the starting value of the column (default = 1) and the increment/step of the column (default = 1). We also allow both `START WITH <start> INCREMENT BY <step>` and `INCREMENT BY <step> START WITH <start>`, i.e. flexible ordering of the increment and starting values, as both variants are used in the wild by other systems (e.g. [PostgreSQL](https://www.postgresql.org/docs/current/sql-createsequence.html), [Oracle](https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/CREATE-SEQUENCE.html#GUID-E9C78A8C-615A-4757-B2A8-5E6EFB130571)).

For example, we can define

```
CREATE TABLE default.example (
  id LONG GENERATED ALWAYS AS IDENTITY,
  id1 LONG GENERATED ALWAYS AS IDENTITY(),
  id2 LONG GENERATED BY DEFAULT AS IDENTITY(START WITH 0),
  id3 LONG GENERATED ALWAYS AS IDENTITY(INCREMENT BY 2),
  id4 LONG GENERATED BY DEFAULT AS IDENTITY(START WITH 0 INCREMENT BY -10),
  id5 LONG GENERATED ALWAYS AS IDENTITY(INCREMENT BY 2 START WITH -8),
  value LONG
)
```

This will enable defining identity columns in Spark SQL for data sources that support it. To be more specific, this PR:

- Adds parser support for `GENERATED { BY DEFAULT | ALWAYS } AS IDENTITY` in create/replace table statements. Identity column specifications are temporarily stored in the field's metadata, and then are parsed/verified in `DataSourceV2Strategy` and used to instantiate a v2 `Column`.
- Adds `TableCatalog::capabilities()` and `TableCatalogCapability.SUPPORTS_CREATE_TABLE_WITH_IDENTITY_COLUMNS`. This will be used to determine whether to allow specifying identity columns or whether to throw an exception.

### Why are the changes needed?

A SQL API is needed to create identity columns.

### Does this PR introduce _any_ user-facing change?

It allows the aforementioned SQL syntax to create identity columns in a table.

### How was this patch tested?

Positive and negative unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47614 from zhipengmao-db/zhipengmao-db/SPARK-48824-id-syntax.

Authored-by: zhipeng.mao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
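The START WITH / INCREMENT BY semantics described in the PR can be sketched as a simple sequence generator. This is illustrative only; the PR delegates actual value generation to the data source that supports identity columns, and the names below are not Spark's API:

```java
// Illustrative sketch of identity value generation semantics: values begin at
// `start` (default 1) and advance by `step` (default 1); step may be negative
// but not zero. Actual value generation is left to the underlying data source.
public class IdentityGenerator {
    private long next;
    private final long step;

    public IdentityGenerator(long start, long step) {
        if (step == 0) {
            throw new IllegalArgumentException("INCREMENT BY step must not be zero");
        }
        this.next = start;
        this.step = step;
    }

    public IdentityGenerator() {
        this(1L, 1L); // defaults: START WITH 1 INCREMENT BY 1
    }

    public long nextValue() {
        long v = next;
        next += step;
        return v;
    }
}
```

For instance, a column declared `GENERATED BY DEFAULT AS IDENTITY(START WITH 0 INCREMENT BY -10)` would yield 0, -10, -20, and so on.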