-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-43390][SQL] DSv2 allows CTAS/RTAS to reserve schema nullability #41070
Conversation
* Return whether to reserve schema nullability of query output or forcibly use nullable schema | ||
* on creating table implicitly, e.g. CTAS/RTAS. | ||
*/ | ||
default boolean createTableReserveSchemaNullability() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about
/**
* If true, mark all the fields of the query schema as nullable when executing
* CREATE/REPLACE TABLE ... AS SELECT ... and creating the table.
*/
default boolean useNullableQuerySchema
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion, changed
@@ -3261,3 +3278,7 @@ class FakeV2Provider extends SimpleTableProvider { | |||
throw new UnsupportedOperationException("Unnecessary for DDL tests") | |||
} | |||
} | |||
|
|||
class TestCreateTableReserveSchemaNullabilityCatalog extends InMemoryCatalog { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about ForbidSchemaNullabilityCatalog
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"forbid" seems not accurate here, would prefer to use "reserve" to represent respecting the query schema nullability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I means the name seems too long. Could you rename it with simple one ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "ReserveSchemaNullabilityCatalog"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to "ReserveSchemaNullabilityCatalog"
thanks, merging to master! |
### What changes were proposed in this pull request? Add a new method `useNullableQuerySchema` in `Table`, to allow the DataSource implementation to declare whether they need to reserve schema nullability on CTAS/RTAS. ### Why are the changes needed? SPARK-28837 forcibly uses the nullable schema on CTAS/RTAS, which seems too aggressive: 1. The existing matured RDBMSs have different behaviors for reserving schema nullability on CTAS/RTAS, as mentioned in apache#25536, PostgreSQL forcibly uses the nullable schema, but MySQL respects the query's output schema nullability. 2. Some OLAP systems(e.g. ClickHouse) are perf-sensitive for nullable, and have strict restrictions on table schema, e.g. the primary keys are not allowed to be nullable. ### Does this PR introduce _any_ user-facing change? Yes, this PR adds a new DSv2 API, but the default implementation reserves backward compatibility. ### How was this patch tested? UTs are updated. Closes apache#41070 from pan3793/SPARK-43390. Authored-by: Cheng Pan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
Add a new method
useNullableQuerySchema
inTable
, to allow the DataSource implementation to declare whether they need to reserve schema nullability on CTAS/RTAS.Why are the changes needed?
SPARK-28837 forcibly uses the nullable schema on CTAS/RTAS, which seems too aggressive:
Does this PR introduce any user-facing change?
Yes, this PR adds a new DSv2 API, but the default implementation reserves backward compatibility.
How was this patch tested?
UTs are updated.