API: Drop column of deleted partitioned field to Unbound partitionSpec #4602
Conversation
cc @alexjo2144
@felixYyu can you please add some information about your approach to the PR description?
Updated the PR description. Please check whether it is OK, @rdblue?
@felixYyu, I was asking you to describe the problem and the proposed solution that you've implemented here. I don't see much information about the solution you propose. Could you include that please?
@rdblue The description of the solution has been updated. Could you review it again, please?
cc @findepi
@alexjo2144 Could you test with this patch? If there is still a problem, I will try another way.
@rdblue Could you merge this PR?
Hi @felixYyu thank you for this fix! Could you please rebase your patch and then we could take a look?
Thanks for linking these together, @ajantha-bhat ! :)
@@ -421,6 +421,78 @@ public void testSparkTableAddDropPartitions() throws Exception {
      "spark table partition should be empty", 0, sparkTable().partitioning().length);
}

@Test
public void testUnboundPartitionSpecFormatVersion1() throws Exception {
  sql(
Just an idea: since both tests are identical except for the format-version property in the create statement, maybe we could combine them and supply the format-version as a parameter in a loop?
IntStream.rangeClosed(1, 2).forEach(version -> {
... // sql statements
});
This way it could be easily extended for future versions as well. WDYT?
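For illustration, a minimal sketch of what such a combined test could look like (the CREATE TABLE statement, column names, and final assertion are my assumptions, since the full test body is not visible in this hunk):

@Test
public void testUnboundPartitionSpecAcrossFormatVersions() {
  IntStream.rangeClosed(1, 2).forEach(version -> {
    sql("DROP TABLE IF EXISTS %s", tableName);
    sql(
        "CREATE TABLE %s (id BIGINT, ts TIMESTAMP, data STRING) USING iceberg "
            + "PARTITIONED BY (id) TBLPROPERTIES ('format-version'='%d')",
        tableName, version);
    sql("ALTER TABLE %s ADD PARTITION FIELD truncate(data, 4)", tableName);
    sql("ALTER TABLE %s DROP PARTITION FIELD truncate(data, 4)", tableName);
    sql("ALTER TABLE %s DROP COLUMN data", tableName);
    // the table should still be readable after its partition source column is gone
    Assert.assertEquals(
        "Should read back an empty table", 0, sql("SELECT * FROM %s", tableName).size());
  });
}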
sql("ALTER TABLE %s DROP COLUMN data", tableName);

Assert.assertEquals(
In addition to the data read test, shall we add a test to read back the partitions metadata table contents as well?
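Something along these lines might work (a sketch only; which rows to expect depends on the data written earlier in the test):

// after the partition field and its source column are dropped, the partitions
// metadata table should still be readable instead of failing while resolving types
List<Object[]> partitionRows = sql("SELECT * FROM %s.partitions", tableName);
Assert.assertNotNull("Should read the partitions metadata table", partitionRows);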
Thanks @felixYyu for the fix, and @Fokko for resolving the conflicts. @szehon-ho @flyrain @pvary Could you please take a look at this patch? Do you see any problems with this approach?
This makes sense to me. Let's see if anyone else has any concerns
Yeah, it looks reasonable. Curious whether metadata tables also suffer any problem from this scenario; will need to test when I have time.
I'd be interested to take a look today too.
Tested the patch with metadata tables; it seems to fix all the issues. Also, nit: should we add the test to 3.3 so it lives longer? (I suppose 3.3 will be used to clone further Spark versions, and if we don't add it we might lose it if nobody forward-ports it.)
assertEquals(
    "Should have expected rows",
    ImmutableList.of(row(1L, Timestamp.valueOf("2022-01-01 10:00:00"))),
    sql("SELECT * FROM %s WHERE ts < current_timestamp()", tableName));
Reading from the table is actually breaking on current master:
Caused by: java.lang.NullPointerException: Type cannot be null
at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:907)
at org.apache.iceberg.types.Types$NestedField.<init>(Types.java:446)
at org.apache.iceberg.types.Types$NestedField.optional(Types.java:415)
at org.apache.iceberg.PartitionSpec.partitionType(PartitionSpec.java:135)
at org.apache.iceberg.Partitioning.partitionType(Partitioning.java:233)
at org.apache.iceberg.spark.source.SparkTable.metadataColumns(SparkTable.java:215)
@felixYyu could you please rebase and address #4602 (comment) so that we can get this PR reviewed & merged?
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
Fixes #4563.
Test cases reproducing the problem follow.
Tables with format version 1 and format version 2 fail with different exceptions:
Cannot find source column for partition field: 1000: data_trunc_4: void(3)
Cannot find source column: 3
See the log below for details
The proposed solution is as follows.
Format version 1: after ALTER TABLE ... DROP PARTITION FIELD, the dropped field's partition transform is changed to 'void', and after ALTER TABLE ... DROP COLUMN the column is removed from the current schema, so the source type for that partition field can no longer be found.
The version 1 solution is to skip the source-column check for such removed (void) partition fields:
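As a rough illustration of that idea (not the actual patch; the helper below is hypothetical): a field whose transform has already been replaced by 'void' should no longer require a live source column, so the check can be bypassed for it.

// hypothetical helper, only to illustrate the v1 approach described above
static boolean requiresSourceColumn(PartitionField field) {
  // a partition field dropped on a v1 table is kept with a 'void' transform
  return !"void".equals(field.transform().toString());
}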
Format version 2: after ALTER TABLE ... DROP COLUMN, the column is removed from the current schema, so during binding of the unbound partition spec the deleted field cannot be found.
The version 2 solution is to skip the deleted field when binding the unbound partition spec:
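Again as a rough illustration only (the class and method below are hypothetical, not taken from the patch): when rebinding the spec, partition fields whose source column no longer exists in the current schema are skipped instead of causing a failure.

import java.util.List;
import java.util.stream.Collectors;
import org.apache.iceberg.PartitionField;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;

class LiveSourceFields {
  // hypothetical sketch of the v2 idea: keep only the partition fields whose
  // source column is still present in the current schema, so binding does not
  // fail on a dropped column
  static List<PartitionField> fieldsWithLiveSources(PartitionSpec spec, Schema schema) {
    return spec.fields().stream()
        .filter(field -> schema.findField(field.sourceId()) != null)
        .collect(Collectors.toList());
  }
}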