
API: Drop column of deleted partitioned field to Unbound partitionSpec #4602

Closed

Conversation

felixYyu
Contributor

@felixYyu felixYyu commented Apr 21, 2022

Fixes issue #4563.

Steps to reproduce the problem:

  1. Insert data into table T without partitions
  2. Alter table add partition column A
  3. Continue inserting data into table T
  4. Alter table drop partition column A
  5. Continue inserting data into table T
  6. Alter table drop column A

Tables with format version 1 and format version 2 fail with different exceptions:

  • version=1 tables fail with: Cannot find source column for partition field: 1000: data_trunc_4: void(3)
  • version=2 tables fail with: Cannot find source column: 3

See the log below for details

The proposed solution is as follows:

  • version=1: After the partition field is dropped, its transform is changed to 'void'. Once the source column is also dropped, the field no longer exists in the current schema, so the source type of the partition field cannot be found.

    The version 1 solution is to skip the source-type check for void partition fields:

    if (sourceType == null && field.transform().toString().equalsIgnoreCase("void")) {
        continue;
    }
    
  • version=2: After the source column is dropped, the field no longer exists in the current schema, so binding the unbound partition spec cannot resolve the deleted field.

    The version 2 solution is to skip fields whose source column has been deleted when binding the partition spec:

    for (UnboundPartitionField field : fields) {
      Types.NestedField column = schema.findField(field.sourceId);
      if (column != null) {
        if (field.partitionId != null) {
          builder.add(field.sourceId, field.partitionId, field.name, field.transformAsString);
        } else {
          builder.add(field.sourceId, field.name, field.transformAsString);
        }
      }
    }
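The version 2 guard above can be illustrated with a minimal, self-contained sketch. Note that `UnboundSpecSketch`, its nested `Schema`, and the simplified `UnboundPartitionField` record below are hypothetical stand-ins for the real Iceberg classes, reduced to show only the null-check logic: the rebuild simply skips any partition field whose source column no longer exists in the current schema.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UnboundSpecSketch {
  // Simplified stand-in for an unbound partition field
  record UnboundPartitionField(int sourceId, Integer partitionId, String name, String transform) {}

  // Simplified stand-in for the current table schema: column id -> column name
  static class Schema {
    private final Map<Integer, String> columns = new HashMap<>();
    Schema add(int id, String name) { columns.put(id, name); return this; }
    String findField(int id) { return columns.get(id); }
  }

  // Rebuild the spec, skipping fields whose source column was dropped (the v2 fix)
  static List<UnboundPartitionField> bind(Schema schema, List<UnboundPartitionField> fields) {
    List<UnboundPartitionField> bound = new ArrayList<>();
    for (UnboundPartitionField field : fields) {
      if (schema.findField(field.sourceId()) != null) { // null means the source column was deleted
        bound.add(field);
      }
    }
    return bound;
  }

  public static void main(String[] args) {
    Schema schema = new Schema().add(1, "id").add(2, "ts"); // column 3 ("data") was dropped
    List<UnboundPartitionField> fields = List.of(
        new UnboundPartitionField(2, 1000, "ts_day", "day"),
        new UnboundPartitionField(3, 1001, "data_trunc_4", "truncate[4]"));
    List<UnboundPartitionField> bound = bind(schema, fields);
    System.out.println(bound.size());       // prints 1: only ts_day survives
    System.out.println(bound.get(0).name()); // prints ts_day
  }
}
```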
    

@felixYyu
Contributor Author

cc: @rdblue @findepi

@findepi
Member

findepi commented Apr 21, 2022

cc @alexjo2144

@rdblue
Contributor

rdblue commented Apr 21, 2022

@felixYyu can you please add some information about your approach to the PR description?

@felixYyu
Contributor Author

I've updated the PR description. Please check whether it is OK, @rdblue.

@rdblue
Contributor

rdblue commented Apr 24, 2022

@felixYyu, I was asking you to describe the problem and the proposed solution that you've implemented here. I don't see much information about the solution you propose. Could you include that, please?

@felixYyu
Contributor Author

@rdblue The description of the solution has been updated. Could you review it again, please?

@felixYyu
Contributor Author

cc @findepi

@felixYyu
Contributor Author

felixYyu commented May 5, 2022

@alexjo2144 Could you test with this patch? If there is still a problem, we can try another way.

@felixYyu
Contributor Author

@rdblue Could you merge this PR?

@marton-bod
Collaborator

Hi @felixYyu thank you for this fix! Could you please rebase your patch and then we could take a look?

@ajantha-bhat
Member

ajantha-bhat commented Sep 22, 2022

Is this the same as #5707?

Linked issues:

  • #5676
  • #5399
  • #4563

I think all these 3 issues are the same and 3 different persons are working on implementation (me, @Fokko, @felixYyu) 🙈

@marton-bod
Collaborator

Thanks for linking these together, @ajantha-bhat ! :)

@@ -421,6 +421,78 @@ public void testSparkTableAddDropPartitions() throws Exception {
"spark table partition should be empty", 0, sparkTable().partitioning().length);
}

@Test
public void testUnboundPartitionSpecFormatVersion1() throws Exception {
sql(
Collaborator

Just an idea: since both tests are identical except for the format-version property in the create statement, maybe we could combine them and supply the format-version as a parameter in a loop?

IntStream.rangeClosed(1, 2).forEach(version -> {
   ... // sql statements
});

This way it could be easily extended for future versions as well. WDYT?
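The suggestion above could look something like the following runnable sketch. The `createTableSql` helper is hypothetical, standing in for the test fixture's real `sql(...)` call, but it shows how a single test body can be driven once per format version:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

public class FormatVersionLoop {
  // Hypothetical helper: builds the CREATE TABLE statement for a given format version
  static String createTableSql(String tableName, int formatVersion) {
    return String.format(
        "CREATE TABLE %s (id bigint, ts timestamp) USING iceberg "
            + "TBLPROPERTIES ('format-version'='%d')",
        tableName, formatVersion);
  }

  public static void main(String[] args) {
    List<String> statements = new ArrayList<>();
    // Run the identical test body once per supported format version
    IntStream.rangeClosed(1, 2).forEach(version ->
        statements.add(createTableSql("db.t", version)));
    statements.forEach(System.out::println);
  }
}
```

Adding support for a future format version would then only require widening the range.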


sql("ALTER TABLE %s DROP COLUMN data", tableName);

Assert.assertEquals(
Collaborator

In addition to the data read test, shall we add a test to read back the partitions metatable contents as well?

@marton-bod
Copy link
Collaborator

Thanks @felixYyu for the fix, and @Fokko for resolving the conflicts.

@szehon-ho @flyrain @pvary Could you please take a look at this patch? Do you see any problems with this approach?

Contributor

@Fokko Fokko left a comment


This makes sense to me. Let's see if anyone else has any concerns

@szehon-ho
Collaborator

Yea it looks reasonable. Curious if metadata tables also suffer any problem from this scenario, will need to test when I have time.

@aokolnychyi
Contributor

I'd be interested to take a look today too.

@szehon-ho
Collaborator

Tested the patch with metadata tables, seems to fix all the issues. Also nit: should we add the test to 3.3, so it lives longer (I suppose 3.3 will be used to clone further Spark versions, and if we don't add it we might lose it if nobody forward-ports it)?

@rdblue rdblue changed the title API:Drop column of deleted partitioned field to Unbound partitionSpec API: Drop column of deleted partitioned field to Unbound partitionSpec Sep 28, 2022
assertEquals(
"Should have expected rows",
ImmutableList.of(row(1L, Timestamp.valueOf("2022-01-01 10:00:00"))),
sql("SELECT * FROM %s WHERE ts < current_timestamp()", tableName));
Contributor

Reading from the table is actually breaking on current master:

Caused by: java.lang.NullPointerException: Type cannot be null
	at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:907)
	at org.apache.iceberg.types.Types$NestedField.<init>(Types.java:446)
	at org.apache.iceberg.types.Types$NestedField.optional(Types.java:415)
	at org.apache.iceberg.PartitionSpec.partitionType(PartitionSpec.java:135)
	at org.apache.iceberg.Partitioning.partitionType(Partitioning.java:233)
	at org.apache.iceberg.spark.source.SparkTable.metadataColumns(SparkTable.java:215)
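The NullPointerException above comes from PartitionSpec.partitionType() building a struct field whose type resolves to null once the partition field's source column has been dropped. A self-contained sketch of that failure mode, together with the void-field guard from the proposed version 1 fix (all class and method names below are simplified stand-ins, not the real Iceberg types):

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionTypeSketch {
  // Stand-in partition field: source column id, field name, transform name
  record PartitionField(int sourceId, String name, String transform) {}

  // Builds the partition struct (field name -> type), applying the v1-style guard:
  // a void field whose source column was dropped is skipped instead of triggering
  // the "Type cannot be null" NullPointerException seen in the stack trace above.
  static Map<String, String> partitionType(Map<Integer, String> schemaTypes, PartitionField... fields) {
    Map<String, String> struct = new LinkedHashMap<>();
    for (PartitionField field : fields) {
      String sourceType = schemaTypes.get(field.sourceId()); // null once the column is dropped
      if (sourceType == null && field.transform().equalsIgnoreCase("void")) {
        continue; // dropped source column behind a void transform: skip it
      }
      if (sourceType == null) {
        throw new NullPointerException("Type cannot be null"); // the behavior on master
      }
      struct.put(field.name(), sourceType);
    }
    return struct;
  }

  public static void main(String[] args) {
    Map<Integer, String> schema = new HashMap<>();
    schema.put(1, "long"); // column 3 ("data") has been dropped
    Map<String, String> struct = partitionType(
        schema,
        new PartitionField(1, "id_bucket", "bucket"),
        new PartitionField(3, "data_trunc_4", "void"));
    System.out.println(struct); // prints {id_bucket=long}
  }
}
```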

@Fokko Fokko requested a review from aokolnychyi October 4, 2022 16:22
@nastra
Contributor

nastra commented Jan 19, 2023

@felixYyu could you please rebase and address #4602 (comment) so that we can get this PR reviewed & merged?


github-actions bot commented Aug 8, 2024

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Aug 8, 2024

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Aug 16, 2024
Successfully merging this pull request may close these issues.

ALTER TABLE ... DROP COLUMN allows dropping a column used by old PartitionSpecs
9 participants