
API: Drop column of deleted partitioned field to Unbound partitionSpec #4602

Closed

Conversation

felixYyu
Contributor

@felixYyu felixYyu commented Apr 21, 2022

Fixes issue #4563.

Steps to reproduce the problem:

  1. Insert data into table T without partitions
  2. Alter table add partition column A
  3. Continue inserting data into table T
  4. Alter table drop partition column A
  5. Continue inserting data into table T
  6. Alter table drop column A

Tables with format version 1 and format version 2 fail with different exceptions:

  • version=1 tables fail with: Cannot find source column for partition field: 1000: data_trunc_4: void(3)
  • version=2 tables fail with: Cannot find source column: 3

See the log below for details

The proposed solution is as follows:

  • version=1: After the partition field is dropped, its transform is changed to 'void'. Once the source column is also dropped, the field no longer exists in the current schema, so the source type of the partition field cannot be found.

    The version 1 solution is to skip the source-type check for void partition fields:

    if (sourceType == null && field.transform().toString().equalsIgnoreCase("void")) {
        continue;
    }
    
  • version=2: After the source column is dropped, the field no longer exists in the current schema, so binding the unbound partition spec cannot resolve the deleted field.

    The version 2 solution is to skip fields whose source column has been deleted when binding the partition spec:

    for (UnboundPartitionField field : fields) {
      Types.NestedField column = schema.findField(field.sourceId);
      if (column != null) {
        if (field.partitionId != null) {
          builder.add(field.sourceId, field.partitionId, field.name, field.transformAsString);
        } else {
          builder.add(field.sourceId, field.name, field.transformAsString);
        }
      }
    }
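The version 2 guard above can be illustrated with a minimal, self-contained sketch. Note that `UnboundSpecSketch`, its nested `Schema`, and the simplified `UnboundPartitionField` record below are hypothetical stand-ins for the real Iceberg classes, reduced to show only the null-check logic: the rebuild simply skips any partition field whose source column no longer exists in the current schema.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UnboundSpecSketch {
  // Simplified stand-in for an unbound partition field
  record UnboundPartitionField(int sourceId, Integer partitionId, String name, String transform) {}

  // Simplified stand-in for the current table schema: column id -> column name
  static class Schema {
    private final Map<Integer, String> columns = new HashMap<>();
    Schema add(int id, String name) { columns.put(id, name); return this; }
    String findField(int id) { return columns.get(id); }
  }

  // Rebuild the spec, skipping fields whose source column was dropped (the v2 fix)
  static List<UnboundPartitionField> bind(Schema schema, List<UnboundPartitionField> fields) {
    List<UnboundPartitionField> bound = new ArrayList<>();
    for (UnboundPartitionField field : fields) {
      if (schema.findField(field.sourceId()) != null) { // null means the source column was deleted
        bound.add(field);
      }
    }
    return bound;
  }

  public static void main(String[] args) {
    Schema schema = new Schema().add(1, "id").add(2, "ts"); // column 3 ("data") was dropped
    List<UnboundPartitionField> fields = List.of(
        new UnboundPartitionField(2, 1000, "ts_day", "day"),
        new UnboundPartitionField(3, 1001, "data_trunc_4", "truncate[4]"));
    List<UnboundPartitionField> bound = bind(schema, fields);
    System.out.println(bound.size());       // prints 1: only ts_day survives
    System.out.println(bound.get(0).name()); // prints ts_day
  }
}
```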
    

@felixYyu
Contributor Author

cc: @rdblue @findepi

@findepi
Member

findepi commented Apr 21, 2022

cc @alexjo2144

@rdblue
Contributor

rdblue commented Apr 21, 2022

@felixYyu can you please add some information about your approach to the PR description?

@felixYyu
Contributor Author

I've updated the PR description. Please check whether it is OK, @rdblue.

@rdblue
Contributor

rdblue commented Apr 24, 2022

@felixYyu, I was asking you to describe the problem and the proposed solution that you've implemented here. I don't see much information about the solution you propose. Could you include that, please?

@felixYyu
Contributor Author

@rdblue The description of the solution has been updated. Could you review it again, please?

@felixYyu
Contributor Author

cc @findepi

@felixYyu
Contributor Author

felixYyu commented May 5, 2022

@alexjo2144 Could you test with this patch? If there is still a problem, we can try another way.

@felixYyu
Contributor Author

@rdblue Could you merge this PR?

@marton-bod
Collaborator

Hi @felixYyu thank you for this fix! Could you please rebase your patch and then we could take a look?

@ajantha-bhat
Member

ajantha-bhat commented Sep 22, 2022

Is this the same as #5707?

Linked issues:

  • #5676
  • #5399
  • #4563

I think all these 3 issues are the same and 3 different persons are working on implementation (me, @Fokko, @felixYyu) 🙈

@marton-bod
Collaborator

Thanks for linking these together, @ajantha-bhat ! :)

@@ -421,6 +421,78 @@ public void testSparkTableAddDropPartitions() throws Exception {
"spark table partition should be empty", 0, sparkTable().partitioning().length);
}

@Test
public void testUnboundPartitionSpecFormatVersion1() throws Exception {
sql(
Collaborator

Just an idea: since both tests are identical except for the format-version property in the create statement, maybe we could combine them and supply the format-version as a parameter in a loop?

IntStream.rangeClosed(1, 2).forEach(version -> {
   ... // sql statements
});

This way it could be easily extended for future versions as well. WDYT?
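The suggestion above could look something like the following runnable sketch. The `createTableSql` helper is hypothetical, standing in for the test fixture's real `sql(...)` call, but it shows how a single test body can be driven once per format version:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

public class FormatVersionLoop {
  // Hypothetical helper: builds the CREATE TABLE statement for a given format version
  static String createTableSql(String tableName, int formatVersion) {
    return String.format(
        "CREATE TABLE %s (id bigint, ts timestamp) USING iceberg "
            + "TBLPROPERTIES ('format-version'='%d')",
        tableName, formatVersion);
  }

  public static void main(String[] args) {
    List<String> statements = new ArrayList<>();
    // Run the identical test body once per supported format version
    IntStream.rangeClosed(1, 2).forEach(version ->
        statements.add(createTableSql("db.t", version)));
    statements.forEach(System.out::println);
  }
}
```

Adding support for a future format version would then only require widening the range.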


sql("ALTER TABLE %s DROP COLUMN data", tableName);

Assert.assertEquals(
Collaborator

In addition to the data read test, shall we add a test to read back the partitions metatable contents as well?

@marton-bod
Copy link
Collaborator

Thanks @felixYyu for the fix, and @Fokko for resolving the conflicts.

@szehon-ho @flyrain @pvary Could you please take a look at this patch? Do you see any problems with this approach?

Contributor

@Fokko Fokko left a comment


This makes sense to me. Let's see if anyone else has any concerns

@szehon-ho
Collaborator

Yea it looks reasonable. Curious if metadata tables also suffer any problem from this scenario, will need to test when I have time.

@aokolnychyi
Contributor

I'd be interested to take a look today too.

@szehon-ho
Collaborator

Tested the patch with metadata tables, seems to fix all the issues. Also nit: should we add the test to 3.3, so it lives longer (I suppose 3.3 will be used to clone further Spark versions, and if we don't add it we might lose it if nobody forward-ports it)?

@rdblue rdblue changed the title API:Drop column of deleted partitioned field to Unbound partitionSpec API: Drop column of deleted partitioned field to Unbound partitionSpec Sep 28, 2022
assertEquals(
"Should have expected rows",
ImmutableList.of(row(1L, Timestamp.valueOf("2022-01-01 10:00:00"))),
sql("SELECT * FROM %s WHERE ts < current_timestamp()", tableName));
Contributor

Reading from the table is actually breaking on current master:

Caused by: java.lang.NullPointerException: Type cannot be null
	at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:907)
	at org.apache.iceberg.types.Types$NestedField.<init>(Types.java:446)
	at org.apache.iceberg.types.Types$NestedField.optional(Types.java:415)
	at org.apache.iceberg.PartitionSpec.partitionType(PartitionSpec.java:135)
	at org.apache.iceberg.Partitioning.partitionType(Partitioning.java:233)
	at org.apache.iceberg.spark.source.SparkTable.metadataColumns(SparkTable.java:215)
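The NullPointerException above comes from PartitionSpec.partitionType() building a struct field whose type resolves to null once the partition field's source column has been dropped. A self-contained sketch of that failure mode, together with the void-field guard from the proposed version 1 fix (all class and method names below are simplified stand-ins, not the real Iceberg types):

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionTypeSketch {
  // Stand-in partition field: source column id, field name, transform name
  record PartitionField(int sourceId, String name, String transform) {}

  // Builds the partition struct (field name -> type), applying the v1-style guard:
  // a void field whose source column was dropped is skipped instead of triggering
  // the "Type cannot be null" NullPointerException seen in the stack trace above.
  static Map<String, String> partitionType(Map<Integer, String> schemaTypes, PartitionField... fields) {
    Map<String, String> struct = new LinkedHashMap<>();
    for (PartitionField field : fields) {
      String sourceType = schemaTypes.get(field.sourceId()); // null once the column is dropped
      if (sourceType == null && field.transform().equalsIgnoreCase("void")) {
        continue; // dropped source column behind a void transform: skip it
      }
      if (sourceType == null) {
        throw new NullPointerException("Type cannot be null"); // the behavior on master
      }
      struct.put(field.name(), sourceType);
    }
    return struct;
  }

  public static void main(String[] args) {
    Map<Integer, String> schema = new HashMap<>();
    schema.put(1, "long"); // column 3 ("data") has been dropped
    Map<String, String> struct = partitionType(
        schema,
        new PartitionField(1, "id_bucket", "bucket"),
        new PartitionField(3, "data_trunc_4", "void"));
    System.out.println(struct); // prints {id_bucket=long}
  }
}
```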

@Fokko Fokko requested a review from aokolnychyi October 4, 2022 16:22
@nastra
Contributor

nastra commented Jan 19, 2023

@felixYyu could you please rebase and address #4602 (comment) so that we can get this PR reviewed & merged?


github-actions bot commented Aug 8, 2024

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Aug 8, 2024

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Aug 16, 2024
Successfully merging this pull request may close these issues.

ALTER TABLE ... DROP COLUMN allows dropping a column used by old PartitionSpecs
9 participants