Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error message when trying to drop partitioned columns in Iceberg #15707

Merged

Conversation

krvikash
Copy link
Contributor

@krvikash krvikash commented Jan 13, 2023

Description

Improve the error message while dropping the partitioned column or support dropping the partitioned column.

CREATE TABLE test (id INTEGER, name VARCHAR) WITH (partitioning = ARRAY['id', 'truncate(name, 5)']);

ALTER TABLE test DROP COLUMN id;
-- Failed to drop column: Cannot find source column for partition field: 1000: id: identity(1

ALTER TABLE test DROP COLUMN name;
-- Failed to drop column: Cannot find source column for partition field: 1001: name_trunc: truncate[5](2)

Release notes

(X) This is not user-visible or docs only and no release notes are required.

@cla-bot cla-bot bot added the cla-signed label Jan 13, 2023
{
String tableName = "test_drop_partition_column_" + randomNameSuffix();
assertUpdate("CREATE TABLE " + tableName + " (col0 INTEGER, col1 INTEGER, col2 INTEGER) WITH (partitioning = Array['col2'])");
assertQueryFails("ALTER TABLE " + tableName + " DROP COLUMN col2", "Failed to drop column: Can not drop partition field:.*");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropping partitioned columns was disallowed in Iceberg library side even before this change, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, It was disallowed. But the message which was thrown was "Cannot find source column for partition field: *".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you change the commit title as "Improve error message when trying to drop partitioned columns in Iceberg" or something?

Copy link
Member

@ebyhr ebyhr Jan 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirmed the connector allowed dropping partitioned columns if the transform is void and Iceberg spec says:

The void transform may be used to replace the transform in an existing partition field so that the field is effectively dropped in v1 tables.

@alexjo2144 @krvikash Do you agree to keep the above behavior?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this ever got fixed: apache/iceberg#4563
The issue was closed for inactivity.
But if it didn't, dropping a column like that may corrupt your table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The void transform may be used to replace the transform in an existing partition field so that the field is effectively dropped in v1 tables.

Hi @ebyhr, I think we already have this behavior We can drop a partition column in trino that is part of the older partition spec. This refers to apache/iceberg#4563

CREATE TABLE test2 (id INTEGER, name VARCHAR) WITH (partitioning = ARRAY['id', 'truncate(name, 5)']);
ALTER TABLE test2 SET PROPERTIES partitioning = ARRAY['id'];
ALTER TABLE test2 DROP COLUMN name;

The current PR refers to the issue when we are trying to drop a partition column that is part of the current partition spec.

Spark SQL uses ALTER TABLE ... DROP PARTITION FIELD ... statement to remove the column from partition spec, whereas trino uses ALTER TABLE ... SET PROPERTIES partitioning = ....

Copy link
Contributor Author

@krvikash krvikash Jan 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CREATE TABLE test2 (id INTEGER, name VARCHAR) WITH (partitioning = ARRAY['id', 'truncate(name, 5)']);
ALTER TABLE test2 SET PROPERTIES partitioning = ARRAY['id'];
ALTER TABLE test2 DROP COLUMN name;

We could add a test for this case in a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, Fails to insert into iceberg table after dropping a partition column that was used by older partition specs. Raised #15729

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@krvikash Please try these statements. DROP COLUMN was working previously. It mean that this PR not only improves the message, but also changes the behavior.

CREATE TABLE test (id int, col int) WITH (partitioning = ARRAY['void(col)']);
ALTER TABLE test DROP COLUMN col;

Copy link
Contributor Author

@krvikash krvikash Jan 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @ebyhr for pointing it out. I have added isVoidTransform to check for void transform.

However, Fails to insert into iceberg table after dropping a void transform partition column #15738 (same issue for select and update)

@krvikash krvikash force-pushed the add-drop-partition-column-test-in-iceberg branch 4 times, most recently from 6538b0f to 68db989 Compare January 13, 2023 13:41
@krvikash krvikash changed the title Verify partition column drop in iceberg table Improve error message when trying to drop partitioned columns in Iceberg Jan 13, 2023
@krvikash
Copy link
Contributor Author

Thanks, @ebyhr for the review. I have addressed the comments and rebased them.

@krvikash krvikash force-pushed the add-drop-partition-column-test-in-iceberg branch from 68db989 to 92be2ae Compare January 13, 2023 16:13
@krvikash krvikash added the no-release-notes This pull request does not require release notes entry label Jan 13, 2023
@krvikash krvikash force-pushed the add-drop-partition-column-test-in-iceberg branch 2 times, most recently from e8cf45e to 2bcfc7e Compare January 16, 2023 20:19
@ebyhr ebyhr force-pushed the add-drop-partition-column-test-in-iceberg branch from 2bcfc7e to 2cc7499 Compare January 16, 2023 22:09
@@ -1539,6 +1545,11 @@ public void dropColumn(ConnectorSession session, ConnectorTableHandle tableHandl
}
}

private static boolean isVoidTransform(PartitionField field)
{
return field.transform().equals(Transforms.alwaysNull());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: Upcoming Iceberg 1.2.0 introduces Transform#isVoid method. apache/iceberg@d350c9b.

@ebyhr ebyhr merged commit 46ee6c9 into trinodb:master Jan 17, 2023
@github-actions github-actions bot added this to the 406 milestone Jan 17, 2023
@krvikash krvikash deleted the add-drop-partition-column-test-in-iceberg branch January 18, 2023 18:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed no-release-notes This pull request does not require release notes entry
Development

Successfully merging this pull request may close these issues.

3 participants