Cleanup metadata file when commitNewTable fails for the Iceberg table #14869

Merged
merged 1 commit into from
Feb 16, 2023

Conversation

krvikash
Contributor

@krvikash krvikash commented Nov 2, 2022

Description

Fixes #14798

The Iceberg connector creates a new metadata file whenever a DDL or DML operation runs. If the operation then fails, the metadata file is not cleaned up. A metastore table operation can fail for various reasons, such as permission denied or invalid credentials. This fix cleans up the metadata file if it was created.

The fix is inspired by https://github.com/apache/iceberg/blob/3cddc9f28c93b9231060ecb6b90e2d524bd5d160/aws/src/main/java/org/apache/iceberg/aws/glue/GlueTableOperations.java#L142
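
As a rough illustration of the cleanup described above, here is a minimal sketch in plain Java. It is not the connector's code: `MetadataCommitSketch`, `commitNewTable`, and the `Runnable` standing in for the metastore call are hypothetical names, and local files accessed through `java.nio.file` stand in for the table's file system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: write a metadata file, then register the table;
// if registration fails, delete the file that was just written.
final class MetadataCommitSketch
{
    private MetadataCommitSketch() {}

    static void commitNewTable(Path metadataFile, Runnable registerInMetastore)
    {
        try {
            // Placeholder content; a real metadata file would hold the table metadata
            Files.write(metadataFile, "{}".getBytes(StandardCharsets.UTF_8));
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        try {
            // Stand-in for the metastore call, which may fail (permissions, credentials, ...)
            registerInMetastore.run();
        }
        catch (RuntimeException e) {
            // Best-effort cleanup of the file written by this attempt
            try {
                Files.deleteIfExists(metadataFile);
            }
            catch (IOException cleanupFailure) {
                e.addSuppressed(cleanupFailure);
            }
            throw e;
        }
    }
}
```

Rethrowing the original exception keeps the failure visible to the caller, while a failed cleanup is attached as a suppressed exception rather than masking it.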

Non-technical explanation

NA

Release notes

(X) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Nov 2, 2022
@krvikash
Contributor Author

krvikash commented Nov 2, 2022

TODO Add test cases.

@krvikash krvikash changed the title Fix metadata file cleanup in case of failure Cleanup metadata file when commit fails for the Iceberg table Nov 2, 2022
@krvikash krvikash force-pushed the fix-iceberg-metadata-cleanup branch 2 times, most recently from 4d110da to d2e44a1 Compare November 3, 2022 06:50
@krvikash krvikash changed the title Cleanup metadata file when commit fails for the Iceberg table Cleanup metadata file when commitNewTable fails for the Iceberg table Nov 3, 2022
protected void cleanupMetadata(String metadataLocation)
{
    if (fileIo.newInputFile(metadataLocation).exists()) {
        fileIo.deleteFile(metadataLocation);
Contributor Author

On a local file system, this operation does not delete the parent folder.
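
To illustrate the observation above, the following sketch uses `java.nio.file` on a local file system. `ParentFolderSketch` and `deleteFileAndEmptyParent` are hypothetical names, and whether the connector should also remove the empty directory is not settled in this thread; the helper only shows one way it could be done.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical sketch: deleting a file leaves the directory that held it in place.
final class ParentFolderSketch
{
    private ParentFolderSketch() {}

    // Deletes the file and, only if the parent is then empty, the parent directory too.
    // Returns true when the parent directory was removed.
    static boolean deleteFileAndEmptyParent(Path file)
            throws IOException
    {
        Files.deleteIfExists(file);
        Path parent = file.getParent();
        if (parent == null || !Files.isDirectory(parent)) {
            return false;
        }
        try (Stream<Path> entries = Files.list(parent)) {
            if (entries.findAny().isPresent()) {
                // Other files remain, so leave the directory alone
                return false;
            }
        }
        Files.delete(parent);
        return true;
    }
}
```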

@krvikash krvikash self-assigned this Nov 7, 2022
@krvikash krvikash force-pushed the fix-iceberg-metadata-cleanup branch 3 times, most recently from cabcba5 to f1d331c Compare November 17, 2022 08:01
@krvikash krvikash changed the title Cleanup metadata file when commitNewTable fails for the Iceberg table WIP: Cleanup metadata file when commitNewTable fails for the Iceberg table Dec 22, 2022
@krvikash krvikash force-pushed the fix-iceberg-metadata-cleanup branch 3 times, most recently from 0150fee to 95036ab Compare December 27, 2022 13:29
@krvikash krvikash marked this pull request as ready for review December 27, 2022 14:02
@krvikash krvikash added the no-release-notes This pull request does not require release notes entry label Dec 27, 2022
@krvikash krvikash changed the title WIP: Cleanup metadata file when commitNewTable fails for the Iceberg table Cleanup metadata file when commitNewTable fails for the Iceberg table Dec 27, 2022
@krvikash krvikash force-pushed the fix-iceberg-metadata-cleanup branch 3 times, most recently from e90cc3a to c287c6a Compare December 29, 2022 13:26
@krvikash krvikash requested a review from homar December 29, 2022 14:15

protected boolean isCommitSuccess(String metadataLocation)
{
    AtomicReference<Boolean> isSuccess = new AtomicReference<>(false);
Member

Why atomic when it runs in one thread?
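
One possible reading of the question above, offered as an assumption since the method body is not shown here: `AtomicReference` is often used only as a mutable holder that a lambda can write to, not for thread safety. The sketch below contrasts that with returning the value directly; all names in it are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch: two single-threaded ways to get a result out of a callback.
final class ResultHolderSketch
{
    private ResultHolderSketch() {}

    // A Runnable-based helper cannot return a value, so the caller needs a holder
    static void runOnCallingThread(Runnable action)
    {
        action.run();
    }

    // A Supplier-based helper returns the value directly, so no holder is needed
    static <T> T callOnCallingThread(Supplier<T> action)
    {
        return action.get();
    }

    static boolean checkWithHolder(Supplier<Boolean> probe)
    {
        // Lambdas may only capture effectively final locals, hence the holder object;
        // its atomicity goes unused because everything runs on one thread
        AtomicReference<Boolean> isSuccess = new AtomicReference<>(false);
        runOnCallingThread(() -> isSuccess.set(probe.get()));
        return isSuccess.get();
    }

    static boolean checkWithReturnValue(Supplier<Boolean> probe)
    {
        return callOnCallingThread(probe);
    }
}
```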

@krvikash krvikash force-pushed the fix-iceberg-metadata-cleanup branch 2 times, most recently from 5e5a1ad to 2faf497 Compare January 18, 2023 12:40
}
catch (SchemaNotFoundException
        | TableAlreadyExistsException
        | UnsupportedOperationException e) {
Member

Why UnsupportedOperationException?

Contributor Author

I was looking at the createTable implementations and the exceptions they throw. InMemoryThriftMetastore#createTable throws UnsupportedOperationException.

But I now realize that InMemoryThriftMetastore is implemented for test cases, so the UnsupportedOperationException check is not needed.


import java.io.IOException;

@Test(singleThreaded = true)
Member

Why single threaded?

Contributor Author

My bad. It was not needed earlier, but it is needed with the latest change, because testException is shared mutable state.

import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;

@Test(singleThreaded = true)
Member

Why single threaded?


import java.io.IOException;

@Test(singleThreaded = true)
Member

Why single threaded?

import static org.assertj.core.api.Assertions.assertThatThrownBy;

@Test(singleThreaded = true)
public abstract class BaseIcebergFileCreateTableFailureTest
Member

It doesn't need to be a base class with two subclasses.

  • one option is to just move QueryRunner setup to a test method (or helper of a test method) and have ordinary test class with two test methods
    • clean
    • the downside is that you pay query runner setup cost twice
  • another option is to just have a query runner and a test instance field (e.g. AtomicReference) which determines what exception to throw
    • this would be very similar to the code you have
    • a bit less clean as you have a mutable field
    • but you pay query runner setup cost once only
    • this is the option I'd implement
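
The second option above might look roughly like the sketch below, written in plain Java without TestNG or a real query runner. `FailureInjectionSketch` and `FailingMetastore` are hypothetical stand-ins; `testException` mirrors the field name mentioned elsewhere in this conversation.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: one shared fixture plus one mutable field that each
// test sets to choose which failure the metastore should raise.
final class FailureInjectionSketch
{
    // Stand-in for the metastore that would be created once during query runner setup
    static final class FailingMetastore
    {
        private final AtomicReference<RuntimeException> testException;

        FailingMetastore(AtomicReference<RuntimeException> testException)
        {
            this.testException = testException;
        }

        void createTable(String tableName)
        {
            // Throw whatever the current test asked for; do nothing if unset
            RuntimeException failure = testException.get();
            if (failure != null) {
                throw failure;
            }
        }
    }

    // Shared mutable state: tests that set it must not run concurrently
    final AtomicReference<RuntimeException> testException = new AtomicReference<>();
    final FailingMetastore metastore = new FailingMetastore(testException);
}
```

Because the field is shared between test methods, the setup cost is paid once, but the tests that set it can no longer run in parallel safely.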

Contributor Author

I followed the 2nd option.

        throws Exception
{
    this.dataDirectory = Files.createTempDirectory("test_iceberg_create_table_failure");
    this.metastore = new FileHiveMetastore(
Member

I am not entirely sure we need that test, since the file metastore is just a testing utility.
If we use the file metastore as an approximation of HMS, that's also fine, but let's have a code comment explaining that.

@findinpath
Contributor

@krvikash please rebase & resolve the conflicts.

@krvikash
Contributor Author

krvikash commented Feb 2, 2023

Rebased with master.

@krvikash
Contributor Author

Hi @ebyhr, when you get time could you please review this PR?

@krvikash
Contributor Author

Thanks, @ebyhr for the review. I have addressed the comments and updated the PR.

@ebyhr
Member

ebyhr commented Feb 14, 2023

/test-with-secrets sha=311ae6e98b205d8d9eea761fd75b0b4d1cd375d7

@ebyhr ebyhr force-pushed the fix-iceberg-metadata-cleanup branch from 311ae6e to d657166 Compare February 15, 2023 05:00
@ebyhr
Member

ebyhr commented Feb 15, 2023

I pushed a small fix to rename test classes:

  • TestIcebergFileCreateTableFailureTest → TestIcebergFileMetastoreCreateTableFailure
  • TestIcebergGlueCreateTableFailureTest → TestIcebergGlueCreateTableFailure

@ebyhr ebyhr force-pushed the fix-iceberg-metadata-cleanup branch from d657166 to b36e2aa Compare February 15, 2023 05:02
@ebyhr ebyhr force-pushed the fix-iceberg-metadata-cleanup branch from b36e2aa to b8dd25d Compare February 15, 2023 05:02
@ebyhr ebyhr merged commit 666dc6f into trinodb:master Feb 16, 2023
@github-actions github-actions bot added this to the 407 milestone Feb 16, 2023
@krvikash krvikash deleted the fix-iceberg-metadata-cleanup branch February 16, 2023 06:59
Labels
cla-signed no-release-notes This pull request does not require release notes entry
Development

Successfully merging this pull request may close these issues.

Cleanup metadata file when commit fails to create Iceberg table
5 participants