Core: Fix missing delete files from transaction #9354

Merged (2 commits) on Dec 21, 2023

@Fokko Fokko commented Dec 20, 2023

This happens on retries with conflicting manifest merges. The change makes the caching a bit more defensive, so that the cached manifests are emptied when cleaning up a commit.
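A minimal sketch of the idea, not Iceberg's actual implementation: the class `CachedManifestSketch`, its `storage` set, and the string paths are hypothetical stand-ins, and clearing the cache only when something was deleted is an assumption based on the `hasDeleteDeletes` flag in the diff. It shows why the cache has to be emptied when uncommitted manifests are deleted: otherwise a retry would reuse cached entries that point at files that no longer exist.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a snapshot producer; not Iceberg's real class.
class CachedManifestSketch {
  // Paths that currently exist in (simulated) storage.
  final Set<String> storage = new HashSet<>();
  // Delete manifests written by a previous commit attempt.
  private final List<String> cachedNewDeleteManifests = new ArrayList<>();
  private int attempt = 0;

  // Returns the cached manifests, writing new ones only if the cache is empty.
  List<String> newDeleteManifests() {
    if (cachedNewDeleteManifests.isEmpty()) {
      attempt++;
      String path = "delete-manifest-" + attempt + ".avro";
      storage.add(path);
      cachedNewDeleteManifests.add(path);
    }
    return new ArrayList<>(cachedNewDeleteManifests);
  }

  // Deletes cached manifests that were not committed, then empties the cache
  // (assumption: the cache is cleared only if at least one file was deleted).
  void cleanUncommitted(Set<String> committed) {
    boolean hasDeleteDeletes = false;
    for (String manifest : cachedNewDeleteManifests) {
      if (!committed.contains(manifest)) {
        storage.remove(manifest);
        hasDeleteDeletes = true;
      }
    }
    if (hasDeleteDeletes) {
      cachedNewDeleteManifests.clear();
    }
  }

  public static void main(String[] args) {
    CachedManifestSketch producer = new CachedManifestSketch();
    List<String> firstAttempt = producer.newDeleteManifests();
    // Simulate a failed attempt: nothing from it was committed.
    producer.cleanUncommitted(new HashSet<>());
    List<String> retry = producer.newDeleteManifests();
    System.out.println("first attempt still stored: " + producer.storage.containsAll(firstAttempt));
    System.out.println("retry manifests stored: " + producer.storage.containsAll(retry));
  }
}
```

Without the `clear()` call, the retry in this sketch would return the path from the first attempt even though it was already removed from storage.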

@github-actions github-actions bot added the core label Dec 20, 2023
@Fokko Fokko marked this pull request as draft December 20, 2023 13:28
With retries with conflicting manifest merges.

Ryan pointed out that this might also occur with the deletes.
However, I was unable to replicate this with a test. I've added
a test that should uncover this issue when merging DELETE
manifests and deleting the old one before the transaction
is successfully committed.
@Fokko Fokko marked this pull request as ready for review December 20, 2023 19:26
for (ManifestFile cachedNewDeleteManifest : cachedNewDeleteManifests) {
  if (!committed.contains(cachedNewDeleteManifest)) {
    deleteFile(cachedNewDeleteManifest.path());
    hasDeleteDeletes = true;

@amogh-jahagirdar amogh-jahagirdar Dec 21, 2023

Nit: I think a better name would be clearCachedDeleteManifests (but I see this was just following the pattern for data file manifests)


@amogh-jahagirdar amogh-jahagirdar Dec 21, 2023

Also another nit: I'd probably use the same pattern we did for the data manifest case where we just null it out and don't exercise the loop if it's null, but I see that the logic is the same with clearing the cached manifests and the loop.

Contributor

We can't null out cachedNewDeleteManifests because it's final, so the only thing the code can do is clear that collection.

Contributor Author

I also looked at that. I think it is for performance reasons, since writer.toManifestFiles() returns a list. For the deletes, we do an addAll, which is O(n). I'm a bit torn: I like the performance optimization, but in practice I don't think we write that many manifests, so n is rather small. Therefore I prefer to avoid nulling it out, to keep the code easier to read.
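To make the trade-off concrete, here is a rough sketch of the two reset patterns being compared. The class `CacheResetSketch` and its methods are hypothetical and use plain strings in place of manifest files; only the field name `cachedNewDeleteManifests` comes from the discussion, and `cachedNewDataManifests` is an assumed name for the data-manifest counterpart.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the two cache-reset patterns; not Iceberg code.
class CacheResetSketch {
  // Pattern A: non-final reference, where null means "nothing cached".
  private List<String> cachedNewDataManifests = null;
  // Pattern B: final collection that can only be emptied in place.
  private final List<String> cachedNewDeleteManifests = new ArrayList<>();

  void cacheData(List<String> written) {
    // O(1): keep the reference to the list that was returned.
    cachedNewDataManifests = written;
  }

  void cacheDeletes(List<String> written) {
    // O(n): copy every element into the final collection.
    cachedNewDeleteManifests.addAll(written);
  }

  void reset() {
    cachedNewDataManifests = null;     // allowed because the field is not final
    cachedNewDeleteManifests.clear();  // a final field cannot be reassigned
  }

  boolean dataCached() {
    return cachedNewDataManifests != null;
  }

  boolean deletesCached() {
    return !cachedNewDeleteManifests.isEmpty();
  }

  public static void main(String[] args) {
    CacheResetSketch sketch = new CacheResetSketch();
    sketch.cacheData(Arrays.asList("data-manifest-1.avro"));
    sketch.cacheDeletes(Arrays.asList("delete-manifest-1.avro"));
    sketch.reset();
    System.out.println("data cached: " + sketch.dataCached());
    System.out.println("deletes cached: " + sketch.deletesCached());
  }
}
```

Both patterns leave the cache logically empty after `reset()`; the difference is only in how the cache is populated and how "empty" is checked, which is why a small n makes the simpler `clear()` variant a reasonable choice.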


@nastra nastra left a comment

LGTM. Also worth mentioning: @Fokko and I explored adding a test that passes with this fix and fails without it, but we were not able to come up with one. However, we still wanted to align the handling of delete manifests with how it was done in #9230.

@nastra nastra merged commit c340915 into apache:main Dec 21, 2023
41 checks passed
nastra pushed a commit to nastra/iceberg that referenced this pull request Dec 21, 2023
nastra added a commit that referenced this pull request Dec 21, 2023

rdblue commented Dec 21, 2023

Thanks for getting this in @nastra and @Fokko!

@Fokko Fokko deleted the fd-fix-deletes-as-well branch December 21, 2023 21:13
lisirrx pushed a commit to lisirrx/iceberg that referenced this pull request Jan 4, 2024
geruh pushed a commit to geruh/iceberg that referenced this pull request Jan 26, 2024
devangjhabakh pushed a commit to cdouglas/iceberg that referenced this pull request Apr 22, 2024