-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: Fix missing delete files from transaction #9354
Conversation
8f5da1e
to
245ae82
Compare
With retries with conflicting manifest merges. Ryan pointed out that this might also occur whith the deletes. However, I was unable to replicate this with a test. I've added the test that should uncover this issue when merging DELETE manifests, and deleting the old one before the transaction is succesfully commited.
245ae82
to
fa3a58e
Compare
0be28ac
to
d10d041
Compare
for (ManifestFile cachedNewDeleteManifest : cachedNewDeleteManifests) { | ||
if (!committed.contains(cachedNewDeleteManifest)) { | ||
deleteFile(cachedNewDeleteManifest.path()); | ||
hasDeleteDeletes = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think a better name would be clearCachedDeleteManifests (but I see this was just following the pattern for data file manifests)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also another nit: I'd probably use the same pattern we did for the data manifest case where we just null it out and don't exercise the loop if it's null, but I see that the logic is the same with clearing the cached manifests and the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can't null out cachedNewDeleteManifests
because it's final
. So the only thing that's being done in the code is to clear that collection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also looked at that. I think it is for performance reasons since writer.toManifestFiles();
returns a list
. For the deletes, we do a addAll
which is in O(n)
. I'm a bit torn, I like the performance optimization, but in practice, I don't think that we write that many manifests, so n
is rather small. Therefore I prefer avoiding nulling it out to make the code easier to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Co-authored-by: Fokko Driesprong <[email protected]>
With retries with conflicting manifest merges. This makes the caching a bit more defensive so cached emptied when cleaning up a commit.