Core: Optimize MergingSnapshotProducer to use referenced manifests to determine if manifest needs to be rewritten #11131
base: main
Conversation
Publishing a draft so I can test against the entire CI. I need to think more about a good way to benchmark this, and whether there are even more reasonable optimizations I can do here.

Another possible optimization to think through: if we have both the manifestLocation and the ordinal position of the content file in the manifest, AND there are no delete expressions or deletions by pure paths, we could write the new manifest with only the entries that are not part of the referenced positions, without having to evaluate file paths or predicates against manifest entries. We would just evaluate against the positions (every entry would be compared against the "deleted" position set) rather than file paths/partition values, which should be a bit more performant.
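The position-based idea above could be sketched roughly as follows. This is a hypothetical illustration, not Iceberg's actual ManifestEntry API: the Entry record and keepUnreferenced method are made up for the sketch, which only shows filtering by ordinal position instead of evaluating file paths or partition values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: when every deletion is keyed by (manifestLocation,
// ordinal position) and there are no delete expressions or path-only deletes,
// surviving entries can be selected by position alone.
public class PositionFilterSketch {
  record Entry(long pos, String filePath) {}

  static List<Entry> keepUnreferenced(List<Entry> entries, Set<Long> deletedPositions) {
    List<Entry> kept = new ArrayList<>();
    for (Entry entry : entries) {
      // Compare positions only; no file-path or partition-value evaluation needed.
      if (!deletedPositions.contains(entry.pos())) {
        kept.add(entry);
      }
    }
    return kept;
  }
}
```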
On testing: even though SparkRewriteDataFilesAction exercises this new path, I think it's worth adding a separate test in TestRewriteFiles, or if https://github.com/apache/iceberg/pull/11166/files gets in first, we can also add a test to RowDelta. The test should explicitly read entries from manifests to delete.
if (filesToDelete > 0) {
  DeleteFile deleteFile = FileGenerationUtil.generatePositionDeleteFile(table, dataFile);
  rowDelta.addDeletes(deleteFile);
  generatedDeleteFiles.add(deleteFile);
  DeleteFile pendingDeleteFile =
      FileGenerationUtil.generatePositionDeleteFile(table, dataFile);
  generatedPendingDeleteFiles.add(pendingDeleteFile);
  filesToDelete--;
Opting to keep the benchmarking pretty simple for now. We just use the percentage, and this will correlate linearly with the number of manifests that need to get rewritten. We could test cases where we vary the distribution of deletes across partitions, but I don't think that's strictly required now, since the current benchmark already indicates this is a good improvement.
If I understand correctly, both the new and the old logic replace all delete files. Your PR would benefit use cases where a small number of manifests has to be rewritten, so I'd probably add numFiles data and delete files but replace only a percentage of those.

Also, I would consider running the same benchmark for an unpartitioned table, as it will not have the partition filters and Iceberg would be forced to scan through all of the metadata even if a single delete file is replaced (without your PR, of course).
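The suggested setup could be sketched with a small helper. This is a hypothetical illustration (BenchmarkSetup and deleteFilesToReplace are made-up names, not part of the Iceberg benchmark code): it just computes how many of the numFiles delete files a given replacement percentage selects.

```java
// Hypothetical benchmark-setup helper: generate numFiles data and delete
// files, then replace only a fixed percentage of the delete files, rather
// than deriving the count from how many data files carry deletes.
public class BenchmarkSetup {
  static int deleteFilesToReplace(int numDeleteFiles, int percentage) {
    // Round up so any non-zero percentage replaces at least one file.
    return (int) Math.ceil(numDeleteFiles * (percentage / 100.0));
  }
}
```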
I updated the benchmark to replace only a percentage of the delete files, rather than before, where the percentage was more a ratio of how many data files had delete files. That's probably a more useful benchmark for evaluating synchronous maintenance, since we get a sense of how expensive the operation is at varying levels of replacement and varying numbers of files.

After updating to have the set be a source of truth for delete manifests, a more noticeable difference is observed, as expected! I updated the benchmark results in the PR description.

It seems that after my latest updates to not add to the delete paths if a manifest location is defined, some cherry-pick test cases are failing. Taking a look.
// The current set of referenced manifests can be trusted if it is a subset of the manifests
// being filtered. If a single referenced manifest is not in the set of manifests being filtered,
// this indicates that the referenced manifests are stale and cannot be trusted.
Set<String> manifestLocations =
These manifests are the set of manifests in the parent snapshot. I think checking that the referenced manifests are a subset of the manifests in the parent is sufficient. The reason is that in the case of a concurrent update which removes one or more of the currently referenced manifests, the parent would reflect this removal (since every retry due to a concurrent update would naturally have an updated parent, and therefore an updated view of the current manifests). Added some tests in TestTransactions that verify this. cc @rdblue @aokolnychyi
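The subset check described in the quoted comment boils down to a containsAll test. A minimal sketch, assuming a standalone helper (ReferencedManifestCheck is a made-up name; the real code lives inside ManifestFilterManager):

```java
import java.util.Set;

// Sketch of the trust check: the referenced-manifest set is trustworthy only
// when every referenced manifest still appears in the set being filtered.
public class ReferencedManifestCheck {
  static boolean canTrustReferencedManifests(
      Set<String> referencedManifests, Set<String> manifestsBeingFiltered) {
    // A single referenced manifest missing from the filtered set means the
    // references are stale (e.g. a concurrent update removed that manifest).
    return manifestsBeingFiltered.containsAll(referencedManifests);
  }
}
```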
I'm rerunning benchmarks to make sure that, after all the recent refactoring I did, we're not somehow skipping the optimization.
I think it would help to state in this comment what is meant by "trusted". I think it means that we can filter the set of referenced manifests rather than the set of manifests that was passed into this method.
> it means that we can filter the set of referenced manifests rather than the set of manifests that was passed into this method.

Actually, I think it's more specific than this. It's possible that the set can be trusted, there's a manifest which is not in the referenced set, and that manifest still needs to be filtered, because it either has live entries OR it's a delete manifest with aged-out entries (the min sequence number ends up being less than the min data sequence number during filtering).

This also ties in to trying to make this logic a bit less complicated, since that last condition I mentioned is what's leading to the current implementation.
I updated to remove the check to always rewrite a manifest just because it had aged off deletes and no other reason to rewrite. This simplified a bit since now a lot of the referenced sets checks could be done upfront a bit more explicitly. See #11131 (comment)
@@ -323,11 +345,15 @@ private ManifestFile filterManifest(Schema tableSchema, ManifestFile manifest) {
PartitionSpec spec = reader.spec();
PartitionAndMetricsEvaluator evaluator =
    new PartitionAndMetricsEvaluator(tableSchema, spec, deleteExpression);
boolean hasDeletedFiles = manifestsReferencedForDeletes.contains(manifest.path());
if (hasDeletedFiles) {
I think this structure makes the code harder to reason about and misses an optimization.
This currently checks whether the manifest is known to have a file to remove. If it is unknown, then the logic below is used, which scans the manifest to determine whether it needs to be rewritten. But if the set of referenced manifests is trusted, then there is no need to scan these manifests because the manifest to rewrite for every removal is known.
This should check allDeletesReferenceManifests to determine if the next step can be avoided.
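The suggested short-circuit could look roughly like this. This is an illustrative sketch only (RewriteDecisionSketch and needsRewrite are made-up names; allDeletesReferenceManifests is the flag mentioned in the review comment, not necessarily with this exact shape in the real code):

```java
import java.util.Set;

// Sketch: when the referenced set is trusted (all deletes reference a
// manifest), membership in that set fully decides whether a manifest must
// be rewritten, so scanning unreferenced manifests can be skipped.
public class RewriteDecisionSketch {
  static boolean needsRewrite(
      String manifestPath,
      boolean allDeletesReferenceManifests,
      Set<String> referencedManifests) {
    if (allDeletesReferenceManifests) {
      // Trusted: only referenced manifests contain files to remove.
      return referencedManifests.contains(manifestPath);
    }
    // Untrusted: fall back to scanning the manifest, approximated here as
    // "may need a rewrite".
    return true;
  }
}
```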
Ah, I see that this may actually be caught by canContainDeletedFiles.

I think part of the problem is that this is trying to use instance state rather than passing data through methods. I think the logic would be clearer if state were passed through args rather than relying on instance fields several calls down the chain.
Yes, canContainDeletedFiles should skip entirely if we can trust the referenced set.

> I think part of the problem is that this is trying to use instance state rather than passing data through methods. I think the logic would be more clear if state were passed through args rather than relying on instance fields several calls down the chain.

Sounds good. I think what we could do is have the filter manager surface the referenced manifests if all the removals are performed with a referenced manifest location. Then, in the merging snapshot producer, we leverage that set to filter down the set that gets passed to the filter manager to begin with. I still think this particular case would be a good optimization (the case where a file is known to be in a manifest referenced in a delete), so that we avoid opening up the manifest and evaluating stats etc., and just skip to writing the new manifest.
Removing the check which led to always rewriting a delete manifest in case it had aged-off deletes simplified this; for context, see #11131 (comment)

I moved the referenced check more explicitly up front during the filtering, and also avoided some additional referenced-set checks (previously we were doing that a few times); now it should only happen once.
boolean canContainDropBySeq =
    manifest.content() == ManifestContent.DELETES
        && manifest.minSequenceNumber() < minSequenceNumber;
We used to rewrite a given manifest anytime it had aged deletes, regardless of whether the manifest needed to get rewritten for other purposes. Discussed with @rdblue, and the way to look at this is that we should opportunistically remove the aged deletes when the manifest needs to get rewritten for other reasons. Otherwise we don't really need to go out of our way, since that can create unnecessary failures, and I agree with the rationale there. I removed canContainDropBySeq as an individual condition to drive a rewrite, which is technically a behavior change, but arguably a better choice based on the argument that it avoids unnecessary failures that can happen by doing that work.

I also updated TestRowDelta#testDeleteDataFileWithDeleteFile so that we still exercise the case where aged deletes do get rewritten, but the conditions for triggering that had to change in the tests.
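The behavior change can be boiled down to one condition. An illustrative sketch, assuming made-up names (AgedDeleteRewritePolicy and shouldRewrite are not the actual ManifestFilterManager code): canContainDropBySeq no longer drives a rewrite on its own; aged-out entries are dropped only when the manifest is rewritten anyway.

```java
// Sketch of the rewrite-decision change: aged-out delete entries
// (canContainDropBySeq) no longer force a rewrite by themselves; they are
// removed opportunistically when the manifest is rewritten for another reason.
public class AgedDeleteRewritePolicy {
  static boolean shouldRewrite(
      boolean hasDeletedFiles, boolean hasLiveEntriesToFilter, boolean canContainDropBySeq) {
    // Old behavior: hasDeletedFiles || hasLiveEntriesToFilter || canContainDropBySeq
    // New behavior: canContainDropBySeq alone is not a reason to rewrite.
    return hasDeletedFiles || hasLiveEntriesToFilter;
  }
}
```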
dataSeqs(1L, 1L),
fileSeqs(1L, 1L),
ids(deltaSnapshotId, deltaSnapshotId),
files(FILE_A_DELETES, FILE_B_DELETES),
statuses(Status.ADDED, Status.ADDED));
|
See #11131 (comment) for why I changed this test. TL;DR: the behavior around rewriting manifests with aged-off deletes changed from "always rewrite in case the manifest does have aged-off deletes" to "remove the aged-off deletes if the manifest actually needs to be rewritten for other purposes". This test used to rely on the assumption that the old condition existed, but now we need to perform an action (deleting FILE_B_DELETES) which forces the manifest to be rewritten along with the deleted, aged-off entries.
DeleteFile deleteFile = TestHelpers.deleteFiles(table).iterator().next();
Path deleteFilePath = new Path(String.valueOf(deleteFile.path()));
These tests changed for the same reason mentioned in #11131 (comment)
This change optimizes the merging snapshot producer logic that determines whether a file has been deleted in a manifest when deleteFile(ContentFile file) is called; optimizing this logic means that determining whether a manifest needs to be rewritten is now optimized as well.

Previously, determining that a manifest file needs to be rewritten had two high-level steps: a first pass over the manifest's entries to determine whether it contains any deleted files, followed by rewriting the manifest without those entries.

The optimization in this PR improves step 1 in the case deleteFile(ContentFile file) is called, by keeping track of each deleted data/delete file's manifest and its position in that manifest. With this information, the first pass over the manifest is no longer necessary: the presence of a referenced manifest means that manifest must be rewritten.
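The tracking described above can be sketched in miniature. This is a hypothetical outline (DeleteTrackerSketch and its methods are made-up names, not the actual MergingSnapshotProducer or ManifestFilterManager API): record the manifest location for each delete when it is known, and use the recorded set both to decide which manifests must be rewritten and to skip the first-pass scan of the rest.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: track the manifest location of each deleted content file. A
// manifest in the referenced set must be rewritten; when every delete
// carried a manifest location, unreferenced manifests need no scan.
public class DeleteTrackerSketch {
  private final Set<String> referencedManifests = new HashSet<>();
  private boolean allDeletesHaveManifests = true;

  void deleteFile(String filePath, String manifestLocation) {
    if (manifestLocation != null) {
      referencedManifests.add(manifestLocation);
    } else {
      // A path-only delete means the referenced set alone is not enough.
      allDeletesHaveManifests = false;
    }
  }

  boolean mustRewrite(String manifestPath) {
    return referencedManifests.contains(manifestPath);
  }

  boolean canSkipScan(String manifestPath) {
    return allDeletesHaveManifests && !referencedManifests.contains(manifestPath);
  }
}
```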