-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: Switch usage to DataFileSet / DeleteFileSet #11158
Conversation
nastra
commented
Sep 18, 2024
•
edited
Loading
edited
6b35421
to
9d75725
Compare
5c25dd3
to
10150bf
Compare
1e0b413
to
54cffb4
Compare
4dbf58c
to
73638a6
Compare
70dcd1a
to
6fc4e35
Compare
1e6dfd7
to
8002680
Compare
8002680
to
a2688b8
Compare
95d12ef
to
08ccda9
Compare
88e1d58
to
52c5899
Compare
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java
Outdated
Show resolved
Hide resolved
private final CharSequenceSet newDataFilePaths = CharSequenceSet.empty(); | ||
private final CharSequenceSet newDeleteFilePaths = CharSequenceSet.empty(); | ||
private final DataFileSet newDataFiles = DataFileSet.create(); | ||
private final DeleteFileSet newDeleteFiles = DeleteFileSet.create(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need these extra collections? Can't we use sets in newDataFilesBySpec
and newDeleteFilesBySpec
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm handling this already in 4d42f18. I just didn't want to introduce too many changes/refactorings as the PR is already quite large
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed with @nastra offline, overall feel like the change is good but I think it's worth running some benchmarks as a sanity check. There really shouldn't be much of a change afterwards but it's more so to make sure there's not some regression we're not expecting.
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java
Outdated
Show resolved
Hide resolved
52c5899
to
ae3bcac
Compare
83c6052
to
0df0066
Compare
0df0066
to
c91f326
Compare
@amogh-jahagirdar I added benchmark results to the PR description |
@@ -213,7 +213,7 @@ public void testDropTable() throws IOException { | |||
table.newAppend().appendFile(file1).appendFile(file2).commit(); | |||
|
|||
// delete file2 | |||
table.newDelete().deleteFile(file2.path()).commit(); | |||
table.newDelete().deleteFile(file2).commit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change required? There should be no behavior change, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not a required change, but I wanted to move existing places to the non-path-based delete path
@@ -407,7 +407,7 @@ public void testDropTable() throws IOException { | |||
table.newAppend().appendFile(file1).appendFile(file2).commit(); | |||
|
|||
// delete file2 | |||
table.newDelete().deleteFile(file2.path()).commit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not a required change, but I wanted to move existing places to the non-path-based delete path
@@ -71,9 +70,9 @@ public String partition() { | |||
private final PartitionSet deleteFilePartitions; | |||
private final PartitionSet dropPartitions; | |||
private final CharSequenceSet deletePaths = CharSequenceSet.empty(); | |||
private final Set<F> deleteFiles = newFileSet(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we co-locate deleteFiles
and deleteFilePartitions
as deleteFilePartitions
essentially contains a set of partitions derived from deleteFiles
.
Actually, the comments are minor and I verified that changes in tests are cosmetic locally. I'll go ahead and merge this one to unblock the subsequent PR. |
Thanks, @nastra! Thanks for reviewing, @amogh-jahagirdar! |