-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: Add PartitionMap #9194
Core: Add PartitionMap #9194
Conversation
1e73a91
to
4b0da7f
Compare
} | ||
|
||
private Map<StructLike, V> newPartitionMap(int specId) { | ||
PartitionSpec spec = specs.get(specId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will it happen if specs don't contain specId?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should fail in that case. These maps will be instantiated with table.specs()
containing all known specs of the table. I can add a precondition with a better error message but I wasn't worried about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a precondition with a proper error message.
|
||
@Override | ||
public V get(Object key) { | ||
if (key instanceof Pair) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional we can throw a NPE on "null" key, API doesn't say we have to but this map doesn't allow for a null key so we could do it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me check built-in maps that don't support null keys. We will throw an NPE in put
. I am not sure what would be the best behavior on get so I followed PartitionSet
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some built-in maps actually throw an exception but mainly because they call hashCode
on the provided key. I don't mind adding an extra check to see if the value is null.
Do you think that helps indicate something is not right, @RussellSpitzer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will probably need to do the same for remove
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just following the Java API description which says to throw NPE's if you don't support null keys but I really don't care either way. I think in our case we have no use case for a null pair so we probably should just not allow it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'll add an exception to get
and remove
.
|
||
@Override | ||
public String toString() { | ||
return partitionMaps.entrySet().stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why aren't we using this.entryset()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could but I didn't do that for a few reasons:
- Overhead of constructing
entrySet
as we need to createPartitionEntry
for each mapping. - The need to look up
PartitionSpec
for every pair vs once per partition map right now.
private String toString(PartitionSpec spec, Entry<StructLike, V> entry) { | ||
StructLike struct = entry.getKey(); | ||
V value = entry.getValue(); | ||
return spec.partitionToPath(struct) + " -> " + (value == this ? "(this Map)" : value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I follow this string creation
/foo=1/ -> (this map)
Are we assuming we would get nested PartitionMaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather we just ban putting the map into itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Banning would mean adding extra logic on each put, which I'd probably avoid (to be honest, it is highly unlikely to hit this in practice).
I copied this approach from Java AbstractMap
that does a similar trick.
public String toString() {
Iterator<Entry<K,V>> i = entrySet().iterator();
if (! i.hasNext())
return "{}";
StringBuilder sb = new StringBuilder();
sb.append('{');
for (;;) {
Entry<K,V> e = i.next();
K key = e.getKey();
V value = e.getValue();
sb.append(key == this ? "(this Map)" : key);
sb.append('=');
sb.append(value == this ? "(this Map)" : value);
if (! i.hasNext())
return sb.append('}').toString();
sb.append(',').append(' ');
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can keep with tradition
otherMap.put(Pair.of(BY_DATA_SPEC.specId(), Row.of("bbb")), "v2"); | ||
map.putAll(otherMap); | ||
|
||
assertThat(map) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be a good time to also test the equals method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added below.
assertThat(map).doesNotContainKey(Pair.of(1, Row.of(1))).doesNotContainValue("value"); | ||
assertThat(map.values()).isEmpty(); | ||
assertThat(map.keySet()).isEmpty(); | ||
assertThat(map.entrySet()).isEmpty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good spot to check equals
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a separate test case to check both empty and non-empty cases.
map.put(BY_DATA_SPEC.specId(), struct(BY_DATA_SPEC, "data", "aaa"), "value"); | ||
assertThat(map) | ||
.containsEntry(Pair.of(BY_DATA_SPEC.specId(), struct(BY_DATA_SPEC, "data", "aaa")), "value") | ||
.containsEntry(Pair.of(BY_DATA_SPEC.specId(), Row.of("aaa")), "value"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be another good spot to do an "equals" check with two maps with the same specs but different structlikes with the same data?
} | ||
|
||
@Test | ||
public void testConcurrencyReadAccess() throws InterruptedException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Concurrency -> Concurrent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, left a few more suggestions but that's at your discretion
8064d74
to
c81a209
Compare
@@ -32,7 +33,7 @@ | |||
import org.apache.iceberg.relocated.com.google.common.collect.Maps; | |||
import org.apache.iceberg.types.Types; | |||
|
|||
public class PartitionSet implements Set<Pair<Integer, StructLike>> { | |||
public class PartitionSet extends AbstractSet<Pair<Integer, StructLike>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done for proper equality.
c81a209
to
2ca4ac1
Compare
Thanks for reviewing, @qqqttt123 @jerqi @RussellSpitzer! |
This PR adds
PartitionMap
, a map that uses a pair of spec ID and partition tuple as keys. It is similar toPartitionSet
.The class will simplify places like
DeleteFileIndex
that uses the following code not related to the main logic.