-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spark 3.4: Support NOT_EQ for V2 filters #7898
Conversation
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkV2Filters.java
Outdated
Show resolved
Hide resolved
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkV2Filters.java
Outdated
Show resolved
Hide resolved
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkV2Filters.java
Outdated
Show resolved
Hide resolved
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkV2Filters.java
Outdated
Show resolved
Hide resolved
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkV2Filters.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks mostly good to me, looks like some style nits
return null; | ||
} | ||
|
||
if (predicate.name().equals(EQ)) { | ||
// comparison with null in normal equality is always null. this is probably a mistake. | ||
if (EQ.equals(predicate.name())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probaby can move this to the top of the case as well, to avoid the unnecessary work if Precondition fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should process the predicateChildren
first, and then with the children
to determine whether the value
is null.
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkV2Filters.java
Outdated
Show resolved
Hide resolved
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkV2Filters.java
Outdated
Show resolved
Hide resolved
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkV2Filters.java
Outdated
Show resolved
Hide resolved
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkV2Filters.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes, I read it again and sorry I had a few more questions and small comments, but looks pretty close to me though.
} else { | ||
return equal(attribute, value); | ||
return notEqual(children.first(), children.second()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a question , was this originally a bug? it was returning equal(string, value), not equal(NamedReference, value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(My question is about the original code)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The string will be wrapped into NamedReference
. There are two API as follows:
public static <T> UnboundPredicate<T> equal(String name, T value) {
return new UnboundPredicate<>(Expression.Operation.EQ, ref(name), value);
}
public static <T> UnboundPredicate<T> equal(UnboundTerm<T> expr, T value) {
return new UnboundPredicate<>(Expression.Operation.EQ, expr, value);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, and you plan to pass UnboundTerm instead of string, for following pr
} | ||
} | ||
|
||
private static UnboundPredicate<Object> handleNotEqual( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not mentioning earlier, but can we make these two methods take not a Pair, but actual arguments, so this method itself is easier to read?
Then just invoke in the caller: , handle[Not]Equal(children.first(), children.second())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, updated for the methods to be more clearly.
@@ -245,6 +251,22 @@ public static Expression convert(Predicate predicate) { | |||
return null; | |||
} | |||
|
|||
private static Pair<UnboundTerm<Object>, Object> predicateChildren(Predicate predicate) { | |||
Object value; | |||
UnboundTerm<Object> term; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional, my thought is , it would be clearer to have term be of type 'NamedReference', instead of super class 'UnboundTerm', not sure what you think? (because its more specific)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like this to be UnboundTerm
because it could be a UnboundTransform
as well in the following PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. Slightly prefer to not change literal String => UnboundTerm to make this diff smaller. But I get its needed in subsequent pr, not a big deal to me.
Will wait a little to see if further comments.
@@ -39,6 +39,7 @@ | |||
|
|||
public class TestSparkV2Filters { | |||
|
|||
@SuppressWarnings("checkstyle:MethodLength") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably break this test up by operator :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be okay doing that in a follow-up change as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split the internal test block to function testV2Filter
.
private static UnboundPredicate<Object> handleEqual(UnboundTerm<Object> term, Object value) { | ||
if (value == null) { | ||
return isNull(term); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor style suggestion here, i find these a bit more clear as if, elseif, else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor style suggestion about branching condition, (make branching explicit with elseIf instead of implied via early return)
I also think it would be great if we broke up the test method rather than putting in a suppression for the method length
private static UnboundPredicate<Object> handleNotEqual(UnboundTerm<Object> term, Object value) { | ||
if (value == null) { | ||
return notNull(term); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Same here, let's use if/else if/else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
term = ref(SparkUtil.toColumnName(rightChild(predicate))); | ||
value = convertLiteral(leftChild(predicate)); | ||
} else { | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: I am not sure there is a lot of value in keeping term
and value
outside given that we have a nested return and not using them anywhere below. I'd probably format it like this, it up to you, though:
if (isRef(leftChild(predicate)) && isLiteral(rightChild(predicate))) {
UnboundTerm<Object> term = ref(SparkUtil.toColumnName(leftChild(predicate)));
Object value = convertLiteral(rightChild(predicate));
return Pair.of(term, value);
} else if (isRef(rightChild(predicate)) && isLiteral(leftChild(predicate))) {
UnboundTerm<Object> term = ref(SparkUtil.toColumnName(rightChild(predicate)));
Object value = convertLiteral(leftChild(predicate));
return Pair.of(term, value);
} else {
return null;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I echo Russell's comments around nested if blocks. Otherwise, seems correct to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think comments have been addressed, I think we could revert the unnecessary (to me) test change and then it should be ready to commit
attrMap.forEach(this::testV2Filter); | ||
} | ||
|
||
private void testV2Filter(String quoted, String unquoted) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a bit hacky to avoid the length warning, as it doesnt add the readability more than the suppression does. I think @RussellSpitzer may have meant individual test methods, like:
@Test
public void testIsNullV2Filter() {
}
@Test
public void testIsNotNullV2Filter {
}
Maybe we can revert it to what you had, and we can make a cleanup issue for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry for the misunderstood. Reverted and will address it in a follow-up.
|
||
private static UnboundPredicate<Object> handleNotEqual(UnboundTerm<Object> term, Object value) { | ||
if (value == null) { | ||
return notNull(term); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<>
is not null safe right? So this should return null
or throw an exception if the value is null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for missing that. Updated.
Assert.assertEquals( | ||
"NotEqualTo must match", expectedNotEq1.toString(), actualNotEq1.toString()); | ||
|
||
Predicate notEq2 = new Predicate("<>", valueAndAttr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there cases for null and NaN values? I think there should be since there is special handling for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
b8cf269
to
ae4cdbc
Compare
Thanks @rdblue @szehon-ho @RussellSpitzer @aokolnychyi ! |
This splits the changes for
NOT_EQ
from #7886