Spark 3.4: Support NOT_EQ for V2 filters #7898

Merged: 8 commits, Jul 4, 2023

ConeyLiu
Contributor

This splits the changes for NOT_EQ from #7886

Collaborator

@szehon-ho szehon-ho left a comment

It looks mostly good to me; there are just a few style nits.

return null;
}

if (predicate.name().equals(EQ)) {
// comparison with null in normal equality is always null. this is probably a mistake.
if (EQ.equals(predicate.name())) {
Collaborator

Probably can move this to the top of the case as well, to avoid the unnecessary work if the Precondition fails?

Contributor Author

We need to process the predicateChildren first, and then use the children to determine whether the value is null.

Collaborator

@szehon-ho szehon-ho left a comment

Thanks for the changes. I read it again and, sorry, I had a few more questions and small comments, but it looks pretty close to me.

} else {
return equal(attribute, value);
return notEqual(children.first(), children.second());
Collaborator

I had a question: was this originally a bug? It was returning equal(string, value), not equal(NamedReference, value).

Collaborator

(My question is about the original code)

Contributor Author

The string will be wrapped into a NamedReference. There are two APIs, as follows:

  public static <T> UnboundPredicate<T> equal(String name, T value) {
    return new UnboundPredicate<>(Expression.Operation.EQ, ref(name), value);
  }

  public static <T> UnboundPredicate<T> equal(UnboundTerm<T> expr, T value) {
    return new UnboundPredicate<>(Expression.Operation.EQ, expr, value);
  }
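
A minimal sketch of how the two overloads quoted above end up building the same predicate. It assumes nothing beyond what the snippet shows: the String overload wraps the name with ref(name), the term overload takes the term as given. `EqualOverloadSketch`, `Ref`, and `describe` are hypothetical stand-ins, not Iceberg classes, and the predicate is modeled as a plain string.

```java
final class EqualOverloadSketch {
  /** Hypothetical stand-in for a named column reference. */
  static final class Ref {
    final String name;

    Ref(String name) {
      this.name = name;
    }
  }

  static Ref ref(String name) {
    return new Ref(name);
  }

  // String overload: wraps the column name in a reference first.
  static String equal(String name, Object value) {
    return describe("EQ", ref(name), value);
  }

  // Term overload: uses the reference exactly as passed in.
  static String equal(Ref term, Object value) {
    return describe("EQ", term, value);
  }

  // Renders the predicate as text so the two call paths can be compared.
  static String describe(String op, Ref term, Object value) {
    return op + "(ref(" + term.name + "), " + value + ")";
  }

  public static void main(String[] args) {
    System.out.println(equal("id", 5));
    System.out.println(equal(ref("id"), 5));
  }
}
```

Both calls render the same predicate, which is why switching the caller from a String to a term does not change behavior.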

Collaborator

I see, and you plan to pass an UnboundTerm instead of a string for the following PR.

}
}

private static UnboundPredicate<Object> handleNotEqual(
Collaborator

@szehon-ho szehon-ho Jun 30, 2023

Sorry for not mentioning this earlier, but can we make these two methods take the actual arguments rather than a Pair, so that the method itself is easier to read?

Then just invoke it in the caller: handle[Not]Equal(children.first(), children.second())

Contributor Author

Good idea, updated the methods to be clearer.

@@ -245,6 +251,22 @@ public static Expression convert(Predicate predicate) {
return null;
}

private static Pair<UnboundTerm<Object>, Object> predicateChildren(Predicate predicate) {
Object value;
UnboundTerm<Object> term;
Collaborator

@szehon-ho szehon-ho Jun 30, 2023

Optional: my thought is that it would be clearer to have term be of type 'NamedReference' instead of the superclass 'UnboundTerm', because it's more specific. Not sure what you think?

Contributor Author

I would like this to be UnboundTerm because it could be an UnboundTransform as well in the following PRs.

Collaborator

@szehon-ho szehon-ho left a comment

LGTM. I slightly prefer not changing the literal String => UnboundTerm, to keep this diff smaller. But I get that it's needed in a subsequent PR, so it's not a big deal to me.

Will wait a little to see if there are further comments.

@@ -39,6 +39,7 @@

public class TestSparkV2Filters {

@SuppressWarnings("checkstyle:MethodLength")
Member

We should probably break this test up by operator :)

Contributor

I'd be okay doing that in a follow-up change as well.

Contributor Author

Split the internal test block into a function, testV2Filter.

private static UnboundPredicate<Object> handleEqual(UnboundTerm<Object> term, Object value) {
if (value == null) {
return isNull(term);
}
Member

Minor style suggestion here: I find these a bit clearer as if / else if / else.

Contributor

+1

Contributor Author

Updated
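
A minimal sketch of the if / else if / else shape suggested in this thread, applied to handleEqual. The null branch mirrors the snippet under review. The NaN branch is an assumption, drawn only from the later remark in this conversation that null and NaN values get special handling. Predicates are modeled as plain strings, and `HandleEqualSketch` and its `isNaN` helper are hypothetical names, not the merged code.

```java
final class HandleEqualSketch {
  // Returns a textual stand-in for the predicate that would be built.
  static String handleEqual(String term, Object value) {
    if (value == null) {
      // Null-safe equality against null becomes an is-null check.
      return "isNull(" + term + ")";
    } else if (isNaN(value)) {
      // Assumed: NaN literals become an is-NaN check.
      return "isNaN(" + term + ")";
    } else {
      return "equal(" + term + ", " + value + ")";
    }
  }

  // Hypothetical helper: true only for floating-point NaN values.
  static boolean isNaN(Object value) {
    if (value instanceof Double) {
      return ((Double) value).isNaN();
    } else if (value instanceof Float) {
      return ((Float) value).isNaN();
    } else {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(handleEqual("id", null));
    System.out.println(handleEqual("ratio", Double.NaN));
    System.out.println(handleEqual("id", 5));
  }
}
```

With every branch returning from an explicit arm, no case depends on falling through an early return.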

Member

@RussellSpitzer RussellSpitzer left a comment

Minor style suggestion about the branching condition (make the branching explicit with else if instead of implied via early return).

I also think it would be great if we broke up the test method rather than adding a suppression for the method length.

private static UnboundPredicate<Object> handleNotEqual(UnboundTerm<Object> term, Object value) {
if (value == null) {
return notNull(term);
}
Contributor

Minor: Same here, let's use if/else if/else.

Contributor Author

Updated

term = ref(SparkUtil.toColumnName(rightChild(predicate)));
value = convertLiteral(leftChild(predicate));
} else {
return null;
Contributor

Optional: I am not sure there is a lot of value in keeping term and value outside, given that we have a nested return and are not using them anywhere below. I'd probably format it like this, but it's up to you:

if (isRef(leftChild(predicate)) && isLiteral(rightChild(predicate))) {
  UnboundTerm<Object> term = ref(SparkUtil.toColumnName(leftChild(predicate)));
  Object value = convertLiteral(rightChild(predicate));
  return Pair.of(term, value);

} else if (isRef(rightChild(predicate)) && isLiteral(leftChild(predicate))) {
  UnboundTerm<Object> term = ref(SparkUtil.toColumnName(rightChild(predicate)));
  Object value = convertLiteral(leftChild(predicate));
  return Pair.of(term, value);

} else {
  return null;
}

Contributor Author

Updated

Contributor

@aokolnychyi aokolnychyi left a comment

I echo Russell's comments around nested if blocks. Otherwise, seems correct to me.

Collaborator

@szehon-ho szehon-ho left a comment

I think the comments have been addressed. We could revert the (to me) unnecessary test change, and then it should be ready to commit.

attrMap.forEach(this::testV2Filter);
}

private void testV2Filter(String quoted, String unquoted) {
Collaborator

@szehon-ho szehon-ho Jul 3, 2023

I think this is a bit hacky as a way to avoid the length warning, as it doesn't improve readability any more than the suppression does. I think @RussellSpitzer may have meant individual test methods, like:

@Test
public void testIsNullV2Filter() {
}

@Test
public void testIsNotNullV2Filter() {
}

Maybe we can revert it to what you had, and we can make a cleanup issue for this?

Contributor Author

Oh, sorry for the misunderstanding. Reverted; I will address it in a follow-up.


private static UnboundPredicate<Object> handleNotEqual(UnboundTerm<Object> term, Object value) {
if (value == null) {
return notNull(term);
Contributor

<> is not null-safe, right? So this should return null or throw an exception if the value is null.

Contributor Author

@ConeyLiu ConeyLiu Jul 4, 2023

Sorry for missing that. Updated.
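
To illustrate the point about `<>` not being null-safe: under SQL three-valued logic, comparing anything with null yields unknown, and a filter keeps a row only when the predicate is true. Rewriting `col <> NULL` as a not-null check would therefore return rows that the original predicate rejects. The sketch below is a self-contained model, not Spark or Iceberg code; `sqlNotEqual`, `filterNotEqual`, and `filterNotNull` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

final class NotEqualNullSketch {
  /** SQL-style <>: the result is unknown (null) when either operand is null. */
  static Boolean sqlNotEqual(Integer left, Integer right) {
    if (left == null || right == null) {
      return null;
    }
    return !left.equals(right);
  }

  /** A filter keeps a row only when the predicate is exactly TRUE. */
  static List<Integer> filterNotEqual(List<Integer> rows, Integer literal) {
    return rows.stream()
        .filter(row -> Boolean.TRUE.equals(sqlNotEqual(row, literal)))
        .collect(Collectors.toList());
  }

  /** The not-null check that a null literal must NOT be rewritten into. */
  static List<Integer> filterNotNull(List<Integer> rows) {
    return rows.stream().filter(Objects::nonNull).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Integer> rows = Arrays.asList(1, 2, null);
    System.out.println(filterNotEqual(rows, null)); // prints []
    System.out.println(filterNotEqual(rows, 1));    // prints [2]
    System.out.println(filterNotNull(rows));        // prints [1, 2]
  }
}
```

The first and third results differ, which is the mismatch the comment above flags.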

Assert.assertEquals(
"NotEqualTo must match", expectedNotEq1.toString(), actualNotEq1.toString());

Predicate notEq2 = new Predicate("<>", valueAndAttr);
Contributor

Are there cases for null and NaN values? I think there should be since there is special handling for them.

Contributor Author

Added

@rdblue rdblue merged commit b8102d5 into apache:master Jul 4, 2023
Contributor

rdblue commented Jul 4, 2023

Thanks, @ConeyLiu! Now that this is in we can finish reviewing #7886.

Contributor Author

ConeyLiu commented Jul 5, 2023

Thanks @rdblue @szehon-ho @RussellSpitzer @aokolnychyi !

@ConeyLiu ConeyLiu deleted the not-eq-v2-filter branch July 5, 2023 14:40