Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-11677][SQL][FOLLOW-UP] Add tests for checking the ORC filter creation against pushed down filters. #10341

Closed
wants to merge 9 commits into from

Conversation

HyukjinKwon
Copy link
Member

https://issues.apache.org/jira/browse/SPARK-11677
Although it checks correctly the filters by the number of results if ORC filter-push-down is enabled, the filters themselves are not being tested.
So, this PR includes the test similarly with ParquetFilterSuite.
Since the results are checked by OrcQuerySuite, this OrcFilterSuite only checks if the appropriate filters are created.

One thing different with ParquetFilterSuite here is, it does not check the results because that is checked in OrcQuerySuite.

@HyukjinKwon
Copy link
Member Author

As I talked with @liancheng, this PR is not covering NOT, AND and OR. This is because ExpressionTree is not accessible at Hive 1.2.x.

But as I see Hive's latest codes, this is accessible. So, I think I can add some tests about ExpressionTree.Operator meaning NOT, AND and OR easily after the Hive version is upgraded.

@SparkQA
Copy link

SparkQA commented Dec 17, 2015

Test build #47878 has finished for PR 10341 at commit 5f49052.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * class CrossValidator(Estimator, HasSeed):\n

@liancheng
Copy link
Contributor

Hey @HyukjinKwon, actually I hit the same issue for testing AND/OR/NOT while working on #10377, and at last I resorted to SearchArgument.toString, which produces quite structured and readable string representation of a SearchArgument. Although a little bit clunky, it turned out to be quite effective. However, note that ORC tries to "unify" comparison operators. For example, all > are transformed to <= wrapped by a not in the final string representation.

Please refer to this file for details.

@HyukjinKwon
Copy link
Member Author

@liancheng Thanks! I will try to apply that way.

@HyukjinKwon
Copy link
Member Author

In this commit, I used string expression SearchArgument.toString to check filter creation.

I am not too sure if generalising them with string template is appropriate though. I can't come up with a better idea for now.

@liancheng
Copy link
Contributor

Oh, actually I didn't mean that you should use string comparison to test all ORC filters. It's perfectly OK to use PredicateLeaf for leaf predicates, and only use string comparison to test for AND/OR/NOT. Since there are not too many cases for the latter, you probably don't need to bother using those string templates.

The only reason that I suggested using string comparison here is that ORC doesn't expose necessary interfaces similar to PredicateLeaf for composed predicates. And it should only be used as a workaround rather than a general rule.

@SparkQA
Copy link

SparkQA commented Dec 21, 2015

Test build #48104 has finished for PR 10341 at commit 406e5ec.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * class OrcFilterSuite extends QueryTest with OrcTest\n * implicit class IntToBinary(int: Int)\n

@HyukjinKwon
Copy link
Member Author

Ah. I will correct them soon!

@HyukjinKwon
Copy link
Member Author

In the commits above, I added tests for logical operators separately. Although it does not check all the combinations across types with logical operators, I think this would be okay because the basic comparison operators are being tested across types.

Could we add some tests for logical operators across types later if they are necessary in a way? I do not want to make the tests hacky clunky.

@SparkQA
Copy link

SparkQA commented Dec 22, 2015

Test build #48162 has finished for PR 10341 at commit b2a2c03.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * class OrcFilterSuite extends QueryTest with OrcTest\n * implicit class IntToBinary(int: Int)\n

@SparkQA
Copy link

SparkQA commented Dec 22, 2015

Test build #48164 has finished for PR 10341 at commit 9f7e8bd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * class OrcFilterSuite extends QueryTest with OrcTest\n * implicit class IntToBinary(int: Int)\n

@SparkQA
Copy link

SparkQA commented Dec 22, 2015

Test build #48165 has finished for PR 10341 at commit 2ad0a43.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * class OrcFilterSuite extends QueryTest with OrcTest\n * implicit class IntToBinary(int: Int)\n

@liancheng
Copy link
Contributor

Thanks for working on this! Merging to master.

@asfgit asfgit closed this in 364d244 Dec 22, 2015
@liancheng
Copy link
Contributor

Hit network issue and couldn't fetch from GitHub... Finally got it merged.

@HyukjinKwon HyukjinKwon deleted the SPARK-11677-followup branch September 23, 2016 18:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants