Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-12218] Fixes ORC conjunction predicate push down #10377

Closed

Conversation

liancheng
Copy link
Contributor

This PR is a follow-up of PR #10362.

Two major changes:

  1. The fix introduced in [SPARK-12218] [SQL] Invalid splitting of nested AND expressions in Data Source filter API #10362 is OK for Parquet, but may disable ORC PPD in many cases

PR #10362 stops converting an AND predicate if any branch is inconvertible. On the other hand, OrcFilters combines all filters into a single big conjunction first and then tries to convert it into ORC SearchArgument. This means, if any filter is inconvertible, no filters can be pushed down. This PR fixes this issue by finding out all convertible filters first before doing the actual conversion.

The reason behind the current implementation is mostly due to the limitation of ORC SearchArgument builder, which is documented in this PR in detail.
2. Copied the AND predicate fix for ORC from #10362 to avoid merge conflict.

Same as #10362, this PR targets master (2.0.0-SNAPSHOT), branch-1.6, and branch-1.5.

@SparkQA
Copy link

SparkQA commented Dec 18, 2015

Test build #48001 has finished for PR 10377 at commit 1148608.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 18, 2015

Test build #48005 has finished for PR 10377 at commit 88ad9a3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 18, 2015

Test build #48006 has finished for PR 10377 at commit fe1ab88.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Copy link
Contributor

yhuai commented Dec 18, 2015

sorry.. My commit introduced a conflict...

@liancheng liancheng force-pushed the spark-12218.fix-orc-conjunction-ppd branch from fe1ab88 to e5913d2 Compare December 19, 2015 00:55
@liancheng
Copy link
Contributor Author

@yhuai Nvm, rebased.

@SparkQA
Copy link

SparkQA commented Dec 19, 2015

Test build #48042 has finished for PR 10377 at commit e5913d2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* child is inconvertible. Alas, `b.startAnd()` call can't be rolled back, and `b` is inconsistent
* now.
*
* The workaround employed here is that, for `And`/`Or`/`Not`, we first try to convert their
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to mention for And Or Not? createFilter is dealing with the top level filters (they will be connected by AND), right? I think it is important to emphasize that createFilter is for top level filters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildSearchArgument is recursive, so nested And/Or/Not within top level filters are also covered.

@SparkQA
Copy link

SparkQA commented Dec 24, 2015

Test build #48270 has finished for PR 10377 at commit f877ffa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng liancheng force-pushed the spark-12218.fix-orc-conjunction-ppd branch from f877ffa to 6c59503 Compare December 26, 2015 10:59
@SparkQA
Copy link

SparkQA commented Dec 26, 2015

Test build #48344 has finished for PR 10377 at commit 6c59503.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Copy link
Contributor

yhuai commented Dec 28, 2015

LGTM. Merging to master.

@asfgit asfgit closed this in 8e23d8d Dec 28, 2015
@liancheng liancheng deleted the spark-12218.fix-orc-conjunction-ppd branch December 29, 2015 07:06
@liancheng
Copy link
Contributor Author

Thanks for the review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants