Refactor the filter rewrite optimization #14464
Split the single Helper classes and move the classes into a new package for any optimization we introduced for search path. Rename the class name to make it more straightforward and general Signed-off-by: bowenlan-amzn <[email protected]>
refactor the canOptimize logic sort out the basic rule about how to provide data from aggregator, and where to put common logic Signed-off-by: bowenlan-amzn <[email protected]>
refactor the data provider and try optimize logic Signed-off-by: bowenlan-amzn <[email protected]>
❌ Gradle check result for 1a067ba: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Signed-off-by: bowenlan-amzn <[email protected]>
extract segment match all logic Signed-off-by: bowenlan-amzn <[email protected]>
Signed-off-by: bowenlan-amzn <[email protected]>
inline class Signed-off-by: bowenlan-amzn <[email protected]>
Signed-off-by: bowenlan-amzn <[email protected]>
9040f6f to e896927
- remove map of segment ranges, pass in by calling getRanges when needed - use AtomicInteger for the debug info Signed-off-by: bowenlan-amzn <[email protected]>
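For readers unfamiliar with why an AtomicInteger matters here, the following is a minimal sketch and not code from this PR. It assumes that debug counters may be incremented from several segment-level search threads at once; the class name `DebugCountersSketch` and its methods are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; these names do not come from the PR.
final class DebugCountersSketch {
    // AtomicInteger makes each increment atomic, so no updates are lost
    // when several threads report optimized segments concurrently.
    private final AtomicInteger optimizedSegments = new AtomicInteger();

    void onSegmentOptimized() {
        optimizedSegments.incrementAndGet();
    }

    int optimizedSegments() {
        return optimizedSegments.get();
    }

    // Simulates `segments` segments being processed by `threads` worker threads
    // and returns the final counter value.
    static int runConcurrently(int segments, int threads) {
        DebugCountersSketch counters = new DebugCountersSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < segments; i++) {
            pool.execute(counters::onSegmentOptimized);
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counters.optimizedSegments();
    }
}
```

With a plain `int` field and `++`, concurrent increments could be lost; the atomic counter avoids that without explicit locking.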
❕ Gradle check result for 86cacab: UNSTABLE
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
Thanks for these changes, @bowenlan-amzn. I think this is much easier to follow than the original helper class. We can keep going with some cleanup, but my major concern regarding concurrent search appears resolved.
* Refactor Split the single Helper classes and move the classes into a new package for any optimization we introduced for search path. Rename the class name to make it more straightforward and general Signed-off-by: bowenlan-amzn <[email protected]>
* Refactor refactor the canOptimize logic sort out the basic rule about how to provide data from aggregator, and where to put common logic Signed-off-by: bowenlan-amzn <[email protected]>
* Refactor refactor the data provider and try optimize logic Signed-off-by: bowenlan-amzn <[email protected]>
* Refactor Signed-off-by: bowenlan-amzn <[email protected]>
* Refactor extract segment match all logic Signed-off-by: bowenlan-amzn <[email protected]>
* Refactor Signed-off-by: bowenlan-amzn <[email protected]>
* Refactor inline class Signed-off-by: bowenlan-amzn <[email protected]>
* Fix a bug Signed-off-by: bowenlan-amzn <[email protected]>
* address comment Signed-off-by: bowenlan-amzn <[email protected]>
* prepareFromSegment now doesn't return Ranges Signed-off-by: bowenlan-amzn <[email protected]>
* how it looks like when introduce interfaces Signed-off-by: bowenlan-amzn <[email protected]>
* remove interface, clean up Signed-off-by: bowenlan-amzn <[email protected]>
* improve doc Signed-off-by: bowenlan-amzn <[email protected]>
* move multirangetraversal logic to helper Signed-off-by: bowenlan-amzn <[email protected]>
* improve the refactor package name -> filterrewrite move tree traversal logic to new class add documentation for important abstract methods add sub class for composite aggregation bridge Signed-off-by: bowenlan-amzn <[email protected]>
* Address Marc's comments Signed-off-by: bowenlan-amzn <[email protected]>
* Address concurrent segment search concern To save the ranges per segment, now change to a map that save ranges for segments separately. The increment document function "incrementBucketDocCount" should already be thread safe, as it's the same method used by normal aggregation execution path Signed-off-by: bowenlan-amzn <[email protected]>
* remove circular dependency Signed-off-by: bowenlan-amzn <[email protected]>
* Address comment - remove map of segment ranges, pass in by calling getRanges when needed - use AtomicInteger for the debug info Signed-off-by: bowenlan-amzn <[email protected]>
---------
Signed-off-by: bowenlan-amzn <[email protected]>
(cherry picked from commit 170ea27)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@mch2 @bowenlan-amzn We shouldn't skip changelog for these changes.
Description
As more code has gone into the filter rewrite optimization, it has become harder to understand.
This not only makes code review slower and more painful, it will also slow down new contributors to this area. Hence this refactoring work.
Idea
The refactoring shouldn't change any business logic.
After the refactor, a reader should be able to find all the important information just by reading the class docs and checking the public methods of each class.
Refactoring
Why the name "filter rewrite optimization"?
Filter in the OpenSearch world has a similar meaning to query, but indicates that no relevance score is calculated.
Rewrite in the OpenSearch world can mean transforming an OpenSearch query into a Lucene query, or transforming a query so that it performs better.
Generally speaking, the optimization rewrites the aggregation into certain filters to improve performance. Aggregation execution is a plain iteration and collection over all matches, while filters can take advantage of the Lucene index to get the expected results in logarithmic or even constant time.
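To make the idea concrete, here is a minimal sketch, not the code in this PR. It assumes a sorted `long[]` as a stand-in for the Lucene point index, and all names (`FilterRewriteSketch`, `countByIteration`, `countByRangeFilters`) are hypothetical. It contrasts collecting every match with answering each bucket as a range filter against a sorted index.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in OpenSearch.
final class FilterRewriteSketch {

    // Plain aggregation: visit every matching value and find its bucket.
    // Cost grows linearly with the number of matching documents.
    static long[] countByIteration(long[] values, long[] bounds) {
        long[] counts = new long[bounds.length - 1];
        for (long v : values) {
            for (int b = 0; b < counts.length; b++) {
                if (v >= bounds[b] && v < bounds[b + 1]) {
                    counts[b]++;
                    break;
                }
            }
        }
        return counts;
    }

    // Index of the first element >= key in a sorted array (binary search).
    static int lowerBound(long[] sorted, long key) {
        int lo = 0;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Rewritten form: each bucket [bounds[b], bounds[b + 1]) becomes a range
    // filter whose count is read from the sorted index in logarithmic time.
    static long[] countByRangeFilters(long[] sortedValues, long[] bounds) {
        long[] counts = new long[bounds.length - 1];
        for (int b = 0; b < counts.length; b++) {
            counts[b] = lowerBound(sortedValues, bounds[b + 1]) - lowerBound(sortedValues, bounds[b]);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] values = {3, 7, 12, 15, 15, 21, 29}; // already sorted, like an index
        long[] bounds = {0, 10, 20, 30};            // three buckets of width 10
        System.out.println(Arrays.toString(countByIteration(values, bounds)));
        System.out.println(Arrays.toString(countByRangeFilters(values, bounds)));
    }
}
```

Both methods return the same bucket counts; the second never touches individual documents, which is roughly the benefit the optimization aims for.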
Benchmark
Benchmarks were triggered from this PR using the new tool: #14464 (comment)
Related Issues
Resolves #14435
Check List
[ ] Functionality includes testing.
[ ] API changes companion pull request created, if applicable.
[ ] Public documentation issue/PR created, if applicable.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.