Optimize global ordinal includes/excludes for prefix matching #14371

Merged
mch2 merged 7 commits into opensearch-project:main on Aug 20, 2024

Conversation

msfroh
Collaborator

@msfroh msfroh commented Jun 15, 2024

Description

If an aggregation specifies includes or excludes based on a regular expression, and the regular expression has a finite expansion followed by .*, then we can optimize the global ordinal filter.

Specifically, in this case, we can expand the regular expression into its finite set of matching prefixes, then include/exclude the range of global ordinals whose terms start with each prefix.
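
To make the idea concrete, here is a minimal Java sketch. It is not the code from this PR: the class and method names are hypothetical, and a sorted String array stands in for the terms dictionary, relying only on the fact that global ordinals are assigned in sorted term order. Under that assumption, all terms sharing a prefix occupy one contiguous ordinal range, so a regex such as (ap|ban).* can be handled by expanding it to the prefixes "ap" and "ban" and binary-searching for each range instead of testing every term against the regex.

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration only; names are not taken from OpenSearch.
class PrefixOrdinalFilter {

    // Smallest index i such that sortedTerms[i] >= key (or length if none).
    static int lowerBound(String[] sortedTerms, String key) {
        int lo = 0, hi = sortedTerms.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedTerms[mid].compareTo(key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // First index at or after `from` whose term does not start with prefix.
    // Valid because terms with the prefix form one contiguous run from `from`.
    static int endOfPrefixRange(String[] sortedTerms, String prefix, int from) {
        int lo = from, hi = sortedTerms.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedTerms[mid].startsWith(prefix)) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Builds the set of accepted ordinals from include/exclude prefixes.
    // An empty include list means "accept everything, then apply excludes".
    static BitSet acceptedOrdinals(String[] sortedTerms, List<String> includes, List<String> excludes) {
        BitSet accepted = new BitSet(sortedTerms.length);
        if (includes.isEmpty()) {
            accepted.set(0, sortedTerms.length);
        } else {
            for (String prefix : includes) {
                int start = lowerBound(sortedTerms, prefix);
                accepted.set(start, endOfPrefixRange(sortedTerms, prefix, start));
            }
        }
        for (String prefix : excludes) {
            int start = lowerBound(sortedTerms, prefix);
            accepted.clear(start, endOfPrefixRange(sortedTerms, prefix, start));
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Index in this sorted array plays the role of the global ordinal.
        String[] terms = { "apple", "apricot", "banana", "bandana", "cherry", "cranberry" };
        BitSet accepted = acceptedOrdinals(terms, List.of("ap", "ban"), List.of("band"));
        System.out.println(accepted); // prints {0, 1, 2}
    }
}
```

Each prefix costs two binary searches plus one range update on the bit set, so the work scales with the number of expanded prefixes rather than with the number of distinct terms.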

Related Issues

Resolves #14368

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Contributor

❌ Gradle check result for 05c8e3c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for 02b31a6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

github-actions bot commented Aug 5, 2024

❌ Gradle check result for 0f5d528: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

github-actions bot commented Aug 5, 2024

❕ Gradle check result for f6e1c96: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@harshavamsi
Contributor

#14289 is the flaky test

If an aggregation specifies includes or excludes based on a regular
expression, and the regular expression has a finite expansion followed
by .*, then we can optimize the global ordinal filter.

Specifically, in this case, we can expand the matching prefixes, then
include/exclude the range of global ordinals that start with each
prefix.

Signed-off-by: Michael Froh <[email protected]>
Signed-off-by: Michael Froh <[email protected]>
Signed-off-by: Michael Froh <[email protected]>
Updated the unit test to be functionally equivalent, but it covers
more of the regex logic.

Signed-off-by: Michael Froh <[email protected]>
Signed-off-by: Michael Froh <[email protected]>
Signed-off-by: Michael Froh <[email protected]>
Contributor

❌ Gradle check result for 1bc728a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

✅ Gradle check result for 1bc728a: SUCCESS

@mch2 mch2 merged commit 13163ab into opensearch-project:main Aug 20, 2024
37 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Aug 20, 2024
* Optimize global ordinal includes/excludes for prefix matching

If an aggregation specifies includes or excludes based on a regular
expression, and the regular expression has a finite expansion followed
by .*, then we can optimize the global ordinal filter.

Specifically, in this case, we can expand the matching prefixes, then
include/exclude the range of global ordinals that start with each
prefix.

Signed-off-by: Michael Froh <[email protected]>

* Add unit test

Signed-off-by: Michael Froh <[email protected]>

* Add changelog entry

Signed-off-by: Michael Froh <[email protected]>

* Improve test coverage

Updated the unit test to be functionally equivalent, but it covers
more of the regex logic.

Signed-off-by: Michael Froh <[email protected]>

* Improve test coverage

Signed-off-by: Michael Froh <[email protected]>

* Fix bug in exclude-only case with no doc values in segment

Signed-off-by: Michael Froh <[email protected]>

* Address comments from @mch2

Signed-off-by: Michael Froh <[email protected]>

---------

Signed-off-by: Michael Froh <[email protected]>
(cherry picked from commit 13163ab)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
mch2 pushed a commit that referenced this pull request Aug 20, 2024
#15324)

(cherry picked from commit 13163ab)

Signed-off-by: Michael Froh <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
wdongyu pushed a commit to wdongyu/OpenSearch that referenced this pull request Aug 22, 2024
…arch-project#14371)

Signed-off-by: Michael Froh <[email protected]>
shiv0408 added a commit to shiv0408/OpenSearch that referenced this pull request Sep 2, 2024
* Optimize global ordinal includes/excludes for prefix matching (opensearch-project#14371)

* Adding access to noSubMatches and noOverlappingMatches in Hyphenation… (opensearch-project#13895)

* Adding access to noSubMatches and noOverlappingMatches in HyphenationCompoundWordTokenFilter

Signed-off-by: Evan Kielley <[email protected]>

* Add Changelog Entry

Signed-off-by: Mohammad Hasnain Mohsin Rajan <[email protected]>

* test: add hyphenation decompounder tests

Signed-off-by: Mohammad Hasnain <[email protected]>

* test: refactor tests

Signed-off-by: Mohammad Hasnain <[email protected]>

* test: reformat test files

Signed-off-by: Mohammad Hasnain <[email protected]>

* chore: add changelog entry for 2.X

Signed-off-by: Mohammad Hasnain <[email protected]>

* chore: remove 3.x changelog

Signed-off-by: Mohammad Hasnain <[email protected]>

* chore: commonify settingsarr

Signed-off-by: Mohammad Hasnain <[email protected]>

* chore: commonify settingsarr

Signed-off-by: Mohammad Hasnain <[email protected]>

* chore: linting

Signed-off-by: Mohammad Hasnain <[email protected]>

---------

Signed-off-by: Evan Kielley <[email protected]>
Signed-off-by: Mohammad Hasnain Mohsin Rajan <[email protected]>
Signed-off-by: Mohammad Hasnain <[email protected]>
Co-authored-by: Evan Kielley <[email protected]>

* Add Settings related to Workload Management feature (opensearch-project#15028)

* add QueryGroup Service tests
Signed-off-by: Ruirui Zhang <[email protected]>

* add PR to changelog
Signed-off-by: Ruirui Zhang <[email protected]>

* change the test directory
Signed-off-by: Ruirui Zhang <[email protected]>

* modify comments to be more specific
Signed-off-by: Ruirui Zhang <[email protected]>

* add test coverage
Signed-off-by: Ruirui Zhang <[email protected]>

* remove QUERY_GROUP_RUN_INTERVAL_SETTING as we'll define it in QueryGroupService
Signed-off-by: Ruirui Zhang <[email protected]>

* address comments
Signed-off-by: Ruirui Zhang <[email protected]>

* Update affiliation for @nknize. (opensearch-project#15322)

Signed-off-by: dblock <[email protected]>

* Add log when download completes with file size (opensearch-project#15224)

Signed-off-by: Gaurav Bafna <[email protected]>

* Support Filtering on Large List encoded by Bitmap (version update) (opensearch-project#15352)

Signed-off-by: Andriy Redko <[email protected]>

* Add support for index level slice count setting (opensearch-project#15336)

Signed-off-by: Ganesh Ramadurai <[email protected]>

* Adding allowlist setting for ingest-useragent and ingest-geoip processors (opensearch-project#15325)

* Adding allowlist setting for user-agent, geo-ip and updated tests for ingest-common.

Signed-off-by: Sarat Vemulapalli <[email protected]>

* Remove duplicate test in ingest-common

Signed-off-by: Sarat Vemulapalli <[email protected]>

* Adding changelog

Signed-off-by: Sarat Vemulapalli <[email protected]>

---------

Signed-off-by: Sarat Vemulapalli <[email protected]>

* Add Delete QueryGroup API Logic (opensearch-project#14735)

* Add Delete QueryGroup API Logic
Signed-off-by: Ruirui Zhang <[email protected]>

* modify changelog
Signed-off-by: Ruirui Zhang <[email protected]>

* include comments from create pr
Signed-off-by: Ruirui Zhang <[email protected]>

* remove delete all
Signed-off-by: Ruirui Zhang <[email protected]>

* rebase and address comments
Signed-off-by: Ruirui Zhang <[email protected]>

* rebase
Signed-off-by: Ruirui Zhang <[email protected]>

* address comments
Signed-off-by: Ruirui Zhang <[email protected]>

* address comments
Signed-off-by: Ruirui Zhang <[email protected]>

* address comments
Signed-off-by: Ruirui Zhang <[email protected]>

* add UT coverage
Signed-off-by: Ruirui Zhang <[email protected]>

* [Star Tree] Lucene Abstractions for Star Tree File Formats  (opensearch-project#15278)

---------
Signed-off-by: Sarthak Aggarwal <[email protected]>

* [Star tree] Changes to handle derived metrics such as avg as part of star tree mapping (opensearch-project#15152)

---------
Signed-off-by: Bharathwaj G <[email protected]>

* relaxing the join validation for nodes which have only store disabled but only publication enabled

* relaxing the join validation for nodes which have only store disabled but only publication enabled

Signed-off-by: Rajiv Kumar Vaidyanathan <[email protected]>

---------

Signed-off-by: Michael Froh <[email protected]>
Signed-off-by: Evan Kielley <[email protected]>
Signed-off-by: Mohammad Hasnain Mohsin Rajan <[email protected]>
Signed-off-by: Mohammad Hasnain <[email protected]>
Signed-off-by: dblock <[email protected]>
Signed-off-by: Gaurav Bafna <[email protected]>
Signed-off-by: Andriy Redko <[email protected]>
Signed-off-by: Ganesh Ramadurai <[email protected]>
Signed-off-by: Sarat Vemulapalli <[email protected]>
Signed-off-by: Rajiv Kumar Vaidyanathan <[email protected]>
Co-authored-by: Michael Froh <[email protected]>
Co-authored-by: Mohammad Hasnain Mohsin Rajan <[email protected]>
Co-authored-by: Evan Kielley <[email protected]>
Co-authored-by: Ruirui Zhang <[email protected]>
Co-authored-by: Daniel (dB.) Doubrovkine <[email protected]>
Co-authored-by: Gaurav Bafna <[email protected]>
Co-authored-by: Andriy Redko <[email protected]>
Co-authored-by: Ganesh Krishna Ramadurai <[email protected]>
Co-authored-by: Sarat Vemulapalli <[email protected]>
Co-authored-by: Sarthak Aggarwal <[email protected]>
Co-authored-by: Bharathwaj G <[email protected]>
Co-authored-by: Rajiv Kumar Vaidyanathan <[email protected]>
akolarkunnu pushed a commit to akolarkunnu/OpenSearch that referenced this pull request Sep 10, 2024
…arch-project#14371)

Signed-off-by: Michael Froh <[email protected]>
Labels
  • backport 2.x: Backport to 2.x branch
  • enhancement: Enhancement or improvement to existing feature or request
  • Search:Aggregations
  • v2.16.0: Issues and PRs related to version 2.16.0
  • v2.17.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] Aggregation include/exclude should support faster filtering on prefixes
4 participants