Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removed map of subquery to subquery index in favor of storing index as part of DISI wrapper to improve hybrid query latencies by 20% #711

Conversation

martin-gaievski
Copy link
Member

@martin-gaievski martin-gaievski commented Apr 26, 2024

In this PR we're continue to improve latency of hybrid query as part of meta issue #705.

Based on following flamegraph next area that may give high boost is a lookup of sub-query index by the query. That is needed to get index of sub-query and store score of that sub-query for one document (code ref).

profile_hybqr_15_04_no_concurrent

Most time (~28%) is taken by store and lookup of the index based on query as a key. Depending on the exact sub-query calculation of its hash code can be slow (hybrid query works with any type of OpenSearch query). That is a problem on large datasets as this is done for each doc by each sub-query.

We can avoid creation and usage of that query to index map by storing sub-query index at time we create collection of DISIWrapers.

As per benchmark results that gives about 20% performance boost. I've run it on 2.13 using noaa OSB workload, all times are in ms:

Before the change (baseline)

One sub-query that selects 11M documents

Bool: p50 88.2863 | p90 103.777
Hybrid: p50 299.427 | p90 319.797

One sub-query that selects 1.6K documents

Bool: p50 92.7222 | p90 98.6847
Hybrid: p50 94.5511 | p90 111.645

Three sub-query that select 15M documents

Bool: p50 98.0301 | p90 108.948
Hybrid: p50 475.319 | p90 515.999

After the change:

One sub-query that selects 11M documents

Bool: p50 97.9306 | p90 116.299
Hybrid: p50 228.696 | p90 249.665

One sub-query that selects 1.6K documents

Bool: p50 87.3152 | p90 89.3061
Hybrid: p50 89.9654 | p90 92.349

Three sub-query that select 15M documents

Bool: p50 97.9891 | p90 114.396
Hybrid: p50 353.631 | p90 377.527

Issues Resolved

#705

Check List

  • New functionality includes testing.
    • All tests pass
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@martin-gaievski martin-gaievski added backport 2.x Label will add auto workflow to backport PR to 2.x branch v2.14.0 hybrid search hybrid query performance optimization labels Apr 26, 2024
…s part of disi wrapper to improve hybrid query latencies by 20%

Signed-off-by: Martin Gaievski <[email protected]>
@martin-gaievski martin-gaievski force-pushed the save_subquery_index_in_disiwrapper_instead_computing_maptoindex branch from a1bb0ce to cc89989 Compare April 26, 2024 23:21
@navneet1v navneet1v self-requested a review April 26, 2024 23:21
Copy link
Member

@VijayanB VijayanB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@martin-gaievski martin-gaievski merged commit a3bdde5 into opensearch-project:main Apr 27, 2024
75 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 27, 2024
…s part of DISI wrapper to improve hybrid query latencies by 20% (#711)

* Removed map of subquery to subquery index in favor of storing index as part of disi wrapper to improve hybrid query latencies by 20%

Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit a3bdde5)
martin-gaievski added a commit that referenced this pull request Apr 29, 2024
…s part of DISI wrapper to improve hybrid query latencies by 20% (#711) (#712)

* Removed map of subquery to subquery index in favor of storing index as part of disi wrapper to improve hybrid query latencies by 20%

Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit a3bdde5)

Co-authored-by: Martin Gaievski <[email protected]>
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 29, 2024
…s part of DISI wrapper to improve hybrid query latencies by 20% (#711)

* Removed map of subquery to subquery index in favor of storing index as part of disi wrapper to improve hybrid query latencies by 20%

Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit a3bdde5)
martin-gaievski added a commit that referenced this pull request Apr 29, 2024
…s part of DISI wrapper to improve hybrid query latencies by 20% (#711) (#715)

* Removed map of subquery to subquery index in favor of storing index as part of disi wrapper to improve hybrid query latencies by 20%

Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit a3bdde5)

Co-authored-by: Martin Gaievski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants