Hybrid query has high latency compared to other compound queries such as the Boolean query. Based on results collected for 2.13, and depending on the dataset and the exact query, it may be up to 10 times slower than Bool. Another motivation for this issue is the degradation in hybrid query performance compared to its initial release in OpenSearch 2.11.
The following are the goals for this work:
- Bring the performance of hybrid query to a level comparable with Bool query:
  - For small datasets and sub-sets, it should match Bool with a deviation within 20% at p90
  - For large datasets (10M+ documents), and when sub-queries return large sub-sets of documents (1M+ documents in a sub-query result), hybrid query should be no slower than 2x Bool query
  - Multiple sub-queries may add overhead of no more than 20% of the overall query time at p90
- Reach the level of performance of the hybrid query released in 2.11
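To make these goals checkable in a benchmark harness, they can be expressed as simple p90 comparisons. This is a minimal sketch under stated assumptions: `LatencyTargets` and its methods are hypothetical (not part of the plugin or any benchmark tool), p90 uses the nearest-rank method, "within 20%" is read as "hybrid p90 at most 1.2x Bool p90", and sub-query overhead is measured as the p90 difference between a multi-sub-query and a single-sub-query hybrid query. The latency arrays in `main` are illustrative, not measured results.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not part of the neural-search plugin) that checks
 * benchmark latencies against the hybrid query performance goals.
 */
public class LatencyTargets {

    /** p-th percentile (p in (0, 100]) using the nearest-rank method. */
    public static double percentile(double[] latenciesMs, double p) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        // Nearest rank: ceil(p * n / 100), converted to a 0-based index.
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Small datasets: hybrid p90 at most 20% above Bool p90 (assumed reading of "within 20%"). */
    public static boolean meetsSmallDatasetGoal(double hybridP90, double boolP90) {
        return hybridP90 <= 1.2 * boolP90;
    }

    /** Large datasets (10M+ docs, 1M+ docs per sub-query): hybrid p90 at most 2x Bool p90. */
    public static boolean meetsLargeDatasetGoal(double hybridP90, double boolP90) {
        return hybridP90 <= 2.0 * boolP90;
    }

    /** Extra sub-queries add at most 20% of the overall (multi-sub-query) p90 time. */
    public static boolean meetsSubQueryOverheadGoal(double multiSubQueryP90, double singleSubQueryP90) {
        double overhead = multiSubQueryP90 - singleSubQueryP90;
        return overhead <= 0.2 * multiSubQueryP90;
    }

    public static void main(String[] args) {
        // Illustrative numbers only, not measured results.
        double[] boolMs = {10, 11, 12, 12, 13, 13, 14, 15, 16, 18};
        double[] hybridMs = {12, 13, 13, 14, 15, 15, 16, 17, 19, 21};
        double boolP90 = percentile(boolMs, 90);
        double hybridP90 = percentile(hybridMs, 90);
        System.out.printf("bool p90=%.1f ms, hybrid p90=%.1f ms%n", boolP90, hybridP90);
        System.out.println("small-dataset goal met: " + meetsSmallDatasetGoal(hybridP90, boolP90));
    }
}
```

Nearest-rank was chosen because it always returns an observed latency, which keeps the comparison easy to audit against raw benchmark output; an interpolating percentile would work equally well if the benchmark tool already reports one.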
There have been some GitHub issues in the past related to the same problem, e.g. #281. In addition, based on analysis of the source code and some profiling, I can think of the following list of items:
- Don't execute the core `TopDocsCollector`, as it consumes compute and its results are ignored
- Optimize the plugin code for better performance: check for sub-optimal initializations, loops, type conversions, etc.
- When some sub-queries are rewritten to the same Lucene form, execute only one of them and copy the scores
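The last item (execute once, copy scores) can be illustrated with a minimal sketch. It assumes each sub-query has already been rewritten and that rewritten forms can be compared with `equals()`/`hashCode()`, as Lucene `Query` objects are designed to be. `SubQueryDedup` and `scoreWithDedup` are hypothetical names, the generic `Q` stands in for a rewritten query, and a sub-query's result is simplified to a flat score array; the actual plugin collects scores per document, so this shows only the deduplication idea, not the plugin's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: sub-queries whose rewritten forms are equal are executed once,
 * and the resulting scores are copied to every sub-query slot sharing that form.
 * Q stands in for a rewritten Lucene Query and relies only on equals()/hashCode().
 */
public class SubQueryDedup {

    public static <Q> List<float[]> scoreWithDedup(List<Q> rewrittenSubQueries,
                                                   Function<Q, float[]> execute) {
        // Maps each distinct rewritten form to the scores computed for it.
        Map<Q, float[]> cache = new HashMap<>();
        List<float[]> perSubQueryScores = new ArrayList<>(rewrittenSubQueries.size());
        for (Q query : rewrittenSubQueries) {
            // Executes the query only the first time this form is seen.
            float[] scores = cache.computeIfAbsent(query, execute);
            // Copy so each sub-query slot can later be normalized independently.
            perSubQueryScores.add(scores.clone());
        }
        return perSubQueryScores;
    }

    public static void main(String[] args) {
        int[] executions = {0};
        // Strings stand in for rewritten queries; "title:foo" appears twice.
        List<String> subQueries = List.of("title:foo", "body:bar", "title:foo");
        List<float[]> scores = scoreWithDedup(subQueries, q -> {
            executions[0]++;
            return new float[] {q.length(), 1.0f};
        });
        System.out.println("executions: " + executions[0] + ", result slots: " + scores.size());
    }
}
```

With three sub-queries where the first and third share a form, `execute` runs twice and three score arrays are returned. Cloning each array keeps per-sub-query scores independent if a later step, such as score normalization, modifies them in place.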
GitHub issues for each child item: