Add support for local cache in hybrid query #663

martin-gaievski (Member) commented Apr 3, 2024

Description

This PR relaxes the validation of fetch and query results for the case where the request cache is enabled and the index has a single shard. The current code throws an exception in that scenario because the cached fetch and query results differ in size.
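To make the relaxed check concrete, here is a minimal Python sketch of the idea. It is not the Java code changed in this PR: the function name, its parameters, and the exception class are all hypothetical, and it only models the behavior described above (tolerate a size mismatch when the request cache is enabled on a single-shard index, fail otherwise).

```python
class ResultSizeMismatch(Exception):
    """Raised when query-phase and fetch-phase results disagree in size."""


def validate_fetch_matches_query(query_doc_ids, fetch_doc_ids, *,
                                 shard_count, request_cache_enabled):
    """Hypothetical model of the relaxed validation.

    Returns True when the sizes match, False when a mismatch is tolerated,
    and raises ResultSizeMismatch in every other case.
    """
    if len(query_doc_ids) == len(fetch_doc_ids):
        return True
    # Assumed relaxation: cached results on a single shard may differ in size.
    if shard_count == 1 and request_cache_enabled:
        return False
    raise ResultSizeMismatch(
        f"query returned {len(query_doc_ids)} docs, "
        f"fetch returned {len(fetch_doc_ids)}"
    )
```

With this shape, the caller can still tell a tolerated mismatch (False) apart from a clean match (True) and decide how to proceed.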

In addition to the new integration tests, I ran a few scenarios manually. The example below is similar to the one reported in the original GitHub issue:

Create an index with keyword and integer fields, then ingest the following 8 documents:

POST /_bulk

{ "index": { "_index": "my-nlp-index" } }
{ "category": "permission", "doc_keyword": "workable", "doc_index": 4976, "doc_price": 100}
{ "index": { "_index": "my-nlp-index" } }
{ "category": "sister", "doc_keyword": "angry", "doc_index": 2231, "doc_price": 200 }
{ "index": { "_index": "my-nlp-index" } }
{ "category": "hair", "doc_keyword": "likeable", "doc_price": 25 }
{ "index": { "_index": "my-nlp-index" } }
{ "category": "editor", "doc_index": 9871, "doc_price": 30 }
{ "index": { "_index": "my-nlp-index" } }
{ "category": "statement", "doc_keyword": "entire", "doc_index": 8242, "doc_price": 350  } 
{ "index": { "_index": "my-nlp-index" } }
{ "category": "statement", "doc_keyword": "idea", "doc_index": 5212, "doc_price": 200  } 
{ "index": { "_index": "my-nlp-index" } }
{ "category": "editor", "doc_keyword": "bubble", "doc_index": 1298, "doc_price": 130 } 
{ "index": { "_index": "my-nlp-index" } }
{ "category": "editor", "doc_keyword": "bubble", "doc_index": 521, "doc_price": 75  } 
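For anyone reproducing this from a script, the bulk body above can be generated rather than typed by hand. The sketch below uses only the standard library; `build_bulk_body` is a hypothetical helper name, and it only builds the newline-delimited payload (sending it to `/_bulk` is left to whatever HTTP client you use). The Bulk API expects the body to end with a newline.

```python
import json

# The 8 documents from the example above.
DOCS = [
    {"category": "permission", "doc_keyword": "workable", "doc_index": 4976, "doc_price": 100},
    {"category": "sister", "doc_keyword": "angry", "doc_index": 2231, "doc_price": 200},
    {"category": "hair", "doc_keyword": "likeable", "doc_price": 25},
    {"category": "editor", "doc_index": 9871, "doc_price": 30},
    {"category": "statement", "doc_keyword": "entire", "doc_index": 8242, "doc_price": 350},
    {"category": "statement", "doc_keyword": "idea", "doc_index": 5212, "doc_price": 200},
    {"category": "editor", "doc_keyword": "bubble", "doc_index": 1298, "doc_price": 130},
    {"category": "editor", "doc_keyword": "bubble", "doc_index": 521, "doc_price": 75},
]


def build_bulk_body(index_name, docs):
    """Return an NDJSON bulk payload: one action line plus one source line per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The trailing newline is required by the Bulk API.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body("my-nlp-index", DOCS)
```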

Query 1

GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline&request_cache=true&preference=_local
{
    "query": {
        "hybrid": {
            "queries": [
                {
                    "match": {
                        "doc_keyword": "idea"
                    }
                },
                {
                    "range": {
                        "doc_index": {
                            "gte": 20,
                            "lte": 9000
                        }
                    }
                }
            ]
        }
    }
}

Results after the first execution:

{
    "took": 134,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 6,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "my-nlp-index",
                "_id": "agNKpY4ByS1rY5UY9Bww",
                "_score": 1.0,
                "_source": {
                    "category": "statement",
                    "doc_keyword": "idea",
                    "doc_index": 5212,
                    "doc_price": 200
                }
            },
            {
                "_index": "my-nlp-index",
                "_id": "ZQNKpY4ByS1rY5UY9Bwv",
                "_score": 0.5,
                "_source": {
                    "category": "permission",
                    "doc_keyword": "workable",
                    "doc_index": 4976,
                    "doc_price": 100
                }
            },
            {
                "_index": "my-nlp-index",
                "_id": "ZgNKpY4ByS1rY5UY9Bww",
                "_score": 0.5,
                "_source": {
                    "category": "sister",
                    "doc_keyword": "angry",
                    "doc_index": 2231,
                    "doc_price": 200
                }
            },
            {
                "_index": "my-nlp-index",
                "_id": "aQNKpY4ByS1rY5UY9Bww",
                "_score": 0.5,
                "_source": {
                    "category": "statement",
                    "doc_keyword": "entire",
                    "doc_index": 8242,
                    "doc_price": 350
                }
            },
            {
                "_index": "my-nlp-index",
                "_id": "awNKpY4ByS1rY5UY9Bww",
                "_score": 0.5,
                "_source": {
                    "category": "editor",
                    "doc_keyword": "bubble",
                    "doc_index": 1298,
                    "doc_price": 130
                }
            },
            {
                "_index": "my-nlp-index",
                "_id": "bANKpY4ByS1rY5UY9Bww",
                "_score": 0.5,
                "_source": {
                    "category": "editor",
                    "doc_keyword": "bubble",
                    "doc_index": 521,
                    "doc_price": 75
                }
            }
        ]
    }
}

Results of the second execution of the same query. Previously, this request resulted in a 500 error response:

{
    "took": 22,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 6,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "my-nlp-index",
                "_id": "agNKpY4ByS1rY5UY9Bww",
                "_score": 1.0,
                "_source": {
                    "category": "statement",
                    "doc_keyword": "idea",
                    "doc_index": 5212,
                    "doc_price": 200
                }
            },
            {
                "_index": "my-nlp-index",
                "_id": "ZQNKpY4ByS1rY5UY9Bwv",
                "_score": 0.5,
                "_source": {
                    "category": "permission",
                    "doc_keyword": "workable",
                    "doc_index": 4976,
                    "doc_price": 100
                }
            },
            {
                "_index": "my-nlp-index",
                "_id": "ZgNKpY4ByS1rY5UY9Bww",
                "_score": 0.5,
                "_source": {
                    "category": "sister",
                    "doc_keyword": "angry",
                    "doc_index": 2231,
                    "doc_price": 200
                }
            },
            {
                "_index": "my-nlp-index",
                "_id": "aQNKpY4ByS1rY5UY9Bww",
                "_score": 0.5,
                "_source": {
                    "category": "statement",
                    "doc_keyword": "entire",
                    "doc_index": 8242,
                    "doc_price": 350
                }
            },
            {
                "_index": "my-nlp-index",
                "_id": "awNKpY4ByS1rY5UY9Bww",
                "_score": 0.5,
                "_source": {
                    "category": "editor",
                    "doc_keyword": "bubble",
                    "doc_index": 1298,
                    "doc_price": 130
                }
            },
            {
                "_index": "my-nlp-index",
                "_id": "bANKpY4ByS1rY5UY9Bww",
                "_score": 0.5,
                "_source": {
                    "category": "editor",
                    "doc_keyword": "bubble",
                    "doc_index": 521,
                    "doc_price": 75
                }
            }
        ]
    }
}
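The two responses above should agree on everything except `took`. A small Python sketch of that comparison follows; the helper names are hypothetical, and the sample responses are trimmed to the first two hits from the output above to keep the example short.

```python
def hits_signature(response):
    """Ordered (id, score) pairs, which is what a cached replay must preserve."""
    return [(hit["_id"], hit["_score"]) for hit in response["hits"]["hits"]]


def same_results(first, second):
    """Compare two search responses while ignoring timing fields such as 'took'."""
    return (
        first["hits"]["total"] == second["hits"]["total"]
        and first["hits"]["max_score"] == second["hits"]["max_score"]
        and hits_signature(first) == hits_signature(second)
    )


def trimmed_response(took):
    # First two hits of Query 1, copied from the responses above.
    return {
        "took": took,
        "hits": {
            "total": {"value": 6, "relation": "eq"},
            "max_score": 1.0,
            "hits": [
                {"_id": "agNKpY4ByS1rY5UY9Bww", "_score": 1.0},
                {"_id": "ZQNKpY4ByS1rY5UY9Bwv", "_score": 0.5},
            ],
        },
    }


first_run = trimmed_response(took=134)
second_run = trimmed_response(took=22)
print(same_results(first_run, second_run))  # prints True
```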

Query 2, which differs only in the sub-queries of the hybrid query:

{
    "query": {
        "hybrid": {
            "queries": [
                {
                    "range": {
                        "doc_index": {
                            "gte": 20,
                            "lte": 100
                        }
                    }
                },
                {
                    "bool": {
                        "should": [
                            {
                                "match": {
                                    "doc_keyword": "likeable"
                                }
                            },
                            {
                                "term": {
                                    "category": "statement"
                                }
                            }
                        ]
                    }
                }
            ]
        }
    }
}

Results of the first execution:

{
    "took": 6,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": 0.5,
        "hits": [
            {
                "_index": "my-nlp-index",
                "_id": "ZwNKpY4ByS1rY5UY9Bww",
                "_score": 0.5,
                "_source": {
                    "category": "hair",
                    "doc_keyword": "likeable",
                    "doc_price": 25
                }
            },
            {
                "_index": "my-nlp-index",
                "_id": "aQNKpY4ByS1rY5UY9Bww",
                "_score": 5.0E-4,
                "_source": {
                    "category": "statement",
                    "doc_keyword": "entire",
                    "doc_index": 8242,
                    "doc_price": 350
                }
            },
            {
                "_index": "my-nlp-index",
                "_id": "agNKpY4ByS1rY5UY9Bww",
                "_score": 5.0E-4,
                "_source": {
                    "category": "statement",
                    "doc_keyword": "idea",
                    "doc_index": 5212,
                    "doc_price": 200
                }
            }
        ]
    }
}

Results of the second execution. Before the fix, the system returned an error response with code 500:

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": 0.5,
        "hits": [
            {
                "_index": "my-nlp-index",
                "_id": "ZwNKpY4ByS1rY5UY9Bww",
                "_score": 0.5,
                "_source": {
                    "category": "hair",
                    "doc_keyword": "likeable",
                    "doc_price": 25
                }
            },
            {
                "_index": "my-nlp-index",
                "_id": "aQNKpY4ByS1rY5UY9Bww",
                "_score": 5.0E-4,
                "_source": {
                    "category": "statement",
                    "doc_keyword": "entire",
                    "doc_index": 8242,
                    "doc_price": 350
                }
            },
            {
                "_index": "my-nlp-index",
                "_id": "agNKpY4ByS1rY5UY9Bww",
                "_score": 5.0E-4,
                "_source": {
                    "category": "statement",
                    "doc_keyword": "idea",
                    "doc_index": 5212,
                    "doc_price": 200
                }
            }
        ]
    }
}
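The scores in these responses (1.0, 0.5, and 5.0E-4) can be reproduced with a short calculation. The definition of `nlp-search-pipeline` is not shown in this PR, so the sketch below assumes min-max normalization followed by an arithmetic mean over the two sub-queries, with a small floor of 0.001 for the lowest-scoring document. The floor, the raw scores, and the short document labels are assumptions chosen for illustration.

```python
FLOOR = 0.001  # assumed lower bound for a normalized score


def min_max(scores):
    """Min-max normalize one sub-query's scores; equal scores all map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: max((s - lo) / (hi - lo), FLOOR) for doc, s in scores.items()}


def arithmetic_mean(per_query):
    """Average normalized scores over all sub-queries; a miss counts as 0.0."""
    docs = set().union(*per_query)
    return {doc: sum(q.get(doc, 0.0) for q in per_query) / len(per_query)
            for doc in docs}


# Query 1: the match hits only "idea"; the range hits six docs with equal scores.
q1_match = {"idea": 1.5}  # hypothetical raw score
q1_range = {d: 1.0 for d in ["idea", "workable", "angry", "entire", "bubble-1298", "bubble-521"]}
q1 = arithmetic_mean([min_max(q1_match), min_max(q1_range)])

# Query 2: the range (20..100) matches nothing; the bool query hits three docs.
q2_range = {}
q2_bool = {"likeable": 1.2, "entire": 0.4, "idea": 0.4}  # hypothetical raw scores
q2 = arithmetic_mean([min_max(q2_range), min_max(q2_bool)])

print(q1["idea"], q1["workable"])      # 1.0 0.5
print(q2["likeable"], q2["entire"])    # 0.5 0.0005
```

Under these assumptions, the 5.0E-4 scores in Query 2 are the floored minimum (0.001) averaged with a zero from the range sub-query, which matched no documents.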

Issues Resolved

#606

Check List

  • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@martin-gaievski added the backport 2.x, Bug Fixes, and v2.14.0 labels on Apr 3, 2024
@martin-gaievski force-pushed the fix_local_cache_flag_for_hybrid_query branch from 60229e7 to 29dfb58 on April 3, 2024 03:49

codecov bot commented Apr 3, 2024

Codecov Report

Attention: Patch coverage is 33.33333%, with 4 lines in your changes missing coverage. Please review.

Project coverage is 84.04%. Comparing base (50a6dcf) to head (7259575).

Files | Patch % | Lines
...arch/processor/NormalizationProcessorWorkflow.java | 33.33% | 0 Missing and 4 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #663      +/-   ##
============================================
- Coverage     84.19%   84.04%   -0.15%     
- Complexity      743      744       +1     
============================================
  Files            59       59              
  Lines          2309     2313       +4     
  Branches        370      374       +4     
============================================
  Hits           1944     1944              
  Misses          214      214              
- Partials        151      155       +4     


@martin-gaievski force-pushed the fix_local_cache_flag_for_hybrid_query branch 2 times, most recently from dbf2af4 to 32079ec on April 3, 2024 16:55
@martin-gaievski force-pushed the fix_local_cache_flag_for_hybrid_query branch from 32079ec to 7259575 on April 3, 2024 17:42
@martin-gaievski
Member Author

The BWC checks will keep failing until all dependent repos are switched to the 2.14 snapshot version, in particular knn, ml-commons, and common-utils. PR #653 in the neural-search repo also needs to be merged.

@martin-gaievski martin-gaievski merged commit cc6a6b2 into opensearch-project:main Apr 3, 2024
84 of 92 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 3, 2024
Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit cc6a6b2)
martin-gaievski added a commit that referenced this pull request Apr 3, 2024
Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit cc6a6b2)

Co-authored-by: Martin Gaievski <[email protected]>
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 8, 2024
Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit cc6a6b2)
navneet1v pushed a commit that referenced this pull request Apr 8, 2024
Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit cc6a6b2)

Co-authored-by: Martin Gaievski <[email protected]>
Labels: backport 2.x, backport 2.13, Bug Fixes, v2.14.0