Is your feature request related to a problem?
Non-linear scoring functions, particularly gradient-boosted decision trees, are a technique for combining scores from features that have different magnitudes and score distributions. However, the sltr query currently functions like a bool query with a minimum_should_match of 0 plus a custom scoring function, which means it cannot conveniently be used in the initial query and is currently encouraged only for use in rescore blocks.
For example given the following featureset definition:
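(The featureset definition itself is missing from this copy of the issue; the following is an illustrative reconstruction based on the expanded bool query shown below, so the feature names, the mustache templating of the vector parameter, and the exact store API shape are assumptions, not the original snippet.)

```json
{
  "featureset": {
    "features": [
      {
        "name": "title_match",
        "params": ["query_text"],
        "template_language": "mustache",
        "template": { "match": { "title": "{{query_text}}" } }
      },
      {
        "name": "description_match",
        "params": ["query_text"],
        "template_language": "mustache",
        "template": { "match": { "description": "{{query_text}}" } }
      },
      {
        "name": "description_knn",
        "params": ["query_embedding"],
        "template_language": "mustache",
        "template": { "knn": { "description_vector": { "k": 10, "vector": "{{query_embedding}}" } } }
      }
    ]
  }
}
```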
and a model example_model which was created using the above featureset, the following sltr query:
{
  "sltr": {
    "model": "example_model",
    "params": {
      "query_text": "the text query",
      "query_embedding": [1.0, 0.4, ...]
    }
  }
}
can conceptually be thought of as:
{
  "bool": {
    "filter": {
      "match_all": {}
    },
    "should": [
      {
        "match": {
          "title": "the text query"
        }
      },
      {
        "match": {
          "description": "the text query"
        }
      },
      {
        "knn": {
          "description_vector": {
            "k": 10,
            "vector": [1.0, 0.4, ...]
          }
        }
      }
    ],
    "minimum_should_match": 0,
    // plus also use a special scoring function defined by example_model
  }
}
What solution would you like?
It would be great if a minimum number of the features used by the model could be required to match, so that the following sltr query:
{
  "sltr": {
    "model": "example_model",
    "params": {
      "query_text": "the text query",
      "query_embedding": [1.0, 0.4, ...]
    },
    "minimum_should_match": 1
  }
}
would translate to roughly the following:
{
  "bool": {
    "should": [
      {
        "match": {
          "title": "the text query"
        }
      },
      {
        "match": {
          "description": "the text query"
        }
      },
      {
        "knn": {
          "description_vector": {
            "k": 10,
            "vector": [1.0, 0.4, ...]
          }
        }
      }
    ],
    "minimum_should_match": 1,
    // plus also use a special scoring function defined by example_model
  }
}
What alternatives have you considered?
It's possible to work around this by wrapping the sltr query in a surrounding bool query and duplicating the features as filters in that bool query:
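(The workaround example is also missing from this copy of the issue; the sketch below is an illustrative reconstruction that duplicates the same three feature queries as a filter alongside the sltr query.)

```json
{
  "bool": {
    "must": {
      "sltr": {
        "model": "example_model",
        "params": {
          "query_text": "the text query",
          "query_embedding": [1.0, 0.4, ...]
        }
      }
    },
    "filter": {
      "bool": {
        "should": [
          { "match": { "title": "the text query" } },
          { "match": { "description": "the text query" } },
          { "knn": { "description_vector": { "k": 10, "vector": [1.0, 0.4, ...] } } }
        ],
        "minimum_should_match": 1
      }
    }
  }
}
```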
However, this has the problem that it executes the feature queries twice, and it requires duplicating the definitions and keeping the featureset and the query in sync.
The best places to start looking are RankerQuery.RankerWeight#scorer and RankerQuery.DisjunctionDISI#advance; compare these to how the equivalent functionality works in the bool query. Likely, making this work means inspecting subIteratorsPriorityQueue when advance is called and counting how many sub-iterators are positioned on the next doc ID, allowing the iterator to skip over scoring documents that don't match enough features.
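A minimal, self-contained sketch of that counting idea follows. It is illustrative only: it uses plain sorted int[] doc-ID lists rather than Lucene's DocIdSetIterator and DisiPriorityQueue, and every name in it is an assumption, not the plugin's actual API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not actual RankerQuery code): advance a disjunction
// of sorted doc-ID iterators to the next doc that at least minShouldMatch
// sub-iterators match, which is what a minimum_should_match-aware
// DisjunctionDISI#advance would need to do.
class MinShouldMatchDisjunction {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final List<int[]> iterators; // each array: ascending doc IDs
    private final int[] positions;       // current index into each array
    private final int minShouldMatch;

    MinShouldMatchDisjunction(List<int[]> iterators, int minShouldMatch) {
        this.iterators = iterators;
        this.positions = new int[iterators.size()];
        this.minShouldMatch = minShouldMatch;
    }

    // Returns the smallest doc ID >= target matched by at least
    // minShouldMatch sub-iterators, or NO_MORE_DOCS if none remains.
    int advance(int target) {
        while (true) {
            int candidate = NO_MORE_DOCS;
            // Move every sub-iterator to its first doc >= target and track
            // the smallest doc any sub-iterator is now positioned on.
            for (int i = 0; i < iterators.size(); i++) {
                int[] docs = iterators.get(i);
                while (positions[i] < docs.length && docs[positions[i]] < target) {
                    positions[i]++;
                }
                if (positions[i] < docs.length) {
                    candidate = Math.min(candidate, docs[positions[i]]);
                }
            }
            if (candidate == NO_MORE_DOCS) {
                return NO_MORE_DOCS;
            }
            // Count how many sub-iterators sit exactly on the candidate doc.
            int matches = 0;
            for (int i = 0; i < iterators.size(); i++) {
                int[] docs = iterators.get(i);
                if (positions[i] < docs.length && docs[positions[i]] == candidate) {
                    matches++;
                }
            }
            if (matches >= minShouldMatch) {
                return candidate; // enough features match: score this doc
            }
            target = candidate + 1; // too few matches: skip without scoring
        }
    }
}
```

With three features over docs {1,3,5}, {3,4,5}, and {5,6} and minShouldMatch = 2, advance(0) skips doc 1 (one match) and lands on doc 3 (two matches); with minShouldMatch = 3 it lands on doc 5. A production version would keep the sub-iterators in a priority queue ordered by doc ID instead of scanning all of them each step.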
Do you have any additional context?
This is the equivalent of feature request o19s/elasticsearch-learning-to-rank#476, but for the OpenSearch fork.