[FEATURE] sltr queries with minimum_should_match features #20

jhinch-at-atlassian-com · 2023-11-03T21:40:33Z

Is your feature request related to a problem?

Non-linear scoring functions, particularly gradient-boosted decision trees, can be used to combine scores from features that have different magnitudes and score distributions. However, an sltr query currently behaves like a bool query with a minimum_should_match of 0 and a custom scoring function. This means it cannot be used conveniently within the initial query, and its use is currently encouraged only in rescore blocks.

For example given the following featureset definition:

{
  "featureset": {
    "features": [
      {
        "name": "title_text_match",
        "params": [
          "query_text"
        ],
        "template_language": "mustache",
        "template": {
          "match": {
            "title": "{{query_text}}"
          }
        }
      },
      {
        "name": "description_text_match",
        "params": [
          "query_text"
        ],
        "template_language": "mustache",
        "template": {
          "match": {
            "description": "{{query_text}}"
          }
        }
      },
      {
        "name": "description_knn_match",
        "params": [
          "query_embedding"
        ],
        "template_language": "mustache",
        "template": "{\"knn\":{\"description_vector\":{\"k\":10,\"vector\":{{#toJson}}query_embedding{{/toJson}}}}}"
      }
    ]
  }
}

and a model example_model which was created using the above featureset, the following sltr query:

{
  "sltr": {
    "model": "example_model",
    "params": {
      "query_text": "the text query",
      "query_embedding": [1.0, 0.4, ...]
    }
  }
}

can be thought of conceptually as:

{
  "bool": {
    "filter": {
      "match_all": {}
    },
    "should": [
      {
        "match": {
          "title": "the text query"
        }
      },
      {
        "match": {
          "description": "the text query"
        }
      },
      {
        "knn": {
          "description_vector": {
            "k": 10,
            "vector": [1.0, 0.4, ...]
          }
        }
      }
    ],
    "minimum_should_match": 0,
    // plus also use a special scoring function defined by example_model
  }
}

What solution would you like?

It would be great if sltr could require that a minimum number of the model's features match, so that the sltr query:

{
  "sltr": {
    "model": "example_model",
    "params": {
      "query_text": "the text query",
      "query_embedding": [1.0, 0.4, ...]
    },
    "minimum_should_match": 1
  }
}

would translate roughly to the following:

{
  "bool": {
    "should": [
      {
        "match": {
          "title": "the text query"
        }
      },
      {
        "match": {
          "description": "the text query"
        }
      },
      {
        "knn": {
          "description_vector": {
            "k": 10,
            "vector": [1.0, 0.4, ...]
          }
        }
      }
    ],
    "minimum_should_match": 1,
    // plus also use a special scoring function defined by example_model
  }
}
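To make the intended matching rule concrete, here is a minimal Python sketch. It is only an illustration of the semantics requested above, not the plugin's code: the function name `sltr_matches` and the convention that a feature which did not match is represented as `None` are assumptions made for this example.

```python
from typing import Dict, Optional


def sltr_matches(feature_scores: Dict[str, Optional[float]],
                 minimum_should_match: int = 0) -> bool:
    """Return True if enough features matched the document.

    feature_scores maps a feature name to its score, or to None when
    the feature's query did not match (an assumption of this sketch).
    """
    matched = sum(1 for score in feature_scores.values() if score is not None)
    return matched >= minimum_should_match


# A document that matches none of the three features from the featureset.
no_match = {
    "title_text_match": None,
    "description_text_match": None,
    "description_knn_match": None,
}
# A document that matches only the title feature.
title_only = dict(no_match, title_text_match=2.3)

# Current behaviour (minimum_should_match of 0): every document is a hit.
print(sltr_matches(no_match))
# Requested behaviour with minimum_should_match of 1.
print(sltr_matches(no_match, minimum_should_match=1))
print(sltr_matches(title_only, minimum_should_match=1))
```

With a threshold of 0 the document that matches no features is still returned and scored, which is why sltr is awkward to use in the initial query.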

What alternatives have you considered?

It's possible to work around this by wrapping the sltr query in a bool query and duplicating the features as filters in that bool query:

{
  "bool": {
    "filter": [
      {
        "match": {
          "title": "the text query"
        }
      },
      {
        "match": {
          "description": "the text query"
        }
      },
      {
        "knn": {
          "description_vector": {
            "k": 10,
            "vector": [1.0, 0.4, ...]
          }
        }
      }
    ],
    "should": [
      {
        "sltr": {
          "model": "example_model",
          "params": {
            "query_text": "the text query",
            "query_embedding": [1.0, 0.4, ...]
          }
        }
      }
    ]
  }
}

However, this executes the feature queries twice, and it requires duplicating the definitions and keeping the featureset and the query in sync.

Do you have any additional context?

This is the same feature request as o19s/elasticsearch-learning-to-rank#476, but for the OpenSearch fork.

msfroh · 2023-11-08T17:39:25Z

We need to better understand how the sltr query is implemented. We have only just begun to explore the LTR plugin.

@jhinch-at-atlassian-com -- do you have any ideas of how sltr is implemented under the hood to help us get started?

@noCharger -- Can you look into this? Would be a good place to get started on understanding the plugin. Thanks!

jhinch-at-atlassian-com · 2023-11-08T20:14:02Z

The best place to start looking is RankerQuery.RankerWeight#scorer and RankerQuery.DisjunctionDISI#advance. You would need to compare these with how the equivalent functionality works in the bool query. Most likely, making this work means inspecting the subIteratorsPriorityQueue when advance is called and counting how many sub-iterators are positioned on the next doc ID, which would allow it to skip scoring documents that don't match enough features.
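As a rough illustration of that counting idea, here is a small Python sketch. It is not the plugin's Java code and does not use Lucene's API: each sub-iterator is modelled as a sorted list of doc IDs with no duplicates, and the function name `docs_with_min_matches` is made up for this example. It also only walks documents matched by at least one feature, whereas the current sltr query matches every document.

```python
import heapq
from typing import Iterator, List, Tuple


def docs_with_min_matches(postings: List[List[int]],
                          minimum_should_match: int) -> Iterator[Tuple[int, int]]:
    """Yield (doc_id, match_count) for each doc that is matched by at
    least minimum_should_match of the sorted doc ID lists."""
    # Heap entries: (current doc ID, list index, position in that list).
    heap = [(docs[0], i, 0) for i, docs in enumerate(postings) if docs]
    heapq.heapify(heap)
    while heap:
        doc = heap[0][0]
        count = 0
        # Pop every sub-iterator positioned on this doc and advance it.
        while heap and heap[0][0] == doc:
            _, i, pos = heapq.heappop(heap)
            count += 1
            if pos + 1 < len(postings[i]):
                heapq.heappush(heap, (postings[i][pos + 1], i, pos + 1))
        # Docs below the threshold are skipped and never reach scoring.
        if count >= minimum_should_match:
            yield doc, count


# Three hypothetical feature queries and the doc IDs each one matches.
postings = [[1, 3, 5], [3, 4], [5, 9]]
print(list(docs_with_min_matches(postings, 2)))  # → [(3, 2), (5, 2)]
```

The count is available at the moment the iterator settles on a doc ID, so the threshold check can happen before any model scoring is done for that document.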

noCharger · 2023-12-13T17:57:13Z

@jhinch-at-atlassian-com I like this plan and the approach we're taking to support minimum_should_match. Would you like to contribute?

jhinch-at-atlassian-com added the enhancement and untriaged labels Nov 3, 2023

msfroh removed the untriaged label Nov 10, 2023
