Cross-field type creates broken scores when not all fields have the same docCount #44700

Closed
jpountz opened this issue Jul 22, 2019 · 13 comments · Fixed by #89016
Labels: >bug, :Search/Search (Search-related issues that do not fall into other categories), Team:Search (Meta label for search team)

Comments

@jpountz
Contributor

jpountz commented Jul 22, 2019

This was reported at https://discuss.elastic.co/t/query-can-return-negative-score-i-couldnt-find-any-spec-on-documentation-the-minimum-value-of-score-is-not-0/191307. In this case, scores even become negative because the docCount is less than the docFreq.

This would be addressed by #41106.

@jpountz jpountz added >bug :Search/Search Search-related issues that do not fall into other categories labels Jul 22, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-search

@HanguChoi

@jpountz
Do you know when this bug fix will be released?
It would be helpful to know whether the fix will take a long or a short time.

@HanguChoi

Below are the bug details and reproduction steps that I posted on discuss.elastic.co.

Elasticsearch version (bin/elasticsearch --version):

Version: 7.1.0, Build: default/tar/606a173/2019-05-16T00:43:15.323135Z, JVM: 1.8.0_112

Plugins installed:

  • analysis-nori

JVM version (java -version):

java version "1.8.0_112"
Java(TM) SE Runtime Environment (build 1.8.0_112-b16)
Java HotSpot(TM) 64-Bit Server VM (build 25.112-b16, mixed mode)

OS version (uname -a if on a Unix-like system):

  • macOS High Sierra 10.13.2 (17C88)

Steps to reproduce:

Create index

curl -X DELETE "localhost:9700/some_index"

curl -X PUT "localhost:9700/some_index" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  },
  "mappings": {
    "dynamic": false,
    "properties": {
      "user_id": {
        "type": "integer"
      },
      "introduction": {
        "type": "text"
      },
      "occupation_name": {
        "type": "text", 
        "fields": {
          "standard": { "type": "text" },  
          "raw": { "type": "keyword" }     
        }
      }
    }
  }
}
'

Add docs

curl -X PUT "localhost:9700/some_index/_doc/1" -H "Content-Type: application/json" -d'
{
  "user_id": 1,
  "introduction": "ruby web developer"
}
'


curl -X PUT "localhost:9700/some_index/_doc/2" -H "Content-Type: application/json" -d'
{
  "user_id": 2,
  "introduction": "ruby developer",
  "occupation_name": "ruby developer"
}
'

curl -X PUT "localhost:9700/some_index/_doc/3" -H "Content-Type: application/json" -d'
{
  "user_id": 3,
  "introduction": "ruby and rails develop"
}
'

curl -X PUT "localhost:9700/some_index/_doc/4" -H "Content-Type: application/json" -d'
{
  "user_id": 4,
  "introduction": "I develop with ruby on rails"
}
'

curl -X PUT "localhost:9700/some_index/_doc/5" -H "Content-Type: application/json" -d'
{
  "user_id": 5,
  "introduction": "I develop with ruby on rails"
}
'

curl -X PUT "localhost:9700/some_index/_doc/6" -H "Content-Type: application/json" -d'
{
  "user_id": 6,
  "introduction": "I develop with ruby on rails"
}
'

curl -X PUT "localhost:9700/some_index/_doc/7" -H "Content-Type: application/json" -d'
{
  "user_id": 7,
  "introduction": "I develop with ruby on rails"
}
'

curl -X PUT "localhost:9700/some_index/_doc/8" -H "Content-Type: application/json" -d'
{
  "user_id": 8,
  "introduction": "I develop with ruby on rails"
}
'

curl -X PUT "localhost:9700/some_index/_doc/9" -H "Content-Type: application/json" -d'
{
  "user_id": 9,
  "introduction": "I develop with ruby on rails"
}
'

curl -X PUT "localhost:9700/some_index/_doc/10" -H "Content-Type: application/json" -d'
{
  "user_id": 10,
  "introduction": "I develop with ruby on rails"
}
'

curl -X PUT "localhost:9700/some_index/_doc/11" -H "Content-Type: application/json" -d'
{
  "user_id": 11,
  "introduction": "I develop with ruby on rails"
}
'

Query

curl -X GET "localhost:9700/some_index/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {
              "multi_match": {
                "query": "ruby",
                "type": "cross_fields",
                "fields": [
                  "occupation_name^5",
                  "introduction^2"                  
                ],
                "slop": 10
              }
            }
          ]
        }
      },
      "functions": []
    }
  },
  "from": 0,
  "size": 10,
  "explain": true
}
'

Response

...
  "_source": {
    "user_id": 2,
    "introduction": "ruby developer",
    "occupation_name": "ruby developer"
  },
  "_explanation": {
    "value": 0.45840853,
    "description": "max of:",
    "details": [
      {
        "value": -1.1157178,
        "description": "weight(occupation_name:ruby in 0) [PerFieldSimilarity], result of:",
        "details": [
          {
            "value": -1.1157178,
            "description": "score(freq=1.0), product of:",
            "details": [
              {
                "value": 11,
                "description": "boost",
                "details": []
              },
              {
                "value": -0.22314355,
                "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                "details": [
                  {
                    "value": 2,
                    "description": "n, number of documents containing term",
                    "details": []
                  },
                  {
                    "value": 1,
                    "description": "N, total number of documents with field",
                    "details": []
                  }
                ]
              },
...
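The negative idf in the explanation above can be checked by hand. The following is a minimal sketch that plugs the reported n and N into the formula printed in the explain output; the function name `bm25_idf` is just an illustrative helper, not an Elasticsearch or Lucene API.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """idf as printed in the explain output: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Values taken from the explanation above: n = 2 (docFreq), N = 1 (docCount).
broken = bm25_idf(doc_freq=2, doc_count=1)
print(broken)  # roughly -0.2231, in line with the reported idf

# For comparison: whenever N >= n the fraction is positive, so idf stays above 0.
healthy = bm25_idf(doc_freq=2, doc_count=11)
print(healthy)
```

With n greater than N the fraction becomes negative, the argument of the logarithm drops below 1, and the idf (and therefore the whole product) turns negative.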

@dko-slapdash

@HanguChoi is there at least a workaround here which I can apply to my queries? The issue has been open for ~8 months, and there is little hope that it will be fixed anytime soon.

Maybe there is a way to tweak the log(1 + (N - n + 0.5) / (n + 0.5)) formula somehow and replace it with something simpler that doesn't produce negative scores?

@HanguChoi

@dko-slapdash , I am using best_fields with tie_breaker.

It wasn't bad for our product.
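For readers looking for the shape of that workaround, here is a rough sketch of the request body with best_fields and a tie_breaker, built as a Python dict. The field list is borrowed from the reproduction above, and the tie_breaker value of 0.3 is an assumption for illustration only; the value actually used was not stated.

```python
import json


def best_fields_query(text, fields, tie_breaker):
    """Build a multi_match body that uses best_fields instead of cross_fields."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "best_fields",
                "fields": list(fields),
                # Assumed value: lets non-best fields contribute a fraction of their score.
                "tie_breaker": tie_breaker,
            }
        }
    }


body = best_fields_query("ruby", ["occupation_name^5", "introduction^2"], tie_breaker=0.3)
print(json.dumps(body, indent=2))
```

Note that best_fields scores each field with its own statistics, so ranking will differ from cross_fields; whether that is acceptable depends on the product.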

@mgntrn

mgntrn commented Nov 23, 2020

Any updates on this issue?

@adri

adri commented Jan 5, 2021

Running into sporadic errors like this after upgrading from Elasticsearch 6.x to v7.9.3:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "exception",
        "reason" : "function score query returned an invalid score: -5.805419 for doc: 1325"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "auto-complete",
        "node" : "AR72bFoxQ0-8MfmP9np4NQ",
        "reason" : {
          "type" : "exception",
          "reason" : "function score query returned an invalid score: -5.805419 for doc: 1325"
        }
      }
    ]
  },
  "status" : 500
}

I'm also using a function_score with a multi_match query of type cross_fields. Changing cross_fields to best_fields makes the query return a result. Is there any other workaround except changing the query?

@nemphys

nemphys commented Feb 3, 2021

I am wondering if this is the same issue as the one I am currently facing:

I am using a cross-field multi-match query (with a tie breaker of 0.1) and all seems to work well.

After wrapping the query in a script_score query, I start to get negative score exceptions.

In the beginning I thought it had something to do with the script, but after changing the script to "_score", I still get the same errors.

There seems to be some kind of issue when script_score is combined with cross_fields, unless the issue only has to do with script_score and the negative scores are somehow automatically mitigated by ES when in normal query mode (not script_score).

EDIT: As a temporary workaround, I have adjusted my script_score script, so that it returns 0 if _score is < 0; I suppose this messes with the scores, but at least it does not throw exceptions.
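A rough sketch of what such a clamping script_score request might look like, built as a Python dict. The Painless expression `Math.max(_score, 0.0)` is an assumption about how the clamp could be written, not the exact script used; the field names are borrowed from the reproduction earlier in the thread, and only the tie breaker of 0.1 comes from the description above.

```python
import json


def clamped_script_score(inner_query):
    """Wrap a query in script_score and clamp negative scores to 0."""
    return {
        "query": {
            "script_score": {
                "query": inner_query,
                # Assumed Painless expression: returns 0 whenever _score is negative.
                "script": {"source": "Math.max(_score, 0.0)"},
            }
        }
    }


# Hypothetical inner query: cross_fields with the tie breaker mentioned above.
cross_fields = {
    "multi_match": {
        "query": "ruby",
        "type": "cross_fields",
        "fields": ["occupation_name^5", "introduction^2"],
        "tie_breaker": 0.1,
    }
}

body = clamped_script_score(cross_fields)
print(json.dumps(body, indent=2))
```

Clamping only hides the symptom: documents whose blended score went negative all tie at 0, so their relative order is lost.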

@jtibshirani
Contributor

jtibshirani commented Feb 11, 2021

An update on this issue: we are working on #41106, which will add a new multi-fields mode based on the BM25F scoring model. The recommendation going forward will be to use this new mode instead of cross_fields, whose scoring behavior is both broken and not based on a solid principle.

@nemphys

nemphys commented Feb 11, 2021

@jtibshirani sounds good! Is there any (approximate) timeframe on this? We are working on a project that is dependent on cross-fields due to be released within 2-3 months.

@jtibshirani
Contributor

jtibshirani commented Feb 11, 2021

@nemphys we don't usually give time estimates (even approximate), but I am working on it now and you can follow along with progress on the issue (#41106).

@lasagar

lasagar commented Jul 31, 2021

combined_fields cannot be used in a query_string query. Can we add this functionality?

@jtibshirani
Contributor

I merged #89016, which fixes the cross_fields scoring bug directly.

From your feedback, we realized that combined_fields is not ready as a drop-in replacement for cross_fields. This is because it doesn't support fields with different analyzers (which also means it can't easily be used in query_string). We will work on improving combined_fields, but in the meantime we decided to make this targeted fix for cross_fields.
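For cases where all searched fields do share an analyzer, a combined_fields request could look roughly like the sketch below. It reuses the field names and boosts from the reproduction earlier in the thread purely as an illustration; whether those two fields actually qualify depends on the mapping, and the helper name `combined_fields_query` is hypothetical.

```python
import json


def combined_fields_query(text, fields):
    """Build a combined_fields body; all fields are assumed to share one analyzer."""
    return {
        "query": {
            "combined_fields": {
                "query": text,
                "fields": list(fields),
            }
        }
    }


body = combined_fields_query("ruby", ["occupation_name^5", "introduction^2"])
print(json.dumps(body, indent=2))
```

If the fields use different analyzers, this query is not an option, which is the limitation described above and the reason cross_fields was fixed directly.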

I'm sorry that this took so long! In retrospect we could have fixed it sooner. Hopefully you were able to work around the issue in the meantime.

10 participants