Score based on date timestamp are not working #11872

Closed
valentin-claras opened this issue Jun 25, 2015 · 5 comments

Comments

@valentin-claras

Hi,

I tried to create a query with a score based on a date, but couldn't get it to work. The scores I get are inconsistent.
Using the date alone gave me 3 identical scores, although the dates are actually different.

Timestamp

# Delete the index.
curl -XDELETE http://localhost:9200/test_date_scoring

# Insert 3 documents with different dates.
curl -XPUT http://localhost:9200/test_date_scoring/foo/1 -d '{
    "title": "Foo 1",
    "date": "2015-06-25T10:15:00"
}'
curl -XPUT http://localhost:9200/test_date_scoring/foo/2 -d '{
    "title": "Foo 2",
    "date": "2015-06-25T10:15:01"
}'
curl -XPUT http://localhost:9200/test_date_scoring/foo/3 -d '{
    "title": "Foo 3",
    "date": "2015-06-25T10:15:02"
}'

# Query with score based on date. The 3 scores should be different.
# Got three 1435227260000.
curl -XPOST http://localhost:9200/test_date_scoring/foo/_search -d '{
    "query": {
        "function_score": {
            "functions": [
                {
                    "field_value_factor": {
                        "field": "date"
                    }
                }
            ],
            "boost_mode": "replace"
        }
    }
}'

# Result
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1435227260000,
    "hits": [
      {
        "_index": "test_date_scoring",
        "_type": "foo",
        "_id": "1",
        "_score": 1435227260000,
        "_source": {
          "title": "Foo 1",
          "date": "2015-06-25T10:15:00"
        }
      },
      {
        "_index": "test_date_scoring",
        "_type": "foo",
        "_id": "2",
        "_score": 1435227260000,
        "_source": {
          "title": "Foo 2",
          "date": "2015-06-25T10:15:01"
        }
      },
      {
        "_index": "test_date_scoring",
        "_type": "foo",
        "_id": "3",
        "_score": 1435227260000,
        "_source": {
          "title": "Foo 3",
          "date": "2015-06-25T10:15:02"
        }
      }
    ]
  }
}

I ran some more tests with a factor and a sum, and again the results are not what I expected.

Factor

# Query with score based on date and a 0.001 factor. The score should be the same with three fewer zeros.
# Got 1435227390 instead of 1435227260.
curl -XPOST http://localhost:9200/test_date_scoring/foo/_search -d '{
    "query": {
        "function_score": {
            "functions": [
                {
                    "field_value_factor": {
                        "field": "date",
                        "factor": 0.001
                    }
                }
            ],
            "boost_mode": "replace"
        }
    }
}'

Sum

# Query with score based on date, then 1 is added. Score should be the same + 1.
# Got 1435227260000 instead of 1435227260001.
curl -XPOST http://localhost:9200/test_date_scoring/foo/_search -d '{
    "query": {
        "function_score": {
            "query": {
                "function_score": {
                    "functions": [
                        {
                            "field_value_factor": {
                                "field": "date"
                            }
                        }
                    ],
                    "boost_mode": "replace"
                }
            },
            "functions": [
                {
                    "weight": 1
                }
            ],
            "boost_mode": "sum"
        }
    }
}'

So my question is:
Is this the correct behavior? If it is, how can I base my score on a date?

@clintongormley

If you add the ?explain parameter you will see that you are exceeding the max value for score, which is why they all end up being the same. However, if you set the factor to (e.g.) 0.000001 then you get different scores.

A better way to score on date (or recency, really) is to use a decay function.
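To illustrate the decay-function suggestion, here is a minimal Python sketch of the Gaussian decay formula as described in the Elasticsearch function_score documentation, together with a request body that uses the `gauss` function. The `origin`, `scale`, and `decay` values are assumptions chosen to fit the three test documents above, not values taken from this thread, and `gauss_decay` is a helper name made up for the illustration.

```python
import json
import math
from datetime import datetime

def gauss_decay(distance, scale, decay=0.5, offset=0.0):
    """Gaussian decay: 1.0 at the origin, `decay` at a distance of `scale`."""
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    d = max(0.0, abs(distance) - offset)
    return math.exp(-(d ** 2) / (2.0 * sigma_sq))

# Assumed values: origin at the newest test document, scale of 2 seconds.
ORIGIN = "2015-06-25T10:15:02"
SCALE_SECONDS = 2.0

docs = {
    "1": "2015-06-25T10:15:00",
    "2": "2015-06-25T10:15:01",
    "3": "2015-06-25T10:15:02",
}

def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S")

# Score each document by its distance (in seconds) from the origin.
scores = {
    doc_id: gauss_decay((parse(ORIGIN) - parse(ts)).total_seconds(), SCALE_SECONDS)
    for doc_id, ts in docs.items()
}

# Request body for the same idea expressed as a function_score query.
query_body = {
    "query": {
        "function_score": {
            "functions": [
                {"gauss": {"date": {"origin": ORIGIN, "scale": "2s", "decay": 0.5}}}
            ],
            "boost_mode": "replace",
        }
    }
}

for doc_id in sorted(scores):
    print(doc_id, round(scores[doc_id], 4))
print(json.dumps(query_body, indent=2))
```

The scores stay between 0 and 1, so they do not run into float precision the way raw timestamps do. The scale still has to match the resolution you care about: with a scale of several days, documents one second apart would again get scores too close together to tell apart.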

@valentin-claras
Author

Thanks for your quick answer.

Still, I'm not sure I understand your argument because when using ?explain (thanks btw, didn't know this one) I see that the maxBoost is way larger than the field value.

_explanation: {
    value: 1435227260000,
    description: "function score, product of:",
    details: [
        {
            value: 1435227260000,
            description: "Math.min of",
            details: [
                {
                    value: 1435227260000,
                    description: "field value function: (doc['date'].value * factor=1.0)"
                },
                {
                    value: 3.4028235e+38,
                    description: "maxBoost"
                }
            ]
        },
        {
            value: 1,
            description: "queryBoost"
        }
    ]
}

@clintongormley

@valentin-claras The final _score is a float, which can only represent integers accurately up to 2^24. Timestamps are of the order of 2^40, so they cannot be represented accurately, hence the rounding that you are seeing.
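As a rough check of that rounding, the sketch below pushes the three epoch-millisecond timestamps through a 32-bit float in Python. It assumes the dates are interpreted as UTC and that the score is a plain IEEE 754 single-precision value (24-bit significand). `to_float32` and `epoch_millis` are helper names made up for this illustration.

```python
import struct
from datetime import datetime, timezone

def to_float32(x):
    """Round a number to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]

def epoch_millis(ts):
    """Milliseconds since the epoch, assuming the timestamp is in UTC."""
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1000

dates = [
    "2015-06-25T10:15:00",
    "2015-06-25T10:15:01",
    "2015-06-25T10:15:02",
]

millis = [epoch_millis(d) for d in dates]
as_scores = [to_float32(m) for m in millis]

for d, m, s in zip(dates, millis, as_scores):
    print(d, m, int(s))

# Near 2**40 adjacent float32 values are 2**17 = 131072 ms (about 131 s) apart,
# so timestamps one second apart usually land on the same float.
print("distinct scores:", len(set(as_scores)))
```

Under these assumptions all three timestamps collapse to the same float, 1435227258880, which is consistent with the 1435227260000 in the result above once it is shown to nine significant digits.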

@valentin-claras
Author

Yes, that's what I understood later.

I'm now using a "0.0000166666666667" factor, so I can use my date with minute accuracy, and it's working so far (until the day I need accuracy to the second).

Thanks for your answer.
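For anyone weighing this workaround, here is a small Python sketch of what a per-minute factor does to the score. The model is an assumption, not a confirmed description of the implementation: the factor is held as a 32-bit float, multiplied with the field value in double precision, and the product is rounded to a 32-bit float. `modelled_score` is a made-up name.

```python
import struct

def to_float32(x):
    """Round a number to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]

def modelled_score(millis, factor):
    """Assumed model: float32 factor, double product, float32 result."""
    return to_float32(millis * to_float32(factor))

BASE = 1435227300000            # 2015-06-25T10:15:00 UTC in epoch milliseconds
PER_MINUTE = 0.0000166666666667  # roughly 1 / 60000, as used in this thread

# Documents one second apart, then documents one minute apart.
second_apart = [modelled_score(BASE + i * 1000, PER_MINUTE) for i in range(3)]
minute_apart = [modelled_score(BASE + i * 60000, PER_MINUTE) for i in range(4)]

print("one second apart:", [int(s) for s in second_apart])
print("one minute apart:", [int(s) for s in minute_apart])

# Same model applied to the 0.001 factor from the original report.
print("factor 0.001:", int(modelled_score(BASE, 0.001)))
```

Two things stand out under this model. With the 0.001 factor it gives 1435227392, which matches the 1435227390 reported earlier once rounded to nine significant digits, so the model is at least plausible. With the per-minute factor the scores are around 23.9 million, which is between 2^24 and 2^25, so only even values are representable: documents one second apart share a score as expected, but documents one minute apart can also share one, meaning the effective resolution is closer to two minutes than one.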

@sundarv85

The value "0.0000166666666667" does indeed work. To understand it better: is it correct that the _score only stores 8 digits (32 bits), so that multiplying by 0.00001 gives the score the top 8 digits of the timestamp, and that the 6s added at the end just throw in a random multiplier?
