Ascending sort with missing _first fails on datefields with missing values #81960

Open
stu-elastic opened this issue Dec 20, 2021 · 7 comments
Labels
>bug priority:normal A label for assessing bug priority to be used by ES engineers :Search Relevance/Search Catch all for Search Relevance Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch

Comments

@stu-elastic
Contributor

Elasticsearch version (bin/elasticsearch --version): v8.1.0, v7.16.2 and at least v7.15.1

Description of the problem including expected versus actual behavior:

Indexing a document with a missing date-time value and then sorting ascending with "missing": "_first" fails with Field Year cannot be printed as the value -292275055 exceeds the maximum print width of 4 when that document would be the only one returned, i.e. size: 1.

The formatter is trying to format the sentinel value of -9223372036854775808.
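
For context, the sentinel above is Long.MIN_VALUE, and the year in the error message is what you get when that number is read as epoch milliseconds. A minimal plain-Java sketch of that arithmetic (the class name SentinelYearDemo is hypothetical and not part of Elasticsearch):

```java
import java.time.Instant;
import java.time.ZoneOffset;

class SentinelYearDemo {
    // Interpret the sentinel (Long.MIN_VALUE) as milliseconds since the epoch.
    static Instant sentinelInstant() {
        return Instant.ofEpochMilli(Long.MIN_VALUE);
    }

    // Extract the proleptic ISO year of that instant in UTC.
    static int sentinelYear() {
        return sentinelInstant().atZone(ZoneOffset.UTC).getYear();
    }

    public static void main(String[] args) {
        System.out.println(Long.MIN_VALUE);     // the sentinel itself
        System.out.println(sentinelInstant());  // the instant it maps to
        System.out.println(sentinelYear());     // the year the formatter is asked to print
    }
}
```

The year comes out as -292275055, which has far more than the four digits a strict year field allows.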

Steps to reproduce:

PUT test
{
  "mappings" : {
    "properties" : {
      "field1" : {
        "type" : "integer"
      },
      "dt" : {
        "type" : "date",
        "format" : "strict_date_time||strict_date_time_no_millis"
      }
    }
  }
}

POST _bulk
{"index":{"_index":"test","_id":"1"}}
{"field1": 1243, "dt": "2021-12-20T23:14:20+00:00"}
{"index":{"_index":"test","_id":"2"}}
{"field2": 4567}

GET test/_search
{
  "size": 1,
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "dt": {
        "missing": "_first",
        "order": "asc"
      }
    }
  ]
}

This is #73763 with targetNumericType == NumericType.DATE

Using "missing": 0 works around the issue.

@stu-elastic stu-elastic added >bug :Search/Search Search-related issues that do not fall into other categories needs:triage Requires assignment of a team area label labels Dec 20, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Dec 20, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@idobrodushniy

Additional details about the issue, since I faced the same problem ⬇️

TLDR
This problem ⬆️ happens as a consequence of the combination of the following factors:

  • the strict_date_time format is either in the field mapping or specified explicitly in the sort
  • the sort is performed on this field, while some of the documents have this field equal to null
  • the size in the query is less than the number of documents that could match the query (this one is very important, otherwise you will not reproduce this error)

Workaround
You can use epoch_second as a format for your sorting. (or any other format that would work for you)
E.g.

"dt": {
        "format": "epoch_second",
        "missing": "_first",
        "order": "desc"
}

Platforms
I managed to reproduce this issue both on ES 8.1.3 and ES 7.14.1.

My conclusion 🐛

  • There seems to be a bug in formatting the placeholder dates that ES generates for missing values: if I change the format in the sort to epoch_second, everything works fine. I assume it happens right here (according to the stack trace you can find below).
  • This doesn't happen when the size value in the query is larger than the number of documents that could match the query across all shards. I assume ES behaves differently in these two cases, so it should definitely be possible to fix this problem.

Details 🔍
If the strict_date_time format is specified either in the mapping or explicitly in the sort field, ES automatically replaces all null values with a nonexistent datetime (the same one for all docs), e.g. "-292275055-05-16T16:47:04.192Z".

Then ES tries to format this datetime in order to return it in the sort values of every document.
As a result, formatting throws an exception (please see the stack trace below), and ES returns an error response with the message Field Year cannot be printed as the value ... exceeds the maximum print width of 4.
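
The failure described above can be reproduced outside Elasticsearch with plain java.time. This is only a minimal sketch, not the Elasticsearch implementation: the class StrictYearSentinelDemo and its helper methods are hypothetical, and the fixed four-digit year field is an assumption standing in for the year part of a strict_* format.

```java
import java.time.DateTimeException;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.SignStyle;
import java.time.temporal.ChronoField;

class StrictYearSentinelDemo {
    // Assumption: a strict format prints the year with exactly four digits.
    static final DateTimeFormatter STRICT_YEAR = new DateTimeFormatterBuilder()
        .appendValue(ChronoField.YEAR, 4, 4, SignStyle.EXCEEDS_PAD)
        .toFormatter()
        .withZone(ZoneOffset.UTC);

    // Format the year of an epoch-millis value, or return the exception message.
    static String formatStrict(long epochMillis) {
        try {
            return STRICT_YEAR.format(Instant.ofEpochMilli(epochMillis));
        } catch (DateTimeException e) {
            return "error: " + e.getMessage();
        }
    }

    // A purely numeric rendering (whole seconds since the epoch) has no width limit.
    static String formatEpochSecond(long epochMillis) {
        return Long.toString(Math.floorDiv(epochMillis, 1000L));
    }

    public static void main(String[] args) {
        long realDate = Instant.parse("2021-12-20T23:14:20Z").toEpochMilli();
        System.out.println(formatStrict(realDate));            // an ordinary date
        System.out.println(formatStrict(Long.MIN_VALUE));      // the sentinel
        System.out.println(formatEpochSecond(Long.MIN_VALUE)); // numeric rendering
    }
}
```

Under that assumption, the strict formatter rejects the sentinel with the same "exceeds the maximum print width of 4" message, while the numeric rendering goes through, which would be consistent with the epoch_second workaround.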

This is the log from my ES Docker container ⬇️ (interestingly, in version 8.1.3, unlike version 7.14.1, this is logged at debug level rather than as an error).

{
    "@timestamp": "2022-10-26T23:09:11.996Z",
    "log.level": "DEBUG",
    "message": "[hvB72OhCQ1mmJsmHNzQ0_w][test][8]: Failed to execute [SearchRequest{searchType=QUERY_THEN_FETCH, indices=[test], indicesOptions=IndicesOptions[ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, expand_wildcards_hidden=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false, ignore_throttled=true], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=null, allowPartialSearchResults=true, localClusterAlias=null, getOrCreateAbsoluteStartMillis=-1, ccsMinimizeRoundtrips=true, source={\"size\":100,\"_source\":{\"includes\":[\"_id\"],\"excludes\":[]},\"sort\":[{\"closed_datetime\":{\"order\":\"desc\"}},{\"id.exact\":{\"order\":\"desc\"}}]}}] lastShard [true]",
    "ecs.version": "1.2.0",
    "service.name": "ES_ECS",
    "event.dataset": "elasticsearch.server",
    "process.thread.name": "elasticsearch[99fcaa518426][search][T#1]",
    "log.logger": "org.elasticsearch.action.search.TransportSearchAction",
    "elasticsearch.cluster.uuid": "aQO5AwL1QXyYiy9FIbz8uA",
    "elasticsearch.node.id": "hvB72OhCQ1mmJsmHNzQ0_w",
    "elasticsearch.node.name": "99fcaa518426",
    "elasticsearch.cluster.name": "docker-cluster",
    "error.type": "java.time.DateTimeException",
    "error.message": "Field Year cannot be printed as the value -292275055 exceeds the maximum print width of 4",
    "error.stack_trace": "java.time.DateTimeException: Field Year cannot be printed as the value -292275055 exceeds the maximum print width of 4\n\tat java.base/java.time.format.DateTimeFormatterBuilder$NumberPrinterParser.format(DateTimeFormatterBuilder.java:2802)\n\tat java.base/java.time.format.DateTimeFormatterBuilder$CompositePrinterParser.format(DateTimeFormatterBuilder.java:2411)\n\tat java.base/java.time.format.DateTimeFormatterBuilder$CompositePrinterParser.format(DateTimeFormatterBuilder.java:2411)\n\tat java.base/java.time.format.DateTimeFormatter.formatTo(DateTimeFormatter.java:1853)\n\tat java.base/java.time.format.DateTimeFormatter.format(DateTimeFormatter.java:1827)\n\tat org.elasticsearch.common.time.JavaDateFormatter.format(JavaDateFormatter.java:241)\n\tat org.elasticsearch.search.DocValueFormat$DateTime.format(DocValueFormat.java:288)\n\tat org.elasticsearch.search.DocValueFormat$DateTime.format(DocValueFormat.java:217)\n\tat org.elasticsearch.search.SearchSortValuesAndFormats.<init>(SearchSortValuesAndFormats.java:35)\n\tat org.elasticsearch.action.search.BottomSortValuesCollector.consumeTopDocs(BottomSortValuesCollector.java:65)\n\tat org.elasticsearch.action.search.SearchQueryThenFetchAsyncAction.onShardResult(SearchQueryThenFetchAsyncAction.java:125)\n\tat org.elasticsearch.action.search.AbstractSearchAsyncAction$1.innerOnResponse(AbstractSearchAsyncAction.java:323)\n\tat org.elasticsearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:33)\n\tat org.elasticsearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:18)\n\tat org.elasticsearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:61)\n\tat org.elasticsearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:25)\n\tat org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:43)\n\tat 
org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleResponse(SearchTransportService.java:642)\n\tat org.elasticsearch.transport.TransportService$4.handleResponse(TransportService.java:718)\n\tat org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1339)\n\tat org.elasticsearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1417)\n\tat org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1397)\n\tat org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:41)\n\tat org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:38)\n\tat org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:19)\n\tat org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:47)\n\tat org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\tat org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)\n\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:776)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\n\tat java.base/java.lang.Thread.run(Thread.java:833)\n"
}

@howardhuanghua
Contributor

Hi @stu-elastic @kpollich, do we have a plan to fix this issue? Or has a related PR already fixed it? Thanks.

@nemphys

nemphys commented Sep 22, 2023

+1 on this one, still happens in 8.8.2.

@mkhludnev

Users might apply any of the strict_* formats in the sort clause. It should fix the error.

@benwtrent
Member

benwtrent commented Jul 10, 2024

I have tried this in the latest version of Elasticsearch, and it sorts just fine: the doc with the missing value is sorted to the top.

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
      {
        "_index": "test",
        "_id": "2",
        "_score": null,
        "_source": {
          "field2": 4567
        },
        "sort": [
          -9223372036854776000
        ]
      }
    ]
  }
}

Now, adjusting the request, I do get an error, but it sort of makes sense to me...

GET test/_search
{
  "size": 2,
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "dt": {
        "format": "strict_date_time",
        "missing": "_first",
        "order": "asc"
      }
    }
  ]
}

You are trying to format the smallest possible date time and it just fails.

Maybe I don't know the desired behavior here. Should it just pick the smallest date that fits the format?

@benwtrent benwtrent added the priority:normal A label for assessing bug priority to be used by ES engineers label Jul 10, 2024
@javanna javanna added :Search Relevance/Search Catch all for Search Relevance and removed :Search/Search Search-related issues that do not fall into other categories labels Jul 17, 2024
@elasticsearchmachine elasticsearchmachine added Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 17, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)
