Nested search failing with _source disabled #43517

Closed
sronsiek opened this issue Jun 23, 2019 · 2 comments
Labels
>bug :Search/Search Search-related issues that do not fall into other categories

Comments

sronsiek commented Jun 23, 2019

Elasticsearch version (bin/elasticsearch --version):

OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
Version: 7.0.1, Build: default/docker/e4efcb5/2019-04-29T12:56:03.145736Z, JVM: 12.0.1

Plugins installed: []

bin/elasticsearch-plugin install --batch ingest-attachment

JVM version (java -version):

openjdk version "12.0.1" 2019-04-16
OpenJDK Runtime Environment (build 12.0.1+12)
OpenJDK 64-Bit Server VM (build 12.0.1+12, mixed mode, sharing)

OS version (uname -a if on a Unix-like system):

Elastic is running in the official elastic docker container

Linux elastic 4.4.76-1-default #1 SMP Fri Jul 14 08:48:13 UTC 2017 (9a2885c) x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

I have been upgrading an existing application from Elastic v2.1.2 to v7.0.1.

The mapping for document types has _source disabled.

One feature we use is a nested field 'history'. Each record contains a history array, each element of which contains several properties (author, created_at, state).

Searches on these fields worked fine in v2.1.2, but relied on the now-deprecated 'include_in_parent' flag. In v7.0.1, nested queries fail with null-pointer exceptions. After a lot of bug-chasing, I found the feature can be made to work if I add ANY one of the history properties to _source in the mapping. After this, all searches started working, even those on history fields other than the one included in _source. In the queries, _source is set to false, both at top level and within inner_hits, since all we need is the parent doc _id.

Steps to reproduce:

  1. Create template with mapping & settings (without the workaround):
curl -H 'Content-Type: application/json' -X PUT http://localhost:9200/_template/test_doc -d '
{
  "index_patterns": "test_doc*",
  "mappings": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "doc_body": {
        "fields": {
          "keyword": {
            "ignore_above": 256,
            "type": "keyword"
          }
        },
        "type": "text"
      },
      "history": {
        "properties": {
          "author": {
            "fields": {
              "keyword": {
                "ignore_above": 256,
                "type": "keyword"
              }
            },
            "type": "text"
          },
          "created_at": {
            "type": "date"
          },
          "state": {
            "index": true,
            "type": "keyword"
          }
        },
        "type": "nested"
      },
      "id": {
        "index": true,
        "type": "long"
      }
    }
  },
  "order": 0,
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "filter": [
            "lowercase"
          ],
          "tokenizer": "whitespace"
        }
      }
    },
    "index": {
      "max_inner_result_window": 1000
    },
    "number_of_replicas": 0,
    "number_of_shards": 1,
    "refresh_interval": "1s"
  }
}'
  2. Insert first record (also creating the index):
curl -H 'Content-Type: application/json' -X POST http://localhost:9200/test_doc/_doc/1 -d '
{
    "id": 1000,
    "doc_body": "This is some body text",
    "history": [
        {
            "author": "Freddy",
            "created_at": "2019-06-23T13:10",
            "state": "created"
        },
        {
            "author": "Mike",
            "created_at": "2019-06-23T13:12",
            "state": "created"
        }
    ]
}'
  3. Perform a search on nested data:
curl -H 'Content-Type: application/json' -X POST http://localhost:9200/test_doc/_search -d '
{
    "query": {
        "bool": {
            "filter": {
                "nested": {
                    "path": "history",
                    "inner_hits": {
                        "size": 50,
                        "name": "history"
                    },
                    "query": {
                        "range": {
                            "history.created_at": {
                                "gte": "2012-01-01"
                            }
                        }
                    }
                }
            }
        }
    },
    "size": "100",
    "sort": [
        {
            "id": "desc"
        }
    ]
}'

results in:

{"error":{"root_cause":[{"type":"null_pointer_exception","reason":null}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"test_doc","node":"MZot1_RnSvqSm6g7wCuisw","reason":{"type":"null_pointer_exception","reason":null}}],"caused_by":{"type":"null_pointer_exception","reason":null,"caused_by":{"type":"null_pointer_exception","reason":null}}},"status":500}

plus a lengthy exception stack in the elastic log (attached).
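
For anyone scripting the reproduction, a minimal Python sketch of pulling the failure type out of that response may help. The function name `root_cause_types` is hypothetical (not part of any Elasticsearch client), and the body is simply the error JSON quoted above.

```python
import json
from typing import List

# Error body copied verbatim from the failing search above.
ERROR_BODY = (
    '{"error":{"root_cause":[{"type":"null_pointer_exception","reason":null}],'
    '"type":"search_phase_execution_exception","reason":"all shards failed",'
    '"phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"test_doc",'
    '"node":"MZot1_RnSvqSm6g7wCuisw","reason":{"type":"null_pointer_exception",'
    '"reason":null}}],"caused_by":{"type":"null_pointer_exception","reason":null,'
    '"caused_by":{"type":"null_pointer_exception","reason":null}}},"status":500}'
)


def root_cause_types(body: str) -> List[str]:
    """Return the `type` of each entry in error.root_cause, or [] if there is no error."""
    parsed = json.loads(body)
    error = parsed.get("error")
    if not isinstance(error, dict):
        return []
    return [cause.get("type", "unknown") for cause in error.get("root_cause", [])]


if __name__ == "__main__":
    print(root_cause_types(ERROR_BODY))
```

A check such as `"null_pointer_exception" in root_cause_types(body)` is enough to tell this failure apart from an ordinary query error in a test script.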

Now, replacing the _source setting in the template:

  "mappings": {
    "_source": {
      "enabled": false
    },

with:

  "mappings": {
    "_source": {
      "includes": [
        "history.created_at"
      ]
    },

and repeating the steps will work as expected.
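
The mapping change can be sketched in Python as a small transformation of the template body. This is only an illustration: `with_source_includes` is a hypothetical helper, and the template shown is an abbreviated stand-in for the full one in step 1. Since templates are applied when an index is created, the index has to be recreated for the change to take effect, which is what repeating the steps does.

```python
import copy
import json


def with_source_includes(template: dict, fields: list) -> dict:
    """Return a copy of `template` whose mapping keeps only `fields` in _source."""
    patched = copy.deepcopy(template)
    mappings = patched.setdefault("mappings", {})
    # Replaces {"enabled": false} with an includes list, as in the workaround above.
    mappings["_source"] = {"includes": list(fields)}
    return patched


# Abbreviated version of the template from step 1 (assumption: other keys omitted).
template = {
    "index_patterns": "test_doc*",
    "mappings": {
        "_source": {"enabled": False},
        "properties": {"history": {"type": "nested"}},
    },
}

patched = with_source_includes(template, ["history.created_at"])

if __name__ == "__main__":
    print(json.dumps(patched["mappings"]["_source"]))
```

The original dictionary is left untouched, so both variants can be sent to a test cluster side by side.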

Possibly also relevant, elasticsearch.yml contains:

discovery.type: single-node

For info: aggregation searches do not appear to be affected by this issue; they were seen to work in both cases.

elastic_exception.log

@gwbrown gwbrown added the :Search/Search Search-related issues that do not fall into other categories label Jun 23, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-search

@jtibshirani
Contributor

jtibshirani commented Jul 24, 2019

Thank you for reporting this @sronsiek! I opened #44836 to address the issue.

One short-term workaround would be to explicitly turn off source loading for the nested documents:

curl -H 'Content-Type: application/json' -XPOST http://localhost:9200/test_doc/_search -d '
{
    "query": {
        "bool": {
            "filter": {
                "nested": {
                    "path": "history",
                    "inner_hits": {
                        "_source": false,
                        "size": 50,
                        "name": "history"
                    },
                   ...
                }
            }
        }
    },
   ...
}'

When I tried this on 7.0.1, the search completed successfully.
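
If many queries need the same treatment, the workaround could be applied programmatically. The following is a rough Python sketch, not an official client feature: `disable_inner_hits_source` is a hypothetical helper that walks a search body and sets `"_source": false` on every `inner_hits` object it finds. It assumes that no unrelated object in the body uses `inner_hits` as a key.

```python
import copy
import json


def disable_inner_hits_source(node):
    """Recursively set "_source": False on every inner_hits object, in place."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "inner_hits" and isinstance(value, dict):
                value["_source"] = False
            disable_inner_hits_source(value)
    elif isinstance(node, list):
        for item in node:
            disable_inner_hits_source(item)
    return node


# Search body from the original reproduction (step 3).
search_body = {
    "query": {
        "bool": {
            "filter": {
                "nested": {
                    "path": "history",
                    "inner_hits": {"size": 50, "name": "history"},
                    "query": {
                        "range": {"history.created_at": {"gte": "2012-01-01"}}
                    },
                }
            }
        }
    },
    "size": "100",
    "sort": [{"id": "desc"}],
}

# Work on a copy so the original body stays available for comparison.
patched_body = disable_inner_hits_source(copy.deepcopy(search_body))

if __name__ == "__main__":
    print(json.dumps(patched_body, indent=2))
```

The serialized `patched_body` can then be passed as the `-d` payload of the curl command above.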
