
[BUG] Missing inner hits in top hits of an aggregation results since upgrade to 2.13.0 #13467

Closed
martijnbolhuis opened this issue Apr 30, 2024 · 4 comments · Fixed by #13486
Labels: bug, Search:Aggregations, Severity-Critical, v2.14.0, v3.0.0

Comments

@martijnbolhuis

Describe the bug

I have a query on the nested field names.full_name with inner hits enabled. I have also added a terms aggregation on the (non-nested) field list_id, and I use a top_hits sub-aggregation to return the matching documents in each bucket.

In OpenSearch 2.12.0, the top hits included the inner hits (on names.full_name), but in 2.13.0 these inner hits are missing.

Related component

Other

To Reproduce

The following script reproduces the problem:

# Create an index and mapping
curl -X DELETE "http://localhost:9200/names-test?pretty"
curl -X PUT "http://localhost:9200/names-test?pretty"  -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "list_id": {
        "type": "integer"
      },
      "names": {
        "type": "nested",
        "properties": {
          "full_name": {
            "type": "text"
          }
        }
      }
    }
  }
}
'

# Insert documents into the index
curl -X PUT "http://localhost:9200/names-test/_doc/1?refresh&pretty"  -H 'Content-Type: application/json' -d'
{
  "list_id": 1,
  "names": [
    {
      "full_name": "John Doe"
    },
    {
      "full_name": "John Micheal Doe"
    }
  ]
}
'

curl -X PUT "http://localhost:9200/names-test/_doc/2?refresh&pretty"  -H 'Content-Type: application/json' -d'
{
  "list_id": 2,
  "names": [
    {
      "full_name": "Jane Doe"
    },
    {
      "full_name": "Jane Michelle Doe"
    }
  ]
}
'

# Perform a query
curl -X POST "http://localhost:9200/names-test/_search?pretty"  -H 'Content-Type: application/json' -d'
{
  "query": {
    "nested": {
      "path": "names",
      "query": {
        "match": { "names.full_name": "Doe" }
      },
      "inner_hits": {}
    }
  },
  "size": 0,
  "aggs": {
    "lists": {
      "terms": {
        "field": "list_id"
      },
      "aggs": {
        "top_result": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}
'

Expected behavior

The expected result, as returned by OpenSearch 2.12.0, includes inner_hits for each top_hits document:

{
  "took" : 20,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "lists" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 1,
          "doc_count" : 1,
          "top_result" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : 0.10607058,
              "hits" : [
                {
                  "_index" : "names-test",
                  "_id" : "1",
                  "_score" : 0.10607058,
                  "_source" : {
                    "list_id" : 1,
                    "names" : [
                      {
                        "full_name" : "John Doe"
                      },
                      {
                        "full_name" : "John Micheal Doe"
                      }
                    ]
                  },
                  "inner_hits" : {
                    "names" : {
                      "hits" : {
                        "total" : {
                          "value" : 2,
                          "relation" : "eq"
                        },
                        "max_score" : 0.11474907,
                        "hits" : [
                          {
                            "_index" : "names-test",
                            "_id" : "1",
                            "_nested" : {
                              "field" : "names",
                              "offset" : 0
                            },
                            "_score" : 0.11474907,
                            "_source" : {
                              "full_name" : "John Doe"
                            }
                          },
                          {
                            "_index" : "names-test",
                            "_id" : "1",
                            "_nested" : {
                              "field" : "names",
                              "offset" : 1
                            },
                            "_score" : 0.09739208,
                            "_source" : {
                              "full_name" : "John Micheal Doe"
                            }
                          }
                        ]
                      }
                    }
                  }
                }
              ]
            }
          }
        },
        {
          "key" : 2,
          "doc_count" : 1,
          "top_result" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : 0.10607058,
              "hits" : [
                {
                  "_index" : "names-test",
                  "_id" : "2",
                  "_score" : 0.10607058,
                  "_source" : {
                    "list_id" : 2,
                    "names" : [
                      {
                        "full_name" : "Jane Doe"
                      },
                      {
                        "full_name" : "Jane Michelle Doe"
                      }
                    ]
                  },
                  "inner_hits" : {
                    "names" : {
                      "hits" : {
                        "total" : {
                          "value" : 2,
                          "relation" : "eq"
                        },
                        "max_score" : 0.11474907,
                        "hits" : [
                          {
                            "_index" : "names-test",
                            "_id" : "2",
                            "_nested" : {
                              "field" : "names",
                              "offset" : 0
                            },
                            "_score" : 0.11474907,
                            "_source" : {
                              "full_name" : "Jane Doe"
                            }
                          },
                          {
                            "_index" : "names-test",
                            "_id" : "2",
                            "_nested" : {
                              "field" : "names",
                              "offset" : 1
                            },
                            "_score" : 0.09739208,
                            "_source" : {
                              "full_name" : "Jane Michelle Doe"
                            }
                          }
                        ]
                      }
                    }
                  }
                }
              ]
            }
          }
        }
      ]
    }
  }
}

The actual result in 2.13.0 is missing the inner_hits:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "lists" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 1,
          "doc_count" : 1,
          "top_result" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : 0.10607058,
              "hits" : [
                {
                  "_index" : "names-test",
                  "_id" : "1",
                  "_score" : 0.10607058,
                  "_source" : {
                    "list_id" : 1,
                    "names" : [
                      {
                        "full_name" : "John Doe"
                      },
                      {
                        "full_name" : "John Micheal Doe"
                      }
                    ]
                  }
                }
              ]
            }
          }
        },
        {
          "key" : 2,
          "doc_count" : 1,
          "top_result" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : 0.10607058,
              "hits" : [
                {
                  "_index" : "names-test",
                  "_id" : "2",
                  "_score" : 0.10607058,
                  "_source" : {
                    "list_id" : 2,
                    "names" : [
                      {
                        "full_name" : "Jane Doe"
                      },
                      {
                        "full_name" : "Jane Michelle Doe"
                      }
                    ]
                  }
                }
              ]
            }
          }
        }
      ]
    }
  }
}
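To check a response for this regression without reading the whole JSON, a crude text-based check is enough: with ?pretty, every hit that carries inner hits prints an "inner_hits" key on its own line, and with "size": 0 the top-level hits are empty, so every such key comes from the top_hits buckets. The sketch below is a heuristic, not a JSON parser, and assumes pretty-printed output as in the reproduction script; count_inner_hits, check_top_hits_inner_hits and query.json are hypothetical names, not part of OpenSearch.

```bash
#!/usr/bin/env bash

# Count the lines of a pretty-printed search response (read from stdin)
# that contain an "inner_hits" key. Text heuristic only: it assumes the
# response was requested with ?pretty and that no _source field is named
# "inner_hits".
count_inner_hits() {
  # grep -c prints 0 and exits with status 1 when nothing matches;
  # "|| true" keeps that from looking like an error to the caller.
  grep -c '"inner_hits"' || true
}

# Read a response from stdin and compare the number of "inner_hits" keys
# with the number the caller expects (one per top_hits document here).
# Returns 0 if enough were found, 1 otherwise.
check_top_hits_inner_hits() {
  local expected=$1 found
  found=$(count_inner_hits)
  if [ "$found" -ge "$expected" ]; then
    echo "ok: found $found inner_hits section(s)"
  else
    echo "missing inner_hits: expected $expected, found $found"
    return 1
  fi
}

# Example (needs the names-test index from the reproduction script;
# query.json is a hypothetical file holding the search body above):
#   curl -s -X POST "http://localhost:9200/names-test/_search?pretty" \
#     -H 'Content-Type: application/json' -d @query.json \
#     | check_top_hits_inner_hits 2
```

Against the two responses above, the 2.12.0 output contains two "inner_hits" keys and passes with an expected count of 2, while the 2.13.0 output contains none and fails.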

Additional Details

Plugins
No plugins (default OpenSearch installation)

Host/Environment:

  • OS: Arch Linux
  • Version: latest

I'm running OpenSearch from Docker: https://hub.docker.com/layers/opensearchproject/opensearch/2.13.0/images/sha256-00f052502297cbc599af34b93605e1eb485438f0e9670dc8d82a4976da7d3feb?context=explore

Additional context

I set "size": 0 in the main query because I'm only interested in the aggregated top hits, not the "regular" hits. If I change it to, for example, "size": 100, the "regular" hits do include the inner hits, so there it works as expected. However, even then the top hits still do not include the inner hits.
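The two cases described above differ only in the top-level "size". A small sketch that prints the reproduction query for a given size makes them easy to compare side by side. build_query and run_query are hypothetical helper names, and run_query assumes the local cluster at http://localhost:9200 and the names-test index from the reproduction script.

```bash
#!/usr/bin/env bash

# Print the reproduction query with the top-level "size" set to $1.
# Only that value changes; the nested query, inner_hits and the
# terms/top_hits aggregation are the same as in the script above.
build_query() {
  local size=$1
  cat <<EOF
{
  "query": {
    "nested": {
      "path": "names",
      "query": { "match": { "names.full_name": "Doe" } },
      "inner_hits": {}
    }
  },
  "size": ${size},
  "aggs": {
    "lists": {
      "terms": { "field": "list_id" },
      "aggs": {
        "top_result": { "top_hits": { "size": 10 } }
      }
    }
  }
}
EOF
}

# Send the query to the local cluster used in the reproduction script
# (assumed to be http://localhost:9200 with the names-test index).
run_query() {
  curl -s -X POST "http://localhost:9200/names-test/_search?pretty" \
    -H 'Content-Type: application/json' -d "$(build_query "$1")"
}

# Usage:
#   run_query 0   > size0.json    # aggregated top hits only
#   run_query 100 > size100.json  # top-level hits as well
```

Per the report, on 2.13.0 the top-level hits in size100.json carry inner_hits, while the top_hits buckets lack them in both files.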

@martijnbolhuis martijnbolhuis added bug Something isn't working untriaged labels Apr 30, 2024
@github-actions github-actions bot added the Other label Apr 30, 2024
@dblock
Member

dblock commented Apr 30, 2024

@martijnbolhuis Looks like a regression. Care to add a (failing) YAML REST test that reproduces this problem and try to bisect it to where the bug was introduced?

@martijnbolhuis
Author

@dblock Thanks for your response.

The following YAML REST test reproduces the problem. It appears that commit 965d85a introduced it (see also #12503).

I've added the following file rest-api-spec/src/main/resources/rest-api-spec/test/search.aggregation/400_inner_hits.yml

setup:
  - do:
      indices.create:
          index: test_1
          body:
            settings:
              number_of_replicas: 0
            mappings:
              properties:
                list_id:
                  type: integer
                names:
                  type: nested
                  properties:
                    full_name:
                      type: text

  - do:
       bulk:
         refresh: true
         body:
           - index:
               _index: test_1
               _id:    1
           - list_id: 1
             names:
               - full_name: John Doe
               - full_name: John Micheal Doe
           - index:
               _index: test_1
               _id:    2
           - list_id: 2
             names:
               - full_name: Jane Doe
               - full_name: Jane Michelle Doe

---
"Include inner hits in top hits":
  - do:
      search:
        rest_total_hits_as_int: true
        body:
          query:
            nested:
              path: names
              query:
                match:
                  names.full_name: Doe
              inner_hits: { }
          size: 0
          aggs:
            lists:
              terms:
                field: list_id
              aggs:
                top_result:
                  top_hits:
                    size: 10

  - length: { hits.hits: 0 }
  - length: { aggregations.lists.buckets: 2 }
  - length: { aggregations.lists.buckets.0.top_result.hits.hits: 1 }
  - length: { aggregations.lists.buckets.0.top_result.hits.hits.0.inner_hits.names.hits.hits: 2 }
  - length: { aggregations.lists.buckets.1.top_result.hits.hits: 1 }
  - length: { aggregations.lists.buckets.1.top_result.hits.hits.0.inner_hits.names.hits.hits: 2 }

With the following commands, my test fails:

git checkout 965d85aba69baf83f2f60649bc97e3eae44bff05
./gradlew --no-daemon --no-build-cache run
# In another shell
./gradlew ':rest-api-spec:yamlRestTest' --tests "org.opensearch.test.rest.ClientYamlTestSuiteIT" -Dtests.method="test {p0=search.aggregation/400_inner_hits/*}" -Dtests.cluster=localhost:9200 -Dtests.clustername=test -Dtests.rest.cluster=localhost:9200
...
Tests with failures:
 - org.opensearch.test.rest.ClientYamlTestSuiteIT.test {p0=search.aggregation/400_inner_hits/Include inner hits in top hits}

1 test completed, 1 failed

With the parent commit, the test passes:

git checkout 965d85aba69baf83f2f60649bc97e3eae44bff05~1
./gradlew --no-daemon --no-build-cache run
# In another shell
./gradlew ':rest-api-spec:yamlRestTest' --tests "org.opensearch.test.rest.ClientYamlTestSuiteIT" -Dtests.method="test {p0=search.aggregation/400_inner_hits/*}" -Dtests.cluster=localhost:9200 -Dtests.clustername=test -Dtests.rest.cluster=localhost:9200
...
BUILD SUCCESSFUL in 14s
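The manual comparison of 965d85a and its parent above can be extended to a full git bisect by turning each test run into a good/bad/skip verdict. A rough sketch, assuming the 400_inner_hits.yml file has been added to the tree and that a dev cluster built from the current checkout is running on localhost:9200 (restarted with ./gradlew run after every bisect step, as in the manual flow); bisect_verdict, run_yaml_test, bisect_step and <good-ref> are hypothetical names.

```bash
#!/usr/bin/env bash

# Translate exit statuses into a git bisect verdict.
#   $1 = exit status of the cluster health check (non-zero: cluster unusable)
#   $2 = exit status of the YAML REST test
bisect_verdict() {
  local cluster_status=$1 test_status=$2
  if [ "$cluster_status" -ne 0 ]; then
    echo skip    # this commit could not be tested
  elif [ "$test_status" -eq 0 ]; then
    echo good    # inner hits present in top_hits
  else
    echo bad     # test failed: inner hits missing
  fi
}

# Run the same YAML REST test command used above.
run_yaml_test() {
  ./gradlew ':rest-api-spec:yamlRestTest' \
    --tests "org.opensearch.test.rest.ClientYamlTestSuiteIT" \
    -Dtests.method="test {p0=search.aggregation/400_inner_hits/*}" \
    -Dtests.cluster=localhost:9200 -Dtests.clustername=test \
    -Dtests.rest.cluster=localhost:9200
}

# Check that the cluster answers, run the test, and print good/bad/skip.
bisect_step() {
  local cluster_status test_status=1
  curl -sf "http://localhost:9200/_cluster/health" > /dev/null
  cluster_status=$?
  if [ "$cluster_status" -eq 0 ]; then
    run_yaml_test >&2   # keep Gradle output off stdout
    test_status=$?
  fi
  bisect_verdict "$cluster_status" "$test_status"
}

# Usage, with <good-ref> standing for a commit known to pass:
#   git bisect start 965d85aba69baf83f2f60649bc97e3eae44bff05 <good-ref>
#   # restart ./gradlew run for the new checkout, then:
#   git bisect "$(bisect_step)"
#   # repeat until git reports the first bad commit, then: git bisect reset
```

Printing a verdict rather than exiting with a code keeps the manual cluster restart in the loop; a fully automated git bisect run script would also have to build and start the cluster itself.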

@peternied
Member

[Triage - attendees 1 2 3 4 5 6 7 8]
@martijnbolhuis Thanks for creating the detailed issue, we'd welcome a pull request to address the issue.

@dblock
Member

dblock commented May 1, 2024

@martijnbolhuis Thank you for diving deep into this one.

cc: @jainankitk

Projects
Status: Done
Status: Planned work items
4 participants