Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Range aggregation ignores query #15169

Closed
marko-bekhta opened this issue Aug 8, 2024 · 10 comments · Fixed by #15194
Closed

[BUG] Range aggregation ignores query #15169

marko-bekhta opened this issue Aug 8, 2024 · 10 comments · Fixed by #15194
Assignees

Comments

@marko-bekhta
Copy link

marko-bekhta commented Aug 8, 2024

Describe the bug

Range aggregation ignores query. This is working ok in 2.15.0 and started to fail in the recent 2.16.0.

Related component

Search:Aggregations

To Reproduce

This problem was initially noticed with a routing filter, but it can also be reproduced with other queries, e.g., a range query.

  1. Create a simple index
PUT http://localhost:32772/range-agg-test-000001
Content-Type: application/json

{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "Integer": {
        "type": "integer"
      }
    }
  }
}
  1. add a few documents to the index:
POST http://localhost:32772/range-agg-test-000001/_doc/1?routing=route1
Content-Type: application/json

{
  "Integer": -2147483648
}
POST http://localhost:32772/range-agg-test-000001/_doc/2?routing=route1
Content-Type: application/json

{
  "Integer": -2147483648
}
POST http://localhost:32772/range-agg-test-000001/_doc/200?routing=route2
Content-Type: application/json

{
  "Integer": -2147483648
}

3.1. Try to get a range aggregation and use the filter on a route:

POST http://localhost:32772/range-agg-test-000001/_search
Content-Type: application/json

{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": [
        {
          "terms": {
            "_routing": [
              "route1"
            ]
          }
        }
      ]
    }
  },
  "aggregations": {
    "aggregationName": {
      "range": {
        "field": "Integer",
        "keyed": true,
        "ranges": [
          {
            "to": 0,
            "key": "0"
          },
          {
            "from": 0,
            "key": "1"
          }
        ]
      }
    }
  },
  "_source": false
}

3.2. Alternatively, try a simple > query instead:

POST http://localhost:32768/range-agg-test-000001/_search
Content-Type: application/json

{
  "query": {
    "range": {
      "Integer": {
        "gte": 10
      }
    }
  },
  "aggregations": {
    "aggregationName": {
      "range": {
        "field": "Integer",
        "keyed": true,
        "ranges": [
          {
            "to": 0,
            "key": "0"
          },
          {
            "from": 0,
            "key": "1"
          }
        ]
      }
    }
  },
  "_source": false
}

In both cases, the result is:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "aggregationName": {
      "buckets": {
        "0": {
          "to": 0.0,
          "doc_count": 3
        },
        "1": {
          "from": 0.0,
          "doc_count": 0
        }
      }
    }
  }
}

Expected behavior

It would be expected to get

  • in the case of 3.1: 2 documents in the bucket "0" as there are only two documents with the route1 :
{
  "aggregationName": {
    "buckets": {
      "0": {
        "to": 0.0,
        "doc_count": 2
      },
      "1": {
        "from": 0.0,
        "doc_count": 0
      }
    }
  }
}
  • in the case of 3.2: 0 doc counts in both ranges, since none of the values are > 0
{
  "aggregationName": {
    "buckets": {
      "0": {
        "to": 0.0,
        "doc_count": 0
      },
      "1": {
        "from": 0.0,
        "doc_count": 0
      }
    }
  }
}

Additional Details

Plugins

ec6d9b89d3cf opensearch-alerting                  2.16.0.0
ec6d9b89d3cf opensearch-anomaly-detection         2.16.0.0
ec6d9b89d3cf opensearch-asynchronous-search       2.16.0.0
ec6d9b89d3cf opensearch-cross-cluster-replication 2.16.0.0
ec6d9b89d3cf opensearch-custom-codecs             2.16.0.0
ec6d9b89d3cf opensearch-flow-framework            2.16.0.0
ec6d9b89d3cf opensearch-geospatial                2.16.0.0
ec6d9b89d3cf opensearch-index-management          2.16.0.0
ec6d9b89d3cf opensearch-job-scheduler             2.16.0.0
ec6d9b89d3cf opensearch-knn                       2.16.0.0
ec6d9b89d3cf opensearch-ml                        2.16.0.0
ec6d9b89d3cf opensearch-neural-search             2.16.0.0
ec6d9b89d3cf opensearch-notifications             2.16.0.0
ec6d9b89d3cf opensearch-notifications-core        2.16.0.0
ec6d9b89d3cf opensearch-observability             2.16.0.0
ec6d9b89d3cf opensearch-performance-analyzer      2.16.0.0
ec6d9b89d3cf opensearch-reports-scheduler         2.16.0.0
ec6d9b89d3cf opensearch-security                  2.16.0.0
ec6d9b89d3cf opensearch-security-analytics        2.16.0.0
ec6d9b89d3cf opensearch-skills                    2.16.0.0
ec6d9b89d3cf opensearch-sql                       2.16.0.0
ec6d9b89d3cf query-insights                       2.16.0.0

Host/Environment (please complete the following information):

Additional context
Running OpenSearch using the official opensearchproject/opensearch:2.16.0 image.

@marko-bekhta marko-bekhta added bug Something isn't working untriaged labels Aug 8, 2024
@harshavamsi
Copy link
Contributor

harshavamsi commented Aug 8, 2024

Haven't tried to reproduce yet, but most likely due to #13865

@bowenlan-amzn can you take a look?

@marko-bekhta
Copy link
Author

Thanks for taking a look. Yeah, I was looking at the change log, and the #13865 also caught my eye.

By the way, a question not entirely related to the issue at hand: do you happen to have a snapshot build of OpenSearch that we could run tests against? Ideally, it would be a snapshot docker image, so it'll be easier to integrate the testing into our CI. That'd help catch this kind of issues before the release..

@harshavamsi
Copy link
Contributor

I could repro this, in both cases we get

    "aggregations": {
        "aggregationName": {
            "buckets": {
                "0": {
                    "to": 0.0,
                    "doc_count": 3
                },
                "1": {
                    "from": 0.0,
                    "doc_count": 0
                }
            }
        }
    }

As for the snapshots, I know we have them here -- https://aws.oss.sonatype.org/content/repositories/snapshots/org/opensearch/opensearch/

@dennisoelkers
Copy link

dennisoelkers commented Aug 12, 2024

I can also verify that this is happening when using a date_range aggregation. All filters of the query are ignored when it is used in 2.16.0.

@bowenlan-amzn
Copy link
Member

@marko-bekhta @dennisoelkers sorry for the late reply. Please try set this cluster setting search.max_aggregation_rewrite_filters to 0 to disable this filter rewrite optimization, it should fall back to the default aggregation path.

@getsaurabh02
Copy link
Member

@dennisoelkers can we confirm if disabling the dynamic setting search.max_aggregation_rewrite_filters helps with mitigating the issue here?

@drewmiranda-gl
Copy link

drewmiranda-gl commented Aug 14, 2024

@getsaurabh02 I've just tested and it does appear to properly return 0:

Compare (first item is more recent, after applying search.max_aggregation_rewrite_filters: 0. Second item is before setting search.max_aggregation_rewrite_filters)

image

I'll defer to Dennis for his expertise though.

@drewmiranda-gl
Copy link

So far we've gotten 2 (3 counting my test above) positive acknowledgments that the workaround does resolve the issue. 🙌

This is great news.

@dblock
Copy link
Member

dblock commented Aug 15, 2024

Maybe someone on this thread could add some agg examples to https://github.com/opensearch-project/opensearch-api-specification/tree/main/tests and make sure we have those specs? We have lots of users asking for agg support in clients (example)

@dennisoelkers
Copy link

@getsaurabh02: I can verify that it the setting is fixing for us now, results are as expected again, at least for the cases that I tested (using a date_range aggregation).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Status: Done
8 participants