BUG - Aggs show 0 buckets incorrectly when some filters are applied #868

Open
justincorrigible opened this issue Jan 9, 2024 · 1 comment
Labels
bug Something isn't working

Comments

justincorrigible commented Jan 9, 2024

As identified in icgc-argo/roadmap#1057, some filters incorrectly produce 0 buckets.
This ticket will serve as a documentation log for the research into this issue, and will link to its eventual fix.

Thus far, the working theory is that something is wrong with the aggs filtering for nested array fields, specifically for "in" operations.

Example
Following the Argo ticket, we can run a test GraphQL query like this one, with no filters.

query ($SQON: JSON) {
  file {
    hits (filters: $SQON) {
      total
    } 
    aggregations(
      filters: $SQON
      include_missing: true
      aggregations_filter_themselves: true
    ) {
      donors__donor_id {
        bucket_count
        buckets {
          doc_count
          key
        }
      }
    }
  }
}
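
For anyone reproducing this outside the GraphQL playground, here is a minimal Python sketch of how the query above could be sent over HTTP. The endpoint URL and the helper names (`build_request`, `run_query`) are hypothetical, since the ticket does not give the server address, and it assumes that passing a null `$SQON` variable is equivalent to applying no filters.

```python
import json
import urllib.request

# Hypothetical endpoint: the ticket does not state the Arranger server URL.
ARRANGER_GRAPHQL_URL = "http://localhost:5050/graphql"

# Same query as in the ticket.
QUERY = """
query ($SQON: JSON) {
  file {
    hits(filters: $SQON) {
      total
    }
    aggregations(
      filters: $SQON
      include_missing: true
      aggregations_filter_themselves: true
    ) {
      donors__donor_id {
        bucket_count
        buckets {
          doc_count
          key
        }
      }
    }
  }
}
"""


def build_request(sqon=None, url=ARRANGER_GRAPHQL_URL):
    """Build (but do not send) the GraphQL POST request.

    Assumption: sqon=None (JSON null) means "no filters".
    """
    body = json.dumps({"query": QUERY, "variables": {"SQON": sqon}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sqon=None, url=ARRANGER_GRAPHQL_URL):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(sqon, url)) as resp:
        return json.load(resp)
```

Calling `run_query()` would run the unfiltered case, and `run_query(sqon)` with a SQON dictionary would run a filtered one.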

Any anonymous user can see 1660 docs in the dev environment, as shown in Arranger's GraphQL response:

{
 "data": {
   "file": {
     "hits": {
       "total": 1660
     },
     "aggregations": {
       "donors__donor_id": {
         "bucket_count": 6,
         "buckets": [
           {
             "doc_count": 877,
             "key": "DO250472"
           },
           {
             "doc_count": 478,
             "key": "DO253000"
           },
           {
             "doc_count": 163,
             "key": "DO35085"
           },
           {
             "doc_count": 138,
             "key": "DO252999"
           },
           {
             "doc_count": 3,
             "key": "DO250326"
           },
           {
             "doc_count": 1,
             "key": "DO250391"
           }
         ]
       }
     }
   }
 }
}

Now let's assume the following SQON:

{
  "content": {
    "fieldName": "donors.specimens.specimen_tissue_source",
    "value": "Solid tissue"
  },
  "op": "in"
}

Note: donors here is technically an array of donor objects, and specimens is likewise an array within each donor.
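
To make the document shape concrete, here is a plain-Python sketch of what an "in" match over arrays of arrays means. This is not Arranger's or Elasticsearch's implementation; the sample document, its donor ID, and the "Blood derived" value are made up for illustration, and `nested_values` / `matches_in` are hypothetical helper names.

```python
def nested_values(doc, field_name):
    """Collect every value found at a dotted path, flattening arrays on the way."""
    current = [doc]
    for part in field_name.split("."):
        next_level = []
        for item in current:
            value = item.get(part) if isinstance(item, dict) else None
            if value is None:
                continue
            if isinstance(value, list):
                next_level.extend(value)
            else:
                next_level.append(value)
        current = next_level
    return current


def matches_in(doc, sqon):
    """True if any value at the SQON's fieldName is among the wanted values."""
    wanted = sqon["content"]["value"]
    wanted = wanted if isinstance(wanted, list) else [wanted]
    return any(v in wanted for v in nested_values(doc, sqon["content"]["fieldName"]))


# SQON from the ticket.
sqon = {
    "content": {
        "fieldName": "donors.specimens.specimen_tissue_source",
        "value": "Solid tissue",
    },
    "op": "in",
}

# Invented file document: donors is an array, and so is specimens.
example_file = {
    "donors": [
        {
            "donor_id": "DO-EXAMPLE",
            "specimens": [
                {"specimen_tissue_source": "Blood derived"},
                {"specimen_tissue_source": "Solid tissue"},
            ],
        }
    ]
}

print(matches_in(example_file, sqon))
```

A file matches as soon as one specimen of one donor has the requested value, so any donor_id aggregation over the matching files has two array levels to traverse.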

...which results in this response (aka the problem):

{
  "data": {
    "file": {
      "hits": {
        "total": 18
      },
      "aggregations": {
        "donors__donor_id": {
          "bucket_count": 0,
          "buckets": []
        }
      }
    }
  }
}
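
The symptom in this response can be detected mechanically, which may be handy as a regression check once a fix lands. The sketch below is only a heuristic: it assumes that hits greater than zero combined with zero buckets is unexpected for this field, which holds here because every file in the unfiltered response falls into a donor bucket. The function name is hypothetical.

```python
def empty_aggs_despite_hits(response, index="file", agg="donors__donor_id"):
    """Flag responses where documents matched but the aggregation has no buckets."""
    data = response["data"][index]
    total = data["hits"]["total"]
    bucket_count = data["aggregations"][agg]["bucket_count"]
    return total > 0 and bucket_count == 0


# The problematic "in" response from the ticket.
in_response = {
    "data": {
        "file": {
            "hits": {"total": 18},
            "aggregations": {
                "donors__donor_id": {"bucket_count": 0, "buckets": []}
            },
        }
    }
}

print(empty_aggs_despite_hits(in_response))  # prints True
```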

But if you change the SQON to use a "not_in" operation, we get this correct response:

{
  "data": {
    "file": {
      "hits": {
        "total": 1642
      },
      "aggregations": {
        "donors__donor_id": {
          "bucket_count": 5,
          "buckets": [
            {
              "doc_count": 877,
              "key": "DO250472"
            },
            {
              "doc_count": 478,
              "key": "DO253000"
            },
            {
              "doc_count": 148,
              "key": "DO35085"
            },
            {
              "doc_count": 138,
              "key": "DO252999"
            },
            {
              "doc_count": 1,
              "key": "DO250391"
            }
          ]
        }
      }
    }
  }
}

Notice that the totals add up (1660 = 18 + 1642), which is consistent with the SQONs not being entirely broken 🤣
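
The same arithmetic can be pushed one step further to estimate what the "in" buckets should have looked like. The sketch below subtracts the "not_in" buckets from the unfiltered ones, using only the numbers quoted in this ticket. It assumes that "in" and "not_in" partition the files and that each file is counted in exactly one donor bucket (the bucket sums match the hit totals, which supports this but does not prove it). The helper name is hypothetical.

```python
# Buckets copied from the unfiltered response.
unfiltered = {
    "DO250472": 877,
    "DO253000": 478,
    "DO35085": 163,
    "DO252999": 138,
    "DO250326": 3,
    "DO250391": 1,
}

# Buckets copied from the "not_in" response.
not_in = {
    "DO250472": 877,
    "DO253000": 478,
    "DO35085": 148,
    "DO252999": 138,
    "DO250391": 1,
}


def bucket_difference(all_buckets, complement_buckets):
    """Per-key doc_count left over after removing the complement's counts."""
    diff = {}
    for key, count in all_buckets.items():
        remaining = count - complement_buckets.get(key, 0)
        if remaining:
            diff[key] = remaining
    return diff


# Bucket sums match the reported hit totals.
assert sum(unfiltered.values()) == 1660
assert sum(not_in.values()) == 1642

expected_in = bucket_difference(unfiltered, not_in)
print(expected_in)                # {'DO35085': 15, 'DO250326': 3}
print(sum(expected_in.values()))  # 18
```

Under those assumptions, the "in" query would be expected to return 2 buckets (DO35085 with 15 docs and DO250326 with 3) rather than 0.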

justincorrigible added the bug label on Jan 9, 2024