API to allow queries to bypass the query cache policy #16259
Related to #16031
@rendel I'm curious how you figured out that your queries are heavy to construct on small segments? That seems counterintuitive. Could you provide some examples?
We have developed a custom query that embeds a large number of terms to perform a semi-join between indexes (see the siren-join plugin). The terms are encoded in a byte array for performance reasons and decoded lazily at query execution time. The decoding of the terms is the heavy part. We cache the decoded terms under a cache key. The issue is that this decoding is now always redone for small segments, since the policy never caches queries there.
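The siren-join internals aren't shown in this thread, so here is a minimal, self-contained sketch of the pattern described above: a large term set shipped as an encoded byte array, decoded only on first use (the expensive step), and memoized under a cache key. All class and method names here are hypothetical illustrations, not the plugin's actual API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: terms are carried as a length-prefixed UTF-8 byte
// array and decoded lazily; the decoded list is memoized per cache key,
// so the heavy decoding runs at most once per key.
class LazyTermSet {
    private static final Map<String, List<String>> DECODED_CACHE = new ConcurrentHashMap<>();

    private final String cacheKey; // identifies this term set across query instances
    private final byte[] encoded;  // length-prefixed UTF-8 terms

    LazyTermSet(String cacheKey, byte[] encoded) {
        this.cacheKey = cacheKey;
        this.encoded = encoded;
    }

    /** Decodes on first access only; subsequent calls hit the cache. */
    List<String> terms() {
        return DECODED_CACHE.computeIfAbsent(cacheKey, k -> decode(encoded));
    }

    private static List<String> decode(byte[] bytes) {
        List<String> terms = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        while (buf.hasRemaining()) {
            int len = buf.getInt();
            byte[] term = new byte[len];
            buf.get(term);
            terms.add(new String(term, StandardCharsets.UTF_8));
        }
        return terms;
    }

    /** Builds the encoded form; included only to make the sketch round-trip. */
    static byte[] encode(List<String> terms) {
        int size = 0;
        List<byte[]> raw = new ArrayList<>();
        for (String t : terms) {
            byte[] b = t.getBytes(StandardCharsets.UTF_8);
            raw.add(b);
            size += 4 + b.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (byte[] b : raw) {
            buf.putInt(b.length);
            buf.put(b);
        }
        return buf.array();
    }
}
```

The point of the memoization is exactly what the comment describes: without the query cache keeping the query warm, the decode step runs again on every execution against small segments.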
If a query is slow when it is not cached, I don't think the cache is to blame. It is something that users would hit anyway after a merge or a restart. I actually think not caching on small segments is very important as:
While I think there are things to improve based on the feedback that was given in #16031, I don't think we should make it possible to cache on all segments.
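For readers unfamiliar with the policy being discussed: Lucene's `UsageTrackingQueryCachingPolicy` wraps a "cache only on large segments" gate. The sketch below approximates that gate in standalone form; the 3% minimum-size ratio used here is an assumption for illustration, and the real thresholds live in Lucene's `QueryCachingPolicy.CacheOnLargeSegments`, not in this code.

```java
// Hedged approximation of the "cache only on large segments" gate: a
// segment is eligible for query caching only if it holds at least a
// minimum share of the whole index. The 3% default below is assumed for
// illustration; consult Lucene's CacheOnLargeSegments for actual values.
class CacheOnLargeSegmentsSketch {
    private final float minSizeRatio;

    CacheOnLargeSegmentsSketch(float minSizeRatio) {
        this.minSizeRatio = minSizeRatio;
    }

    /** True if the segment is large enough relative to the index to be worth caching on. */
    boolean shouldCache(int segmentMaxDoc, int indexMaxDoc) {
        return (float) segmentMaxDoc / indexMaxDoc >= minSizeRatio;
    }
}
```

This is why small segments never see cached entries regardless of how expensive the query is: the size check rejects them before any usage-frequency accounting applies.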
@jpountz I would agree that mainstream cases (the standard Lucene queries) should not be cached on small segments, and that the new caching policy is well adapted to those kinds of queries. However, there are very legitimate cases where this policy is too restrictive. We are not asking to change the high-level API (e.g., the query DSL), but just to expose that option at a low level for advanced users who, like us, are building on top of Elasticsearch: something at the Java Lucene `Query` API level, where people creating a new custom Lucene `Query` could have some control over the cache policy. Maybe this is something that should be implemented at the Lucene level instead of Elasticsearch? Without such control, we would have to fall back to alternative options that are not very optimal:
We currently have a performance issue with the new query cache policy. We have queries that are quite heavy to construct and compute, even on small segments. The `UsageTrackingQueryCachingPolicy` (which uses `CacheOnLargeSegments`) will always discard the caching of our queries on small segments. This leads to a significant drop in performance (5x to 10x) in our scenarios.

Another limitation of the `UsageTrackingQueryCachingPolicy` is that there is no easy way to indicate to it that our queries are costly to build, apart from subclassing our queries from `MultiTermQuery` so that they are picked up by `UsageTrackingQueryCachingPolicy#isCostly`.

At the moment, the only solution we have is to configure Elasticsearch to switch back to the `QueryCachingPolicy.ALWAYS_CACHE` cache policy.