As more users embed TraceQL queries in dashboards, it has become apparent that we need to improve our caching to handle repeated queries with slightly adjusted time ranges (e.g. auto-refreshing dashboards). Currently we only cache parquet footers and bloom filters.
Let's add a cache in the query-frontend at the individual "job" level. After a query is broken into a stream of jobs, we will cache based on the individual job URL, which takes the query, block ID, row groups, etc. into account. For a given job the results are immutable because the blocks don't change, so if we have previously executed a job we can expect its results to be the same.
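A minimal sketch of the keying idea, in Go: hash the full job URL into a stable cache key and consult the cache before dispatching to a querier. The URL shape and function names here are hypothetical, not Tempo's actual internals.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey derives a stable cache key from a search-job URL.
// Because the URL encodes the query, block ID, row groups, etc.,
// identical jobs hash to identical keys.
func cacheKey(jobURL string) string {
	sum := sha256.Sum256([]byte(jobURL))
	return "job:" + hex.EncodeToString(sum[:])
}

func main() {
	// Hypothetical job URL; the real shape is internal to Tempo.
	fmt.Println(cacheKey("/querier?blockID=abc123&q=%7B%7D&startPage=0&pagesToSearch=10"))
}
```

Since job results are immutable for a given block, a hit on this key can be returned without re-executing the job.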
Caveats:
- We can only rely on the cache if the query's start/end time range completely encapsulates the block; use the block metadata to determine this. If the start/end only partially overlaps the block, we have to issue the job to the queriers because the cached results can't be trusted.
- The start/end time range needs to be stripped from the URL before hashing for the cache key. This way, as a dashboard slowly moves across a time range, we will generally pull from cache for most blocks and only issue requests to the queriers for blocks on the edges of the time range and for new blocks created by compactors.
This issue has been automatically marked as stale because it has not had any activity in the past 60 days.
The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity.
Please apply keepalive label to exempt this Issue.