
convert cache to a ttl cache #77

Open · wants to merge 1 commit into base: db_main
Conversation

christopherzli commented:

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Verification

Tests are pending.

@christopherzli christopherzli requested review from a team, jnyi, hczhu-db and yuchen-db and removed request for a team August 28, 2024 00:09
hczhu-db (Collaborator) left a comment:

[screenshot omitted]

}
f.lruCache.Add(queryExpressionNormalized, queryExpressionRangeLength)
Collaborator:

Can you confirm Add() is thread-safe? Multiple goroutines from the HTTP server may call this function here concurrently.

christopherzli (Author):

Yes, it uses a mutex.
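Assuming lruCache is backed by a locking implementation such as hashicorp/golang-lru (whose Add takes an internal lock), concurrent calls from handler goroutines are safe. A minimal, hypothetical sketch of that pattern, using a map-backed stand-in rather than the actual lruCache:

```go
package main

import (
	"fmt"
	"sync"
)

// safeCache is an illustrative stand-in, not the real cache: every
// mutating call takes the mutex, so concurrent Add() calls from
// HTTP server goroutines cannot race on the underlying map.
type safeCache struct {
	mu    sync.Mutex
	items map[string]int
}

func (c *safeCache) Add(key string, val int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = val
}

func (c *safeCache) Len() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.items)
}

func main() {
	c := &safeCache{items: map[string]int{}}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			c.Add(fmt.Sprintf("query-%d", i), i)
		}(i)
	}
	wg.Wait()
	fmt.Println(c.Len()) // 100 distinct keys, added concurrently without data races
}
```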

@@ -159,7 +156,7 @@ func (f *FailedQueryCache) UpdateFailedQueryCache(err error, query url.Values, q
 	queryExpressionRangeLength := getQueryRangeSeconds(query)
 	// TODO(hc.zhu): add a flag for the threshold
 	// The current gateway timeout is 5 minutes, so we cache the failed query running longer than 5 minutes - 10 seconds.
-	if queryResponseTime > time.Second * (60 * 5 - 10) {
+	if queryResponseTime > time.Second*(60*5-10) {
Collaborator:

Can you make the magic number a flag since you are touching this part? :)

christopherzli (Author):

I didn't change this line's logic; the diff is just gofmt formatting.

christopherzli (Author):

Will do it in a separate PR.

@@ -34,19 +34,16 @@ func verifyMetricCount(t *testing.T, reg *prometheus.Registry, expectedCount int

 func TestNewFailedQueryCache(t *testing.T) {
 	reg := prometheus.NewRegistry()
-	cache, err := NewFailedQueryCache(2, reg)
+	cache := NewFailedQueryCache(2, 0, reg)
Collaborator:

Can you add test cases where the TTL is not 0?

Collaborator:

What does 0 TTL mean? Never expire?

christopherzli (Author):

0 means no TTL: entries never expire.

@@ -48,6 +48,7 @@ type HandlerConfig struct {
 	QueryStatsEnabled        bool          `yaml:"query_stats_enabled"`
 	LogFailedQueries         bool          `yaml:"log_failed_queries"`
 	FailedQueryCacheCapacity int           `yaml:"failed_query_cache_capacity"`
+	FailedQueryTTL           time.Duration `yaml:"failed_query_ttl"`
Collaborator:

You need to add a cmd line flag for this config.

 	cachedHits    prometheus.Counter
 	cachedQueries prometheus.Gauge
 }

-func NewFailedQueryCache(capacity int, reg prometheus.Registerer) (*FailedQueryCache, error) {
+func NewFailedQueryCache(capacity int, ttlDuration time.Duration, reg prometheus.Registerer) *FailedQueryCache {
Collaborator:

I think the eventual goal of a TTL is to also cache expensive queries: if a query fetches too many time series, add it to the cache.
Can you rename the cache to something like ExpensiveQueryCache and make it cache failed queries (usually the expensive ones), long-running ones (> 4 minutes), and ones that fetch too many time series?

hczhu-db (Collaborator) left a comment:

Please address the comments.
