-
Notifications
You must be signed in to change notification settings - Fork 521
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve poller missing index error condition #3223
Conversation
83add0c
to
764f8fc
Compare
tempoDistributor := util.NewTempoDistributor() | ||
tempoQueryFrontend := util.NewTempoQueryFrontend() | ||
tempoQuerier := util.NewTempoQuerier() | ||
require.NoError(t, s.StartAndWaitReady(tempoIngester1, tempoIngester2, tempoIngester3, tempoDistributor, tempoQueryFrontend, tempoQuerier)) | ||
require.NoError(t, s.StartAndWaitReady(tempoIngester1, tempoIngester2, tempoIngester3, tempoDistributor, tempoQueryFrontend, tempoQuerier, tempoCompactor)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this will just test that the compactor starts? did you intend to add this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compactor needs to start in the tests now, since without it, the index won't get written.
@@ -223,12 +235,13 @@ func (p *Poller) pollTenantAndCreateIndex( | |||
|
|||
// polling fallback is true, log the error and continue in this method to completely poll the backend | |||
level.Error(p.logger).Log("msg", "failed to pull bucket index for tenant. falling back to polling", "tenant", tenantID, "err", err) | |||
} else { | |||
metricTenantIndexBuilder.WithLabelValues(tenantID).Set(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any particular reason to move this? is this to distinguish between a "true" tenant index builder and a compactor that's just falling through?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to avoid setting the builder to true when the poller doesn't own the writing. I think this is what we intend. Or are you thinking that even if not configured to be the writer, if we fall though and end up writing, that we should metric as such?
if !builder { | ||
metricTenantIndexBuilder.WithLabelValues(tenantID).Set(0) | ||
|
||
i, err := p.reader.TenantIndex(derivedCtx, tenantID) | ||
err = p.tenantIndexPollError(i, err) | ||
|
||
if errors.Is(err, backend.ErrDoesNotExist) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the negative here is if we manually delete a tenant index to force recreation then only the assigned tenant index builders will attempt to recreate. which means that certain queriers will be unaware of any blocks in the backend for a tenant until it gets recreated (multiple minutes).
instead we could
- log info/debug "no tenant index found. falling through to polling"
- falling through to the blocklist poll below
- if the full blocklist list call returns an empty list log a warning. "no blocks found for tenant that should exist"
- raise the metric
tempod_blocklist_poll_errors_total
to build alerts off of/warn the operator to clean up partial blocks
wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to avoid polling in the case of a missing index for non-builders. The only signal that we have currently that we wish to delete the tenant, is to delete the index. If you wanted to force a recreation of the index, you could corrupt the index, which would cause the poller to poll and recreate it.
f0b57a8
to
a9d6203
Compare
Closing for non-response. |
The tenant index deletion was originally put in as TCO win, but did not have the desired effect and surfaced other issues in the system. Related grafana#2678 Related grafana#2754 Related grafana#2781 Related grafana#2878 Related grafana#3115 Related grafana#3223 Due to the number of issues here, and causing considerable noise on the pager, perhaps the right thing to do is back out the tenant deletion. Raising here for discussion.
What this PR does:
Here we handle an error condition for a missing tenant index on non-index-builder pollers. This avoids unnecessary work when a tenant has been deleted, but remains in the tenant list due to the behavior described in #2754.
For non-index-builders:
This change resulted in the compactor being required as part of some tests. Previously, if the index was not available, all pollers would attempt to build and write the index.
Which issue(s) this PR fixes:
Related #2754
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]