Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add timeout on lifecycler heartbeat #6212

Merged
merged 2 commits into from
Sep 17, 2024

Conversation

anna-tran
Copy link
Contributor

@anna-tran anna-tran commented Sep 13, 2024

What this PR does:
Adds a timeout context to the lifecycler heartbeat to prevent requests to the ring from waiting indefinitely in case of a network issue.

Which issue(s) this PR fixes:
Fixes #6211

Checklist

  • [ N/A ] Tests updated
  • [ X ] Documentation added
  • [ X ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

pkg/ring/kv/dynamodb/client.go Outdated Show resolved Hide resolved
pkg/ring/kv/dynamodb/dynamodb.go Outdated Show resolved Hide resolved
@alanprot
Copy link
Member

alanprot commented Sep 16, 2024

I think we can just add a context with timeout on the heartbeat call here?

func (i *Lifecycler) heartbeat() {
i.lifecyclerMetrics.consulHeartbeats.Inc()
if err := i.updateConsul(context.Background()); err != nil {
level.Error(i.logger).Log("msg", "failed to write to the KV store, sleeping", "ring", i.RingName, "err", err)
}
}

Something like:

func (i *Lifecycler) heartbeat(ctx context.Context) {
	i.lifecyclerMetrics.consulHeartbeats.Inc()
	ctx, _ = context.WithTimeout(ctx, i.cfg.HeartbeatPeriod)
	if err := i.updateConsul(context.Background()); err != nil {
		level.Error(i.logger).Log("msg", "failed to write to the KV store, sleeping", "ring", i.RingName, "err", err)
	}
}

This would make sure we always start a new heartbeat?

Looking where the go routine got stuck, this would fix the problem:

https://github.com/golang/go/blob/7529d09a11496a77ccbffe245607fbd256200991/src/net/http/transport.go#L2734-L2738

I think if we do that we dont need any extra flag at all and this would work for all kv client

@yeya24
Copy link
Contributor

yeya24 commented Sep 17, 2024

Looks good. Let's update the PR title and description based on the latest change.

@anna-tran anna-tran changed the title Add timeout on http requests for kv store DDB client Add timeout on lifecycler heartbeat Sep 17, 2024
@yeya24 yeya24 merged commit b4c37ce into cortexproject:master Sep 17, 2024
16 checks passed
@anna-tran anna-tran deleted the kv-ddb-timeout branch September 17, 2024 22:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ingester stops Ring CAS when lifecycler hangs on query to KV store
3 participants