-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(appset): Retry on conflict when updating status #19663
fix(appset): Retry on conflict when updating status #19663
Conversation
# Context: When updating the status of the applicationset object it can happen that it fails due to a conflict since the resourceVersion has changed due to a different update. This makes the reconcile fails and we need to wait until the following reconcile loop until it updates the relevant status fields and hope that the update calls don't fail again due a conflict. It can even happen that it gets stuck constantly due to this erriors. A better approach I would say is retrying when there is a conflict error with the newest version of the object, so we make sure we update the object with the latest version always. This has been raised in issue argoproj#19535 that failing due to conflicts can make the reconcile not able to proceed. # What does this PR? - Wraps all the `Update().Status` calls inside a retry function that will retry when the update fails due a conflict. - Adds appset to fake client subresources, if not the client can not correctly determine the status subresource. Refer to: kubernetes-sigs/controller-runtime#2386, and kubernetes-sigs/controller-runtime#2362. Signed-off-by: Carlos Rejano <[email protected]>
❌ Preview Environment deleted from BunnyshellAvailable commands (reply to this comment):
|
❌ Preview Environment deleted from BunnyshellAvailable commands (reply to this comment):
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #19663 +/- ##
=========================================
Coverage ? 55.83%
=========================================
Files ? 320
Lines ? 44251
Branches ? 0
=========================================
Hits ? 24709
Misses ? 16980
Partials ? 2562 ☔ View full report in Codecov by Sentry. |
if err := r.Get(ctx, namespacedName, applicationSet); err != nil { | ||
if client.IgnoreNotFound(err) != nil { | ||
return nil | ||
err := retry.RetryOnConflict(retry.DefaultRetry, func() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you note down how many times it's going to retry and for how long?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, in this case it's using this backOff defined:
var DefaultRetry = wait.Backoff{
Steps: 5,
Duration: 10 * time.Millisecond,
Factor: 1.0,
Jitter: 0.1,
}
We would be doing around 5 calls every 10 ms.
Should I add it as a comment in the code? Or?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea let's add it as a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
Context:
When updating the status of the applicationset object it can happen
that it fails due to a conflict since the resourceVersion has changed
because of a previous update. This makes the reconcile fails and
we need to wait until the following reconcile loop updates the
relevant status fields and hope that the update calls don't fail again
due to a conflict. It can even happen that it gets stuck constantly due
to these erriors.
A better approach I would say is retrying when there is a conflict
error with the newest version of the object, so we make sure we update
the object with the latest version always.
This has been raised in issue #19535 that failing due to conflicts can
make the reconciliation not able to proceed.
What does this PR?
Update().Status
calls inside a retry function that will retry when the update fails due to a conflict.