-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Log error in remediation #128
Log error in remediation #128
Conversation
Signed-off-by: Michael Shitrit <[email protected]>
Signed-off-by: Michael Shitrit <[email protected]>
6d9dbaa
to
5fe43e6
Compare
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mshitrit The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test ? |
@mshitrit: The following commands are available to trigger required jobs:
Use In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/test 4.15-openshift-e2e |
if err := utils.RemoveTaint(r.Client, far.Name, taint); err != nil { | ||
if apiErrors.IsConflict(err) { | ||
r.Log.Info("Failed to remove taint from node due to node update, retrying... ,", "node name", node.Name, "taint key", taint.Key, "taint effect", taint.Effect) | ||
return ctrl.Result{RequeueAfter: time.Second}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why requeue after one second and not immediately afterwards since this conflict should not happen multiple times, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Giving the update a bit of time to finish, no point in stressing the system and spamming the log
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But how frequently does it happen and will it happen one after another multiple times if we don't wait? In case you happen to reproduce this race error.
My only concern is just with the interfering of exponential back-off that we "lose" when we set the exact time to requeue. Not sure if it is relevant though, as this back-off would happen in case of an earlier error that is probably unrelated to removing taints.
Up to you, waiting for one second is still a reasonable time for fast FAR execution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, in case we want to utilize the exponential back-off mechanism we must return an error which will also be logged as such.
IIUC it is something we want to avoid (logging an error) since this behavior is not uncommon and expected .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requeue: True
also results in rate limited requeue
https://github.com/kubernetes-sigs/controller-runtime/blob/56159419231e985c091ef3e7a8a3dee40ddf1d73/pkg/internal/controller/controller.go#L341
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requeue: True also results in rate limited requeue
I agree this is what I originally meant. But IIRC Michael suspects that this error is recurrent (I didn't see a proof for that), it seems more like a temporary error to me.
sed -r -i "s|createdAt: .*|createdAt: \"\"|;" ${BUNDLE_CSV} | ||
|
||
.PHONY: full-gen | ||
full-gen: go-verify manifests generate manifests fmt bundle fix-imports bundle-reset ## generates all automatically generated content |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: redundant space in manifests generate
/lgtm |
ECOPROJECT-1505