Stop Update Status When a Finalizer is Missing #84
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: razo7
9aff69a to e133467
… missing
When FAR doesn't include a finalizer after successful remediation, there is no need to update the status, since the CR will be removed soon.
e133467 to 9df5951
/test 4.13-openshift-e2e
@@ -247,6 +247,11 @@ func isTimedOutByNHC(far *v1alpha1.FenceAgentsRemediation) bool {

// updateStatus updates the CR status, and returns an error if it fails
func (r *FenceAgentsRemediationReconciler) updateStatus(ctx context.Context, far *v1alpha1.FenceAgentsRemediation) error {
	// When FAR doesn't include a finalizer after successful remediation then there is no need to update the status, since it will be removed soon.
What if there is no finalizer, but the conditions are not "True"? Could we still get this error?
The bug we faced was a race condition between two operators: NHC deleted the FAR CR while the reconcile loop was finishing and trying to update the CR status. The suggested fix prevents updateStatus from updating the CR status at the end of the reconcile loop when the condition values show that remediation has completed and the finalizer has been removed (the finalizer must have been added beforehand, based on the condition values).
What if there is no finalizer,
There is no finalizer in two scenarios:
- The finalizer could not be added by the time we reach this function, due to an NHC timeout or an invalid FAR CR name.
- Or it was added and has already been removed, which means the node was fully remediated.
Could we still get this error?
I don't think so, but can you see a case where we would still hit this kind of error?
ok then
/lgtm
Giving others a chance to review as well, feel free to unhold
/hold
IMHO the check should be:
if deletionTimestamp exists + finalizer doesn't exist -> skip status update,
because that means the CR is about to be garbage collected.
The conditions don't matter.
Sounds even better using deletionTimestamp.
/retest
Cleaner condition using deletionTimestamp rather than CR conditions status
fdf30a6 to a046ac6
/lgtm
/retest
/unhold
When FAR doesn't include a finalizer then there is no need to update the status, since it will be removed soon.
Should fix ECOPROJECT-1590