Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop Update Status When a Finalizer is Missing #84

Merged
merged 2 commits into from
Oct 5, 2023

Conversation

razo7
Copy link
Member

@razo7 razo7 commented Sep 28, 2023

When FAR doesn't include a finalizer then there is no need to update the status, since it will be removed soon.

Should fix ECOPROJECT-1590

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 28, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 28, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: razo7

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@razo7 razo7 force-pushed the stop-update-status-no-finalizer branch from 9aff69a to e133467 Compare September 28, 2023 11:21
… missing

When FAR doesn't include a finalizer after successful remediation then there is no need to update the status, since it will be removed soon.
@razo7 razo7 force-pushed the stop-update-status-no-finalizer branch from e133467 to 9df5951 Compare September 28, 2023 13:19
@razo7
Copy link
Member Author

razo7 commented Sep 28, 2023

/test 4.13-openshift-e2e
/test 4.14-openshift-e2e

@@ -247,6 +247,11 @@ func isTimedOutByNHC(far *v1alpha1.FenceAgentsRemediation) bool {

// updateStatus updates the CR status, and returns an error if it fails
func (r *FenceAgentsRemediationReconciler) updateStatus(ctx context.Context, far *v1alpha1.FenceAgentsRemediation) error {
// When FAR doesn't include a finalizer after successful remediation then there is no need to update the status, since it will be removed soon.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if there is no finalizer, but the conditions are not "True"? Could we still get this error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bug that we faced was a race condition between two operators - when NHC removed far CR, and in the meantime, the reconcile loop was finished and tried to update the CR status. The suggested fix will prevent updateStatus from updating the status of CR at the end of the reconcile loop when some conditions values have been met (remediation is completed) and a finalizer was removed (it has to be created beforehand based on the values of the conditions).

What if there is no finalizer,

There is no finalizer in two scenarios:

  1. Finalizer couldn't be created once we have reached this function - due to NHC timeout or far CR name being invalid.
  2. Or it was created and deleted already - therefore, the node was fully remediated.

Could we still get this error?

I don't think so, but can you see a way/chance when we should still see this kind of error?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok then

/lgtm
Giving others a chance to review as well, feel free to unhold
/hold

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO the check should be:
if deletionTimestamp exists + finalizer doesn't exist -> skip status update, because that means the CR is about to be garbage collected.
The conditions don't matter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds even better using deletionTimestamp.

@razo7
Copy link
Member Author

razo7 commented Oct 2, 2023

/retest

@@ -247,6 +247,11 @@ func isTimedOutByNHC(far *v1alpha1.FenceAgentsRemediation) bool {

// updateStatus updates the CR status, and returns an error if it fails
func (r *FenceAgentsRemediationReconciler) updateStatus(ctx context.Context, far *v1alpha1.FenceAgentsRemediation) error {
// When FAR doesn't include a finalizer after successful remediation then there is no need to update the status, since it will be removed soon.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO the check should be:
if deletionTimestamp exists + finalizer doesn't exist -> skip status update, because that means the CR is about to be garbage collected.
The conditions don't matter.

@openshift-ci openshift-ci bot removed the lgtm label Oct 4, 2023
Cleaner condition using deletionTimestamp ratherthan CR conditions status
@razo7 razo7 force-pushed the stop-update-status-no-finalizer branch from fdf30a6 to a046ac6 Compare October 4, 2023 15:50
@slintes
Copy link
Member

slintes commented Oct 4, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Oct 4, 2023
@slintes
Copy link
Member

slintes commented Oct 4, 2023

/retest

@razo7
Copy link
Member Author

razo7 commented Oct 5, 2023

/unhold

@openshift-ci openshift-ci bot merged commit b2d3419 into medik8s:main Oct 5, 2023
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants