Stop Remediation When NHC Timed Out Annotation Exists #72

Merged
3 commits merged into medik8s:main
Aug 1, 2023

Conversation

razo7
Member

@razo7 razo7 commented Jul 31, 2023

Stop remediation when NHC timed out annotation exists

ECOPROJECT-1485

The annotation is used by the Node Healthcheck Operator (NHC) to stop the remediation in an escalating remediation scenario.
@openshift-ci
Contributor

openshift-ci bot commented Jul 31, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@razo7
Member Author

razo7 commented Jul 31, 2023

/test 4.13-openshift-e2e

@@ -129,6 +132,13 @@ func (r *FenceAgentsRemediationReconciler) Reconcile(ctx context.Context, req ct
		return emptyResult, nil
	}

	// Check NHC timeout annotation
	if isTimedOutByNHC(far) {
Member

I think this can go before the if block above?

Member Author

Yes, it can
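
For readers following the thread: the body of `isTimedOutByNHC` is not shown in this diff. A minimal sketch of what such a check could look like is below. The annotation key, the `remediation` struct, and the sample node name are assumptions made for illustration; they are not taken from this PR, and the real helper operates on the FenceAgentsRemediation CR.

```go
package main

import "fmt"

// nhcTimedOutAnnotation is an ASSUMED annotation key used for illustration;
// the constant actually used by the operator is not visible in this diff.
const nhcTimedOutAnnotation = "remediation.medik8s.io/nhc-timed-out"

// remediation is a hypothetical stand-in for the FenceAgentsRemediation CR,
// reduced to the only field this check needs.
type remediation struct {
	Name        string
	Annotations map[string]string
}

// isTimedOutByNHC reports whether the NHC timeout annotation is present.
// Only the presence of the key matters here; the value is ignored.
func isTimedOutByNHC(far *remediation) bool {
	if far == nil {
		return false
	}
	// Reading from a nil map is safe in Go and simply reports "not found".
	_, found := far.Annotations[nhcTimedOutAnnotation]
	return found
}

func main() {
	annotated := &remediation{
		Name:        "worker-1",
		Annotations: map[string]string{nhcTimedOutAnnotation: ""},
	}
	plain := &remediation{Name: "worker-2"}

	fmt.Println(isTimedOutByNHC(annotated))
	fmt.Println(isTimedOutByNHC(plain))
}
```

Because the check only reads annotations, it does not depend on anything computed earlier in `Reconcile`, which is consistent with the suggestion to move it ahead of the preceding `if` block.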

	if isTimedOutByNHC(far) {
		r.Log.Info("FAR remediation was stopped by Node Healthcheck Operator")
		// TODO: update status and return its error
		return emptyResult, errors.New(errorNhcTimedOut)
Member

Why return an error?

Member Author

I thought that if we didn't return an error, then on the next reconcile the CR would be processed, pass this `if`, and go on to execute the FA and delete workloads. But thinking about it again, the CR will be stopped here until the NHC annotation is removed from the CR.

Member

> the CR will be stopped here until the NHC annotation would be removed from the CR

Exactly, and removing the annotation will trigger a reconcile automatically, so there is no need to requeue ourselves 🙂
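
The outcome of this thread can be sketched as a simplified reconcile function: while the annotation is present, return an empty result and a nil error, so nothing is requeued and no fencing happens. This is only a sketch under assumptions: the `result` and `remediation` types, the `Fenced` flag, and the annotation key are hypothetical stand-ins for the real controller-runtime types and operator logic, and the second call to `reconcile` stands in for the reconcile that the update event would trigger once the annotation is removed.

```go
package main

import "fmt"

// ASSUMED annotation key, for illustration only.
const nhcTimedOutAnnotation = "remediation.medik8s.io/nhc-timed-out"

// result is a hypothetical stand-in for the controller's reconcile result.
type result struct{ Requeue bool }

// remediation is a hypothetical stand-in for the FenceAgentsRemediation CR.
type remediation struct {
	Annotations map[string]string
	Fenced      bool // set when the (simulated) fence agent has run
}

// reconcile stops early while the NHC timeout annotation is present.
// It returns no error and does not ask for a requeue: the CR simply stays
// untouched until something changes it.
func reconcile(far *remediation) (result, error) {
	if _, stopped := far.Annotations[nhcTimedOutAnnotation]; stopped {
		return result{}, nil
	}
	// Placeholder for executing the fence agent and deleting workloads.
	far.Fenced = true
	return result{}, nil
}

func main() {
	far := &remediation{Annotations: map[string]string{nhcTimedOutAnnotation: ""}}

	// Annotation present: remediation is skipped, with no error and no requeue.
	res, err := reconcile(far)
	fmt.Println(res.Requeue, err, far.Fenced)

	// Simulate removal of the annotation followed by the next reconcile.
	delete(far.Annotations, nhcTimedOutAnnotation)
	res, err = reconcile(far)
	fmt.Println(res.Requeue, err, far.Fenced)
}
```

Returning an error from the early exit would instead cause the request to be retried, even though nothing can change until the annotation is removed.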

Member

@slintes slintes left a comment

lgtm

Putting a hold on it to let you decide whether you want this one merged first, or the condition PR 🙂

/hold

@openshift-ci
Contributor

openshift-ci bot commented Jul 31, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: razo7, slintes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@razo7
Member Author

razo7 commented Jul 31, 2023

I prefer that whatever can be merged gets merged. There is no precedence between them.

/unhold

@razo7
Member Author

razo7 commented Jul 31, 2023

/retest

@razo7
Member Author

razo7 commented Aug 1, 2023

/retest

@openshift-merge-robot openshift-merge-robot merged commit 65bb045 into medik8s:main Aug 1, 2023
1 check passed

3 participants