Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Two Reasons for Updating and Stopping Remediation #76

Merged

Conversation

razo7
Copy link
Member

@razo7 razo7 commented Aug 14, 2023

  • Adding two new reasons in order to update the CR three conditions to false:
    - RemediationInterruptedByNHC - FAR CR name doesn't match a node name
    - RemediationFinishedNodeNotFound - Node Healthcheck timeout annotation has been set
  • Don't stop an update of condition from false to true.

When either FAR CR name doesn't match a node name or Node Healthcheck timeout annotation has been set we update the CR three conditions to false
@razo7
Copy link
Member Author

razo7 commented Aug 14, 2023

/test 4.13-openshift-e2e
/test 4.14-openshift-e2e

RemediationStoppedByNHC -> RemediationInterruptedByNHC. NHC is signalling a timeout that interrupts the remediation and it does not stopping/terminating a remediation. FAR does that when it notice the annotation.
@razo7 razo7 changed the title Add Two Reasons for Updating and Stoping Remediation Add Two Reasons for Updating and Stopping Remediation Aug 15, 2023
RemediationStartedConditionMessage = "FAR CR was found, the CR name matches one of the cluster nodes, and a finalizer was set"
FenceAgentSucceededConditionMessage = "FAR taint was added, fence agent command has been created and executed successfully"
RemediationFinishedConditionMessage = "The unhealthy node was fully remediated (it was tainted, fenced using FA and all the node resources have been deleted)"
RemediationInterruptedConditionMessage = "FAR CR name doesn't match a node name or Node Healthcheck timeout annotation has been set"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one reason for actual 2 reasons isn't that helpful for users, is it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, I neglected the value of separating the messages.

…mediationInterruptedByNHC

Clearer message for the users
@razo7
Copy link
Member Author

razo7 commented Aug 15, 2023

/retest

3 similar comments
@razo7
Copy link
Member Author

razo7 commented Aug 15, 2023

/retest

@razo7
Copy link
Member Author

razo7 commented Aug 15, 2023

/retest

@razo7
Copy link
Member Author

razo7 commented Aug 15, 2023

/retest

Copy link
Member

@slintes slintes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 nits...

@@ -200,9 +200,9 @@ var _ = Describe("FAR Controller", func() {
testPodDeletion(testPodName, resourceDeletionWasTriggered)

By("Not having any condition set")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

message does not fit anymore

processingConditionStatus = metav1.ConditionFalse
fenceAgentActionSucceededConditionStatus = metav1.ConditionFalse
succeededConditionStatus = metav1.ConditionFalse
if reason == v1alpha1.RemediationFinishedNodeNotFound {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO a switch makes more sense here, which errors when a reason isn't handled. Prevents forgetting this part when adding another reason to the case statement above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point

It helps us forgetting this part when adding another reason to the case statement above.
@@ -166,7 +166,7 @@ var _ = Describe("FAR Controller", func() {
testVADeletion(vaName2, resourceDeletionWasTriggered)
testPodDeletion(testPodName, resourceDeletionWasTriggered)

By("Having Succeed condition set to true")
By("Having Succeeded, FenceAgentActionSucceeded conditions set to true, and Processing set to false")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a general note: I would avoid being too detailed in the these comments, and use something more generic like "Verifying correct conditions". The details are very readable in the actual test code, and in the test logs in case a test fails. That way you don't need to update the comment ever time the test changes 🙂

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 16, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: razo7, slintes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@razo7
Copy link
Member Author

razo7 commented Aug 16, 2023

/retest

1 similar comment
@razo7
Copy link
Member Author

razo7 commented Aug 16, 2023

/retest

@openshift-merge-robot openshift-merge-robot merged commit 200c062 into medik8s:main Aug 16, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants