-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Status Conditions for FAR CR #69
Conversation
Skipping CI for Draft Pull Request. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: razo7 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use conditions
Some resources in the v1 API contain fields called phase, and associated message, reason, and other status fields. The pattern of using phase is deprecated. Newer API types should use conditions instead. Phase was essentially a state-machine enumeration field, that contradicted system-design principles and hampered evolution, since adding new enum values breaks backward compatibility. Rather than encouraging clients to infer implicit properties from phases, we prefer to explicitly expose the individual conditions that clients need to monitor. Conditions also have the benefit that it is possible to create some conditions with uniform meaning across all resource types, while still exposing others that are unique to specific resource types. See #7856 for more details and discussion.
/test 4.13-openshift-e2e |
/test 4.13-openshift-e2e |
/test 4.12-openshift-e2e |
/test 4.13-openshift-e2e |
|
||
const ( | ||
// RemediationStarted - CR was found, its name matches a node, and a finalizer was set | ||
RemediationStarted ProcessingChangeReason = "remediationStarted" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reasons should be CamelCase
|
||
// Represents the observations of a FenceAgentsRemediation's current state. | ||
// Known .status.conditions.type are: "Processing", "FenceAgentActionSucceeded", and "Succeeded". | ||
// +patchMergeKey=type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
patch* doesn't work on CRDs, remove please
r.Log.Error(err, "Fence Agent response wasn't a success message", "CR's Name", req.Name) | ||
return emptyResult, err | ||
|
||
if meta.IsStatusConditionPresentAndEqual(far.Status.Conditions, commonConditions.ProcessingType, metav1.ConditionTrue) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- IMHO the check for the processing condition should be done earlier
- a better check for the FASucceeded condition would be != True. Then you can set the condition to False in case execution failed.
- I think the code which prepares command execution can be moved from above inside this if block? (getting the pod, building params...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a better check for the FASucceeded condition would be != True.
👍🏻
I think the code which prepares command execution can be moved from above inside this if block? (getting the pod, building params..
I though of that, and I went with the current implementation in order to limit the code sections which are affected by the status conditions. Having said that do you still see a greater value of adding the suggested code under the if block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then you can set the condition to False in case execution failed.
ATM there is no ProcessingChangeReason for this use case. But I might add something for that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having said that do you still see a greater value of adding the suggested code under the if block?
yes, why executing all that code when it's not used 🤷🏼♂️
ATM there is no ProcessingChangeReason
tbh, I dislike this "one reason for updating all conditions" pattern anyway, and this is why...
} | ||
r.Log.Info("FenceAgentsRemediation CR has completed to remediate the node", "Node Name", req.Name) | ||
|
||
return ctrl.Result{Requeue: true}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to requeue here? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really. Can be discarded
// +listType=map | ||
// +listMapKey=type | ||
// +operator-sdk:csv:customresourcedefinitions:type=status,displayName="conditions",xDescriptors="urn:alm:descriptor:io.kubernetes.conditions" | ||
Conditions []metav1.Condition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type" protobuf:"bytes,1,rep,name=conditions"` | ||
Conditions []metav1.Condition `json:"conditions,omitempty" protobuf:"bytes,1,rep,name=conditions"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missed in the first review: why the protobuf
tag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have followed the example of conditions
from https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties, and it seems like the example is already adding this protobuf anyway in the CSV description. So I will delete it from
fenceagentsremediation_types.go
to align with how we set conditions
in other operators (without the protobuf tag), e.g. NHC's conditions .
Conditions []metav1.Condition
json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type" protobuf:"bytes,1,rep,name=conditions"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doc is primarily targeting core k8s types. So in general it's good to follow its recommendations, but not everything applies to CRDs as well 🙂
/test 4.13-openshift-e2e |
r.Log.Error(err, "Invalid sharedParameters/nodeParameters from CR - edit/recreate the CR", "CR's Name", req.Name) | ||
return emptyResult, nil | ||
} | ||
// Add FAR (medik8s) remediation taint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the taint be in this block as well? 🤔
/test 4.13-openshift-e2e |
/test 4.13-openshift-e2e |
|
||
var ( | ||
processingConditionStatus, fenceAgentActionSucceededConditionStatus, succeededConditionStatus metav1.ConditionStatus | ||
conditinMessage string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: conditionMessage
const ( | ||
// FARFinalizer is a finalizer for a FenceAgentsRemediation CR deletion | ||
FARFinalizer string = "fence-agents-remediation.medik8s.io/far-finalizer" | ||
// Taints | ||
FARNoExecuteTaintKey = "medik8s.io/fence-agents-remediation" | ||
// FenceAgentActionSucceededType is the condition type used to signal whether the Fence Agent action was succeeded successfully or not | ||
FenceAgentActionSucceededType = "FenceAgentActionSucceeded" | ||
// error status messages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these errors are used for tests only, correct? They shouldn't be in the api package then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have moved them to errors.go
file
log.Info("Pod exist", "pod", podName) | ||
} | ||
} | ||
|
||
// verifyStatusCondition checks whether the status condition is set with the expected value | ||
func verifyStatusCondition(testFAR *v1alpha1.FenceAgentsRemediation, conditionType, expectedResult string, conditionStatus metav1.ConditionStatus) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I won't block on this, but using this "expectedResult" string is strange at least. Type and status params should be enough for the tests, not? Either they match or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but I added the expectedResult
so the Eventually can "catch" an expected error, e.g., in unit-test when the conditions haven't been set.
I have changed the name to verifyExpectedStatusConditionError
as it better captures the essence of this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could make conditionStatus
a pointer and use nil
, if you want to test that a condition isn't set 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then the test would fail, since eventually will expect utils.ConditionSetAndMatchSuccess
while verifyStatusCondition/verifyExpectedStatusConditionError would return utils.ConditionSetButNoMatchError
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this works: slintes@55781ea?diff=unified
/test 4.13-openshift-e2e |
/test 4.13-openshift-e2e |
Processing, FenceAgentActionSucceeded, and Succeeded status conditions have been added to verify FAR remediation status
Processing, FenceAgentActionSucceeded, and Succeeded status conditions help exclude the FA execution and workload deletion sections from every reconcile call. It would help us avoid any second (and more) remediation
Unit tests and e2e tests have been added to verify the expected behaviour by looking whether the status conditions have been set and what's their value
/test 4.13-openshift-e2e |
/retest |
oh, CI missed to add the label /lgtm |
Adding three status conditions (
Processing
,FenceAgentActionSucceeded
, andSucceeded
) to help FAR in two ways:Moreover, each status condition includes a reason and a message based on ProcessingChangeReason that changed the condition value.
The PR also updates reconcile structure :
ECOPROJECT-1411
ECOPROJECT-1484