-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support and Handle Taints #53
Support and Handle Taints #53
Conversation
fa23382
to
0303d6b
Compare
/retest |
1cf6e76
to
2286224
Compare
903326b
to
7149fb8
Compare
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks much better now 👍
however, some important comments inside
// taintExists checks if the given taint exists in list of taints. Returns true if exists false otherwise. | ||
func taintExists(taints []corev1.Taint, taintToFind *corev1.Taint) bool { | ||
for _, taint := range taints { | ||
if taint.MatchTaint(taintToFind) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heads up, this doesn't check for equal taint value!
Maybe it's a good idea to use unique taint keys per operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's a good idea to use unique taint keys per operator.
I agree, and the MatchTaint
func declaration supports that:
// MatchTaint checks if the taint matches taintToMatch. Taints are unique by key:effect,
// if the two taints have same key:effect, regard as they match.
How about "medik8s.io/far-remediation"
for FAR, and probably SNR should change it's key to include it's name, so maybe "medik8s.io/snr-remediation"
.
CC: @mshitrit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following this understanding and previous review do you still support the decision to have a value for the taint? I am less pro for having for the value now, but I don't have a strong preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we have a key including the remediator, an empty value is ok.
I would prefer the full written remediator name though, not just the abbreviation. This is visible information on the node, others should not have to guess what far
or snr
mean IMHO.
/cc @mshitrit fyi
@slintes: GitHub didn't allow me to request PR reviews from the following users: fyi. Note that only medik8s members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
44ebd0b
to
2c50034
Compare
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks almost good to me now, just one comment inline.
Please add tests for the finalizer and taint 🙂
Declaring remediation taint with NoExecute effect to drain pods and not schedule them again on the unhealthy node
Remove node taint when a CR is about to be deleted using a finalizer
We need to add two new rbac rules: delete, and patch for adding and removing taint on a cluster node
The E2E test is redundant after medik8s#48 PR, and it also fails here when we try to add/remove taint for a node which doesn't exist - dummy-node
Apparently a taint is considered unique by key:effect, reagrdless of the taint's value - if the two taints share the same key:effect. then are considered equal. We modify the taint's key to be unique and not similar to SNR's remediation taint
The API server is updating taints as well while nodes get healthy/unhealthy, so patch can overwrite changes made by API server
When CR/node is invalid. then there is no need to add/remove finalizer and taints
Test it on valid and invalid CRs
2c50034
to
c18caf9
Compare
Share variables and lowercase var name
/retest |
1 similar comment
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one nit inside, leaving it up to you if you want to address it.
/hold
Eventually(func() bool { | ||
Expect(k8sClient.Get(context.Background(), nodeNamespacedName, node)).To(Succeed()) | ||
Expect(k8sClient.Get(context.Background(), farNamespacedName, underTestFAR)).To(Succeed()) | ||
res, _ := cliCommandsEquality(underTestFAR) | ||
return controllerutil.ContainsFinalizer(underTestFAR, v1alpha1.FARFinalizer) && utils.TaintExists(node.Spec.Taints, &FARNoExecuteTaint) && res | ||
return controllerutil.ContainsFinalizer(underTestFAR, v1alpha1.FARFinalizer) && utils.TaintExists(node.Spec.Taints, &farNoExecuteTaint) && res |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: doing two tests (finalizer and taint) at the same time has the disadvantage that you don't know what's missing in case it fails. Better test everything on its own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, and I will update it in a follow up PR
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: razo7, slintes The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/unhold |
In the process of supporting workload deletion we are supporting the Medik8s remediation taint (with FAR name value), and NoExecute effect:
delete
, andupdate
) - It is needed for adding and removing taint on a cluster node.ECOPROJECT-1321 and ECOPROJECT-1322