Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support and Handle Taints #53

Merged
merged 9 commits into from
Jun 19, 2023

Conversation

razo7
Copy link
Member

@razo7 razo7 commented Jun 7, 2023

In the process of supporting workload deletion we are supporting the Medik8s remediation taint (with FAR name value), and NoExecute effect:

  • Adding taint support - Specifically supporting the Medik8s remediation taint, and possibly in the future more taints.
  • Support the taint removal with a Finalizer
  • Add two RBAC verbs on node resource (delete, and update) - It is needed for adding and removing taint on a cluster node.
  • Remove redundant dummy E2E test
  • Addition/removal of taint or finalizer will only be handled after Node-CR validation
  • UT for valid and invalid Node-CR

ECOPROJECT-1321 and ECOPROJECT-1322

@openshift-ci openshift-ci bot requested review from clobrano and slintes June 7, 2023 14:23
@openshift-ci openshift-ci bot added the approved label Jun 7, 2023
@razo7 razo7 force-pushed the remediation-NoExecute-taint branch 3 times, most recently from fa23382 to 0303d6b Compare June 8, 2023 15:26
@razo7
Copy link
Member Author

razo7 commented Jun 11, 2023

/retest

@razo7 razo7 changed the title [WIP] Support and Handle Taints Support and Handle Taints Jun 11, 2023
pkg/utils/taints.go Outdated Show resolved Hide resolved
pkg/utils/nodes.go Outdated Show resolved Hide resolved
pkg/utils/taints.go Outdated Show resolved Hide resolved
pkg/utils/taints.go Outdated Show resolved Hide resolved
pkg/utils/taints.go Outdated Show resolved Hide resolved
pkg/utils/taints.go Outdated Show resolved Hide resolved
pkg/utils/nodes.go Outdated Show resolved Hide resolved
pkg/utils/taints.go Outdated Show resolved Hide resolved
@razo7 razo7 force-pushed the remediation-NoExecute-taint branch from 1cf6e76 to 2286224 Compare June 12, 2023 13:16
pkg/utils/taints.go Outdated Show resolved Hide resolved
@razo7 razo7 force-pushed the remediation-NoExecute-taint branch from 903326b to 7149fb8 Compare June 14, 2023 14:40
@razo7
Copy link
Member Author

razo7 commented Jun 15, 2023

/retest

Copy link
Member

@slintes slintes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks much better now 👍
however, some important comments inside

pkg/utils/nodes.go Show resolved Hide resolved
pkg/utils/taints.go Outdated Show resolved Hide resolved
// taintExists checks if the given taint exists in list of taints. Returns true if exists false otherwise.
func taintExists(taints []corev1.Taint, taintToFind *corev1.Taint) bool {
for _, taint := range taints {
if taint.MatchTaint(taintToFind) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heads up, this doesn't check for equal taint value!
Maybe it's a good idea to use unique taint keys per operator.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's a good idea to use unique taint keys per operator.

I agree, and the MatchTaint func declaration supports that:

// MatchTaint checks if the taint matches taintToMatch. Taints are unique by key:effect,
// if the two taints have same key:effect, regard as they match.

How about "medik8s.io/far-remediation" for FAR, and probably SNR should change it's key to include it's name, so maybe "medik8s.io/snr-remediation".
CC: @mshitrit

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following this understanding and previous review do you still support the decision to have a value for the taint? I am less pro for having for the value now, but I don't have a strong preference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we have a key including the remediator, an empty value is ok.
I would prefer the full written remediator name though, not just the abbreviation. This is visible information on the node, others should not have to guess what far or snr mean IMHO.

/cc @mshitrit fyi

pkg/utils/taints.go Show resolved Hide resolved
pkg/utils/taints.go Outdated Show resolved Hide resolved
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 15, 2023

@slintes: GitHub didn't allow me to request PR reviews from the following users: fyi.

Note that only medik8s members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

When we have a key including the remediator, an empty value is ok.
I would prefer the full written remediator name though, not just the abbreviation. This is visible information on the node, others should not have to guess what far or snr mean IMHO.

/cc @mshitrit fyi

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@razo7 razo7 force-pushed the remediation-NoExecute-taint branch 2 times, most recently from 44ebd0b to 2c50034 Compare June 15, 2023 14:13
@razo7
Copy link
Member Author

razo7 commented Jun 15, 2023

/retest

Copy link
Member

@slintes slintes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks almost good to me now, just one comment inline.
Please add tests for the finalizer and taint 🙂

Declaring remediation taint with NoExecute effect to drain pods and not schedule them again on the unhealthy node
Remove node taint when a CR is about to be deleted using a finalizer
We need to add two new rbac rules: delete, and patch for adding and removing taint on a cluster node
The E2E test is redundant after medik8s#48 PR, and it also fails here when we try to add/remove taint for a node which doesn't exist - dummy-node
Apparently a taint is considered unique by key:effect, reagrdless of the taint's value -  if the two taints share the same key:effect. then are considered equal. We modify the taint's key to be unique and not similar to SNR's remediation taint
The API server is updating taints as well while nodes get healthy/unhealthy, so patch can overwrite changes made by API server
When CR/node is invalid. then there is no need to add/remove finalizer and taints
Test it on valid and invalid CRs
@razo7 razo7 force-pushed the remediation-NoExecute-taint branch from 2c50034 to c18caf9 Compare June 18, 2023 15:45
Share variables and lowercase var name
@razo7
Copy link
Member Author

razo7 commented Jun 19, 2023

/retest

1 similar comment
@razo7
Copy link
Member Author

razo7 commented Jun 19, 2023

/retest

Copy link
Member

@slintes slintes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one nit inside, leaving it up to you if you want to address it.

/hold

Eventually(func() bool {
Expect(k8sClient.Get(context.Background(), nodeNamespacedName, node)).To(Succeed())
Expect(k8sClient.Get(context.Background(), farNamespacedName, underTestFAR)).To(Succeed())
res, _ := cliCommandsEquality(underTestFAR)
return controllerutil.ContainsFinalizer(underTestFAR, v1alpha1.FARFinalizer) && utils.TaintExists(node.Spec.Taints, &FARNoExecuteTaint) && res
return controllerutil.ContainsFinalizer(underTestFAR, v1alpha1.FARFinalizer) && utils.TaintExists(node.Spec.Taints, &farNoExecuteTaint) && res
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: doing two tests (finalizer and taint) at the same time has the disadvantage that you don't know what's missing in case it fails. Better test everything on its own.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, and I will update it in a follow up PR

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 19, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: razo7, slintes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@razo7
Copy link
Member Author

razo7 commented Jun 19, 2023

/unhold

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants