
Add a nodeSelector strategy #311

Closed

Conversation


@pmundt pmundt commented Jun 3, 2020

Presently there is only the node affinity strategy, which checks node label constraints, while in practice pods may be constrained by either affinity or nodeSelector terms. We therefore add a new strategy that carries out label checking directly on the nodeSelector terms.

For anyone wanting a comprehensive eviction strategy based on sudden label non-existence, this should be used together with the nodeAffinity strategy to ensure that both cases are caught.
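As an illustration of the check this strategy performs (a minimal sketch only, not the PR's code; the helper name violatesNodeSelector and the import are assumptions for the example), a pod violates its nodeSelector when the node it currently runs on no longer carries all of the requested labels:

import v1 "k8s.io/api/core/v1"

// violatesNodeSelector is a hypothetical helper sketching the strategy's
// core check: every key/value pair in pod.Spec.NodeSelector must still be
// present in the node's labels.
func violatesNodeSelector(pod *v1.Pod, node *v1.Node) bool {
	for key, value := range pod.Spec.NodeSelector {
		// A missing label or a mismatched value breaks the constraint.
		if node.Labels[key] != value {
			return true
		}
	}
	return false
}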

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 3, 2020
@k8s-ci-robot
Contributor

Hi @pmundt. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 3, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pmundt
To complete the pull request process, please assign ravisantoshgudimetla
You can assign the PR to them by writing /assign @ravisantoshgudimetla in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pmundt
Author

pmundt commented Jun 3, 2020

Note that I've reworked this so it has no dependencies on any of the other outstanding PRs.

@pmundt
Author

pmundt commented Jun 3, 2020

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 3, 2020
The default behaviour is to not deschedule a Pod if it cannot be placed anywhere else. This is fine for the default case where nodes are more or less homogeneous, but it does not suit heterogeneous clusters, where specific nodes may have their own unique hardware configurations that are not reproduced anywhere else within the cluster.

In the case of heterogeneous accelerators, for example, there may be pods that have a hard dependency on a specific resource (e.g. including container runtimes geared at a specific accelerator), where allowing them to continue running would produce undesired and unpredictable behaviour. Consider the case of a Pod with a hard dependency on a USB-attached accelerator, which may disappear during the lifecycle of the Pod.
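As a rough sketch of the behaviour change described in this commit (illustrative only; the degradationAllowed flag and the nodeutil helpers mirror the diff discussed later in this review, and the import paths are assumed from the descheduler repository layout):

import (
	v1 "k8s.io/api/core/v1"
	nodeutil "sigs.k8s.io/descheduler/pkg/descheduler/node"
)

// shouldEvict illustrates the relaxed check: by default a pod is only
// evicted when it fits on some other node, while with degradation allowed
// it is evicted as soon as it no longer fits the node it is running on.
func shouldEvict(pod *v1.Pod, node *v1.Node, nodes []*v1.Node, degradationAllowed bool) bool {
	if nodeutil.PodFitsCurrentNode(pod, node) {
		return false // the pod still satisfies its constraints where it runs
	}
	return degradationAllowed || nodeutil.PodFitsAnyNode(pod, nodes)
}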
@pmundt
Author

pmundt commented Jun 3, 2020

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 3, 2020
Contributor

@damemi damemi left a comment

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 3, 2020
@pmundt
Author

pmundt commented Jun 3, 2020

/retest

Contributor

@damemi damemi left a comment

@pmundt overall this looks good. I'm happy with the approach we settled on, and I think we found some good spots to refactor along the way. Thanks for being responsive and accepting our feedback.

I just had one nit: the new param struct can probably just be called NodeSelection (since it's implied that they are all settings). I also noticed some spots where you had mixed up NodeSelection/NodeSelector, so I tried to point out all of those for you. If the tests all pass, this looks good to me.

/cc @ingvagabund

Review threads (resolved): README.md, pkg/api/types.go, examples/node-affinity.yml
@pmundt
Author

pmundt commented Jun 3, 2020

@damemi Thanks for the feedback, and good spotting on the typos, all of these issues should now be addressed.

@pmundt pmundt requested a review from damemi June 3, 2020 18:32
Contributor

@damemi damemi left a comment

@pmundt thanks, I don't see anything else that stands out, but I'll give some of the other reviewers a chance to look this over before merging too quickly.

/kind feature
/kind api-change

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API labels Jun 3, 2020
Review thread (resolved): README.md
@seanmalloy
Member

@damemi similar to my comment in #314 (comment): do we need to bump apiVersion to descheduler/v1alpha2?

Co-authored-by: Sean Malloy <[email protected]>
Contributor

@ingvagabund ingvagabund left a comment

Did you consider merging RemovePodsViolatingNodeSelector and RemovePodsViolatingNodeAffinity into a single strategy at the code level, e.g. RemovePodsViolatingNodeConstraints, given that both strategies share nodeSelection?

You also mention:

As both nodeSelector and nodeAffinity provide mechanisms for constraining pods to nodes with specific labels,
it is recommended to use both eviction strategies when scanning for pods to evict on a label change basis.

What about also deprecating RemovePodsViolatingNodeAffinity and allowing:

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemovePodsViolatingNodeConstraints":
    enabled: true
    params:
      degradationAllowed: true
      respectNodeSelector: true   # or a different name for the flag
      nodeAffinityType:
        - "requiredDuringSchedulingIgnoredDuringExecution"

Once you allow degradation for RemovePodsViolatingNodeSelector, what's the benefit of disallowing it for RemovePodsViolatingNodeAffinity? Disruption should be allowed either for both or for neither.

Review thread (resolved): examples/node-affinity.yml
@@ -34,7 +34,7 @@ func RemovePodsViolatingNodeAffinity(ctx context.Context, client clientset.Inter
klog.V(1).Infof("NodeAffinityType not set")
Contributor

if strategy.Params == nil || strategy.Params.NodeSelection == nil

@@ -49,7 +49,7 @@ func RemovePodsViolatingNodeAffinity(ctx context.Context, client clientset.Inter

for _, pod := range pods {
if pod.Spec.Affinity != nil && pod.Spec.Affinity.NodeAffinity != nil && pod.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution != nil {
if !nodeutil.PodFitsCurrentNode(pod, node) && nodeutil.PodFitsAnyNode(pod, nodes) {
if !nodeutil.PodFitsCurrentNode(pod, node) && (nodeutil.PodFitsAnyNode(pod, nodes) || strategy.Params.NodeSelection.DegradationAllowed) {
Contributor

Use (strategy.Params.NodeSelection.DegradationAllowed || nodeutil.PodFitsAnyNode(pod, nodes)) so that nodeutil.PodFitsAnyNode does not have to be called at all when strategy.Params.NodeSelection.DegradationAllowed is true; nodeutil.PodFitsAnyNode might eventually become expensive to compute.

DegradationAllowed: false,
},
}
}
Contributor

Please, do not change strategy.Params inside any strategy to avoid side effects.

var degradationAllowed bool
if strategy.Params != nil && strategy.Params.NodeSelection != nil {
	degradationAllowed = strategy.Params.NodeSelection.DegradationAllowed
}

for _, node := range nodes {
	klog.V(1).Infof("Processing node: %#v\n", node.Name)
	...
			if !nodeutil.PodFitsCurrentNode(pod, node) && (degradationAllowed || nodeutil.PodFitsAnyNode(pod, nodes)) {
			...

},
{
description: "Pod is scheduled on node without matching labels, another schedulable node available, maxPodsToEvict set to 1, should not be evicted",
expectedEvictedPodCount: 1,
Contributor

The description "Pod is scheduled on node without matching labels, another schedulable node available, maxPodsToEvict set to 1, should not be evicted" contradicts expectedEvictedPodCount being set to 1.
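For illustration, one self-consistent version of that entry could look like the following (field names follow the table-driven test shown above; whether the intended expectation is eviction or not is for the author to confirm):

{
	// Another schedulable node is available, so eviction is expected;
	// the description and the count now agree.
	description:             "Pod is scheduled on node without matching labels, another schedulable node available, maxPodsToEvict set to 1, should be evicted",
	expectedEvictedPodCount: 1,
},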

@damemi
Contributor

damemi commented Jun 4, 2020

@damemi similar to my comment here #314 (comment). Do we need to bump apiVersion to descheduler/v1alpha2?

Commented in the other thread too, but for reference here I agree. Though I think this can be merged to v1alpha1 (since this is an alpha api it can break any time) and we bump the version number in #314 for our 1.19 release.

I also think @ingvagabund makes a good point in #311 (review): these two strategies are so similar that I wonder whether NodeSelector really needs to be its own strategy. Or could we just add pod.Spec.NodeSelector != nil to the list of checks in NodeAffinity and rename the strategy to RemovePodsViolatingNodeConstraints? @pmundt what do you think?
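A rough sketch of the combined check being suggested (hypothetical helper, illustrative only; it mirrors the pod.Spec.NodeSelector != nil idea above together with the existing required node affinity check):

import v1 "k8s.io/api/core/v1"

// hasNodeConstraints is a hypothetical helper sketching the merged
// RemovePodsViolatingNodeConstraints idea: a pod counts as node-constrained
// if it sets a nodeSelector or a required node affinity.
func hasNodeConstraints(pod *v1.Pod) bool {
	if len(pod.Spec.NodeSelector) > 0 {
		return true
	}
	return pod.Spec.Affinity != nil &&
		pod.Spec.Affinity.NodeAffinity != nil &&
		pod.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution != nil
}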

@aveshagarwal
Contributor

I also think @ingvagabund makes a good point in #311 (review), these 2 strategies are so similar that I wonder if NodeSelector really needs to be its own strategy?

I agree that it would be better to have just one strategy for this.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 8, 2020
@k8s-ci-robot
Contributor

@pmundt: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@seanmalloy
Member

@pmundt this PR requires a rebase and there is also a merge conflict. Do you intend to finish this PR?

@pmundt
Author

pmundt commented Jun 25, 2020

@seanmalloy My apologies, I haven't had any time to look at this for the last few weeks. I've just now had the opportunity to come back to it and will try to merge the strategies and address the remaining review comments.

@seanmalloy
Member

@pmundt no problem. Thanks!

@seanmalloy
Member

Greetings @pmundt, we just completed the descheduler v0.19.0 (k8s v1.19) release cycle. We are starting to work on the features for the descheduler v0.20.0 (k8s v1.20.0) release. Are you planning on continuing to work on this feature enhancement?

@k8s-ci-robot
Contributor

@pmundt: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-descheduler-verify-master 5a90733 link /test pull-descheduler-verify-master
pull-descheduler-unit-test-master-master 5a90733 link /test pull-descheduler-unit-test-master-master
pull-descheduler-verify-build-master 5a90733 link /test pull-descheduler-verify-build-master
pull-descheduler-test-e2e-k8s-release-1-17-1-17 5a90733 link /test pull-descheduler-test-e2e-k8s-release-1-17-1-17
pull-descheduler-test-e2e-k8s-release-1-18-1-18 5a90733 link /test pull-descheduler-test-e2e-k8s-release-1-18-1-18
pull-descheduler-test-e2e-k8s-master-1-18 5a90733 link /test pull-descheduler-test-e2e-k8s-master-1-18
pull-descheduler-test-e2e-k8s-master-1-16 5a90733 link /test pull-descheduler-test-e2e-k8s-master-1-16
pull-descheduler-test-e2e-k8s-release-1-17-1-15 5a90733 link /test pull-descheduler-test-e2e-k8s-release-1-17-1-15
pull-descheduler-test-e2e-k8s-release-1-17-1-16 5a90733 link /test pull-descheduler-test-e2e-k8s-release-1-17-1-16
pull-descheduler-test-e2e-k8s-release-1-18-1-16 5a90733 link /test pull-descheduler-test-e2e-k8s-release-1-18-1-16
pull-descheduler-test-e2e-k8s-master-1-17 5a90733 link /test pull-descheduler-test-e2e-k8s-master-1-17
pull-descheduler-test-e2e-k8s-release-1-18-1-17 5a90733 link /test pull-descheduler-test-e2e-k8s-release-1-18-1-17
pull-descheduler-test-e2e-k8s-master-1-19 5a90733 link /test pull-descheduler-test-e2e-k8s-master-1-19
pull-descheduler-test-e2e-k8s-master-1-20 5a90733 link /test pull-descheduler-test-e2e-k8s-master-1-20

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 9, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 8, 2021
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
