Add a nodeSelector strategy #311
Conversation
Presently there is only the node affinity strategy, which checks node label constraints, while in practice pods may be constrained by either affinity or nodeSelector terms. We therefore add a new strategy that carries out label checking directly on the nodeSelector terms. For anyone wanting a comprehensive eviction strategy based on sudden label non-existence, this should be used together with the nodeAffinity strategy to ensure that both cases are caught.
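As a rough illustration of the kind of check this strategy performs, here is a minimal, self-contained Go sketch (not the PR's actual code; the helper name is made up for the example) that compares a pod's `spec.nodeSelector` against a node's current labels using `k8s.io/apimachinery/pkg/labels`:

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// podNodeSelectorMatches reports whether the pod's spec.nodeSelector terms are
// still satisfied by the node's current labels. Name and placement are
// illustrative; the real strategy lives in the descheduler's strategy package.
func podNodeSelectorMatches(pod *v1.Pod, node *v1.Node) bool {
	if len(pod.Spec.NodeSelector) == 0 {
		return true // no nodeSelector means there is nothing to violate
	}
	selector := labels.SelectorFromSet(labels.Set(pod.Spec.NodeSelector))
	return selector.Matches(labels.Set(node.Labels))
}

func main() {
	pod := &v1.Pod{}
	pod.Spec.NodeSelector = map[string]string{"accelerator": "usb-tpu"}

	node := &v1.Node{}
	node.Labels = map[string]string{} // label removed, e.g. device unplugged

	// Prints "violated: true" — the pod would be an eviction candidate.
	fmt.Printf("violated: %v\n", !podNodeSelectorMatches(pod, node))
}
```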
Hi @pmundt. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: pmundt. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
Note that I've reworked this so it has no dependencies on any of the other outstanding PRs.
/hold
The default behaviour is to not deschedule a Pod if it cannot be placed anywhere else. This is fine for the default case where nodes are more or less homogeneous, but it does not suit heterogeneous clusters, where specific nodes may have unique hardware configurations that are not reproduced anywhere else in the cluster. In the case of heterogeneous accelerators, for example, there may be pods with a hard dependency on a specific resource (e.g. container runtimes geared at a specific accelerator), and allowing them to continue running would produce undesired and unpredictable behaviour. Consider the case of a Pod with a hard dependency on a USB-attached accelerator, which may disappear during the lifecycle of the Pod.
Force-pushed from 9f56016 to 1a01970.
/unhold
/ok-to-test
/retest
@pmundt overall this looks good, I'm happy with the approach we settled on and think we found some good spots to refactor along the way. Thanks for being responsive and accepting our feedback.
I just had one nit: the new param struct can probably just be called `NodeSelection` (since it's implied that they are all settings). Then I noticed there were some spots where you had mixed up `NodeSelection`/`NodeSelector`, so I tried to point out all of those for you. If the tests all pass then it looks good to me.
/cc @ingvagabund
@damemi Thanks for the feedback, and good spotting on the typos; all of these issues should now be addressed.
@pmundt thanks, I don't see anything else that stands out, but I'll give some of the other reviewers a chance to look this over before merging too quickly.
/kind feature
/kind api-change
@damemi similar to my comment here #314 (comment). Do we need to bump
Co-authored-by: Sean Malloy <[email protected]>
Did you consider merging `RemovePodsViolatingNodeSelector` and `RemovePodsViolatingNodeAffinity` into a single strategy on the code level? E.g. `RemovePodsViolatingNodeConstraints`? Given both strategies share `nodeSelection`.

You also mention:

> As both `nodeSelector` and `nodeAffinity` provide mechanisms for constraining pods to nodes with specific labels, it is recommended to use both eviction strategies when scanning for pods to evict on a label change basis.
What about also deprecating `RemovePodsViolatingNodeAffinity` and allowing:

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemovePodsViolatingNodeConstraints":
    enabled: true
    params:
      degradationAllowed: true
      respectNodeSelector: true # or a different name for the flag
      nodeAffinityType:
      - "requiredDuringSchedulingIgnoredDuringExecution"
```

Once you allow degradation for `RemovePodsViolatingNodeSelector`, what's the benefit of disallowing it for `RemovePodsViolatingNodeAffinity`? In other words, disruption should be allowed for either both strategies or neither.
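If the strategies were merged like this, the corresponding parameter struct might look roughly like the sketch below. All names, tags, and the package are hypothetical, extrapolated from the YAML above rather than taken from the PR or the descheduler API:

```go
package api

// NodeConstraintsParams is a hypothetical parameter block for a merged
// RemovePodsViolatingNodeConstraints strategy; names are illustrative only.
type NodeConstraintsParams struct {
	// DegradationAllowed permits eviction even when the pod cannot be
	// scheduled onto any other node in the cluster.
	DegradationAllowed bool `json:"degradationAllowed,omitempty"`
	// RespectNodeSelector additionally enforces spec.nodeSelector terms,
	// not only node affinity terms.
	RespectNodeSelector bool `json:"respectNodeSelector,omitempty"`
	// NodeAffinityType lists which affinity terms to enforce, e.g.
	// "requiredDuringSchedulingIgnoredDuringExecution".
	NodeAffinityType []string `json:"nodeAffinityType,omitempty"`
}
```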
```diff
@@ -34,7 +34,7 @@ func RemovePodsViolatingNodeAffinity(ctx context.Context, client clientset.Inter
 	klog.V(1).Infof("NodeAffinityType not set")
```
`if strategy.Params == nil || strategy.Params.NodeSelection == nil`
```diff
@@ -49,7 +49,7 @@ func RemovePodsViolatingNodeAffinity(ctx context.Context, client clientset.Inter
 	for _, pod := range pods {
 		if pod.Spec.Affinity != nil && pod.Spec.Affinity.NodeAffinity != nil && pod.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution != nil {
-			if !nodeutil.PodFitsCurrentNode(pod, node) && nodeutil.PodFitsAnyNode(pod, nodes) {
+			if !nodeutil.PodFitsCurrentNode(pod, node) && (nodeutil.PodFitsAnyNode(pod, nodes) || strategy.Params.NodeSelection.DegradationAllowed) {
```
Swap the operands: `(strategy.Params.NodeSelection.DegradationAllowed || nodeutil.PodFitsAnyNode(pod, nodes))`, so that `nodeutil.PodFitsAnyNode` does not have to be called at all when `strategy.Params.NodeSelection.DegradationAllowed` is true. `nodeutil.PodFitsAnyNode` might eventually become expensive to compute.
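To make the short-circuit point concrete, here is a toy, standalone Go example (unrelated to the descheduler code, with made-up function names) showing that `||` never evaluates its right-hand operand once the left-hand flag is true:

```go
package main

import "fmt"

// expensiveFitCheck stands in for a costly call such as a fit check across
// all nodes; it prints so we can see whether it actually runs.
func expensiveFitCheck() bool {
	fmt.Println("expensive fit check ran")
	return true
}

func main() {
	degradationAllowed := true
	// Because || short-circuits, expensiveFitCheck is skipped entirely
	// when degradationAllowed is already true.
	if degradationAllowed || expensiveFitCheck() {
		fmt.Println("pod is an eviction candidate")
	}
}
```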
```go
			DegradationAllowed: false,
		},
	}
}
```
Please, do not change `strategy.Params` inside any strategy, to avoid side effects. Read the flag into a local variable instead:

```go
// Read the flag once, defaulting to false, without mutating strategy.Params.
var degradationAllowed bool
if strategy.Params != nil && strategy.Params.NodeSelection != nil {
	degradationAllowed = strategy.Params.NodeSelection.DegradationAllowed
}

for _, node := range nodes {
	klog.V(1).Infof("Processing node: %#v\n", node.Name)
	...
	if !nodeutil.PodFitsCurrentNode(pod, node) && (degradationAllowed || nodeutil.PodFitsAnyNode(pod, nodes)) {
	...
```
```go
		},
		{
			description:             "Pod is scheduled on node without matching labels, another schedulable node available, maxPodsToEvict set to 1, should not be evicted",
			expectedEvictedPodCount: 1,
```
The description "Pod is scheduled on node without matching labels, another schedulable node available, maxPodsToEvict set to 1, should not be evicted" contradicts `expectedEvictedPodCount` set to 1.
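For illustration only, one way to make the entry self-consistent is sketched below, assuming the evicted count (rather than the description) reflects the intended behaviour. The struct is a simplified stand-in, not the PR's actual test fixture:

```go
package nodeselector_test

// nodeSelectorTestCase is a simplified stand-in for the test table's entries.
type nodeSelectorTestCase struct {
	description             string
	expectedEvictedPodCount int
}

// fixedCase makes the description and the expected count agree.
var fixedCase = nodeSelectorTestCase{
	description:             "Pod is scheduled on node without matching labels, another schedulable node available, maxPodsToEvict set to 1, should be evicted",
	expectedEvictedPodCount: 1,
}
```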
Commented in the other thread too, but for reference here I agree. Though I think this can be merged to v1alpha1 (since this is an alpha API it can break any time) and we can bump the version number in #314 for our 1.19 release. I also think @ingvagabund makes a good point in #311 (review): these 2 strategies are so similar that I wonder if NodeSelector really needs to be its own strategy? Or could we just add
I agree that it would be better to have just one strategy for this.
@pmundt: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@pmundt this PR requires a rebase and there is also a merge conflict. Do you intend to finish this PR?
@seanmalloy My apologies, I haven't had any time to look at this for the last few weeks. I've just now had the opportunity to come back to this, and will now try to merge the strategies and address the remaining review comments.
@pmundt no problem. Thanks!
Greetings @pmundt, we just completed the descheduler v0.19.0 (k8s v1.19) release cycle. We are starting to work on the features for the descheduler v0.20.0 (k8s v1.20) release. Are you planning on continuing to work on this feature enhancement?
@pmundt: The following tests failed, say `/retest` to rerun all failed tests.
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with `/close`. Send feedback to sig-contributor-experience at kubernetes/community.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with `/close`. Send feedback to sig-contributor-experience at kubernetes/community.
Rotten issues close after 30d of inactivity. Send feedback to sig-contributor-experience at kubernetes/community.
@fejta-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.