Respect Node taints with tolerations (if exist) #181
Conversation
Welcome @jw-s!
/assign @aveshagarwal
d90d4bd to a19cd22 Compare
	canBeEvicted := true
taintNodeLoop:
	for _, taintsForNode := range taintsOfNodes {
		if len(pod.Spec.Tolerations) < len(taintsForNode) {
Is taintsForNode guaranteed to always be free of duplicate items?
The k8s API rejects taints with the same key and the same effect, so technically you can have a slice of Taint values with the same key but different effect values. Which means a pod needs to either tolerate that key with the Exists operator or have two tolerations with the same key but different effect values, i.e. NoSchedule and NoExecute.
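For illustration only (the names here are made up, not from the PR): both toleration shapes can be checked against such a taint pair with the v1.Toleration.ToleratesTaint helper from k8s.io/api/core/v1:

package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

func main() {
	// A node may carry two taints with the same key as long as the effects differ.
	taints := []v1.Taint{
		{Key: "dedicated", Value: "infra", Effect: v1.TaintEffectNoSchedule},
		{Key: "dedicated", Value: "infra", Effect: v1.TaintEffectNoExecute},
	}

	// Option 1: a single toleration with operator Exists and no effect set;
	// an empty effect matches every effect for the key.
	existsToleration := []v1.Toleration{
		{Key: "dedicated", Operator: v1.TolerationOpExists},
	}

	// Option 2: one toleration per effect, matching key and value exactly.
	perEffectTolerations := []v1.Toleration{
		{Key: "dedicated", Operator: v1.TolerationOpEqual, Value: "infra", Effect: v1.TaintEffectNoSchedule},
		{Key: "dedicated", Operator: v1.TolerationOpEqual, Value: "infra", Effect: v1.TaintEffectNoExecute},
	}

	for _, tolerations := range [][]v1.Toleration{existsToleration, perEffectTolerations} {
		allTolerated := true
		for i := range taints {
			tolerated := false
			for j := range tolerations {
				if tolerations[j].ToleratesTaint(&taints[i]) {
					tolerated = true
					break
				}
			}
			allTolerated = allTolerated && tolerated
		}
		fmt.Println(allTolerated) // both options print true
	}
}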
@@ -270,6 +278,35 @@ func evictPods(inputPods []*v1.Pod,
 	}
 }

+func podToleratesTaints(pod *v1.Pod, taintsOfNodes [][]v1.Taint) bool {
Is podToleratesTaints supposed to return true if there is at least one node whose taints the pod tolerates? If not, could you clarify what the function is supposed to be doing?
Exactly what the function does
Thanks, just wanted to be sure. Given podToleratesTaints got quite shortened and it uses helperUtil.TolerationsTolerateTaintsWithFilter, excessive testing of podToleratesTaints is optional now.
I also checked all the unit tests under lownodeutilization_test.go and only individual functions are tested:
- evictPodsFromTargetNodes
- sortPodsBasedOnPriority
- validateThresholds
Go coverage shows the main LowNodeUtilization function is not tested at all, so it's hard to see how your code will interact with it. I would rather refactor/extend the unit tests first so some portion of the tests is run through LowNodeUtilization and we can see how the individual bits interact with each other from the public API perspective.
@ravisantoshgudimetla what's your PoV?
a19cd22 to 996e774 Compare
@@ -233,13 +236,19 @@ func evictPods(inputPods []*v1.Pod,
 	totalCpu *float64,
 	totalMem *float64,
 	podsEvicted *int,
-	dryRun bool, maxPodsToEvict int) {
+	dryRun bool, maxPodsToEvict int, taintsOfLowNodes [][]v1.Taint) {
At this point it's hard to say which taints belong to which node. Can you replace the list of lists with a map of lists? Then later in the code podToleratesTaints can print, at e.g. V(5) verbosity, the list of nodes whose taints are not tolerated by the pod.
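A minimal sketch of that shape (getNodeTaints is a made-up name, and the PR's actual code may differ; assumes v1 "k8s.io/api/core/v1"):

// getNodeTaints maps each node's name to its taints, so that
// podToleratesTaints can name the offending node when logging.
func getNodeTaints(nodes []*v1.Node) map[string][]v1.Taint {
	taintsOfNodes := make(map[string][]v1.Taint, len(nodes))
	for _, node := range nodes {
		taintsOfNodes[node.Name] = node.Spec.Taints
	}
	return taintsOfNodes
}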
@jw-s I managed to put together this to test with:

func newFake(objects ...runtime.Object) *core.Fake {
	scheme := runtime.NewScheme()
	codecs := serializer.NewCodecFactory(scheme)
	fake.AddToScheme(scheme)
	o := core.NewObjectTracker(scheme, codecs.UniversalDecoder())
	for _, obj := range objects {
		if err := o.Add(obj); err != nil {
			panic(err)
		}
	}
	fakePtr := core.Fake{}
	fakePtr.AddReactor("list", "pods", func(action core.Action) (bool, runtime.Object, error) {
		objs, err := o.List(
			schema.GroupVersionResource{Group: "", Version: "v1", Resource: "pods"},
			schema.GroupVersionKind{Group: "", Version: "v1", Kind: "Pod"},
			action.GetNamespace(),
		)
		if err != nil {
			return true, nil, err
		}
		obj := &v1.PodList{
			Items: []v1.Pod{},
		}
		for _, pod := range objs.(*v1.PodList).Items {
			podFieldSet := fields.Set(map[string]string{
				"spec.nodeName": pod.Spec.NodeName,
				"status.phase":  string(pod.Status.Phase),
			})
			match := action.(core.ListAction).GetListRestrictions().Fields.Matches(podFieldSet)
			if !match {
				continue
			}
			obj.Items = append(obj.Items, *pod.DeepCopy())
		}
		return true, obj, nil
	})
	fakePtr.AddReactor("*", "*", core.ObjectReaction(o))
	fakePtr.AddWatchReactor("*", core.DefaultWatchReactor(watch.NewFake(), nil))
	return &fakePtr
}
func TestWithTaints(t *testing.T) {
	strategy := api.DeschedulerStrategy{
		Enabled: true,
		Params: api.StrategyParameters{
			NodeResourceUtilizationThresholds: api.NodeResourceUtilizationThresholds{
				Thresholds: api.ResourceThresholds{
					v1.ResourcePods: 0.2,
				},
				TargetThresholds: api.ResourceThresholds{
					v1.ResourcePods: 0.7,
				},
			},
		},
	}

	n1 := test.BuildTestNode("n1", 1000, 3000, 10)
	n2 := test.BuildTestNode("n2", 1000, 3000, 10)
	n3 := test.BuildTestNode("n3", 1000, 3000, 10)
	n3withTaints := n3.DeepCopy()
	n3withTaints.Spec.Taints = []v1.Taint{
		{
			Key:    "key",
			Value:  "value",
			Effect: v1.TaintEffectNoSchedule,
		},
	}

	tests := []struct {
		nodes             []*v1.Node
		evictionsExpected int
	}{
		{
			nodes:             []*v1.Node{n1, n2, n3},
			evictionsExpected: 1,
		},
		{
			nodes:             []*v1.Node{n1, n2, n3withTaints},
			evictionsExpected: 0,
		},
	}

	for _, item := range tests {
		var objs []runtime.Object
		for _, node := range item.nodes {
			objs = append(objs, node)
		}
		for i := 0; i < 4; i++ {
			// populate the first node
			p := test.BuildTestPod(fmt.Sprintf("p1_%v", i), 200, 0, n1.Name)
			p.ObjectMeta.OwnerReferences = test.GetReplicaSetOwnerRefList()
			objs = append(objs, p)
			// populate the second node
			if i < 3 {
				p = test.BuildTestPod(fmt.Sprintf("p2_%v", i), 200, 0, n2.Name)
				p.ObjectMeta.OwnerReferences = test.GetReplicaSetOwnerRefList()
				objs = append(objs, p)
			}
		}
		fakePtr := newFake(objs...)
		var evictionCounter int
fakePtr.PrependReactor("post", "pods", func(action core.Action) (bool, runtime.Object, error) {
act, ok := action.(core.GetActionImpl)
if !ok || act.Subresource != "eviction" || act.Resource.Resource != "pods" {
return false, nil, nil
}
evictionCounter++
return true, nil, nil
})
		fakeClient := &fake.Clientset{Fake: *fakePtr}
		ds := &options.DeschedulerServer{
			Client: fakeClient,
			DeschedulerConfiguration: componentconfig.DeschedulerConfiguration{
				EvictLocalStoragePods: false,
			},
		}
		nodePodCount := InitializeNodePodCount(item.nodes)
		LowNodeUtilization(ds, strategy, "policy/v1", item.nodes, nodePodCount)
		if item.evictionsExpected != evictionCounter {
			t.Errorf("Expected %v evictions, got %v", item.evictionsExpected, evictionCounter)
		}
	}
}

With new imports:
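Presumably something close to the following, judging from the identifiers used above (my best guess at the paths, not a verified list):

import (
	"fmt"
	"testing"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/runtime/serializer"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes/fake"
	core "k8s.io/client-go/testing"

	"sigs.k8s.io/descheduler/cmd/descheduler/app/options"
	"sigs.k8s.io/descheduler/pkg/api"
	"sigs.k8s.io/descheduler/pkg/apis/componentconfig"
	"sigs.k8s.io/descheduler/test"
)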
I am pretty sure the code can be more optimized.
@ingvagabund I will add this test early next week!
46dcf7d to 7d462b8 Compare
@ingvagabund I've made the changes; however, the build is failing due to linting.
7d462b8 to 5bf958a Compare
5bf958a to bfc365e Compare
bfc365e to 3cbfde6 Compare
/retest
@jw-s: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
@@ -232,15 +237,22 @@ func evictPods(inputPods []*v1.Pod,
 	totalCPU *float64,
 	totalMem *float64,
 	podsEvicted *int,
-	dryRun bool, maxPodsToEvict int) {
+	dryRun bool, maxPodsToEvict int, taintsOfLowNodes map[string][]v1.Taint) {
Each parameter of evictPods lies on its own line. Please do the same for taintsOfLowNodes.
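Presumably something like this (the parameters outside the hunk are only indicated, since they aren't shown above):

func evictPods(inputPods []*v1.Pod,
	// ... the remaining parameters, each on its own line ...
	totalCPU *float64,
	totalMem *float64,
	podsEvicted *int,
	dryRun bool,
	maxPodsToEvict int,
	taintsOfLowNodes map[string][]v1.Taint) {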
@@ -269,6 +281,21 @@ func evictPods(inputPods []*v1.Pod,
 	}
 }

+// podToleratesTaints returns true if a pod tolerates one node's taints
+func podToleratesTaints(pod *v1.Pod, taintsOfNodes map[string][]v1.Taint) bool {
+	canBeEvicted := true
Just a nit, though this way it's shorter and easier to read, given it's sufficient to find the first node whose taints the pod tolerates:

for nodeName, taintsForNode := range taintsOfNodes {
	if len(pod.Spec.Tolerations) >= len(taintsForNode) && helperUtil.TolerationsTolerateTaintsWithFilter(pod.Spec.Tolerations, taintsForNode, nil) {
		return true
	}
	klog.V(5).Infof("pod: %#v doesn't tolerate node %s's taints", pod.Name, nodeName)
}
return false
4f3a837 to cddcd24 Compare
Since enabling go modules by default we can no longer import anything from k8s.io/kubernetes:

diff --git a/pkg/descheduler/strategies/lownodeutilization.go b/pkg/descheduler/strategies/lownodeutilization.go
index 85d3415..69cbeb0 100644
--- a/pkg/descheduler/strategies/lownodeutilization.go
+++ b/pkg/descheduler/strategies/lownodeutilization.go
@@ -24,7 +24,6 @@ import (
clientset "k8s.io/client-go/kubernetes"
"k8s.io/klog"
- helperUtil "k8s.io/kubernetes/pkg/apis/core/v1/helper"
"sigs.k8s.io/descheduler/cmd/descheduler/app/options"
"sigs.k8s.io/descheduler/pkg/api"
"sigs.k8s.io/descheduler/pkg/descheduler/evictions"
@@ -285,7 +284,7 @@ func evictPods(inputPods []*v1.Pod,
// podToleratesTaints returns true if a pod tolerates one node's taints
func podToleratesTaints(pod *v1.Pod, taintsOfNodes map[string][]v1.Taint) bool {
for nodeName, taintsForNode := range taintsOfNodes {
- if len(pod.Spec.Tolerations) >= len(taintsForNode) && helperUtil.TolerationsTolerateTaintsWithFilter(pod.Spec.Tolerations, taintsForNode, nil) {
+ if len(pod.Spec.Tolerations) >= len(taintsForNode) && utils.TolerationsTolerateTaintsWithFilter(pod.Spec.Tolerations, taintsForNode, nil) {
return true
}
klog.V(5).Infof("pod: %#v doesn't tolerate node %s's taints", pod.Name, nodeName)
diff --git a/pkg/descheduler/strategies/lownodeutilization_test.go b/pkg/descheduler/strategies/lownodeutilization_test.go
index 9b15961..bab1e4a 100644
--- a/pkg/descheduler/strategies/lownodeutilization_test.go
+++ b/pkg/descheduler/strategies/lownodeutilization_test.go
@@ -499,7 +499,7 @@ func TestWithTaints(t *testing.T) {
},
}
- nodePodCount := InitializeNodePodCount(item.nodes)
+ nodePodCount := utils.InitializeNodePodCount(item.nodes)
LowNodeUtilization(ds, strategy, "policy/v1", item.nodes, nodePodCount)
if item.evictionsExpected != evictionCounter {
diff --git a/pkg/utils/pod.go b/pkg/utils/pod.go
index 4eafdca..4f33e60 100644
--- a/pkg/utils/pod.go
+++ b/pkg/utils/pod.go
@@ -2,6 +2,7 @@ package utils
import (
"fmt"
+
v1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/resource"
utilfeature "k8s.io/apiserver/pkg/util/feature"
@@ -179,3 +180,49 @@ func maxResourceList(list, new v1.ResourceList) {
}
}
}
+
+// TolerationsTolerateTaint checks if taint is tolerated by any of the tolerations.
+func TolerationsTolerateTaint(tolerations []v1.Toleration, taint *v1.Taint) bool {
+ for i := range tolerations {
+ if tolerations[i].ToleratesTaint(taint) {
+ return true
+ }
+ }
+ return false
+}
+
+type taintsFilterFunc func(*v1.Taint) bool
+
+// TolerationsTolerateTaintsWithFilter checks if given tolerations tolerates
+// all the taints that apply to the filter in given taint list.
+// DEPRECATED: Please use FindMatchingUntoleratedTaint instead.
+func TolerationsTolerateTaintsWithFilter(tolerations []v1.Toleration, taints []v1.Taint, applyFilter taintsFilterFunc) bool {
+ _, isUntolerated := FindMatchingUntoleratedTaint(taints, tolerations, applyFilter)
+ return !isUntolerated
+}
+
+// FindMatchingUntoleratedTaint checks if the given tolerations tolerates
+// all the filtered taints, and returns the first taint without a toleration
+func FindMatchingUntoleratedTaint(taints []v1.Taint, tolerations []v1.Toleration, inclusionFilter taintsFilterFunc) (v1.Taint, bool) {
+ filteredTaints := getFilteredTaints(taints, inclusionFilter)
+ for _, taint := range filteredTaints {
+ if !TolerationsTolerateTaint(tolerations, &taint) {
+ return taint, true
+ }
+ }
+ return v1.Taint{}, false
+}
+
+func getFilteredTaints(taints []v1.Taint, inclusionFilter taintsFilterFunc) []v1.Taint {
+ if inclusionFilter == nil {
+ return taints
+ }
+ filteredTaints := []v1.Taint{}
+ for _, taint := range taints {
+ if !inclusionFilter(&taint) {
+ continue
+ }
+ filteredTaints = append(filteredTaints, taint)
+ }
+ return filteredTaints
+}
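As a sanity check of the copied helpers, a minimal usage might look like this (illustrative only; assumes the pkg/utils package from the diff above):

package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"

	"sigs.k8s.io/descheduler/pkg/utils"
)

func main() {
	taints := []v1.Taint{
		{Key: "key", Value: "value", Effect: v1.TaintEffectNoSchedule},
	}
	tolerations := []v1.Toleration{
		{Key: "key", Operator: v1.TolerationOpEqual, Value: "value", Effect: v1.TaintEffectNoSchedule},
	}

	// A nil filter means every taint in the list is considered.
	fmt.Println(utils.TolerationsTolerateTaintsWithFilter(tolerations, taints, nil)) // true
	fmt.Println(utils.TolerationsTolerateTaintsWithFilter(nil, taints, nil))         // false
}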
@jw-s sorry this PR review is taking so long.
This will also make the tests easier to read and show the individual cases:
diff --git a/pkg/descheduler/strategies/lownodeutilization_test.go b/pkg/descheduler/strategies/lownodeutilization_test.go
index 9b15961..9da1512 100644
--- a/pkg/descheduler/strategies/lownodeutilization_test.go
+++ b/pkg/descheduler/strategies/lownodeutilization_test.go
@@ -393,7 +393,7 @@ func TestWithTaints(t *testing.T) {
},
}
- n1 := test.BuildTestNode("n1", 1000, 3000, 10)
+ n1 := test.BuildTestNode("n1", 2000, 3000, 10)
n2 := test.BuildTestNode("n2", 1000, 3000, 10)
n3 := test.BuildTestNode("n3", 1000, 3000, 10)
n3withTaints := n3.DeepCopy()
@@ -414,11 +414,13 @@ func TestWithTaints(t *testing.T) {
}
tests := []struct {
+ name string
nodes []*v1.Node
pods []*v1.Pod
evictionsExpected int
}{
{
+ name: "No taints",
nodes: []*v1.Node{n1, n2, n3},
pods: []*v1.Pod{
//Node 1 pods
@@ -436,6 +438,7 @@ func TestWithTaints(t *testing.T) {
evictionsExpected: 1,
},
{
+ name: "Taint that is not tolerated",
nodes: []*v1.Node{n1, n3withTaints},
pods: []*v1.Pod{
//Node 1 pods
@@ -453,6 +456,7 @@ func TestWithTaints(t *testing.T) {
evictionsExpected: 0,
},
{
+ name: "Taint that is tolerated",
nodes: []*v1.Node{n1, n3withTaints},
pods: []*v1.Pod{
//Node 1 pods
@@ -472,38 +476,40 @@ func TestWithTaints(t *testing.T) {
}
for _, item := range tests {
- var objs []runtime.Object
- for _, node := range item.nodes {
- objs = append(objs, node)
- }
-
- for _, pod := range item.pods {
- pod.ObjectMeta.OwnerReferences = test.GetReplicaSetOwnerRefList()
- objs = append(objs, pod)
- }
+ t.Run(item.name, func(t *testing.T) {
+ var objs []runtime.Object
+ for _, node := range item.nodes {
+ objs = append(objs, node)
+ }
- fakePtr := newFake(objs...)
- var evictionCounter int
- fakePtr.PrependReactor("create", "pods", func(action core.Action) (bool, runtime.Object, error) {
- if action.GetSubresource() != "eviction" || action.GetResource().Resource != "pods" {
- return false, nil, nil
+ for _, pod := range item.pods {
+ pod.ObjectMeta.OwnerReferences = test.GetReplicaSetOwnerRefList()
+ objs = append(objs, pod)
}
- evictionCounter++
- return true, nil, nil
- })
- ds := &options.DeschedulerServer{
- Client: &fake.Clientset{Fake: *fakePtr},
- DeschedulerConfiguration: componentconfig.DeschedulerConfiguration{
- EvictLocalStoragePods: false,
- },
- }
+ fakePtr := newFake(objs...)
+ var evictionCounter int
+ fakePtr.PrependReactor("create", "pods", func(action core.Action) (bool, runtime.Object, error) {
+ if action.GetSubresource() != "eviction" || action.GetResource().Resource != "pods" {
+ return false, nil, nil
+ }
+ evictionCounter++
+ return true, nil, nil
+ })
- nodePodCount := InitializeNodePodCount(item.nodes)
- LowNodeUtilization(ds, strategy, "policy/v1", item.nodes, nodePodCount)
+ ds := &options.DeschedulerServer{
+ Client: &fake.Clientset{Fake: *fakePtr},
+ DeschedulerConfiguration: componentconfig.DeschedulerConfiguration{
+ EvictLocalStoragePods: false,
+ },
+ }
- if item.evictionsExpected != evictionCounter {
- t.Errorf("Expected %v evictions, got %v", item.evictionsExpected, evictionCounter)
- }
+ nodePodCount := utils.InitializeNodePodCount(item.nodes)
+ LowNodeUtilization(ds, strategy, "policy/v1", item.nodes, nodePodCount)
+
+ if item.evictionsExpected != evictionCounter {
+ t.Errorf("Expected %v evictions, got %v", item.evictionsExpected, evictionCounter)
+ }
+ })
}
}
@aveshagarwal @damemi this is a good candidate for merging
cddcd24 to 4757132 Compare
No worries! I've made the changes you suggested.
will review it today.
/lgtm
@aveshagarwal ping
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: aveshagarwal, jw-s
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Based on https://github.com/kubernetes/community/blob/master/community-membership.md#requirements-1:

The following apply to the part of codebase for which one would be a reviewer in an OWNERS file (for repos using the bot).

> member for at least 3 months
For a couple of years now

> Primary reviewer for at least 5 PRs to the codebase
kubernetes-sigs#285 kubernetes-sigs#275 kubernetes-sigs#267 kubernetes-sigs#254 kubernetes-sigs#181

> Reviewed or merged at least 20 substantial PRs to the codebase
https://github.com/kubernetes-sigs/descheduler/pulls?q=is%3Apr+is%3Aclosed+assignee%3Aingvagabund

> Knowledgeable about the codebase
yes

> Sponsored by a subproject approver
> With no objections from other approvers
> Done through PR to update the OWNERS file
this PR

> May either self-nominate, be nominated by an approver in this subproject, or be nominated by a robot
self-nominating
…rations Respect Node taints with tolerations (if exist)
If a node has a taint and is under-utilised, the descheduler will evict pods based on QoS, priority, etc., but doesn't check whether these "suitable" pods tolerate that taint, so it evicts them and enters an infinite loop of "reshuffling". This PR adds that check.
If multiple nodes are under-utilised and have taints, all the descheduler cares about is whether a pod tolerates the taints of one of those nodes, since the default scheduler will then be able to schedule it onto one of them.
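In other words, a simplified sketch of the added check (names follow the PR; the exact surrounding code in evictPods may differ):

// Skip pods that tolerate no under-utilised node's taints, since
// evicting them would only reshuffle the cluster.
for _, pod := range inputPods {
	if !podToleratesTaints(pod, taintsOfLowNodes) {
		klog.V(5).Infof("Skipping eviction for pod %s: it tolerates no low node's taints", pod.Name)
		continue
	}
	// ... evict as before, based on QoS, priority, etc. ...
}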