
Add a strategy for taints and tolerations #131

Closed
ravisantoshgudimetla opened this issue Jan 15, 2019 · 27 comments
Labels: help wanted (Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.), kind/feature (Categorizes issue or PR as related to a new feature.)

Comments

@ravisantoshgudimetla
Contributor

Recently one of the users requested a strategy for taints and tolerations. While I don't have the cycles to work on this, I would be more than happy to review if anyone in the community is interested in working on it.

@ravisantoshgudimetla ravisantoshgudimetla added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Jan 15, 2019
@paktek123

paktek123 commented Mar 18, 2019

Any more details? Please expand. I would be interested in contributing.

@chadswen
Member

@paktek123 Take a look at this issue for a use case currently implemented with Draino that could be replaced by descheduler: kubernetes/node-problem-detector#199 (comment)

@aveshagarwal
Contributor

Any more details? Please expand. I would be interested in contributing.

Let's say a pod was scheduled on a node where there is now a mismatch between the pod's tolerations and the node's taints, for example because the node's taints were updated after the pod was scheduled. Since the pod's tolerations were only checked at admission time, the pod keeps running on the node even though it no longer tolerates its taints. This only applies to NoSchedule taints (NoExecute taints already cause Kubernetes itself to evict pods that do not tolerate them).

In summary, it would work as follows:

  1. Get a list of nodes (this already exists).
  2. Get a list of pods on each node (this already exists).
  3. Verify that the node's NoSchedule taints are still tolerated by each of its pods' tolerations.
  4. If a pod no longer satisfies them, evict that pod.
  5. If it still satisfies them, continue checking the remaining pods until you reach the end of the list of pods for that node.
  6. Repeat the above for all nodes (a sketch of this check follows below).
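
For illustration, here is a minimal Go sketch of steps 3-6, using the ToleratesTaint helper from k8s.io/api/core/v1; the function names (podToleratesNodeTaints, podsToEvict) are made up for this sketch and are not the descheduler's actual implementation:

```go
package strategy

import (
	corev1 "k8s.io/api/core/v1"
)

// podToleratesNodeTaints reports whether the pod tolerates every
// NoSchedule taint on the node (step 3). Other taint effects are
// ignored, since this strategy only targets NoSchedule taints.
func podToleratesNodeTaints(pod *corev1.Pod, node *corev1.Node) bool {
	for i := range node.Spec.Taints {
		taint := &node.Spec.Taints[i]
		if taint.Effect != corev1.TaintEffectNoSchedule {
			continue
		}
		tolerated := false
		for j := range pod.Spec.Tolerations {
			if pod.Spec.Tolerations[j].ToleratesTaint(taint) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return false
		}
	}
	return true
}

// podsToEvict walks each node's pods and collects those that no longer
// tolerate their node's NoSchedule taints (steps 4-6). Actually evicting
// them is left to the caller.
func podsToEvict(podsByNode map[*corev1.Node][]*corev1.Pod) []*corev1.Pod {
	var victims []*corev1.Pod
	for node, pods := range podsByNode {
		for _, pod := range pods {
			if !podToleratesNodeTaints(pod, node) {
				victims = append(victims, pod)
			}
		}
	}
	return victims
}
```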

@paktek123

Thanks for the explanation. I will hopefully try to contribute next week.

@warmchang

Nice idea! It would also be very useful for customizing when pods get migrated.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 10, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 9, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@damemi
Contributor

damemi commented Oct 9, 2019

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Oct 9, 2019
@k8s-ci-robot
Contributor

@damemi: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@isindir

isindir commented Jun 10, 2021

/reopen

@k8s-ci-robot
Contributor

@isindir: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@seanmalloy
Member

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Aug 20, 2021
@k8s-ci-robot
Contributor

@seanmalloy: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@seanmalloy
Member

/remove-lifecycle rotten
/kind feature

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Aug 20, 2021
@pravarag
Contributor

@seanmalloy @damemi it looks like this issue hasn't been picked up in a long time. Is it okay if I give it a try? I will post my queries here, if any.

/assign

@StevenACoffman

Please do!

@pravarag
Contributor

@damemi @aveshagarwal one query about the description mentioned here: are we looking to add a new check in the API itself, somewhere like https://github.com/kubernetes-sigs/descheduler/tree/master/pkg/descheduler/node, or is this feature also going to involve changes to the command line options, maybe here: https://github.com/kubernetes-sigs/descheduler/tree/master/cmd/descheduler/app?

@damemi
Contributor

damemi commented Aug 23, 2021

I'm not actually sure why this issue is still open; I might have mistakenly reopened it when it went stale... We have a Taints/Tolerations strategy that was merged out of this issue (https://github.com/kubernetes-sigs/descheduler#removepodsviolatingnodetaints).

Is there something that strategy is missing from the discussion here? Or can we close this?

@StevenACoffman

Just checking, but is it now possible to:

  1. Detect permanent node problems and set Node Conditions using the Node Problem Detector and the scheduler's TaintNodesByCondition functionality.
  2. Configure Descheduler to deschedule pods based on taints to cordon and drain nodes when they exhibit the NPD's KernelDeadlock condition, or a variant of KernelDeadlock we call VolumeTaskHung.
  3. Let the Cluster Autoscaler scale down underutilised nodes, including the nodes Descheduler has drained.

If so, is there an example?

@damemi
Contributor

damemi commented Aug 23, 2021

@StevenACoffman yes, the descheduler will evict any pods that are currently running on a node that has any NoSchedule taint that the pods do not tolerate. So, your use case should work with the RemovePodsViolatingNodeTaints strategy
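
As a rough illustration of that check (not the descheduler's actual code path), the sketch below builds a hypothetical NoSchedule taint of the kind a remediation controller might set for the NPD's KernelDeadlock condition (the taint key node.example.com/KernelDeadlock is made up) and shows that only a pod carrying a matching toleration would be exempt from eviction:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// Hypothetical taint a remediation controller might place on a node
	// when node-problem-detector reports KernelDeadlock; the key is
	// illustrative only, not an NPD or descheduler default.
	taint := corev1.Taint{
		Key:    "node.example.com/KernelDeadlock",
		Value:  "true",
		Effect: corev1.TaintEffectNoSchedule,
	}

	// A toleration that would exempt a pod from this strategy.
	toleration := corev1.Toleration{
		Key:      "node.example.com/KernelDeadlock",
		Operator: corev1.TolerationOpExists,
		Effect:   corev1.TaintEffectNoSchedule,
	}

	// Prints true: a pod carrying this toleration keeps running.
	fmt.Println(toleration.ToleratesTaint(&taint))

	// A pod without any matching toleration fails this check and becomes
	// a candidate for eviction under RemovePodsViolatingNodeTaints.
}
```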

@pravarag
Contributor

Thanks @damemi for clarifying, I guess this issue will be closed now 🙂

@damemi
Contributor

damemi commented Aug 24, 2021

/close

@k8s-ci-robot
Contributor

@damemi: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
