
POD Label Selector evaluation to be taken into account to decide POD eviction #163

Closed
VF-mbrauer opened this issue Jun 11, 2019 · 9 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@VF-mbrauer

It would be very helpful if we could decide, based on a label, which PODs are allowed to be evicted to another node, rather than evicting any POD on a node where the thresholds are met.
This would give more flexibility. Imagine a cluster that consists of nonPROD and PROD PODs.
We could ensure that the nonPROD ones are moved first, or exclusively.
Marking PODs as critical is not an option for protecting them from eviction,
as that would possibly bring other side effects to the normal scheduling on the nodes, etc.

@aveshagarwal
Contributor

Regarding differentiating which pods to evict: rather than label selectors, I'd say that should be done based on their priority. In your scenario, nonPROD pods should get a lower priority than PROD ones, and the descheduler should evict lower-priority pods first. We already have eviction based on priority here: https://github.com/kubernetes-incubator/descheduler/blob/master/pkg/descheduler/strategies/lownodeutilization.go#L198

So you could try assigning a lower priority to your nonPROD pods than to your PROD pods and see if the descheduler does what you are expecting.
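
For illustration, a minimal sketch of how such a priority split could look using the standard Kubernetes PriorityClass API; the class name, value, and Deployment below are made-up examples, not anything the descheduler requires:

```yaml
# Example (hypothetical names/values): a low-priority class for nonPROD workloads.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: nonprod-low
value: 1000
globalDefault: false
description: "Lower priority for nonPROD pods so they are preferred for eviction."
---
# Reference the class from the nonPROD Deployment's pod template.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nonprod-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nonprod-app
  template:
    metadata:
      labels:
        app: nonprod-app
    spec:
      priorityClassName: nonprod-low   # pods of this Deployment get the lower priority
      containers:
      - name: app
        image: nginx
```

PROD Deployments would either reference a higher-valued PriorityClass or rely on the cluster default.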

Regarding eviction based on label selectors (though one simple way could be a script that chooses pods based on their label selectors and then deletes them), I'd like to understand what benefits or use cases you have in mind that cannot be achieved via priorities. We can discuss them here so that we can see whether adding eviction based on label selectors makes sense.

@VF-mbrauer
Author

Thanks for your answer on this matter.

Yes, we also took into consideration using the priority option to flag the PODs so that they are evicted accordingly. But we think that could also have a drawback (risk), as setting this flag in the Deployments will also affect the normal scheduler and probably other components that consume this option/flag. Therefore, in case of a node shutdown or any other activities, this priority flag will always play a role in the whole cluster's scheduling processes.

Another use case for having a Label-Selector could be that we just mark the PODs which we would like to evict as part of the descheduler process, independent of the priority and of any other relationship or dependency. It could be a special test or staging environment where we just want to have some of the PODs handled.
I think priority is an important flag, but not for this use case of re-shuffling PODs for resource re-balancing.
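
For illustration, the marking we have in mind would just be an ordinary pod label; the key and value below (`descheduler-evictable: "true"`) are made up for this example and are not an existing convention:

```yaml
# Hypothetical example: a plain label marks pods we would allow the descheduler to evict.
apiVersion: v1
kind: Pod
metadata:
  name: staging-worker
  labels:
    app: staging-worker
    descheduler-evictable: "true"   # proposed marker label (made up), evaluated by the descheduler
spec:
  containers:
  - name: worker
    image: busybox
    command: ["sleep", "3600"]
```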

So do you see it as an option to integrate the Label-Selector as a further decision criterion?
Thanks a lot.

@aveshagarwal
Contributor

> Thanks for your answer on this matter.
>
> Yes, we also took into consideration using the priority option to flag the PODs so that they are evicted accordingly. But we think that could also have a drawback (risk), as setting this flag in the Deployments will also affect the normal scheduler and probably other components that consume this option/flag. Therefore, in case of a node shutdown or any other activities, this priority flag will always play a role in the whole cluster's scheduling processes.
>
> Another use case for having a Label-Selector could be that we just mark the PODs which we would like to evict as part of the descheduler process, independent of the priority and of any other relationship or dependency. It could be a special test or staging environment where we just want to have some of the PODs handled.

The descheduler evicts pods when it sees some sort of imbalance in a cluster and is reasonably hopeful that the evicted pods will be moved to other nodes by the default scheduler to reduce the imbalance. Just evicting pods based on some labels might in itself not be helpful from the descheduler's point of view unless there is some imbalance. So, if you mean (and it seems so, based on your original comment) that in the case of the low node utilization strategy (where some nodes are much more utilized than others), you want to first evict pods with certain labels, then that is a prioritization: you want pods with certain labels to be evicted first.

Some issues I see: it could lead to priority inversion, where higher-priority pods with certain labels are evicted first. That might have some bad effects in a cluster, because the default scheduler might kill some lower-priority pods to make room for such a high-priority pod, and the descheduler and the default scheduler could end up in a loop too.

Also, it seems that you don't want to evict all pods (which could have been evicted by the descheduler) but only "some pods with certain labels", which is a bit different, because in general the descheduler determines a set of pods and tries to evict all of them rather than some of them. That's why this seems more like a use case where running a script to evict those certain pods could be done easily.

That said, IMHO, I only see a very marginal use case for the scenario you described in this issue from the descheduler's point of view. So, to clarify further, let me ask you:

Are you trying to evict only some pods with certain labels for the LowNodeUtilization strategy? Does that mean you want the descheduler to stop as soon as all pods with those labels have been evicted?

If yes, you could create a PR and let's see how it looks. But again, we should be aware of the issues it might cause from the descheduler's point of view, and the use case seems only marginally helpful to me.

> I think priority is an important flag, but not for this use case of re-shuffling PODs for resource re-balancing.
>
> So do you see it as an option to integrate the Label-Selector as a further decision criterion?
> Thanks a lot.

@VF-mbrauer
Author

> The descheduler evicts pods when it sees some sort of imbalance in a cluster and is reasonably hopeful that the evicted pods will be moved to other nodes by the default scheduler to reduce the imbalance. Just evicting pods based on some labels might in itself not be helpful from the descheduler's point of view unless there is some imbalance. So, if you mean (and it seems so, based on your original comment) that in the case of the low node utilization strategy (where some nodes are much more utilized than others), you want to first evict pods with certain labels, then that is a prioritization: you want pods with certain labels to be evicted first.

Yes, that's correct. It could be configurable with three different options,
so the label function could be set to MODE1, MODE2, or MODE3 (a configuration sketch follows the list):

  • With MODE1, the Descheduler works as it already does today.
  • With MODE2, the Descheduler tries to move the labeled PODs first and then does NOT continue with other PODs which are NOT labeled.
  • With MODE3, the Descheduler tries to move the labeled PODs first and, if there is still room to move further PODs, continues with the other PODs as well.
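
A rough sketch of how this could look in the descheduler's policy file; the surrounding structure follows the existing v1alpha1 LowNodeUtilization configuration, while the `evictableLabelSelector` and `labelSelectorMode` fields are purely the proposal from this issue and do not exist today:

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:
          "cpu": 20
          "memory": 20
          "pods": 20
        targetThresholds:
          "cpu": 50
          "memory": 50
          "pods": 50
      # Proposed (hypothetical) additions, not part of the descheduler today:
      evictableLabelSelector:
        matchLabels:
          descheduler-evictable: "true"
      labelSelectorMode: MODE3   # MODE1 = ignore labels, MODE2 = labeled PODs only, MODE3 = labeled PODs first
```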

> Some issues I see: it could lead to priority inversion, where higher-priority pods with certain labels are evicted first. That might have some bad effects in a cluster, because the default scheduler might kill some lower-priority pods to make room for such a high-priority pod, and the descheduler and the default scheduler could end up in a loop too.

It should not destroy the current concept of the Descheduler, so it should still respect the priority flag for certain PODs. The Label-Selector would not interfere here, as it would just help to fine-tune: it basically filters the PODs independent of their priority. So the order should be that the Descheduler first checks the priority options and then, for the PODs where it decides to do something, checks the second option, the label selector.

> Also, it seems that you don't want to evict all pods (which could have been evicted by the descheduler) but only "some pods with certain labels", which is a bit different, because in general the descheduler determines a set of pods and tries to evict all of them rather than some of them. That's why this seems more like a use case where running a script to evict those certain pods could be done easily.

No, please see my answer to this in my first reply above (the one with the three options MODE1, MODE2, and MODE3).
We don't want to create additional scripts. We want to make use of a product which is already out there (the Descheduler), possibly with some tweaks as asked for in this thread, but in general the product seems to be usable. Why should I create something on my own? Let's try to create an "all-in-one" product, where people don't need to create so many additional scripts/apps to get this achieved.

> That said, IMHO, I only see a very marginal use case for the scenario you described in this issue from the descheduler's point of view. So, to clarify further, let me ask you:

I see this differently: it is a challenge to improve and extend the functionality and flexibility of the tool you have created.

> Are you trying to evict only some pods with certain labels for the LowNodeUtilization strategy? Does that mean you want the descheduler to stop as soon as all pods with those labels have been evicted?

See my first reply above (MODE1, MODE2, MODE3).

> If yes, you could create a PR and let's see how it looks. But again, we should be aware of the issues it might cause from the descheduler's point of view, and the use case seems only marginally helpful to me.

This is the case with every product out there: even if a feature seems not to be helpful from your point of view, maybe other people would like to have it. Also, nobody uses the complete functionality of a product; each individual picks the features that best fit their business. Therefore, adding special features might help others as well, even if it is not directly useful from your point of view.

Thanks again for your answer; please let me know what you think about my comments above. If you don't like the concept at all, that is also fine for me. Then I need to think about whether it is still interesting to implement or whether I need to create something on my own, which, to be honest, I don't want to do.

Looking forward to hearing back from you.

@aveshagarwal
Contributor

In my previous comment, I already suggested: "...you could create a PR...". What else are you looking for?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale (Denotes an issue or PR has remained open with no activity and has become stale.) label Oct 28, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten (Denotes an issue or PR that has aged beyond stale and will be auto-closed.) label and removed the lifecycle/stale (Denotes an issue or PR has remained open with no activity and has become stale.) label Nov 27, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

> Rotten issues close after 30d of inactivity.
> Reopen the issue with /reopen.
> Mark the issue as fresh with /remove-lifecycle rotten.
>
> Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
