Add New PodLifeTime Strategy #274
Conversation
/hold I'm going to test this code out on some real-world clusters before asking for a review. All the unit tests that I created are passing, so it appears the code is "correct". But I'd like to try it out in a real cluster to see how it behaves.
Closing to re-run Travis CI tests.
Hi @seanmalloy, I know in #205 (comment) I linked to some API conventions saying that we should only use seconds as an int for an interval, but it has since been pointed out to me that for config APIs such as this (where the only consumer is this component), that standard isn't important. So, you can hate me for this, but if you think it would be better as a Go time.Duration (as originally suggested in #205 (comment)), then I'll lgtm that too if you want to switch it back.
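For context, the two shapes being discussed would look roughly like this in the policy config. The integer-seconds form is the one this PR documents; the duration-string form (with a hypothetical `maxPodLifeTime` key) is only a sketch of the alternative mentioned in #205 and is not part of this PR.

```yaml
# Form used in this PR: interval expressed as an integer number of seconds.
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "PodLifeTime":
    enabled: true
    params:
      maxPodLifeTimeSeconds: 86400   # 24 hours

# Hypothetical alternative discussed in #205: a Go time.Duration string.
# params:
#   maxPodLifeTime: "24h"
```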
Force-pushed from fd19701 to e78d632 (Compare)
@damemi thanks for the feedback. I'm fine with using an int.
/hold cancel @damemi and @ingvagabund PTAL when you have some time. Any feedback you have would be greatly appreciated. I was able to successfully test the new strategy on a real cluster, and it seems to be working correctly.
Looks good overall.
/lgtm
it's a pretty straightforward strategy, looks good to me
@ingvagabund and @damemi please take another look when you have some time. I added two commits since you reviewed last.
I'd also like to squash commits prior to merging, so the commit history is not so messy.
/kind feature
@seanmalloy lgtm, +1 for squashing
Force-pushed from aee1345 to a9919f1 (Compare)
/assign @ravisantoshgudimetla @aveshagarwal
Commits have been squashed. @ravisantoshgudimetla and @aveshagarwal please review when you have some time. Thanks!
@seanmalloy could you explain what the use case for this is, or in what practical scenarios this strategy is going to be useful?
@aveshagarwal here is my real-world use case. Running a k8s cluster in an on-prem data center (not a public cloud), I would like to be able to ensure that application pods are resilient to restarts. Therefore I would like to evict all pods that are considered old (i.e. older than 7 days). This helps prevent application teams from treating pods like pets. Treating pods like pets instead of cattle can happen when an application initially migrates from virtual machines into k8s. It also helps make underlying node patching less of a special event, because all support teams are conditioned to pods being evicted constantly. The new descheduler strategy implemented in this PR could be used to ensure that old pods get evicted.

This new descheduler strategy could also be used in a k8s cluster running in a public cloud, but in my opinion it is easier to run the k8s cluster autoscaler and just delete old nodes (i.e. older than 7 days). Deleting old nodes could be accomplished using node-problem-detector and the descheduler. I do have some proprietary code that currently implements deleting k8s nodes older than X days in a public cloud environment. I'm hoping to leverage the descheduler in public cloud and on-prem to ensure that old pods are evicted regularly.

Also, I did present this new strategy in a SIG Scheduling meeting earlier this year. A formal proposal Google Doc was not created for SIG Scheduling to review.
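For the 7-day use case described above, the policy would look something like the following sketch, reusing the maxPodLifeTimeSeconds parameter added in this PR; the value 604800 (7 × 86400 seconds) is illustrative.

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "PodLifeTime":
    enabled: true
    params:
      maxPodLifeTimeSeconds: 604800  # evict pods older than 7 days
```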
Hmm, interesting use case. Seems like a training strategy for applications (and also application developers/admins) to set their expectations right in k8s environments. It might not always work for stateful applications. Could you add something about this use case to the README too, where you explained this strategy?

But it might not always be feasible to delete nodes, as they might contain pods with various lifetimes, some old and some new. Also, it might not be feasible in every cluster. But I get your point, I think.

That's good to know.
Force-pushed from a9919f1 to 49e1c75 (Compare)
The new PodLifeTime descheduler strategy can be used to evict pods that were created more than the configured number of seconds ago. In the example below, pods created more than 24 hours ago will be evicted.

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "PodLifeTime":
    enabled: true
    params:
      maxPodLifeTimeSeconds: 86400
```
Force-pushed from 49e1c75 to 668d727 (Compare)
@aveshagarwal see commit 05496b2. I decided to add this to the user guide instead of the README. I'm afraid the README will get too long.
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aveshagarwal, seanmalloy

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
Force-pushed from 05496b2 to 643cd47 (Compare)
@aveshagarwal I fixed your last review comment. Let me know if any other changes are required.
Closing to re-run Travis CI tests.
Thanks @seanmalloy for this PR.
…ime-strategy Add New PodLifeTime Strategy
The new PodLifeTime descheduler strategy can be used to evict pods that were created more than the configured number of seconds ago.
In the example below, pods created more than 24 hours ago will be evicted.
Implements feature #205.
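The example the commit message refers to is the one documented earlier in this PR for the user guide:

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "PodLifeTime":
    enabled: true
    params:
      maxPodLifeTimeSeconds: 86400
```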