
Add New PodLifeTime Strategy #274

Merged

Conversation

seanmalloy
Member

The new PodLifeTime descheduler strategy can be used to evict pods that were created more than the configured number of seconds ago.

In the example below, pods created more than 24 hours ago will be evicted.

````
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "PodLifeTime":
    enabled: true
    params:
      maxPodLifeTimeSeconds: 86400
````

Implements feature #205.
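
Roughly, the strategy boils down to comparing each pod's creationTimestamp against the configured limit. Here is a minimal sketch of that check (the helper name and wiring are illustrative, not this PR's actual code):

````
package main

import (
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// podExceedsLifeTime reports whether a pod was created more than
// maxPodLifeTimeSeconds ago, based on its CreationTimestamp.
func podExceedsLifeTime(pod *v1.Pod, maxPodLifeTimeSeconds uint) bool {
	age := time.Since(pod.CreationTimestamp.Time)
	return age > time.Duration(maxPodLifeTimeSeconds)*time.Second
}

func main() {
	pod := &v1.Pod{}
	pod.CreationTimestamp = metav1.NewTime(time.Now().Add(-48 * time.Hour))
	fmt.Println(podExceedsLifeTime(pod, 86400)) // true: 48h exceeds the 24h limit
}
````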

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 1, 2020
@k8s-ci-robot k8s-ci-robot requested review from damemi and k82cn May 1, 2020 06:21
@seanmalloy
Member Author

seanmalloy commented May 1, 2020

/hold

I'm going to test this code out on some real-world clusters before asking for a review. All the unit tests that I created are passing, so it appears the code is "correct". But I'd like to try it out in a real cluster to see how it behaves.
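
For illustration, a unit test along these lines exercises the core age comparison (a sketch, not the actual tests in this PR):

````
package strategies

import (
	"testing"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// TestOldPodExceedsMaxLifeTime checks the comparison at the heart of
// PodLifeTime: a pod created 48 hours ago exceeds a 24h (86400s) limit.
func TestOldPodExceedsMaxLifeTime(t *testing.T) {
	pod := &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			CreationTimestamp: metav1.NewTime(time.Now().Add(-48 * time.Hour)),
		},
	}

	var maxPodLifeTimeSeconds uint = 86400
	age := time.Since(pod.CreationTimestamp.Time)
	if age <= time.Duration(maxPodLifeTimeSeconds)*time.Second {
		t.Errorf("expected pod age %v to exceed %d seconds", age, maxPodLifeTimeSeconds)
	}
}
````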

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 1, 2020
@seanmalloy
Member Author

Closing to re-run Travis CI tests.

@seanmalloy seanmalloy closed this May 1, 2020
@seanmalloy seanmalloy reopened this May 1, 2020
Contributor

@damemi damemi left a comment


Hi @seanmalloy, I know in #205 (comment) I linked to some API conventions saying that we should only use seconds as an int for an interval, but it's since been pointed out to me that for config APIs such as this (where the only consumer is this component), that standard isn't important.

So, you can hate me for this, but if you think it would be better as a Go time.Duration (as originally suggested in #205 (comment)), then I'll lgtm that too if you want to switch it back.
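
For concreteness, the two API shapes under discussion would look roughly like this (a sketch; the integer width and the maxPodLifeTime field name are assumptions, not code from this PR):

````
package api

import "time"

// As merged: an integer count of seconds, held as a pointer so "not set"
// can be told apart from an explicit zero (the exact integer width here
// is an assumption).
type StrategyParameters struct {
	MaxPodLifeTimeSeconds *uint `json:"maxPodLifeTimeSeconds,omitempty"`
}

// The alternative discussed: a Go time.Duration, which would let the
// policy file say "24h" instead of 86400 (hypothetical field name).
type AltStrategyParameters struct {
	MaxPodLifeTime *time.Duration `json:"maxPodLifeTime,omitempty"`
}
````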

@seanmalloy
Member Author

@damemi thanks for the feedback. I'm fine with using an int for the maxPodLifeTimeSeconds field or switching it to a more generic maxPodLifeTime field and using a time.Duration. Either way works for me. If the consensus is that a time.Duration would be better then I'd be more than happy to change it.

@seanmalloy
Member Author

/hold cancel
/assign @damemi @ingvagabund

@damemi and @ingvagabund PTAL when you have some time. Any feedback you have would be greatly appreciated. I was able to successfully test the new strategy on a real cluster; it seems to be working correctly.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 3, 2020
Contributor

@ingvagabund ingvagabund left a comment


Looks good overall.

Review threads (resolved): pkg/api/v1alpha1/types.go (outdated), pkg/descheduler/strategies/pod_lifetime.go (three threads)
Contributor

@damemi damemi left a comment


/lgtm
it's a pretty straightforward strategy, looks good to me

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels May 5, 2020
@seanmalloy
Member Author

@ingvagabund and @damemi please take another look when you have some time. I added two commits since you reviewed last.

  • 51eed62 refactor to simplify pointer usage in unit tests
  • aee1345 prevent nil pointer dereference when MaxPodLifeTimeSeconds is not set in the policy config file (sketched below)

I'd also like to squash commits prior to merging, so the commit history is not so messy.
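
For illustration, the guard described in aee1345 amounts to something like this (a sketch; the function name, error text, and exact struct shape are hypothetical):

````
package api

import "fmt"

type StrategyParameters struct {
	MaxPodLifeTimeSeconds *uint
}

// validateParams rejects a policy that omits maxPodLifeTimeSeconds
// before the pointer is ever dereferenced.
func validateParams(params *StrategyParameters) error {
	if params == nil || params.MaxPodLifeTimeSeconds == nil {
		// Without this guard, *params.MaxPodLifeTimeSeconds would
		// panic when the field is missing from the policy file.
		return fmt.Errorf("maxPodLifeTimeSeconds not set")
	}
	return nil
}
````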

@seanmalloy
Member Author

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label May 6, 2020
@ingvagabund
Contributor

@seanmalloy lgtm, +1 for squashing

@seanmalloy
Member Author

/assign @ravisantoshgudimetla @aveshagarwal

Commits have been squashed. @ravisantoshgudimetla and @aveshagarwal please review when you have some time. Thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 7, 2020
@aveshagarwal
Contributor

@seanmalloy could you explain what the use case for this is, or in what practical scenarios this strategy is going to be useful?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 7, 2020
@seanmalloy
Member Author

@aveshagarwal here is my real-world use case:

Running a k8s cluster in an on-prem data center (not a public cloud), I would like to be able to ensure that application pods are resilient to restarts. Therefore I would like to evict all pods that are considered old (i.e. older than 7 days). This helps prevent application teams from treating pods like pets. Treating pods like pets instead of cattle can happen when an application initially migrates from virtual machines into k8s. It also helps make underlying node patching less of a special event, because all support teams are conditioned to pods being evicted constantly. The new descheduler strategy implemented in this PR could be used to ensure that old pods get evicted.

This new descheduler strategy could also be used in a k8s cluster running in a public cloud, but in my opinion it is easier to run the k8s cluster autoscaler and just delete old nodes (i.e. older than 7 days). Deleting old nodes could be accomplished using node-problem-detector and the descheduler RemovePodsViolatingNodeTaints strategy.

I do have some proprietary code that currently implements deleting k8s nodes older than X days in a public cloud environment. I'm hoping to leverage the descheduler in public cloud and on-prem to ensure that old pods are evicted regularly.

Also, I did present this new strategy in a SIG scheduling meeting earlier this year. A formal proposal Google Doc was not created for SIG scheduling to review.

@aveshagarwal
Contributor

aveshagarwal commented May 8, 2020

> @aveshagarwal here is my real-world use case:
>
> Running a k8s cluster in an on-prem data center (not a public cloud), I would like to be able to ensure that application pods are resilient to restarts. Therefore I would like to evict all pods that are considered old (i.e. older than 7 days). This helps prevent application teams from treating pods like pets. Treating pods like pets instead of cattle can happen when an application initially migrates from virtual machines into k8s. It also helps make underlying node patching less of a special event, because all support teams are conditioned to pods being evicted constantly. The new descheduler strategy implemented in this PR could be used to ensure that old pods get evicted.

Hmm, interesting use case. Seems like a training strategy for applications (and also application developers/admins) to set their expectations right in k8s environments. It might not always work for stateful applications, though.

Could you add something about this use case to the README too, where you explained this strategy?

> This new descheduler strategy could also be used in a k8s cluster running in a public cloud, but in my opinion it is easier to run the k8s cluster autoscaler and just delete old nodes (i.e. older than 7 days). Deleting old nodes could be accomplished using node-problem-detector and the descheduler RemovePodsViolatingNodeTaints strategy.

But it might not always be feasible to delete nodes, as they might contain pods with various lifetimes, some old and some new. Also, it might not be feasible in every cluster. But I get your point, I think.

> I do have some proprietary code that currently implements deleting k8s nodes older than X days in a public cloud environment. I'm hoping to leverage the descheduler in public cloud and on-prem to ensure that old pods are evicted regularly.
>
> Also, I did present this new strategy in a SIG scheduling meeting earlier this year. A formal proposal Google Doc was not created for SIG scheduling to review.

That's good to know.

@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels May 8, 2020
@seanmalloy
Member Author

> Could you add something about this use case to the README too, where you explained this strategy?

@aveshagarwal see commit 05496b2. I decided to add this to the user guide instead of the README. I'm afraid the README will get too long.

Review thread (outdated, resolved): docs/user-guide.md
@aveshagarwal
Contributor

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aveshagarwal, seanmalloy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 8, 2020
@seanmalloy
Member Author

@aveshagarwal I fixed your last review comment. Let me know if any other changes are required.

@seanmalloy
Member Author

Closing to re-run Travis CI tests.

@seanmalloy seanmalloy closed this May 8, 2020
@seanmalloy seanmalloy reopened this May 8, 2020
@aveshagarwal
Contributor

Thanks @seanmalloy for this PR.
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 8, 2020
@k8s-ci-robot k8s-ci-robot merged commit c01cfcf into kubernetes-sigs:master May 8, 2020
@seanmalloy seanmalloy deleted the pod-lifetime-strategy branch May 8, 2020 14:30
@seanmalloy seanmalloy mentioned this pull request May 20, 2020
briend pushed a commit to briend/descheduler that referenced this pull request on Feb 11, 2022: Add New PodLifeTime Strategy