Pod disruption schedule #1719

Open
jukie opened this issue Sep 29, 2024 · 3 comments · May be fixed by #1720
Labels
kind/feature: Categorizes issue or PR as related to a new feature.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

jukie commented Sep 29, 2024

Description

What problem are you trying to solve?
I have some workloads that are sensitive to interruption at certain times of day, so they use the karpenter.sh/do-not-disrupt annotation. I'd like to be able to allow disruption of these pods during specific windows, defined by a cron-format schedule.

How important is this feature to you?
To allow nodes to be reclaimed for expiration or underutilization, I currently run my own controller. It watches DisruptionBlocked events and removes the do-not-disrupt annotation from pods that carry a second annotation indicating the schedule for when disruptions are allowed. I'd like something similar added upstream so I can retire my own controller.

  1. karpenter.sh/disruption-schedule - cron format of when disruptions are allowed (e.g. 0 14 * * 6)
  2. karpenter.sh/disruption-schedule-duration - duration for which the schedule is active (e.g. 3h)
  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
jukie added the kind/feature label Sep 29, 2024
k8s-ci-robot (Contributor) commented

This issue is currently awaiting triage.

If Karpenter contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot added the needs-triage label Sep 29, 2024
jukie changed the title from "Pod disruption window" to "Pod disruption schedule" Sep 29, 2024
jukie linked a pull request that will close this issue Sep 29, 2024
njtran (Contributor) commented Oct 1, 2024

Is this for your job/task-related pods? Would it be sufficient if the do-not-disrupt annotation accepted a duration string specifying how long the pod can't be disrupted, after which the annotation could be ignored?

jukie (Author) commented Oct 2, 2024

This would be for always-running job/task workers or singleton services. terminationGracePeriod solves the duration piece for do-not-disrupt and gives us a guaranteed max lifetime for a node, but the use case here is workloads that want to allow disruption at specific times of day, such as a legacy monolith that only runs during business hours. In that scenario I want an extension to do-not-disrupt so that if a node is marked for disruption during the disruption-schedule window, it's safe to disrupt immediately.

3 participants