Policy for Balancing Pods across topology domains #146
/assign @krmayankk
@krmayankk As you know, we are adding even pod spreading in 1.15. Given that effort, I don't think it makes sense to add the same (or a very similar) feature in parallel to another component. If timing is a concern, you may want to fork a component and customize it for your use cases until we release even pod spreading.
Not sure I understand. The descheduler will just evict if a pod imbalance is detected. Are you against adding this to the descheduler?
The descheduler will just do this at runtime based on a new policy. Not sure I understand the point about the same API, since the descheduler cannot work on an API that is still being worked on. Maybe you are saying that the descheduler can later be modified to understand the pod topology fields being introduced by even pod spreading. Is my understanding correct?
I think that's what @bsalamat meant, but I will wait for his comments. Tbh I am wondering whether we need to use the API at all? We can obviously vendor the scheduler code and use it once we have functions that are specific to pod spreading.
Your understanding is right. We don't want to introduce a new API just for the descheduler and then change it in the future.
@bsalamat Since the new even-spreading feature has an API that depends on fields in the pod spec, we cannot use it now. So we would have to introduce a new policy in the descheduler that is similar in semantics to the pod fields being introduced by even pod spreading. Once that feature is readily available we can converge. The whole point of the descheduler being out of core is that we can adapt easily and not wait for the core changes, which in this case will take many months to even reach beta. The new descheduler policy could still use the following API for determining which pods to delete, and once the pod fields are available we just need to make it use those fields instead.
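Purely as an illustration of what such a policy could look like (the API block from the comment above is not reproduced here, and both the strategy name and the parameter names below are hypothetical), assuming the descheduler's v1alpha1 policy file format:

```yaml
# Hypothetical sketch only: a descheduler policy entry whose parameters mirror
# the semantics of the EvenPodsSpread pod fields (a topology key plus a maximum
# allowed skew between domains).
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "BalancePodsAcrossTopologyDomains":   # hypothetical strategy name
    enabled: true
    params:
      topologyKey: "topology.kubernetes.io/zone"  # domain to balance pods across
      maxSkew: 1                                  # max allowed pod-count difference between domains
```

Once the EvenPodsSpread pod fields land, the same strategy could drop its own parameters and read those fields directly, which is the convergence path discussed above.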
It is really up to the maintainers of the descheduler to decide. Since our incubator projects have external users, you should be careful with API changes IMO. At least, you should clearly mark such a new API as experimental.
I have started working on this PR and will get it up by Friday.
For anyone following along, the initial API and the initial implementation are in PR #154.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
Hi folks, we faced the same issue: pods do not balance across availability zones after a rolling update. In 1.15, Kubernetes introduces the new Even Pods Spread feature to solve this; is there any solution that could apply to an older version of Kubernetes?
@axot The "EvenPodsSpread" feature has been postponed and will be available in 1.16 as an alpha feature. BTW: is your requirement even spread at scheduling time, at runtime (i.e. after the initial scheduling decision was made), or both?
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle rotten
After performing some availability tests (powering off an entire availability zone) we frequently end up with all Deployment Pods running in the surviving site (the third availability zone is master-only). So having topology-based descheduling would greatly help in ensuring application availability.
/kind feature
@krmayankk Any update on this issue?
BTW: PodTopologySpread (a.k.a. EvenPodsSpread) will be promoted to beta in 1.18 in k/k.
/unassign @krmayankk
@seanmalloy @Huang-Wei are we taking the same approach as the PR I had started? Is there any work happening in k/k to solve runtime scheduling anytime soon?
@krmayankk Yes, we will basically be following the same approach. It's just that some data structures and logic have been optimized, which may be reusable here. And there is no significant work on runtime scheduling yet.
@damemi @aveshagarwal @ravisantoshgudimetla @Huang-Wei @krmayankk please let me know if you have any thoughts on the proposed API changes below for this strategy. Thanks! Here is the original API proposed by @krmayankk. I noticed that there are parameters for namespace and label selector. At this point in time the descheduler does not have any options for selecting pods by namespace or label.
I propose removing the parameters for namespace and label selector. Something like this:
I do believe it would be useful for the descheduler to restrict the pods it considers for eviction by namespace and label selector, but in my opinion these should be handled separately. See #251 for namespaces and #195 for label selectors.
Doing a bit more thinking, it might be possible to create this strategy without any parameters. Just use the pods' existing `topologySpreadConstraints`. So all that might be needed would be:
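As a rough sketch of that parameter-free shape, assuming the existing descheduler v1alpha1 policy format (the strategy name below is only a placeholder, since the final name had not been settled at this point in the thread):

```yaml
# Sketch only: no strategy-specific parameters; the strategy would read each
# candidate pod's own topologySpreadConstraints and evict pods whose current
# placement violates them.
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "PodTopologySpread":   # placeholder strategy name
    enabled: true
```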
@seanmalloy you're right, the old API was proposed in the context of co-existing with PodTopologySpread, which at the time was still alpha. Now that PodTopologySpread is going to be GA, or at least beta, in 1.18, you can leverage the existing canonical API fields for sure.
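For reference, those canonical fields live on the pod spec itself. A minimal Deployment snippet using them might look like the following; the workload name, labels, and image are made up for the example:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp                 # hypothetical workload name
spec:
  replicas: 6
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      # Canonical PodTopologySpread fields: keep the pod count per zone within
      # a skew of 1, and leave pods pending rather than violating the constraint.
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: myapp
      containers:
      - name: myapp
        image: registry.k8s.io/pause:3.9   # placeholder image
```

A descheduler strategy could then evaluate the same constraints at runtime and evict pods whose placement no longer satisfies them.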
I believe this strategy should be named
Can we still target this for the 1.19 descheduler, @seanmalloy? @Huang-Wei, PodTopologySpread graduated to beta, right?
@damemi yep ... I have a branch with this feature here: https://github.com/KohlsTechnology/descheduler/tree/evenpod. The code doesn't work yet, but it does compile. :-) I believe I can submit a PR for review in a few weeks.
@damemi yes, it was beta in 1.18, and will be GA in 1.19. BTW: is the descheduler going to release a 1.19 version when k8s v1.19.0 is out?
@Huang-Wei yes, that is the plan. Going forward the descheduler should release a new version shortly after each k8s minor release.
@seanmalloy @Huang-Wei is this in progress? Can you link a PR here?
@krmayankk I'm hoping to open a pull request for this by the end of the week. My intent is to have the new strategy available for descheduler release v0.19.0, which should be released sometime in August. Here is the unfinished code that I'll be working towards completing this week:
Still no pull request. I'll start working on this again the week of August 10th. Sorry for the delay.
1.19 GA is planned for August 25th, so I think as long as we have it before then it can make our 1.19 cut.
Sorry but I don't have time to finish this. See details in WIP pull request #383.
/unassign
/assign
I think I can pick this one up. Thanks for starting the work @seanmalloy!
New PR open here: #413. Please provide feedback to help optimize what I've got!
This feature will ship as part of descheduler release v0.20.0.
The currently well-known topology domain labels on the nodes are:
The descheduler currently supports the following options: `RemoveDuplicates`, `LowNodeUtilization`, `RemovePodsViolatingInterPodAntiAffinity` and `RemovePodsViolatingNodeAffinity`. None of these options supports balancing pods across topology domains. The closest one to mimic this behavior would be `LowNodeUtilization`, which, based on two thresholds, evicts pods from nodes that don't fall within a range, but that is suboptimal. Currently there is a KEP for even pod spreading, but it will take some time to land. In the meantime, we need a policy in the descheduler which does the following:
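For context, a minimal sketch of the two-threshold `LowNodeUtilization` configuration referred to above, using the descheduler v1alpha1 policy format (the numbers are only illustrative):

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:         # nodes below all of these are considered underutilized
          cpu: 20
          memory: 20
          pods: 20
        targetThresholds:   # nodes above any of these are candidates to evict from
          cpu: 50
          memory: 50
          pods: 50
```

Because these thresholds only look at aggregate node utilization, they say nothing about how the pods of a single workload are spread across zones, which is why this strategy is suboptimal for topology balancing.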