Runtime Even Pod Spreading #154
Conversation
Welcome @krmayankk!
FYI @ravisantoshgudimetla @bsalamat @Huang-Wei @aveshagarwal this PR is to initiate API discussion
Force-pushed from c94a2a8 to 85b0603
/assign @bsalamat @ravisantoshgudimetla @Huang-Wei
@krmayankk: GitHub didn't allow me to assign the following users: Huang-Wei. Note that only kubernetes-incubator members and repo collaborators can be assigned and that issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@@ -0,0 +1,150 @@
/*
Copyright 2017 The Kubernetes Authors.
TODO: Add UT once we reach consensus on the api
Change it to 2019..
// scheduling it onto zone1(zone2) would make the ActualSkew(2) violate MaxSkew(1)
// - if MaxSkew is 2, incoming pod can be scheduled to any zone.
// It's a required value. Default value is 1 and 0 is not allowed.
MaxSkew int32
@Huang-Wei - I believe this is in line with what you're proposing...
Yes.
@krmayankk if the upstream API is available, I think you can simply vendor that?
Currently the descheduler is looking at TopologySpreadConstraints as defined in the DeschedulerPolicy. Later these constraints will come directly from the Pod itself. So there is nothing to vendor; only the source of the TopologySpreadConstraint will change @Huang-Wei
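For reference, the policy-side types under discussion look roughly like this. This is a sketch reconstructed from the field names used in this thread and the example policy further down, not the PR's authoritative definition:

    // Sketch of the policy-side API under discussion (reconstructed from this
    // thread, not the actual definition in the PR's api package).
    package api

    import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

    // NamespacedTopologySpreadConstraint groups spread constraints by namespace,
    // as consumed from the DeschedulerPolicy.
    type NamespacedTopologySpreadConstraint struct {
        Namespace                 string
        TopologySpreadConstraints []TopologySpreadConstraint
    }

    // TopologySpreadConstraint mirrors the upstream KEP fields referenced above.
    type TopologySpreadConstraint struct {
        // MaxSkew is the maximum allowed difference between the number of
        // matching pods in any topology domain and the global minimum.
        MaxSkew int32
        // TopologyKey is the node label that defines a topology domain,
        // e.g. failure-domain.beta.kubernetes.io/zone.
        TopologyKey string
        // LabelSelector selects the pods the constraint applies to.
        LabelSelector *metav1.LabelSelector
    }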
return
}

fmt.Printf("Found following parameters for TopologySpreadConstraint %v\n", strategy)
Please change to klog or glog..
constraint api.NamespacedTopologySpreadConstraint) {

if len(constraint.TopologySpreadConstraints) != 1 {
glog.V(1).Infof("We currently only support 1 topology spread constraint per namespace")
I think we need to explicitly document this, along with the reason.
Agree. This is a significant limitation.
I lean toward delivering a full implementation which respects all constraints.
@Huang-Wei so per namespace, do an AND of all constraints?
@krmayankk Yes.
// does this pod labels match the constraint label selector
// TODO: This is intentional that it only looks at the first constraint
selector, err := metav1.LabelSelectorAsSelector(constraint.TopologySpreadConstraints[0].LabelSelector)
if err != nil {
log and continue..
continue
}
if !selector.Matches(labels.Set(pod.Labels)) {
continue
Same as above. If you think you're going to log heavily, please increase the log level, or append all the failures and log them at the end.
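For illustration, a minimal sketch of the aggregate-then-log approach. The filterMatchingPods helper is hypothetical and not part of this PR; it assumes glog, as used elsewhere in the diff:

    package strategies

    import (
        "fmt"
        "strings"

        "github.com/golang/glog"
        v1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/labels"
    )

    // filterMatchingPods returns the pods matching the selector and records the
    // non-matching ones so they can be logged once, at a high verbosity level,
    // instead of logging inside the hot loop.
    func filterMatchingPods(pods []*v1.Pod, ls *metav1.LabelSelector) []*v1.Pod {
        selector, err := metav1.LabelSelectorAsSelector(ls)
        if err != nil {
            glog.V(1).Infof("invalid label selector, skipping constraint: %v", err)
            return nil
        }
        var matched []*v1.Pod
        var skipped []string
        for _, pod := range pods {
            if !selector.Matches(labels.Set(pod.Labels)) {
                skipped = append(skipped, fmt.Sprintf("%s/%s", pod.Namespace, pod.Name))
                continue
            }
            matched = append(matched, pod)
        }
        if len(skipped) > 0 {
            // One aggregated line at V(4) instead of one line per pod.
            glog.V(4).Infof("pods not matching the constraint selector: %s", strings.Join(skipped, ", "))
        }
        return matched
    }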
if int32(podsInTopo-minPodsForGivenTopo) >= constraint.TopologySpreadConstraints[0].MaxSkew {
//we need to evict maxSkew-(podsInTopo-minPodsForGivenTopo))
countToEvict := constraint.TopologySpreadConstraints[0].MaxSkew - int32(podsInTopo-minPodsForGivenTopo)
podsListToEvict := GetPodsToEvict(countToEvict, v)
Why is this a public method? Do you want to expose it for testing?
}

// GetPodFullName returns a name that uniquely identifies a pod.
func GetPodFullName(pod *v1.Pod) string {
Where are we using this function?
if !selector.Matches(labels.Set(pod.Labels)) {
continue
}
// TODO: Need to determine if the topokey already present in the node or not
I think we need to address this as part of the PR.
@@ -60,11 +60,13 @@ func Run(rs *options.DeschedulerServer) error {
return nil
}

glog.V(1).Infof("Reached here \n")
TODO: remove this upon merging.
type topologyPairSet map[topologyPair]struct{}

// finnd all nodes
Please reword this comment as well as the following ones.
}

fmt.Printf("Found following parameters for TopologySpreadConstraint %v\n", strategy)
for _, topoConstraints := range strategy.Params.NamespacedTopologySpreadConstraints {
The topoConstraints is sort of an "aggregated" topologySpreadConstraints grouped by namespace? If so, I suggest renaming NamespacedTopologySpreadConstraints to AggTopologySpreadConstraintsByNs.
I have described the DeschedulerPolicy in the description; here is what it looks like. Basically this loop is just reading the per-namespace constraints specified in the policy. Maybe TopologySpreadConstraintsPerNamespace?
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "TopologySpreadConstraint":
    enabled: true
    params:
      namespacedtopologyspreadconstraints:
      - namespace: sam-system
        topologyspreadconstraints:
        - maxSkew: 1
          topologyKey: failure-domain.beta.kubernetes.io/zone
          labelSelector:
            matchLabels:
              apptype: server
Please remove pkg/descheduler/strategies/.pod_antiaffinity.go.swp.
topologyPairToPods := make(map[topologyPair]podSet)
for _, node := range nodes {
glog.V(1).Infof("Processing node: %#v\n", node.Name)
pods, err := podutil.ListEvictablePodsOnNode(client, node, false)
I believe the clientset is able to filter pods by namespace. We can pass in the namespace, and hence don't need to check pod.Namespace != constraint.Namespace later.
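For illustration, a namespace-scoped, node-filtered list could look roughly like this. This is a sketch assuming a typed kubernetes.Interface and a client-go version whose List takes a context; the descheduler's own pod utilities may differ, and the listPodsInNamespaceOnNode helper name is hypothetical:

    package strategies

    import (
        "context"

        v1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    // listPodsInNamespaceOnNode lets the API server do the namespace and node
    // filtering, so the caller no longer needs to compare pod.Namespace against
    // constraint.Namespace.
    func listPodsInNamespaceOnNode(client kubernetes.Interface, namespace, nodeName string) ([]v1.Pod, error) {
        podList, err := client.CoreV1().Pods(namespace).List(context.TODO(), metav1.ListOptions{
            FieldSelector: "spec.nodeName=" + nodeName,
        })
        if err != nil {
            return nil, err
        }
        return podList.Items, nil
    }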
Listing pods for each node will cause N API requests (N=len(nodes)). It's better to change the logic to:
for each topologySpreadConstraint
    pre-fetch all qualified pods globally (or use similar cache in descheduler if appropriate)
    process the pods list:
        (1) filter out the pods whose node doesn't have needed topologyKey
        (2) so that we can get a map[nodeName]podSet, and also we know the minimum match number
    for nodeName, podSet := range map[nodeName]podSet
        needEvictNum := len(podSet) - minMatch - maxSkew
        if needEvictNum > 0
            evict needEvictNum pods from this Node
Note: this math is sort of brute force; we can come up with better math later. For example, for a 5/1/3 cluster with maxSkew 1, the brute-force math above will evict 3/0/1 pods from each topology, but if we consider the math "dynamically", we should only evict 2/0/0 pods, so that eventually it can become 3/3/3.
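To make the brute-force rule concrete, here is a small self-contained sketch; the evictionCounts helper is illustrative, not from this PR. For the 5/1/3 example with maxSkew=1 it returns 3/0/1, matching the over-eviction described above:

    package main

    import "fmt"

    // evictionCounts applies the brute-force rule needEvict = matched - minMatch - maxSkew
    // to each topology domain, clamped at zero. Illustrative sketch only.
    func evictionCounts(matchedPerTopology map[string]int, maxSkew int) map[string]int {
        minMatch := -1
        for _, n := range matchedPerTopology {
            if minMatch < 0 || n < minMatch {
                minMatch = n
            }
        }
        result := make(map[string]int, len(matchedPerTopology))
        for topo, n := range matchedPerTopology {
            need := n - minMatch - maxSkew
            if need < 0 {
                need = 0
            }
            result[topo] = need
        }
        return result
    }

    func main() {
        // The 5/1/3 cluster from the comment above, with maxSkew = 1.
        counts := map[string]int{"zoneA": 5, "zoneB": 1, "zoneC": 3}
        fmt.Println(evictionCounts(counts, 1)) // prints map[zoneA:3 zoneB:0 zoneC:1], i.e. the brute-force 3/0/1
    }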
for _, node := range nodes {
glog.V(1).Infof("Processing node: %#v\n", node.Name)
pods, err := podutil.ListEvictablePodsOnNode(client, node, false)
if err != nil {
Incomplete? Should log and continue.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/assign
@krmayankk I'm not sure I like the API design in the policy field "namespacedtopologyspreadconstraints". I checked the existing descheduler policy examples, and all of them offer pretty simple parameters for evicting pods that violate that kind of policy. IMO we should offer a neat policy for EvenPodsSpread, i.e. just enable or disable it, or perhaps additionally provide an option like AntiAffinity does, in case the upstream feature supports more. With the current API, users must explicitly specify every topologySpreadConstraint, which isn't practical. So I want to stop reviewing here to get a consensus on the API first. cc @ravisantoshgudimetla on the API design.
I only looked at the API. I think it is generally fine. As discussed yesterday in the SIG meeting, we should keep the API as an alpha version and wait for feedback from our users before committing to backward compatibility and longer term support.
@@ -48,6 +48,9 @@ type DeschedulerStrategy struct {
type StrategyParameters struct {
NodeResourceUtilizationThresholds NodeResourceUtilizationThresholds
NodeAffinityType []string
// TopologySpreadConstraints describes how a group of pods should be spread across topology
// domains. Descheduler will use these constraints to decide which pods to evict.
NamespacedTopologySpreadConstraints []NamespacedTopologySpreadConstraint
Isn't this name unnecessarily long? It feels a bit like adding part of its documentation to the name.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@krmayankk: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle rotten
/kind feature
@krmayankk are you planning to continue working on this pull request? It would be great if you could rebase and resolve the merge conflicts. I believe the even pod spreading feature in the scheduler is being promoted to beta in k8s v1.18, so this will be a very useful feature once k8s v1.18 is released.
@seanmalloy Correct, the PodTopologySpread feature (feature gate EvenPodsSpread) is gonna be beta in 1.18. So it makes great sense to vendor the 1.18 k/k codebase upon the implementation of this PR.
@krmayankk the master branch has been updated with the k/k v1.18 vendor dependencies. Are you planning on continuing to work on this pull request? I'm willing to continue working on this feature and use your original commits as a starting point if you do not have time to complete this work. Thanks!
/close I started working on the updated code based on this PR. The new branch is here: https://github.com/KohlsTechnology/descheduler/tree/evenpod. I'm hoping to submit a new PR in the next few weeks. See also my comment regarding possible API changes #146 (comment).
@seanmalloy: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Fixes #146
API based on KEP
The Descheduler policy is basically TopologySpreadConstraints per namespace. It's described as follows:
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "TopologySpreadConstraint":
    enabled: true
    params:
      namespacedtopologyspreadconstraints:
      - namespace: sam-system
        topologyspreadconstraints:
        - maxSkew: 1
          topologyKey: failure-domain.beta.kubernetes.io/zone
          labelSelector:
            matchLabels:
              apptype: server