
Runtime Even Pod Spreading #154

Closed
wants to merge 3 commits

Conversation


@krmayankk krmayankk commented May 23, 2019

Fixes #146

API based on KEP

The descheduler policy is essentially a set of TopologySpreadConstraints per namespace. It is described as follows:

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "TopologySpreadConstraint":
     enabled: true
     params:
        namespacedtopologyspreadconstraints:
         - namespace: sam-system
           topologyspreadconstraints:
            - maxSkew: 1
              topologyKey: failure-domain.beta.kubernetes.io/zone
              labelSelector:
                      matchLabels:
                              apptype: server
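For reference, a minimal Go sketch of the types this policy block implies; the field names mirror this PR's discussion (NamespacedTopologySpreadConstraint, MaxSkew, TopologyKey, LabelSelector) and are not the authoritative definitions, which live in the descheduler's api package.

// Sketch only: descheduler policy types implied by the YAML above.
package api

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// NamespacedTopologySpreadConstraint groups spread constraints by namespace.
type NamespacedTopologySpreadConstraint struct {
	Namespace                 string
	TopologySpreadConstraints []TopologySpreadConstraint
}

// TopologySpreadConstraint carries the fields this strategy reads from the policy.
type TopologySpreadConstraint struct {
	// MaxSkew is the maximum allowed difference in the number of matching
	// pods between any two topology domains.
	MaxSkew int32
	// TopologyKey is the node label that defines a topology domain,
	// e.g. failure-domain.beta.kubernetes.io/zone.
	TopologyKey string
	// LabelSelector selects the pods the constraint applies to.
	LabelSelector *metav1.LabelSelector
}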

@k8s-ci-robot
Contributor

Welcome @krmayankk!

It looks like this is your first PR to kubernetes-incubator/descheduler 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-incubator/descheduler has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 23, 2019
@krmayankk
Author

FYI @ravisantoshgudimetla @bsalamat @Huang-Wei @aveshagarwal this PR is to initiate API discussion

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 23, 2019
@krmayankk krmayankk force-pushed the evenpod branch 2 times, most recently from c94a2a8 to 85b0603 Compare May 29, 2019 00:25
@krmayankk
Author

/assign @bsalamat @ravisantoshgudimetla @Huang-Wei

@k8s-ci-robot
Contributor

@krmayankk: GitHub didn't allow me to assign the following users: Huang-Wei.

Note that only kubernetes-incubator members and repo collaborators can be assigned and that issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @bsalamat @ravisantoshgudimetla @Huang-Wei

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@@ -0,0 +1,150 @@
/*
Copyright 2017 The Kubernetes Authors.
Author

TODO: Add UT once we reach consensus on the api

Contributor

Change it to 2019..

// scheduling it onto zone1(zone2) would make the ActualSkew(2) violate MaxSkew(1)
// - if MaxSkew is 2, incoming pod can be scheduled to any zone.
// It's a required value. Default value is 1 and 0 is not allowed.
MaxSkew int32
Contributor

@Huang-Wei - I believe this is in line with what you're proposing...

Yes.

@krmayankk if the upstream API is available, I think you can simply vendor that?

Author

Currently the descheduler looks at TopologySpreadConstraints as defined in the DeschedulerPolicy. Later these constraints will come directly from the Pod itself, so there is nothing to vendor; only the source of the TopologySpreadConstraints will change. @Huang-Wei

return
}

fmt.Printf("Found following parameters for TopologySpreadConstraint %v\n", strategy)
Contributor

Please change to klog or glog..

constraint api.NamespacedTopologySpreadConstraint) {

if len(constraint.TopologySpreadConstraints) != 1 {
glog.V(1).Infof("We currently only support 1 topology spread constraint per namespace")
Contributor

I think we need to explicitly document this, along with the reason.

Agree. This is a significant limitation.

I lean toward delivering a full implementation which respects all constraints.

Author

@Huang-Wei so per namespace, do an AND of all constraints?

@krmayankk Yes.
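A sketch of what that AND could look like when filtering candidate pods. The helper name and the descheduler api import path are assumptions; the diff above currently inspects only TopologySpreadConstraints[0].

// Sketch only: a pod is considered for eviction only if it matches the label
// selector of every constraint declared for its namespace.
package strategies

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"

	// assumed import path for the descheduler policy types
	"github.com/kubernetes-incubator/descheduler/pkg/api"
)

func podMatchesAllConstraints(pod *v1.Pod, nc api.NamespacedTopologySpreadConstraint) (bool, error) {
	for _, c := range nc.TopologySpreadConstraints {
		selector, err := metav1.LabelSelectorAsSelector(c.LabelSelector)
		if err != nil {
			return false, err
		}
		if !selector.Matches(labels.Set(pod.Labels)) {
			return false, nil
		}
	}
	return true, nil
}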

// does this pod labels match the constraint label selector
// TODO: This is intentional that it only looks at the first constraint
selector, err := metav1.LabelSelectorAsSelector(constraint.TopologySpreadConstraints[0].LabelSelector)
if err != nil {
Contributor

log and continue..

continue
}
if !selector.Matches(labels.Set(pod.Labels)) {
continue
Contributor

Same as above: if you think you're going to log heavily, please increase the log level, or append all the failures and log them at the end.

if int32(podsInTopo-minPodsForGivenTopo) >= constraint.TopologySpreadConstraints[0].MaxSkew {
//we need to evict maxSkew-(podsInTopo-minPodsForGivenTopo))
countToEvict := constraint.TopologySpreadConstraints[0].MaxSkew - int32(podsInTopo-minPodsForGivenTopo)
podsListToEvict := GetPodsToEvict(countToEvict, v)
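Side note on the arithmetic: under the if-condition above, MaxSkew - (podsInTopo - minPodsForGivenTopo) is zero or negative, while the stated intent is to bring each domain back to within maxSkew of the smallest one, which suggests the excess over minPods + maxSkew instead (the same formula proposed later in this thread, needEvictNum = len(podSet) - minMatch - maxSkew). A small sketch, with parameter names mirroring the diff rather than taken from it:

// Sketch only: how many pods to evict from one topology domain so that its
// matching-pod count ends up no more than maxSkew above the smallest domain.
func countToEvictForDomain(podsInTopo, minPodsForGivenTopo int, maxSkew int32) int32 {
	excess := int32(podsInTopo-minPodsForGivenTopo) - maxSkew
	if excess < 0 {
		return 0
	}
	return excess
}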
Contributor

Why is this public method? Do you want to expose it for testing?

}

// GetPodFullName returns a name that uniquely identifies a pod.
func GetPodFullName(pod *v1.Pod) string {
Contributor

Where are we using this function?

if !selector.Matches(labels.Set(pod.Labels)) {
continue
}
// TODO: Need to determine if the topokey already present in the node or not
Contributor

I think we need to address this as part of the PR.

@@ -60,11 +60,13 @@ func Run(rs *options.DeschedulerServer) error {
return nil
}

glog.V(1).Infof("Reached here \n")

TODO: remove this upon merging.


type topologyPairSet map[topologyPair]struct{}

// finnd all nodes
@Huang-Wei Huang-Wei Jun 14, 2019

Please reword the comments, as well as the following ones.

}

fmt.Printf("Found following parameters for TopologySpreadConstraint %v\n", strategy)
for _, topoConstraints := range strategy.Params.NamespacedTopologySpreadConstraints {
@Huang-Wei Huang-Wei Jun 14, 2019

The topoConstraints are sort of "aggregated" topologySpreadConstraints grouped by namespace? If so, I suggest renaming NamespacedTopologySpreadConstraints to AggTopologySpreadConstraintsByNs.

Author

I have described the DeschedulerPolicy in the description; here is what it looks like. Basically this loop is just reading the per-namespace constraints specified in the policy. Maybe TopologySpreadConstraintsPerNamespace?

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "TopologySpreadConstraint":
    enabled: true
    params:
      namespacedtopologyspreadconstraints:
        - namespace: sam-system
          topologyspreadconstraints:
            - maxSkew: 1
              topologyKey: failure-domain.beta.kubernetes.io/zone
              labelSelector:
                matchLabels:
                  apptype: server

@Huang-Wei

Please remove pkg/descheduler/strategies/.pod_antiaffinity.go.swp.

topologyPairToPods := make(map[topologyPair]podSet)
for _, node := range nodes {
glog.V(1).Infof("Processing node: %#v\n", node.Name)
pods, err := podutil.ListEvictablePodsOnNode(client, node, false)

I believe clientset is able to filter pods by namespace. We can pass in namespace hence don't need to check if pod.Namespace != constraint.Namespace later.
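For illustration, a namespace-scoped list with a label selector could look roughly like this; the function name is a sketch, and the pre-1.18 client-go List signature is shown (newer client-go versions take a context.Context as the first argument):

// Sketch only: fetch candidate pods once per namespace, letting the API server
// filter by namespace and label selector, instead of listing per node and
// filtering afterwards.
package strategies

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	clientset "k8s.io/client-go/kubernetes"
)

func listPodsForNamespace(client clientset.Interface, namespace string, selector labels.Selector) (*v1.PodList, error) {
	return client.CoreV1().Pods(namespace).List(metav1.ListOptions{
		LabelSelector: selector.String(),
	})
}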

Listing pods for each node will cause N API requests (N=len(nodes)). It's better to change the logic to:

for each topologySpreadConstraint
    pre-fetch all qualified pods globally (or use similar cache in descheduler if appropriate)
    process the pods list:
	(1) filter out the pods whose node doesn't have needed topologyKey
	(2) so that we can get a map[nodeName]podSet, and also we know the minimum match number
	for nodeName, podSet := range map[nodeName]podSet
		needEvictNum := len(podSet) - minMatch - maxSkew
		if needEvictNum > 0
			evict needEvictNum pods from this Node
			Note: this math is sort of brute force, we can come up with better math later.
			For example, for a 5/1/3 cluster, and maxSkew is 1; the brute force math above will
			evict 3/0/1 pods from each topology, but if we consider the math "dynamically", we
			should only evict 2/0/0 pods, so that eventually it can become 3/3/3.
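A rough Go sketch of the brute-force version of that flow; the helper name is illustrative, and the smarter "dynamic" math mentioned at the end (2/0/0 instead of 3/0/1 for the 5/1/3 example) is not attempted here:

// Sketch only: given the pods matching one constraint, grouped by the value of
// the topology key on their node, choose pods to evict so that every domain
// ends up within maxSkew of the smallest domain.
package strategies

import v1 "k8s.io/api/core/v1"

func podsToEvictPerDomain(maxSkew int32, podsByDomain map[string][]*v1.Pod) map[string][]*v1.Pod {
	if len(podsByDomain) == 0 {
		return nil
	}
	// Minimum number of matching pods in any topology domain.
	minMatch := -1
	for _, pods := range podsByDomain {
		if minMatch < 0 || len(pods) < minMatch {
			minMatch = len(pods)
		}
	}
	// Evict the excess over (minMatch + maxSkew) from each domain.
	toEvict := make(map[string][]*v1.Pod)
	for domain, pods := range podsByDomain {
		if n := len(pods) - minMatch - int(maxSkew); n > 0 {
			toEvict[domain] = pods[:n]
		}
	}
	return toEvict
}

Which pods within a domain get picked (pods[:n] here) is arbitrary in this sketch, much like GetPodsToEvict in the diff; a real implementation would likely prefer lower-priority or best-effort pods first.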

for _, node := range nodes {
glog.V(1).Infof("Processing node: %#v\n", node.Name)
pods, err := podutil.ListEvictablePodsOnNode(client, node, false)
if err != nil {

Incomplete? Should log and continue.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign ravisantoshgudimetla
You can assign the PR to them by writing /assign @ravisantoshgudimetla in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@krmayankk krmayankk changed the title from "Runtime Even Pod Spreadig: Api Discussion" to "Runtime Even Pod Spreading" Jul 17, 2019
@Huang-Wei

/assign

@Huang-Wei

@krmayankk I'm not sure I like the API design in policy "namespacedtopologyspreadconstraints".

I checked the existing descheduler policy examples, and all of them offer pretty simple parameters to evict pods violating that kind of policy.

IMO we should offer a neat policy for EvenPodsSpread, i.e. just enable or disable it, and probably additionally provide an option like AntiAffinity in case the upstream feature supports more v1.UnsatisfiableConstraintAction options:

https://github.com/kubernetes-incubator/descheduler/blob/9e28f0b362ea5afa6ef4ec15f95cd5fc7eaf108a/examples/node-affinity.yml#L4-L8

With the current API, users must explicitly specify every topologySpreadConstraint, which isn't practical.

So I want to stop reviewing here to get a consensus on the API first.

cc @ravisantoshgudimetla on the API design.

@bsalamat bsalamat left a comment

I only looked at the API. I think it is generally fine. As discussed yesterday in the SIG meeting, we should keep the API as an alpha version and wait for feedback from our users before committing to backward compatibility and longer term support.

@@ -48,6 +48,9 @@ type DeschedulerStrategy struct {
type StrategyParameters struct {
NodeResourceUtilizationThresholds NodeResourceUtilizationThresholds
NodeAffinityType []string
// TopologySpreadConstraints describes how a group of pods should be spread across topology
// domains. Descheduler will use these constraints to decide which pods to evict.
NamespacedTopologySpreadConstraints []NamespacedTopologySpreadConstraint

Isn't this name unnecessarily long? It feels a bit like adding part of its documentation to the name.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Dec 11, 2019
@k8s-ci-robot
Contributor

@krmayankk: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 10, 2020
@rhockenbury

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 24, 2020
@seanmalloy
Member

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 8, 2020
@seanmalloy
Member

@krmayankk are you planning to continue working on this pull request?

It would be great if you could rebase and resolve the merge conflicts. I believe the even pod spreading feature in the scheduler is being promoted to beta in k8s v1.18, so this will be a very useful feature once k8s v1.18 is released.

@Huang-Wei

@seanmalloy Correct, PodTopologySpread (feature gate EvenPodsSpread) is going to be beta in 1.18. So it makes great sense to vendor the 1.18 k/k codebase when implementing this PR.

@seanmalloy
Member

@krmayankk the master branch has been updated with the k/k v1.18 vendor dependencies. Are you planning on continuing to work on this pull request?

I'm willing to continue working on this feature and use your original commits as a starting point if you do not have time to complete this work.

Thanks!

@seanmalloy
Member

/close

I started working on the updated code based on this PR. The new branch is here: https://github.com/KohlsTechnology/descheduler/tree/evenpod. I'm hoping to submit a new PR in the next few weeks. See also my comment regarding possible API changes #146 (comment).

@k8s-ci-robot
Contributor

@seanmalloy: Closed this PR.

In response to this:

/close

I started working on the updated code based on this PR. The new branch is here: https://github.com/KohlsTechnology/descheduler/tree/evenpod. I'm hoping to submit a new PR in the next few weeks. See also my comment regarding possible API changes #146 (comment).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
kind/feature Categorizes issue or PR as related to a new feature.
needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Policy for Balancing Pods across topology domains
8 participants