
[MCC][MCD]: Introduce in progress taint #2686

Merged

Conversation

Contributor

@ravisantoshgudimetla ravisantoshgudimetla commented Jul 20, 2021

- What I did
All upgrade-candidate nodes in an MCP will have the
`UpdateInProgress: PreferNoSchedule` taint applied.
The taint will be removed by the MCD once the upgrade is complete.
Since kubernetes/kubernetes#104251 landed,
nodes without a PreferNoSchedule taint get a higher score.
Before the upgrade starts, the MCC will taint all the nodes in the
cluster that are due to be upgraded. Once the upgrade is
complete, the MCD removes the taint, so no node carries the
`UpdateInProgress: PreferNoSchedule` taint. This ensures
the scores of the nodes are equal again. (A minimal sketch of the
taint follows this description.)

Why is this needed?
This reduces pod churn while a cluster upgrade is in progress.
When the not-yet-upgraded nodes in the cluster carry the
`UpdateInProgress: PreferNoSchedule` taint, they get a lower
score, so pods prefer to land on untainted (upgraded) nodes,
reducing the chance of landing on a not-yet-upgraded node,
which would cause one more reschedule.
- How to verify it

- Description for the changelog
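For illustration, here is a minimal Go sketch of what such a taint could look like. The constant and variable names are hypothetical; the PR defines its own in the diff below.

```go
package constants

import corev1 "k8s.io/api/core/v1"

// UpdateInProgressTaintKey is a hypothetical name; the PR defines its own
// constant for the taint key.
const UpdateInProgressTaintKey = "UpdateInProgress"

// NodeUpdateInProgressTaint is applied by the MCC before a node's update
// starts and removed by the MCD once the update completes. PreferNoSchedule
// tells the scheduler to avoid the node without forbidding placement.
var NodeUpdateInProgressTaint = &corev1.Taint{
	Key:    UpdateInProgressTaintKey,
	Effect: corev1.TaintEffectPreferNoSchedule,
}
```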

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Jul 20, 2021
@@ -973,6 +988,47 @@ func (ctrl *Controller) updateCandidateMachines(pool *mcfgv1.MachineConfigPool,
return nil
}

// setUpdateInPrgressTaint applies in progress taint to all the nodes that are to be updated.
Member

nit: setUpdateInPrgressTaint -> setUpdateInProgressTaint
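For context, the function under review applies the in-progress taint to every update candidate. A self-contained sketch of that operation, written as a free function with assumed names (the PR's actual version is a method on the node controller):

```go
package taintutil

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// applyTaintToCandidates adds the given taint to each candidate node that
// does not already carry it.
func applyTaintToCandidates(client kubernetes.Interface, candidates []*corev1.Node, taint *corev1.Taint) error {
	for _, cand := range candidates {
		node, err := client.CoreV1().Nodes().Get(context.TODO(), cand.Name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		tainted := false
		for i := range node.Spec.Taints {
			if node.Spec.Taints[i].MatchTaint(taint) {
				tainted = true
				break
			}
		}
		if tainted {
			continue // nothing to do, taint already present
		}
		node.Spec.Taints = append(node.Spec.Taints, *taint)
		if _, err := client.CoreV1().Nodes().Update(context.TODO(), node, metav1.UpdateOptions{}); err != nil {
			return err
		}
	}
	return nil
}
```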

@kikisdeliveryservice
Contributor

/assign @kikisdeliveryservice

Contributor

@kikisdeliveryservice kikisdeliveryservice left a comment

quick pass, generally makes sense. will come back tomorrow to finish looking.

@@ -963,6 +974,10 @@ func (ctrl *Controller) updateCandidateMachines(pool *mcfgv1.MachineConfigPool,
if err := ctrl.setDesiredMachineConfigAnnotation(node.Name, targetConfig); err != nil {
return goerrs.Wrapf(err, "setting desired config for node %s", node.Name)
}
ctrl.logPool(pool, "Apply UpdateInProgress taint %s target to %s", node.Name, targetConfig)
Contributor

Suggested change
ctrl.logPool(pool, "Apply UpdateInProgress taint %s target to %s", node.Name, targetConfig)
ctrl.logPool(pool, "Applying UpdateInProgress taint to %s", node.Name)

dn.logSystem("Update completed for config %s and node has been successfully uncordoned", desiredConfigName)
dn.recorder.Eventf(getNodeRef(dn.node), corev1.EventTypeNormal, "Uncordon", fmt.Sprintf("Update completed for config %s and node has been uncordoned", desiredConfigName))

return nil
}

func (dn *Daemon) removeUpdateInProgressTaint() error {
// TODO: Move these 2 vars to common location to be shared with
Contributor

nice <3
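And a sketch of the removal side that the MCD performs once the update completes. The names here are assumptions; the PR's actual removeUpdateInProgressTaint is a Daemon method, and it retries with a backoff like the vars quoted a little further down.

```go
package taintutil

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// removeNodeTaint fetches the node, strips the first taint matching the
// given key and effect, and writes the node back.
func removeNodeTaint(client kubernetes.Interface, nodeName string, taint *corev1.Taint) error {
	node, err := client.CoreV1().Nodes().Get(context.TODO(), nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	for i := range node.Spec.Taints {
		if node.Spec.Taints[i].MatchTaint(taint) {
			// Index-based removal, matching the approach used in the PR's diff.
			node.Spec.Taints = append(node.Spec.Taints[:i], node.Spec.Taints[i+1:]...)
			break
		}
	}
	_, err = client.CoreV1().Nodes().Update(context.TODO(), node, metav1.UpdateOptions{})
	return err
}
```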

@kikisdeliveryservice
Contributor

Just for ref:

// Like TaintEffectNoSchedule, but the scheduler tries not to schedule
// new pods onto the node, rather than prohibiting new pods from scheduling
// onto the node entirely. Enforced by the scheduler.
TaintEffectPreferNoSchedule TaintEffect = "PreferNoSchedule"

https://github.com/kubernetes/kubernetes/blob/f0b7ad3ee06c5168fef5fa4f01fe445ece595f89/pkg/apis/core/types.go#L2684

Duration: 100 * time.Millisecond,
Jitter: 1.0,
}
// NodeUpdateTaint is a taint applied by MCC when the update of node starts.
Contributor

nit: when you consolidate these vars elsewhere, maybe add the "why" of this to the comment?

Contributor Author

Sure. Do you have a location where I can share code across MCO and MCC?

@sinnykumari
Contributor

Once this is ready, can we please add a better description of why we are adding this taint in the MCO? It will also be nice to have details in the commit message.

Also, since feature freeze is on Friday: from the Slack conversation, I believe we leaned towards implementing this in 4.10.

Contributor Author

@ravisantoshgudimetla ravisantoshgudimetla left a comment

Thank you for the reviews @sinnykumari @kikisdeliveryservice @wking

Once this is ready, can we please add a better description of why we are adding this taint in the MCO? It will also be nice to have details in the commit message.

Of course. I opened this to see whether I am headed in the right direction or not.

Also, since feature freeze is on Friday: from the Slack conversation, I believe we leaned towards implementing this in 4.10.

I am fine with a 4.10 implementation, but I am wondering: is this too invasive a change?

@kikisdeliveryservice
Contributor

/test verify

@@ -8,6 +8,10 @@ import (
"fmt"
"io"
"io/ioutil"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
Contributor

failing verify here:

pkg/daemon/daemon.go:11: File is not goimports-ed (goimports)

Jitter: 1.0,
}
// NodeUpdateTaint is a taint applied by MCC when the update of node starts.
NodeUpdateTaint = &corev1.Taint{

Nit: NodeUpdateInProgressTaint?

}
}

newNode.Spec.Taints = append(newNode.Spec.Taints[:taintIndex], newNode.Spec.Taints[taintIndex+1:]...)

Alternatively, we can just iterate over the taints, leaving out the one we want to remove, and then assign the result. The complexity is similar and it is a bit more expensive memory-wise, but easier to read, imo.
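A minimal sketch of that alternative, assuming `newNode` is in scope as in the diff above and `taintToRemove` is a placeholder name for the update-in-progress taint:

```go
// Copy every taint except the one being removed, then assign the result.
kept := make([]corev1.Taint, 0, len(newNode.Spec.Taints))
for _, t := range newNode.Spec.Taints {
	if t.MatchTaint(taintToRemove) {
		continue // leave out the taint we want removed
	}
	kept = append(kept, t)
}
newNode.Spec.Taints = kept
```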

@sinnykumari
Contributor

I am fine with a 4.10 implementation, but I am wondering: is this too invasive a change?

I understand this change is not very invasive. The main hesitation is that unplanned changes get added to the repo without full context, and the time crunch makes things worse. With limited team capacity, other planned work, and unplanned bugs, it becomes a bit difficult. Having some buffer time allows us to think and review code better for future maintenance.

Also, whatever code we write, we all do our best to test and review it and think it shouldn't break existing features, but we all know what happens in reality ;)

@kikisdeliveryservice
Contributor

I am fine with a 4.10 implementation, but I am wondering: is this too invasive a change?

I understand this change is not very invasive. The main hesitation is that unplanned changes get added to the repo without full context, and the time crunch makes things worse. With limited team capacity, other planned work, and unplanned bugs, it becomes a bit difficult. Having some buffer time allows us to think and review code better for future maintenance.

Also, whatever code we write, we all do our best to test and review it and think it shouldn't break existing features, but we all know what happens in reality ;)

Agree with Sinny, the next release is the better time frame. FTR, I don't think this is a PR quality issue; it's more of a review/testing/discussion issue, and I don't see a reason we can't work together to get this in next time (on the earlier side) pending those convos and testing/review. :smile:

@ravisantoshgudimetla
Contributor Author

/test e2e-gcp-upgrade

1 similar comment
@ravisantoshgudimetla
Contributor Author

/test e2e-gcp-upgrade

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale label (denotes an issue or PR that has remained open with no activity and has become stale) on Oct 28, 2021
@sinnykumari
Contributor

@ravisantoshgudimetla This PR hasn't gotten any recent attention. Are you still working in this direction? If yes, we can revisit and discuss as needed.

@ravisantoshgudimetla
Contributor Author

kubernetes/kubernetes#104251 merged upstream; once openshift/kubernetes#1076 lands, we should be able to use the beta3 version, which has increased weights.
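To make the scoring effect concrete, here is a small sketch (not the scheduler's actual code) of the counting idea behind the TaintToleration score plugin: every PreferNoSchedule taint that a pod does not tolerate lowers the node's score, so untainted (already upgraded) nodes rank higher while the upgrade is in flight.

```go
package scoring

import corev1 "k8s.io/api/core/v1"

// countIntolerablePreferNoSchedule returns how many PreferNoSchedule taints
// on a node are not covered by the pod's tolerations; the scheduler gives
// nodes with fewer such taints a higher score.
func countIntolerablePreferNoSchedule(taints []corev1.Taint, tolerations []corev1.Toleration) int {
	count := 0
	for i := range taints {
		if taints[i].Effect != corev1.TaintEffectPreferNoSchedule {
			continue
		}
		tolerated := false
		for j := range tolerations {
			if tolerations[j].ToleratesTaint(&taints[i]) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			count++
		}
	}
	return count
}
```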

@kikisdeliveryservice kikisdeliveryservice removed the lifecycle/stale label on Dec 6, 2021
@kikisdeliveryservice
Contributor

/retest

@kikisdeliveryservice
Contributor

kikisdeliveryservice commented Dec 6, 2021

As per call, we'd also like an OTA approver on this PR before merging.

/assign @wking

@sinnykumari
Contributor

Thanks for updating the commit message and description.

All upgrade-candidate nodes in an MCP will have the
`UpdateInProgress: PreferNoSchedule` taint applied.
The taint will be removed by the MCD once the upgrade is complete.
Since kubernetes/kubernetes#104251 landed,
nodes without a PreferNoSchedule taint get a higher score.
Before the upgrade starts, the MCC will taint all the nodes in the
cluster that are due to be upgraded. Once the upgrade is
complete, the MCD removes the taint, so no node carries the
`UpdateInProgress: PreferNoSchedule` taint. This ensures
the scores of the nodes are equal again.

Why is this needed?
This reduces pod churn while a cluster upgrade is in progress.
When the not-yet-upgraded nodes in the cluster carry the
`UpdateInProgress: PreferNoSchedule` taint, they get a lower
score, so pods prefer to land on untainted (upgraded) nodes,
reducing the chance of landing on a not-yet-upgraded node,
which would cause one more reschedule.
Contributor

@sinnykumari sinnykumari left a comment

Overall LGTM. I don't see any major review comments left from others that would change the behavior of this feature. Let's get this merged; any minor concerns can be fixed in a follow-up PR.

Also, with Trevor's approval we should be good to go from the upgrade team's side.

/lgtm

@openshift-ci openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) on Dec 10, 2021
@openshift-ci
Contributor

openshift-ci bot commented Dec 10, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ravisantoshgudimetla, sinnykumari, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Dec 10, 2021
@ravisantoshgudimetla
Contributor Author

/retest

@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

15 similar comments

@openshift-ci
Contributor

openshift-ci bot commented Dec 11, 2021

@ravisantoshgudimetla: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-ovn-step-registry | 2976dc6 | link | false | /test e2e-ovn-step-registry |
| ci/prow/e2e-aws-upgrade-single-node | 2976dc6 | link | false | /test e2e-aws-upgrade-single-node |
| ci/prow/e2e-aws-disruptive | 2976dc6 | link | false | /test e2e-aws-disruptive |
| ci/prow/e2e-gcp-op-single-node | 2976dc6 | link | false | /test e2e-gcp-op-single-node |
| ci/prow/e2e-aws-single-node | 2976dc6 | link | false | /test e2e-aws-single-node |
| ci/prow/e2e-metal-ipi | 2976dc6 | link | false | /test e2e-metal-ipi |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@@ -3,6 +3,7 @@ package node
import (
"encoding/json"
"fmt"
"github.com/openshift/machine-config-operator/pkg/constants"
Contributor Author

Oops, the import ordering needs to be sorted out.
