
[MCC][MCD]: Introduce in progress taint #2686

Merged

Conversation

Contributor

@ravisantoshgudimetla ravisantoshgudimetla commented Jul 20, 2021

- What I did
All upgrade-candidate nodes in an MCP will have the
`UpdateInProgress: PreferNoSchedule` taint applied.
The taint will be removed by the MCD once the upgrade is complete.
Since kubernetes/kubernetes#104251 landed,
nodes without a PreferNoSchedule taint get a higher score.
Before the upgrade starts, the MCC will taint all the nodes in the
cluster that are due to be upgraded. Once the upgrade is
complete, the MCD removes the taint, so no node carries the
`UpdateInProgress: PreferNoSchedule` taint. This ensures
the scores of the nodes are equal again. (A minimal sketch of the
taint follows this description.)

Why is this needed?
This reduces pod churn while a cluster upgrade is in progress.
When the not-yet-upgraded nodes in the cluster carry the
`UpdateInProgress: PreferNoSchedule` taint, they get a lower
score, so pods prefer to land on untainted (upgraded) nodes,
reducing the chance of landing on a not-yet-upgraded node,
which would cause one more reschedule.
- How to verify it

- Description for the changelog
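For illustration, here is a minimal Go sketch of what such a taint could look like. The constant and variable names are hypothetical; the PR defines its own in the diff below.

```go
package constants

import corev1 "k8s.io/api/core/v1"

// UpdateInProgressTaintKey is a hypothetical name; the PR defines its own
// constant for the taint key.
const UpdateInProgressTaintKey = "UpdateInProgress"

// NodeUpdateInProgressTaint is applied by the MCC before a node's update
// starts and removed by the MCD once the update completes. PreferNoSchedule
// tells the scheduler to avoid the node without forbidding placement.
var NodeUpdateInProgressTaint = &corev1.Taint{
	Key:    UpdateInProgressTaintKey,
	Effect: corev1.TaintEffectPreferNoSchedule,
}
```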

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Jul 20, 2021
@@ -973,6 +988,47 @@ func (ctrl *Controller) updateCandidateMachines(pool *mcfgv1.MachineConfigPool,
return nil
}

// setUpdateInPrgressTaint applies in progress taint to all the nodes that are to be updated.
Member

nit: setUpdateInPrgressTaint -> setUpdateInProgressTaint
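For context, the function under review applies the in-progress taint to every update candidate. A self-contained sketch of that operation, written as a free function with assumed names (the PR's actual version is a method on the node controller):

```go
package taintutil

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// applyTaintToCandidates adds the given taint to each candidate node that
// does not already carry it.
func applyTaintToCandidates(client kubernetes.Interface, candidates []*corev1.Node, taint *corev1.Taint) error {
	for _, cand := range candidates {
		node, err := client.CoreV1().Nodes().Get(context.TODO(), cand.Name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		tainted := false
		for i := range node.Spec.Taints {
			if node.Spec.Taints[i].MatchTaint(taint) {
				tainted = true
				break
			}
		}
		if tainted {
			continue // nothing to do, taint already present
		}
		node.Spec.Taints = append(node.Spec.Taints, *taint)
		if _, err := client.CoreV1().Nodes().Update(context.TODO(), node, metav1.UpdateOptions{}); err != nil {
			return err
		}
	}
	return nil
}
```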

@kikisdeliveryservice
Contributor

/assign @kikisdeliveryservice

Contributor

@kikisdeliveryservice kikisdeliveryservice left a comment

quick pass, generally makes sense. will come back tomorrow to finish looking.

@@ -963,6 +974,10 @@ func (ctrl *Controller) updateCandidateMachines(pool *mcfgv1.MachineConfigPool,
if err := ctrl.setDesiredMachineConfigAnnotation(node.Name, targetConfig); err != nil {
return goerrs.Wrapf(err, "setting desired config for node %s", node.Name)
}
ctrl.logPool(pool, "Apply UpdateInProgress taint %s target to %s", node.Name, targetConfig)
Contributor

Suggested change
ctrl.logPool(pool, "Apply UpdateInProgress taint %s target to %s", node.Name, targetConfig)
ctrl.logPool(pool, "Applying UpdateInProgress taint to %s", node.Name)

dn.logSystem("Update completed for config %s and node has been successfully uncordoned", desiredConfigName)
dn.recorder.Eventf(getNodeRef(dn.node), corev1.EventTypeNormal, "Uncordon", fmt.Sprintf("Update completed for config %s and node has been uncordoned", desiredConfigName))

return nil
}

func (dn *Daemon) removeUpdateInProgressTaint() error {
// TODO: Move these 2 vars to common location to be shared with
Contributor

nice <3
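And a sketch of the removal side that the MCD performs once the update completes. The names here are assumptions; the PR's actual removeUpdateInProgressTaint is a Daemon method, and it retries with a backoff like the vars quoted a little further down.

```go
package taintutil

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// removeNodeTaint fetches the node, strips the first taint matching the
// given key and effect, and writes the node back.
func removeNodeTaint(client kubernetes.Interface, nodeName string, taint *corev1.Taint) error {
	node, err := client.CoreV1().Nodes().Get(context.TODO(), nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	for i := range node.Spec.Taints {
		if node.Spec.Taints[i].MatchTaint(taint) {
			// Index-based removal, matching the approach used in the PR's diff.
			node.Spec.Taints = append(node.Spec.Taints[:i], node.Spec.Taints[i+1:]...)
			break
		}
	}
	_, err = client.CoreV1().Nodes().Update(context.TODO(), node, metav1.UpdateOptions{})
	return err
}
```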

@kikisdeliveryservice
Contributor

Just for ref:

// Like TaintEffectNoSchedule, but the scheduler tries not to schedule
// new pods onto the node, rather than prohibiting new pods from scheduling
// onto the node entirely. Enforced by the scheduler.
TaintEffectPreferNoSchedule TaintEffect = "PreferNoSchedule"

https://github.com/kubernetes/kubernetes/blob/f0b7ad3ee06c5168fef5fa4f01fe445ece595f89/pkg/apis/core/types.go#L2684

Duration: 100 * time.Millisecond,
Jitter: 1.0,
}
// NodeUpdateTaint is a taint applied by MCC when the update of node starts.
Contributor

nit: when you consolidate these vars elsewhere, maybe add the "why" of this to the comment?

Contributor Author

Sure. Do you have a location where I can share code across MCO and MCC?

@sinnykumari
Contributor

Once this is ready, can we please add a better description of why we are adding this taint in the MCO? It will also be nice to have details in the commit message.

Also, since feature freeze is on Friday: from the Slack conversation, I believe we leaned towards implementing this in 4.10.

Contributor Author

@ravisantoshgudimetla ravisantoshgudimetla left a comment

Thank you for the reviews @sinnykumari @kikisdeliveryservice @wking

Once this is ready, can we please add a better description of why we are adding this taint in the MCO? It will also be nice to have details in the commit message.

Of course. I opened this to see whether I am headed in the right direction or not.

Also, since feature freeze is on Friday: from the Slack conversation, I believe we leaned towards implementing this in 4.10.

I am fine with a 4.10 implementation, but I am wondering: is this too invasive a change?

@kikisdeliveryservice
Contributor

/test verify

@@ -8,6 +8,10 @@ import (
"fmt"
"io"
"io/ioutil"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
Contributor

failing verify here:

pkg/daemon/daemon.go:11: File is not goimports-ed (goimports)

Jitter: 1.0,
}
// NodeUpdateTaint is a taint applied by MCC when the update of node starts.
NodeUpdateTaint = &corev1.Taint{

Nit: NodeUpdateInProgressTaint?

}
}

newNode.Spec.Taints = append(newNode.Spec.Taints[:taintIndex], newNode.Spec.Taints[taintIndex+1:]...)

Alternatively, we can just iterate over the taints, leaving out the one we want to remove, and then assign the result. The complexity is similar and it is a bit more expensive memory-wise, but easier to read, imo.
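A minimal sketch of that alternative, assuming `newNode` is in scope as in the diff above and `taintToRemove` is a placeholder name for the update-in-progress taint:

```go
// Copy every taint except the one being removed, then assign the result.
kept := make([]corev1.Taint, 0, len(newNode.Spec.Taints))
for _, t := range newNode.Spec.Taints {
	if t.MatchTaint(taintToRemove) {
		continue // leave out the taint we want removed
	}
	kept = append(kept, t)
}
newNode.Spec.Taints = kept
```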

@sinnykumari
Contributor

I am fine with a 4.10 implementation, but I am wondering: is this too invasive a change?

I understand this change is not very invasive. The main hesitation is that unplanned changes get added to the repo without full context, and the time crunch makes things worse. With limited team capacity, other planned work, and unplanned bugs, it becomes a bit difficult. Having some buffer time allows us to think and review code better for future maintenance.

Also, whatever code we write, we all do our best to test and review it and think it shouldn't break existing features, but we all know what happens in reality ;)

@kikisdeliveryservice
Contributor

I am fine with a 4.10 implementation, but I am wondering: is this too invasive a change?

I understand this change is not very invasive. The main hesitation is that unplanned changes get added to the repo without full context, and the time crunch makes things worse. With limited team capacity, other planned work, and unplanned bugs, it becomes a bit difficult. Having some buffer time allows us to think and review code better for future maintenance.

Also, whatever code we write, we all do our best to test and review it and think it shouldn't break existing features, but we all know what happens in reality ;)

Agree with Sinny, the next release is the better time frame. FTR, I don't think this is a PR quality issue; it's more of a review/testing/discussion issue, and I don't see a reason we can't work together to get this in next time (on the earlier side) pending those convos and testing/review. :smile:

@ravisantoshgudimetla
Contributor Author

/test e2e-gcp-upgrade

1 similar comment
@ravisantoshgudimetla
Contributor Author

/test e2e-gcp-upgrade

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale label (denotes an issue or PR that has remained open with no activity and has become stale) on Oct 28, 2021
@sinnykumari
Contributor

@ravisantoshgudimetla This PR hasn't gotten any recent attention. Are you still working in this direction? If yes, we can revisit and discuss as needed.

@ravisantoshgudimetla
Contributor Author

kubernetes/kubernetes#104251 merged upstream; once openshift/kubernetes#1076 lands, we should be able to use the beta3 version, which has increased weights.
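To make the scoring effect concrete, here is a small sketch (not the scheduler's actual code) of the counting idea behind the TaintToleration score plugin: every PreferNoSchedule taint that a pod does not tolerate lowers the node's score, so untainted (already upgraded) nodes rank higher while the upgrade is in flight.

```go
package scoring

import corev1 "k8s.io/api/core/v1"

// countIntolerablePreferNoSchedule returns how many PreferNoSchedule taints
// on a node are not covered by the pod's tolerations; the scheduler gives
// nodes with fewer such taints a higher score.
func countIntolerablePreferNoSchedule(taints []corev1.Taint, tolerations []corev1.Toleration) int {
	count := 0
	for i := range taints {
		if taints[i].Effect != corev1.TaintEffectPreferNoSchedule {
			continue
		}
		tolerated := false
		for j := range tolerations {
			if tolerations[j].ToleratesTaint(&taints[i]) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			count++
		}
	}
	return count
}
```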

@kikisdeliveryservice kikisdeliveryservice removed the lifecycle/stale label on Dec 6, 2021
@kikisdeliveryservice
Contributor

/retest

@kikisdeliveryservice
Contributor

kikisdeliveryservice commented Dec 6, 2021

As per call, we'd also like an OTA approver on this PR before merging.

/assign @wking

@sinnykumari
Contributor

Thanks for updating the commit message and description.

All upgrade-candidate nodes in an MCP will have the
`UpdateInProgress: PreferNoSchedule` taint applied.
The taint will be removed by the MCD once the upgrade is complete.
Since kubernetes/kubernetes#104251 landed,
nodes without a PreferNoSchedule taint get a higher score.
Before the upgrade starts, the MCC will taint all the nodes in the
cluster that are due to be upgraded. Once the upgrade is
complete, the MCD removes the taint, so no node carries the
`UpdateInProgress: PreferNoSchedule` taint. This ensures
the scores of the nodes are equal again.

Why is this needed?
This reduces pod churn while a cluster upgrade is in progress.
When the not-yet-upgraded nodes in the cluster carry the
`UpdateInProgress: PreferNoSchedule` taint, they get a lower
score, so pods prefer to land on untainted (upgraded) nodes,
reducing the chance of landing on a not-yet-upgraded node,
which would cause one more reschedule.
Contributor

@sinnykumari sinnykumari left a comment

Overall LGTM. I don't see any major review comments left from others that would change the behavior of this feature. Let's get this merged; any minor concerns can be fixed in a follow-up PR.

Also, with Trevor's approval we should be good to go from the upgrade team's side.

/lgtm

@openshift-ci openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) on Dec 10, 2021
@openshift-ci
Contributor

openshift-ci bot commented Dec 10, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ravisantoshgudimetla, sinnykumari, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Dec 10, 2021
@ravisantoshgudimetla
Contributor Author

/retest

@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

15 similar comments

@openshift-ci
Contributor

openshift-ci bot commented Dec 11, 2021

@ravisantoshgudimetla: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-ovn-step-registry | 2976dc6 | link | false | /test e2e-ovn-step-registry |
| ci/prow/e2e-aws-upgrade-single-node | 2976dc6 | link | false | /test e2e-aws-upgrade-single-node |
| ci/prow/e2e-aws-disruptive | 2976dc6 | link | false | /test e2e-aws-disruptive |
| ci/prow/e2e-gcp-op-single-node | 2976dc6 | link | false | /test e2e-gcp-op-single-node |
| ci/prow/e2e-aws-single-node | 2976dc6 | link | false | /test e2e-aws-single-node |
| ci/prow/e2e-metal-ipi | 2976dc6 | link | false | /test e2e-metal-ipi |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@@ -3,6 +3,7 @@ package node
import (
"encoding/json"
"fmt"
"github.com/openshift/machine-config-operator/pkg/constants"
Contributor Author

Oops, the import ordering needs to be sorted out.
