-
Notifications
You must be signed in to change notification settings - Fork 6.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate node-role.kubernetes.io/master to node-role.kubernetes.io/con… #10464
Migrate node-role.kubernetes.io/master to node-role.kubernetes.io/con… #10464
Conversation
Hi @unai-ttxu. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/ok-to-test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@unai-ttxu Thank you 👍
That could be a breaking change for apps/deployment that follows the old taint. We need to make sure to add this to the release notes. |
Since we are removing the old taint and keeping the new one it shouldn't break any apps/deployments. Now, when both taints are set, apps/deployments should have both tolerations configured: tolerations:
- effect: NoSchedule
key: node-role.kubernetes.io/master
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane So, this change will not break these apps/deployments. Of couse we will need to clean this code and remove the toleration for the |
Hi @unai-ttxu, thanks for the cleanup, much appreciated! Could you also update this in the rest of kubespray as well? Like those places:
|
Sure I could, but I don't know if we should. If we decide to remove these tolerations, those components won't be able to deploy if Since What do you think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure I could, but I don't know if we should. If we decide to remove these tolerations, those components won't be able to deploy if
node-role.kubernetes.io/master
taint is configured.
I thought this won't be a problem with current code you added to add the taint but actually I have a few comment on this.
Since
Minimum required version of Kubernetes is v1.26
, I think this won't cause any problem, but maybe as @mzaian said, in case of making this changes we shouldadd this to the release notes
.
In any case it would be present but we should probably make it very visible yeah
@@ -11,6 +11,15 @@ | |||
--timeout={{ upgrade_post_cilium_wait_timeout }} | |||
delegate_to: "{{ groups['kube_control_plane'][0] }}" | |||
|
|||
- name: Ensure control-plane taints after upgrade |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code has two problems, you taint every node even kube_node
which is very wrong. And even for control-plane nodes the user could want to not taint it. I think the correct behavior should be:
- Adding the new taint to every nodes in kube_control_plane that does have the taint master already
- removing the taint master from kube_control_plane nodes
And doing it somewhat early would be a plus as well, so that we can upgrade the toleration of the app we install with the new manifest directly. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This task is skipped for every kube_node
, notice the not
in the second conditional. So this task won't taint every kube_node
, it just taint all kube_control_plane
except when a node is both kube_control_plane
and kube_node
. In this case the taint should not be set in order to allow the node to run all types of loads, since it's both a worker
and a control-plane
in this case.
If a node is just a kube_control_plane
the user should not have the choice to not taint it, since it's the default configuration for k8s control-planes
. If the user wants to install a control-plane
which can run any type of loads, without any toleration configuration, it should add it to both kube_control_plane
and kube_node
groups. This is already documented in https://github.com/kubernetes-sigs/kubespray/blob/master/docs/ansible.md#inventory.
If you want the server to act both as control-plane and node, the server must be defined on both groups kube_control_plane and kube_node. If you want a standalone and unschedulable control plane, the server must be defined only in the kube_control_plane and not kube_node.
Finally I don't think there is any need of doing it somewhat early in order to allow toleration adaptations. As I have said in #10464 (comment), this taint migration won't break any deployment which defines both tolerations.
After the upgrade, the user will be able to update their deployments to remove the legacy not needed toleration, but this can be done after the upgrade.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This task is skipped for every kube_node, notice the not in the second conditional.
Ah yes indeed my bad!
If a node is just a kube_control_plane the user should not have the choice to not taint it, since it's the default configuration for k8s control-planes. If the user wants to install a control-plane which can run any type of loads, without any toleration configuration, it should add it to both kube_control_plane and kube_node groups. This is already documented in https://github.com/kubernetes-sigs/kubespray/blob/master/docs/ansible.md#inventory.
Yes but I am bit concerned by this because some users might have untaint a node outside of kubespray scope. And I am not sure if we should force the taints on users. basically.
Finally I don't think there is any need of doing it somewhat early in order to allow toleration adaptations. As I have said in #10464 (comment), this taint migration won't break any deployment which defines both tolerations.
After the upgrade, the user will be able to update their deployments to remove the legacy not needed toleration, but this can be done after the upgrade.
I know but doing this early would allow us to remove our legacy taint safely within one kubespray version. If we don't do that now, we will have to wait for 1 maybe 2 kubespray release to do that and this might get lost in the process. So I think that having this taint early and removing the legacy taint everywhere else in kubespray would very much help
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but I am bit concerned by this because some users might have untaint a node outside of kubespray scope. And I am not sure if we should force the taints on users. basically.
The thing is that kubeadm removes the legacy taint node-role.kubernetes.io/master
during the upgrade to v1.25
. So in case of a cluster where control-plane
nodes just have this taint and not both (legacy and current taints), we would need to ensure that node-role.kubernetes.io/control-plane
is set before the upgrade. Do you think that adding the node-role.kubernetes.io/control-plane
taint before the upgrade in case of node-role.kubernetes.io/master
taint is set is a better approach?
I think that it could be cool in order to cover some corner cases (like some users who might have untaint a node outside of kubespray scope) altough this corner case should not be covered, as they are not following the Kubespray guidelines to configure a shared control-plane
worker
node IMHO.
I just decided to set the taint just before uncordoning the node in order to ensure idempotency and code readability. But if you think that covering these corner cases are important enough I can check how to implement it as nice as it's now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is that kubeadm removes the legacy taint node-role.kubernetes.io/master during the upgrade to v1.25. So in case of a cluster where control-plane nodes just have this taint and not both (legacy and current taints), we would need to ensure that node-role.kubernetes.io/control-plane is set before the upgrade. Do you think that adding the node-role.kubernetes.io/control-plane taint before the upgrade in case of node-role.kubernetes.io/master taint is set is a better approach?
Kubernetes 1.25 is no longer supported in master so we shouldn't need to worry about the master role if it is already removed in that version then and we can get rid of it everywhere. Considering that we don't support 1.25 I am not sure where we should add the control-plane taint though 🤔. I see two possible paths moving forward:
- Backport the code enforcing the control plane taint to kubespray 2.23.0 + add a note to kubespray 2.24.0 mentioning that users that did a 1.25 upgrade and didn't upgraded their cluster once using the new 2.23 version (2.23.1 probably?) should check their taint and that the control-plane should be re-added manually
- Systematically enforce control plane node like you are doing in this PR but that's technically a change of behavior vs what we used to do. As before we didn't enforce that on upgrade, so people could remove the taint outside of kubespray and it would work even when reapplying kubespray afaiu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the aim of this PR is fixing #10217 I think that we can do the following:
-
master
: Just keep in the changes in thekubeadm
configuration files to remove the legacy taint in fresh installations. Also we can include the changes suggested in Migrate node-role.kubernetes.io/master to node-role.kubernetes.io/con… #10464 (comment) since every cluster installed using the future 2.24.X won't have the legacy taint and every cluster upgraded using the future 2.24.X will be already with k8s 1.25. -
release-2.23
: Include the option to enforcing the control plane taint in order to avoid missing all control plane taints if upgrading to k8s 1.25. In this case we can tune this PR in orde to avoid this change of behavior about force tainting any node during the upgrade. Also, as you have said, I think is a good idea mentioning the users that have already upgraded their clusters to k8s v1.25 to check their control plane taints, since it can be already missing. Also I think the best idea is to delegate them the operation to re-added manually.
What do you think about this?
Thanks for the feedback 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be perfect thank you🙏.
And thanks for bearing with me I admit that I kind of multitasked with other topics and didn't get every details in my previous answers. We do have a nice solution now though 🤝! :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be perfect thank you🙏.
And thanks for bearing with me I admit that I kind of multitasked with other topics and didn't get every details in my previous answers. We do have a nice solution now though 🤝! :D
I'll try to make this changes next week. I'll ping you as soon as they are ready.
No problem, I appreciate the feedback. I also multitask more than I want to, I know the feeling 😄
647fa3e
to
a1c09a0
Compare
@MrFreezeex I have updated this PR and created #10532 If you have any doubt let me know 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And thanks again 👍, it should get merged when the CI succeeds
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: floryut, MrFreezeex, unai-ttxu The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
kubernetes-sigs#10464) * Migrate node-role.kubernetes.io/master to node-role.kubernetes.io/control-plane * Migrate node-role.kubernetes.io/master to node-role.kubernetes.io/control-plane * Migrate node-role.kubernetes.io/master to node-role.kubernetes.io/control-plane
kubernetes-sigs#10464) * Migrate node-role.kubernetes.io/master to node-role.kubernetes.io/control-plane * Migrate node-role.kubernetes.io/master to node-role.kubernetes.io/control-plane * Migrate node-role.kubernetes.io/master to node-role.kubernetes.io/control-plane
…trol-plane
What type of PR is this?
What this PR does / why we need it:
Since kubeadm 1.25, legacy
control-plane
taints are deleted:kubernetes/kubernetes@ddd046f
This PR aligns
control-plane
taints definitions and ensure that taints are not lost during upgrade.Which issue(s) this PR fixes:
Fixes #10217
Special notes for your reviewer:
If you hace any doubt feel free to ask 😄
Does this PR introduce a user-facing change?: