helm: add recommended autoscaler Tolerations to driver DaemonSet #2165

Open: wants to merge 1 commit into master

AndrewSirenko
Contributor

Is this a bug fix or adding new feature?
helm

What is this PR about? / Why do we need it?
The driver node plugin must keep running after all stateful workloads have been evicted from the node, so that it can unpublish/unstage their volumes and report that back to the kubelet. Therefore, the EBS CSI Driver DaemonSet needs to tolerate the node drain/deletion taints that autoscalers apply when customers have set node.tolerateAllTaints=false.

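With this PR we tolerate 3 common autoscaler taints, all with effect NoSchedule:
- ToBeDeletedByClusterAutoscaler (Cluster Autoscaler)
- karpenter.sh/disrupted (Karpenter)
- karpenter.sh/disruption (Karpenter)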

What testing is done?

❯ helm template aws-ebs-csi-driver ./charts/aws-ebs-csi-driver --set node.tolerateAllTaints=false > post.yml
❯ helm template aws-ebs-csi-driver ./charts/aws-ebs-csi-driver --set node.tolerateAllTaints=false > pre.yml

❯ diff pre.yml post.yml -C 5
*** pre.yml	2024-10-02 15:28:21.971993034 +0000
--- post.yml	2024-10-02 15:28:13.722043329 +0000
***************
*** 452,461 ****
--- 452,470 ----
        priorityClassName: system-node-critical
        tolerations:
          - effect: NoExecute
            operator: Exists
            tolerationSeconds: 300
+         - effect: NoSchedule
+           key: ToBeDeletedByClusterAutoscaler
+           operator: Exists
+         - effect: NoSchedule
+           key: karpenter.sh/disrupted
+           operator: Exists
+         - effect: NoSchedule
+           key: karpenter.sh/disruption
+           operator: Exists
          - key: "ebs.csi.aws.com/agent-not-ready"
            operator: "Exists"
        hostNetwork: false
        securityContext:
          fsGroup: 0

Also quickly tested on a live cluster that the taint is tolerated.
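The reason the existing tolerations are not enough follows from how Kubernetes matches a toleration to a taint. Below is a simplified Go re-implementation of the matching rules in core/v1 Toleration.ToleratesTaint, using hypothetical local Taint and Toleration structs in place of the real k8s.io/api types. It checks the three autoscaler taints against the tolerations rendered before this PR and after it; the taint values are placeholders, since the Exists operator ignores values.

```go
package main

import "fmt"

// Hypothetical stand-ins for the Kubernetes core/v1 Taint and Toleration types.
type Taint struct {
	Key, Value, Effect string
}

type Toleration struct {
	Key, Operator, Value, Effect string
}

// toleratesTaint mirrors core/v1 Toleration.ToleratesTaint: an empty effect
// matches any effect, an empty key matches any key, "Exists" ignores the
// value, and "Equal" (or an empty operator) requires the values to match.
func toleratesTaint(t Toleration, taint Taint) bool {
	if t.Effect != "" && t.Effect != taint.Effect {
		return false
	}
	if t.Key != "" && t.Key != taint.Key {
		return false
	}
	switch t.Operator {
	case "", "Equal":
		return t.Value == taint.Value
	case "Exists":
		return true
	default:
		return false
	}
}

// tolerated reports whether any toleration in tols tolerates the taint.
func tolerated(tols []Toleration, taint Taint) bool {
	for _, t := range tols {
		if toleratesTaint(t, taint) {
			return true
		}
	}
	return false
}

// Tolerations rendered before this PR with node.tolerateAllTaints=false.
var before = []Toleration{
	{Operator: "Exists", Effect: "NoExecute"},
	{Key: "ebs.csi.aws.com/agent-not-ready", Operator: "Exists"},
}

// Tolerations added by this PR.
var added = []Toleration{
	{Key: "ToBeDeletedByClusterAutoscaler", Operator: "Exists", Effect: "NoSchedule"},
	{Key: "karpenter.sh/disrupted", Operator: "Exists", Effect: "NoSchedule"},
	{Key: "karpenter.sh/disruption", Operator: "Exists", Effect: "NoSchedule"},
}

// Autoscaler taints; values are placeholders because Exists ignores them.
var autoscalerTaints = []Taint{
	{Key: "ToBeDeletedByClusterAutoscaler", Value: "placeholder", Effect: "NoSchedule"},
	{Key: "karpenter.sh/disrupted", Effect: "NoSchedule"},
	{Key: "karpenter.sh/disruption", Value: "placeholder", Effect: "NoSchedule"},
}

func main() {
	after := append(append([]Toleration{}, before...), added...)
	for _, taint := range autoscalerTaints {
		fmt.Printf("%-32s before=%v after=%v\n",
			taint.Key, tolerated(before, taint), tolerated(after, taint))
	}
}
```

Neither pre-existing entry matches a NoSchedule autoscaler taint: the keyless Exists toleration is restricted to the NoExecute effect, and the agent-not-ready toleration matches only its own key. Each added entry pins both key and effect, so the DaemonSet tolerates exactly these taints instead of everything.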

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from andrewsirenko. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 3, 2024

github-actions bot commented Oct 3, 2024

Code Coverage Diff

This PR does not change the code coverage
