helm: add recommended autoscalar Tolerations to driver DaemonSet #2165

AndrewSirenko · 2024-10-03T14:06:04Z

Is this a bug fix or adding new feature?
helm

What is this PR about? / Why do we need it?
The driver node plugin must stay running after all stateful workloads are evicted from the node so that it can Unpublish/Unstage volumes and report that back to the kubelet. Therefore, the EBS CSI Driver DaemonSet needs to tolerate autoscaler node drain/deletion taints if customers have set node.tolerateAllTaints=false.

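With this PR we tolerate 3 common autoscaler taints: ToBeDeletedByClusterAutoscaler (Cluster Autoscaler), plus karpenter.sh/disrupted and karpenter.sh/disruption (Karpenter).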

What testing is done?

❯ helm template aws-ebs-csi-driver ./charts/aws-ebs-csi-driver --set node.tolerateAllTaints=false > post.yml
❯ helm template aws-ebs-csi-driver ./charts/aws-ebs-csi-driver --set node.tolerateAllTaints=false > pre.yml

❯ diff pre.yml post.yml -C 5
*** pre.yml	2024-10-02 15:28:21.971993034 +0000
--- post.yml	2024-10-02 15:28:13.722043329 +0000
***************
*** 452,461 ****
--- 452,470 ----
        priorityClassName: system-node-critical
        tolerations:
          - effect: NoExecute
            operator: Exists
            tolerationSeconds: 300
+         - effect: NoSchedule
+           key: ToBeDeletedByClusterAutoscaler
+           operator: Exists
+         - effect: NoSchedule
+           key: karpenter.sh/disrupted
+           operator: Exists
+         - effect: NoSchedule
+           key: karpenter.sh/disruption
+           operator: Exists
          - key: "ebs.csi.aws.com/agent-not-ready"
            operator: "Exists"
        hostNetwork: false
        securityContext:
          fsGroup: 0

Also ran a quick test to confirm that the taints are tolerated on a live cluster.
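
The rendered-manifest comparison above could also be scripted rather than read by eye. The following is a minimal sketch, not part of this PR: `check_autoscaler_tolerations` is a hypothetical helper name, and it assumes the toleration keys are rendered unquoted as `key: <name>`, as they are in the diff above.

```shell
# Hypothetical helper (not part of the chart or this PR): checks that a
# rendered manifest contains a toleration key for each autoscaler taint.
# Assumes keys are rendered unquoted, e.g. "key: karpenter.sh/disrupted".
check_autoscaler_tolerations() {
  local manifest="$1"
  local key
  local missing=0
  for key in ToBeDeletedByClusterAutoscaler karpenter.sh/disrupted karpenter.sh/disruption; do
    # -F treats the key as a literal string, so "." is not a regex wildcard.
    if ! grep -qF -- "key: ${key}" "$manifest"; then
      echo "missing toleration: ${key}" >&2
      missing=1
    fi
  done
  # Returns 0 only when every key was found.
  return "$missing"
}
```

For example, `check_autoscaler_tolerations post.yml` would be expected to succeed for the manifest rendered from this branch, while the same check against pre.yml would report the three missing keys.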

k8s-ci-robot · 2024-10-03T14:06:09Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from andrewsirenko. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

github-actions · 2024-10-03T14:08:05Z

Code Coverage Diff

This PR does not change the code coverage

helm: add recommended autoscalar Tolerations to driver DaemonSet

58bc419

k8s-ci-robot requested a review from ConnorJC3 October 3, 2024 14:06

k8s-ci-robot requested a review from torredil October 3, 2024 14:06

k8s-ci-robot added the labels cncf-cla: yes (indicates the PR's author has signed the CNCF CLA) and size/S (denotes a PR that changes 10-29 lines, ignoring generated files) Oct 3, 2024
