Add Startup Taint Removal Feature
Implements a feature to remove a taint on driver startup to alleviate
potential race conditions. Supersedes #1581, all credit for the design
and initial implementation to @gtxu.

Co-authored-by: Gengtao Xu <[email protected]>
Signed-off-by: Connor Catlett <[email protected]>
ConnorJC3 and gtxu committed May 3, 2023
1 parent f8fa5e4 commit 2695b34
Showing 9 changed files with 1,508 additions and 1 deletion.
@@ -8,4 +8,4 @@ metadata:
rules:
- apiGroups: [""]
resources: ["nodes"]
verbs: ["get"]
verbs: ["get", "patch"]
2 changes: 2 additions & 0 deletions charts/aws-ebs-csi-driver/templates/node.yaml
@@ -43,6 +43,8 @@ spec:
{{- with .Values.node.tolerations }}
{{- toYaml . | nindent 8 }}
{{- end }}
- key: "ebs.csi.aws.com/agent-not-ready"
operator: "Exists"
{{- end }}
{{- with .Values.node.securityContext }}
securityContext:
5 changes: 5 additions & 0 deletions docs/install.md
@@ -41,6 +41,11 @@ kubectl create secret generic aws-secret \
### Configure driver toleration settings
By default, the driver controller tolerates taint `CriticalAddonsOnly` and has `tolerationSeconds` configured as `300`; and the driver node tolerates all taints. If you don't want to deploy the driver node on all nodes, please set Helm `Value.node.tolerateAllTaints` to false before deployment. Add policies to `Value.node.tolerations` to configure customized toleration for nodes.

### Configure node startup taint
There are potential race conditions on node startup (especially when a node is first joining the cluster) where pods/processes that rely on the EBS CSI Driver can act on a node before the EBS CSI Driver is able to start up and become fully ready. To combat this, the EBS CSI Driver contains a feature to automatically remove a taint from the node on startup. Users can taint their nodes when they join the cluster and/or on startup, to prevent other pods from running and/or being scheduled on the node prior to the EBS CSI Driver becoming ready.

This feature is activated by default, and cluster administrators should use the taint `ebs.csi.aws.com/agent-not-ready:NoExecute` (any effect will work, but `NoExecute` is recommended). For example, EKS Managed Node Groups [support automatically tainting nodes](https://docs.aws.amazon.com/eks/latest/userguide/node-taints-managed-node-groups.html).

### Deploy driver
You may deploy the EBS CSI driver via Kustomize, Helm, or as an [Amazon EKS managed add-on](https://docs.aws.amazon.com/eks/latest/userguide/managing-ebs-csi.html).

4 changes: 4 additions & 0 deletions hack/update-gomock
@@ -22,6 +22,10 @@ mockgen -package cloud -destination=./pkg/cloud/mock_metadata.go -source pkg/clo
mockgen -package driver -destination=./pkg/driver/mock_mount.go -source pkg/driver/mount.go
mockgen -package mounter -destination=./pkg/mounter/mock_mount_windows.go -source pkg/mounter/safe_mounter_windows.go

# Reflection-based mocking for external dependencies
mockgen -package driver -destination=./pkg/driver/mock_k8s_client.go -mock_names='Interface=MockKubernetesClient' k8s.io/client-go/kubernetes Interface
mockgen -package driver -destination=./pkg/driver/mock_k8s_corev1.go k8s.io/client-go/kubernetes/typed/core/v1 CoreV1Interface,NodeInterface

# Fixes "Mounter Type cannot implement 'Mounter' as it has a non-exported method and is defined in a different package"
# See https://github.com/kubernetes/mount-utils/commit/a20fcfb15a701977d086330b47b7efad51eb608e for context.
sed -i '/type MockMounter struct {/a \\tmount_utils.Interface' pkg/driver/mock_mount.go
6 changes: 6 additions & 0 deletions pkg/driver/constants.go
@@ -167,3 +167,9 @@ var (
FSTypeNtfs: {},
}
)

// constants for node k8s API use
const (
// AgentNotReadyNodeTaintKey contains the key of taints to be removed on driver startup
AgentNotReadyNodeTaintKey = "ebs.csi.aws.com/agent-not-ready"
)
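As a rough illustration of how a constant like `AgentNotReadyNodeTaintKey` might be used, the sketch below filters a node's taint list by key and reports whether anything was removed. It is not the code from this commit: the `Taint` struct is a simplified stand-in for the Kubernetes taint type, and `withoutTaintKey` is a hypothetical helper name.

```go
package main

import "fmt"

// Taint is a simplified, hypothetical stand-in for the Kubernetes taint type.
type Taint struct {
	Key    string
	Value  string
	Effect string
}

// AgentNotReadyNodeTaintKey mirrors the constant added in pkg/driver/constants.go.
const AgentNotReadyNodeTaintKey = "ebs.csi.aws.com/agent-not-ready"

// withoutTaintKey returns the taints whose key differs from key, plus a flag
// telling the caller whether at least one taint was dropped. Matching on the
// key alone means any effect (NoExecute, NoSchedule, ...) is removed.
func withoutTaintKey(taints []Taint, key string) ([]Taint, bool) {
	kept := make([]Taint, 0, len(taints))
	for _, t := range taints {
		if t.Key != key {
			kept = append(kept, t)
		}
	}
	return kept, len(kept) != len(taints)
}

func main() {
	taints := []Taint{
		{Key: AgentNotReadyNodeTaintKey, Effect: "NoExecute"},
		{Key: "example.com/dedicated", Value: "gpu", Effect: "NoSchedule"},
	}
	kept, removed := withoutTaintKey(taints, AgentNotReadyNodeTaintKey)
	fmt.Println(removed, kept)
}
```

Returning the `removed` flag lets a caller skip the API update entirely when the node never carried the taint.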