Update docs with the new remediationStrategy spec #145

Merged
merged 7 commits into from
May 22, 2024
4 changes: 4 additions & 0 deletions README.md
@@ -192,6 +192,9 @@ The CR includes the following parameters:
* `retrycount` - number of times to retry the fence agent in case of failure. The default is 5.
* `retryinterval` - interval between retries in seconds. The default is "5s".
* `timeout` - timeout for the fence agent in seconds. The default is "60s".
* `remediationStrategy` - either `OutOfServiceTaint` or `ResourceDeletion`. The default is `ResourceDeletion`.
  * `OutOfServiceTaint`: This remediation strategy implicitly causes the deletion of the pods and the detachment of the associated volumes on the node. It achieves this by placing the [`node.kubernetes.io/out-of-service` taint](https://kubernetes.io/docs/reference/labels-annotations-taints/#node-kubernetes-io-out-of-service) on the node. The taint is only supported on clusters with Kubernetes version 1.26+ or OCP/OKD version 4.13+.
  * `ResourceDeletion`: This remediation strategy deletes the pods on the node.
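
The `OutOfServiceTaint` behavior described above can be illustrated with a minimal Go sketch. It is not the operator's implementation: `Taint` is a simplified local stand-in for the Kubernetes `core/v1` taint type, `AddTaint` is a hypothetical helper, and the `nodeshutdown` value and `NoExecute` effect follow the example in the Kubernetes documentation rather than anything shown in this PR. Only the taint key comes from the docs being updated here.

```go
package main

import "fmt"

// Taint is a simplified local stand-in for the Kubernetes core/v1 Taint type,
// so the sketch runs without the k8s.io/api dependency.
type Taint struct {
	Key    string
	Value  string
	Effect string
}

// outOfServiceTaint uses the well-known key named in the README change.
// The value and effect are assumptions taken from the Kubernetes docs example.
var outOfServiceTaint = Taint{
	Key:    "node.kubernetes.io/out-of-service",
	Value:  "nodeshutdown",
	Effect: "NoExecute",
}

// AddTaint (hypothetical helper) appends t unless a taint with the same key
// and effect is already present. The bool reports whether the list changed,
// which lets a caller skip a needless node update.
func AddTaint(taints []Taint, t Taint) ([]Taint, bool) {
	for _, existing := range taints {
		if existing.Key == t.Key && existing.Effect == t.Effect {
			return taints, false
		}
	}
	return append(taints, t), true
}

func main() {
	var nodeTaints []Taint

	// First call adds the taint; the second is a no-op.
	nodeTaints, added := AddTaint(nodeTaints, outOfServiceTaint)
	fmt.Println(len(nodeTaints), added)

	nodeTaints, added = AddTaint(nodeTaints, outOfServiceTaint)
	fmt.Println(len(nodeTaints), added)
}
```

Keeping the helper idempotent matters for a reconcile loop, which may run many times against the same node while the remediation CR exists.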

The FenceAgentsRemediation CR is created by the administrator and is used to trigger the fence agent on a specific node. The CR includes an *agent* field for the fence agent name, a *sharedparameters* field for the parameters that are shared by all nodes, and a *nodeparameters* field for the parameters that are specific to the fenced node.
For a concrete example, see the following FenceAgentsRemediation CR for node `worker-1` (also available as the [sample FAR](https://github.com/medik8s/fence-agents-remediation/blob/main/config/samples/fence-agents-remediation_v1alpha1_fenceagentsremediation.yaml)):
@@ -220,6 +223,7 @@ spec:
worker-0: "6233"
worker-1: "6234"
worker-2: "6235"
remediationStrategy: ResourceDeletion
```

## Tests
1 change: 1 addition & 0 deletions api/v1alpha1/fenceagentsremediation_types.go
@@ -96,6 +96,7 @@ type FenceAgentsRemediationSpec struct {
// that enables automatic deletion of pv-attached pods on failed nodes, "out-of-service" taint is only supported on clusters with k8s version 1.26+ or OCP/OKD version 4.13+.
// +kubebuilder:default:="ResourceDeletion"
// +kubebuilder:validation:Enum=ResourceDeletion;OutOfServiceTaint
// +operator-sdk:csv:customresourcedefinitions:type=spec
RemediationStrategy RemediationStrategyType `json:"remediationStrategy,omitempty"`
}
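
As a rough illustration of what the `default` and `Enum` markers above amount to, here is a minimal sketch, assuming a hypothetical `ResolveStrategy` helper. In a real cluster the API server applies the default and rejects invalid values from the generated CRD schema, so the operator does not need code like this. The constant names are assumptions; only the two string values come from the `Enum` marker.

```go
package main

import "fmt"

// RemediationStrategyType mirrors the field type used in the spec.
type RemediationStrategyType string

// Constant names are assumptions; the string values match the Enum marker.
const (
	ResourceDeletion  RemediationStrategyType = "ResourceDeletion"
	OutOfServiceTaint RemediationStrategyType = "OutOfServiceTaint"
)

// ResolveStrategy (hypothetical helper) applies the documented default for an
// empty value and rejects anything outside the enum.
func ResolveStrategy(s RemediationStrategyType) (RemediationStrategyType, error) {
	switch s {
	case "":
		// Same default as +kubebuilder:default:="ResourceDeletion".
		return ResourceDeletion, nil
	case ResourceDeletion, OutOfServiceTaint:
		return s, nil
	default:
		return "", fmt.Errorf("unsupported remediationStrategy %q", s)
	}
}

func main() {
	inputs := []RemediationStrategyType{"", OutOfServiceTaint, "Reboot"}
	for _, in := range inputs {
		got, err := ResolveStrategy(in)
		fmt.Printf("input=%q resolved=%q err=%v\n", in, got, err)
	}
}
```

Because the field is tagged `omitempty`, a CR that leaves `remediationStrategy` unset is serialized without the field, and the schema default then fills in `ResourceDeletion`.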

@@ -22,6 +22,7 @@ metadata:
"worker-2": "6235"
}
},
"remediationStrategy": "ResourceDeletion",
"retrycount": 5,
"retryinterval": "5s",
"sharedparameters": {
@@ -83,6 +84,16 @@ spec:
node that is fenced, since they are node specific
displayName: Node Parameters
path: nodeparameters
- description: RemediationStrategy is the remediation method for unhealthy nodes.
Currently, it could be either "OutOfServiceTaint" or "ResourceDeletion".
ResourceDeletion will iterate over all pods related to the unhealthy node
and delete them. OutOfServiceTaint will add the out-of-service taint which
is a new well-known taint "node.kubernetes.io/out-of-service" that enables
automatic deletion of pv-attached pods on failed nodes, "out-of-service"
taint is only supported on clusters with k8s version 1.26+ or OCP/OKD version
4.13+.
displayName: Remediation Strategy
path: remediationStrategy
- description: RetryCount is the number of times the fencing agent will be executed
displayName: Retry Count
path: retrycount
@@ -129,6 +140,16 @@ spec:
node that is fenced, since they are node specific
displayName: Node Parameters
path: template.spec.nodeparameters
- description: RemediationStrategy is the remediation method for unhealthy nodes.
Currently, it could be either "OutOfServiceTaint" or "ResourceDeletion".
ResourceDeletion will iterate over all pods related to the unhealthy node
and delete them. OutOfServiceTaint will add the out-of-service taint which
is a new well-known taint "node.kubernetes.io/out-of-service" that enables
automatic deletion of pv-attached pods on failed nodes, "out-of-service"
taint is only supported on clusters with k8s version 1.26+ or OCP/OKD version
4.13+.
displayName: Remediation Strategy
path: template.spec.remediationStrategy
- description: RetryCount is the number of times the fencing agent will be executed
displayName: Retry Count
path: template.spec.retrycount
@@ -39,6 +39,16 @@ spec:
node that is fenced, since they are node specific
displayName: Node Parameters
path: nodeparameters
- description: RemediationStrategy is the remediation method for unhealthy nodes.
Currently, it could be either "OutOfServiceTaint" or "ResourceDeletion".
ResourceDeletion will iterate over all pods related to the unhealthy node
and delete them. OutOfServiceTaint will add the out-of-service taint which
is a new well-known taint "node.kubernetes.io/out-of-service" that enables
automatic deletion of pv-attached pods on failed nodes, "out-of-service"
taint is only supported on clusters with k8s version 1.26+ or OCP/OKD version
4.13+.
displayName: Remediation Strategy
path: remediationStrategy
- description: RetryCount is the number of times the fencing agent will be executed
displayName: Retry Count
path: retrycount
@@ -85,6 +95,16 @@ spec:
node that is fenced, since they are node specific
displayName: Node Parameters
path: template.spec.nodeparameters
- description: RemediationStrategy is the remediation method for unhealthy nodes.
Currently, it could be either "OutOfServiceTaint" or "ResourceDeletion".
ResourceDeletion will iterate over all pods related to the unhealthy node
and delete them. OutOfServiceTaint will add the out-of-service taint which
is a new well-known taint "node.kubernetes.io/out-of-service" that enables
automatic deletion of pv-attached pods on failed nodes, "out-of-service"
taint is only supported on clusters with k8s version 1.26+ or OCP/OKD version
4.13+.
displayName: Remediation Strategy
path: template.spec.remediationStrategy
- description: RetryCount is the number of times the fencing agent will be executed
displayName: Retry Count
path: template.spec.retrycount
@@ -21,3 +21,4 @@ spec:
worker-0: "6233"
worker-1: "6234"
worker-2: "6235"
remediationStrategy: ResourceDeletion