Update docs with the new remediationStrategy spec #145

Merged
merged 7 commits into from
May 22, 2024
Changes from 6 commits
4 changes: 4 additions & 0 deletions README.md
@@ -192,6 +192,9 @@ The CR includes the following parameters:
* `retrycount` - number of times to retry the fence agent in case of failure. The default is 5.
* `retryinterval` - interval between retries in seconds. The default is "5s".
* `timeout` - timeout for the fence agent in seconds. The default is "60s".
* `remediationStrategy` - either `ResourceDeletion` or `OutOfServiceTaint`:
* `ResourceDeletion`: This remediation strategy removes the pods on the node.
Review comment by @slintes (Member), May 21, 2024:

Maybe it's me only, but why don't we call things how they are called already? 🤔
Here: we do not remove anything, we delete them... that's the actual method name we use. That's the D in CRUD. 🤷🏼‍♂️ (Same below)

(and I don't block on this, just a general thought, which I had more often at various places)

Member:

/lgtm

in case you want to change...
/hold

Reply by @clobrano (Contributor, Author), May 22, 2024:

> why don't we call things how they are called already

for me, I think it's just an automatic mechanism to avoid repetitions in written documents, but indeed when explaining something this habit can be limited :)

* `OutOfServiceTaint`: This remediation strategy implicitly causes the removal of the pods and associated volume attachments on the node. It achieves this by placing the [`OutOfServiceTaint` taint](https://kubernetes.io/docs/reference/labels-annotations-taints/#node-kubernetes-io-out-of-service) on the node.
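The `OutOfServiceTaint` strategy boils down to adding one well-known taint to the node. The Go sketch below illustrates that step. It uses hypothetical stand-in types (`Taint`, `Node`) instead of the real `k8s.io/api/core/v1` types so it compiles with the standard library alone. The taint key is the documented well-known key. The `nodeshutdown` value and `NoExecute` effect follow the upstream Kubernetes example and are assumptions about what FAR actually applies. The duplicate check compares key and effect, because Kubernetes identifies a node taint by that pair.

```go
package main

import "fmt"

// Taint is a hypothetical, minimal stand-in for the Kubernetes core/v1 Taint type,
// used so this sketch compiles with the standard library only.
type Taint struct {
	Key    string
	Value  string
	Effect string
}

// Node is a hypothetical, trimmed-down node holding only what this sketch needs.
type Node struct {
	Name   string
	Taints []Taint
}

// outOfServiceTaint builds the well-known out-of-service taint. The key is the documented
// one; the "nodeshutdown" value and NoExecute effect are assumptions taken from the
// upstream kubectl example, not confirmed FAR behavior.
func outOfServiceTaint() Taint {
	return Taint{
		Key:    "node.kubernetes.io/out-of-service",
		Value:  "nodeshutdown",
		Effect: "NoExecute",
	}
}

// addTaintIfMissing appends t unless a taint with the same key and effect already exists,
// so repeated reconciles do not pile up duplicate taints. It reports whether it added t.
func addTaintIfMissing(node *Node, t Taint) bool {
	for _, existing := range node.Taints {
		if existing.Key == t.Key && existing.Effect == t.Effect {
			return false
		}
	}
	node.Taints = append(node.Taints, t)
	return true
}

func main() {
	node := &Node{Name: "worker-1"}
	fmt.Println(addTaintIfMissing(node, outOfServiceTaint())) // true: taint added
	fmt.Println(addTaintIfMissing(node, outOfServiceTaint())) // false: already present
	fmt.Println(len(node.Taints))                             // 1
}
```

Keeping the add idempotent matters because a controller may reconcile the same remediation several times.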

The FenceAgentsRemediation CR is created by the administrator and is used to trigger the fence agent on a specific node. The CR includes an *agent* field for the fence agent name, *sharedparameters* field with all the shared, not specific to a node, parameters, and a *nodeparameters* field to specify the parameters for the fenced node.
For better understanding please see the below example of FenceAgentsRemediation CR for node `worker-1` (see it also as the [sample FAR](https://github.com/medik8s/fence-agents-remediation/blob/main/config/samples/fence-agents-remediation_v1alpha1_fenceagentsremediation.yaml)):
@@ -220,6 +223,7 @@ spec:
worker-0: "6233"
worker-1: "6234"
worker-2: "6235"
remediationStrategy: ResourceDeletion
```

## Tests
1 change: 1 addition & 0 deletions api/v1alpha1/fenceagentsremediation_types.go
@@ -96,6 +96,7 @@ type FenceAgentsRemediationSpec struct {
// that enables automatic deletion of pv-attached pods on failed nodes, "out-of-service" taint is only supported on clusters with k8s version 1.26+ or OCP/OKD version 4.13+.
// +kubebuilder:default:="ResourceDeletion"
// +kubebuilder:validation:Enum=ResourceDeletion;OutOfServiceTaint
// +operator-sdk:csv:customresourcedefinitions:type=spec
RemediationStrategy RemediationStrategyType `json:"remediationStrategy,omitempty"`
}
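The `+kubebuilder:default` and `+kubebuilder:validation:Enum` markers on `RemediationStrategy` have practical consequences for clients. If `remediationStrategy` is missing, it falls back to `ResourceDeletion`, and any value outside the two allowed ones is rejected. In a real cluster the API server enforces both rules from the generated CRD schema. This Go sketch only reproduces them locally, for example for offline validation, and the constant names and the trimmed `spec` type are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RemediationStrategyType mirrors the field type in the diff; the constant names below
// are illustrative and not necessarily the identifiers used in the repository.
type RemediationStrategyType string

const (
	ResourceDeletionStrategy  RemediationStrategyType = "ResourceDeletion"
	OutOfServiceTaintStrategy RemediationStrategyType = "OutOfServiceTaint"
)

// spec is a hypothetical subset of FenceAgentsRemediationSpec with the same JSON tag.
type spec struct {
	RemediationStrategy RemediationStrategyType `json:"remediationStrategy,omitempty"`
}

// effectiveStrategy applies the default and enum rules locally: an empty value means
// ResourceDeletion, the two enum values pass through, anything else is an error.
func effectiveStrategy(s spec) (RemediationStrategyType, error) {
	switch s.RemediationStrategy {
	case "":
		return ResourceDeletionStrategy, nil
	case ResourceDeletionStrategy, OutOfServiceTaintStrategy:
		return s.RemediationStrategy, nil
	default:
		return "", fmt.Errorf("unsupported remediationStrategy %q", s.RemediationStrategy)
	}
}

func main() {
	for _, raw := range []string{
		`{}`,
		`{"remediationStrategy": "OutOfServiceTaint"}`,
		`{"remediationStrategy": "Reboot"}`,
	} {
		var s spec
		if err := json.Unmarshal([]byte(raw), &s); err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		strategy, err := effectiveStrategy(s)
		fmt.Println(strategy, err)
	}
}
```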

@@ -22,6 +22,7 @@ metadata:
"worker-2": "6235"
}
},
"remediationStrategy": "ResourceDeletion",
"retrycount": 5,
"retryinterval": "5s",
"sharedparameters": {
@@ -83,6 +84,16 @@ spec:
node that is fenced, since they are node specific
displayName: Node Parameters
path: nodeparameters
- description: RemediationStrategy is the remediation method for unhealthy nodes.
Currently, it could be either "OutOfServiceTaint" or "ResourceDeletion".
ResourceDeletion will iterate over all pods related to the unhealthy node
and delete them. OutOfServiceTaint will add the out-of-service taint which
is a new well-known taint "node.kubernetes.io/out-of-service" that enables
automatic deletion of pv-attached pods on failed nodes, "out-of-service"
taint is only supported on clusters with k8s version 1.26+ or OCP/OKD version
4.13+.
displayName: Remediation Strategy
path: remediationStrategy
- description: RetryCount is the number of times the fencing agent will be executed
displayName: Retry Count
path: retrycount
@@ -129,6 +140,16 @@ spec:
node that is fenced, since they are node specific
displayName: Node Parameters
path: template.spec.nodeparameters
- description: RemediationStrategy is the remediation method for unhealthy nodes.
Currently, it could be either "OutOfServiceTaint" or "ResourceDeletion".
ResourceDeletion will iterate over all pods related to the unhealthy node
and delete them. OutOfServiceTaint will add the out-of-service taint which
is a new well-known taint "node.kubernetes.io/out-of-service" that enables
automatic deletion of pv-attached pods on failed nodes, "out-of-service"
taint is only supported on clusters with k8s version 1.26+ or OCP/OKD version
4.13+.
displayName: Remediation Strategy
path: template.spec.remediationStrategy
- description: RetryCount is the number of times the fencing agent will be executed
displayName: Retry Count
path: template.spec.retrycount
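The description also notes that the out-of-service taint is only supported on Kubernetes 1.26+ or OCP/OKD 4.13+. OCP 4.13 is built on Kubernetes 1.26, so a single check on the cluster's reported Kubernetes version should cover both cases. Here is a hedged Go sketch of such a gate. It assumes the version arrives as a `gitVersion`-style string such as `v1.27.6+f67aeb3`, and the function name is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// supportsOutOfServiceTaint reports whether a Kubernetes server version string of the
// form "vMAJOR.MINOR[.PATCH[suffix]]" is at least 1.26. Only major and minor are parsed,
// so build suffixes such as "+f67aeb3" or "-gke.100" on the patch part are ignored.
func supportsOutOfServiceTaint(gitVersion string) (bool, error) {
	parts := strings.SplitN(strings.TrimPrefix(gitVersion, "v"), ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unrecognized version %q", gitVersion)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, fmt.Errorf("bad major version in %q: %w", gitVersion, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, fmt.Errorf("bad minor version in %q: %w", gitVersion, err)
	}
	return major > 1 || (major == 1 && minor >= 26), nil
}

func main() {
	for _, v := range []string{"v1.25.16", "v1.26.3", "v1.27.6+f67aeb3"} {
		ok, err := supportsOutOfServiceTaint(v)
		fmt.Println(v, ok, err)
	}
}
```

A controller could run a check like this before accepting `OutOfServiceTaint` and fall back to, or report, `ResourceDeletion` on older clusters. That policy is a design choice and is not stated in this PR.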
@@ -39,6 +39,16 @@ spec:
node that is fenced, since they are node specific
displayName: Node Parameters
path: nodeparameters
- description: RemediationStrategy is the remediation method for unhealthy nodes.
Currently, it could be either "OutOfServiceTaint" or "ResourceDeletion".
ResourceDeletion will iterate over all pods related to the unhealthy node
and delete them. OutOfServiceTaint will add the out-of-service taint which
is a new well-known taint "node.kubernetes.io/out-of-service" that enables
automatic deletion of pv-attached pods on failed nodes, "out-of-service"
taint is only supported on clusters with k8s version 1.26+ or OCP/OKD version
4.13+.
displayName: Remediation Strategy
path: remediationStrategy
- description: RetryCount is the number of times the fencing agent will be executed
displayName: Retry Count
path: retrycount
@@ -85,6 +95,16 @@ spec:
node that is fenced, since they are node specific
displayName: Node Parameters
path: template.spec.nodeparameters
- description: RemediationStrategy is the remediation method for unhealthy nodes.
Currently, it could be either "OutOfServiceTaint" or "ResourceDeletion".
ResourceDeletion will iterate over all pods related to the unhealthy node
and delete them. OutOfServiceTaint will add the out-of-service taint which
is a new well-known taint "node.kubernetes.io/out-of-service" that enables
automatic deletion of pv-attached pods on failed nodes, "out-of-service"
taint is only supported on clusters with k8s version 1.26+ or OCP/OKD version
4.13+.
displayName: Remediation Strategy
path: template.spec.remediationStrategy
- description: RetryCount is the number of times the fencing agent will be executed
displayName: Retry Count
path: template.spec.retrycount
@@ -21,3 +21,4 @@ spec:
worker-0: "6233"
worker-1: "6234"
worker-2: "6235"
remediationStrategy: ResourceDeletion