Add two usage options for FAR in README
With NHC or standalone
razo7 committed Mar 19, 2023
1 parent c5e2d3c commit 59ee242
Showing 1 changed file with 85 additions and 0 deletions.
85 changes: 85 additions & 0 deletions README.md
@@ -37,6 +37,91 @@ Then, run `operator-sdk run bundle quay.io/medik8s/fence-agents-remediation-oper
* Follow OLM's [instructions](https://sdk.operatorframework.io/docs/building-operators/golang/tutorial/#configure-the-operators-image-registry) on how to configure the operator's image registry (build and push the operator container).
* Run FAR using one of the [suggested options from OLM](https://sdk.operatorframework.io/docs/building-operators/golang/tutorial/#run-the-operator): locally, in the cluster, or in the cluster using a bundle container (similar to the [above installation](#deploy-the-latest-version)).

## Usage

FAR is best used together with NHC to form a complete solution for unhealthy nodes: NHC detects unhealthy nodes and creates an external remediation CR, e.g., FAR's CR, for each of them.
This automated approach is preferable because it hands responsibility for FAR CRs (creation and deletion) to NHC. FAR can also act as a standalone remediator, but then the administrator has to create and delete the CRs.

Either way, the user must be familiar with the fence agent to be used: its parameters and any other requirements on the cluster (e.g., fence_ipmilan needs machines that support IPMI).

### FAR with NHC

* Install FAR using one of the above options ([Installation](#installation)).

* Load the YAML manifest of the FAR template (see below).

* Modify the NodeHealthCheck CR to use FAR as its remediator.
This is a specific use case of NodeHealthCheck's external remediation.
To set it up, make sure that Node Health Check is running and the FAR controller exists, and then create the necessary CRs (*FenceAgentsRemediationTemplate* and then *NodeHealthCheck*).

#### Example CRs

The FAR template CR, `FenceAgentsRemediationTemplate`, is created by the administrator and is used by NHC as a template for creating the FAR CR that represents a request for a node to be recovered.
The following example shows a FAR template object (also available as the [sample FAR template](https://github.com/medik8s/fence-agents-remediation/blob/main/config/samples/fence-agents-remediation_v1alpha1_fenceagentsremediationtemplate.yaml)):

```yaml
apiVersion: fence-agents-remediation.medik8s.io/v1alpha1
kind: FenceAgentsRemediationTemplate
metadata:
  name: fenceagentsremediationtemplate-default
spec:
  template: {}
```

> *Note*: The FenceAgentsRemediationTemplate CR must be created in the same namespace in which the FAR operator is installed.

Configure NodeHealthCheck to use the example `fenceagentsremediationtemplate-default` template above:

```yaml
apiVersion: remediation.medik8s.io/v1alpha1
kind: NodeHealthCheck
metadata:
  name: nodehealthcheck-sample
spec:
  remediationTemplate:
    apiVersion: fence-agents-remediation.medik8s.io/v1alpha1
    kind: FenceAgentsRemediationTemplate
    name: fenceagentsremediationtemplate-default
    namespace: default
```

NHC creates a FAR CR from the FAR template after it detects an unhealthy node (according to NHC's unhealthy conditions).
NHC deletes the FAR CR after it sees that the node is healthy again.
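
This create-and-delete lifecycle can be pictured with a toy Python model. It is only a sketch and not NHC's implementation: `reconcile` and `template_spec` are hypothetical names, and naming the CR after the node and copying the template's spec into it are assumptions made for illustration.

```python
FAR_API_VERSION = "fence-agents-remediation.medik8s.io/v1alpha1"


def reconcile(node_name, is_healthy, remediations, template_spec):
    """Toy model of the FAR CR lifecycle that NHC drives.

    `remediations` maps a CR name to a CR (a plain dict) and is updated
    in place; it stands in for the CRs that exist in the cluster.
    """
    if not is_healthy and node_name not in remediations:
        # Unhealthy node without a CR yet: create one from the template.
        # Assumption: the CR is named after the node it remediates.
        remediations[node_name] = {
            "apiVersion": FAR_API_VERSION,
            "kind": "FenceAgentsRemediation",
            "metadata": {"name": node_name},
            "spec": dict(template_spec),
        }
    elif is_healthy:
        # Node is healthy again: remove its CR if one exists.
        remediations.pop(node_name, None)
    return remediations


crs = {}
reconcile("worker-1", False, crs, {"agent": "fence_ipmilan"})
print(sorted(crs))  # CR names after the node became unhealthy
reconcile("worker-1", True, crs, {"agent": "fence_ipmilan"})
print(sorted(crs))  # CR names after the node recovered
```

In the standalone mode described next, the administrator performs both steps by hand.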

### Standalone FAR

* Install FAR using one of the above options ([Installation](#installation)).

* Create a FAR CR using the name of the node to be remediated and the fence-agent parameters.

#### Example CR

The FAR CR, `FenceAgentsRemediation`, is created by the admin and is used to trigger the fence agent on a specific node. The CR includes an *agent* field for the fence-agent name, a *sharedparameters* field for the parameters that are shared by all nodes, and a *nodeparameters* field for the parameters that are specific to each node.
The following example shows a FAR CR for node `worker-1` (also available as the [sample FAR](https://github.com/medik8s/fence-agents-remediation/blob/main/config/samples/fence-agents-remediation_v1alpha1_fenceagentsremediation.yaml)):

```yaml
apiVersion: fence-agents-remediation.medik8s.io/v1alpha1
kind: FenceAgentsRemediation
metadata:
  name: worker-1
spec:
  agent: fence_ipmilan
  sharedparameters:
    --username: "admin"
    --password: "password"
    --lanplus: ""
    --action: "reboot"
    --ip: "192.168.111.1"
  nodeparameters:
    --ipport:
      master-0: "6230"
      master-1: "6231"
      master-2: "6232"
      worker-0: "6233"
      worker-1: "6234"
      worker-2: "6235"
```
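
How the shared and node-specific parameters combine for one node can be pictured with a short Python sketch. It is an illustration and not the operator's code: `build_fence_args` is a hypothetical helper, and both the argument ordering and the treatment of an empty value as a bare flag are assumptions.

```python
def build_fence_args(agent, shared, node_params, node_name):
    """Return an argument list for fencing `node_name`.

    The list holds the agent name, then every shared parameter, then the
    value that each node parameter defines for `node_name`.
    """
    args = [agent]
    for flag, value in shared.items():
        args.append(flag)
        if value != "":
            # Assumption: an empty value means the flag takes no argument.
            args.append(value)
    for flag, per_node in node_params.items():
        if node_name not in per_node:
            raise KeyError(f"no {flag} value for node {node_name!r}")
        args.extend([flag, per_node[node_name]])
    return args


shared = {
    "--username": "admin",
    "--password": "password",
    "--lanplus": "",
    "--action": "reboot",
    "--ip": "192.168.111.1",
}
node_params = {"--ipport": {"worker-0": "6233", "worker-1": "6234"}}

print(" ".join(build_fence_args("fence_ipmilan", shared, node_params, "worker-1")))
```

Under these assumptions, only the `--ipport` value changes between nodes, while the credentials, action, and IP stay the same.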

## Tests

### Run code checks and unit tests