-
Notifications
You must be signed in to change notification settings - Fork 6
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
docs: Add checkpoint retention policy documentation
- Describe how to configure the checkpoint retention policy - Provide sample YAML for applying retention policies - Explain the hierarchy and specificity of policies Signed-off-by: Parthiba-Hazra <[email protected]>
- Loading branch information
1 parent
74a1ea0
commit 6a341a0
Showing
1 changed file
with
93 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,93 @@ | ||
|
||
# Checkpoint Retention Policy Documentation | ||
|
||
## Overview | ||
|
||
The checkpoint retention policy in the CheckpointRestoreOperator allows users to manage and configure how checkpoints are retained and cleaned up within a Kubernetes cluster. This document provides an overview of how to configure these policies, the hierarchy of their application, and details about specific fields. | ||
|
||
## Applying a Retention Policy | ||
|
||
To apply a retention policy, you need to create a `CheckpointRestoreOperator` resource. Below is an example configuration: | ||
```yaml | ||
`apiVersion: criu.org/v1 | ||
kind: CheckpointRestoreOperator | ||
metadata: | ||
labels: | ||
app.kubernetes.io/name: checkpointrestoreoperator | ||
app.kubernetes.io/instance: checkpointrestoreoperator-sample | ||
app.kubernetes.io/part-of: checkpoint-restore-operator | ||
app.kubernetes.io/managed-by: kustomize | ||
app.kubernetes.io/created-by: checkpoint-restore-operator | ||
name: checkpointrestoreoperator-sample | ||
spec: | ||
checkpointDirectory: /var/lib/kubelet/checkpoints | ||
applyPoliciesImmediately: false | ||
globalPolicy: | ||
maxCheckpointsPerNamespace: 50 | ||
maxCheckpointsPerPod: 30 | ||
maxCheckpointsPerContainer: 10 | ||
# containerPolicies: | ||
# - namespace: namespace | ||
# pod: pod_name | ||
# container: container_name | ||
# maxCheckpoints: 5 | ||
# podPolicies: | ||
# - namespace: namespace | ||
# pod: pod_name | ||
# maxCheckpoints: 10 | ||
# namespacePolicies: | ||
# - namespace: namespace | ||
# maxCheckpoints: 15` | ||
``` | ||
A sample configuration file is available under `./config/samples/_v1_checkpointrestoreoperator.yaml`. | ||
|
||
## Understanding Policy Fields | ||
|
||
- `checkpointDirectory`: Specifies the directory where checkpoints are stored. | ||
- `applyPoliciesImmediately`: If set to `true`, the policies are applied immediately. If `false` (default value), they are applied after new checkpoint creation. | ||
- `globalPolicy`: Defines global checkpoint retention limits. | ||
- `maxCheckpointsPerNamespace`: Maximum number of checkpoints per namespace. | ||
- `maxCheckpointsPerPod`: Maximum number of checkpoints per pod. | ||
- `maxCheckpointsPerContainer`: Maximum number of checkpoints per container. | ||
- `containerPolicies` (optional): Specific retention policies for containers. | ||
- `namespace`: Namespace of the container. | ||
- `pod`: Pod name of the container. | ||
- `container`: Container name. | ||
- `maxCheckpoints`: Maximum number of checkpoints for the container. | ||
- `podPolicies` (optional): Specific retention policies for pods. | ||
- `namespace`: Namespace of the pod. | ||
- `pod`: Pod name. | ||
- `maxCheckpoints`: Maximum number of checkpoints for the pod. | ||
- `namespacePolicies` (optional): Specific retention policies for namespaces. | ||
- `namespace`: Namespace name. | ||
- `maxCheckpoints`: Maximum number of checkpoints for the namespace. | ||
|
||
## Policy Hierarchy and Specificity | ||
|
||
The CheckpointRestoreOperator uses a hierarchical approach to apply retention policies. Policies can be defined at different levels of specificity: | ||
|
||
1. **Global Policy:** Applies to all namespaces, pods, and containers if no more specific policy is defined. | ||
2. **Namespace Policy:** Applies to all pods and containers within a specific namespace. | ||
3. **Pod Policy:** Applies to all containers within a specific pod. | ||
4. **Container Policy:** Applies to a specific container within a specific pod and namespace. | ||
|
||
### Policy Application | ||
|
||
- **Global Policy:** If no other policies are defined, the global policy will be applied. In the example above, the global policy limits checkpoints to 50 per namespace, 30 per pod, and 10 per container. | ||
- **Namespace Policy:** If a namespace policy is defined, it overrides the global policy for that specific namespace. | ||
- **Pod Policy:** If a pod policy is defined, it overrides both the namespace and global policies for that specific pod. | ||
- **Container Policy:** If a container policy is defined, it is the most specific and overrides pod, namespace, and global policies for that specific container. | ||
|
||
### Example | ||
|
||
If a pod has a defined pod policy, but one of its containers has a defined container policy, the container policy will take precedence for that container. The pod policy will apply to the remaining containers within the pod. | ||
|
||
## Policy Deletion Mechanism | ||
|
||
When deleting checkpoints, the CheckpointRestoreOperator always **removes the oldest** checkpoints first. This ensures that the most recent checkpoints are retained, allowing for the most recent state of the resource to be restored if needed. | ||
|
||
## Conclusion | ||
|
||
By leveraging the hierarchical policy system, users can fine-tune checkpoint retention to meet their specific needs. More specific policies will always override less specific ones, ensuring that the most granular control is applied where needed. | ||
|
||
For more details, refer to the sample configuration file and experiment with different policy combinations to see how they interact within your Kubernetes cluster. |