docs: Add checkpoint retention policy documentation
- Describe how to configure the checkpoint retention policy
- Provide sample YAML for applying retention policies
- Explain the hierarchy and specificity of policies

Signed-off-by: Parthiba-Hazra <[email protected]>
Parthiba-Hazra committed Jul 27, 2024
1 parent 2dd7541 commit ec2b390
Showing 1 changed file with 93 additions and 0 deletions.
93 changes: 93 additions & 0 deletions docs/retention_policy.md
@@ -0,0 +1,93 @@

# Checkpoint Retention Policy Documentation

## Overview

The checkpoint retention policy in the CheckpointRestoreOperator allows users to manage and configure how checkpoints are retained and cleaned up within a Kubernetes cluster. This document provides an overview of how to configure these policies, the hierarchy of their application, and details about specific fields.

## Applying a Retention Policy

To apply a retention policy, you need to create a `CheckpointRestoreOperator` resource. Below is an example configuration:
```yaml
apiVersion: criu.org/v1
kind: CheckpointRestoreOperator
metadata:
  labels:
    app.kubernetes.io/name: checkpointrestoreoperator
    app.kubernetes.io/instance: checkpointrestoreoperator-sample
    app.kubernetes.io/part-of: checkpoint-restore-operator
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/created-by: checkpoint-restore-operator
  name: checkpointrestoreoperator-sample
spec:
  checkpointDirectory: /var/lib/kubelet/checkpoints
  applyPoliciesImmediately: false
  globalPolicy:
    maxCheckpointsPerNamespace: 50
    maxCheckpointsPerPod: 30
    maxCheckpointsPerContainer: 10
  # containerPolicies:
  #   - namespace: namespace
  #     pod: pod_name
  #     container: container_name
  #     maxCheckpoints: 5
  # podPolicies:
  #   - namespace: namespace
  #     pod: pod_name
  #     maxCheckpoints: 10
  # namespacePolicies:
  #   - namespace: namespace
  #     maxCheckpoints: 15
```
A sample configuration file is available under `./config/samples/_v1_checkpointrestoreoperator.yaml`.

## Understanding Policy Fields

- `checkpointDirectory`: Specifies the directory where checkpoints are stored.
- `applyPoliciesImmediately`: If set to `true`, the policies are applied immediately. If `false` (the default), they are applied after the next checkpoint is created.
- `globalPolicy`: Defines global checkpoint retention limits.
  - `maxCheckpointsPerNamespace`: Maximum number of checkpoints per namespace.
  - `maxCheckpointsPerPod`: Maximum number of checkpoints per pod.
  - `maxCheckpointsPerContainer`: Maximum number of checkpoints per container.
- `containerPolicies`: (Optional) Specific retention policies for containers.
  - `namespace`: Namespace of the container.
  - `pod`: Pod name of the container.
  - `container`: Container name.
  - `maxCheckpoints`: Maximum number of checkpoints for the container.
- `podPolicies`: (Optional) Specific retention policies for pods.
  - `namespace`: Namespace of the pod.
  - `pod`: Pod name.
  - `maxCheckpoints`: Maximum number of checkpoints for the pod.
- `namespacePolicies`: (Optional) Specific retention policies for namespaces.
  - `namespace`: Namespace name.
  - `maxCheckpoints`: Maximum number of checkpoints for the namespace.

## Policy Hierarchy and Specificity

The CheckpointRestoreOperator uses a hierarchical approach to apply retention policies. Policies can be defined at different levels of specificity:

1. **Global Policy:** Applies to all namespaces, pods, and containers if no more specific policy is defined.
2. **Namespace Policy:** Applies to all pods and containers within a specific namespace.
3. **Pod Policy:** Applies to all containers within a specific pod.
4. **Container Policy:** Applies to a specific container within a specific pod and namespace.

### Policy Application

- **Global Policy:** If no other policies are defined, the global policy will be applied. In the example above, the global policy limits checkpoints to 50 per namespace, 30 per pod, and 10 per container.
- **Namespace Policy:** If a namespace policy is defined, it overrides the global policy for that specific namespace.
- **Pod Policy:** If a pod policy is defined, it overrides both the namespace and global policies for that specific pod.
- **Container Policy:** If a container policy is defined, it is the most specific and overrides pod, namespace, and global policies for that specific container.

### Example

If a pod has a defined pod policy, but one of its containers has a defined container policy, the container policy will take precedence for that container. The pod policy will apply to the remaining containers within the pod.
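
The precedence rules above can be sketched in a few lines of Python. This is only an illustration of "most specific match wins", not the operator's actual code: the `select_policy` helper, the dictionary layout, and the `default`/`web`/`nginx` names are all assumptions made for the example. How the selected limit is then counted (per container, per pod, or per namespace) is deliberately left out, since that detail is not spelled out here.

```python
from typing import Dict, Tuple

# Hypothetical in-memory mirror of the `spec` fields from the sample YAML,
# with the optional policies filled in using made-up names.
spec = {
    "globalPolicy": {
        "maxCheckpointsPerNamespace": 50,
        "maxCheckpointsPerPod": 30,
        "maxCheckpointsPerContainer": 10,
    },
    "containerPolicies": [
        {"namespace": "default", "pod": "web", "container": "nginx", "maxCheckpoints": 5},
    ],
    "podPolicies": [
        {"namespace": "default", "pod": "web", "maxCheckpoints": 10},
    ],
    "namespacePolicies": [
        {"namespace": "default", "maxCheckpoints": 15},
    ],
}


def select_policy(spec: Dict, namespace: str, pod: str, container: str) -> Tuple[str, Dict]:
    """Return (level, policy) for the most specific policy matching a container."""
    # Ordered from most specific to least specific; the first match wins.
    levels = [
        ("container", "containerPolicies",
         {"namespace": namespace, "pod": pod, "container": container}),
        ("pod", "podPolicies", {"namespace": namespace, "pod": pod}),
        ("namespace", "namespacePolicies", {"namespace": namespace}),
    ]
    for level, field, wanted in levels:
        for policy in spec.get(field) or []:
            if all(policy.get(key) == value for key, value in wanted.items()):
                return level, policy
    # Nothing more specific matched, so fall back to the global policy.
    return "global", spec["globalPolicy"]


print(select_policy(spec, "default", "web", "nginx")[0])    # container
print(select_policy(spec, "default", "web", "sidecar")[0])  # pod
print(select_policy(spec, "default", "db", "postgres")[0])  # namespace
print(select_policy(spec, "other", "db", "postgres")[0])    # global
```

In this sketch the `nginx` container is governed by its own container policy, while the hypothetical `sidecar` container in the same pod falls back to the pod policy, matching the scenario described in this section.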

## Checkpoint Deletion Mechanism

When deleting checkpoints, the CheckpointRestoreOperator always **removes the oldest** checkpoints first. This ensures that the most recent checkpoints are retained, so the latest state of the resource can be restored if needed.
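
As a rough illustration of oldest-first removal, the following Python sketch splits a list of checkpoints into those kept and those removed for a given limit. The `split_by_retention` helper, the file names, and the use of a numeric creation time are assumptions for the example; how the operator itself determines a checkpoint's age is not described here.

```python
from typing import List, Tuple

Checkpoint = Tuple[str, float]  # (hypothetical archive name, creation time)


def split_by_retention(
    checkpoints: List[Checkpoint], max_checkpoints: int
) -> Tuple[List[Checkpoint], List[Checkpoint]]:
    """Return (kept, removed); the oldest checkpoints are the ones removed."""
    ordered = sorted(checkpoints, key=lambda item: item[1])  # oldest first
    excess = max(len(ordered) - max_checkpoints, 0)          # how many exceed the limit
    return ordered[excess:], ordered[:excess]


checkpoints = [("c.tar", 300.0), ("a.tar", 100.0), ("b.tar", 200.0)]
kept, removed = split_by_retention(checkpoints, max_checkpoints=2)
print(kept)     # [('b.tar', 200.0), ('c.tar', 300.0)]
print(removed)  # [('a.tar', 100.0)]
```

With a limit of 2 and three checkpoints, only the oldest one is removed, so the two most recent checkpoints remain available for restore.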

## Conclusion

By leveraging the hierarchical policy system, users can fine-tune checkpoint retention to meet their specific needs. More specific policies will always override less specific ones, ensuring that the most granular control is applied where needed.

For more details, refer to the sample configuration file and experiment with different policy combinations to see how they interact within your Kubernetes cluster.
