Implement Configurable Retention Period for Delta Snapshots #640

`doc/usage/garbage_collection.md`: new file, 29 additions, 0 deletions
# Garbage Collection (GC) Feature

The etcd-backup-restore project includes a Garbage Collection (GC) feature that manages storage space by deleting older backups according to a configurable policy. The [`RunGarbageCollector`](pkg/snapshot/snapshotter/garbagecollector.go) function drives this process: it identifies older backups that are no longer needed and removes them according to the rules below.

## GC Policies

Two garbage collection policies are available. Each is selected and tuned with command-line flags:

1. **Exponential Policy**: This policy keeps recent full snapshots at fine granularity and retains progressively fewer of them as they age. Enable it with `--garbage-collection-policy='Exponential'`. Under this policy, garbage collection proceeds as follows:

- The most recent full snapshot and its associated delta snapshots are always retained, irrespective of the `delta-snapshot-retention-period` setting, because they are needed to restore the latest state.
- All delta snapshots that fall within the `delta-snapshot-retention-period` are preserved.
- All full snapshots taken within the current hour are retained.
- For the past 24 hours, the most recent full snapshot from each hour is kept.
- For the past week (up to 7 days), the most recent full snapshot from each day is kept.
- For the past month (up to 4 weeks), the most recent full snapshot from each week is kept.
- Full snapshots older than 5 weeks are discarded.

2. **Limit-Based Policy**: This policy keeps the number of full snapshots within a configured limit, retaining the most recent ones and deleting the oldest. Enable it with `--garbage-collection-policy='LimitBased'` and set the limit with `--max-backups` (for example, `--max-backups=10`). Under this policy, garbage collection proceeds as follows:

- The most recent full snapshot and its associated delta snapshots are always retained, regardless of the `delta-snapshot-retention-period` setting. This is essential for potential data recovery.
- All delta snapshots that fall within the `delta-snapshot-retention-period` are preserved.
- Full snapshots are retained up to the configured limit. Any full snapshots beyond this limit, starting with the oldest, are removed.

## Retention Period for Delta Snapshots

The `delta-snapshot-retention-period` setting controls how long delta snapshots are kept. Delta snapshots older than this period are deleted, except for those belonging to the most recent full snapshot, which are always retained to ensure data safety. The default value is 0, which makes every such older delta snapshot eligible for deletion as soon as garbage collection runs.

> **Note**: Under both policies, garbage collection lists the snapshots, identifies those that meet the deletion criteria, and then removes them. Deleting a snapshot also removes its associated chunks, the parts that together make up a larger snapshot.
`pkg/snapshot/snapshotter/garbagecollector.go`: 34 additions, 20 deletions
```diff
@@ -83,7 +83,7 @@ func (ssr *Snapshotter) RunGarbageCollector(stopCh <-chan struct{}) {
 	nextSnap := snapList[snapStreamIndexList[snapStreamIndex-1]]
 
 	// garbage collect delta snapshots.
-	deletedSnap, err := ssr.garbageCollectDeltaSnapshots(snapList[snapStreamIndexList[snapStreamIndex-1]:snapStreamIndexList[snapStreamIndex]])
+	deletedSnap, err := ssr.GarbageCollectDeltaSnapshots(snapList[snapStreamIndexList[snapStreamIndex-1]:snapStreamIndexList[snapStreamIndex]])
 	total += deletedSnap
 	if err != nil {
 		continue
```
```diff
@@ -150,7 +150,7 @@ func (ssr *Snapshotter) RunGarbageCollector(stopCh <-chan struct{}) {
 // Delete delta snapshots in all snapStream but the latest one.
 // Delete all snapshots beyond limit set by ssr.maxBackups.
 for snapStreamIndex := 0; snapStreamIndex < len(snapStreamIndexList)-1; snapStreamIndex++ {
-	deletedSnap, err := ssr.garbageCollectDeltaSnapshots(snapList[snapStreamIndexList[snapStreamIndex]:snapStreamIndexList[snapStreamIndex+1]])
+	deletedSnap, err := ssr.GarbageCollectDeltaSnapshots(snapList[snapStreamIndexList[snapStreamIndex]:snapStreamIndexList[snapStreamIndex+1]])
 	total += deletedSnap
 	if err != nil {
 		continue
```
```diff
@@ -213,24 +213,38 @@ func garbageCollectChunks(store brtypes.SnapStore, snapList brtypes.SnapList, lo
 	}
 }
 
-// garbageCollectDeltaSnapshots deletes only the delta snapshots from revision sorted <snapStream>. It won't delete the full snapshot
-// in snapstream which supposed to be at index 0 in <snapStream>.
-func (ssr *Snapshotter) garbageCollectDeltaSnapshots(snapStream brtypes.SnapList) (int, error) {
-	total := 0
-	for i := len(snapStream) - 1; i > 0; i-- {
-		if (*snapStream[i]).Kind != brtypes.SnapshotKindDelta {
-			continue
-		}
-		snapPath := path.Join(snapStream[i].SnapDir, snapStream[i].SnapName)
-		ssr.logger.Infof("GC: Deleting old delta snapshot: %s", snapPath)
-		if err := ssr.store.Delete(*snapStream[i]); err != nil {
-			ssr.logger.Warnf("GC: Failed to delete snapshot %s: %v", snapPath, err)
-			metrics.SnapshotterOperationFailure.With(prometheus.Labels{metrics.LabelError: err.Error()}).Inc()
-			metrics.GCSnapshotCounter.With(prometheus.Labels{metrics.LabelKind: brtypes.SnapshotKindDelta, metrics.LabelSucceeded: metrics.ValueSucceededFalse}).Inc()
-			return total, err
+/*
+GarbageCollectDeltaSnapshots traverses the list of snapshots and removes delta snapshots that are older than the retention period specified in the Snapshotter's configuration.
+
+Parameters:
+
+	snapStream brtypes.SnapList - List of snapshots to perform garbage collection on.
+
+Returns:
+
+	int - Total number of delta snapshots deleted.
+	error - Error information, if any error occurred during the garbage collection. Returns 'nil' if operation is successful.
+*/
+func (ssr *Snapshotter) GarbageCollectDeltaSnapshots(snapStream brtypes.SnapList) (int, error) {
+	totalDeleted := 0
+	cutoffTime := time.Now().UTC().Add(-ssr.config.DeltaSnapshotRetentionPeriod.Duration)
+	for i := len(snapStream) - 1; i >= 0; i-- {
+		if (*snapStream[i]).Kind == brtypes.SnapshotKindDelta && snapStream[i].CreatedOn.Before(cutoffTime) {
+			snapPath := path.Join(snapStream[i].SnapDir, snapStream[i].SnapName)
+			ssr.logger.Infof("GC: Deleting old delta snapshot: %s", snapPath)
+
+			if err := ssr.store.Delete(*snapStream[i]); err != nil {
+				ssr.logger.Warnf("GC: Failed to delete snapshot %s: %v", snapPath, err)
+				metrics.SnapshotterOperationFailure.With(prometheus.Labels{metrics.LabelError: err.Error()}).Inc()
+				metrics.GCSnapshotCounter.With(prometheus.Labels{metrics.LabelKind: brtypes.SnapshotKindDelta, metrics.LabelSucceeded: metrics.ValueSucceededFalse}).Inc()
+
+				return totalDeleted, err
 			}
+
+			metrics.GCSnapshotCounter.With(prometheus.Labels{metrics.LabelKind: brtypes.SnapshotKindDelta, metrics.LabelSucceeded: metrics.ValueSucceededTrue}).Inc()
+			totalDeleted++
+		}
-		metrics.GCSnapshotCounter.With(prometheus.Labels{metrics.LabelKind: brtypes.SnapshotKindDelta, metrics.LabelSucceeded: metrics.ValueSucceededTrue}).Inc()
-		total++
 	}
-	return total, nil
+
+	return totalDeleted, nil
 }
```