
Cleaner Report resource deletion conditions #62

Open
ctrought opened this issue Apr 16, 2024 · 3 comments

@ctrought

Is your feature request related to a problem? Please describe.
I'm looking for a way to expire objects based on a set of predefined conditions that can easily be applied to multiple Cleaner objects.

Describe the solution you'd like
Something native: TTL/expiration options exposed in the CRDs so that the operator could decide when to delete objects. The options that would be nice (IMO) are:

  • TTL/occurrences counter (relative to when the object was added to the report)
  • Duration (relative to when the object was added to the report)
  • Timestamp (absolute)
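type ResourceDeleteConditions struct {
	// Occurrences: delete once the per-resource counter exceeds this value.
	// +kubebuilder:validation:Optional
	Occurrences int `json:"occurrences,omitempty"`

	// Timestamp: delete once the current time is past this absolute time.
	// +kubebuilder:validation:Optional
	Timestamp *metav1.Time `json:"timestamp,omitempty"`

	// Duration: delete once this much time has elapsed since FirstOccurrence.
	// +kubebuilder:validation:Optional
	Duration *metav1.Duration `json:"duration,omitempty"`
}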

TTL/Occurrences Counter: Configure a maximum counter in the Cleaner. Each object written to the Report would have its own counter recorded, which would later be used to determine when the object can be deleted.

On every subsequent Cleaner run, the counter would be incremented by 1 if the resource is still in the report. If a resource's counter exceeds resourceDeleteConditions.occurrences, the object would be deleted.

apiVersion: apps.projectsveltos.io/v1alpha1
kind: Report
metadata:
  name: cleaner-with-report
spec:
  action: Delete
  resourceDeleteConditions:
    occurrences: 30
  resourceInfo:
  - message: '. time: 2024-04-04 13:25:00.621990728 +0000 UTC m=+227862.774595097'
    firstOccurrence: "2024-04-02T00:25:03Z"
    occurrences: 20 
    resource:
      apiVersion: apps/v1
      kind: Deployment
      name: example
      namespace: example

Duration: Record the firstOccurrence timestamp in the resource info. Once the duration has elapsed since firstOccurrence, the object becomes eligible for deletion and is removed on the next run.

apiVersion: apps.projectsveltos.io/v1alpha1
kind: Report
metadata:
  name: cleaner-with-report
spec:
  action: Delete
  resourceDeleteConditions:
    duration: "720h0m0s"
  resourceInfo:
  - message: '. time: 2024-04-04 13:25:00.621990728 +0000 UTC m=+227862.774595097'
    firstOccurrence: "2024-04-02T00:25:03Z"
    occurrences: 20 
    resource:
      apiVersion: apps/v1
      kind: Deployment
      name: example
      namespace: example

Timestamp: Absolute. Once the given timestamp has passed, the objects in the report are deleted.

apiVersion: apps.projectsveltos.io/v1alpha1
kind: Report
metadata:
  name: cleaner-with-report
spec:
  action: Delete
  resourceDeleteConditions:
    timestamp: "2024-04-30T00:00:00Z" # <- if the current time is past this timestamp and the object is still in the report, it should be deleted.
  resourceInfo:
  - message: '. time: 2024-04-03 13:25:00.621990728 +0000 UTC m=+227862.774595097'
    firstOccurrence: "2024-04-02T00:25:03Z"
    occurrences: 20 
    resource:
      apiVersion: apps/v1
      kind: Deployment
      name: example
      namespace: example

In the proposed solution it would also be possible to combine the conditions, in which case every condition that is set must pass for an object in the report to be deleted.

Describe alternatives you've considered
Building the logic into each Cleaner resource and replicating it across multiple Cleaner objects. There are some related examples here, but they are based on object age or predefined annotations on objects, which cover a different set of needs than what I am looking for. Technically, you could probably create a transform Cleaner that applies metadata such as firstOccurrence or a counter to the object, and then use that information in a separate Cleaner to delete resources based on duration or number of occurrences, but that would be much more tedious and error-prone.

I think conditional deletion is probably a fairly common use case. People often want some time to verify whether objects should really be deleted, so they might use Scan first and, once confident, create a new Cleaner or switch the existing one to Delete. However, there is then no guarantee that the objects deleted are the same ones reported in the last run; the next run may match a different set. This would give users an automated way to track and remove resources once specific time-related conditions pass.

Additional context
I was thinking that since the Cleaner Report already tracks objects, it would be relatively simple to store a bit of additional information in the resource list to determine when each object should be deleted.

@gianlucam76
Owner

Thank you for the detailed description @ctrought

I like the timestamp idea as it requires no annotations/labels to be added to any objects:

  • Create a Cleaner instance with timestamp in the ResourceDeleteConditions;
  • Any resource that ends up in a Report will be automatically deleted if the current time is past the timestamp.

I understand the TTL as well:

  • Create a Cleaner instance with TTL in the ResourceDeleteConditions;
  • Any resource that ends up in a Report will be automatically deleted once it has exceeded its specified time-to-live.

I am not sure I get the use case for Occurrences. Or rather, there are scenarios where I can think of using it (e.g. a deployment whose active replicas are 0 while its desired replicas are not, and which stays in that state for many days), but I also think those cases can already be handled: every time an object matches (active replicas == 0 and != replicas), add an annotation or increment it, then have another condition that deletes the object if the annotation value is higher than X.
The reason I am saying this is because Cleaner currently does not keep track of past actions (yes, it generates a report, but it does not pull past reports to make a call on what it should do).
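The annotation-based alternative keeps the counter on the object itself instead of in the Report. In Cleaner this would be expressed through its own matching/transform configuration; the Go below only sketches the counting logic under stated assumptions. The annotation key `example.io/cleaner-match-count` and the helpers `bumpMatchCount` and `overMatchLimit` are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// matchCountAnnotation is a hypothetical annotation key used only in this sketch.
const matchCountAnnotation = "example.io/cleaner-match-count"

// bumpMatchCount runs each time the object matches (e.g. active replicas == 0
// while replicas != 0). It stores the incremented count on the object itself,
// so no Report history is needed. A missing or malformed value restarts at 1.
// annotations must be a non-nil map.
func bumpMatchCount(annotations map[string]string) int {
	n, err := strconv.Atoi(annotations[matchCountAnnotation])
	if err != nil {
		n = 0
	}
	n++
	annotations[matchCountAnnotation] = strconv.Itoa(n)
	return n
}

// overMatchLimit is the separate condition that selects objects for deletion.
func overMatchLimit(annotations map[string]string, limit int) bool {
	n, err := strconv.Atoi(annotations[matchCountAnnotation])
	return err == nil && n > limit
}

func main() {
	ann := map[string]string{}
	for run := 0; run < 3; run++ {
		bumpMatchCount(ann)
	}
	fmt.Println(ann[matchCountAnnotation], overMatchLimit(ann, 2)) // 3 true
}
```

Unlike the Report-based counter, nothing here resets the count when the object stops matching; that would need an extra step removing the annotation.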

So I am in favour of implementing what you suggested but limiting it to Timestamp and TTL. What do you think?

Thank you

@gianlucam76 gianlucam76 self-assigned this Apr 17, 2024
@ctrought
Author

ctrought commented Apr 17, 2024

Hi @gianlucam76, thanks for your feedback! Sorry, I may not have clearly explained the intention behind the proposed conditions for my personal use case.

TTL/Occurrences:
In the context of the original ask, TTL and Occurrences are the same thing; I just wasn't sure what it should be called. Technically speaking, a TTL could be either a counter or a timestamp that limits the life of something, depending on how it's implemented for the specific use case. The delete condition I am interested in would rely on a counter that either increases or decreases by 1 each time the resource is observed in the report (i.e. if decreasing, the object is deleted once the counter hits 0; if increasing from 1, the object is deleted once the counter hits the limit defined in the Cleaner). If a specific resource still exists on the next scheduled run of the Cleaner, its counter is increased. If that run includes new resources, their counters start at 1. If a resource disappears from the scan, it is treated as a new resource if it appears in a future scan.
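The counter lifecycle described above (increment if still present, start at 1 if new, forget if it disappears) can be sketched as a pure function from the previous report's counters to the next ones. Keying resources by a string such as `apiVersion/kind/namespace/name` and the name `nextCounters` are assumptions for illustration only.

```go
package main

import "fmt"

// nextCounters computes per-resource counters for a new scan from the previous
// report's counters. Resources still present are incremented; new ones start
// at 1; resources missing from this scan are dropped, so they start over at 1
// if they reappear later.
func nextCounters(previous map[string]int, matched []string) map[string]int {
	next := make(map[string]int, len(matched))
	for _, key := range matched {
		next[key] = previous[key] + 1 // a missing key (or nil map) reads as 0
	}
	return next
}

func main() {
	run1 := nextCounters(nil, []string{"a", "b"})
	run2 := nextCounters(run1, []string{"a"})      // "b" disappears
	run3 := nextCounters(run2, []string{"a", "b"}) // "b" reappears
	fmt.Println(run3["a"], run3["b"])              // 3 1
}
```

Because only resources in the current scan are copied forward, the "disappeared, then reappeared" reset falls out naturally without any extra bookkeeping.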

Duration:
There are two ways one could look at this one, I guess. It could be interpreted as the duration of the object itself based on its creation date (i.e. how long it is allowed to exist), or as the duration the object has been in the report (i.e. relative to the time the object first appeared in a scan). I think you were probably thinking of the former (lifespan of the object) while I was originally thinking of the latter (duration of the object in the Cleaner Report). I think both are valid use cases for DeleteResourceCondition, though.

The rationale for a duration-in-report condition is similar to that of a TTL/occurrences counter: it lets the resource appear in the report for a period of time before it gets deleted, and that period is always relative to when the resource was first added to the report. The way I see it, this lets you use one Cleaner instance that can run forever to handle the cleanup of objects, while also giving a grace period to spot and correct any resources one may not wish to be removed.

The reason I am saying this is because Cleaner currently does not keep track of past actions (yes, it generates a report, but it does not pull past reports to make a call on what it should do).

Yep, makes sense; it does not reuse data from an old report today. The proposal for the TTL/occurrences (counter) and duration/age conditions might require new fields in the Report's ResourceInfo, namely:

  • occurrences (counter of the number of times resource appears in report)
  • firstOccurrence (the timestamp the resource first appeared in the report)
type ResourceInfo struct {
	// Resource identifies a Kubernetes resource
	Resource corev1.ObjectReference `json:"resource,omitempty"`

	// FullResource contains the full resource before
	// Cleaner took an action on it
	// +optional
	FullResource []byte `json:"fullResource,omitempty"`

	// Message is an optional field.
	// +optional
	Message string `json:"message,omitempty"`

	// Occurrences counts how many consecutive runs have reported this resource.
	// +kubebuilder:validation:Optional
	Occurrences int `json:"occurrences,omitempty"`

	// FirstOccurrence is when this resource first appeared in the report.
	// +kubebuilder:validation:Optional
	FirstOccurrence *metav1.Time `json:"firstOccurrence,omitempty"`
}

This would let Cleaner keep a basic history of each resource in the report. It would also require a small change to how the report is generated (hopefully not too big). Rather than overwriting the old report entirely, generateReportSpec could look up the existing report (if there is one) and check whether each resource in the new report is also in the existing one. For any resource that is, we copy the TTL counter/occurrences number and increment it by one (or decrement it, whichever method is preferred), and also retain the FirstOccurrence timestamp.

The naming of the conditions is important to remove any ambiguity, so that users know the difference between a TTL counter and a TTL timestamp, etc. Hopefully I am not misinterpreting your own ideas, but in summary I think there could be 4 different conditions:

  1. TTL Timestamp (ie. Delete all resources after X timestamp)
  2. TTL Counter (ie. Delete all resources once the TTL counter per resource exceeds the defined threshold)
  3. Object age in cluster
  4. Object age in report

Conditions 2 and 4 are dependent on metadata that would need to be tracked in the Cleaner Report. The rough idea would be something like below. The exact name of metadata fields may depend on what the DeleteResourceConditions are named.

// existingReport may be nil if no Report has been generated yet.
// getResourceInfoFromReport is a helper still to be written; it returns a
// zero-value ResourceInfo when the resource is not in existingReport.
func generateReportSpec(resources []ResourceResult, cleaner *appsv1alpha1.Cleaner, existingReport *appsv1alpha1.Report) *appsv1alpha1.ReportSpec {
	reportSpec := appsv1alpha1.ReportSpec{}
	reportSpec.Action = cleaner.Spec.Action
	now := time.Now() // named "now" to avoid shadowing the time package
	message := fmt.Sprintf(". time: %v", now)

	reportSpec.ResourceInfo = make([]appsv1alpha1.ResourceInfo, len(resources))
	for i := range resources {
		// A resource seen for the first time starts with counter 1 and the current time.
		firstOccurrence := metav1.NewTime(now)
		resourceInfo := appsv1alpha1.ResourceInfo{
			Resource: corev1.ObjectReference{
				Namespace:  resources[i].Resource.GetNamespace(),
				Name:       resources[i].Resource.GetName(),
				Kind:       resources[i].Resource.GetKind(),
				APIVersion: resources[i].Resource.GetAPIVersion(),
			},
			Message:         resources[i].Message + message,
			FirstOccurrence: &firstOccurrence, // field is *metav1.Time
			Occurrences:     1,
		}

		// retain metadata for the resource if it exists in the existing report
		if existingReport != nil && len(existingReport.Spec.ResourceInfo) > 0 {
			existingResource := getResourceInfoFromReport(resourceInfo.Resource, existingReport)
			if existingResource.Resource.Name != "" {
				// reports written before this change have no FirstOccurrence
				if existingResource.FirstOccurrence != nil {
					resourceInfo.FirstOccurrence = existingResource.FirstOccurrence
				}
				resourceInfo.Occurrences = existingResource.Occurrences + 1
			}
		}

		reportSpec.ResourceInfo[i] = resourceInfo
	}

	return &reportSpec
}

The updated report in-cluster would then have the necessary metadata saved for the DeleteResourceConditions to be handled. This also means these 2 DeleteResourceConditions require a Report to be generated so it can be referred to later on.

I can help draft a PR if you'd like, but let me know your thoughts on everything above.

@gianlucam76
Owner

gianlucam76 commented Apr 18, 2024

Thank you @ctrought I get it now. Thanks for the explanation.

I am fine with having this now. I actually really like it. And if you can help take care of it, that would be fantastic.

I only have one comment. Right now, a Report is generated only if the notification type asks for it. With this proposal, we would need a Report irrespective of that, for instance if the occurrence limit is set in Cleaner.Spec.

I am not saying we should add an extra CRD for this (it would be duplication). We should probably set a Failure in the Cleaner instance if a Report is required by the Cleaner configuration but NotificationType is not set to generate a Report.

We could also always generate the Report when needed, but I am worried that many Report instances might then be generated without the user knowing, which is probably not OK.
