[REQ] Record removed images periodically and easily accessible to operators #929

Open
1 task
ritazh opened this issue Dec 19, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@ritazh
Contributor

ritazh commented Dec 19, 2023

What kind of request is this?

Improvement of existing experience

What is your request or suggestion?

Currently, Eraser records removed images in its logs and exposes the total count of removed images as a metric. If the pod log is gone, this information is gone as well. To make troubleshooting easier for operators, it would be good to consider another mechanism that periodically records the removed images somewhere in the cluster.

For example, record them in the status field of an Eraser custom resource, along with a timestamp. If etcd object size is a concern, we can add a configurable field for the maximum number of recorded removed images.
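
A minimal sketch of what such a bounded record could look like, assuming a hypothetical status type. `RemovalStatus`, `RemovedImage`, `MaxRecords`, and `Record` are illustrative names only and are not part of Eraser's current API:

```go
package main

import "time"

// RemovedImage is a hypothetical status entry; the field names are
// illustrative and do not exist in Eraser today.
type RemovedImage struct {
	Node      string    `json:"node"`
	Image     string    `json:"image"`
	RemovedAt time.Time `json:"removedAt"`
}

// RemovalStatus is a hypothetical status block for an Eraser custom resource.
type RemovalStatus struct {
	// MaxRecords caps the history so the object stays small in etcd.
	// In a real CRD this limit would more likely live in the spec or config.
	MaxRecords int            `json:"maxRecords"`
	Removed    []RemovedImage `json:"removed,omitempty"`
}

// Record appends an entry and drops the oldest entries beyond MaxRecords.
// A MaxRecords of zero or less disables recording entirely.
func (s *RemovalStatus) Record(entry RemovedImage) {
	if s.MaxRecords <= 0 {
		s.Removed = nil
		return
	}
	s.Removed = append(s.Removed, entry)
	if extra := len(s.Removed) - s.MaxRecords; extra > 0 {
		// Copy into a fresh slice so dropped entries can be garbage collected.
		s.Removed = append([]RemovedImage(nil), s.Removed[extra:]...)
	}
}
```

With `MaxRecords` set to 2, recording three images keeps only the two most recent ones, which is the trade-off the configurable limit would make: bounded object size at the cost of older history.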

WDYT?

Are you willing to submit PRs to contribute to this feature request?

  • Yes, I am willing to implement it.
ritazh added the enhancement label Dec 19, 2023
@sozercan
Member

sozercan commented Dec 19, 2023

I want to understand the use case for this.

From past reports, I have seen folks who wanted to know why an image did not get removed, since the results from the Trivy scanner used in Eraser did not match those of another scanning tool. A list of removed images cannot tell this story, even more so if it's constrained to a certain number of results. This information says nothing about non-removed images and is not actionable.

Possible investigations:

  • Containerd logs might be able to provide the list of removed images
  • Might be more actionable to report why a non-running image didn't get removed (for example, no vulns that match the config criteria). Since this is a dynamic operation as vulns change, it might be useful to take a snapshot of this data.

@pmengelbert
Contributor

pmengelbert commented Jan 17, 2024

I think a large part of the problem is that observing Eraser's behavior (or even that it worked) is currently too difficult. IMO the best thing to do is provide a report with all of the relevant information. How we provide that report is unclear, but what I would like to see is:

  • List of images found, grouped by node
  • Scan results:
    • Vulnerable
    • Failed
    • Passed
  • Final list of images received by the remover container
  • Final call to ListImages after removal to show the overall behavior
    • Will reveal failures to remove images (different from image being vulnerable or image failed to scan properly)
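
As a rough sketch of the report described in the list above, assuming a hypothetical per-node structure: `NodeReport`, `ScanOutcome`, and `FailedRemovals` are made-up names for illustration, and nothing here reflects an existing Eraser type. The last method shows how comparing the remover's input with the post-removal image list would surface failed removals:

```go
package main

import "sort"

// ScanOutcome mirrors the three scan results listed above.
type ScanOutcome string

const (
	Vulnerable ScanOutcome = "Vulnerable"
	Failed     ScanOutcome = "Failed"
	Passed     ScanOutcome = "Passed"
)

// NodeReport is a hypothetical per-node report; none of these names
// exist in Eraser today.
type NodeReport struct {
	Node          string
	Found         []string               // images found on the node
	ScanResults   map[string]ScanOutcome // image -> scan outcome
	SentToRemover []string               // final list received by the remover container
	Remaining     []string               // image list obtained after removal
}

// FailedRemovals returns, in sorted order, the images that were sent to
// the remover but are still present after removal.
func (r NodeReport) FailedRemovals() []string {
	remaining := make(map[string]struct{}, len(r.Remaining))
	for _, img := range r.Remaining {
		remaining[img] = struct{}{}
	}
	var failed []string
	for _, img := range r.SentToRemover {
		if _, ok := remaining[img]; ok {
			failed = append(failed, img)
		}
	}
	sort.Strings(failed)
	return failed
}
```

A test could then build the initial state, run Eraser, and assert that `FailedRemovals()` is empty and that `Remaining` matches the expected end result.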

IMO having this information will not only benefit the end user but will also make Eraser more robust as a whole. As a developer, it's unwieldy to get this information (it is currently only available via debug logs, not aggregated in any way, etc.). I'm 90% confident that gathering the above information into a report will reveal bugs we haven't noticed before. There are probably still scanning and removal issues caused by the ImageID vs. Manifest Digest issue that have been overlooked.

Finally, a report will make testing a lot easier. We can set up the initial state and define the exact end-result we require from Eraser.
