Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(report): Replacing source_location in github report when scanning an image #5999

Merged
merged 8 commits into from
Feb 22, 2024

Conversation

Maxim-Durand
Copy link
Contributor

@Maxim-Durand Maxim-Durand commented Jan 24, 2024

Description

Relates to #5998

Fixes #6008

The goal is to change the source_location of the manifest in SBOM report so it provides useful information when displayed in Github Dependency.

The current solution is to use <image_registry>/<image_name>:<tag>@sha256:<image_hash> format.

Before changes

As you can see instead of specifying the image scanned, it only show Python.
Screenshot_20240120_174723

After changes

Screenshot_20240120_174554

The report now shows the image name and tag as the source_location for all packages.

"manifests": {
    "Python": {
      "name": "python-pkg",
      "file": {
        "source_location": "redacted_registry/accesspod:latest"
      },

Checklist

  • I've read the guidelines for contributing to this repository.
  • I've followed the conventions in the PR title.
  • I've added tests that prove my fix is effective or that my feature works.
  • I've updated the documentation with the relevant information (if needed).
  • I've added usage information (if the PR introduces new options)
  • I've included a "before" and "after" example to the description (if the PR is a user interface change).

SrcLocation: result.Target,
if report.ArtifactType == "container_image" {
// Replacing `source_location` by the image name and tag
image_reference := strings.Join(report.Metadata.RepoTags, ", ")
Copy link
Contributor Author

@Maxim-Durand Maxim-Durand Jan 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using report.ArtifactName instead of report.Metadata.RepoTags would miss the image tag.

Copy link

@RichardoC RichardoC Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use "RepoDigests" instead, as that included both the tag and digest? That would address the comment I just put on the PR discussion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm using RepoDigests would work I did some testing and it seems to be working fine.

Each time you scan your image with a new sha it will close all the previously open alerts for that image and only keep the latest version open.
But you're still able to see the previously closed alerts by simply using the is:closed filter and searching for a specific manifest works fine too.

Copy link
Contributor Author

@Maxim-Durand Maxim-Durand Feb 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A mix of suggestion at #5999 (comment) and yours would give us the longest form possible storing all information:

<image_registry>/<image_name>:<tag>@sha256:<image_hash> - <filePath_to_package>

Screenshot_20240202_094700

I tested it and although it's very long, filtering seems to work too (the image_hash is fake).

Important

One big drawback of using DmitriyLewen@8431664 is that since each vulnerable package has its own source_location set using its FilePath, it means there is no more grouping in Github.
So, if you scan an image with 10 vulnerable packages, Github will show 10 different manifest instead of 1.
To me this defeats the purpose, and that's why I would advise to only use <image_registry>/<image_name>:<tag>@sha256:<image_hash>.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, showing the package location (while having those disadvantages) will help with investigations about what actually needs to be fixed/updated

Copy link
Contributor

@DmitriyLewen DmitriyLewen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @Maxim-Durand
Thanks for your work!

I left some comments. Take a look, please.

Can you also update tests, please

@@ -105,8 +105,17 @@ func (w Writer) Write(ctx context.Context, report types.Report) error {
manifest.Name = string(result.Type)
// show path for language-specific packages only
if result.Class == types.ClassLangPkg {
manifest.File = &File{
SrcLocation: result.Target,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We aggregate some packages to 1 result -

func aggregate(detail *ftypes.ArtifactDetail) {
var apps []ftypes.Application
aggregatedApps := map[ftypes.LangType]*ftypes.Application{
ftypes.PythonPkg: {Type: ftypes.PythonPkg},
ftypes.CondaPkg: {Type: ftypes.CondaPkg},
ftypes.GemSpec: {Type: ftypes.GemSpec},
ftypes.NodePkg: {Type: ftypes.NodePkg},
ftypes.Jar: {Type: ftypes.Jar},
}
for _, app := range detail.Applications {
a, ok := aggregatedApps[app.Type]
if !ok {
apps = append(apps, app)
continue
}
a.Libraries = append(a.Libraries, app.Libraries...)
}
for _, app := range aggregatedApps {
if len(app.Libraries) > 0 {
apps = append(apps, *app)
}
}
// Overwrite Applications
detail.Applications = apps
}

But we can find path from Results.Packages.FilePath:

{
      "Target": "Python",
      "Class": "lang-pkgs",
      "Type": "python-pkg",
      "Packages": [
        {
          "Name": "pip",
          "Version": "23.2.1",
          "Licenses": [
            "MIT"
          ],
          "Layer": {
            "DiffID": "sha256:a7d483b6066e2ed6c9cd1d5890a93326e3f192ef6e537c046296053fa547a844"
          },
          "FilePath": "usr/local/lib/python3.12/site-packages/pip-23.2.1.dist-info/METADATA"
        },

Maybe we just use FilePath for these packages?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I agree with you it's an interesting change as it would make the manifest more precise.

Nonetheless, I think the full FilePath would be too much to render in Github Dependency and would bloat the UI.
I can do a test with that change if you want and show the results here ?

Otherwise we could try to extract important values from the FilePath and adding them to result.Target, but that seems complex and prone to errors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For other languages we already use filePath (filePath is stored in target):

➜  5999 trivy -d fs -f github .  | jq .manifests
...
{
  "foo/bar/requirements.txt": {
    "name": "pip",
    "file": {
      "source_location": "foo/bar/requirements.txt"
    },
    "resolved": {
      "click": {
        "package_url": "pkg:pypi/[email protected]",
        "relationship": "direct",
        "scope": "runtime"
      }
    }
  }
}

And we didn't get issues about UI.

I can do a test with that change if you want and show the results here ?

Yes please. If you have time, check it out.

Copy link
Contributor Author

@Maxim-Durand Maxim-Durand Jan 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I have a few things to report:

  1. As you pointed out trivy aggregates multiple packages in the same "App" (for instance Python or Node) into one result.
    The issue is that in the sbom format, the source_location is set at the manifest level not at the package level (resolved key in sbom format). Meaning, if we keep the default trivy aggregation then we have to use only one of all the packages FilePath. How to define that value is either arbitrary, or we need some sort of heuristic ?

  2. In terms of UI it isn't as bloated as I expected, so I think you're right and it's ok to use.
    Screenshot_20240126_150104
    The source_location used in the above screenshot is simply the FilePath of the last package in result.Packages so it's not even correct.

  3. In the case trivy is scanning an image I think it's still pretty useless to replace the manifest's source_location with this FilePath value and we should instead reference the image's name and tag.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meaning, if we keep the default trivy aggregation then we have to use only one of all the packages FilePath. How to define that value is either arbitrary, or we need some sort of heuristic ?

We have already seen cases where it is necessary to remove aggregation, but this requires large changes and we are not sure about it.
I created a small function to get filePath for each package - DmitriyLewen@8431664
Can you take a look?

In the case trivy is scanning an image I think it's still pretty useless to replace the manifest's source_location with this FilePath value and we should instead reference the image's name and tag.

This makes sense for UI. But what worries me is case where user only knows that image contains vulnerable package, but doesn't have filePath to find and update that package. It would be great to add filePath somehow.
e.g. we can use <image_name>:<tag> - <filePath> format for manifest map key. Is it possible to see gsbom file after uploading to GitHub?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I can see in Github Dependency UI no, see screenshot below to see how it looks.

Screenshot_20240208_065410

Copy link
Contributor

@DmitriyLewen DmitriyLewen Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So user can only get package metadata when taking snapshot, right?

Copy link
Contributor Author

@Maxim-Durand Maxim-Durand Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they use trivy then there's no issue as the user will now be able to find the file path for each package by parsing the JSON report before sending it to GitHub.

But if they use trivy-action then the report is sent automatically and the user doesn't have access to the original report hence he wouldn't be able to parse it.
Maybe if the user then export it from GitHub, the metadata tag with each package file path will still be there, I'll test that.

Copy link
Contributor Author

@Maxim-Durand Maxim-Durand Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I got from github export to SBOM for the same package as in #5999 (comment):

{
      "SPDXID": "SPDXRef-pip-pillow-10.1.0",
      "name": "pip:pillow",
      "versionInfo": "10.1.0",
      "downloadLocation": "NOASSERTION",
      "filesAnalyzed": false,
      "licenseConcluded": "CAL-1.0",
      "supplier": "NOASSERTION",
      "externalRefs": [
        {
          "referenceCategory": "PACKAGE-MANAGER",
          "referenceLocator": "pkg:pypi/[email protected]",
          "referenceType": "purl"
        }
      ]
    },

As you can see the metadata tag doesn't show up in the exported SBOM.

Important

Conclusion: users of trivy-action would currently have no way of finding out the filePath of each package when using the github format.

As this issue only involves trivy-action and not trivy itself anymore, I think we should continue this discussion once this PR is merged to find a solution for trivy-action users at aquasecurity/trivy-action#286
(I would recommend to add an option for upload as an artifact so that user can easily fetch the report).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created aquasecurity/trivy-action#307 to make sure users of trivy-action will know where to find the missing details from Github Dependency.

@DmitriyLewen and @knqyf263 do you agree with this compromise ?
If so, can you review this PR code ?

pkg/report/github/github.go Outdated Show resolved Hide resolved
@Maxim-Durand

This comment was marked as resolved.

@Maxim-Durand Maxim-Durand changed the title fix(image): Replacing source_location in github report when scanning an image refactor(template): Replacing source_location in github report when scanning an image Jan 26, 2024
@RichardoC
Copy link

This will be very handy, to give an example I do nightly scans of "nightly" containers, and the most recently release of my project kube-audit-rest and it's not possible to tell them apart which means I'm not sure if something's a regression or not

Given aquasecurity/trivy-action#279 (comment) I think it should also include the digest e.g. example.com/my-image:my-tag@sha256-fake-hash so that it's possible to tell which version of an image was scanned, as the image could have been retagged

Below is an example screenshot showing how it's not possible to tell them apart currently
Screenshot 2024-02-01 at 13 27 26

@Maxim-Durand Maxim-Durand changed the title refactor(template): Replacing source_location in github report when scanning an image refactor(report): Replacing source_location in github report when scanning an image Feb 3, 2024
@Maxim-Durand Maxim-Durand marked this pull request as ready for review February 3, 2024 12:27
image_with_hash := strings.Join(report.Metadata.RepoDigests, ", ")
_, image_hash, found := strings.Cut(image_with_hash, "@")
if found {
image_reference += "@" + image_hash

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you doing this rather than just using RepoDigests directly? Mind adding a comment?

Copy link
Contributor Author

@Maxim-Durand Maxim-Durand Feb 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RepoDigests looks like <registry>/<image_name>@sha256:<image_hash>.
Whereas RepoTag looks like looks like <registry>/<image_name>:<image_tag>.

So if only using RepoDigests it means loosing the <image_tag>.

Good question, I added a comment.

Copy link

@RichardoC RichardoC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for fixing this!

Copy link
Contributor

@DmitriyLewen DmitriyLewen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @Maxim-Durand
Sorry for delay.

Left comments. Take a look, please.

pkg/report/github/github_test.go Outdated Show resolved Hide resolved
pkg/report/github/github.go Outdated Show resolved Hide resolved
Copy link
Contributor

@DmitriyLewen DmitriyLewen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Maxim-Duran
Thank you very much for your work, discussion, opinions and patience.

LGTM. I think we choosed the best solution for this problem.

@knqyf263 I approved this PR. Take a look, please.

@knqyf263 knqyf263 added this pull request to the merge queue Feb 22, 2024
Merged via the queue into aquasecurity:main with commit 388f476 Feb 22, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

refactor(template): change source_location in github template for aggregated packages
4 participants