Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(report): Replacing source_location in github report when scanning an image #5999

Merged
merged 8 commits into from
Feb 22, 2024
25 changes: 23 additions & 2 deletions pkg/report/github/github.go
Original file line number Diff line number Diff line change
Expand Up @@ -105,8 +105,25 @@ func (w Writer) Write(ctx context.Context, report types.Report) error {
manifest.Name = string(result.Type)
// show path for language-specific packages only
if result.Class == types.ClassLangPkg {
manifest.File = &File{
SrcLocation: result.Target,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We aggregate some packages to 1 result -

func aggregate(detail *ftypes.ArtifactDetail) {
var apps []ftypes.Application
aggregatedApps := map[ftypes.LangType]*ftypes.Application{
ftypes.PythonPkg: {Type: ftypes.PythonPkg},
ftypes.CondaPkg: {Type: ftypes.CondaPkg},
ftypes.GemSpec: {Type: ftypes.GemSpec},
ftypes.NodePkg: {Type: ftypes.NodePkg},
ftypes.Jar: {Type: ftypes.Jar},
}
for _, app := range detail.Applications {
a, ok := aggregatedApps[app.Type]
if !ok {
apps = append(apps, app)
continue
}
a.Libraries = append(a.Libraries, app.Libraries...)
}
for _, app := range aggregatedApps {
if len(app.Libraries) > 0 {
apps = append(apps, *app)
}
}
// Overwrite Applications
detail.Applications = apps
}

But we can find path from Results.Packages.FilePath:

{
      "Target": "Python",
      "Class": "lang-pkgs",
      "Type": "python-pkg",
      "Packages": [
        {
          "Name": "pip",
          "Version": "23.2.1",
          "Licenses": [
            "MIT"
          ],
          "Layer": {
            "DiffID": "sha256:a7d483b6066e2ed6c9cd1d5890a93326e3f192ef6e537c046296053fa547a844"
          },
          "FilePath": "usr/local/lib/python3.12/site-packages/pip-23.2.1.dist-info/METADATA"
        },

Maybe we just use FilePath for these packages?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I agree with you it's an interesting change as it would make the manifest more precise.

Nonetheless, I think the full FilePath would be too much to render in Github Dependency and would bloat the UI.
I can do a test with that change if you want and show the results here ?

Otherwise we could try to extract important values from the FilePath and adding them to result.Target, but that seems complex and prone to errors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For other languages we already use filePath (filePath is stored in target):

➜  5999 trivy -d fs -f github .  | jq .manifests
...
{
  "foo/bar/requirements.txt": {
    "name": "pip",
    "file": {
      "source_location": "foo/bar/requirements.txt"
    },
    "resolved": {
      "click": {
        "package_url": "pkg:pypi/[email protected]",
        "relationship": "direct",
        "scope": "runtime"
      }
    }
  }
}

And we didn't get issues about UI.

I can do a test with that change if you want and show the results here ?

Yes please. If you have time, check it out.

Copy link
Contributor Author

@Maxim-Durand Maxim-Durand Jan 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I have a few things to report:

  1. As you pointed out trivy aggregates multiple packages in the same "App" (for instance Python or Node) into one result.
    The issue is that in the sbom format, the source_location is set at the manifest level not at the package level (resolved key in sbom format). Meaning, if we keep the default trivy aggregation then we have to use only one of all the packages FilePath. How to define that value is either arbitrary, or we need some sort of heuristic ?

  2. In terms of UI it isn't as bloated as I expected, so I think you're right and it's ok to use.
    Screenshot_20240126_150104
    The source_location used in the above screenshot is simply the FilePath of the last package in result.Packages so it's not even correct.

  3. In the case trivy is scanning an image I think it's still pretty useless to replace the manifest's source_location with this FilePath value and we should instead reference the image's name and tag.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meaning, if we keep the default trivy aggregation then we have to use only one of all the packages FilePath. How to define that value is either arbitrary, or we need some sort of heuristic ?

We have already seen cases where it is necessary to remove aggregation, but this requires large changes and we are not sure about it.
I created a small function to get filePath for each package - DmitriyLewen@8431664
Can you take a look?

In the case trivy is scanning an image I think it's still pretty useless to replace the manifest's source_location with this FilePath value and we should instead reference the image's name and tag.

This makes sense for UI. But what worries me is case where user only knows that image contains vulnerable package, but doesn't have filePath to find and update that package. It would be great to add filePath somehow.
e.g. we can use <image_name>:<tag> - <filePath> format for manifest map key. Is it possible to see gsbom file after uploading to GitHub?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I can see in Github Dependency UI no, see screenshot below to see how it looks.

Screenshot_20240208_065410

Copy link
Contributor

@DmitriyLewen DmitriyLewen Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So user can only get package metadata when taking snapshot, right?

Copy link
Contributor Author

@Maxim-Durand Maxim-Durand Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they use trivy then there's no issue as the user will now be able to find the file path for each package by parsing the JSON report before sending it to GitHub.

But if they use trivy-action then the report is sent automatically and the user doesn't have access to the original report hence he wouldn't be able to parse it.
Maybe if the user then export it from GitHub, the metadata tag with each package file path will still be there, I'll test that.

Copy link
Contributor Author

@Maxim-Durand Maxim-Durand Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I got from github export to SBOM for the same package as in #5999 (comment):

{
      "SPDXID": "SPDXRef-pip-pillow-10.1.0",
      "name": "pip:pillow",
      "versionInfo": "10.1.0",
      "downloadLocation": "NOASSERTION",
      "filesAnalyzed": false,
      "licenseConcluded": "CAL-1.0",
      "supplier": "NOASSERTION",
      "externalRefs": [
        {
          "referenceCategory": "PACKAGE-MANAGER",
          "referenceLocator": "pkg:pypi/[email protected]",
          "referenceType": "purl"
        }
      ]
    },

As you can see the metadata tag doesn't show up in the exported SBOM.

Important

Conclusion: users of trivy-action would currently have no way of finding out the filePath of each package when using the github format.

As this issue only involves trivy-action and not trivy itself anymore, I think we should continue this discussion once this PR is merged to find a solution for trivy-action users at aquasecurity/trivy-action#286
(I would recommend to add an option for upload as an artifact so that user can easily fetch the report).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created aquasecurity/trivy-action#307 to make sure users of trivy-action will know where to find the missing details from Github Dependency.

@DmitriyLewen and @knqyf263 do you agree with this compromise ?
If so, can you review this PR code ?

if report.ArtifactType == ftypes.ArtifactContainerImage {
// `RepoDigests` ~= <registry>/<image_name>@sha256:<image_hash>
// `RepoTag` ~= <registry>/<image_name>:<image_tag>
// By concatenating the hash from `RepoDigests` at the end of `RepoTag` we get all the information
image_reference := strings.Join(report.Metadata.RepoTags, ", ")
Copy link
Contributor Author

@Maxim-Durand Maxim-Durand Jan 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using report.ArtifactName instead of report.Metadata.RepoTags would miss the image tag.

Copy link

@RichardoC RichardoC Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use "RepoDigests" instead, as that included both the tag and digest? That would address the comment I just put on the PR discussion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm using RepoDigests would work I did some testing and it seems to be working fine.

Each time you scan your image with a new sha it will close all the previously open alerts for that image and only keep the latest version open.
But you're still able to see the previously closed alerts by simply using the is:closed filter and searching for a specific manifest works fine too.

Copy link
Contributor Author

@Maxim-Durand Maxim-Durand Feb 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A mix of suggestion at #5999 (comment) and yours would give us the longest form possible storing all information:

<image_registry>/<image_name>:<tag>@sha256:<image_hash> - <filePath_to_package>

Screenshot_20240202_094700

I tested it and although it's very long, filtering seems to work too (the image_hash is fake).

Important

One big drawback of using DmitriyLewen@8431664 is that since each vulnerable package has its own source_location set using its FilePath, it means there is no more grouping in Github.
So, if you scan an image with 10 vulnerable packages, Github will show 10 different manifest instead of 1.
To me this defeats the purpose, and that's why I would advise to only use <image_registry>/<image_name>:<tag>@sha256:<image_hash>.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, showing the package location (while having those disadvantages) will help with investigations about what actually needs to be fixed/updated

DmitriyLewen marked this conversation as resolved.
Show resolved Hide resolved
image_with_hash := strings.Join(report.Metadata.RepoDigests, ", ")
_, image_hash, found := strings.Cut(image_with_hash, "@")
if found {
image_reference += "@" + image_hash

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you doing this rather than just using RepoDigests directly? Mind adding a comment?

Copy link
Contributor Author

@Maxim-Durand Maxim-Durand Feb 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RepoDigests looks like <registry>/<image_name>@sha256:<image_hash>.
Whereas RepoTag looks like looks like <registry>/<image_name>:<image_tag>.

So if only using RepoDigests it means loosing the <image_tag>.

Good question, I added a comment.

}
// Replacing `source_location` in manifest by the image name, tag and hash
manifest.File = &File{
SrcLocation: image_reference,
}

} else {
manifest.File = &File{
SrcLocation: result.Target,
}
}
}

Expand All @@ -123,6 +140,10 @@ func (w Writer) Write(ctx context.Context, report types.Report) error {
return xerrors.Errorf("unable to build purl for %s: %w", pkg.Name, err)
}

if pkg.FilePath != "" {
githubPkg.Metadata = Metadata{"source_location": pkg.FilePath}
}

resolved[pkg.Name] = githubPkg
}

Expand Down
67 changes: 67 additions & 0 deletions pkg/report/github/github_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -164,6 +164,73 @@ func TestWriter_Write(t *testing.T) {
},
},
},
{
name: "pypi",
Maxim-Durand marked this conversation as resolved.
Show resolved Hide resolved
report: types.Report{
SchemaVersion: 2,
ArtifactName: "fake_repo.azurecr.io/image_name",
ArtifactType: "container_image",
Metadata: types.Metadata{
RepoDigests: []string{"fake_repo.azurecr.io/image_name@sha256:a7c92cdcb3d010f6edeb37ddcdbacab14981aa31e7f1140e0097dc1b8e834c49"},
RepoTags: []string{"fake_repo.azurecr.io/image_name:latest"},
},
Results: types.Results{
{
Target: "Python",
Class: "lang-pkgs",
Type: "python-pkg",
Packages: []ftypes.Package{
{
Name: "jwcrypto",
Version: "0.7",
Licenses: []string{
"LGPLv3+",
},
Layer: ftypes.Layer{
Digest: "sha256:ddc612ba4e74ea5633a93e19e7c32f61f5f230073b21a070302a61ef5eec5c50",
DiffID: "sha256:12935ef6ce21a266aef8df75d601cebf7e935edd01e9f19fab16ccb78fbb9a5e",
},
FilePath: "opt/pyenv/versions/3.11.2/lib/python3.11/site-packages/jwcrypto-0.7.dist-info/METADATA",
},
{
Name: "matplotlib",
Version: "3.5.3",
Licenses: []string{
"PSF",
},
Layer: ftypes.Layer{
Digest: "sha256:ddc612ba4e74ea5633a93e19e7c32f61f5f230073b21a070302a61ef5eec5c50",
DiffID: "sha256:12935ef6ce21a266aef8df75d601cebf7e935edd01e9f19fab16ccb78fbb9a5e",
},
FilePath: "opt/pyenv/versions/3.11.2/lib/python3.11/site-packages/matplotlib-3.5.3.dist-info/METADATA",
},
},
},
},
},
want: map[string]github.Manifest{
"Python": {
Name: "python-pkg",
File: &github.File{
SrcLocation: "fake_repo.azurecr.io/image_name:latest@sha256:a7c92cdcb3d010f6edeb37ddcdbacab14981aa31e7f1140e0097dc1b8e834c49",
},
Resolved: map[string]github.Package{
"jwcrypto": {
PackageUrl: "pkg:pypi/[email protected]",
Relationship: "direct",
Scope: "runtime",
Metadata: github.Metadata{"source_location": "opt/pyenv/versions/3.11.2/lib/python3.11/site-packages/jwcrypto-0.7.dist-info/METADATA"},
},
"matplotlib": {
PackageUrl: "pkg:pypi/[email protected]",
Relationship: "direct",
Scope: "runtime",
Metadata: github.Metadata{"source_location": "opt/pyenv/versions/3.11.2/lib/python3.11/site-packages/matplotlib-3.5.3.dist-info/METADATA"},
},
},
},
},
},
}

for _, tt := range tests {
Expand Down
Loading