Filter out packages that are owned by OS packages (ownership overlap) #1373

wagoodman · 2023-06-29T21:14:29Z

What would you like to be added:
Today we conditionally filter out packages that are owned by other packages before attempting to find matches

grype/grype/pkg/package.go

Lines 104 to 144 in bc93a96

    
           func removePackagesByOverlap(catalog *pkg.Collection, relationships []artifact.Relationship) *pkg.Collection { 
        
           	byOverlap := map[artifact.ID]artifact.Relationship{} 
        
           	for _, r := range relationships { 
        
           		if r.Type == artifact.OwnershipByFileOverlapRelationship { 
        
           			byOverlap[r.To.ID()] = r 
        
           		} 
        
           	} 
        
           	out := pkg.NewCollection() 
        
           	for p := range catalog.Enumerate() { 
        
           		r, ok := byOverlap[p.ID()] 
        
           		if ok { 
        
           			from, ok := r.From.(pkg.Package) 
        
           			if ok && excludePackage(p, from) { 
        
           				continue 
        
           			} 
        
           		} 
        
           		out.Add(p) 
        
           	} 
        
           	return out 
        
           } 
        
           func excludePackage(p pkg.Package, parent pkg.Package) bool { 
        
           	// NOTE: we are not checking the name because we have mismatches like: 
        
           	// python      3.9.2      binary 
        
           	// python3.9   3.9.2-1    deb 
        
           	// If the version is not effectively the same, keep both 
        
           	if !strings.HasPrefix(parent.Version, p.Version) { 
        
           		return false 
        
           	} 
        
           	// filter out only binary pkg, empty types, or equal types 
        
           	if p.Type != pkg.BinaryPkg && p.Type != "" && p.Type != parent.Type { 
        
           		return false 
        
           	} 
        
           	return true 
        
           }

.

Ideally we would remove packages that overlap in ownership with other packages when the parent package is an OS package.

Why is this needed:
This should remove a class of false positives since OS distributions typically manage back ports of fixes and are the source of truth for the logical package, thus should override any vulnerability match claims for the contained packages (e.g. if an RPM contains a python package and we find a vuln for the python package and not the RPM, we should drop this).

Dev notes:

we do something like this today (see the above link) so it would be ideal to stay consistent here: try to filter out the package based on artifact/relationship rules before we do any matching.

kzantow · 2023-06-29T21:17:27Z

This is related to anchore/syft#931

willmurphyscode · 2023-07-20T21:36:34Z

Found an example case where I'm not sure the RPM package should have won over the python package: Language packages that are available from the OS package manager. For example, in the branch where I have this feature partly built, we can see: pkg:rpm/rhel/[email protected]?arch=noarch&upstream=python-six-1.11.0-8.el8.src.rpm&distro=rhel-8.3 is preferred over pkg:pypi/[email protected].

It's likely there because someone ran yum install -y python3-six, but any vulnerabilities reported against the PyPI version of the package should probably be reported by grype, so it's not obvious in this case that preferring the OS package is the right thing to do.

westonsteimel · 2023-07-21T02:36:38Z

Actually these are exactly the cases where we want the os packages to win out by default because rhel backport fixes that PyPI won't know about.

westonsteimel · 2023-07-21T03:32:39Z

Now something which would be even more accurate here would be to somehow deselect PyPI vulns specifically when a RHEL one overrules it, but that gets much more complicated because right now grype only knows how to fetch vulnerabilities that a package may be affected by and that approach would require fetching ones that a package is not affected by. It'd also require looking at the relatedVulnerability data to convert identifiers so that you could compare between say GHSA and CVE. It might be doable though, but probably needs alot more thought

westonsteimel · 2023-07-21T03:55:34Z

Also I don't think we could do that approach today since grypedb might not even have records when something is marked as not affected or withdrawn, and those would be important to capture somewhere if attempting the more advanced approach

willmurphyscode · 2023-07-22T10:35:59Z

Thanks for the additional context @westonsteimel. Based on that additional info, I'll keep trying to get #1387 merged. I may have found an interesting edge case with the quality gate, which I'll discuss on the PR.

willmurphyscode · 2023-08-07T15:03:04Z

@westonsteimel and I discussed this issue this morning, and I think there are basically 2 approaches, which I'll explain with an example issue:

Example situation

syft -q docker.io/anchore/test_images@sha256:c8d664b0e728d52f57eeb98ed1899c16d3b265f02ddfb41303d7a16c31e0b0f1 | grep guava
guava                                    20.0                                            java-archive
guava20                                  20.0-8.module+el8+5161+5cac467c                 rpm

In this case, syft found 2 packages with guava in the name: the JAR guava, and the rpm guava20. They have slightly different versions. The guava20 RPM has a more specific version number, because we have the RPM installed package database to look at. But these are the same package; the Java cataloger found the JAR, and the RPM cataloger found the RPM.

Let's see what grype thinks:

grype -q docker.io/anchore/test_images@sha256:c8d664b0e728d52f57eeb98ed1899c16d3b265f02ddfb41303d7a16c31e0b0f1 | grep guava
guava                                    20.0                                                                                               java-archive  CVE-2023-2976        High
guava                                    20.0                                           24.1.1                                              java-archive  GHSA-mvr2-9pj6-7w5j  Medium
guava                                    20.0                                           32.0.0                                              java-archive  GHSA-7g45-4rm6-3mm3  Medium
guava                                    20.0                                                                                               java-archive  CVE-2018-10237       Medium
guava                                    20.0                                                                                               java-archive  GHSA-5mg8-w23w-74h3  Low
guava                                    20.0                                                                                               java-archive  CVE-2020-8908        Low

We can see that all the vulnerabilities are reported against the JAR, not the RPM. So there are two possibilities: either the RPM has a distro-specific fix that we know about, or the distro doesn't report the package as vulnerable. This brings us to the two approaches:

First approach: remove before matching

In this approach, we remove the JAR guava@20 before matching. Then none of the above vulnerabilities are found. If we believed that the distro's vulnerability feed were exhaustive, this would be a great approach. It's what's suggested earlier in this issue, and it's certainly easier to implement.

However, there are 3 possible situations:

Distro feed had no record - we don't know whether we should have removed the JAR
Distro feed was vulnerable - removing the JAR was fine, since we're report against the RPM, which has more specific version info anyway.
Distro feed had a specific fix information - removing the JAR was correct, since the report against the JAR was a false positive caused by not considering the OS specific fix.

Second possible approach

In the second approach, we try to figure out which of the 3 possible situations we're in, and act accordingly:

Distro feed has no record - leave the vuln reported against the JAR, because a false positive is better than a false negative.
Distro feed shows OS package is vulnerable - remove the match against the JAR, because the vulnerability is redundant
Distro feed shows OS package is fixed - remove the match against the JAR, because it is a known false positive.

I'm going to take some time to see how hard implementing the second approach is.

willmurphyscode · 2023-08-09T10:30:41Z

I've done some research on the second approach, and it requires changing grype a bit.

In order to be able to correctly dedupe in this way, we need a place in the code where we have the following information:

We know whether a given package is owned by an OS package, in other words, the SBOM's relationships are in scope
We know whether the OS package is vulnerable, in other words, matching has taken place
If the OS package is not vulnerable, we know whether it's because distro has no vulnerability record, or because the distro reports the package as not affected or patched. In other words, the vulnerability store is in scope*.

Right now, the only place where all this information is in scope is at

grype/cmd/grype/cli/legacy/root.go

Line 367 in fff4341

    
           remainingMatches, ignoredMatches, err := vulnMatcher.FindMatches(packages, pkgContext)

, which has the matches, the relationships (on the SBOM), and the vulnerability store. However, it seems strange to have additional filtering of matches after the main vulnerabilityMatcher.FindMatches call, so I think the vulnerability matcher struct (grype.VulnerabilityMatcher) should be updated to have a pointer to the SBOM. The code should check for nil on the vulnerability matcher struct, since library users might not know to set the new field.

I'll take this specific approach and see how far it gets me:

Change

grype/grype/vulnerability_matcher.go

Lines 15 to 21 in fff4341

    
           type VulnerabilityMatcher struct { 
        
           	Store          store.Store 
        
           	Matchers       []matcher.Matcher 
        
           	IgnoreRules    []match.IgnoreRule 
        
           	FailSeverity   *vulnerability.Severity 
        
           	NormalizeByCVE bool 
        
           }

to have a pointer to a syft sbom.SBOM on it.

Change

grype/grype/vulnerability_matcher.go

Lines 45 to 70 in fff4341

    
           func (m *VulnerabilityMatcher) FindMatches(pkgs []pkg.Package, context pkg.Context) (*match.Matches, []match.IgnoredMatch, error) { 
        
           	var ignoredMatches []match.IgnoredMatch 
        
           	matches := matcher.FindMatches(m.Store, context.Distro, m.Matchers, pkgs) 
        
           	matches, ignoredMatches = m.applyIgnoreRules(matches) 
        
           	if m.NormalizeByCVE { 
        
           		normalizedMatches := match.NewMatches() 
        
           		for originalMatch := range matches.Enumerate() { 
        
           			normalizedMatches.Add(m.normalizeByCVE(originalMatch)) 
        
           		} 
        
           		// we apply the ignore rules again in case any of the transformations done during normalization 
        
           		// regresses the results (relative to the already applied ignore rules). Why do we additionally apply 
        
           		// the ignore rules before normalizing? In case the user has a rule that ignores a non-normalized 
        
           		// vulnerability ID, we wantMatches to ensure that the rule is honored. 
        
           		matches, ignoredMatches = m.applyIgnoreRules(normalizedMatches) 
        
           	} 
        
           	var err error 
        
           	if m.FailSeverity != nil && HasSeverityAtOrAbove(m.Store, *m.FailSeverity, matches) { 
        
           		err = grypeerr.ErrAboveSeverityThreshold 
        
           	} 
        
           	return &matches, ignoredMatches, err 
        
           }

to use that reference to deduplicate as described in "Second possible approach" in my last comment

willmurphyscode · 2023-08-10T15:41:47Z

Met with @wagoodman to discuss this, and we had some thoughts.

First, there are 3 logical states grype can believe about a given package/CVE combination:

The package is vulnerable to the CVE
There's not evidence the package is vulnerable to the CVE (e.g. search miss in Grype DB)
The package is NOT vulnerable to the CVE, e.g. the distro feed records a specific patch.

Right now, grype's searching doesn't distinguish between cases 2 and 3. Case 1 gets printed, and cases 2 and 3 get dropped. However, logic distinguishing cases 2 and 3 would be very helpful.

Let's call case 3 an "anti-match." We could have a situation, for example, where someone has run yum install python3-urllib3 on RHEL8. The Python cataloger will find the PyPI package "urllib3" at some version, and the RPM cataloger will find the package "python3-urllib3", and the ownership relationship will show that python3-urllib3 owns urllib3. The python matcher will find a positive (case 1) against urllib3, because the PyPI version is listed as vulnerable. But the python3-urllib3 will find an anti-match, because the vulnerabilitiy feed from RHEL proactively documents a distro-provided patch. Then we can say, "Since this is an RPM and we're on RedHat, the anti-match from the distro feed overrides the match from the generic Python data, so this is not a positive."

This is the same logic as the second approach in #1373 (comment), but it provides a new entity that can be used in grype. So the functions in the search package will return a slice of known matches and a slice of known anti-matches, and the present of anti-matches can be used to perform business logic, especially for lowering the false positive rate. Additionally, we could display packages from grype that are known to be patched or not affected, which might save time to security teams.

willmurphyscode · 2023-08-15T18:10:02Z

I think rather than block this issue on #1426, we might do the following:

Have an allow-list of vulnerability feeds that are believed to be "complete." That is, vulnerability feeds that report both fixed and non-fixed vulnerabilities
For OS packages that are from a distro operating one of those feeds, remove the non-OS package from the ownership-by-overlap relationship before matching.

For example, if we're scanning an RHEL8 image, and we have an RPM that owns a Python package, go ahead and remove the Python package before matching, since the RHEL8 feed will have better information on whether the package is vulnerable than GHSA will. But if we're scanning an Alpine image and there's an APK that owns a Python package, leave the python package under consideration for matching, because Alpine only reports fixes, and so the matching GHSA or NVD data directly is necessary to detect vulnerabilities that Alpine hasn't released a patch for yet. Then, when #1426 is implemented, we can remove the allow-list in favor of the logic described in the previous comment.

Next steps are:

Develop this allow list
Update https://github.com/anchore/grype/pull/1387/files#diff-c1eeca4b68c3cef87a8131f596f5e598d5b9a56704624ff94afd7db799e4193aR142 to only filter packages if the distro is allowed by that allow-list.

wagoodman added the enhancement New feature or request label Jun 29, 2023

westonsteimel mentioned this issue Jul 13, 2023

False Positive: CVE-2021-3521 reported against rpm-4.14.3-26.el8.x86_64 on Oracle Linux 8 #1362

Closed

willmurphyscode self-assigned this Jul 13, 2023

willmurphyscode mentioned this issue Jul 13, 2023

feat: filter out packages owned by OS packages #1387

Merged

This was referenced Aug 3, 2023

Feat relabel grype1373 anchore/vulnerability-match-labels#87

Merged

Feat more grype1373 relabels anchore/vulnerability-match-labels#89

Merged

willmurphyscode mentioned this issue Aug 10, 2023

RFC: Explicit reporting of negative matches #1426

Open

willmurphyscode mentioned this issue Aug 16, 2023

Remaining 1373 updates anchore/vulnerability-match-labels#99

Merged

willmurphyscode closed this as completed in #1387 Aug 18, 2023

p-linnane mentioned this issue Aug 31, 2023

grype 0.66.0 Homebrew/homebrew-core#141029

Merged

dependabot bot mentioned this issue Sep 1, 2023

Bump github.com/anchore/grype from 0.65.2 to 0.66.0 anchore/grype-db#153

Merged

willmurphyscode mentioned this issue Sep 11, 2023

Vulnerabilities marked as fixed in distro packages should be reported as fixed for all contained packages too #1236

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Filter out packages that are owned by OS packages (ownership overlap) #1373

Filter out packages that are owned by OS packages (ownership overlap) #1373

wagoodman commented Jun 29, 2023

kzantow commented Jun 29, 2023

willmurphyscode commented Jul 20, 2023

westonsteimel commented Jul 21, 2023 •

edited

Loading

westonsteimel commented Jul 21, 2023

westonsteimel commented Jul 21, 2023

willmurphyscode commented Jul 22, 2023

willmurphyscode commented Aug 7, 2023

willmurphyscode commented Aug 9, 2023

willmurphyscode commented Aug 10, 2023

willmurphyscode commented Aug 15, 2023

Filter out packages that are owned by OS packages (ownership overlap) #1373

Filter out packages that are owned by OS packages (ownership overlap) #1373

Comments

wagoodman commented Jun 29, 2023

kzantow commented Jun 29, 2023

willmurphyscode commented Jul 20, 2023

westonsteimel commented Jul 21, 2023 • edited Loading

westonsteimel commented Jul 21, 2023

westonsteimel commented Jul 21, 2023

willmurphyscode commented Jul 22, 2023

willmurphyscode commented Aug 7, 2023

Example situation

First approach: remove before matching

Second possible approach

willmurphyscode commented Aug 9, 2023

willmurphyscode commented Aug 10, 2023

willmurphyscode commented Aug 15, 2023

westonsteimel commented Jul 21, 2023 •

edited

Loading