-
Notifications
You must be signed in to change notification settings - Fork 571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
931: binary cataloger exclusion defaults #1948
Conversation
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Benchmark Test ResultsBenchmark results from the latest changes vs base branch
|
Signed-off-by: Christopher Phillips <[email protected]>
Quick update on this PR - after some discussion we're going to dial back the feature to not be exposed to the user yet and just fix this for the narrower cases of |
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
ace0cb0
to
d45458e
Compare
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No blockers, but left some feedback.
type CategoryType string | ||
|
||
const ( | ||
OsCatalogerType CategoryType = "os" | ||
BinaryCatalogerType CategoryType = "binary" | ||
) | ||
|
||
var CatalogerTypeIndex = map[CategoryType][]string{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could these just be simplified into 2 variables? something like:
var parentCatalogerTypes = []string { .... }
var childCatalogerTypes = []string { .... }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Yea that would be a good simplification here.
My only hesitancy to change it back to that is the original config object we had discussed on the issue:
#931 (comment)
I think keeping this as is has two advantages:
- Is clear to future users/contributors that Os/Binary categorization types were an explicit choice as and additional condition. The parent child designation loses this nuance a little.
- It keeps us open to category based configuration options we may want to consider in the future
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the concern is that we want to be explicit about OS and binary cataloger types, these could be named
var osCatalogerTypes = []string { .... }
var binaryCatalogerTypes = []string { .... }
keeps us open to category based configuration options we may want to consider in the future
I'm all for forward-thinking such as being open to more configuration. The suggestion was more that since we're not doing that at the moment, we don't necessarily know what that would look like (although you had an option originally), so it might be better to just make whatever changes at such time as we do change the feature. Again, this is not a blocker and I'll leave it to your discernment.
One more question I forgot: should this PR include a boolean config option to revert this behavior? |
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Yea good call - this should now be added with 58f6d69 I've opted for the new behavior of exclusions to be the default since we've identified the synthetic binary packages in some cases to be a mistake. Users can add the following to their configs to reenable the old flow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note the comment about the change to encode_decode_cycle_test
, I think this might be a blocker (and an accidental commit?)
type CategoryType string | ||
|
||
const ( | ||
OsCatalogerType CategoryType = "os" | ||
BinaryCatalogerType CategoryType = "binary" | ||
) | ||
|
||
var CatalogerTypeIndex = map[CategoryType][]string{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the concern is that we want to be explicit about OS and binary cataloger types, these could be named
var osCatalogerTypes = []string { .... }
var binaryCatalogerTypes = []string { .... }
keeps us open to category based configuration options we may want to consider in the future
I'm all for forward-thinking such as being open to more configuration. The suggestion was more that since we're not doing that at the moment, we don't necessarily know what that would look like (although you had an option originally), so it might be better to just make whatever changes at such time as we do change the feature. Again, this is not a blocker and I'll leave it to your discernment.
Signed-off-by: Christopher Phillips <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍 -- and definitely agree that having the default behavior as you noted to exclude these entries in the SBOM
@@ -482,6 +482,10 @@ default-image-pull-source: "" | |||
# - "./out/**/*.json" | |||
exclude: [] | |||
|
|||
# allows users to exclude synthetic binary packages from the sbom | |||
# these packages are removed if an overlap with a non-synthetic package is found | |||
exclude-overlap-by-ownership: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code has exclude-binary-overlap-by-ownership
https://github.com/anchore/syft/pull/1948/files#diff-9dd8956cf9479ebf46ae7743d82d2d89bd81661bd13cd239651ff31f414f10b5R226
Parallelism int | ||
} | ||
|
||
func DefaultConfig() Config { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why delete the DefaultConfig
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function was only used as a part of *_test.go
files. It was moved here:
syft/test/integration/utils_test.go
Lines 57 to 65 in c7272fd
func defaultConfig() cataloger.Config { | |
return cataloger.Config{ | |
Search: cataloger.DefaultSearchConfig(), | |
Parallelism: 1, | |
LinuxKernel: kernel.DefaultLinuxCatalogerConfig(), | |
Python: python.DefaultCatalogerConfig(), | |
ExcludeBinaryOverlapByOwnership: true, | |
} | |
} |
Apologies for the boy scout change on an unrelated PR - my IDE was yelling about this being deadcode
and I could not figure out why - the refactor over to test resolved that issue
// 3) the child is a synthetic package generated by the binary cataloger | ||
// 4) the package names are identical | ||
// This exclude was implemented as a way to help resolve: https://github.com/anchore/syft/issues/931 | ||
func Exclude(r artifact.Relationship, c *pkg.Collection) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function seems very specific, but has a very generic name. I think the name should probably be tweaked to be a little more specific.
) | ||
|
||
var ( | ||
osCatalogerTypes = []string{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the filtering should be based on the package type, not the cataloger names.
…chore#1948) Fixes anchore#931 PR anchore#1948 introduces a new implicit exclusion for binary packages that overlap by file ownership and have certain characteristics: 1) the relationship between packages is OwnershipByFileOverlap 2) the parent package is an "os" package - see changelog for included catalogers 3) the child is a synthetic package generated by the binary cataloger - see changelog for included catalogers 4) the package names are identical --------- Signed-off-by: Christopher Phillips <[email protected]>
Binary cataloger exclusion defaults
Fixes #931
PR #1948 introduces a new implicit exclusion for packages that overlap by file ownership and have certain characteristics:
Packages found by the following catalogers will dedupe synthetic binary packages given an overlap as described above:
I've added an integration test that captures the new
default
where scanning an alpine image withbusybox
goes from:to