Scan package files and extract for packages #1207
Open
In all the following pipelines:
when we scan files for licenses, copyrights and other data, we skip codebase resources that already have a status set before this step, so anything tagged as `application-package` or `system-package` is not scanned.

In the `match_not_analyzed_to_system_packages` pipe of the rootfs pipeline, we match all codebase resources that are part of a package to the discovered package object and also update their status to `system-package`. (It seems we previously did the same for application packages with the `match_not_analyzed_to_application_packages` function, but that function is no longer used anywhere.)

Similarly, in the docker pipelines, the `create_system_package` function of the `collect_and_create_system_packages` step updates the status of package files to `system-package`.

We can either:
In this PR I've tried the second approach, since this is also what we do in SCTK. Here, though, we have to add a new argument, `update_status`, and pass it on to the function that saves data to resources after the scan, so that the `system-package` or `application-package` status of codebase resources is not overwritten with `scanned`, which was a side effect of the file scans.

Since all these pipelines already scanned application package files (those that were not metadata files or lockfiles), I'm assuming we also want to scan the metadata files that were not being scanned. Otherwise #762 does not make any sense. Note that the license scan done as part of a package scan (parsing the manifest and then running license detection only on the extracted part) can differ, for some complex files, from a plain license scan of the whole file, and we might need to improve how we handle this in SCTK to avoid confusion. See aboutcode-org/scancode-toolkit#3024 for details.
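To illustrate the intent of the `update_status` argument, here is a minimal, self-contained sketch. It is not the actual ScanCode.io code: the `Resource` class, `fake_scan`, `save_scan_results` and `scan_for_files` are hypothetical stand-ins, and the rule used to decide when to pass `update_status=False` is an assumption made for the example. Only the status values and the `update_status` name come from the description above.

```python
from dataclasses import dataclass, field

# Status values named in the description above.
SCANNED = "scanned"
SYSTEM_PACKAGE = "system-package"
APPLICATION_PACKAGE = "application-package"
PACKAGE_STATUSES = {SYSTEM_PACKAGE, APPLICATION_PACKAGE}


@dataclass
class Resource:
    """Hypothetical stand-in for a codebase resource."""

    path: str
    status: str = ""
    licenses: list = field(default_factory=list)


def fake_scan(resource):
    """Placeholder for a real file scan; returns canned results."""
    return {"licenses": ["mit"]}


def save_scan_results(resource, results, update_status=True):
    """Store scan results, setting the status to 'scanned' only if asked to."""
    resource.licenses = results["licenses"]
    if update_status:
        resource.status = SCANNED


def scan_for_files(resources):
    """Scan every resource, including those already tagged as package files."""
    for resource in resources:
        results = fake_scan(resource)
        # Assumption for this sketch: keep an existing package status,
        # and mark everything else as scanned.
        keep_status = resource.status in PACKAGE_STATUSES
        save_scan_results(resource, results, update_status=not keep_status)


resources = [
    Resource("usr/bin/tool", status=SYSTEM_PACKAGE),
    Resource("src/app.py"),
]
scan_for_files(resources)
for resource in resources:
    print(resource.path, resource.status, resource.licenses)
```

In this sketch the package file ends up with scan results while keeping its `system-package` status, and the untagged file is marked `scanned` as before.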
Reference: #762
Reference: #1194
Reference: #83