Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

requirement, pypi: Add a --require-hashes flag #229

Merged
merged 11 commits into from
Feb 6, 2022
Merged

Conversation

tetsuo-cpp
Copy link
Contributor

@tetsuo-cpp tetsuo-cpp commented Feb 1, 2022

@tetsuo-cpp tetsuo-cpp marked this pull request as draft February 1, 2022 13:48
@tetsuo-cpp
Copy link
Contributor Author

tetsuo-cpp commented Feb 1, 2022

This needs cleanup + testing + doc but I wanted to engage early on this one since the intended behaviour isn't straightforward.

@di You mentioned that the --require-hashes flag should skip dependency resolution. Did you mean that we shouldn't resolve transitive dependencies for each requirement and instead, only audit what is explicitly listed in the requirements file?

I was also wondering how hashes should work when the --require-hashes flag isn't supplied. My thinking is that when the --require-hashes flag is supplied, every requirement in the file needs to have a hash AND we check them against PyPI. But if the flag isn't supplied, we should still check any hashes that we find against PyPI and fail if it doesn't match.

@woodruffw
Copy link
Member

You mentioned that the --require-hashes flag should skip dependency resolution. Did you mean that we shouldn't resolve transitive dependencies for each requirement and instead, only audit what is explicitly listed in the requirements file?

I was also wondering how hashes should work when the --require-hashes flag isn't supplied. My thinking is that when the --require-hashes flag is supplied, every requirement in the file needs to have a hash AND we check them against PyPI. But if the flag isn't supplied, we should still check any hashes that we find against PyPI and fail if it doesn't match.

This is an indirect answer to both: we can skip dependency resolution entirely, since pip stipulates the following:

  1. If any dependency has a content hash, then all dependencies have content hashes
  2. (Transitively) All dependencies must be explicitly listed in the requirements file if hashing is enabled

In other words: our requirements parser in pip-api should preserve those properties: we should raise an error if we're given a requirements file with dependencies that are missing hashes, and we should not do dependency resolution because only the present hashed dependencies should be considered.

(Looking at things more, I'm not actually sure that we need the --require-hashes flag -- pip stipulates that passing it is identical to any requirement having hashes attached to it. But I guess we should preserve it as a no-op, since that's what pip seems to do?

@tetsuo-cpp
Copy link
Contributor Author

tetsuo-cpp commented Feb 1, 2022

In other words: our requirements parser in pip-api should preserve those properties: we should raise an error if we're given a requirements file with dependencies that are missing hashes, and we should not do dependency resolution because only the present hashed dependencies should be considered.

Ah! Right, I misunderstood the pip behaviour. Thanks for explaining.

(Looking at things more, I'm not actually sure that we need the --require-hashes flag -- pip stipulates that passing it is identical to any requirement having hashes attached to it. But I guess we should preserve it as a no-op, since that's what pip seems to do?

I think it's still important because your file may not have ANY hashes in it. If you pass in --require-hashes, it will complain whereas it otherwise would succeed.

@woodruffw
Copy link
Member

I think it's still important because your file may not have ANY hashes in it. If you pass in --require-hashes, it will complain whereas it otherwise would succeed.

Ah! You're absolutely right. Yes, we need it for that case.

@di
Copy link
Member

di commented Feb 1, 2022

@di You mentioned that the --require-hashes flag should skip dependency resolution. Did you mean that we shouldn't resolve transitive dependencies for each requirement and instead, only audit what is explicitly listed in the requirements file?

Thinking about this more, I think we still need to do dependency resolution, and ensure that we have hashes for all subdependencies, because this is what pip does as well:

$ cat req.txt
sampleproject==2.0.0 \
    --hash=sha256:2b0c55537193b792098977fdb62f0acbaeb2c3cfc56d0e24ccab775201462e04 \
    --hash=sha256:d99de34ffae5515db43916ec47380d3c603e9dead526f96581b48c070cc816d3

$ python -m pip install -r req.txt
Collecting sampleproject==2.0.0 (from -r req.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/b8/f7/dd9223b39f683690c30f759c876df0944815e47b588cb517e4b9e652bcf7/sampleproject-2.0.0-py3-none-any.whl
Collecting peppercorn (from sampleproject==2.0.0->-r req.txt (line 1))
ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==. These do not:
    peppercorn from https://files.pythonhosted.org/packages/14/84/d8d9c3f17bda2b6f49406982546d6f6bc0fa188a43d4e3ba9169a457ee04/peppercorn-0.6-py3-none-any.whl#sha256=46125cad688a9cf3b08e463bcb797891ee73ece93602a8ea6f14e40d1042d454 (from sampleproject==2.0.0->-r req.txt (line 1))

I think it's still important because your file may not have ANY hashes in it. If you pass in --require-hashes, it will complain whereas it otherwise would succeed.

Agreed!

@tetsuo-cpp
Copy link
Contributor Author

tetsuo-cpp commented Feb 1, 2022

Thinking about this more, I think we still need to do dependency resolution, and ensure that we have hashes for all subdependencies, because this is what pip does as well:

Can we leave that validation to pip and not attempt to do it ourselves? I imagine that's going to introduce a significant chunk of code reinventing what pip does which will have to be removed during integration, since pip already contains this logic.

@tetsuo-cpp tetsuo-cpp marked this pull request as ready for review February 2, 2022 06:28
pip_audit/_cli.py Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
@woodruffw
Copy link
Member

Can we leave that validation to pip and not attempt to do it ourselves? I imagine that's going to introduce a significant chunk of code reinventing what pip does which will have to be removed during integration, since pip already contains this logic.

I'm inclined to agree with this, although it does deviate our handling of --require-hashes from that of pip (our --require-hashes is more like --no-deps --require-hashes). My reasoning is that pip-audit isn't in the business of telling the user whether pip install ... would succeed, and that the overwhelming majority of hashed requirements files should be valid already (since, if they're like Warehouse's, they're checked in after successful local and CI usage).

Thoughts @di?

Copy link
Member

@woodruffw woodruffw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall structure LGTM here!

@di
Copy link
Member

di commented Feb 4, 2022

I suppose the pip install with an incompletely hashed requirements file, or a hashed requirements file that's missing dependencies, would fail to install anyways, so the user would be protected regardless.

@tetsuo-cpp tetsuo-cpp merged commit e27e70d into main Feb 6, 2022
@tetsuo-cpp tetsuo-cpp deleted the alex/require-hashes branch February 6, 2022 15:16
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Feb 20, 2022
### Added

* CLI: The `--fix` flag has been added, allowing users to attempt to
  automatically upgrade any vulnerable dependencies to the first safe version
  available ([#212](pypa/pip-audit#212),
  [#222](pypa/pip-audit#222))

* CLI: The combination of `--fix` and `--dry-run` is now supported, causing
  `pip-audit` to perform the auditing step but not any resulting fix steps
  ([#223](pypa/pip-audit#223))

* CLI: The `--require-hashes` flag has been added which can be used in
  conjunction with `-r` to check that all requirements in the file have an
  associated hash ([#229](pypa/pip-audit#229))

* CLI: The `--index-url` flag has been added, allowing users to use custom
  package indices when running with the `-r` flag
  ([#238](pypa/pip-audit#238))

* CLI: The `--extra-index-url` flag has been added, allowing users to use
  multiple package indices when running with the `-r` flag
  ([#238](pypa/pip-audit#238))

### Changed

* `pip-audit`'s minimum Python version is now 3.7.

* CLI: The default output format is now correctly pluralized
  ([#221](pypa/pip-audit#221))

* Output formats: The SBOM output formats (`--format=cyclonedx-xml` and
  `--format=cyclonedx-json`) now use CycloneDX
  [Schema 1.4](https://cyclonedx.org/docs/1.4/xml/)
  ([#216](pypa/pip-audit#216))

* Vulnerability sources: When using PyPI as a vulnerability service, any hashes
  provided in a requirements file are checked against those reported by PyPI
  ([#229](pypa/pip-audit#229))

* Vulnerability sources: `pip-audit` now uniques each result based on its
  alias set, reducing the amount of duplicate information in the default
  columnar output format
  ([#232](pypa/pip-audit#232))

* CLI: `pip-audit` now prints its output more frequently, including when
  there are no discovered vulnerabilities but packages were skipped.
  Similarly, "manifest" output formats (JSON, CycloneDX) are now emitted
  unconditionally
  ([#240](pypa/pip-audit#240))

### Fixed

* CLI: A regression causing excess output during `pip audit -r`
  was fixed ([#226](pypa/pip-audit#226))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a --require-hashes flag
3 participants