Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scan without resolving/downloading dependencies #168

Closed
wbolster opened this issue Dec 3, 2021 · 19 comments · Fixed by #255
Closed

scan without resolving/downloading dependencies #168

wbolster opened this issue Dec 3, 2021 · 19 comments · Fixed by #255
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@wbolster
Copy link

wbolster commented Dec 3, 2021

maybe i'm misunderstanding, but it seems that pip-audit cannot scan the contents of a requirements file in the simplest way imaginable: for each entry, check if the specified version has known vulnerabilities.

currently, running against a requirements file with exclusively exact version matches (foo==x.y), like this

$ pip-audit -r requirements.txt 

… downloads and (temporarily) installs various packages, which not only takes forever, but also eventually fails because some packages cannot be found – they come from a private index server, which is a different issue.

as a work-around, installing everything (also downloads and installs) and then using pip-audit --local seems to work, but this is not a convenient solution.

i understand that in the general case, requirements files may not be exhaustive and may not have exact versions, in which case running a resolver etc makes total sense, but that's not always the case.

some background: the use case is that my requirements files are produced by pip-tools and hence constitute the exact list of pinned versions used by my application, and i would like to integrate pip-audit into a continuous integration job. i also run pip-compile (from pip-tools) in that job to warn about potential package upgrades, some of which come from private index servers, and it would be great if pip-audit can be added to it.

@wbolster wbolster added the enhancement New feature or request label Dec 3, 2021
@di
Copy link
Member

di commented Dec 3, 2021

Thanks for the issue @wbolster! One quick question: do you pin hashes in this file as well as use --require-hashes when installing?

@wbolster
Copy link
Author

wbolster commented Dec 3, 2021

currently not, but ideally yes.

the reason we don't is because it doesn't work so well with some non-pypi index software (artifactory, devpi, ...), forcing hash calculation to turn into a slow ‘download everything, hash everything, compare’ operation during what should be a cheap ‘any updates available?’ check, which is literally a ci job that runs pip-compile followed by git diff --exit-code.

(yes, there are auto-upgrade bots that open pull requests, but they don't work well for self-hosted vpn-only repos, and hosting it adds ops overhead that's not currently worth it)

@wbolster
Copy link
Author

wbolster commented Dec 3, 2021

without ‘solutionizing’ (terrible word, sorry) too much for a problem i may not fully understand, i can imagine something like this

pip-audit --no-resolve -r requirements.txt

…and it would be totally fine if it gives warnings for every non-exact dependency, e.g. if the requirements file contains

foo == 1.0.0
bar >= 1.2.3

… it could print a warning that bar cannot be checked.

@di
Copy link
Member

di commented Dec 3, 2021

Unfortunately without hashes, a requirements file with only pins is not guaranteed to be exhaustive, and generally I think we want pip-audit to be exhaustive.

That said, I think it would be reasonable to automatically disable resolution if hashes are present. We could possibly give an option to disable resolution, but I think this would be somewhat confusing as it would have the effect of ignoring version ranges, and because pip doesn't really have equivalent functionality.

@wbolster
Copy link
Author

wbolster commented Dec 3, 2021

without hashes, a requirements file with only pins is not guaranteed to be exhaustive

i'm interested to learn why that is? exhaustiveness and pinning seem orthogonal?

because pip doesn't really have equivalent functionality

pip can be told to not access index servers and not install dependencies:

$ pip --no-index --no-deps

… which works fine when installing from a directory full of pre-built wheels, for instance:

$ pip --no-index --no-deps wheels/*.whl

@wbolster
Copy link
Author

wbolster commented Dec 3, 2021

glancing at --local which calls into PipSource(local=True) i see that the comments at the top of the file mention pip list -v --format=json as a baseline input. this looks something like this:

[
  {
    "name": "aiohttp",
    "version": "3.8.0",
    "location": "../lib/python3.9/site-packages",
    "installer": "pip"
  },
  {
    "name": "aiosignal",
    "version": "1.2.0",
    "location": "../lib/python3.9/site-packages",
    "installer": "pip"
  },
  ...
]

…which seems really close to the information of a requirements file with only pinned versions in it, which this issue is about.

@di
Copy link
Member

di commented Dec 3, 2021

i'm interested to learn why that is? exhaustiveness and pinning seem orthogonal?

Because source distributions. E.g., https://pypi.org/project/paradox/ only provides a source distribution with non-deterministic dependencies. Even if you pinned paradox==0.0.1, you aren't guaranteed to get the same sub-dependencies for every installation. Obviously that's a toy example, but the same thing commonly happens between different Python versions, different platforms, etc.

pip can be told to not access index servers and not install dependencies:

Ah, good call. It seems like adding a --no-deps flag would mostly resolve this, but we'd still need to resolve version ranges, or error if version ranges are provided instead of pins.

@di
Copy link
Member

di commented Dec 3, 2021

glancing at --local which calls into PipSource(local=True) i see that the comments at the top of the file mention pip list -v --format=json as a baseline input.

Using pip list only works when run in environments where the dependencies have already been resolved/installed, we can't use this with a requirements file without resolving/installing the requirements file itself.

@wbolster
Copy link
Author

wbolster commented Dec 3, 2021

thanks for the pointer about undeterministic dependencies, paradox is a nice sample package 😸😿

we can't use this with a requirements file without resolving/installing the requirements file itself.

…unless explicitly opting in to this behaviour of course 🙃

a --no-deps flag combined with warnings about non-pinned requirements would be great; see #168 (comment). it even seems that most of the plumbing is already in place for this; see #168 (comment)

@di
Copy link
Member

di commented Dec 3, 2021

…unless explicitly opting in to this behaviour of course 🙃

Ah, I see what you're saying: we still install the requirements into a virtualenv, we just pass --no-deps along with the installation, and then use pip list as usual. That could work.

You mentioned two issues: the time it takes to resolve dependencies, and not failing on packages in private indexes, which we fixed in #162 and released in https://pypi.org/project/pip-audit/1.0.1/. Given that the latter should be resolved, how much of an issue is the former for you? I'm inclined to leave this open for a bit to get some additional perspectives here.

@wbolster
Copy link
Author

wbolster commented Dec 3, 2021

we still install the requirements into a virtualenv

no, that is not what i meant.

i would actually like to not download any packages at all, let alone install them (somewhere).

what i would like to do is is: ‘take my requirement file(s), and just tell me which ones have known security problems’, preferably by listing them and a non-zero exit code.

my remark about the --local checker was that it seems to take a similar approach: it collects package versions (from the environment), then scans them. i was merely suggesting that hooking up a different input (namely the requirements files) might be a feasible implementation approach.

@woodruffw
Copy link
Member

what i would like to do is is: ‘take my requirement file(s), and just tell me which ones have known security problems’, preferably by listing them and a non-zero exit code.

Just to make sure I understand: what would you like pip-audit -r input.txt to do with this input.txt?

foo
bar==1.2.3
baz>=2.0.0

There are four options that I see:

  1. Our current behavior, which is to fully resolve foo and baz, and then perform full dependency resolution to audit all dependencies
  2. A "pinned-only" audit, where we only audit bar, since the other two require us to perform the dependency resolution step
  3. A "flattened" audit, where we only audit the top level packages (resolving just their versions, but not their dependencies)
  4. Some combination of (3) and (4)

@wbolster
Copy link
Author

wbolster commented Dec 3, 2021

my original thinking was 2, with warnings for the packages that cannot be audited.

but 3 or a variant thereof would be fine as well: e.g. find the ‘best’ (without dependency resolution) version that matches the version spec, and audit that.

@woodruffw
Copy link
Member

@alex came up with another use case for this: sometimes our dependency source can be something like a third-party package manager, which has already done the hard work of fully resolving the dependency tree for us. Performing the resolution step in that case would be pointless, since we already have all of the information available to us.

What I'm thinking is a --no-resolve or --already-resolved flag that, when the dependency source is a requirements-style file, checks that every dependency is pinned to an exact version and only audits those dependencies.

That will effectively address (2)/(3) above, since it assumes that the input is "flat" in the sense that no additional resolutions are necessary.

@woodruffw woodruffw self-assigned this Dec 10, 2021
@woodruffw woodruffw added this to the Post-stable milestone Dec 10, 2021
@di
Copy link
Member

di commented Dec 11, 2021

Maybe --no-deps, perhaps? 🙂

@woodruffw
Copy link
Member

I'm going to take a stab at this today!

@di
Copy link
Member

di commented May 2, 2022

Initial thoughts here: since hash pinning has the effect of disabling dependency resolution, users that want to disable dependency resolution could do this now by hashing & pinning all dependencies. (This would also have the effect of getting more users to hash & pin dependencies).

@woodruffw
Copy link
Member

IMO we should include this, but with a nudge telling users to use pip-compile to hash and pin their dependencies.

My reasoning: There's no built-in way to hash all dependencies with just pip, so users who are using only pip shouldn't have to install additional third-party tooling to audit their fully-pinned requirements.txt. So here's my proposed behavior:

  • --require-hashes only: no functional changes
  • --no-deps: Warn the user that --require-hashes + pip-compile is preferred, but perform the operation as requested
  • --require-hashes + --no-deps: Warn the user that the combination is redundant and warn them that --require-hashes + pip-compile is preferred.

@di
Copy link
Member

di commented May 2, 2022

@woodruffw SGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants