Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate --extra-index-url argument because of vulnerable definition #9612

Closed
heni opened this issue Feb 15, 2021 · 20 comments
Closed

Deprecate --extra-index-url argument because of vulnerable definition #9612

heni opened this issue Feb 15, 2021 · 20 comments

Comments

@heni
Copy link

heni commented Feb 15, 2021

What's the problem this feature will solve?
According to the research of Alex Birsan internal infrastructure could be attacked with injection the same-name package into the public pip.
For example if I use package my_super_secret_package that available only in my internal repo everything is ok. But after some attacker create the packages with the same name into the public pip repo but with the higher version my installation would be compromised by external code :(

Describe the solution you'd like
The idea is deprecate --extra-index-url argument and print danger warnings each time when this argument is used.
You always can use --index-url with your own proxy pip installation (which one will not return the public available versions for your internal packages).

Also it's possible add new one argument --internal-index-url (or something like this) with more clear and secure semantic:

  • always require that this repo to be accessible (to avoid problems in case of DDOS attacks)
  • make this repo prioritized over --index-url for package resolving strategy.

Alternative Solutions
It's possible to rethink --extra-index-url behaviour for more secure semantic, but my understanding is that is incorrect to change semantic for existing parameters. It's always better introduce new one for transparent transition.

@vanschelven
Copy link
Contributor

More info:

The CVE is currently marked as "disputed", because this is the "intended behavior". I suppose the remarks by
@encukou on the above redhat bug could be the source of that marking-as-disputed. I think that was always a questionable take, given e.g. the example given even back then by @cowlicks

I wonder... given the fact that we have now seen large-scale exploitation of this problem, including at companies supposedly employing the world's best engineers, would this a good time to re-evaluate that position?

@encukou
Copy link
Contributor

encukou commented Feb 17, 2021

My comments on that bug are about Red Hat's reaction to this. I think the place to fix the issue is here and that downstream redistributors shouldn't patch and alter pip's behavior.
(But I'm not a security expert.)

@uranusjr
Copy link
Member

Many seem to simplify Alex’s research to “--extra-index-url is harmful” and (in this issue) it should go away. But as clearly indicated in the article, the option in “designed to be insecure”, and it is like so for a reason: it is used by legistimate use cases. Simply removing the option will not solve anything, but only lead to the vulnarability to shift to other places, since the use cases will not magically vanish, and people will continue to need and build replacements for it.

A better approach to the issue, in my opinion, is to

  1. Provide better documentation to how existing options work.
  2. Design alternative options to handle the use cases people are incorrectly using --extra-index-url for, and guide users to them through documentation.
  3. Urge private entities (who understand the nature of the issue best) to talk about how they approach this issue, so everyone in the community can benefit.

Also note that few (none?) of the core pip maintainers are in the position to do any of the above (except almost anyone can do the first if they want to, pip maintainers or not), since the issue is, by its nature, in direct conflict to the tool that is designed to encourage community code sharing.

@heni
Copy link
Author

heni commented Feb 17, 2021

Actually I've not proposed deletion for --extra-index-url and deprecation is not equal to going feature away.
The new one variant for more secure analogue is also proposed in this issue.

My opinion is that's really important to show users security warnings about vulnerable design of --extra-index-url. It could be just the notion in the documentation, but it's better to show it as runtime warning of pip tool too.

@pfmoore
Copy link
Member

pfmoore commented Feb 17, 2021

I suggest that anyone who is genuinely interested in improving things in this area starts by doing some user research into how people use pip's existing index options. Until we have the underlying information, we cannot realistically find a good solution that handles how people want to use pip, while avoiding the situation where we make a change to "fix" this issue and in the process we break other workflows that weren't vulnerable to the problem in the first place (e.g., they read the documentation, and acted on it...)

In particular, spewing out warnings on every use of --extra-index-url, or deprecating it (which implies it will go away) will break people who are doing nothing wrong at the moment, so unless we can demonstrate that no such people exist, or that they are in a minority and they can adjust their workflow with minimal disruption, I'm against that.

@vanschelven
Copy link
Contributor

vanschelven commented Feb 18, 2021

e.g., they read the documentation, and acted on it...

The documentation does in no way explicitly indicate that you're about to set yourself up for a huge security issue. Sure, with careful analysis one might understand the implications of one's actions, but the article by mr. Birsan makes clear precisely that many people are not able to connect the dots in practice. This is true for both the location linked above, and the inline --help output.

In an attempt to move the discussion forward, I'd say the underlying problem with using --index-url and --extra-index-url is that this "offers the possibility to mix internal and public libraries into the same 'virtual' repository" -- a single namespace in which package names are being resolved.

This is a good fit for when there are multiple repositories which are trusted to be basically mirrors of each other, perhaps differing in details such as the availability of wheels or expected repository uptime.

Unfortunately, using --extra-index-url also appears to be a good solution for when the levels of trust differ and the contents of the repositories are not in fact mirrors, in particular the case in which one of the repositories is fully private and the other one is the main public one*. This is because there is no other straightforward way to install packages from a mix of private and public repositories, and because this solution works just fine until the moment you are compromised.

In terms of a solution, I would say

  1. the documentation should make these implicit assumptions about trust and mirroring explicit
  2. (preferably) there should be a recommended way to install packages in pip when the assumptions are broken that does not suffer from dependency confusion.

*mixing fully private and fully public repositories is not the only use case that breaks the implicit assumptions of --extra-index-url: there is also the case of privately controlled, publicly accessible repositories. NVIDIA's repo is one example, the previously squatted casatools package is another one, as is this odoo repository and the kivy garden In some cases, it appears that the repository owners are aware of the danger, e.g. NVIDIA's packages exist as broken-on-purpose in pypi.

@vanschelven
Copy link
Contributor

As a data-point in the above-mentioned "user research", the discussion about unsquatting casatools shows that for some people (in this case: a pypa contributor), even after the problem has been pointed out to them, they persist in believing that "extra" index is a fallback that will only be used when no package is found in the "main" index. I would take that as evidence that the existing documentation (including the names of the parameters) is insufficiently clear.

@vanschelven
Copy link
Contributor

Maybe it's also good to enumerate the legitimate use cases more explicitly.

One that has not been explicitly mentioned in the above yet is something like piwheels Note that (provided that you trust piwheels) the implicit assumptions in the above hold, because all piwheels does is mirroring your packages for an architecture that is often not supported on pypi.

@pfmoore
Copy link
Member

pfmoore commented Feb 18, 2021

The documentation does in no way explicitly indicate that you're about to set yourself up for a huge security issue.

PRs with documentation improvements are always welcome 🙂

@areitz86
Copy link

I guess this feature is used a lot to combine a private index with the public pypi. But like @vanschelven said, it's a bad idea to mix sources of different trust levels together (the same thing you do with a virtual repository that relays pypi and adds packages).
In my project, we use two different private repositories: one for our self-developed packages and repository only as (manually maintained) mirror the packages needed. We only do this, to be not exposed to dependency confusion during automated builds.
Maybe we could bias pip to install packages from the main index if found there and only go to the extra, if the dependency could not be satisfied? For legacy, this could be enabled via an additional switch?

@uranusjr
Copy link
Member

uranusjr commented Feb 24, 2021

The main designed usage of --extra-index-url is for self-hosted indexes to serve files that PyPI does not allow, such as wheels for Linux distrubitions that do not qualify manylinux, package variants not covered by wheel tags (e.g. GPUs), and alternative platforms not officially supported by package maintainers (ARM builds for RPi). The usage relies on the fact that extra indexes are treated at the same priority as PyPI (so an extra index can change a file’s priority by versioning and tagging), and always preferring PyPI would break these usages.

I think the main misunderstanding people have is to think --extra-index-url is unsafe. It is unsafe to use it to distribute private packages, because it is not designed for that. The solution should there fore be to clear that misunderstanding (maybe by renaming, offering a more obvious alternative, or even moving the functionality from pip into standalone tools), not to change --extra-index-url so it works as expected to people using it incorrectly.

@vanschelven
Copy link
Contributor

I wonder whether it is possible to come up with good heuristics to detect misuse of --extra-index-url dynamically, and emit a warning only then. Given the implicit assumption I mentioned above (which follows from the "main designed usage" mentioned by @uranusjr) I would suggest the following: pip can detect that a package is present in one, but not all, indexes. In that case it emits a warning.

2 questions remain:

  1. Would this lead to too many false positives? In particular, do indexes like piwheels faithfully mirror every package, even e.g. pyhon-only packages?
  2. How hard is it to implement this? I haven't looked at pip's source yet, but someone more familiar might know this by heart.

@vanschelven
Copy link
Contributor

Maybe we could bias pip to install packages from the main index if found there and only go to the extra, if the dependency could not be satisfied?

Just to clarify: this would remediate the issue when configuring the "main index" to be the private one, and the "extra index" to be the public one.

@uranusjr
Copy link
Member

I thought about having a heuristic as well, but didn’t manage to come up with a working logic (hopefully someone else would be able to). The main issue is that the “extra file index” and “extra name index” usages are almost in direct conflict with each other by design.

I think it’s a good idea to emit a warning (and probably turn it into an error after a deprecation period) when a package exists only on an extra index and not the main one, but then we’ll need to offer an alternative to the usage. I’m wondering whether it’d make sense to turn --index-url into a multi-value option (like --extra-index-url) so --index-url="https://pypi.org/simple https://myindex/simple" would work like how --index-url=https://pypi.org/simple --extra-index-url=https://myindex/simple currently does. This would probably be more obvious to users that the two indexes are treated with the same priority (especially if we say in the help string that the indexes are unordered).

@vanschelven
Copy link
Contributor

I’m wondering whether it’d make sense to turn --index-url into a multi-value option [..] This would probably be more obvious to users that the two indexes are treated with the same priority

+1 on this

@areitz86
Copy link

I’m wondering whether it’d make sense to turn --index-url into a multi-value option [..] This would probably be more obvious to users that the two indexes are treated with the same priority

+1 on this

While this fixes the issue of "vulnerable definition", I would still argue, that it could be beneficial to introduce something that also alles to use multiple indexes without the risk of the aforementioned dependency confusion. I get that this is the intended behavior, but tbh it does not feel like a good solution.

@uranusjr
Copy link
Member

uranusjr commented Feb 24, 2021

As we’ve said (multiple times), we are not opposed to having that, but few/none of the existing pip maintainers are in the position to propose a good design for it.

And from what we see, people that have done significant research on the topic prefer to do this outside of pip (building an in-house index and avoid direct reliance on PyPI), and we are willing to follow their advice, and assist by accepting documentation updates that clearly point users to the correct direction.

@pradyunsg
Copy link
Member

pradyunsg commented Feb 25, 2021

So... I agree with the idea here -- getting rid of the ability to tell pip to use multiple indexes.

We'd push this problem to folks who wanna do anything involving multiple indexes/cascading/whatever to tools like devpi, Artifactory etc who are in a significantly better position than us to push specific opinionated choices. In other words, this simplifies the model for what-component-is-responsible here -- pushing the problem of managing "hierarchy of indexes" to the tools are already built for solving those issues and can make opinionated choices.

We have the hammer of "it is a security concern" -- our position can then be "we can't keep this around in it's current form, and there's better tools for your workflows". So... thoughts? :)

--internal-index-url

NOPE. This is more complexity -- I'd prefer to have less. :)

@pfmoore
Copy link
Member

pfmoore commented Feb 25, 2021

getting rid of the ability to tell pip to use multiple indexes.

In principle I agree with this. More than that, I think it's an excellent idea. It fits in with my general preference, to move self contained functionality that's proven to be problematic and/or specialised out of pip and into tools dedicated to handling those cases well.

But it's a non-trivial breakage, and will without doubt cause quite a lot of complaint from users who are happily using the current functionality without problems (they are either using it as intended, or they have mitigated or don't care about the risks). I have no idea how we can publicise this sufficiently well to avoid fallout - we've already used up a lot of user goodwill with all of the (necessary) churn around the new resolver and the desupport of Python 2.

I sort of feel that this is something we should be reaching out to the user community to get their views. We could put up a survey asking for opinions (maybe options like "hard break", "add warnings but retain the existing options", "do nothing, users are happy to take the view that it's user error and not pip's issue"). But I don't know how to contact the "user resource group" that got set up for the UX work, and I've no idea how we'd extend that group to include views from some of the "big corporates" caught by the dependency confusion work (Apple, Microsoft, etc). Nor do I have a real feel what we'd do with the results beyond a heavy-handed "do whatever gets the most votes". @nlhkabu @ei8fdb Any views on this?

NOPE. This is more complexity -- I'd prefer to have less. :)

Definitely. Given that the current approach is getting misunderstood (and as a result misused) I have little confidence that something more complex will be clearer.

@pradyunsg
Copy link
Member

Consolidating this into #8606.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 1, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants