-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement RFC7617-compliant multi-domain basic authentication #10904
Conversation
Apologies if I did not follow all the necessary information - I submitted it as a draft to facilitate #10902 discussion - happy to add whatever is needed to make it fully compliant with the requirements after we decide in the discussion if this is valid change or not. |
48cca6d
to
fed5ab8
Compare
In the latest fixups I've also added a missing test for the 401 case - I am mocking user interaction with the user. |
3eca90a
to
b8c1120
Compare
I addressed all doc build and static check failures and rebased to latest main. |
I would love to get the workflows approved to see if all is cool. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of nits, this should work.
I thnk there was a timeout on some windows test but it seems like a flake. Let me close/reopen to rebuild it (but I believe it will need to get workflow re-approved). |
Seems like all tests passed. |
And chances to merge it ? |
Hello, are we going to merge it? |
@potiuk Could you rebase this PR with |
Sure. No problem. |
de43b07
to
d0d2af6
Compare
Done. |
d0d2af6
to
5d8a1fa
Compare
Fixed static checks too with pre-commit. Rebase does not run pre-commit by-default unfortunately (only when there is a conflict). |
This is now failing |
Yep. Looking at it - I guess it's because of the recent changes in netrc prioritization - the test was added yesterday eacc739 |
5d8a1fa
to
d15d5f7
Compare
Right - the problem was that the test made wrong assumption about non-RFC7617 compliant behaviour: Before the RFC7617 fix:
After the RFC-compliant change:
The fix was to make the "/files/" to be "/simple/files" which makes proper (RFC7617-compliant) authentication. I believe (please correct me if I am wrong) this is in-line with https://peps.python.org/pep-0503/:
I think the "index" in simple should return only URLs starting with "/simple". Likely (If I am right) other tests in this file could be also corrected to follow the corect "/simple/files/" pattern but I think it is not part of this PR. |
One worry here when I thought about it (and it's worth to make a conscious decision): I think this might potentially break some repos that are not exactly https://peps.python.org/pep-0503 compliant. If someone actually implemented those repos with the same "mistake" that the one who created the tests did (i.e. exposed the files directly at / file of the server, while follow the "/simple/" prefix approach from the PEP and expected the authentication to be reused for those /* downloads). There is this comment there which indicates that the implementation might host files anywhere:
PyPI itself hosts the files with completely unrelated URLs. And many repositories might do the same (in which case the user/password authentication might not be at all used and likely the file urls generated are already pre-authenticated/signed or contain authentication information). Even previously, before my fix, the user/password would only be "reused" (accidentally as it is not standard-compliant) to authenticate to download files when the files are hosted at the same URL. But that was not stated anywhere (and is against the So there is a small risk that some implementations accidentally rely on this (even if it is against the standards). But I leave to maintainers to decide (knowing the risks). Maybe we should add a flag that "restores" previous behaviour just in case? |
IMO we have to allow for URLs outside of those which are specified by the simple index API (such as files) to be served from any path on the server - as you noted, PEP 503 explicitly allows for this possibility with the comment
In particular, I don't think it's OK to require users to specify a flag to allow that behaviour - there's no reason to assume that the user even knows the URL structure of the server, or even that it won't change (if the server changes its layout, why should the user have to add a flag?) |
They are allowed, it's just the authentication fthat is specified wiht "/simple" does not carry now to anything hosted with "/" - which is the "right" way according to internet standard of basic authentication. The PEP 503 does not mention anything about authentication, so it's not "against" PEP 503. However the previous authentication approach was not conforming to the RFC standard for basic authentication that far predates any PEP. So if anyone served files from "/" - where the URL they used were I am not at all sure if anyone was doing it - if they did, this worked because the authentication was not following the well established and security-reaoned standard, so if we decide to add a flag, this is more of an escape hatch until someone fixes the problem, But I am also ok with not providing the flag but simply switching |
What do you propose then @pfmoore ? |
Actually - I jus realized, that there is no need for an escape hatch. If you really hit the problem, you can always (as expected with the standard) add user/password for "https://URL" in pip.conf (or use other ways It does not have to be carried from the "/simple" URL. So there is no need to add any switch, but maybe explaining in the docs what to do if you hit this problem. (I am just trying to anticipate and prepare for any questions users might have). I am happy to add docs for it. WDYT? |
I’d say the situation you described is just a bug and should not have worked in the first place, so describing this in the documentation (with a link from the changelog perhaps) should be appropriate. |
I do not consider myself a security expert, so I have no opinion on this. |
d15d5f7
to
d643123
Compare
I pushed a documentation update to explain it. I am not sure if I should add Changelog entry myself or leave it up to the relase manager? |
d643123
to
e5e9e0f
Compare
The https://datatracker.ietf.org/doc/html/rfc7617#section-2.2 defines multi-domain authentication behaviour and authentication scopes for basic authentication. This change improves the implementation of the multi-domain matching to be RC7617 compliant * path matching (including longest match) * scheme validation matching Closes: pypa#10902
e5e9e0f
to
cf285b9
Compare
Any thoughts here - I know @pradyunsg have some compatibility concerns and I share them. I think what we might do is to add an exception to "/simple/" prefix. We could treat it in non-RFC compliant way. The risk connected with that is very low but if someone follows theh "copy" of standard |
Any chance this one will make it in the upcoming release ? Seems that there is at least one other issue #10806 that is triggered by this non-compliant behaviour and it seems merging that one would solve the issue (I looked briefly but it seems that the problem #10806 is triggered by non-RFC7617 compliant auth reuse. |
@potiuk could you pls also check this #10979 (comment), I think I've made useless change in that pr, and with RFC-compliant urls it works out of the box. |
I believe your change fixing #10979 was useful regardless of the RFC7617 change. The problem you described would still not be solved in this case. If you added .netrc auth at the base "URL" and .netrc would be prioritized over URL credentials, even with RFC7617 change it would work the same. RFC7617 states that authentication for "http://my-url" is also valid for "http://my-url/project/path" and (if I understand it correctly) the problem #10797 was that the .netrc configuration was used first when getting credentials to download file (and without your change it would be the same). However after looking in detail of your log from artifactory, I think I will have to update the fix - see the more detailed description of the worry I had #10904 (comment) and also @pradyunsg was right to be a little reserved about it. It seems actually artifactory implements what I was afraid of a bit. The "/simple" suffix is not treated as a real part of the repository URL and makes the implementation "nearly RFC7617 compliant": First the meta-data downloading is done with And then the file is downloaded using: This means that the authentication from the first request will NOT be reused in the second. The "my-pypi-repo/simple-package/" is different authentication scope than "my-pypi-repo/simple/simple-package/". The problem here is that artifactory does not treat the first authentication request as "authentication scope". They act as if the "https://us-python.pkg.dev/$PROJECT/my-pypi-repo/" is the "authentication scope" of the first request. Even if the first URL to call is really "https://us-python.pkg.dev/$PROJECT/my-pypi-repo/simple/simple-package". And actually - what artifactory does, makes sense. The authentication scope SHOULD be https://us-python.pkg.dev/$PROJECT/my-pypi-repo/ - this is the scope where you specify credentials. But currently we have no good way to handle this - because we just see the first URL requested by The confusion is caused - I think by the "/simple" suffix which is used by PyPI but on one hand it is not mandatory but on the other hand it is widely used. I looked at artifactory docs and well .. there is a lot of explanation on when you should use simple and when not . They seem to be confused too ... https://www.jfrog.com/confluence/display/JFROG/PyPI+Repositories. This comes from all the ( I think welll known) ambiguity around "/simple" coming from https://peps.python.org/pep-0503/. We all know that .pypi rc index-url contains full URL including /simple, but pip search uses I think there is no "perfect" simple solution because of how the "/simple" prefix can be interpreted by different implementations of PyPI repository On one hand it is used by PIP/PYPI and different implementation could have mimicked it (as jfrog did), on the other hand the real "authentication scope" is the one that --index prefix should normally point to and PEP0503 does not make any assertions about that. It could be for example (theoretically) that I think we can only try to guess currently. The current "guess" that is done by PyPI is that authentication scope is "domain name" (and also it does not include schema - http and https are equivalent). But it does not handle well the case where one domain can host multiple repos with different authentication (which is the jfrog case actually) My change moves the authentication scope to "somerepo/subdir/simple" which is also wrong - in case when someone (and again jfrog ACTUALLY did it) assumes and uses the fact that the real authentication scope is "somerepo/subdir". I think we simply should get better at guessing (because we have no way to know for sure what is the real "authentication scope"). My initial proposal is that we check if the first URL path ends with "/simple" and if so - we could set "authentication scope" to be the prefix of "/simple". Otherwise we assume full URL is the "authentication scope". I think this is not "perfect" but "good enough". For example artifactory allows (!!!) to change the suffix ..... Yeah. You can actually change "/simple" to something else :( . And we have no way of knowing that. Ideal fix (and maybe we could tackle it together or leave for later) should be that we should be able to specify the "suffix" in PyPI as well. @pradyunsg @uranusjr WDYT? |
Converted to Draft to avoid accidental merging. |
@potiuk def test_netrc(
script: PipTestEnvironment,
data: TestData,
cert_factory: CertFactory,
) -> None:
cert_path = cert_factory()
ctx = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
ctx.load_cert_chain(cert_path, cert_path)
ctx.load_verify_locations(cafile=cert_path)
ctx.verify_mode = ssl.CERT_REQUIRED
server = make_mock_server(ssl_context=ctx)
server.mock.side_effect = [
package_page(
{
"simple-3.0.tar.gz": "/files/simple-3.0.tar.gz",
}
),
authorization_response(str(data.packages / "simple-3.0.tar.gz")),
]
url = f"https://{server.host}:{server.port}/simple"
netrc = script.scratch_path / ".netrc"
netrc.write_text(
f"machine {server.host} login USERNAME password PASSWORD"
)
with server_running(server):
script.environ["NETRC"] = netrc
script.pip(
"install",
"--no-cache-dir",
"--index-url",
url,
"--cert",
cert_path,
"--client-cert",
cert_path,
"simple",
)
script.assert_installed(simple="3.0") |
I think one of the things we can do is... to not go about fixing this in one go. Rather, let's treat this as a potentially-disruptive change that we give opt-in and opt-outs for. With that, the rollout of this would look like: Release X:
Release X+2:
Release X+4:
To set expectations here -- I'm likely not going to have time to look at this for about a month after today since I'll likely be AFK during that period. :) |
Yeah. I will think a bit more about it too and maybe look in the code of I think the real intention here - for sites that support multiple So in the case here: For project/repo
But they will never, ever be served (if authentication is required to downlad the files) under:
Because THAT would violate the intention of spearating packages and files per repos (and RFC7617). Currently, the problem is that when we see the authentication information for the first time, we just see However we actualy know at this point that So one other solution I see is to add extra information on the "base" URL of the repository we are actually reaching out to, to the part of code where we cache credentials - and then we could cache credentials precisely for that "repo" URL. This would be working fine, regardless from the suffix chosen (whether it is The only potential problem I see is if someone develops a solution hosting multiple However (unlike the previous cases) this is wrong according to RFC7617, because even in I think that might work I will try to take a closer look at that while you are AFK @pradyunsg . Have fun. |
Well, it's feasible that the files that you download are behind different credentials than the index server. The following (imagined) situation is a completely valid implementation of co-operating indexes: A
A
IIUC, pip's current behaviour is to use credentials for IIUC, the whole request is that we should to allow users to set different credentials for each of them, picking the correct one based on which one matches the prefix best. This means that users now need to specify credentials for If we require it to be under the same prefix, then we shouldn't be keeping track of what index they've come from. It's one-or-the-other. I also prefer the "assume things are under the same prefix" model, FWIW. As long as we have the opt-in/opt-out transition model for rolling this out, I think the current implementation (which, IIUC, does that) is fine and likely better long term than a pip-specific hacky solution that makes an assumption that could be wrong. |
Honestly though, after all this discussion, I feel like it's probably fine to leave things as-is. This is the exact same model as the netrc file -- where the login credentials are set by (basically) a domain name. |
I think that actually wrong as it leaks credentials. This problem is not solved. |
Actually - as you wish @pradyunsg - I thought about it and since it's not needed for me, but if the maintainer thinks this is not an issue and accept the risks involved - who am I to argue with it :). Closing it then. |
The https://datatracker.ietf.org/doc/html/rfc7617#section-2.2
defines multi-domain authentication behaviour and authentication
scopes for basic authentication. This change improves the
implementation of the multi-domain matching to be RC7617 compliant
Closes: #10902