yarl.URL doesn't recognize BACKSLASH as host-path separator #253

Open
behnam opened this issue Oct 15, 2018 · 3 comments

@behnam
Contributor

behnam commented Oct 15, 2018

Given the URL "https://google%2Ecom\.yahoo.com/", both Chrome and Firefox resolve the domain as "google.com", which is (most probably) what the URL spec defines.

Right now, yarl doesn't recognize that:

In [8]: yarl.URL(r"https://google%2Ecom%2F.yahoo.com/").host
Out[8]: 'google%2ecom%2f.yahoo.com'

I think it's important to fix this, especially from a security perspective. What do you think?

@aio-libs-bot

GitMate.io thinks possibly related issues are #242 (Handle path argument of URL.build, which doesn't start from /), #84 (Incorrect handling of '..' in url path), #185 (Allow joining URL and pathlib.Path), #156 (URL.build doesn't url encode credentials), and #143 (YARL does not support link-local ipv6 addresses).

@asvetlov
Member

  1. WHATWG is not a spec but a set of recommendations. The recommendations are sometimes controversial and sometimes conflict with RFC specs.
  2. IIRC percent-encoding is not allowed in the domain part; it should use IDNA encoding.
  3. The backslash is never considered a separator.
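
For reference, point 2 can be illustrated with Python's built-in `idna` codec. This is a minimal sketch, not part of yarl: the helper name `to_idna` is hypothetical, and the built-in codec implements the older IDNA 2003 rules.

```python
def to_idna(hostname: str) -> str:
    """Encode a Unicode hostname to its ASCII (Punycode) form.

    Uses the standard library's "idna" codec, which follows IDNA 2003.
    The function name is a hypothetical helper for this illustration.
    """
    return hostname.encode("idna").decode("ascii")


# A non-ASCII label is carried as an "xn--" label, not as percent-encoding.
print(to_idna("bücher.example"))
```

Under this scheme a non-ASCII host travels as ASCII labels, so a `%` in the host part is not something IDNA encoding ever produces.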

@behnam
Contributor Author

behnam commented Oct 16, 2018

WHATWG is not a spec but a set of recommendations. The recommendations are sometimes controversial and sometimes conflict with RFC specs.

Well, whatever we call them, it's a good specification of the behavior we get from the most common client implementations. Not sure how the numbers are on the server side, but there are many other libraries making these behaviors consistent across the board.

IIRC percent-encoding is not allowed in the domain part; it should use IDNA encoding.

Sure, and I don't disagree. But yarl.URL doesn't throw an exception, either. The RFC says it's not allowed, you also agree it's not allowed, but the value just gets parsed and assigned to the host field, and no errors are reported.
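
To make the "no errors reported" point concrete, here is a minimal sketch of the strict behavior a caller could layer on top of any parser. It uses only the standard library's `urllib.parse.urlsplit`; the function name `strict_host` and the choice of rejected characters are assumptions for illustration, not yarl's API.

```python
from urllib.parse import urlsplit


def strict_host(url: str) -> str:
    """Return the host of `url`, or raise ValueError if it looks unsafe.

    Hypothetical helper: rejects a host that contains a percent sign or a
    backslash instead of silently storing it in the host field.
    """
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError("URL has no host")
    if "%" in host or "\\" in host:
        raise ValueError(f"suspicious character in host: {host!r}")
    return host


print(strict_host("https://example.com/"))

try:
    strict_host(r"https://google%2Ecom\.yahoo.com/")
except ValueError as exc:
    print("rejected:", exc)
```

Raising early keeps an ambiguous host from reaching code that makes security decisions based on it.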

The backslash is never considered a separator.

Well, it depends who you ask, right? It looks like every major web browser converts backslashes to SLASH, for some backwards-compatibility reason.

From a server-side perspective, I understand that it's not a favorable thing to support. But it would actually allow using the library in areas that have browser-side effects.
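
As a rough illustration of that browser behavior, the sketch below approximates it with the standard library. It is not yarl's API and not a full implementation of the URL spec: `browser_like_host` and `SPECIAL_SCHEMES` are hypothetical names, and the backslash replacement is applied to the whole remainder of the URL, which is a simplification that only aims to get the host right.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit

# Schemes for which browsers treat "\" like "/" (assumed list for this sketch).
SPECIAL_SCHEMES = {"http", "https", "ws", "wss", "ftp", "file"}


def browser_like_host(url: str) -> Optional[str]:
    """Approximate the host a browser would see for `url`.

    Simplification: every backslash after the scheme is turned into a
    slash, then the host is split off and percent-decoded.
    """
    scheme, sep, rest = url.partition(":")
    if sep and scheme.lower() in SPECIAL_SCHEMES:
        url = scheme + ":" + rest.replace("\\", "/")
    host = urlsplit(url).hostname
    return unquote(host) if host is not None else None


print(browser_like_host(r"https://google%2Ecom\.yahoo.com/"))
```

With the backslash read as a path separator, the authority ends at `google%2Ecom`, which percent-decodes to `google.com`.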
