Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Constructor string parser has confusing result for paths containing a colon. #210

Open
jeremyroman opened this issue Jan 23, 2024 · 2 comments

Comments

@jeremyroman
Copy link
Collaborator

What is the issue with the URL Pattern Standard?

The colon is treated as defining a protocol, even in cases where the URL parser would not (because it looks for a leading alphabetic character).

Consider, e.g., new URLPattern('/scope\\:0/*', 'https://example.com/') (note that the : is already escaped for tokenizing purposes). This fails - in Chromium, the error reads:
TypeError: Failed to construct 'URLPattern': Invalid protocol pattern '/scope'. Invalid protocol '/scope'.

This is because of how the init branch ends up trying to figure out if there's a protocol. It took me some time to figure out how to work around this (use {} around the literal to prevent the parser from thinking too hard about it, basically).

But the URL parser doesn't have this issue. Should we do something similar here? While in general it's hard to tell if the first character is alphabetic because it might begin with a wildcard or regex of some kind, in the majority of cases it's not ambiguous and we could succeed. For example, we might simply say that if it begins with a char token and that token is not alphabetic, we don't try to treat it as a protocol. Anything along those lines would fix the common case of a relative URL beginning with a /, ? or #.

However, genuinely relative ones are still ambiguous and maybe it is more consistent and simpler to just fail. An example might be new URLPattern('http:bar', 'https://example.com/') could hypothetically be trying to refer to a file in the same directory named http:bar.

But the URL parser will already trip you up if you're trying to do that, so I'm not too worried. I think all the cases where my proposed resolution changes things are cases where the pattern would previously have been invalid, so it should be compatible to change.

Thoughts?

@jeremyroman
Copy link
Collaborator Author

For reference, the URL spec's handling of this is at https://url.spec.whatwg.org/#scheme-start-state

jeremyroman added a commit to jeremyroman/eleventy-plugin-speculationrules that referenced this issue Feb 27, 2024
@rotu
Copy link

rotu commented Aug 6, 2024

There are some other examples that may be relevant from the URL spec:

e.g. new URLPattern("hello:world", "https://example.com/") resolves to {protocol:"https", hostname:"example.com", pathname:"/hello:world"}

but it should have protocol hello and path world.

e.g. new URLPattern("urn:isbn:9780307476463") resolves to {protocol:"urn:isbn", hostname:"", pathname:"9780307476463"}

but it should have protocol urn and path isbn:9780307476463

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants