Constructor string parser has confusing result for paths containing a colon. #210

jeremyroman · 2024-01-23T05:32:50Z

What is the issue with the URL Pattern Standard?

The colon is treated as defining a protocol, even in cases where the URL parser would not (because it looks for a leading alphabetic character).

Consider, e.g., new URLPattern('/scope\\:0/*', 'https://example.com/') (note that the : is already escaped for tokenizing purposes). This fails - in Chromium, the error reads:
TypeError: Failed to construct 'URLPattern': Invalid protocol pattern '/scope'. Invalid protocol '/scope'.

This is because of how the init branch ends up trying to figure out if there's a protocol. It took me some time to figure out how to work around this (use {} around the literal to prevent the parser from thinking too hard about it, basically).

But the URL parser doesn't have this issue. Should we do something similar here? While in general it's hard to tell if the first character is alphabetic because it might begin with a wildcard or regex of some kind, in the majority of cases it's not ambiguous and we could succeed. For example, we might simply say that if it begins with a char token and that token is not alphabetic, we don't try to treat it as a protocol. Anything along those lines would fix the common case of a relative URL beginning with a /, ? or #.

However, genuinely relative ones are still ambiguous and maybe it is more consistent and simpler to just fail. An example might be new URLPattern('http:bar', 'https://example.com/') could hypothetically be trying to refer to a file in the same directory named http:bar.

But the URL parser will already trip you up if you're trying to do that, so I'm not too worried. I think all the cases where my proposed resolution changes things are cases where the pattern would previously have been invalid, so it should be compatible to change.

Thoughts?

The text was updated successfully, but these errors were encountered:

jeremyroman · 2024-01-23T05:39:44Z

For reference, the URL spec's handling of this is at https://url.spec.whatwg.org/#scheme-start-state

rotu · 2024-08-06T17:57:01Z

There are some other examples that may be relevant from the URL spec:

e.g. new URLPattern("hello:world", "https://example.com/") resolves to {protocol:"https", hostname:"example.com", pathname:"/hello:world"}

but it should have protocol hello and path world.

e.g. new URLPattern("urn:isbn:9780307476463") resolves to {protocol:"urn:isbn", hostname:"", pathname:"9780307476463"}

but it should have protocol urn and path isbn:9780307476463

jeremyroman mentioned this issue Jan 23, 2024

Escape path prefix and restrict it to be a pathname in Speculation Rules WordPress/performance#951

Merged

3 tasks

jeremyroman added a commit to jeremyroman/eleventy-plugin-speculationrules that referenced this issue Feb 27, 2024

Handle the pathological case described in whatwg/urlpattern#210.

49ed129

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Constructor string parser has confusing result for paths containing a colon. #210

Constructor string parser has confusing result for paths containing a colon. #210

jeremyroman commented Jan 23, 2024

jeremyroman commented Jan 23, 2024

rotu commented Aug 6, 2024

Constructor string parser has confusing result for paths containing a colon. #210

Constructor string parser has confusing result for paths containing a colon. #210

Comments

jeremyroman commented Jan 23, 2024

What is the issue with the URL Pattern Standard?

jeremyroman commented Jan 23, 2024

rotu commented Aug 6, 2024