Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider a string syntax instead of a dictionary #4

Closed
mgiuca opened this issue Aug 29, 2019 · 14 comments
Closed

Consider a string syntax instead of a dictionary #4

mgiuca opened this issue Aug 29, 2019 · 14 comments

Comments

@mgiuca
Copy link

mgiuca commented Aug 29, 2019

I wonder if a syntax for URL patterns would be easier to learn and read instead of designing a dictionary which takes up multiple lines just to express a simple pattern.

I'm thinking that it should basically have a syntax that looks like a URL with ? and * being special, and \? (an escaped question mark) separating the path from the query, and with special treatment being applied to the query, allowing a match in any order.

For example:

https://example.com/foo/bar/?*\?wiz=wham&zot=*

Means to match any URL starting with "https://example.com/foo/bar" or a sub-path thereof, AND which must have the "wiz=wham" anywhere in its query param list, AND it must have a zot parameter with any value. wiz and zot can appear in any order, and the URL is allowed to have other query params, and they can come before or after wiz and zot.

The counter I can think of to this is that there's potential confusion to the user in that it looks like a glob or regex that would require those query parameters to appear in that order and first in the param list, where in fact we are creating a new type of URL-specific glob that treats query parameters separately (allowing them to appear in any order). Also having to write /? is ugly and you might forget the slash and create an entirely different meaning.

The advantage of this would be that you can use it in both Service Worker and Manifest (a JSON dictionary) with the exact same syntax, since you'd be defining a syntax for a "URL pattern" rather than a class URLPattern (requiring a separate explanation of how to express it in a JSON dictionary).

@wanderview
Copy link
Member

So, I started with a flat string representation, but moved to the dictionary for the follow reasons:

  1. There are more character conflicts if we use a single flat string. For example, a path cannot have a ? normally, but a full URL string can. Many libraries avoid this by simply ignoring search strings completely, etc, but we can't afford to that here.
  2. There does not seem to be a good way to distinguish between search parameters that can be re-ordered vs not. For example, legacy scope behavior requires exact matching of the order, but we want to support a mode that allows re-ordering.
  3. A dictionary is easier to extend in the future to add more features to URLPattern. For example, if at some point URLPattern starts supporting variable origins, that could be done through a wildcard in a host dictionary field, etc.

I expect to get more feedback at TPAC around all the issues you filed and will update more after that point.

@mgiuca
Copy link
Author

mgiuca commented Aug 30, 2019

There does not seem to be a good way to distinguish between search parameters that can be re-ordered vs not. For example, legacy scope behavior requires exact matching of the order, but we want to support a mode that allows re-ordering.

Wow. I did not realise legacy scope match allowed you to specify query parameters, and then mandated that they appear at the start of the URL in the exact order specified in the scope. That's ... pretty nasty. (And it isn't explicit in the spec, it's just implied by the fact that the scope is a URL prefix.)

So it turns out the manifest scope algorithm doesn't match service worker, because in the manifest, we only match the origin and path.

I don't see any merit in a query match that cares about the order of query parameters (it just seems like a way to let developers shoot themselves in the foot; e.g., using a hash table to generate query parameters resulting in a pseudo-random parameter ordering which sometimes matches and sometimes doesn't). Ideally, the new pattern would only let you match in any order, leaving the exact order matching as an unfortunate feature of the old match algorithm.

Edit: And hence, for point 2, I think we could define a string syntax that always (explicitly) ignores the order that you specify in your pattern, and always allows parameters in any order.

For point 3, we could add that to the string syntax also, e.g., in the future allowing "https://*.example.com" or even "https://foo*.example.com".

@wanderview
Copy link
Member

Well, we have to support legacy web features whether we consider them useful or not. I don't think we can just remove it.

For point 3, we could add that to the string syntax also, e.g., in the future allowing "https://*.example.com" or even "https://foo*.example.com".

There are other extensions that would not be so easily supported in a single string. One that comes to mind:

What if we wanted to add the option to "OR" search parameters together. If we devote normal search syntax to "AND" logic then it seems we'd have to create some crazy new syntax to reflect the other logical mode in a flat string. In comparison, a dictionary makes this easy by adding a mode value or a separate orParams value.

Also, from an implementation point of view I think a dictionary approach would be much more manageable. We can keep individual pattern string syntax relatively simple which limits the risk of introducing security errors in the parser. It also makes it easier to define a small set of features today that can be extended in the future.

@mgiuca
Copy link
Author

mgiuca commented Sep 2, 2019

Well, we have to support legacy web features whether we consider them useful or not. I don't think we can just remove it.

Which legacy web feature are you referring to? The fact that SW scope currently matches a string prefix including the query? Or something else?

I'm not suggesting we remove that from the old syntax, but that it isn't exposed in the new URLPattern syntax. You'll still be able to do "scope": "/foo?bar=baz" which means "only match a URL with a bar=baz query that must appear first in the query parameter list". But if you switch to a URLPattern type of scope, you won't be able to specify that any more; you'll only be able to say "only match a URL with a bar=baz query, anywhere in the parameter list". The new URLPattern does not need to support all the features of the old scope syntax (if we deem them to be a mistake), since there is no backwards compatibility requirement.

The other points make sense.

@wanderview
Copy link
Member

The fact that SW scope currently matches a string prefix including the query?

Yes

I'm not suggesting we remove that from the old syntax, but that it isn't exposed in the new URLPattern syntax.

Currently we have a consistent internal representation of the pattern whether the user provided a legacy scope string or not. Its always represented internally as a pattern. That pattern is exposed as registration.scopePattern in the current proposal. If we can't represent the legacy scope as a pattern then we can't populate that attribute.

Making legacy and new pattern scopes support a disjoint set of things will complicate the implementation. It will also make it harder to create a consistent API like scopePattern that code can use to always get a known type out.

@mgiuca
Copy link
Author

mgiuca commented Sep 5, 2019

That's understandable, but I'm also not sure we should create a new feature that lets people do something we all agree is bad, just to simplify the implementation.

The solution to the implementation could be that there's an internal member on that dict called "requireQueryFirst" or something, which is set to true when parsing a legacy scope, but cannot be set from user-space when defining a URLPattern (it's always false).

If you go with this implementation, you need to find a way to squish all possible query strings into the key/value pattern data structure. Are you prepared to make the data structure that flexible just to accommodate legacy query prefix matching?

You'd need to allow patterns for (starting each one with a specific example of a legacy scope):

  • /foo: No preference as to whether there's a query or not.
  • /foo?: '?' required but no specific query parameters required.
  • /foo?bar: At least one query parameter, where the first query parameter's key starts with a given prefix, and which may or may not have an '=' symbol (it may not be a key=value pair, just a string without an =).
  • /foo?bar= or /foo?bar=baz: At least one query parameter, where the first query parameter's key exactly matches, and whose value starts with a given prefix. (Note that in the /foo?bar=baz example, this is actually checking that the value starts with baz, not that it is baz.)
  • /foo?bar=baz&boo or /foo?bar=baz&boo=hoo: Two or more query parameters, with an exact order required and an match required on the key and value for all but the last one, and the last one being matched with either "key prefix and optional =" or "exact key and value prefix" mode.

That seems like it has some edge cases that may not be supported by the current URLPattern proposal. I feel like it's simpler if you just create a separate implementation for legacy scope matching, at least for the query string part. You could, for example, have an option of either using the query matching dictionary, or treating the query as an opaque string and doing a prefix match (which would only be usable in the implementation for legacy scopes, not exposed in the public API).

@wanderview
Copy link
Member

wanderview commented Sep 5, 2019

The solution to the implementation could be that there's an internal member on that dict called "requireQueryFirst" or something, which is set to true when parsing a legacy scope, but cannot be set from user-space when defining a URLPattern (it's always false).

True. We could hide the legacy behavior in unexposed logic within URLPattern. It would basically be hiding the URLPattern({search: { value: 'foo' }}) part of the original proposal.

If you go with this implementation, you need to find a way to squish all possible query strings into the key/value pattern data structure. Are you prepared to make the data structure that flexible just to accommodate legacy query prefix matching?

I'm sorry, I got a little lost at this point.

It seems to me that the original proposal can largely be implemented by parsing the URL to matched against using URLSearchParams. We then effectively use URLSearchParams.get(), etc. Maybe we would have to make param keys non-variable to make this reasonable, etc.

Anyway, I think the discussion of whether to hide the service worker prefix matching of search params from URLPattern is a different issue from the original topic here.

@wanderview
Copy link
Member

In regards to string-vs-dictionary, I would be open to a convenience form that takes a string, but optionally also takes a dictionary to handle future extensions. My impression, though, is the manifest implementation would prefer not to have two possible types passed for a field. That is another reason I settled on the dictionary as it provides a single type for manifest and the extensibility I think we will need in the future.

I don't think we'll be able to resolve this before the TPAC face-to-face, though. I'd really like to get a wider set of feedback around the tradeoffs. Also, if folks would prefer to remove URLPattern completely and do something service worker specific then it would be easier to do a simple string. Basically, I need agreement on use cases before we can really resolve this.

@mgiuca
Copy link
Author

mgiuca commented Sep 6, 2019

The solution to the implementation could be that there's an internal member on that dict called "requireQueryFirst" or something, which is set to true when parsing a legacy scope, but cannot be set from user-space when defining a URLPattern (it's always false).

True. We could hide the legacy behavior in unexposed logic within URLPattern. It would basically be hiding the URLPattern({search: { value: 'foo' }}) part of the original proposal.

If you go with this implementation, you need to find a way to squish all possible query strings into the key/value pattern data structure. Are you prepared to make the data structure that flexible just to accommodate legacy query prefix matching?

I'm sorry, I got a little lost at this point.

Sorry, that was pretty confusing. I've edited my comment to show specific examples of each. The point is that because SW scope is currently a "dumb" prefix match, that actually implies quite a wide variety of different "modes" that a "smart" query matcher would need to emulate, and I'm not sure your current proposal captures all of those modes. Also, I'm not convinced that it's desirable to do so, since almost every single one of those (if not all) makes no sense. (There's no way to require a parameter bar=baz; you can only require that bar starts with baz.) I'm suggesting that it might be a fool's errand to try and capture all of those "modes" in the new data structure, and instead simply have a separate code path for processing legacy scope strings.

I've also just made a proposal to deprecate query matching. Based on this discussion, I think it's so broken that it can't be used correctly, so we should just (attempt) to remove it, which would make things much easier both for this effort, and the unification of the manifest matching algorithm with this.

(Again, I won't be at TPAC unfortunately, but I look forward to the outcome of these discussions.)

@wanderview
Copy link
Member

My original proposal includes a "search value option" that operates on the search string including the leading "?". Its mainly there to provide back-compat with the service worker scopes.

The "search params option" is mutually exclusive with the value option. The params option is the proposal for more sane behavior that allows matching regardless of ordering, etc.

I think between these two modes we can meet all constraints. If we deprecate the legacy scope behavior we could hide or drop the value option.

@wanderview
Copy link
Member

(Again, I won't be at TPAC unfortunately, but I look forward to the outcome of these discussions.)

Understood. I'll do my best to represent your position. Also, my impression is that @fallaciousreasoning from your team will be there. Is that correct?

@mgiuca
Copy link
Author

mgiuca commented Sep 13, 2019

Yep!

@mgiuca
Copy link
Author

mgiuca commented Sep 13, 2019

value vs params sounds good (avoids the need to have params represent all possible prefix matches).

@wanderview
Copy link
Member

I summarized the TPAC discussion here:

w3c/ServiceWorker#1468 (comment)

I'm not sure the minutes call it out explicitly, but I asked many people for their opinion on string-vs-dictionary. The unanimous feedback I got was that we should use a dictionary and that a flat string was probably not practical.

I was a little concerned about header representations of the dictionary, but I'm told there is an effort to support this with "structured headers". People did not seem concerned.

Based on that consensus I'm going to mark this issue WONTFIX for now. Sorry!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants