The spec doesn't seem clear on how to handle "incomplete" hostnames #694
Comments
@rushmorem Thanks for opening this. This is the wildcard problem originally captured at https://bugzilla.mozilla.org/show_bug.cgi?id=1124625#c6 and more broadly documented at https://wiki.mozilla.org/Public_Suffix_List/platform.sh_Problem
Thanks @sleevi. That clarifies it. I'm rewriting my Rust implementation, so I wanted to know the correct way to handle this. I think adding these to the official test cases would help iron out the differences between implementations. What do you think, should I submit a pull request?
I’m not sure about the pull request: the answer for “which is correct” hasn’t quite been resolved yet across implementations, nor do we know which “should” be correct.
According to the wiki page you linked to:
So I thought this was already decided. In any case, I think the spec should clear this up one way or another.
In the following, when I say "loose interpretation", I mean the one where the rule […]

On the other hand, there's the "strict interpretation" which takes the current rules literally, such that the rule […]

Let's assume that a client has access to a function to look up the public suffix using strict interpretation. If the client is interested in the loose interpretation, it can first look up the public suffix for […]

If the lookup function implements loose lookup, then the client's ability to determine whether platforms.sh itself is on the PSL is lost entirely.

The strict interpretation (= literal interpretation of the current algorithm) therefore gives greater flexibility to the client, without the list showing prejudice regarding what the use case will be. I think it's a good thing for the list to not make assumptions about the use case.

Based on the documents linked here, the Chrome implementation appears to follow the loose interpretation. One solution for the problem could be to define the algorithm as strict, with Chrome (implicitly) adhering to the "two-tiered lookup" described above. This is equivalent to the loose interpretation, and the contradiction is removed. (In the case where […]
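As a rough illustration of the two-tiered lookup, here is a minimal Python sketch. It assumes the loose interpretation means "the parent of a wildcard rule is itself a public suffix". The rule set (`sh`, `*.platform.sh`), the function names, and the `psl-probe` label are all hypothetical and chosen only for this example; this is one possible reading of the two-step approach, not how any particular library or browser does it.

```python
# Illustrative rule set; an assumption for this sketch, not a copy of the real list.
RULES = ["sh", "*.platform.sh"]


def strict_public_suffix(domain, rules):
    """Apply the rules literally: a wildcard matches exactly one label."""
    labels = domain.lower().strip(".").split(".")
    exception = None
    longest = ["*"]  # default rule used when nothing longer matches
    for rule in rules:
        is_exception = rule.startswith("!")
        parts = rule.lstrip("!").split(".")
        if len(parts) > len(labels):
            continue  # the domain has too few labels to match this rule
        tail = labels[len(labels) - len(parts):]
        if all(part in ("*", label) for part, label in zip(parts, tail)):
            if is_exception:
                exception = parts
            elif len(parts) > len(longest):
                longest = parts
    # An exception rule wins and loses its leftmost label.
    count = len(exception) - 1 if exception is not None else len(longest)
    return ".".join(labels[len(labels) - count:])


def loose_public_suffix(domain, rules, probe="psl-probe"):
    """Emulate the loose reading using only strict lookups."""
    name = domain.lower().strip(".")
    suffix = strict_public_suffix(name, rules)
    if suffix != name:
        # Second tier: would an arbitrary child of `name` be a public suffix?
        # The probe label is assumed not to collide with any exception rule.
        child = f"{probe}.{name}"
        if strict_public_suffix(child, rules) == child:
            return name
    return suffix


print(strict_public_suffix("platform.sh", RULES))  # sh
print(loose_public_suffix("platform.sh", RULES))   # platform.sh
```

With this layering the strict answer stays available to callers that need it, while callers that prefer the loose answer pay for one extra lookup.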
On Tue, Jun 11, 2019 at 3:51 AM Peter Thomassen wrote:
> If the lookup function implements loose lookup, then the client's ability to determine whether platforms.sh itself is on the PSL is lost entirely.

I think that is making assumptions about the service that aren’t specified. The service could implement the loose lookup itself and return appropriate results - not allowing for “holes” in the namespace.

> The strict interpretation (= literal interpretation of the current algorithm) therefore gives greater flexibility to the client, without the list showing prejudice regarding what the use case will be. I think it's a good thing for the list to not make assumptions about the use case.

I don’t agree with this being good. I think this would be very bad. Can you explain more why you think it would be good?
There is no assumption about the service here. In my original post, I wrote:
This is the assumption that there may already be a PSL client, library, or other implementation that outputs the public suffix according to the strict interpretation. This is an assumption not about the PSL service, but about the existence of existing implementations. Actually, it's a fact, as I know of at least one implementation that works this way, and @rushmorem said something similar in the initial post.

If the meaning of the […]

On the other hand, the strict algorithm allows emulating the loose interpretation by first getting the public suffix of […]

(Arguably, this is what Chrome does implicitly already, according to the Mozilla Wiki article -- maybe not with the two-step approach, but nevertheless, the implementation has chosen to interpret the PSL like this, and could continue doing so even if the algorithm's definition was clarified to mean the strict interpretation: In this case, even if Chrome decided to migrate to a new, strictly compliant PSL library, the two-step approach described in the previous paragraph would recover the loose interpretation's behavior, resulting in no change as far as Chrome's use case is concerned.)

The converse is not true: If the algorithm was changed to follow the loose interpretation, so that a wildcard rule's parent is always a public suffix as well (barring an exception rule), then that would reduce flexibility in the sense that implementations could no longer decide which interpretation they want to implement. All implementations would follow the loose interpretation (permanently breaking pre-existing implementations that relied, say, on the non-publicness of […]

Now, in turn, it is unclear why that would be desirable, as it removes the choice at the implementation level. There may be use cases where the strict interpretation is preferable, and those would be thwarted by imposing the loose one.
This stems from the fact that if rules denote public-suffix policy not only about domain names with the same number of labels (dots) as the rule, but also make statements about domains with a different (lower) number of labels (dots) than the rule, the level of granularity is reduced. In the strict interpretation, granularity is higher. As a result, one can retrieve all "loose statements" from "strict rules" (you may have to check the […]

So, defining the algorithm by the loose interpretation has the following cons:
On the other hand, the strict interpretation does not have these downsides, while allowing for either use case: With the strict interpretation, you actually get (the possibility to have) both.
It would be good for the reasons above. Why would it be bad?
As I commented in #1986, the conflicting rules between the wiki and the test case/linters should be resolved. I believe the test case and linter are correct, supported by implementations and intended use cases. The existing implementations that do not follow the test case should not be a reason to leave the conflicting rules in this repository. I believe it would be better to have one self-consistent rule and put a notice about the possible differences between implementations instead.
A couple of things are at play here. Sometimes both at once, but it is often one or the other.
1] Epochs / Legacy entries
2] What we proclaim vs implementation choices (aka "Browsers are gonna do what browsers are gonna do")
In some applications, the loose interpretation is adequate. In others the strict is much wiser. This is ultimately just a text file.
By "incomplete" hostname, I mean a hostname that's entirely part of some rule or rules but does not have enough labels to match the rule or rules entirely. Examples of such hostnames are `yokohama.jp` and `kobe.jp`.
The relevant rules for those hostnames are:-
I have seen these two interpreted differently by at least two implementations and I understand how it can go either way. `libpsl` returns the public suffixes for those domains as `yokohama.jp` and `kobe.jp` respectively. Servo's `net_traits` crate, however, returns `jp` for both, which leads to weird test cases like these.

What's the official position on this?
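To make the notion of an "incomplete" hostname concrete, here is a small Python sketch of one possible reading of the definition above: the hostname is not itself listed as a rule, but it forms the trailing labels of at least one longer rule. The rule set is an illustrative assumption modeled on wildcard entries under `jp`, and `is_incomplete` is a hypothetical helper, not part of `libpsl` or `net_traits`.

```python
# Illustrative rules; an assumption for this sketch, not a copy of the real list.
RULES = [
    "jp",
    "*.yokohama.jp",
    "!city.yokohama.jp",
    "*.kobe.jp",
    "!city.kobe.jp",
]


def is_incomplete(hostname, rules):
    """True if hostname is a proper label-suffix of a rule and not itself a rule."""
    host = hostname.lower().strip(".").split(".")
    listed = False
    shorter_than_some_rule = False
    for rule in rules:
        parts = rule.lstrip("!").split(".")
        if parts == host:
            listed = True
        elif len(parts) > len(host) and parts[len(parts) - len(host):] == host:
            shorter_than_some_rule = True
    return shorter_than_some_rule and not listed


print(is_incomplete("yokohama.jp", RULES))      # True
print(is_incomplete("jp", RULES))               # False
print(is_incomplete("foo.yokohama.jp", RULES))  # False
```

Under this reading, these are exactly the hostnames on which a strict and a loose implementation can return different public suffixes.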