Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Named validation errors #502

Merged
merged 9 commits into from
Feb 20, 2023
Merged

Conversation

TRowbotham
Copy link
Contributor

@TRowbotham TRowbotham commented May 8, 2020

This addresses #406.

By my count, there are 54 different places where a validation error can occur with the majority of them being unique. This adds a table with descriptive error codes for each unique validation error, which also has an associated description of the error.

  • Would there by value in adding a column indicating whether the validation error can cause the parser to fail?
  • I am considering adding example input that would trigger each validation error. Is this something you think would have value?
  • The unexpected-windows-drive-letter-host needs an error description still.
  • I wasn't sure how best to link the validation errors in each spot, so I just ended up doing <a><unique error name></a> <a>validation error</a> everywhere. I had considered a prefix or suffix of validation-error to every code, but felt it was overly verbose, though I'm not opposed to it if someone feels its more appropriate.
  • "Error Message" should probably be renamed to "Error Description".
  • Some of the error descriptions could probably use a little more detail, but I wanted to get this initial draft posted.

Preview | Diff

@domenic
Copy link
Member

domenic commented May 8, 2020

This looks amazing! Giving my opinions on some of your questions, but you should probably wait for @annevk before acting on them:

Would there by value in adding a column indicating whether the validation error can cause the parser to fail?

IMO yes. I'm thinking a "Fatal" column which is either empty or one of the unicode checkmarks.

I am considering adding example input that would trigger each validation error. Is this something you think would have value?

IMO yes. Doing so in a way that makes things non-overwhelming might be a bit tricky. E.g. I think another column would probably not work well. Maybe a second row (making each error code rowspan="2") with class="example", similar to https://html.spec.whatwg.org/#usage-summary-2 ? Or maybe just putting in <p class="example"> wouldn't be so bad.

Relatedly, although I imagine you are following the pattern from HTML, maybe the descriptions could be tightened up by removing "This error occurs when".

I wasn't sure how best to link the validation errors in each spot,

I like what you did. It provides a convenient link to the concept of validation error, which is important as people can be confused by parse error vs. validation error.

Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thanks for putting this together!

Yes to both your questions from me as well. I gave some initial feedback on some of the naming choices.

url.bs Outdated Show resolved Hide resolved
url.bs Outdated Show resolved Hide resolved
url.bs Outdated Show resolved Hide resolved
url.bs Outdated Show resolved Hide resolved
url.bs Outdated Show resolved Hide resolved
url.bs Outdated Show resolved Hide resolved
@TRowbotham
Copy link
Contributor Author

Thanks for the feedback. I'll work on adding a new "Fatal" column as domenic suggested and rewording the descriptions to try not to repeat the "This error occurs when...".

I agree that an additional column probably isn't the best approach for the example input. The HTML parse errors table appears to put notes and examples right under the error description text, so I'll experiment with doing that and hopefully things don't get too cluttered.

@domenic
Copy link
Member

domenic commented May 11, 2020

The HTML parse errors table appears to put notes and examples right under the error description text, so I'll experiment with doing that and hopefully things don't get too cluttered.

One thing that will declutter things a bit is adding <p> around each description, which will introduce some nicer margins.

@annevk
Copy link
Member

annevk commented May 11, 2020

@TRowbotham if this needs rebasing at some point due to other ongoing changes I'm happy to do that work for you.

@TRowbotham
Copy link
Contributor Author

I think I broke this while rebasing :(

@TRowbotham
Copy link
Contributor Author

@annevk, I think this is ready now. It just needs a rebase, which I'll leave up to you so I don't botch it again.

@annevk
Copy link
Member

annevk commented May 17, 2020

@TRowbotham how are the rows ordered?

@TRowbotham
Copy link
Contributor Author

They are grouped by section in the spec. I started at section 4.4 URL Parsing and worked my way backwards through the sections.

@annevk
Copy link
Member

annevk commented May 18, 2020

For port-out-of-range a problem is that URL-port string should define the upper limit. If it defines the upper limit, do we still want a separate enum value? I guess we do, but the description would have to change as it would match invalid-port otherwise.

annevk added a commit that referenced this pull request May 18, 2020
annevk added a commit that referenced this pull request May 18, 2020
@karwa
Copy link
Contributor

karwa commented Aug 29, 2020

I think the missing-credentials error is incorrect. That error occurs if there is an @ but no host. https://@test.com should not cause the parser to fail, but https://username:password@ should (and does, in the Live URL viewer in Safari 13).

Perhaps it should be replaced by something like 'unexpected credentials without host' (similar to 'unexpected port without host')?

karwa added a commit to karwa/swift-url that referenced this pull request Sep 15, 2020
…t components from the stored URL string

- Corrected a validation error in the host parser
- Renamed 'missing-credentials' validation error; it's really an "unexpected credentials without host" error. See: whatwg/url#502 (comment)
Base automatically changed from master to main January 15, 2021 07:41
TRowbotham added a commit to TRowbotham/URL-Parser that referenced this pull request Aug 26, 2022
@annevk
Copy link
Member

annevk commented Jan 13, 2023

The IPv4 parser still has two unclassified validation errors.

Otherwise this has been rebased. I adjusted casing to my liking. I addressed @karwa's observation. I want to give it another pass and address those missing cases, but it's getting late.

@annevk annevk added clarification Standard could be clearer topic: validation Pertaining to the rules for URL writing and validity (as opposed to parsing) labels Jan 13, 2023
@annevk
Copy link
Member

annevk commented Jan 17, 2023

I want #739 to land first and rebase this on top.

I also want to divide the table into 3 row groups. Tentatively titled "Hosts", "URLs", and "Opaque hosts and URLs". And then put the errors in order of appearance.

@annevk
Copy link
Member

annevk commented Jan 20, 2023

This is now ready for review. All validation errors should be covered. Please don't hold back on nits!

@annevk annevk requested a review from domenic January 20, 2023 18:54
@TRowbotham
Copy link
Contributor Author

Thank you for driving this to completion!

Overall, I think this looks pretty good. I think the IPv4 and IPv6 validation error names could use a normalization pass. Currently, there are a few with similar names, but slightly different word order such as:

  • too-many-IPv4-parts
  • IPv6-too-many-pieces
  • IPv4-in-IPv6-too-many-parts

url.bs Outdated Show resolved Hide resolved
url.bs Show resolved Hide resolved
url.bs Outdated Show resolved Hide resolved
url.bs Outdated Show resolved Hide resolved
@TRowbotham
Copy link
Contributor Author

TRowbotham commented Jan 24, 2023

I stood up a small demo implementation, similar to the jsdom version, you can use it to quickly check validation errors if that helps with the review.

The git branch of the url parser the demo is using isn't published at the moment, but I can publish the branch if you would find the source code useful. The error highlighting could probably use some refinement, but it should do the job for reporting the validation error.

Edit: fixed the link

Copy link
Member

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for better consistency in naming the errors, if possible. Some suggestions, none of which I'm 100% confident in:

  • All the file: URL specific ones could be prefixed with file-.
  • All the authority/credentails ones could be prefixed with authority-.
  • Choose between "unexpected" (e.g. unexpected-C0-control-or-space), "invalid" (e.g. invalid-URL-code-point, invalid-scheme-start), and "forbidden" (e.g. forbidden-host-code-point").

I also wonder about redundancy for some of these. Isn't invalid-URL-code-point a superset of unexpected-C0-control-or-space and unexpected-ASCII-tab-or-newline? Similarly isn't unexpected-at-sign a superset of unexpected-credentials-without-host and missing-solidus-before-authority? Maybe the extra specificity is useful, but I'm not so sure...

url.bs Outdated Show resolved Hide resolved
url.bs Outdated Show resolved Hide resolved
url.bs Outdated Show resolved Hide resolved
url.bs Outdated Show resolved Hide resolved
url.bs Outdated Show resolved Hide resolved
url.bs Outdated Show resolved Hide resolved
url.bs Outdated Show resolved Hide resolved
url.bs Outdated Show resolved Hide resolved
url.bs Outdated Show resolved Hide resolved
@annevk annevk added the do not merge yet Pull request must not be merged per rationale in comment label Jan 24, 2023
@annevk
Copy link
Member

annevk commented Jan 24, 2023

I pushed a large number of changes after a fairly lengthy editing session, but it's still not done unfortunately.

@annevk annevk removed the do not merge yet Pull request must not be merged per rationale in comment label Feb 10, 2023
@annevk annevk requested a review from domenic February 10, 2023 12:49
Copy link
Member

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with nits. Very clean.

url.bs Outdated Show resolved Hide resolved
url.bs Outdated Show resolved Hide resolved
url.bs Outdated Show resolved Hide resolved
url.bs Outdated Show resolved Hide resolved
url.bs Outdated Show resolved Hide resolved
url.bs Outdated Show resolved Hide resolved
<p class=example id=example-ipv4-in-ipv6-too-many-pieces>"<code>https://[1:1:1:1:1:1:1:127.0.0.1]</code>"
<td>Yes
<tr>
<td><dfn>IPv4-in-IPv6-invalid-code-point</dfn>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really seems like this should be split into multiple errors, but I guess that could be refactored later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I ended up grouping some obscure ones into this one as there was no great logical separation. And IPv4-in-IPv6 is already pretty obscure so I'm not sure how detailed we really want to go.

@annevk
Copy link
Member

annevk commented Feb 18, 2023

@TRowbotham did you want to give this a final look before it's merged?

@TRowbotham
Copy link
Contributor Author

LGTM. Thank you for pushing this over the finish line!

@annevk annevk merged commit d84610f into whatwg:main Feb 20, 2023
TRowbotham added a commit to TRowbotham/URL-Parser that referenced this pull request Dec 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clarification Standard could be clearer topic: validation Pertaining to the rules for URL writing and validity (as opposed to parsing)
Development

Successfully merging this pull request may close these issues.

4 participants