From 58c15efd2389574d4dd6e2d509a1a46b5a993122 Mon Sep 17 00:00:00 2001
From: Trevor Rowbotham
Date: Tue, 5 May 2020 22:45:25 -0400
Subject: [PATCH] Add named validation errors

Closes #406.
---
 url.bs | 444 ++++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 379 insertions(+), 65 deletions(-)

diff --git a/url.bs b/url.bs
index 13ba3f07..a6a0672f 100644
--- a/url.bs
+++ b/url.bs
@@ -88,6 +88,301 @@ valid input. User agents, especially conformance checkers, are encouraged to rep
 unclear to other developers.
Error type + Error description + Failure +
unexpected-C0-control-or-space + +

The input to the URL parser contains a leading or trailing C0 control or space. The + URL parser subsequently strips any matching code points. +

" https://example.org " +

❌ +
unexpected-ASCII-tab-or-newline + +

The input to the URL parser contains an ASCII tab or newline. The URL parser + subsequently strips any matching code points. +

"ht
tps://example.org
" +

❌ +
invalid-scheme-start + +

The first code point of a URL's scheme is not an ASCII alpha. +

"3ttps://example.org" +

✅ +
file-scheme-missing-following-solidus + +

The URL parser encounters a URL with a "file" scheme that is not + followed by "//". +

"file:c:/my-secret-folder" +

❌ +
invalid-scheme + +

The URL's scheme contains an invalid code point. +

"^_^://example.org" and + "https//example.org" +

✅ +
missing-scheme-non-relative-URL + +

The input is missing a scheme, because it does not begin with an + ASCII alpha, and either no base URL was provided or the base URL cannot be + used as a base URL because it has an opaque path. +

+

Input's scheme is missing and no base URL is given: +


+let url = new URL("💩");
+

Input's scheme is missing, but the base URL has an + opaque path. +


+let url = new URL("💩", "mailto:user@example.org");
+
+
✅ +
relative-URL-missing-beginning-solidus + +

The input is a relative-URL string that does not begin with U+002F (/). +


+let url = new URL("foo.html", "https://example.org/");
+
❌ +
unexpected-reverse-solidus + +

The URL has a special scheme and it uses U+005C (\) instead of U+002F (/). +

"https://example.org\path\to\file" +

❌ +
missing-solidus-before-authority + +

The URL includes credentials that are not preceded by "//". +

"https:user@example.org" +

❌ +
unexpected-at-sign + +

The URL includes credentials, which is considered invalid. +

"https://user@example.org" +

❌ +
unexpected-credentials-without-host + +

The URL includes credentials, but no host. +

"https://user:pass@" +

✅ +
unexpected-port-without-host + +

The URL contains a port, but no host. +

"https://:443" +

✅ +
empty-host-special-scheme + +

The URL has a special scheme, but does not contain a host. +

"https://#fragment" +

✅ +
host-invalid + +

The host portion of the URL is an empty string when it + includes credentials or a port and the basic URL parser's state is + overridden. +


+const url = new URL("https://example:9000");
+url.hostname = "";
+
❌ +
port-out-of-range + +

The input's port is greater than 65535. +

"https://example.org:70000" +

✅ +
port-invalid + +

The input's port contains a code point that is not an ASCII digit. +

"https://example.org:7z" +

✅ +
unexpected-Windows-drive-letter + +

The input is a relative-URL string that starts with a Windows drive letter and + the base URL's scheme is "file". +


+let url = new URL("/c:/path/to/file", "file:///c:/");
+
❌ +
unexpected-Windows-drive-letter-host + +

The file URL's host is a Windows drive letter. +

"file://c:" +

❌ +
invalid-URL-code-point + +

A code point is found that is not a URL code point or U+0025 (%), in the URL's + path, query, or fragment. +

"https://example.org/>" +

❌ +
unescaped-percent-sign + +

A U+0025 (%) is found that is not followed by two ASCII hex digits, in the URL's + path, query, or fragment. +

"https://example.org/%s" +

❌ +
unclosed-IPv6-address + +

An IPv6 address is missing the closing U+005D (]). +

"https://[::1" +

✅ +
domain-to-ASCII + +

Unicode toASCII records an error while processing + the input domain. +

[[!UTS46]] conformance does not require the reporting of precise errors, only that + an error has occurred. If the [[!UTS46]] implementation reports precise error codes, user agents + are encouraged to pass those codes along. +

✅ +
domain-to-ASCII-empty + +

Unicode toASCII returns the empty string. This + can be caused by: +

    +
  • Input consists of all ignorable code points. +
  • Input is the string "xn--". +
  • Input is the empty string and the VerifyDnsLength parameter is false. +
+
✅ +
domain-to-Unicode + +

Unicode toUnicode reports an error while + processing the input domain. +

The same considerations as with domain-to-ASCII apply. +

❌ +
forbidden-domain-code-point + +

The input's host contains a forbidden domain code point. +

+

Hosts are percent-decoded before being processed when the URL + is special, which would result in the following host portion becoming + "exa#mple.org". +

"https://exa%23mple.org" +

+
✅ +
unexpected-non-decimal-number + +

The IPv4 address contains numbers expressed using hexadecimal or octal digits. +

"https://127.0.0x0.1" +

❌ +
IPv4-part-out-of-range + +

An IPv4 part exceeds 255. This is fatal if any part other than the last exceeds 255, or if the last part is too large to fit in the remaining bytes of the address. +

"https://255.255.4000.1" +

✅ +
invalid-compressed-IPv6-address + +

An IPv6 address begins with a single U+003A (:) that is not followed by a second U+003A (:). +

"https://[:1]" +

✅ +
IPv6-too-many-pieces + +

An IPv6 address contains more than 8 pieces. +

"https://[1:2:3:4:5:6:7:8:9]" +

✅ +
IPv6-multiple-compression + +

An IPv6 address is compressed in more than one spot. +

"https://[1::1::1]" +

✅ +
IPv4-in-IPv6-empty-part + +

An IPv6 address that contains an IPv4 address has an empty part in the IPv4 address. +

"https://[ffff::.0.0.1]" +

✅ +
IPv4-in-IPv6-too-many-pieces + +

An IPv6 address contains an IPv4 address and the IPv6 address has more than 6 pieces. +

"https://[1:1:1:1:1:1:1:127.0.0.1]" +

✅ +
IPv6-unexpected-eof + +

An IPv6 address unexpectedly ends. +

"https://[1:2:3:]" +

✅ +
IPv6-unexpected-delimiter + +

An IPv6 address contains a code point that is neither an ASCII hex digit nor a + U+003A (:). +

"https://[1:2:3!:4]" +

✅ +
IPv6-too-few-pieces + +

An uncompressed IPv6 address contains fewer than 8 pieces. +

"https://[1:2:3]" +

✅ +
forbidden-host-code-point + +

An opaque host (in a URL that is not special) contains a + forbidden host code point. +

"foo://exa[mple.org" +

✅ +
IPv4-in-IPv6-too-many-parts + +

An IPv6 address contains an IPv4 address and the IPv4 address has more than 4 parts. +

"https://[ffff::127.0.0.1.2]" +

✅ +
IPv4-in-IPv6-unexpected-code-point + +

An IPv6 address contains an IPv4 address in which a part contains a code point other than an + ASCII digit or is the empty string. +

"https://[ffff::127.0.xyz.1]" +

✅ +
IPv4-in-IPv6-invalid-first-part + +

A part of an IPv4 address that is contained within an IPv6 address begins with 0 and is followed by further ASCII digits. +

"https://[ffff::127.00.0.1]" +

✅ +
IPv4-in-IPv6-part-out-of-range + +

An IPv4 address contained within an IPv6 address contains a part that exceeds 255. +

"https://[ffff::127.0.0.4000]" +

✅ +
IPv4-in-IPv6-too-few-parts + +

An IPv4 address contained within an IPv6 address does not contain exactly 4 parts. +

"https://[ffff::127.0.0]" +

✅ +
+
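As a rough illustration of the Failure column, the sketch below feeds a few of the example inputs from the table to the `URL` constructor. It assumes a JavaScript runtime whose `URL` implementation follows this standard (for instance a recent Node.js or browser); `tryParse` is a hypothetical helper introduced only for this example. The runtime does not report the error names, so they appear here only as comments taken from the table.

```javascript
// Hypothetical helper: returns the serialized URL, or null when parsing fails.
// Assumes the runtime's URL constructor implements the URL Standard.
function tryParse(input, base) {
  try {
    return new URL(input, base).href;
  } catch (e) {
    return null; // the constructor throws a TypeError on failure
  }
}

// Validation errors that do not cause failure: the parser recovers.
const recovered = [
  tryParse(" https://example.org "),               // unexpected-C0-control-or-space
  tryParse("ht\ntps://example.org\n"),             // unexpected-ASCII-tab-or-newline
  tryParse("https://example.org\\path\\to\\file"), // unexpected-reverse-solidus
  tryParse("https:user@example.org"),              // missing-solidus-before-authority
];

// Validation errors that cause failure: parsing is aborted.
const failed = [
  tryParse("https://example.org:70000"), // port-out-of-range
  tryParse("https://[::1"),              // unclosed-IPv6-address
  tryParse("https://#fragment"),         // empty-host-special-scheme
];

console.log(recovered, failed);
```

The recovered inputs all serialize to a normalized URL, which is why a conformance checker may still want to surface the named error even though parsing succeeded.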

Parsers

@@ -653,9 +948,11 @@ concepts. item that starts with an ASCII case-insensitive match for "xn--", this step is equivalent to ASCII lowercasing domain. -
  • If result is a failure value, validation error, return failure. +

  • If result is a failure value, domain-to-ASCII validation error, + return failure. -

  • If result is the empty string, validation error, return failure. +

  • If result is the empty string, domain-to-ASCII-empty + validation error, return failure.

  • Return result. @@ -674,8 +971,8 @@ concepts. UseSTD3ASCIIRules set to beStrict, and Transitional_Processing set to false. [[!UTS46]] -

  • Signify validation errors for any returned errors, and then, return - result. +

  • Signify domain-to-Unicode validation errors for any returned errors, and then, + return result. @@ -743,7 +1040,8 @@ to be distinguished.

    If input starts with U+005B ([), then:

      -
    1. If input does not end with U+005D (]), validation error, return failure. +

    2. If input does not end with U+005D (]), unclosed-IPv6-address + validation error, return failure.

    3. Return the result of IPv6 parsing input with its leading U+005B ([) and trailing U+005D (]) removed. @@ -764,10 +1062,10 @@ to be distinguished.

    4. Let asciiDomain be the result of running domain to ASCII with domain and false. -

    5. If asciiDomain is failure, validation error, return failure. +

    6. If asciiDomain is failure, then return failure.

    7. If asciiDomain contains a forbidden domain code point, - validation error, return failure. + forbidden-domain-code-point validation error, return failure.

    8. If asciiDomain ends in a number, then return the result of IPv4 parsing asciiDomain. @@ -851,19 +1149,21 @@ return value of the host parser is an IPv4 address.

    9. If result is failure, validation error, return failure. -

    10. If result[1] is true, validation error. +

    11. If result[1] is true, unexpected-non-decimal-number + validation error.

    12. Append result[0] to numbers.

    -
  • If any item in numbers is greater than 255, validation error. +

  • If any item in numbers is greater than 255, IPv4-part-out-of-range + validation error.

  • If any but the last item in numbers is greater than 255, then return failure.

  • If the last item in numbers is greater than or equal to - 256<sup>(5 − numbers's size)</sup>, validation error, - return failure. + 256<sup>(5 − numbers's size)</sup>, + IPv4-part-out-of-range validation error, return failure.

  • Let ipv4 be the last item in numbers. @@ -960,8 +1260,8 @@ actually doing that with the editors of this document first.
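The range checks in the IPv4 parser steps above can be sketched as follows. This is a minimal illustration rather than the normative algorithm: `checkIPv4Numbers` and its return shape are hypothetical, and it assumes the parts have already been parsed into a non-empty array of numbers.

```javascript
// Hypothetical helper mirroring the three range checks on `numbers`.
// Returns which named errors were signified and whether parsing fails.
function checkIPv4Numbers(numbers) {
  const errors = [];

  // Any part above 255 is a validation error, but not yet a failure.
  if (numbers.some((n) => n > 255)) {
    errors.push("IPv4-part-out-of-range");
  }

  // A part other than the last above 255 is a failure.
  if (numbers.slice(0, -1).some((n) => n > 255)) {
    return { failure: true, errors };
  }

  // The last part may span the remaining bytes: it must be
  // below 256^(5 - numbers.length).
  const last = numbers[numbers.length - 1];
  if (last >= 256 ** (5 - numbers.length)) {
    errors.push("IPv4-part-out-of-range");
    return { failure: true, errors };
  }

  return { failure: false, errors };
}

console.log(checkIPv4Numbers([255, 255, 4000, 1])); // fails: a non-last part exceeds 255
console.log(checkIPv4Numbers([127, 65535]));        // error signified, but no failure
```

Under these steps an input such as "127.65535" signifies the error without failing, because the last part is allowed to cover the three remaining bytes.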

    If c is U+003A (:), then:

      -
    1. If remaining does not start with U+003A (:), validation error, return - failure. +

    2. If remaining does not start with U+003A (:), + invalid-compressed-IPv6-address validation error, return failure.

    3. Increase pointer by 2. @@ -973,13 +1273,15 @@ actually doing that with the editors of this document first.

      While c is not the EOF code point:

        -
      1. If pieceIndex is 8, validation error, return failure. +

      2. If pieceIndex is 8, IPv6-too-many-pieces validation error, return + failure.

      3. If c is U+003A (:), then:

          -
        1. If compress is non-null, validation error, return failure. +

        2. If compress is non-null, IPv6-multiple-compression + validation error, return failure.

        3. Increase pointer and pieceIndex by 1, set compress to pieceIndex, and then continue. @@ -995,11 +1297,13 @@ actually doing that with the editors of this document first.

          If c is U+002E (.), then:

            -
          1. If length is 0, validation error, return failure. +

          2. If length is 0, IPv4-in-IPv6-empty-part validation error, + return failure.

          3. Decrease pointer by length. -

          4. If pieceIndex is greater than 6, validation error, return failure. +

          5. If pieceIndex is greater than 6, IPv4-in-IPv6-too-many-pieces + validation error, return failure.

          6. Let numbersSeen be 0. @@ -1016,10 +1320,11 @@ actually doing that with the editors of this document first.

          7. If c is a U+002E (.) and numbersSeen is less than 4, then increase pointer by 1. -

          8. Otherwise, validation error, return failure. +
          9. Otherwise, IPv4-in-IPv6-too-many-parts validation error, return failure.
          -
        4. If c is not an ASCII digit, validation error, return failure. +

        5. If c is not an ASCII digit, IPv4-in-IPv6-unexpected-code-point + validation error, return failure.

        6. @@ -1031,13 +1336,14 @@ actually doing that with the editors of this document first.
        7. If ipv4Piece is null, then set ipv4Piece to number. -

          Otherwise, if ipv4Piece is 0, validation error, return failure. +

          Otherwise, if ipv4Piece is 0, IPv4-in-IPv6-invalid-first-part + validation error, return failure.

          Otherwise, set ipv4Piece to ipv4Piece × 10 + number. -

        8. If ipv4Piece is greater than 255, validation error, return - failure. +

        9. If ipv4Piece is greater than 255, IPv4-in-IPv6-part-out-of-range + validation error, return failure.

        10. Increase pointer by 1.

        @@ -1050,7 +1356,8 @@ actually doing that with the editors of this document first.
      4. If numbersSeen is 2 or 4, then increase pieceIndex by 1.

      -
    4. If numbersSeen is not 4, validation error, return failure. +

    5. If numbersSeen is not 4, IPv4-in-IPv6-too-few-parts + validation error, return failure.

    6. Break.

    @@ -1061,11 +1368,12 @@ actually doing that with the editors of this document first.
    1. Increase pointer by 1. -

    2. If c is the EOF code point, validation error, return failure. +

    3. If c is the EOF code point, IPv6-unexpected-eof + validation error, return failure.

    -
  • Otherwise, if c is not the EOF code point, validation error, return - failure. +

  • Otherwise, if c is not the EOF code point, IPv6-unexpected-delimiter + validation error, return failure.

  • Set address[pieceIndex] to value. @@ -1087,7 +1395,7 @@ actually doing that with the editors of this document first.

  • Otherwise, if compress is null and pieceIndex is not 8, - validation error, return failure. + IPv6-too-few-pieces validation error, return failure.

  • Return address. @@ -1102,13 +1410,13 @@ actually doing that with the editors of this document first.
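To see the IPv6 failures in practice, the sketch below pairs each named error with its example input from the error table and checks that the runtime rejects it. It assumes a `URL` implementation that follows this standard; `ipv6Failures` and `parses` are hypothetical names, and the mapping from input to error name comes from the table, not from the runtime.

```javascript
// Example inputs from the error table, keyed by the named error each one
// is expected to trigger in the IPv6 parser.
const ipv6Failures = {
  "invalid-compressed-IPv6-address": "https://[:1]",
  "IPv6-too-many-pieces": "https://[1:2:3:4:5:6:7:8:9]",
  "IPv6-multiple-compression": "https://[1::1::1]",
  "IPv6-unexpected-eof": "https://[1:2:3:]",
  "IPv6-unexpected-delimiter": "https://[1:2:3!:4]",
  "IPv6-too-few-pieces": "https://[1:2:3]",
};

// Hypothetical helper: true when the URL constructor accepts the input.
function parses(input) {
  try {
    new URL(input);
    return true;
  } catch (e) {
    return false;
  }
}

for (const [name, input] of Object.entries(ipv6Failures)) {
  console.log(name, parses(input) ? "parsed" : "failure");
}

// A well-formed address with an embedded IPv4 address parses, and the
// IPv4 part is folded into the last two pieces on serialization.
console.log(new URL("https://[ffff::127.0.0.1]").hostname);
```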

    1. If input contains a forbidden host code point, - validation error, return failure. + forbidden-host-code-point validation error, return failure.

    2. If input contains a code point that is not a URL code point and not - U+0025 (%), validation error. + U+0025 (%), invalid-URL-code-point validation error.

    3. If input contains a U+0025 (%) and the two code points following it are - not ASCII hex digits, validation error. + not ASCII hex digits, unescaped-percent-sign validation error.

    4. Return the result of running UTF-8 percent-encode on input using the C0 control percent-encode set. @@ -1853,12 +2161,13 @@ and then runs these steps:
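The difference between domain hosts and opaque hosts can be observed with a short sketch. It assumes a `URL` implementation that follows this standard; `hostOrNull` is a hypothetical helper, and "foo" stands in for any scheme that is not special.

```javascript
// Hypothetical helper: returns the serialized host, or null on failure.
function hostOrNull(input) {
  try {
    return new URL(input).hostname;
  } catch (e) {
    return null;
  }
}

// Special scheme: the host is percent-decoded first, so "%23" becomes "#",
// a forbidden domain code point, and parsing fails.
const specialHost = hostOrNull("https://exa%23mple.org");

// Non-special scheme: the opaque host is not percent-decoded, so the
// same sequence is kept as-is.
const opaqueEncoded = hostOrNull("foo://exa%23mple.org");

// A literal forbidden host code point still fails for an opaque host.
const opaqueForbidden = hostOrNull("foo://exa[mple.org");

console.log(specialHost, opaqueEncoded, opaqueForbidden);
```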

    5. Set url to a new URL.

    6. If input contains any leading or trailing C0 control or space, - validation error. + unexpected-C0-control-or-space validation error.

    7. Remove any leading and trailing C0 control or space from input.

    -
  • If input contains any ASCII tab or newline, validation error. +

  • If input contains any ASCII tab or newline, + unexpected-ASCII-tab-or-newline validation error.

  • Remove all ASCII tab or newline from input. @@ -1892,7 +2201,7 @@ and then runs these steps: no scheme state and decrease pointer by 1.

  • -

    Otherwise, validation error, return failure. +

    Otherwise, invalid-scheme-start validation error, return failure.

    This indication of failure is used exclusively by the {{Location}} object's {{Location/protocol}} setter. @@ -1944,7 +2253,7 @@ and then runs these steps:

    1. If remaining does not start with "//", - validation error. + file-scheme-missing-following-solidus validation error.

    2. Set state to file state.

    @@ -1976,7 +2285,7 @@ and then runs these steps: in input).
  • -

    Otherwise, validation error, return failure. +

    Otherwise, invalid-scheme validation error, return failure.

    This indication of failure is used exclusively by the {{Location}} object's {{Location/protocol}} setter. Furthermore, the non-failure termination earlier in this state @@ -1987,7 +2296,8 @@ and then runs these steps:

    1. If base is null, or base has an opaque path and - c is not U+0023 (#), validation error, return failure. + c is not U+0023 (#), missing-scheme-non-relative-URL validation error, + return failure.

    2. Otherwise, if base has an opaque path and c is U+0023 (#), set url's scheme to @@ -2013,8 +2323,8 @@ and then runs these steps: state to special authority ignore slashes state and increase pointer by 1. -

    3. Otherwise, validation error, set state to relative state and - decrease pointer by 1. +

    4. Otherwise, relative-URL-missing-beginning-solidus validation error, set + state to relative state and decrease pointer by 1.
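The scheme-related steps above can be exercised with a few inputs. This is a sketch that assumes a `URL` implementation following this standard; `hrefOrNull` is a hypothetical helper, and the inputs are chosen for illustration.

```javascript
// Hypothetical helper: returns the serialized URL, or null on failure.
function hrefOrNull(input, base) {
  try {
    return new URL(input, base).href;
  } catch (e) {
    return null;
  }
}

// No scheme and no base URL: failure.
const noBase = hrefOrNull("💩");

// No scheme, and the base URL has an opaque path: failure.
const opaqueBase = hrefOrNull("💩", "mailto:user@example.org");

// The one exception for a base with an opaque path: a fragment-only input.
const fragmentOnly = hrefOrNull("#frag", "mailto:user@example.org");

// Same special scheme as the base but no "//": a validation error,
// after which the input is resolved as a relative URL.
const sameScheme = hrefOrNull("https:foo.html", "https://example.org/dir/");

console.log(noBase, opaqueBase, fragmentOnly, sameScheme);
```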

    path or authority state @@ -2036,7 +2346,8 @@ and then runs these steps:
  • If c is U+002F (/), then set state to relative slash state.

  • Otherwise, if url is special and c is U+005C (\), - validation error, set state to relative slash state. + unexpected-reverse-solidus validation error, set state to + relative slash state.

  • Otherwise: @@ -2081,7 +2392,8 @@ and then runs these steps:

    If url is special and c is U+002F (/) or U+005C (\), then:

      -
    1. If c is U+005C (\), validation error. +

    2. If c is U+005C (\), unexpected-reverse-solidus + validation error.

    3. Set state to special authority ignore slashes state.

    @@ -2108,8 +2420,9 @@ and then runs these steps: state to special authority ignore slashes state and increase pointer by 1. -
  • Otherwise, validation error, set state to - special authority ignore slashes state and decrease pointer by 1. +

  • Otherwise, missing-solidus-before-authority validation error, set + state to special authority ignore slashes state and decrease + pointer by 1.

    special authority ignore slashes state @@ -2118,7 +2431,7 @@ and then runs these steps:
  • If c is neither U+002F (/) nor U+005C (\), then set state to authority state and decrease pointer by 1. -

  • Otherwise, validation error. +

  • Otherwise, missing-solidus-before-authority validation error.

    authority state @@ -2128,7 +2441,7 @@ and then runs these steps:

    If c is U+0040 (@), then:

      -
    1. Validation error. +

    2. unexpected-at-sign validation error.

    3. If atSignSeen is true, then prepend "%40" to buffer. @@ -2168,7 +2481,7 @@ and then runs these steps:

      1. If atSignSeen is true and buffer is the empty string, - validation error, return failure. + unexpected-credentials-without-host validation error, return failure. @@ -2193,7 +2506,8 @@ and then runs these steps:

        Otherwise, if c is U+003A (:) and insideBrackets is false, then:

          -
        1. If buffer is the empty string, validation error, return failure. +

        2. If buffer is the empty string, unexpected-port-without-host + validation error, return failure.

        3. If state override is given and state override is @@ -2221,13 +2535,13 @@ and then runs these steps:

          1. If url is special and buffer is the empty string, - validation error, return failure. + empty-host-special-scheme validation error, return failure.

          2. Otherwise, if state override is given, buffer is the empty string, and either url includes credentials or url's - port is non-null, return. + port is non-null, host-invalid validation error, return.

          3. Let host be the result of host parsing buffer with url is not special. @@ -2279,7 +2593,7 @@ and then runs these steps: 0 through 9.

          4. If port is greater than 2<sup>16</sup> − 1, - validation error, return failure. + port-out-of-range validation error, return failure.

          5. Set url's port to null, if port is url's scheme's default port; otherwise to port. @@ -2292,7 +2606,7 @@ and then runs these steps:

          6. Set state to path start state and decrease pointer by 1.

          -
        4. Otherwise, validation error, return failure. +

        5. Otherwise, port-invalid validation error, return failure.
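The two port errors, and the default-port handling between them, can be observed as follows. The sketch assumes a `URL` implementation that follows this standard; `portOrNull` is a hypothetical helper.

```javascript
// Hypothetical helper: returns the port as exposed by the API, or null on failure.
function portOrNull(input) {
  try {
    return new URL(input).port;
  } catch (e) {
    return null;
  }
}

const tooBig = portOrNull("https://example.org:70000");  // port-out-of-range
const notDigits = portOrNull("https://example.org:7z");  // port-invalid
const maxPort = portOrNull("https://example.org:65535"); // largest accepted port
const defaultPort = portOrNull("https://example.org:443"); // default port is dropped

// With a state override (the port setter), failure leaves the URL unchanged.
const u = new URL("https://example.org:8080");
u.port = "70000";

console.log(tooBig, notDigits, maxPort, defaultPort, u.port);
```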

        file state @@ -2306,7 +2620,7 @@ and then runs these steps:

        If c is U+002F (/) or U+005C (\), then:

          -
        1. If c is U+005C (\), validation error. +

        2. If c is U+005C (\), unexpected-reverse-solidus validation error.

        3. Set state to file slash state.

        @@ -2343,7 +2657,7 @@ and then runs these steps:

        Otherwise:

          -
        1. Validation error. +

        2. unexpected-Windows-drive-letter validation error.

        3. Set url's path to « ».

        @@ -2365,7 +2679,7 @@ and then runs these steps:

        If c is U+002F (/) or U+005C (\), then:

          -
        1. If c is U+005C (\), validation error. +

        2. If c is U+005C (\), unexpected-reverse-solidus validation error.

        3. Set state to file host state.

        @@ -2406,8 +2720,8 @@ and then runs these steps:
        1. If state override is not given and buffer is a - Windows drive letter, validation error, set state to - path state. + Windows drive letter, unexpected-Windows-drive-letter-host + validation error, set state to path state.

          This is a (platform-independent) Windows drive letter quirk. buffer is not reset here and instead used in the path state. @@ -2454,7 +2768,7 @@ and then runs these steps:

          If url is special, then:

            -
          1. If c is U+005C (\), validation error. +

          2. If c is U+005C (\), unexpected-reverse-solidus validation error.

          3. Set state to path state. @@ -2500,7 +2814,7 @@ and then runs these steps:

            1. If url is special and c is U+005C (\), - validation error. + unexpected-reverse-solidus validation error.

            2. If buffer is a double-dot URL path segment, then: @@ -2550,10 +2864,10 @@ and then runs these steps:

              1. If c is not a URL code point and not U+0025 (%), - validation error. + invalid-URL-code-point validation error.

              2. If c is U+0025 (%) and remaining does not start with two - ASCII hex digits, validation error. + ASCII hex digits, unescaped-percent-sign validation error.

              3. UTF-8 percent-encode c using the path percent-encode set and append the result to buffer. @@ -2574,10 +2888,10 @@ and then runs these steps:
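The unescaped-percent-sign condition is simple enough to sketch directly. `hasUnescapedPercent` is a hypothetical helper that checks a whole string at once, whereas the parser checks one code point at a time; the two `URL` calls assume an implementation that follows this standard and show that neither error in this state causes failure.

```javascript
// Hypothetical check mirroring the "unescaped-percent-sign" condition:
// a U+0025 (%) that is not followed by two ASCII hex digits.
function hasUnescapedPercent(s) {
  return /%(?![0-9A-Fa-f]{2})/.test(s);
}

console.log(hasUnescapedPercent("/%s"));  // "%" followed by a non-hex digit
console.log(hasUnescapedPercent("/%41")); // well-formed percent-encoded byte

// Both errors are non-fatal in the path state: the stray "%" is kept,
// and ">" is percent-encoded using the path percent-encode set.
const keptPercent = new URL("https://example.org/%s").pathname;
const encodedGt = new URL("https://example.org/>").pathname;
console.log(keptPercent, encodedGt);
```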

                1. If c is not the EOF code point, not a URL code point, and not - U+0025 (%), validation error. + U+0025 (%), invalid-URL-code-point validation error.

                2. If c is U+0025 (%) and remaining does not start with two - ASCII hex digits, validation error. + ASCII hex digits, unescaped-percent-sign validation error.

                3. If c is not the EOF code point, UTF-8 percent-encode c using the @@ -2633,10 +2947,10 @@ and then runs these steps:

                  1. If c is not a URL code point and not U+0025 (%), - validation error. + invalid-URL-code-point validation error.

                  2. If c is U+0025 (%) and remaining does not start with two - ASCII hex digits, validation error. + ASCII hex digits, unescaped-percent-sign validation error.

                  3. Append c to buffer.

                  @@ -2650,10 +2964,10 @@ and then runs these steps:
                  1. If c is not a URL code point and not U+0025 (%), - validation error. + invalid-URL-code-point validation error.

                  2. If c is U+0025 (%) and remaining does not start with two - ASCII hex digits, validation error. + ASCII hex digits, unescaped-percent-sign validation error.

                  3. UTF-8 percent-encode c using the fragment percent-encode set and append the result to url's