Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

url: erroneous WHATWG whitespace handling #12825

Closed
TimothyGu opened this issue May 4, 2017 · 1 comment
Closed

url: erroneous WHATWG whitespace handling #12825

TimothyGu opened this issue May 4, 2017 · 1 comment
Labels
whatwg-url Issues and PRs related to the WHATWG URL implementation.

Comments

@TimothyGu
Copy link
Member

TimothyGu commented May 4, 2017

  • Version: v7.x and master
  • Platform: all
  • Subsystem: url
  1. We should be stripping leading and trailing C0 control or space if the call comes from any place other than the setters (step 1.3 in basic URL parser), but we are not.

    const assert = require('assert');
    const { URL } = require('url');
    assert.strictEqual(new URL('\x1fhttp://abc\x1f').href, 'http://abc/');
      // Currently: TypeError: Invalid URL: ...
  2. We should be stripping ASCII tab or newline (step 3 in basic URL parser) before entering the state machine (step 11). Instead of following the spec strictly, we are currently using a "clever" scheme that allows us to strip them without an additional loop. However, this scheme is already somewhat not elegant, but can actually break completely when the remaining magical variable is used:

    const assert = require('assert');
    const { URL } = require('url');
    assert.strictEqual(new URL('C|/', 'file://host/dir/file').href, 'file:///C:/');
      // No errors
    assert.strictEqual(new URL('C|\n/', 'file://host/dir/file').href, 'file:///C:/');
      // AssertionError: 'file://host/dir/C|/' === 'file:///C:/'

When implementing issue 1 above, one should first add an appropriate CHAR_TEST for C0 control or space, like so:

// https://infra.spec.whatwg.org/#c0-control-or-space
CHAR_TEST(8, IsC0ControlOrSpace, (ch >= '\0' && ch <= ' '))

Then we should add a new has_url argument to URL::Parse() much like the existing has_base to signify whether we should be stripping leading and trailing C0 control or space, or not.

As an optimization, if !has_url (i.e. if we are stripping leading and trailing C0 control or space) it should be possible to first find the appropriate start and end of the input without creating a new string, and then only create a new string in the ASCII tab or newline-removal step later.

@watilde Are you interested in taking a stab at this? #12846

@watilde
Copy link
Member

watilde commented May 5, 2017

That would be interesting. I just started working on it. Oops, nvm for it. I didn't refresh the page :)

domenic pushed a commit to web-platform-tests/wpt that referenced this issue May 10, 2017
TimothyGu added a commit to TimothyGu/node that referenced this issue May 14, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
whatwg-url Issues and PRs related to the WHATWG URL implementation.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants