url: erroneous WHATWG whitespace handling #12825

TimothyGu · 2017-05-04T06:13:53Z

Version: v7.x and master
Platform: all
Subsystem: url

1. We should be stripping leading and trailing C0 control or space whenever the call comes from anywhere other than the setters (step 1.3 in the basic URL parser), but we are not.

```
const assert = require('assert');
const { URL } = require('url');
assert.strictEqual(new URL('\x1fhttp://abc\x1f').href, 'http://abc/');
  // Currently: TypeError: Invalid URL: ...
```

2. We should be stripping ASCII tab or newline (step 3 in the basic URL parser) before entering the state machine (step 11). Instead of following the spec strictly, we currently use a "clever" scheme that strips them without an additional loop. This scheme is already inelegant, and it breaks completely when the magical `remaining` variable is used:
```
const assert = require('assert');
const { URL } = require('url');
assert.strictEqual(new URL('C|/', 'file://host/dir/file').href, 'file:///C:/');
  // No errors
assert.strictEqual(new URL('C|\n/', 'file://host/dir/file').href, 'file:///C:/');
  // AssertionError: 'file://host/dir/C|/' === 'file:///C:/'
```

When implementing issue 1 above, one should first add an appropriate CHAR_TEST for C0 control or space, like so:

```
// https://infra.spec.whatwg.org/#c0-control-or-space
CHAR_TEST(8, IsC0ControlOrSpace, (ch >= '\0' && ch <= ' '))
```

Then we should add a new has_url argument to URL::Parse() much like the existing has_base to signify whether we should be stripping leading and trailing C0 control or space, or not.

As an optimization, if !has_url (i.e. if we are stripping leading and trailing C0 control or space) it should be possible to first find the appropriate start and end of the input without creating a new string, and then only create a new string in the ASCII tab or newline-removal step later.

~~@watilde Are you interested in taking a stab at this?~~ #12846

watilde · 2017-05-05T08:26:08Z

~~That would be interesting. I just started working on it.~~ Oops, nvm for it. I didn't refresh the page :)

TimothyGu added the whatwg-url label (Issues and PRs related to the WHATWG URL implementation) May 4, 2017

TimothyGu mentioned this issue May 5, 2017

[url] Test more whitespace stripping intricacies web-platform-tests/wpt#5792

Merged

TimothyGu mentioned this issue May 5, 2017

url: standard-conformant C0 control and whitespace handling #12846

Closed

3 tasks

domenic pushed a commit to web-platform-tests/wpt that referenced this issue May 10, 2017

URL: test more whitespace stripping intricacies

5ae94e1

Cf. nodejs/node#12825.

TimothyGu added a commit to TimothyGu/node that referenced this issue May 14, 2017

url: fix C0 control and whitespace handling

ddf2aa5

Fixes: nodejs#12825 Refs: web-platform-tests/wpt#5792

TimothyGu closed this as completed in 841bb4c May 20, 2017