MIME types: test 0x0B and 0x0C #13615

annevk · 2018-10-19T11:39:36Z

No description provided.

wpt-pr-bot · 2018-10-19T11:39:43Z

There are no reviewers for this pull request besides its author. Please reach out on W3C's irc server (irc.w3.org, port 6665) on channel #testing (web client) to get help with this. Thank you!

annevk · 2018-10-19T11:41:43Z

@MattMenke2 I'm not entirely sure what would be best. Perhaps you know the history behind how 0x0B and 0x0C are sometimes special in HTTP? (The HTTP specification calls out both as sometimes being treated as equivalent to other whitespace by certain clients, but only as legacy behavior.)

The MIME type parser treats 0x0C in a special manner because it's part of the definition of ASCII whitespace that most of the platform uses. Should the MIME type parser instead use the HTTP definition of whitespace (one that excludes 0x0C)? Doing that might be confusing to some non-HTTP consumers of that parser though.
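For reference, a minimal sketch of the two whitespace definitions in question, assuming the code point sets from the Infra Standard (ASCII whitespace) and the Fetch Standard (HTTP whitespace). The helper name `strip_leading` is hypothetical and only there for illustration.

```python
# ASCII whitespace (Infra): TAB, LF, FF (0x0C), CR, SPACE.
ASCII_WHITESPACE = "\t\n\x0c\r "
# HTTP whitespace (Fetch): TAB, LF, CR, SPACE. Note: no 0x0C.
HTTP_WHITESPACE = "\t\n\r "


def strip_leading(value: str, whitespace: str) -> str:
    """Hypothetical helper: drop leading code points that are in `whitespace`."""
    return value.lstrip(whitespace)


# Compare how each definition treats a leading 0x0C, 0x0B, and SPACE.
for raw in ("\x0ccharset", "\x0bcharset", " charset"):
    print(
        repr(raw),
        repr(strip_leading(raw, ASCII_WHITESPACE)),
        repr(strip_leading(raw, HTTP_WHITESPACE)),
    )
```

Only the 0x0C case differs between the two definitions; 0x0B is in neither set.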

annevk · 2018-10-19T11:45:39Z

(To be clear, this question also affects X-Content-Type-Options, which also currently uses the ASCII whitespace definition that's inclusive of 0x0C (but not 0x0B) when splitting. Implementations are somewhat divided as far as I can tell.)

MattMenke2

Unfortunately, I don't know the history here. I wouldn't be surprised if this were inherited from how MIME types of email attachments are parsed, but that's purely speculation. For purely HTTP use, I'd rather treat 0x0C and 0x0D like any other character, if we can get away with it.

I'd be surprised if either character were common in non-HTTP consumers, just because it seems weird to randomly add whitespace characters that the average text editor/viewer has no idea what to do with, but that's again just speculation. It would probably be best to consult someone familiar with those other contexts.

annevk · 2018-10-30T13:00:21Z

For this header, only Safari treats 0x0C as whitespace. (I'm assuming 0x0D was a typo above and meant to be 0x0B, which nobody treats specially here.)

Tests: web-platform-tests/wpt#13615. Fixes #89.

domenic

Implementing things in whatwg-mimetype gives the following test mismatches:

  ● mime-types.json › text/html;charset=
                                        gbk (text/html;charset=\vgbk)

    Expected value to equal:
      "UTF-8"
    Received:
      null

  ● mime-types.json › text/html;charset=
                                        gbk (text/html;charset=\fgbk)

    Expected value to equal:
      "UTF-8"
    Received:
      null

  ● mime-types.json › text/html;
                                charset=gbk (text/html;\vcharset=gbk)

    Expected value to equal:
      "UTF-8"
    Received:
      null

  ● mime-types.json › text/html;
                                charset=gbk (text/html;\fcharset=gbk)

    Expected value to equal:
      "text/html;charset=gbk"
    Received:
      "text/html"

I think you need to remember that \u000B (\v) and \u000C (\f) are not HTTP token code points and so terminate the parsing algorithm in other ways.

MattMenke2 · 2018-11-01T20:15:20Z

Not sure how to reply with quotes. @domenic, are you sure about that? It looks to me like in the MIME sniff spec, values with non-token characters are ignored on a per-subfield basis:

If all of the following are true

- parameterName is not the empty string
- parameterName solely contains HTTP token code points
- parameterValue solely contains HTTP quoted-string token code points
- mimeType’s parameters[parameterName] does not exist

then set mimeType’s parameters[parameterName] to parameterValue.

The earlier rules for type/subtype fail if they contain non-token characters, but the name/value fields are just ignored if they have non-token characters.
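The quoted condition can be sketched roughly as follows. This is an illustration, not the spec text: the function names are made up, and the code point sets are my reading of the "HTTP token code point" and "HTTP quoted-string token code point" definitions. The earlier step that skips empty unquoted values is not modeled here.

```python
import string

# HTTP token code points: ASCII alphanumerics plus a fixed punctuation set.
HTTP_TOKEN = set(string.ascii_letters + string.digits + "!#$%&'*+-.^_`|~")


def is_http_token(s: str) -> bool:
    """True if every code point in `s` is an HTTP token code point."""
    return all(c in HTTP_TOKEN for c in s)


def is_quoted_string_token(s: str) -> bool:
    """True if every code point is TAB, U+0020..U+007E, or U+0080..U+00FF."""
    return all(c == "\t" or " " <= c <= "~" or "\x80" <= c <= "\xff" for c in s)


def maybe_set_parameter(parameters: dict, name: str, value: str) -> None:
    """Store the parameter only if all four quoted conditions hold."""
    if (
        name != ""
        and is_http_token(name)
        and is_quoted_string_token(value)
        and name not in parameters
    ):
        parameters[name] = value


params: dict = {}
maybe_set_parameter(params, "charset", "\x0bgbk")   # value has 0x0B: ignored
maybe_set_parameter(params, "\x0ccharset", "gbk")   # name has 0x0C: ignored
maybe_set_parameter(params, "charset", "gbk")       # accepted
maybe_set_parameter(params, "charset", "utf-8")     # duplicate name: ignored
print(params)
```

A rejected parameter is simply dropped; parsing of the rest of the input carries on.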

domenic · 2018-11-01T20:22:01Z

Right, I guess "terminate" was imprecise. "Ignore the current construct" is what I meant.

So for example, given text/html;charset=\vgbk, this will be the same as text/html, which should have "encoding": null, but the JSON file gives "encoding": "UTF-8".

Similarly, for text/html;\fcharset=gbk, that whole parameter should get thrown away, and so the serialization should be text/html, but the JSON file gives "output": "text/html;charset=gbk" (and UTF-8 again).
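To make those two cases concrete, here is a simplified sketch of just the parameter loop, with the whitespace set passed in so both definitions can be compared. `parse_parameters` is a hypothetical name, quoted-string values are deliberately not handled, and `rest` is assumed to be the part of the input after the subtype.

```python
import string

HTTP_TOKEN = set(string.ascii_letters + string.digits + "!#$%&'*+-.^_`|~")
HTTP_WHITESPACE = "\t\n\r "
ASCII_WHITESPACE = HTTP_WHITESPACE + "\x0c"  # additionally includes 0x0C


def _value_ok(s: str) -> bool:
    # HTTP quoted-string token code points: TAB, U+0020..U+007E, U+0080..U+00FF.
    return all(c == "\t" or " " <= c <= "~" or "\x80" <= c <= "\xff" for c in s)


def parse_parameters(rest: str, whitespace: str = HTTP_WHITESPACE) -> dict:
    """Simplified parameter parsing; `rest` looks like ';charset=gbk'."""
    parameters: dict = {}
    for chunk in rest.split(";")[1:]:
        chunk = chunk.lstrip(whitespace)        # skip whitespace before the name
        name, sep, value = chunk.partition("=")
        if not sep:
            continue                            # no '=': nothing to store
        name = name.lower()
        value = value.rstrip(whitespace)
        if (
            name
            and value
            and all(c in HTTP_TOKEN for c in name)
            and _value_ok(value)
            and name not in parameters
        ):
            parameters[name] = value
    return parameters


print(parse_parameters(";charset=\x0bgbk"))                    # 0x0B in the value
print(parse_parameters(";\x0ccharset=gbk"))                    # 0x0C before the name
print(parse_parameters(";\x0ccharset=gbk", ASCII_WHITESPACE))  # 0x0C skipped as whitespace
```

With HTTP whitespace both inputs end up with no parameters, so the serialization is just text/html. With ASCII whitespace, the leading 0x0C is skipped and charset=gbk survives.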

MattMenke2 · 2018-11-01T20:24:36Z

Sorry, you're right. Was thinking we should be defaulting to UTF-8, but null is the default value.

annevk · 2018-11-05T14:47:39Z

I added some additional tests here for "HTTP newlines" and 0x0B and 0x0C in failure conditions (when "part of" type or subtype).
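As a rough illustration of those failure conditions (the inputs below are examples I made up, not necessarily the ones in the test file), here is a sketch of the type/subtype part of parsing, assuming HTTP whitespace is stripped from the ends of the input and from the end of the subtype. `parse_essence` is a hypothetical name and parameters are ignored.

```python
import string
from typing import Optional

HTTP_TOKEN = set(string.ascii_letters + string.digits + "!#$%&'*+-.^_`|~")
HTTP_WHITESPACE = "\t\n\r "


def parse_essence(raw: str) -> Optional[str]:
    """Return 'type/subtype' lowercased, or None if parsing fails."""
    raw = raw.strip(HTTP_WHITESPACE)
    type_, slash, rest = raw.partition("/")
    if not slash:
        return None  # no '/' at all
    # The subtype runs up to the first ';', minus trailing HTTP whitespace.
    subtype = rest.split(";", 1)[0].rstrip(HTTP_WHITESPACE)
    for part in (type_, subtype):
        if part == "" or not all(c in HTTP_TOKEN for c in part):
            return None  # empty, or contains a non-token code point
    return f"{type_}/{subtype}".lower()


for raw in ("\ntext/html\n", "\x0btext/html", "text/html\x0c", "text/\nhtml"):
    print(repr(raw), parse_essence(raw))
```

Newlines at the very ends are stripped as HTTP whitespace, while 0x0B and 0x0C anywhere in the type or subtype, or a newline in the middle, make the whole parse fail.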

domenic

All tests pass

MIME types: test 0x0B and 0x0C

fd1d2c5

wpt-pr-bot added mimesniff status:needs-reviewers labels Oct 19, 2018

MattMenke2 reviewed Oct 19, 2018

annevk mentioned this pull request Oct 30, 2018

MIME type parsing: stop treating 0x0C as whitespace whatwg/mimesniff#89

Closed

don't treat 0x0C as whitespace

a9ec75f

annevk added a commit to whatwg/mimesniff that referenced this pull request Oct 30, 2018

Use HTTP rather than ASCII whitespace

7a80894

Tests: web-platform-tests/wpt#13615. Fixes #89.

annevk mentioned this pull request Oct 30, 2018

Use HTTP rather than ASCII whitespace whatwg/mimesniff#90

Merged

MattMenke2 approved these changes Oct 30, 2018

domenic requested changes Nov 1, 2018

annevk added 2 commits November 2, 2018 09:46

oops, thanks Domenic! (still need to figure out the spec here)

1782ec6

also test newlines and 0x0B and 0x0C in error conditions

a6f479d

jugglinmike mentioned this pull request Nov 7, 2018

Verify correctness of Taskcluster PR checks #13194

Closed

domenic approved these changes Nov 9, 2018

annevk merged commit 6aeb68e into master Nov 12, 2018

annevk deleted the annevk/mime-types-0x0b-0x0c branch November 12, 2018 09:27

annevk added a commit to whatwg/mimesniff that referenced this pull request Nov 13, 2018

Use HTTP whitespace rather than ASCII whitespace

126286a

Tests: web-platform-tests/wpt#13615. Fixes #89.
