Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MIME types: test 0x0B and 0x0C #13615

Merged
merged 4 commits into from
Nov 12, 2018
Merged

MIME types: test 0x0B and 0x0C #13615

merged 4 commits into from
Nov 12, 2018

Conversation

annevk
Copy link
Member

@annevk annevk commented Oct 19, 2018

No description provided.

@wpt-pr-bot
Copy link
Collaborator

There are no reviewers for this pull request besides its author. Please reach out on W3C's irc server (irc.w3.org, port 6665) on channel #testing (web client) to get help with this. Thank you!

@annevk
Copy link
Member Author

annevk commented Oct 19, 2018

@MattMenke2 I'm not entirely sure what would be best. Perhaps you know the history behind how 0x0B and 0x0C are sometimes special in HTTP (both are called out by the HTTP specification as sometimes being equivalent in certain clients, but this being legacy).

The MIME type parser treats 0x0C in a special manner because it's part of the definition of ASCII whitespace that most of the platform uses. Should the MIME type parser instead use the HTTP definition of whitespace (one that excludes 0x0C)? Doing that might be confusing to some non-HTTP consumers of that parser though.

@annevk
Copy link
Member Author

annevk commented Oct 19, 2018

(To be clear, this question also affects X-Content-Type-Options, which also currently uses the ASCII whitespace definition that's inclusive of 0x0C (but not 0x0B) when splitting. Implementations are somewhat divided as far as I can tell.)

Copy link
Member

@MattMenke2 MattMenke2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I don't know the history here. I wouldn't be surprised if this were inherited from how parsing of MIME types of email attachments are parsed, but that's purely speculation. For purely HTTP use, I'd rather treat 0x0C and 0x0D like any other character, if we can get away with it.

I'd be surprised if either character were common in non-HTTP consumers, just because it seems weird to randomly add whitespace characters that the average text editor/viewer has no idea what to do with, but that's again just speculation. Probably be best to consult someone familiar with those other contexts.

@annevk
Copy link
Member Author

annevk commented Oct 30, 2018

For this header, only Safari treats 0x0C as whitespace. (I'm assuming 0x0D was a typo above and meant to be 0x0B, which nobody treats specially here.)

annevk added a commit to whatwg/mimesniff that referenced this pull request Oct 30, 2018
Copy link
Member

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementing things in whatwg-mimetype gives the following test mismatches:

  ● mime-types.json › text/html;charset=
                                        gbk (text/html;charset=\vgbk)

    Expected value to equal:
      "UTF-8"
    Received:
      null

  ● mime-types.json › text/html;charset=
                                        gbk (text/html;charset=\fgbk)

    Expected value to equal:
      "UTF-8"
    Received:
      null

  ● mime-types.json › text/html;
                                charset=gbk (text/html;\vcharset=gbk)

    Expected value to equal:
      "UTF-8"
    Received:
      null

  ● mime-types.json › text/html;
                                charset=gbk (text/html;\fcharset=gbk)

    Expected value to equal:
      "text/html;charset=gbk"
    Received:
      "text/html"

I think you need to remember that \u000B (\v) and \u000C (\f) are not HTTP token code points and so terminate the parsing algorithm in other ways.

@MattMenke2
Copy link
Member

MattMenke2 commented Nov 1, 2018

Not sure how to reply with quotes, @domenic are you sure about that? It looks to me like in the MIME sniff spec, values with non-token characters are ignored on a per-subfield basis:

If all of the following are true

parameterName is not the empty string
parameterName solely contains HTTP token code points
parameterValue solely contains HTTP quoted-string token code points
mimeType’s parameters[parameterName] does not exist

then set mimeType’s parameters[parameterName] to parameterValue.

The earlier rules for type/subtype fail if they contain non-token characters, but the name/value fields are just ignored if they have non-token characters.

@domenic
Copy link
Member

domenic commented Nov 1, 2018

Right, I guess "terminate" was imprecise. "Ignored the current construct" is what I meant.

So for example, given text/html;charset=\vgbk, this will be the same as text/html, which should have "encoding": null, but the JSON file gives "encoding": "UTF-8".

Similarly, for text/html;\fcharset=gbk, that whole parameter should get thrown away, and so the serialization should be text/html, but the JSON file gives "output": "text/html;charset=gbk" (and UTF-8 again).

@MattMenke2
Copy link
Member

Sorry, you're right. Was thinking we should be defaulting to UTF-8, but null is the default value.

@annevk
Copy link
Member Author

annevk commented Nov 5, 2018

I added some additional tests here for "HTTP newlines" and 0x0B and 0x0C in failure conditions (when "part of" type or subtype).

Copy link
Member

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All tests pass

@annevk annevk merged commit 6aeb68e into master Nov 12, 2018
@annevk annevk deleted the annevk/mime-types-0x0b-0x0c branch November 12, 2018 09:27
annevk added a commit to whatwg/mimesniff that referenced this pull request Nov 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants