Newlines, quotes and backslashes in field names or filenames not decoded correctly. #60

Open
defnull opened this issue Oct 12, 2024 · 0 comments
defnull commented Oct 12, 2024

The HTML5 specification states that "field names and filenames for file fields [...] must be escaped by replacing any 0x0A (LF) bytes with the byte sequence %0A, 0x0D (CR) with %0D and 0x22 (") with %22. The user agent must not perform any other escapes." Tests show that modern browsers actually do this. The traditional header quoting (which involves backslash-escaping quotes and backslashes) is NOT applied. The string is (always?) quoted, but because quotes are already percent-encoded and never need backslash-escaping, backslashes are left alone and not doubled.
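
To make the escaping rule concrete, here is a minimal sketch of what a browser-like client does according to the quoted spec text. The function names `html5_escape_name` and `content_disposition` are hypothetical and exist only for illustration; they are not part of multipart or of any browser API.

```python
def html5_escape_name(value: str) -> str:
    """Escape a field name or filename as the quoted spec text describes."""
    # Only LF, CR and the double quote are replaced.
    # Backslashes and literal '%' characters are left untouched.
    return (
        value.replace("\n", "%0A")
        .replace("\r", "%0D")
        .replace('"', "%22")
    )


def content_disposition(name: str, filename: str) -> str:
    # Hypothetical helper: builds the header value a browser-like client
    # might send for a file field.
    return 'form-data; name="{}"; filename="{}"'.format(
        html5_escape_name(name), html5_escape_name(filename)
    )


# Filename with a quote, two backslashes in a row and a newline.
header = content_disposition("field", 'we"ird\\\\name\n.txt')
# header is: form-data; name="field"; filename="we%22ird\\name%0A.txt"
# (the two backslashes are sent as-is, not doubled to four)
```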

This means that multipart currently decodes field names or filenames incorrectly if they contain ", \r, \n or two backslashes in a row. Browsers percent-encode the first three characters, but multipart does not percent-decode them. Browsers do not escape (double) backslashes, but multipart still un-escapes two backslashes in a row into one.
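
The mismatch can be sketched as follows. Both functions are simplified assumptions written for this issue: `traditional_unquote` approximates backslash-based header unquoting and is not multipart's actual code, and `html5_unquote` is one possible fix that only handles the upper-case sequences browsers were observed to send.

```python
def _strip_quotes(value: str) -> str:
    # Remove one pair of surrounding double quotes, if present.
    if len(value) >= 2 and value[0] == value[-1] == '"':
        return value[1:-1]
    return value


def traditional_unquote(value: str) -> str:
    # Backslash-style unquoting: \\ becomes \ and \" becomes ".
    value = _strip_quotes(value)
    return value.replace("\\\\", "\\").replace('\\"', '"')


def html5_unquote(value: str) -> str:
    # Reverse only the three escapes named in the HTML5 spec text.
    value = _strip_quotes(value)
    return value.replace("%0A", "\n").replace("%0D", "\r").replace("%22", '"')


# What a browser sends for the name: a"b\\c<LF>  (two backslashes)
sent = '"a%22b\\\\c%0A"'

wrong = traditional_unquote(sent)  # quote and LF stay encoded, backslashes collapse
right = html5_unquote(sent)        # original name is recovered
```

With the traditional approach the result keeps %22 and %0A and loses one backslash; the HTML5-style approach returns the original name unchanged.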

For the record: other control characters (e.g. null bytes) are sent as \xef\xbf\xbd, which is the UTF-8 encoding of the Unicode "replacement character" (U+FFFD). Good to know.
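
A quick check of that byte sequence, using only the Python standard library (the variable names are illustrative):

```python
import unicodedata

# Bytes observed in place of control characters such as null bytes.
observed = b"\xef\xbf\xbd"

decoded = observed.decode("utf-8")
char_name = unicodedata.name(decoded)

# The three bytes decode to a single code point, U+FFFD.
is_replacement_char = decoded == "\ufffd"
```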

Interestingly, this is not the full percent-encoding mentioned (but not required or suggested) by RFC 7578. Only three specific characters are percent-encoded; all other characters, including non-ASCII ones, are fair game and sent as UTF-8. There may be other clients that do full percent-encoding, but since we cannot know that for sure, and such behavior would be atypical for browsers, it's probably best not to touch those. Implementing the correct behavior in multipart would not destroy any information in those fully percent-encoded strings, though. Applications that know they have to deal with outdated clients would still be able to decode those strings.
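
The "no information is destroyed" argument can be illustrated with a small sketch. It assumes a hypothetical client that fully percent-encodes names with `urllib.parse.quote`, and a hypothetical `html5_unquote` that reverses only the three browser escapes. Because a literal % is itself sent as %25 by such a client, the partial decode never touches anything except real encoded quotes and newlines, and the application can still finish the job with `urllib.parse.unquote`.

```python
from urllib.parse import quote, unquote


def html5_unquote(value: str) -> str:
    # Hypothetical decoder: reverses only %0A, %0D and %22.
    return value.replace("%0A", "\n").replace("%0D", "\r").replace("%22", '"')


def roundtrip(original: str) -> str:
    # 1. An (assumed) outdated client percent-encodes everything.
    fully_encoded = quote(original, safe="")
    # 2. The parser decodes only the three HTML5 escapes.
    partially_decoded = html5_unquote(fully_encoded)
    # 3. The application, knowing its clients, decodes the rest.
    return unquote(partially_decoded)


names = ['na"me 100%\n.txt', "literal %22 stays", "caf\u00e9\\\\.txt"]
results = [roundtrip(n) for n in names]
```

Each entry in `results` equals the corresponding original name, including the one that contains a literal %22.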

The impact of this bug is probably very low. Field names or filenames containing newlines or backslashes should be extremely rare and quotes are also very uncommon. But we aim to fully support whatever browsers may send, and that's part of the deal.

defnull self-assigned this Oct 12, 2024
defnull added the Bug label Oct 12, 2024
defnull changed the title from "Special characters in field names or filenames not decoded correctly." to "Newlines, quotes and backslashes in field names or filenames not decoded correctly." Oct 12, 2024
defnull added a commit that referenced this issue Oct 17, 2024
The HTML5 specification defines that "field names and filenames for file fields [...] must be escaped by replacing any 0x0A (LF) bytes with the byte sequence %0A, 0x0D (CR) with %0D and 0x22 (") with %22. The user agent must not perform any other escapes." and tests show that modern browsers actually do that. This is different from traditional header quoting (which involves backslash-escaping quotes and backslashes).

fixes #60
defnull added seven further commits that referenced this issue Oct 18, 2024, each with the same commit message and "fixes #60" trailer.