Add support RFC 5987 for attribute filename* in HTTP header Content-Disposition #4647
Merged
Commits
4522d94 Add support RFC 5987 for attribute filename* in HTTP header Content-D… (aserkes)
9c1437b Update year in Copyright comments (aserkes)
3089053 Fix problem with capital letters, add more filename patterns for tests (aserkes)
109e685 code refactoring (aserkes)
b8fac12 Encode a filename parameter if it is not encoded, throw an exception … (aserkes)
7d4d5d5 All charsets for the filename* parameter are permitted (aserkes)
b523c01 Refactoring (aserkes)
10ca154 Update year in Copyright comments (aserkes)
I am not an expert on RFC 5987, but for example this (ISO-8859-1 in lower case) will not match:
iso-8859-1'language-us'abc%a1abc%a2%b1!#$&+.^_`|~-
It would be good to pick many examples of valid strings for testing.
The purpose of the PR should be to allow multi-language (non-ASCII) file names. And, for example, this pattern
fileNameExt = "UTF-8''українська_назва.pdf";
does not work. The same applies to the pattern
fileNameExt = "UTF-8''nombre_español.pdf";
And what is most disappointing here, it does not allow capital letters (even ASCII) in the name.
@aserkes, could you please fix this?
RFC 5987 does not allow any characters other than ASCII in the value. Characters in any other encoding have to be percent-encoded as described in https://tools.ietf.org/html/rfc3986#section-2.1
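To make the percent-encoding rule concrete, here is a minimal sketch in plain JDK Java. The class and method names (Rfc5987Encode, encodeExtValue) are hypothetical and are not taken from this PR. It assumes the charset is UTF-8 and the language tag is left empty: bytes that are attr-char in RFC 5987 are kept as they are, and every other byte is written as %HH.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the PR.
final class Rfc5987Encode {
    // attr-char from RFC 5987: ALPHA / DIGIT / ! # $ & + - . ^ _ ` | ~
    private static final String ATTR_CHARS =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!#$&+-.^_`|~";

    // Builds an ext-value of the form UTF-8''<percent-encoded file name>.
    static String encodeExtValue(String fileName) {
        StringBuilder sb = new StringBuilder("UTF-8''");
        for (byte b : fileName.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (ATTR_CHARS.indexOf(v) >= 0) {
                sb.append((char) v);                      // allowed as-is
            } else {
                sb.append(String.format("%%%02X", v));    // percent-encode the byte
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "nombre_español.pdf", with the non-ASCII letter written as a Unicode escape
        System.out.println(encodeExtValue("nombre_espa\u00f1ol.pdf"));
    }
}
```

Hex digits are case-insensitive in percent-encoding, so %C3%B1 and %c3%b1 denote the same byte sequence.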
OK, so the pattern for a multi-language file name shall look like
fileNameExt = "UTF-8''nombre_espa%c3%b1ol.pdf"
And what about the language tag? RFC 5987 references Section 2.3 of [RFC2978], but I'm not sure it can include the word "language" itself. It might be that the whole [ language ] construct in the ext-value description shall be substituted by a language code from RFC 2978. Or do I understand this wrong?
And yes, it is legitimate to treat the charset name as case-insensitive, so the comment and the fix about this are fine.
The Language-Tag can be present or omitted. The requirements for this tag are described in https://tools.ietf.org/html/rfc5646#section-2.1 . The tag can have various content that is not always standardized and can be extended, but the main pattern for it looks like
([a-z]{2,8}(-[a-z0-9]+)?)?
(if I understand RFC 5646 correctly).
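As a sketch of how the pieces discussed so far could fit into one check, the following plain JDK Java example parses an ext-value into charset, language, and value. The names ExtValueParser and parse are hypothetical, and the regular expression is an illustration, not the one used in the PR. It assumes a case-insensitive match, the simplified language pattern quoted above, only UTF-8 and ISO-8859-1 as charsets, and a value made of attr-char or %HH sequences.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for illustration only; not the PR's implementation.
final class ExtValueParser {
    // charset ' [language] ' value, where value is attr-char or pct-encoded (%HH).
    // (?i) makes the charset name, language tag, letters and hex digits case-insensitive.
    private static final Pattern EXT_VALUE = Pattern.compile(
            "(?i)(UTF-8|ISO-8859-1)'([a-z]{2,8}(?:-[a-z0-9]+)?)?'"
            + "((?:%[0-9a-f]{2}|[a-z0-9!#$&+.^_`|~-])+)");

    // Returns {charset, language (possibly empty), value}, or null if the input is invalid.
    static String[] parse(String extValue) {
        Matcher m = EXT_VALUE.matcher(extValue);
        if (!m.matches()) {
            return null;
        }
        String language = m.group(2) == null ? "" : m.group(2);
        return new String[] {m.group(1), language, m.group(3)};
    }
}
```

With these assumptions, the lower-case example from the first review comment and names with capital letters are accepted, while a value containing a raw non-ASCII character is rejected.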
The pattern seems to be relevant.
Another thing comes to mind: it might be good to extend the existing constructor (along with the related builders) with the fileNameExt parameter, so that fileNameExt is not taken from the header and the whole ContentDisposition class stays consistent.
@jansupol, what do you think?
RFC 5987 describes filename* values for characters that do not match the regular expression. The encoding is left to the users. Should they use a non-encoded symbol, it is they who would have to deal with it in the end. The corresponding bug specifically asks for the ability to add non-encoded symbols.
For these two reasons, should we just have the following?
The ideal behaviour would be (with UriComponent#encode): if the value is not encoded and the charset is not UTF-8, throw an exception. If the charset is UTF-8 and the value is not encoded, encode it.
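The behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the code merged in the PR: FileNameExtNormalizer and normalize are hypothetical names, the sketch does its own percent-encoding with the JDK instead of calling UriComponent#encode, and "already encoded" is taken to mean that the value consists only of attr-char characters and %HH sequences.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical sketch of the "encode or throw" rule; not the PR's implementation.
final class FileNameExtNormalizer {
    private static final String ATTR_CHARS =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!#$&+-.^_`|~";
    // Assumption: a value is "already encoded" if it is only attr-char or %HH.
    private static final Pattern ALREADY_ENCODED =
            Pattern.compile("(?:%[0-9A-Fa-f]{2}|[A-Za-z0-9!#$&+.^_`|~-])+");

    static String normalize(String charset, String value) {
        if (ALREADY_ENCODED.matcher(value).matches()) {
            return charset + "''" + value;                // nothing to do
        }
        if (!"UTF-8".equalsIgnoreCase(charset)) {
            // Not encoded and not UTF-8: refuse rather than guess an encoding.
            throw new IllegalArgumentException(
                    "Unencoded filename* value with charset " + charset);
        }
        // Not encoded and UTF-8: percent-encode every byte that is not attr-char.
        StringBuilder sb = new StringBuilder(charset).append("''");
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (ATTR_CHARS.indexOf(v) >= 0) {
                sb.append((char) v);
            } else {
                sb.append(String.format("%%%02X", v));
            }
        }
        return sb.toString();
    }
}
```

One design consequence of this assumption: a literal percent sign that is not followed by two hex digits is treated as unencoded input and becomes %25 for UTF-8 values.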
Made some changes:
At the same time, I left only two possible charsets for the filename* parameter (ISO-8859-1 and UTF-8) because, as mentioned in RFC 5987: