-
Notifications
You must be signed in to change notification settings - Fork 355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support RFC 5987 for attribute filename* in HTTP header Content-Disposition #4647
Conversation
…isposition Signed-off-by: Andrii Serkes <[email protected]>
Signed-off-by: Andrii Serkes <[email protected]>
@@ -40,6 +41,8 @@ | |||
private Date modificationDate; | |||
private Date readDate; | |||
private long size; | |||
private static final Pattern FILENAME_EXT_VALUE_PATTERN = | |||
Pattern.compile("(UTF-8|ISO-8859-1)'([a-z]{2,8}(-[a-z0-9]+)?)?'(%[a-f0-9]{2}|[a-z0-9!#$&+.^_`|~-])+"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not expert on RFC 5987 but for example this (ISO-8859-1 in lower case) will not match:
iso-8859-1'language-us'abc%a1abc%a2%b1!#$&+.^_`|~-
It would be good to pick many examples of valid strings for testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of the PR should be to allow multi-language (non ASCII) file names. And, for example, this pattern fileNameExt = "UTF-8''українська_назва.pdf";
does not work. Same applies for the pattern: fileNameExt = "UTF-8''nombre_español.pdf";
. And what is the most disapointing here, it does not allow capital letters (even ASCII) in the name.
@aserkes, could you please fix this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RFC 5987 does not allow any symbols except in ASCII encoding. Symbols in any other encoding have to be encoded as described in https://tools.ietf.org/html/rfc3986#section-2.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so the pattern for multi-language file name shall look like fileNameExt = "UTF-8''nombre_espa%c3%b1ol.pdf"
And what about language-tag? The RFC 5987 references Section 2.3 of [RFC2978] but I'm not sure it can include the word language
itself. It might be that the whole [ language ]
construct in the ext-value
description shall be substituted by language code from RFC 2978. Or do I understand this wrong?
And yes, it is legitimate to indicate encoding character insensitive, so comment and fix about this is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Language-Tag can exist or can be omitted. Requirements to this tag are described in https://tools.ietf.org/html/rfc5646#section-2.1 . This tag can have various content that is not always standardized and can expand, but main pattern for it looks like ([a-z]{2,8}(-[a-z0-9]+)?)?
(if I understand RFC 5646 correctly).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the pattern seems to be relevant,
and another thing comes to mind - it might be good to extend existing constructor (along with related builders) with the fileNameExt parameter. Just not to take fileNameExt from header and to have the whole ContentDisposition class consistent.
@jansupol what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RFC 5987 describes:
- The character set the recipient is about to support. Mandatory is UTF-8, ISO-88592-1 is optional as per RFC 8187, and other charsets are possible. However, Jersey does not support the charsets per se, the PR only checks the EBNF. From this point of view, I am not sure it makes sense to prohibit other charsets.
- File-name value characters. Jersey encodes data such as URL, query params,... for the users. This PR does not encode anything, however. Since the encoding is left on the user, I am not sure whether the code should restrict
filename*
values for characters that do not match the regular expression. The encoding is left on the users. Should they use a non-encoded symbol it is they who would deal with it at the end. The corresponding bug specifically asks for being able to add non-encoded symbols.
For these two reasons, should we just have the following?:
private void createParameters() throws ParseException {
fileName = parameters.get("filename");
String fileNameExt = parameters.get("filename*");
if (fileNameExt != null) {
fileName = fileNameExt;
}
....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ideal behaviour would be:
- Check whether the file name is encoded. If not, check the char set. Only UTF-8 needs to be supported per RFC 8187, and only UTF-8 is supported by Jersey internal econding utility
UriComponent#encode
. If the not encoded and not UTF-8, throw exception. If UTF-8, and non encoded, encode. - If encoded, we do not care about the char set. UTF-8, ISO-8859-2, or any other is fine. Just check for valid characters, if not valid, throw exception. If valid check language.
- The language should be checked by the regular expression you have in PR. If not valid, throw exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made some changes :
- Added encoding for a filename* parameter if it is valid, not encoded, and its charset is UTF-8.
- Jersey will throw an exception if a filename* parameter is not valid.
At the same time I left only 2 possible charset for a filename* parameter (ISO-8859-1 and UTF-8) because as mentioned in RFC 5987 :
Producers MUST use either the "UTF-8" ([RFC3629]) or the "ISO-8859-1"
([ISO-8859-1]) character set.
Signed-off-by: Andrii Serkes <[email protected]>
tests/e2e/src/test/java/org/glassfish/jersey/tests/api/ContentDispositionTest.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Andrii Serkes <[email protected]>
…if a filename parameter is not valid Signed-off-by: aserkes <[email protected]>
media/multipart/src/main/java/org/glassfish/jersey/media/multipart/ContentDisposition.java
Show resolved
Hide resolved
Signed-off-by: aserkes <[email protected]>
if (matcher.group(CHARSET_GROUP_NAME).equalsIgnoreCase("UTF-8")) { | ||
return new StringBuilder(matcher.group(CHARSET_GROUP_NAME)) | ||
.append("'") | ||
.append(matcher.group(LANG_GROUP_NAME) == null ? "" : matcher.group(LANG_GROUP_NAME)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regexps are possibly not the fastest piece of code. Can you store the result of matcher.group(LANG_GROUP_NAME)
and reuse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Signed-off-by: aserkes <[email protected]>
I would advise to bump copyright year to 2021 since those files where modified in the current year already. |
Signed-off-by: aserkes <[email protected]>
Add support RFC 5987 for attribute filename* in HTTP header Content-Disposition