Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support RFC 5987 for attribute filename* in HTTP header Content-Disposition #4647

Merged
merged 8 commits into from
Jan 14, 2021
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2010, 2019 Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2010, 2020 Oracle and/or its affiliates. All rights reserved.
*
* This program and the accompanying materials are made available under the
* terms of the Eclipse Public License v. 2.0, which is available at
Expand All @@ -20,6 +20,7 @@
import java.util.Collections;
import java.util.Date;
import java.util.Map;
import java.util.regex.Pattern;

import org.glassfish.jersey.message.internal.HttpDateFormat;
import org.glassfish.jersey.message.internal.HttpHeaderReader;
Expand All @@ -40,6 +41,8 @@ public class ContentDisposition {
private Date modificationDate;
private Date readDate;
private long size;
private static final Pattern FILENAME_EXT_VALUE_PATTERN =
Pattern.compile("(UTF-8|ISO-8859-1)'([a-z]{2,8}(-[a-z0-9]+)?)?'(%[a-f0-9]{2}|[a-z0-9!#$&+.^_`|~-])+");
Copy link
Member

@jbescos jbescos Dec 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not expert on RFC 5987 but for example this (ISO-8859-1 in lower case) will not match:
iso-8859-1'language-us'abc%a1abc%a2%b1!#$&+.^_`|~-

It would be good to pick many examples of valid strings for testing.

Copy link
Contributor

@senivam senivam Dec 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of the PR should be to allow multi-language (non ASCII) file names. And, for example, this pattern fileNameExt = "UTF-8''українська_назва.pdf"; does not work. Same applies for the pattern: fileNameExt = "UTF-8''nombre_español.pdf";. And what is the most disapointing here, it does not allow capital letters (even ASCII) in the name.
@aserkes, could you please fix this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC 5987 does not allow any symbols except in ASCII encoding. Symbols in any other encoding have to be encoded as described in https://tools.ietf.org/html/rfc3986#section-2.1

Copy link
Contributor

@senivam senivam Dec 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so the pattern for multi-language file name shall look like fileNameExt = "UTF-8''nombre_espa%c3%b1ol.pdf"
And what about language-tag? The RFC 5987 references Section 2.3 of [RFC2978] but I'm not sure it can include the word language itself. It might be that the whole [ language ] construct in the ext-value description shall be substituted by language code from RFC 2978. Or do I understand this wrong?
And yes, it is legitimate to indicate encoding character insensitive, so comment and fix about this is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Language-Tag can exist or can be omitted. Requirements to this tag are described in https://tools.ietf.org/html/rfc5646#section-2.1 . This tag can have various content that is not always standardized and can expand, but main pattern for it looks like ([a-z]{2,8}(-[a-z0-9]+)?)? (if I understand RFC 5646 correctly).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the pattern seems to be relevant,

and another thing comes to mind - it might be good to extend existing constructor (along with related builders) with the fileNameExt parameter. Just not to take fileNameExt from header and to have the whole ContentDisposition class consistent.

@jansupol what do you think?

Copy link
Contributor

@jansupol jansupol Dec 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RFC 5987 describes:

  1. The character set the recipient is about to support. Mandatory is UTF-8, ISO-88592-1 is optional as per RFC 8187, and other charsets are possible. However, Jersey does not support the charsets per se, the PR only checks the EBNF. From this point of view, I am not sure it makes sense to prohibit other charsets.
  2. File-name value characters. Jersey encodes data such as URL, query params,... for the users. This PR does not encode anything, however. Since the encoding is left on the user, I am not sure whether the code should restrict filename* values for characters that do not match the regular expression. The encoding is left on the users. Should they use a non-encoded symbol it is they who would deal with it at the end. The corresponding bug specifically asks for being able to add non-encoded symbols.

For these two reasons, should we just have the following?:

   private void createParameters() throws ParseException {
        fileName = parameters.get("filename");

        String fileNameExt = parameters.get("filename*");
        if (fileNameExt != null) {
            fileName = fileNameExt;
        }
        ....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ideal behaviour would be:

  1. Check whether the file name is encoded. If not, check the char set. Only UTF-8 needs to be supported per RFC 8187, and only UTF-8 is supported by Jersey internal econding utility UriComponent#encode. If the not encoded and not UTF-8, throw exception. If UTF-8, and non encoded, encode.
  2. If encoded, we do not care about the char set. UTF-8, ISO-8859-2, or any other is fine. Just check for valid characters, if not valid, throw exception. If valid check language.
  3. The language should be checked by the regular expression you have in PR. If not valid, throw exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made some changes :

  1. Added encoding for a filename* parameter if it is valid, not encoded, and its charset is UTF-8.
  2. Jersey will throw an exception if a filename* parameter is not valid.
    At the same time I left only 2 possible charset for a filename* parameter (ISO-8859-1 and UTF-8) because as mentioned in RFC 5987 :

Producers MUST use either the "UTF-8" ([RFC3629]) or the "ISO-8859-1"
([ISO-8859-1]) character set.


protected ContentDisposition(final String type, final String fileName, final Date creationDate,
final Date modificationDate, final Date readDate, final long size) {
Expand Down Expand Up @@ -183,6 +186,11 @@ protected void addLongParameter(final StringBuilder sb, final String name, final
private void createParameters() throws ParseException {
fileName = parameters.get("filename");

String fileNameExt = parameters.get("filename*");
if (fileNameExt != null && FILENAME_EXT_VALUE_PATTERN.matcher(fileNameExt).matches()) {
fileName = fileNameExt;
}

creationDate = createDate("creation-date");

modificationDate = createDate("modification-date");
Expand Down
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2014, 2018 Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2014, 2020 Oracle and/or its affiliates. All rights reserved.
*
* This program and the accompanying materials are made available under the
* terms of the Eclipse Public License v. 2.0, which is available at
Expand Down Expand Up @@ -97,6 +97,29 @@ public void testToString() {
assertEquals(header, contentDisposition.toString());
}

@Test
public void testFileNameExt() {
final Date date = new Date();
final String dateString = HttpDateFormat.getPreferredDateFormat().format(date);
final String fileName = "test.file";
String fileNameExt = "testExt.file";
final String prefixHeader = contentDispositionType + ";filename=\"" + fileName + "\";"
+ "creation-date=\"" + dateString + "\";modification-date=\"" + dateString + "\";read-date=\""
+ dateString + "\";size=1222" + ";name=\"testData\";" + "filename*=\"";
String header = prefixHeader + fileNameExt + "\"";
try {
ContentDisposition contentDisposition =
new ContentDisposition(HttpHeaderReader.newInstance(header), true);
assertEquals(fileName, contentDisposition.getFileName());
fileNameExt = "ISO-8859-1'language-us'abc%a1abc%a2%b1!#$&+.^_`|~-";
header = prefixHeader + fileNameExt + "\"";
contentDisposition = new ContentDisposition(HttpHeaderReader.newInstance(header), true);
assertEquals(fileNameExt, contentDisposition.getFileName());
} catch (ParseException ex) {
fail(ex.getMessage());
}
}

protected void assertContentDisposition(final ContentDisposition contentDisposition, Date date) {
assertNotNull(contentDisposition);
assertEquals(contentDispositionType, contentDisposition.getType());
Expand Down