Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

fix incorrect encoding of filenames with spaces in #2090

Merged
merged 7 commits into from
Mar 11, 2019

Conversation

ara4n
Copy link
Member

@ara4n ara4n commented Apr 1, 2017

@ara4n
Copy link
Member Author

ara4n commented Apr 1, 2017

btw, it looks like this never particularly deliberately special-cased non-utf8;
when Mjark added utf8 support in #259 he just left the original code path alone, assuming it worked.
@ara4n
Copy link
Member Author

ara4n commented Apr 1, 2017

bah, guess i need to fix the tests. will wait for PR review first

ara4n added a commit to matrix-org/sytest that referenced this pull request Apr 1, 2017
@ara4n
Copy link
Member Author

ara4n commented Apr 1, 2017

matrix-org/sytest#350 should cheer up sytest.

@erikjohnston
Copy link
Member

Given the compatibility problems I wonder if it would be a good idea to only use the utf8 foo when we need it?

Also, I wonder if we should start giving out the url with the filename at the end: e.g. https://jki.re/_matrix/media/v1/download/matrix.org/zFHVTzeUHJnsbomaeHEehfSI/matthew.jpg

@ara4n
Copy link
Member Author

ara4n commented Apr 11, 2017

i don't think any modern browsers have compatibility problems with the utf-8 encoding? or is there something specific i'm missing?

is your proposal to only use the utf-8 encoding if the filename includes spaces, then? this feels magical and fragile; we don't want it randomly breaking in weird ways on files with spaces - they surely shouldn't be special-cased at all.

Agreed that it might be nicer to switch to putting the filename on the end of the URL as the most reliable solution, but i think this one actually works?

@ara4n
Copy link
Member Author

ara4n commented Apr 19, 2017

this is getting really annoying, my downloads folder is filling up with files with beautiful names like 2017-04-12.1%20Vector%20Stuff%20Whatever.pptx :(

@richvdh
Copy link
Member

richvdh commented Sep 5, 2017

@matrixbot retest this please

@erikjohnston erikjohnston assigned ara4n and unassigned erikjohnston May 3, 2018
@ara4n
Copy link
Member Author

ara4n commented Aug 15, 2018

empirically my Downloads folder is no longer filling up with 2017-04-12.1%20Vector%20Stuff%20Whatever.pptx, but also this never got merged. any idea what changed?

@richvdh
Copy link
Member

richvdh commented Feb 27, 2019

empirically my Downloads folder is no longer filling up with 2017-04-12.1%20Vector%20Stuff%20Whatever.pptx, but also this never got merged. any idea what changed?

I suspect you've updated to a more recent version of Chrome, which (incorrectly, afaict) decodes the %-encoding in the filename. This is still a bug on Firefox, at least as of 62.0

@richvdh
Copy link
Member

richvdh commented Feb 27, 2019

for the record: RFC6266 defines both filename and filename*.

filename is defined to be a value, which is defined by RFC2616 to be a token or a quoted-string, where a token is (essentially) a single US-ASCII word, and a quoted-string is a US-ASCII string surrounded by double-quotes, using backslash as an escape charater. Note that %-encoding is not permitted.

filename* is defined to be an ext-value, which is defined in RFC5987 to be charset "'" [ language ] "'" value-chars, where value-chars is essentially a %-encoded string in the given charset.

So the fix in this PR is correct inasmuch as it means we always use the filename* variant, where %-encoding is permitted.

@richvdh
Copy link
Member

richvdh commented Feb 27, 2019

however, I am minded to think that we should instead fix the escaping used in the filename parameter, for maximum compatibility.

@codecov
Copy link

codecov bot commented Feb 27, 2019

Codecov Report

Merging #2090 into develop will decrease coverage by 16.5%.
The diff coverage is 100%.

@@             Coverage Diff              @@
##           develop    #2090       +/-   ##
============================================
- Coverage    75.08%   58.58%   -16.51%     
============================================
  Files          340      324       -16     
  Lines        34895    33769     -1126     
  Branches      5717     5576      -141     
============================================
- Hits         26201    19782     -6419     
- Misses        7080    12609     +5529     
+ Partials      1614     1378      -236

@richvdh richvdh changed the title always treat repo filenames as utf-8 fix incorrect encoding of filenames with spaces in Feb 27, 2019
@richvdh
Copy link
Member

richvdh commented Feb 27, 2019

I've substantially rewritten this, so a sanity-check from someone else on the team would be appreciated. I suggest just looking at the diff, which should speak for itself, rather than trying to pick your way through the history.

Tests at matrix-org/sytest#350.

@richvdh richvdh requested a review from a team February 27, 2019 23:15
# [2]: https://tools.ietf.org/html/rfc2616#section-3.6
# [3]: https://tools.ietf.org/html/rfc5987#section-3.2.1

# We avoid the quoted-string version of filename, because (a) synapse didn't
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd suggest saying `filename` here, to make it clear you're talking about `filename` rather than `filename*`

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

point, done

# correctly interpret those as of 0.99.2 and (b) they are a bit of a pain and we
# may as well just do the filename* version.
if _can_encode_filename_as_token(upload_name):
disposition = 'inline; filename=%s' % (upload_name, )
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not do filename* everywhere?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the main thinking is that some things won't have good support for filename, and we should cater to simple clients where possible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean good support for filename*?

okay, fair enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, yes that's what I mean.

richvdh pushed a commit to matrix-org/sytest that referenced this pull request Mar 11, 2019
@richvdh richvdh merged commit 2326e00 into develop Mar 11, 2019
@richvdh richvdh deleted the matthew/fix-filename-escaping branch March 11, 2019 09:53
anoadragon453 added a commit to matrix-org/sytest that referenced this pull request Mar 13, 2019
…server_specific_tests

* 'develop' of github.com:matrix-org/sytest: (44 commits)
  test endpoint for updating backup versions (#559)
  Add test for transferring bans on a room upgrade (#563)
  fix tests for matrix-org/synapse#2090 (#350)
  Fix typo
  Fixup
  Incorporate review
  Remove prev_state from federation API
  Make the userdir synced not rely on being able to search for yourself (#567)
  Add basic soft fail test
  Check that event_id is given over state fetching over federation (#566)
  Fix registration rate limiting settings
  Fix comments
  Make sytest support auth rate limiting
  Fixup to not send 100 messages
  Test history visibility works for backfill
  Check that Server ACLs are preserved on room upgrade
  Better diagnostics from synapse startup (#561)
  Don't set PYTHONPATH when running synapse (#560)
  Regression test for redactions in room v3 (#558)
  Add test for PDU limit on transactions API (#555)
  ...
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants