-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Support non-Unicode paths #108
Conversation
There are several spots where paths are assumed to be Unicode. However, some (all?) operating systems support non-Unicode paths which causes `trash-rs` to panic if encountered. I switched some of those code to use `OsString`s instead of `String`s. Unfortunately, I had to add a new dependency, `urlencoding`, in order to properly handle decoding non-UTF8 byte slices. As of this commit, the test suite passes and code should be ready, but I will try to remove the `url` crate and use `urlencoding` in its place in the next commit. Closes: Byron#105
(Also, run `cargo fmt`)
Hm, I'm guessing that macOS paths are always Unicode? I can feature gate that new test if necessary. Edit: Oops, I thought clippy would check the code for Windows and Mac as well. I'll have to edit the code for those a bit too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for tackling this, it's much appreciated!
And the breaking change is well worth the benefit of not making assumptions about the encoding of filesystem paths.
I left a few comments, but they are minor and what's really preventing a merge right now is CI. It's interesting to see that MacOS filesystems don't allow illegal UTF8 at all, something I wasn't even aware of, but certainly am grateful for looking at the legacy to deal with on Windows filesystems.
src/freedesktop.rs
Outdated
let in_trash_name = if appendage > 1 { | ||
format!("{}.{}", filename.to_str().unwrap(), appendage) | ||
let in_trash_name: Cow<'_, OsStr> = if appendage > 1 { | ||
// Length of the digits plus the dot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it seems that 999
yields 2
, so the +1
is compensating for that only with the .
missing in the calculation. (see playground).
Personally I find this memory optimisation in code that deals with comparatively slow disk a bit of a cognitive burden, despite being guilty of doing it myself when I can.
Here I do recommend to remove it, particularly when looking at decode_uri_path
I think avoiding allocations here is futile and not worth the added complexity.
|
||
// Add invalid UTF-8 byte | ||
let mut bytes = base.into_encoded_bytes(); | ||
bytes.push(168); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the quickest way to do this that I have seen so far! Byte 168
, I shall remember that.
src/freedesktop.rs
Outdated
let fake = format!("/tmp/{}", get_unique_name()); | ||
let path = decode_uri_path(&fake); | ||
|
||
assert_eq!(fake, path.to_str().expect("Path is valid Unicode"), "Decoded path shouldn't be different"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, I go with assert_eq(actual, expected)
, but it might be there is precedent for doing it the other way around somewhere in the codebase already. But if not, let's go with (actual, expected)
instead.
* Both UTF8 tests are gated to non-macOS, non-Android, and non-iOS Unixes.
Alright, so I made the requested changes and gated the non-UTF8 tests. I'm not completely sure, but according to this Super User question macOS only supports Unicode file names. I added an additional, gated test to check that listing trash items with invalid Unicode names succeeds. CI is passing for everything except Windows, but I'll work on that tomorrow. Let me know if there's anything I missed. 😁 |
Thanks, looks good now, and thanks for the extra test! Once Windows passes, this can be merged. |
* Removes an unneeded `unsafe`
I opted to not remove the UTF8 verification for Windows' platform specific code. While it's unlikely that Windows' API would return invalid Strings, the extra check for a filename can't hurt whereas removing it would require modifying a decent chunk of the code. The old code performed the check, and converting a String to an OsString is free. Path of least resistance.
85bb8d0
to
e4b7119
Compare
Ooh nice it passed! The commit messages explain the changes. I didn't have to do anything major. |
Awesome work, thanks so much! Here is the new release: https://github.com/Byron/trash-rs/releases/tag/v5.0.0 |
Closes: #105
I only replaced the
unwrap()
s that directly affect the issue. I also wrote unit tests to verify the new code works.I switched
url
to a lower level crate,urlencoding
, becauseurl
only operated on Rust Strings.Finally, and probably most concerning, I modified the public API a teensy bit.
TrashItem
'sname
field is anOsString
now instead of aString
. Beyond that, I didn't modify any public APIs.