Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Support non-Unicode paths #108

Merged
merged 6 commits into from
Jun 18, 2024
Merged

Conversation

joshuamegnauth54
Copy link
Contributor

Closes: #105

I only replaced the unwrap()s that directly affect the issue. I also wrote unit tests to verify the new code works.

I switched url to a lower level crate, urlencoding, because url only operated on Rust Strings.

Finally, and probably most concerning, I modified the public API a teensy bit. TrashItem 's name field is an OsString now instead of a String. Beyond that, I didn't modify any public APIs.

There are several spots where paths are assumed to be Unicode. However,
some (all?) operating systems support non-Unicode paths which causes
`trash-rs` to panic if encountered. I switched some of those code to use
`OsString`s instead of `String`s. Unfortunately, I had to add a new
dependency, `urlencoding`, in order to properly handle decoding non-UTF8
byte slices.

As of this commit, the test suite passes and code should be ready, but I
will try to remove the `url` crate and use `urlencoding` in its place
in the next commit.

Closes: Byron#105
@joshuamegnauth54
Copy link
Contributor Author

joshuamegnauth54 commented Jun 16, 2024

Hm, I'm guessing that macOS paths are always Unicode? I can feature gate that new test if necessary.

Edit: Oops, I thought clippy would check the code for Windows and Mac as well. I'll have to edit the code for those a bit too.

Copy link
Owner

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for tackling this, it's much appreciated!

And the breaking change is well worth the benefit of not making assumptions about the encoding of filesystem paths.

I left a few comments, but they are minor and what's really preventing a merge right now is CI. It's interesting to see that MacOS filesystems don't allow illegal UTF8 at all, something I wasn't even aware of, but certainly am grateful for looking at the legacy to deal with on Windows filesystems.

src/freedesktop.rs Show resolved Hide resolved
let in_trash_name = if appendage > 1 {
format!("{}.{}", filename.to_str().unwrap(), appendage)
let in_trash_name: Cow<'_, OsStr> = if appendage > 1 {
// Length of the digits plus the dot
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it seems that 999 yields 2, so the +1 is compensating for that only with the . missing in the calculation. (see playground).

Personally I find this memory optimisation in code that deals with comparatively slow disk a bit of a cognitive burden, despite being guilty of doing it myself when I can.

Here I do recommend to remove it, particularly when looking at decode_uri_path I think avoiding allocations here is futile and not worth the added complexity.


// Add invalid UTF-8 byte
let mut bytes = base.into_encoded_bytes();
bytes.push(168);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the quickest way to do this that I have seen so far! Byte 168, I shall remember that.

let fake = format!("/tmp/{}", get_unique_name());
let path = decode_uri_path(&fake);

assert_eq!(fake, path.to_str().expect("Path is valid Unicode"), "Decoded path shouldn't be different");
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I go with assert_eq(actual, expected), but it might be there is precedent for doing it the other way around somewhere in the codebase already. But if not, let's go with (actual, expected) instead.

* Both UTF8 tests are gated to non-macOS, non-Android, and non-iOS
  Unixes.
@joshuamegnauth54
Copy link
Contributor Author

Alright, so I made the requested changes and gated the non-UTF8 tests. I'm not completely sure, but according to this Super User question macOS only supports Unicode file names.

I added an additional, gated test to check that listing trash items with invalid Unicode names succeeds. CI is passing for everything except Windows, but I'll work on that tomorrow.

Let me know if there's anything I missed. 😁

@Byron
Copy link
Owner

Byron commented Jun 17, 2024

Thanks, looks good now, and thanks for the extra test!

Once Windows passes, this can be merged.

I opted to not remove the UTF8 verification for Windows' platform
specific code. While it's unlikely that Windows' API would return
invalid Strings, the extra check for a filename can't hurt whereas
removing it would require modifying a decent chunk of the code.

The old code performed the check, and converting a String to an OsString
is free. Path of least resistance.
@joshuamegnauth54
Copy link
Contributor Author

Ooh nice it passed! The commit messages explain the changes. I didn't have to do anything major.

@Byron Byron merged commit 0971b8f into Byron:master Jun 18, 2024
4 checks passed
@Byron
Copy link
Owner

Byron commented Jun 18, 2024

Awesome work, thanks so much!

Here is the new release: https://github.com/Byron/trash-rs/releases/tag/v5.0.0

@joshuamegnauth54 joshuamegnauth54 deleted the non-utf8-paths branch June 19, 2024 02:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Panic when deleting file with a non-UTF-8 name
2 participants