fix: Support non-Unicode paths #108

joshuamegnauth54 · 2024-06-16T06:04:24Z

Closes: #105

I only replaced the unwrap()s that directly affect the issue. I also wrote unit tests to verify the new code works.

I switched from `url` to a lower-level crate, `urlencoding`, because `url` only operates on Rust `String`s.

Finally, and probably most concerning, I modified the public API a teensy bit. `TrashItem`'s `name` field is now an `OsString` instead of a `String`. Beyond that, I didn't modify any public APIs.

There are several spots where paths are assumed to be Unicode. However, some (all?) operating systems support non-Unicode paths, and `trash-rs` panics when it encounters one. I switched some of that code to use `OsString`s instead of `String`s. Unfortunately, I had to add a new dependency, `urlencoding`, in order to properly handle decoding non-UTF-8 byte slices. As of this commit, the test suite passes and the code should be ready, but I will try to remove the `url` crate and use `urlencoding` in its place in the next commit.

Closes: Byron#105
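As a rough illustration of why byte-level decoding matters here: a percent-encoded path can decode to bytes that are not valid UTF-8, so the result has to land in an `OsString`/`PathBuf` rather than a `String`. The PR relies on the `urlencoding` crate for the decoding; the block below is a std-only, Unix-only stand-in, not that crate's code, and `hex_val`, `percent_decode_bytes`, and `decode_path_sketch` are hypothetical names.

```rust
use std::ffi::OsString;
use std::os::unix::ffi::OsStringExt; // Unix-only: build an OsString from raw bytes
use std::path::PathBuf;

/// Value of a single ASCII hex digit, or `None` if `b` is not one.
fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Hypothetical stand-in for a byte-level percent decoder.
/// Malformed escapes (for example a trailing `%`) are copied through unchanged.
fn percent_decode_bytes(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        if input[i] == b'%' && i + 2 < input.len() {
            if let (Some(hi), Some(lo)) = (hex_val(input[i + 1]), hex_val(input[i + 2])) {
                out.push((hi << 4) | lo);
                i += 3;
                continue;
            }
        }
        out.push(input[i]);
        i += 1;
    }
    out
}

/// Hypothetical helper: decode straight into a path, never going through `String`.
fn decode_path_sketch(encoded: &str) -> PathBuf {
    PathBuf::from(OsString::from_vec(percent_decode_bytes(encoded.as_bytes())))
}
```

With this sketch, `decode_path_sketch("/tmp/bad%A8")` yields a path whose `to_str()` is `None`, which is exactly the case a `String`-based decoder cannot represent.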


joshuamegnauth54 · 2024-06-16T06:11:14Z

Hm, I'm guessing that macOS paths are always Unicode? I can feature gate that new test if necessary.

Edit: Oops, I thought clippy would check the code for Windows and Mac as well. I'll have to edit the code for those a bit too.

Byron

Thanks a lot for tackling this, it's much appreciated!

And the breaking change is well worth the benefit of not making assumptions about the encoding of filesystem paths.

I left a few comments, but they are minor; what's really preventing a merge right now is CI. It's interesting to see that macOS filesystems don't allow illegal UTF-8 at all, something I wasn't even aware of but am certainly grateful for, considering the legacy one has to deal with on Windows filesystems.

Byron · 2024-06-16T06:47:47Z

src/freedesktop.rs

-        let in_trash_name = if appendage > 1 {
-            format!("{}.{}", filename.to_str().unwrap(), appendage)
+        let in_trash_name: Cow<'_, OsStr> = if appendage > 1 {
+            // Length of the digits plus the dot


Actually, it seems that 999 yields 2, so the `+ 1` only compensates for that, and the `.` is still missing from the calculation (see playground).

Personally, I find this memory optimisation, in code that deals with a comparatively slow disk, a bit of a cognitive burden, despite being guilty of doing it myself when I can.

Here I recommend removing it. Particularly when looking at `decode_uri_path`, I think avoiding allocations here is futile and not worth the added complexity.
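To make the arithmetic concrete: the diff above is cut off before the calculation, so the following assumes it used a base-10 logarithm such as `u32::ilog10` (an assumption). `999u32.ilog10()` is 2, so adding 1 gives the digit count and leaves nothing for the dot. The second function is a hypothetical sketch of the simpler approach that just allocates; `decimal_digits` and `with_appendage` are illustrative names, not code from the PR.

```rust
use std::ffi::{OsStr, OsString};

/// Decimal digit count of a non-zero `n`; `ilog10` alone is one short of it.
/// (`ilog10` panics for 0, which is fine here because the appendage is > 1.)
fn decimal_digits(n: u32) -> usize {
    n.ilog10() as usize + 1
}

/// Hypothetical helper: append `.N` to a file name without going through `str`,
/// accepting an extra allocation instead of pre-computing the capacity.
fn with_appendage(filename: &OsStr, appendage: u32) -> OsString {
    let mut name = filename.to_os_string();
    name.push(format!(".{appendage}"));
    name
}
```

A pre-sized buffer would need `decimal_digits(appendage) + 1` extra bytes, the `+ 1` being the dot; the allocating version avoids having to get that right.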

Byron · 2024-06-16T06:59:52Z

src/freedesktop.rs

+
+        // Add invalid UTF-8 byte
+        let mut bytes = base.into_encoded_bytes();
+        bytes.push(168);


That's the quickest way to do this that I have seen so far! Byte 168, I shall remember that.
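For context, byte 168 is `0xA8` (`0b1010_1000`), a UTF-8 continuation byte, so it is invalid when it follows plain ASCII with no lead byte before it. A minimal, Unix-only sketch of the same trick using the safe `OsStringExt::from_vec` constructor follows; `non_utf8_name` is a hypothetical helper name, and this is not necessarily how the PR's test rebuilds the `OsString`.

```rust
use std::ffi::OsString;
use std::os::unix::ffi::OsStringExt; // Unix-only extension trait

/// Hypothetical helper: a file name that is guaranteed not to be valid UTF-8.
fn non_utf8_name(base: &str) -> OsString {
    let mut bytes = base.as_bytes().to_vec();
    // 0xA8 is a lone continuation byte, so the sequence is no longer valid UTF-8.
    bytes.push(168);
    OsString::from_vec(bytes)
}
```

Calling `to_str()` on the result returns `None`, which is the condition that used to trigger the `unwrap()` panics.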

Byron · 2024-06-16T07:01:37Z

src/freedesktop.rs

+        let fake = format!("/tmp/{}", get_unique_name());
+        let path = decode_uri_path(&fake);
+
+        assert_eq!(fake, path.to_str().expect("Path is valid Unicode"), "Decoded path shouldn't be different");


Generally, I go with `assert_eq!(actual, expected)`, but there may already be precedent for doing it the other way around somewhere in the codebase. If not, let's go with `(actual, expected)` instead.

joshuamegnauth54 · 2024-06-17T06:28:18Z

Alright, so I made the requested changes and gated the non-UTF-8 tests. I'm not completely sure, but according to this Super User question, macOS only supports Unicode file names.

I added an additional, gated test to check that listing trash items with invalid Unicode names succeeds. CI is passing for everything except Windows, but I'll work on that tomorrow.

Let me know if there's anything I missed. 😁
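A gate of the kind described could look roughly like the following. The exact predicate used in the PR is not shown in this thread, so the `cfg` expression, the helper `non_utf8_tests_enabled`, and the test name are all assumptions for illustration.

```rust
/// Hypothetical helper: true on targets where the non-UTF-8 tests would be compiled.
fn non_utf8_tests_enabled() -> bool {
    cfg!(all(
        unix,
        not(any(target_os = "macos", target_os = "ios", target_os = "android"))
    ))
}

// The same predicate as an attribute: the test only exists on matching targets.
#[cfg(all(
    unix,
    not(any(target_os = "macos", target_os = "ios", target_os = "android"))
))]
#[test]
fn invalid_utf8_name_is_not_a_str() {
    use std::ffi::OsString;
    use std::os::unix::ffi::OsStringExt;

    // A lone continuation byte (168 = 0xA8) makes the name invalid UTF-8.
    let name = OsString::from_vec(vec![b'a', 168]);
    assert!(name.to_str().is_none());
}
```

On Linux and the BSDs the predicate holds, so the test is built and run; on macOS, iOS, and Android it is compiled out entirely.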

Byron · 2024-06-17T07:10:15Z

Thanks, looks good now, and thanks for the extra test!

Once Windows passes, this can be merged.

I opted not to remove the UTF-8 verification in the Windows platform-specific code. While it's unlikely that the Windows API would return invalid `String`s, the extra check on a filename can't hurt, whereas removing it would require modifying a decent chunk of the code. The old code performed the check, and converting a `String` to an `OsString` is free. Path of least resistance.
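The shape of that trade-off can be sketched as follows, assuming the name arrives as a UTF-16 buffer (an assumption about the Windows code, which is not shown here). `name_from_wide` is a hypothetical helper: it keeps the validation step, then converts the validated `String` into an `OsString` without re-encoding the contents.

```rust
use std::ffi::OsString;
use std::string::FromUtf16Error;

/// Hypothetical helper: validate a UTF-16 name first, then hand it on as an `OsString`.
fn name_from_wide(wide: &[u16]) -> Result<OsString, FromUtf16Error> {
    // The validation step: unpaired surrogates are rejected here.
    let name = String::from_utf16(wide)?;
    // Moving a `String` into an `OsString` does not re-encode the contents.
    Ok(OsString::from(name))
}
```

An unpaired surrogate such as `0xD800` makes the function return an error, while ordinary names pass through unchanged.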

joshuamegnauth54 · 2024-06-18T05:47:35Z

Ooh nice it passed! The commit messages explain the changes. I didn't have to do anything major.

Byron · 2024-06-18T07:47:09Z

Awesome work, thanks so much!

Here is the new release: https://github.com/Byron/trash-rs/releases/tag/v5.0.0

joshuamegnauth54 added 2 commits June 16, 2024 01:32

Remove url and replace with urlencoding

67fb256

(Also, run `cargo fmt`)

Byron reviewed Jun 16, 2024

joshuamegnauth54 added 2 commits June 17, 2024 00:41

Cleanup non-Unicode support for readability

2f31116

Impl test for listing invalid UTF8 trash items

209db9d

* Both UTF8 tests are gated to non-macOS, non-Android, and non-iOS Unixes.

joshuamegnauth54 added 2 commits June 17, 2024 22:37

Simplify Linux/BSD only tests for non-UTF8 paths

559b57b

* Removes an unneeded `unsafe`

joshuamegnauth54 force-pushed the non-utf8-paths branch from 85bb8d0 to e4b7119 Compare June 18, 2024 05:38

Byron merged commit 0971b8f into Byron:master Jun 18, 2024
4 checks passed

joshuamegnauth54 deleted the non-utf8-paths branch June 19, 2024 02:21
