Switch from `encoding` to `encoding_rs`. #9

Ethiraric · 2024-03-23T15:37:16Z

With these changes, we seem to retain the encoding functionality we had before, aside from the `Call` variant.
I am not well versed in that side of parsing. Would this require more thorough testing?

cc @mkmik

Fixes #8

Ethiraric · 2024-03-23T15:41:00Z

Actually, I could use the `*_without_replacement` version and handle the `Malformed` variant.

Ethiraric · 2024-03-23T16:14:54Z

This proved to be much more complicated than I anticipated, but it should work.

davvid

Good stuff, this looks good to me. I just had some minor suggestions, but nothing worth holding up a merge over.

davvid · 2024-03-24T01:03:18Z

src/yaml.rs

+            // If the output is full, we must reallocate.
+            (DecoderResult::OutputFull, bytes_read) => {
+                total_bytes_read += bytes_read;
+                output.reserve(input.len() / 10);


I'd lean towards removing the `/ 10` here since we're just reserving. It should lead to fewer reallocs overall. Another common strategy for really large buffers is to double the `reserve()` size each time we hit `OutputFull`.

The output is already reserved to the size of the input. Is it common for inputs to grow to more than 1.1x their size when converted to UTF-8?

This definitely needs a `max` though. If the input size is less than 10 bytes, `input.len() / 10` is a reserve of 0.

Ah, that's true. Most of the time we'd probably expect something encoded in UTF-32 to become smaller when converted to UTF-8, so the `/ 10` makes sense, especially with a short comment to explain the magic number. `/ 4` would give a safe buffer for some pathological cases, but in light of this it is probably overkill. I wonder whether this code path is ever hit in the wild.
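To put rough numbers on the size question discussed above, here is a small std-only sketch comparing the byte length of the same text in UTF-8, UTF-16 and UTF-32. The helper names `utf16_len` and `utf32_len` are made up for illustration and are not part of this PR or of `encoding_rs`.

```rust
/// Byte length of `s` when encoded as UTF-16 (2 bytes per code unit, no BOM).
/// Hypothetical helper, for illustration only.
fn utf16_len(s: &str) -> usize {
    s.encode_utf16().count() * 2
}

/// Byte length of `s` when encoded as UTF-32 (4 bytes per scalar value, no BOM).
/// Hypothetical helper, for illustration only.
fn utf32_len(s: &str) -> usize {
    s.chars().count() * 4
}

/// Returns (utf8, utf16, utf32) byte lengths for the same text.
/// `str::len` already reports the UTF-8 byte length.
fn encoded_sizes(s: &str) -> (usize, usize, usize) {
    (s.len(), utf16_len(s), utf32_len(s))
}
```

For ASCII text such as `"key: value"`, `encoded_sizes` gives 10, 20 and 40 bytes, so the UTF-8 output is smaller than either input. For `"日本語"` it gives 9, 6 and 12 bytes: UTF-16 input made of such characters grows by 1.5x when converted to UTF-8, while UTF-32 input never grows, since UTF-8 needs at most 4 bytes per character.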

Added a comment, thanks!

> I wonder whether this code path is ever hit in the wild.

I know nearly nothing about encodings, but I'm curious 🤔
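The two growth strategies mentioned in this thread could look roughly like the sketch below. It is not the code that was merged: `MIN_GROWTH`, `growth_step` and `grow_doubling` are hypothetical names, the floor of 16 bytes is an arbitrary choice, and a plain `Vec<u8>` stands in for the real output buffer.

```rust
/// Hypothetical lower bound for each growth step, so that inputs shorter
/// than 10 bytes still make progress (`input_len / 10` would be 0 for them).
const MIN_GROWTH: usize = 16;

/// Extra capacity to reserve after an "output full" result:
/// a tenth of the input length, but never less than `MIN_GROWTH`.
fn growth_step(input_len: usize) -> usize {
    (input_len / 10).max(MIN_GROWTH)
}

/// Alternative strategy: ask for at least as many additional bytes as the
/// buffer's current capacity, which roughly doubles a full buffer.
fn grow_doubling(output: &mut Vec<u8>) {
    let extra = output.capacity().max(MIN_GROWTH);
    // `Vec::reserve` guarantees capacity >= len + extra afterwards.
    output.reserve(extra);
}
```

With the floor in place, `growth_step(5)` is 16 rather than 0, while `growth_step(1000)` is still 100. Doubling trades some memory for fewer reallocations on very large inputs.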

davvid · 2024-03-24T01:13:43Z

src/yaml.rs

+                            "Invalid character sequence at {byte_idx}: {malformed_sequence:?}",
+                        ))));
+                    }
+                    YAMLDecodingTrap::Call(f) => {


Suggestion: rename `f` to `fun` or `func`.

See rustsec/advisory-db#1605.

Ethiraric requested a review from davvid March 23, 2024 15:37

Ethiraric force-pushed the fix-8 branch from fc07ee3 to b01f02e Compare March 23, 2024 16:14

Ethiraric force-pushed the fix-8 branch 2 times, most recently from 289a091 to d46eb0a Compare March 23, 2024 16:21

davvid approved these changes Mar 24, 2024


Switch from encoding to encoding_rs.

4f76346

See rustsec/advisory-db#1605.

Ethiraric force-pushed the fix-8 branch from d46eb0a to 4f76346 Compare March 24, 2024 16:14

Ethiraric merged commit 4f76346 into master Mar 24, 2024
3 checks passed

Ethiraric deleted the fix-8 branch March 24, 2024 16:16

danielhjacobs mentioned this pull request Mar 25, 2024

Migrated from yaml-rust to its well-maintaned fork mitsuhiko/insta#461

Closed
