-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recover error strings on Unix from_lossy_utf8 #99609
Conversation
Some language settings can result in unreliable UTF-8 being produced. This can result in failing to emit the error string, panicking instead. from_lossy_utf8 allows us to assume these strings usually will be fine.
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
|
(rust-highfive has picked a reviewer for you, use r? to override) |
Aiui some legacy encodings such as shift-jis aren't an ascii-superset. So if there's a system using such a locale they'd still garbage and this wouldn't really be a fix. |
Alas, "Not output garbage" is not the goal of this PR. A more complicated fix requires more nuanced handling, you are correct, but immediately after writing this, I started doing many more smaller changes to improve things further until I realized the The primary location to find Shift-JIS is in fact on Windows platforms, which have an even more eccentric interpretation of it than you might imagine. I am quite familiar with how Shift-JIS breaks systems, as it is what gives the phenomenon of incorrect character emission the name mojibake. |
I'm not understanding the relevance of Windows here? In Rust for Windows, In Windows itself all local encodings are lossily converted to/from UTF-16. I guess the only issue would be reading the stdout of a third party tool that spits out a locale-specific encoding. |
@ChrisDenton The only relevance is that to Actually Fix the Issue so that we don't ever emit garbage would require changing the signature of And I may or may not get around to doing all that. However, here the standard library is panicking needlessly while trying to construct a mere error report. Emitting junk data is genuinely preferable. "Really fixing it" can be left to future work. |
To be clear: Before I ever touched programming, I did localization. If you have questions about fonts, locales, languages, or encodings, I will happily answer them. But I would rather save any such explanations to the comments and commit messages of my intended future PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@workingjubilee asked me to take a look at this, and I think this makes sense given the justification in #99609 (comment), that this is just to avoid panicking and producing a good error message on a best-effort basis.
That said I won't snipe the review from the assigned reviewer, but r=me if they don't want to take it.
I'll also try to be clear. I do not have a problem with this PR at all. I think it's a good improvement over the current behavior. I would however be much more nervous about a more substantial fix that potentially also affects Windows as the original issue demonstrated a *nix specific problem. But I agree, this conversation can be better had at a more appropriate time. |
@bors r+ |
Sorry to have missed this. LGTM as well. Thanks @thomcc. |
… r=thomcc Recover error strings on Unix from_lossy_utf8 Some language settings can result in unreliable UTF-8 being produced. This can result in failing to emit the error string, panicking instead. from_lossy_utf8 allows us to assume these strings usually will be fine. This fixes rust-lang#99535.
☀️ Test successful - checks-actions |
Finished benchmarking commit (8e9c93d): comparison URL. Overall result: ✅ improvements - no action needed@rustbot label: -perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Footnotes |
Some language settings can result in unreliable UTF-8 being produced.
This can result in failing to emit the error string, panicking instead.
from_lossy_utf8 allows us to assume these strings usually will be fine.
This fixes #99535.