Tracking issue for RFC 2457, "Allow non-ASCII identifiers" #55467
Comments
The last unresolved question isn't a real unresolved question; it was included in the RFC for completeness but does not block this issue. |
@joshtriplett Please check that the list of checkboxes above is satisfactory. :) @Manishearth alright; leave a note under it to that effect? |
The note saying so is already in the unresolved q |
Substituting "rare" or "unusual" for "less used" seems to me a simple, if not necessarily final, improvement, replacing the somewhat awkward "less used" with a single, shorter, more usual synonym. (Edit: I note that I personally oppose allowing non-ASCII identifiers, but I recognize that the Rust Team favors it, and I have no problem bowing to their decision and chipping in my cents to help.) |
I like "unusual"
-Manish Goregaokar
|
I would prefer “rare” as it sounds more objective to me than “unusual”, and perhaps less judgemental as well. |
My first thought was "uncommon", but that's not strong enough of an adjective to get the intended meaning across. |
I'm partial towards "rare" as well; |
If we need something even stronger we might try "mythic". 😛 |
I prefer |
As a native and competent English-user who generally is seen as serious/boring [1], while I agree that "legendary" and "mythic" sound rather fantastical [2], I don't think "rare" does. The distinction I would draw between [1] (I recognize this may not have been a trait I displayed when I was a member of your channel.) |
I'm much against non-ASCII identifiers, but I'm not going to repeat what other folks said. I just noticed that no example of malicious code was provided so far, so I'm providing one:

    fn list_items_in_category(category: &str) -> io::Result<String> {
        let cate𝚐ory = sanitize_untrusted_input(category);
        debug!("Listing category {}", cate𝚐ory);
        system(format!("grep '^{},' /my/simple/database | awk -F , '{{ print $2 }}'", category))
    }

Who can spot the problem without looking at character codes?

As a side note, I'd like to share my experience with attempts to localize everything (feel free to skip the rest of my comment if you want to remain technical). I went to a school where they translated literally every technical term. In an attempt to make everything understandable to everyone, they translated even things that are very difficult to translate reasonably. You'd expect that it was much easier to learn at that school compared to other schools that don't do that, right? Well, life is weird. It was hard to understand, I felt like Alice in Wonderland, and it took me a week to realize that "that weird term I didn't hear before" was the actual thing I wanted to study and the very reason I signed up for that specific school!

Of course, this is not directly an argument against non-ASCII identifiers. I just wanted to express my concerns to all those wonderful loving people (seriously) who want everyone to feel great in the Rust community, so that they remain vigilant and avoid accidentally going against their own beliefs. |
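For readers who do want to look at the character codes, here is a minimal sketch of how the difference between the two spellings can be made visible. The helper name `non_ascii_chars` is hypothetical and not part of any library; the lookalike character is assumed to be U+1D690 MATHEMATICAL MONOSPACE SMALL G, as identified elsewhere in this thread.

```rust
/// Hypothetical helper: returns every non-ASCII character in `ident`
/// together with its byte offset.
fn non_ascii_chars(ident: &str) -> Vec<(usize, char)> {
    ident
        .char_indices()
        .filter(|(_, c)| !c.is_ascii())
        .collect()
}

fn main() {
    let plain = "category";
    // Same visual spelling, but the "g" is U+1D690 rather than U+0067.
    let lookalike = "cate\u{1D690}ory";

    // The two names are distinct identifiers as far as the compiler is concerned.
    assert_ne!(plain, lookalike);

    for (offset, c) in non_ascii_chars(lookalike) {
        println!("byte offset {}: U+{:04X}", offset, c as u32);
    }
}
```

This only illustrates why the two bindings in the example are different variables; it is not how the compiler lints detect confusables.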
(Agreed. My comment was entirely in jest (as I hope should've been evident?), and I had no intention of tarnishing "rare" by association, a word which is itself ordinary and common.) |
It's trivial on my font. Also, we want to start with lints against this sort of thing. |
Github's monospace font configuration on my machine ended up using a binocular glyph for U+0067 LATIN SMALL LETTER G but a monocular one for U+1D690 MATHEMATICAL MONOSPACE SMALL G, so I spotted it instantly. That said, the |
The Oxford English Dictionary has as one definition of "uncommon":
which seems especially appropriate 😉(Emphasis mine.) I feel it's a more suitable choice of words than "rare", which has significantly more meanings than "uncommon", some associated specifically with being good, e.g. (also OED):
I also think more reserved wording is generally appropriate for compiler naming conventions. |
That's precisely what you're doing -- your "counterexample" is caught by both the less_used_codepoints lint and the confusable_idents lint. At this point these counterexample discussions have been done to death (and we're leveraging a unicode spec designed to deal with this!) -- please actually check if your "counterexample" isn't something we or the Unicode Consortium have thought of already. |
Exactly. People have thought a lot about this, and it's certainly possible to implement a feature that many people will find useful while dealing with complications that might arise. At this point, adding this feature is a given. If you have ideas about how to improve the lints for finding confusable identifiers, by all means share them, but there's no need to simply point out an issue that everyone is already aware of. |
Re. I was personally in favor of Everyone OK with this? |
works for me. Slight preference for rare but very slight. Both work for me, and I don't think it's really worth bikeshedding this too much :) |
@Manishearth Sorry, I didn't mean to argue that the example is unsolved, I just wanted to provide actual code for those who might have difficulties imagining how to turn this feature into something malicious. I wonder though, why confusables lint is not mandatory (according to RFC). It sounds to me like making borrow checker non-mandatory. My understanding is that Rust should be safe by default, where you can opt-into unsafety. For me, this means |
This isn't a matter of safety in the way that Rust describes it. Making the warnings on by default has been discussed. Please let's not use this tracking issue to relitigate things which have already been decided through a rather long RFC. |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
A concern has been raised in #83923 that extern blocks are not handled as described in the RFC. I would appreciate considering addressing that before this is stabilized. I suspect a validation check would be easy to add if that indeed should be rejected. |
@joshtriplett #83936 has merged now |
@rfcbot resolved extern-blocks |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
The final comment period, with a disposition to merge, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. The RFC will be merged soon. |
Stabilization PR is at #83799. |
… r=Manishearth Stablize `non-ascii-idents` This is the stablization PR for RFC 2457. Currently this is waiting on fcp in [tracking issue](rust-lang#55467). r? `@Manishearth`
Stabilization PR has landed, closing. |
Why not support Chinese names |
@Mr-Zzg They are! See https://play.rust-lang.org/?version=beta&mode=debug&edition=2018&gist=296bbb7bc2d69f8d2c4245b9df93992a This feature hasn't hit stable yet, it will in the next release. |
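As a rough illustration of what this looks like in practice, the sketch below uses Han-script identifiers. The names `面积`, `宽`, `高` and `结果` are made up for this example, and it assumes a toolchain on which the feature is stable (to my knowledge, Rust 1.53 or later).

```rust
// Hypothetical example: a function and bindings named in Han script.
// Assumes a compiler where non-ASCII identifiers are stable.
fn 面积(宽: u32, 高: u32) -> u32 {
    宽 * 高
}

fn main() {
    let 结果 = 面积(3, 4);
    println!("{}", 结果);
}
```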
@Hexawolf GHSA-rcv6-wg5m-24v6 |
We are not affected by the homoglyph attack, please see the mitigations that were implemented as a part of this RFC. |
This is a tracking issue for the RFC "Allow non-ASCII identifiers" (rust-lang/rfcs#2457).
Steps:
- `#![forbid(non_ascii_idents)]` works. (`non_ascii_idents` lint (part of RFC 2457), #61883)
- `confusable_idents` lint. (Implement `confusable_idents` lint, #71542; Implement mixed script confusable lint, #72770)
- `uncommon_codepoints` lint, originally named `less_used_codepoints`. (Implement uncommon_codepoints lint, #67810)
- "bad style" (`non_standard_style`) lints. (See Split and expand nonstandard-style lints unicode unit test, #73839)
- `mixed_script_confusables` lint. (Implement mixed script confusable lint, #72770)
- Out-of-line modules (`mod фоо;`), extern crates and paths with a first segment naming a crate should not be able to do filesystem search using those non-ASCII identifiers (i.e. no `extern crate ьаг;` or `му_сгате::baz`). (Disallow loading crates with non-ascii identifier name, #73305)
- Stabilization of `non-ascii-idents`. (#83799)

Unresolved questions:
- Resolved: DWARF and debuggers handle UTF-8 just fine.
- Is there a better name for the `less_used_codepoints` lint? Resolved in favour of `uncommon_codepoints`.
- Resolved in favor of `mixed_script_confusables`.
- (Statics shadow local variables causing "refutable pattern error", and non-obvious bugs, #7526; We shouldn't even try to resolve irrefutable patterns as constants, #49680)? Can we improve precision of linting here?
- In `mixed_script_confusables`, do we actually need to make an exception for `Latin` identifiers?
- `XID_Start` / `XID_Continue`? (XID_Start / XID_Continue might not be quite right, #4928)

Zulip channel topic for real-time discussion:
https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/nonascii.20identifiers(rfc.202457)
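For projects that prefer to stay ASCII-only, the first step above suggests an opt-out at the crate root. The following is a minimal sketch, not a recommended policy; the function `greeting` is a made-up name, and the attribute is assumed to be placed at the top of the crate root, since these lints are configured at crate level.

```rust
// Crate-level attribute: reject any non-ASCII identifier in this crate.
#![forbid(non_ascii_idents)]

// Non-ASCII text inside string literals is unaffected; only identifiers are checked.
fn greeting() -> &'static str {
    "héllo"
}

fn main() {
    println!("{}", greeting());
}
```

With this attribute in place, a binding such as the lookalike `cate𝚐ory` from the discussion above should be rejected at compile time rather than merely warned about.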