"char": add a type and function for Unicode Character Categories #1348

Closed
kud1ing opened this issue Dec 20, 2011 · 6 comments

Comments

@kud1ing

kud1ing commented Dec 20, 2011

For Unicode Character Categories see http://www.fileformat.info/info/unicode/category/index.htm

Haskell implements the type "GeneralCategory" and a function to determine a character's "GeneralCategory".
Their implementation goes like this:

I propose writing a Python script that does something similar.

Having such a type and function in Rust enables us to correctly implement functions in the "char" module. See http://haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/src/Data-Char.html
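
A minimal sketch of what such a type and function might look like, written in present-day Rust syntax. The name `GeneralCategory` follows the Haskell type mentioned above; `general_category`, `TABLE`, and `is_decimal_digit` are hypothetical names chosen for illustration. The table is hand-written and covers only a few Latin-1 ranges; the assumption is that the proposed script would generate the full table from the Unicode Character Database instead.

```rust
use std::cmp::Ordering;

/// A small subset of the Unicode general categories (hypothetical type;
/// the real list has about thirty members).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GeneralCategory {
    UppercaseLetter, // Lu
    LowercaseLetter, // Ll
    DecimalNumber,   // Nd
    SpaceSeparator,  // Zs
    Control,         // Cc
}

/// Sorted, non-overlapping, inclusive code point ranges.
/// Hand-written sample only; a generated table would cover all of Unicode.
const TABLE: &[(u32, u32, GeneralCategory)] = &[
    (0x00, 0x1F, GeneralCategory::Control),
    (0x20, 0x20, GeneralCategory::SpaceSeparator),
    (0x30, 0x39, GeneralCategory::DecimalNumber),
    (0x41, 0x5A, GeneralCategory::UppercaseLetter),
    (0x61, 0x7A, GeneralCategory::LowercaseLetter),
    (0x7F, 0x9F, GeneralCategory::Control),
    (0xA0, 0xA0, GeneralCategory::SpaceSeparator),
];

/// Looks up the category by binary search over the range table.
/// Returns `None` for characters this sample table does not cover.
pub fn general_category(c: char) -> Option<GeneralCategory> {
    let cp = c as u32;
    TABLE
        .binary_search_by(|&(lo, hi, _)| {
            if hi < cp {
                Ordering::Less
            } else if lo > cp {
                Ordering::Greater
            } else {
                Ordering::Equal
            }
        })
        .ok()
        .map(|i| TABLE[i].2)
}

/// Example of a "char"-style predicate built on top of the lookup.
pub fn is_decimal_digit(c: char) -> bool {
    general_category(c) == Some(GeneralCategory::DecimalNumber)
}
```

Returning `Option` keeps the sketch honest about its incomplete table: `general_category('!')` yields `None` here only because punctuation was left out of the sample, not because the character is unassigned.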

@graydon
Contributor

graydon commented Dec 20, 2011

The module-in-progress called 'unicode::' in libstd is where I was going to sketch out an interface to libicu. The decision is not actually very simple for most of the character classes, and ICU has this well handled. I guess we can expose it under core::char if everyone's cool with adopting a dependency on libicu?

@kud1ing
Author

kud1ing commented Dec 21, 2011

libicu provides many additional desirable features, and it is probably present on most computers (Python uses it, so it should be fine for us).

Do we want to provide public libicu bindings, or just use it internally in modules like "char", "str", etc.?
I lean toward the latter.

@kud1ing
Author

kud1ing commented Dec 21, 2011

To implement the functions in Rust's "char" correctly using libicu, I think we only need to call functions like "u_isspace()", "u_isdigit()", "u_forDigit()" (http://icu-project.org/apiref/icu4c/uchar_8h.html).

We wouldn't need full libicu-bindings (including the many constants definitions) yet.

@kud1ing
Author

kud1ing commented Dec 22, 2011

I think we should go for the libicu route. See #1370

@lambda-fairy
Contributor

Can we re-open this? We don't depend on libicu any more, but there's still no easy way of finding a character's category.
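
To illustrate the gap: the standard library exposes property predicates on `char` (`is_control`, `is_whitespace`, `is_uppercase`, `is_lowercase`, `is_alphabetic`, `is_numeric`), but nothing that returns a general category. The rough sketch below combines them into a coarse classification. `CoarseClass` and `coarse_class` are hypothetical names, and the result is only an approximation: for example, `is_uppercase` tests the derived Uppercase property, which is broader than the Lu category.

```rust
/// Coarse character classes derived only from std's `char` predicates
/// (hypothetical type; not the Unicode general category).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CoarseClass {
    Control,
    Whitespace,
    Uppercase,
    Lowercase,
    OtherAlphabetic,
    Numeric,
    Other,
}

/// Classifies a character by checking std predicates in a fixed order.
/// Control is checked before whitespace, so '\n' is reported as Control.
pub fn coarse_class(c: char) -> CoarseClass {
    if c.is_control() {
        CoarseClass::Control
    } else if c.is_whitespace() {
        CoarseClass::Whitespace
    } else if c.is_uppercase() {
        CoarseClass::Uppercase
    } else if c.is_lowercase() {
        CoarseClass::Lowercase
    } else if c.is_alphabetic() {
        CoarseClass::OtherAlphabetic
    } else if c.is_numeric() {
        CoarseClass::Numeric
    } else {
        // Punctuation, symbols, marks, and so on cannot be told apart here.
        CoarseClass::Other
    }
}
```

Everything that lands in `Other` (punctuation, symbols, marks) is exactly where a real category function would be needed.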

@sourtin

sourtin commented Jul 25, 2016

Sorry to comment on such an old thread. I just implemented much of the UCD (v9.0.0) here. It depends on neither libicu nor the standard library, so it should hopefully be easy to use in other projects (though it's probably not as reliable as ICU).
