Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[char_property] Implement *CharProperty traits #89

Merged
merged 4 commits into from
Aug 9, 2017
Merged

[char_property] Implement *CharProperty traits #89

merged 4 commits into from
Aug 9, 2017

Conversation

behnam
Copy link
Member

@behnam behnam commented Aug 8, 2017

Character Properties are of different kinds and shapes, and as UNIC
components grow, we need a better way to be able to categorize them by
their shape, and a way to make sure we have consistent, noncolliding
API for them.

This is the first step into building a CharProperty taxonomy, with as
little as possibly needed to provide the assurances desired.

We hope that the implementation can be improved over time with new
features added to the language. There's already some proposals in this
front. See these discussions for more details:

@behnam behnam force-pushed the charprop branch 2 times, most recently from fef3999 to e2f3c6f Compare August 8, 2017 23:25
@CAD97
Copy link
Collaborator

CAD97 commented Aug 8, 2017

The first link gives me a "Oops! That page doesn’t exist or is private." I think you have a trailing > which got included in the URL. https://users.rust-lang.org/t/traits-as-contract-without-changes-to-call-sites/11938/11

The second link follows a couple redirects and then 404s. Since the URL starts with behnam/rust-unic, I suspect github's intra-github linking autocomplete got the better of you. rust-lang/rfcs/pull/1406

@behnam
Copy link
Member Author

behnam commented Aug 8, 2017

Oops. Fixed the links!

@behnam behnam force-pushed the charprop branch 3 times, most recently from 03a8168 to 39494f5 Compare August 9, 2017 01:46
@behnam
Copy link
Member Author

behnam commented Aug 9, 2017

Okay, I think I have covered all the basics of all the UCD properties we already have and now this PR is ready for review and land.

Here's the summary of implementations:

$ g g -e 'impl \w*CharProperty' unic/ucd/
unic/ucd/age/src/age.rs:impl CharProperty for Age {
unic/ucd/bidi/src/bidi_class.rs:impl CharProperty for BidiClass {
unic/ucd/bidi/src/bidi_class.rs:impl EnumeratedCharProperty for BidiClass {
unic/ucd/category/src/category.rs:impl CharProperty for GeneralCategory {
unic/ucd/category/src/category.rs:impl EnumeratedCharProperty for GeneralCategory {
unic/ucd/normal/src/canonical_combining_class.rs:impl CharProperty for CanonicalCombiningClass {
unic/ucd/normal/src/canonical_combining_class.rs:impl NumericCharProperty<u8> for CanonicalCombiningClass {
unic/ucd/normal/src/decomposition_type.rs:impl OptionCharProperty for DecompositionType {
unic/ucd/normal/src/decomposition_type.rs:impl EnumeratedCharProperty for DecompositionType {

What do you think, @CAD97?

Copy link
Collaborator

@CAD97 CAD97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me!

I've just left a few minor notes inline.

As for the matter of having trait methods in scope, it would make sense for the char_property macro to make these methods intrinsic and only expose them into the traits. Hopefully the future will make the syntax cleaner, but hey, it's hidden behind the macro implementation for now.

/// A Character Property defined on all characters.
///
/// Examples: *Age*, *Name*, *General_Category*, *Bidi_Class*
pub trait CharProperty
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably extend from OptionCharProperty, as do Eq : PartialEq and Ord : PartialOrd.

I would then do a blanket impl from CharProperty to OptionCharProperty:

impl<P: CharProperty> OpionCharProperty for P {
    fn of(ch: char) -> Option<Self> {
        Some(<Self as CharProperty>::of(ch))
    }
}

Really rough playground proof of concept

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing about Eq/Ord and their Partial equivalents is that the function signature stays the same, and it's only the semantics that different.

Here, we have a different signature for of, which we can pull of to work somehow, but I don't think it's good practice.

Have you seen anything more like this case on std/3rd-party libs?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the only real example is the (ongoing?) discussion of an impl<T: From> TryFrom for T.

I'd be fine without this, but this would also simplify the marker for Enumerated/NumericCharProperty to just be OptionCharProperty since CharProperty implies CharProperty.

I don't really have an argument for it, but it just seems like "infallible transformation from char" is a subset of "fallible transformation from char". (Thus, the same use case as Result<_, !>?)

/// Usage Note: If the property is of type *Catalog* (as defined by Unicode), it's recommended to
/// (in some way) mark the type as *non-exhaustive*, so that adding new variants to the `enum` type
/// won't result in API breakage.
pub trait EnumeratedCharProperty
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably extend from (Option)CharProperty, unless it would ever make sense to have a EnumeratedCharProperty which is not also a (Option)CharProperty.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think I can introduce a marker here to ensure more enforcement.

/// A Character Property with numeric values.
///
/// Examples: *Numeric_Value*, *Canonical_Combining_Class*
pub trait NumericCharProperty<Value>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably extend from (Option)CharProperty, unless it would ever make sense to have a NumericCharProperty which is not also a (Option)CharProperty.

pub trait NumericCharProperty<Value>
where
Self: Copy + Debug + Default + Display + Eq + Hash,
Value: NumericCharPropertyValue,
Copy link
Collaborator

@CAD97 CAD97 Aug 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a matter of taste, but I would put this trait bound inside the <>. (So pub trait NumericCharProperty<Value: NumericCharPropertyValue>

Reasoning: it's simple, and is very important for the definition of the trait.

/// Return whether the given character is a combining mark (`General_Category=Mark`)
pub fn is_combining_mark(c: char) -> bool {
GENERAL_CATEGORY_MARK.find(c).is_some()
const TABLE: &'static [(char, char)] = include!("tables/general_category_mark.rsv");
TABLE.find(c).is_some()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait how is this working when you aren't useing unic_utils::CharDataTable anywhere in the file?

Copy link
Collaborator

@CAD97 CAD97 Aug 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got it working with use unic_utils::CharDataTable; under mod mark and CharDataTable::<()>::find(&TABLE, c).is_some().

Unfortunately &[(char, char)] matches &[(char, V)] as well, thus the reason I want to get around to adding the CharRange structure.

Once all of unic-gen is engaged and I can make the transition to actually using it across the board, I'll work on that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'm splitting the clean up diff, plus a fix for this, into another PR. Will update this PR with only the char-props matters.

Character Properties are of different kinds and shapes, and as UNIC
components grow, we need a better way to be able to categorize them by
their shape, and a way to make sure we have consistent, noncolliding
API for them.

This is the first step into building a CharProperty taxonomy, with as
little as possibly needed to provide the assurances desired.

We hope that the implementation can be improved over time with new
features added to the language. There's already some proposals in this
front. See these discussions for more details:

* [Traits as contract, without changes to call-sites](https://users.rust-lang.org/t/traits-as-contract-without-changes-to-call-sites/11938/11>)

* [RFC: delegation of implementation](rust-lang/rfcs#1406)

// Make char property type T also a parameter to this trait, otherwise there will be "conflicting
// implementation" error when marking traits below as `CharProperty`.
impl<T> CharProperty<T> for T {}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this makes the marker useless, since every T now is a CharProperty! :( Need to figure out another solution here...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like what you want is CompleteCharProperty and OptionCharProperty to be mutually exclusive...

Making OptionCharProperty a parent of CompleteCharProperty and using OptionCharProperty would solve the problem (and in any case I suspect of will be intrinsic and used as such), but it seems difficult to use this pattern like this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, can't figure out a way to tell compiler that these two traits are mutually exclusive.

What do you mean by "I suspect of() will be intrinsic and used as such"?

Copy link
Collaborator

@CAD97 CAD97 Aug 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect to see (really sketchy syntax)

enum Property {}
impl Property {
  fn of()...
}
impl CharProperty for Property {
  fn of() = (Self as Property).of()...
}

if that works such that you can just do Property::of without knowing about CompleteCharProperty.

But ¯\_(ツ)_/¯ I'm just assuming shapes of unwritten things right now

/// A Character Property defined on all characters.
///
/// Examples: *Age*, *Name*, *General_Category*, *Bidi_Class*
pub trait CompleteCharProperty: PartialCharProperty + Default {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works very well. Thanks for the idea, @CAD97!

The only issue is the error if none of these are implemented and a Property Range type is used, in which case rustc claims that CompleteCharProperty is required to be implemented, which is invalid, and only PartialCharProperty is required. I suppose I should file an issue for this upstream.

@behnam behnam merged commit 473e128 into master Aug 9, 2017
@behnam behnam deleted the charprop branch August 9, 2017 21:06
@behnam
Copy link
Member Author

behnam commented Aug 9, 2017

Posted about the side issues notice here on the forum: https://users.rust-lang.org/t/unexpected-behaviors-of-trait-bounds/12286

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants