Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Self in type definitions allowing enum List<T> { Nil, Cons(T, Box<Self>) } #2300

Merged
merged 12 commits into from
Mar 23, 2018

Conversation

Centril
Copy link
Contributor

@Centril Centril commented Jan 17, 2018

Rendered

Tracking issue


The special Self identifier is now permitted in struct, enum, and union
type definitions. A simple example struct is:

enum List<T>
where
    Self: PartialOrd<Self> // <-- Notice the `Self` instead of `List<T>`
{
    Nil,
    Cons(T, Box<Self>) // <-- And here.
}

Thanks to @eddyb for giving me the idea, and among others to @est31 for reviewing.

@Centril Centril added the T-lang Relevant to the language team, which will review and decide on the RFC. label Jan 17, 2018
@withoutboats
Copy link
Contributor

@rfcbot fcp merge

To me, this is just a bug fix. I remember when it was really inconsistent where you could use Self, and we've mostly made it "everywhere," this seems like the last real gap. This RFC is also very thoroughly written. I figure FCPing it now will expedite the process because I think this position will have widespread consensus, though anyone with objections should feel free to raise them.

As a note, we'll probably want to watch what error message you get if you try to use Self as a parameter to self: struct Foo<Self>. If you do this in impls right now you get a kind of confusing error: https://play.rust-lang.org/?gist=e3e6c35eb2dfa79ac28f5d1ecb1e45f6&version=stable

@rfcbot
Copy link
Collaborator

rfcbot commented Jan 17, 2018

Team member @withoutboats has proposed to merge this. The next step is review by the rest of the tagged teams:

Concerns:

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Jan 17, 2018
@mark-i-m
Copy link
Member

That was the fastest FCP I've ever seen 😆

@jonhoo
Copy link
Contributor

jonhoo commented Jan 17, 2018

I just want to point out that tuples also don't support Self constructors. See rust-lang/rust#42601 and rust-lang/rust#47085.

@Centril
Copy link
Contributor Author

Centril commented Jan 17, 2018

A note on Self for tuple structs is that this works:

struct F(usize); impl F { fn x() { Self { 0: 1337 }; } }

But I won't recommend anyone to write it that way =P

If we can make Self(1337) work, we should.

@petrochenkov
Copy link
Contributor

petrochenkov commented Jan 17, 2018

This is certainly not a bug fix and also a breaking change.
The change may be worth it, but it needs to be considered more carefully, and with an eye on future compatibility.
(To be continued...)

@cramertj
Copy link
Member

@petrochenkov Can you explain why it's a breaking change? Neither of the following compiles today:

struct Foo<Self> { Variant(Self) }

struct Bar;
impl Bar {
    fn bar() {
        struct Baz(Self);
    }
}

@nrc
Copy link
Member

nrc commented Jan 19, 2018

@Fcpbot concern teaching when to use Self vs naming the type

I fully approve of the technical change here. However, I have a concern about best practices for where to use Self and where to spell things out. There are places where using Self is necessary (to name the self type in a trait, for example), and there are places that Self just happens (e.g., macro-generated code) and I think that has historically been our motivation. More and more I am seeing Self used as a type because it is convenient and Clippy even has a lint recommending using it sometimes. I'm not sure if this has been discussed or even who has responsibility for this kind of discussion, but I would like it to be. My personal feeling is that when reading code Self adds a level of indirection (I usually want to know the actual type, not that it is the self type, so I need to look at the impl or whatever to find out what Self is). This RFC includes in the guide text a recommendation for users to take advantage of the feature, so this is not just to make writing macros easier. I think it is worth considering the trade-offs here.

@nrc
Copy link
Member

nrc commented Jan 19, 2018

@rfcbot concern teaching when to use Self vs naming the type

@Centril
Copy link
Contributor Author

Centril commented Jan 19, 2018

@nrc I assume you are referring to the Migration advice section and specifically:

Since the new introduced syntax is often clearer and shorter, especially when dealing with long type names with many generic parameters, tools like clippy should add new lints that mention the new syntax and ask users to migrate to it.

I understand your concern about the level of indirection and I think it's a valid one.

However, I think that, and especially for longer and generic types, Self is easier to search for mentally, especially when there are many variants that could use Self. The level of indirection is not great here (as you mention, it is one level).

Also, often there won't be that many variants, so a reader can keep the enum MyType<T> header in visual scope without scrolling, so they can check it quickly...

With generic types, you also get a lot of nested < > angle brackets, which can make it hard to read. In the case of scenarios like Expr<Int>, Expr<T> as seen below I think using Self also makes it easier to see where you really are referring to Self and not the type constructor applied with different types.

#[derive(Debug)]
enum Expr<T: Ty> {
    Lit(T::Repr),
    Add(Box<Expr<Int>>, Box<Expr<Int>>),
    If(Box<Expr<Bool>>, Box<Self>, Box<Self>),
}

I myself prefer to write -> Self inside my own code as well as read -> Self in the documentation of others'. On the writing part, refactoring is made easier with Self, especially when it is used many times. On the reading side of the equation, I've always preferred seeing -> Option<Self::Item> rather than the concrete type expanded, because then I know where the type is coming from and understand the context better. For complex types, it is also easier to be sure that the type is really Self when seeing that identifier compared to when seeing the complex type fully expanded.

Syntax highlighting also has an easier time rendering Self in a special way compared to if you had written the expansion of Self out.

However, the standard library, as far as I had time to check, seems not to agree with me. I checked a few impls of Iterator, and fn next(&mut self) -> Option<Self::Item> is not used (as far as I can tell). Rather, the type used on the RHS of type Item = <some_type> is repeated instead of using Self::Item.

These are my thoughts so far on the subject. If you believe these do not assuage your concerns and are not motivation enough, I am happy to move that section out of the RFC and propose the lint for clippy at a later point in time.

@petrochenkov
Copy link
Contributor

petrochenkov commented Jan 20, 2018

@cramertj

Can you explain why it's a breaking change? Neither of the following compiles today:

Phew. It is great that it errors now, because technically it should not error.
The error is "can't use type parameters from outer items", but if this case Self is not a type parameter, it's just a type alias.

I think we should continue this tradition and keep items as "barriers" for new Selfs proposed by this RFC even if these Selfs are not type parameters either, for future compatibility reasons.

@petrochenkov
Copy link
Contributor

Here I'll try to audit items and item-likes with regards to possible meaning of Self in type namespace for them.

  • extern crate, use, mod, extern block, global asm!, macro, macro_rules!. Self type doesn't make sense.
  • impl. Self has existing meaning - the type for which we implement methods/consts (this applies to both inherent impls and trait impls). This is the single possible reasonable meaning.
  • enum, struct, union. The meaning of Self is described by this RFC - it's a type of the item. This is the single possible reasonable meaning.
  • trait, trait alias. Self has existing meaning - "zeroth" type parameter of the trait. There's alternative possible meaning - object type associated with the trait (that would be more consistent with the meaning for structs/enums/etc).
  • type. There's a single possible reasonable meaning - it's a type of the type alias item, but this RFC doesn't propose it (btw, why?).
  • fn. Each function item have a unique type and Self can potentially refer to it.
  • static, const. Self can refer to the type of the const or static item, but it's kinda stretch (const MY_ZERO: u8 = Self::ZERO; anyone?).
  • type, fn, const in traits and impls. Self has existing meaning dictated by the outer trait or impl. There are possible alternative meanings inherited from "free" type, fn and const items.
  • type, fn, static in extern blocks. Meaning of Self can be inherited from "free" type, fn and static items.

So, the found inconsistencies are:

  • Self in trait doesn't refer to the type of the item, unlike with other items.
  • Self in type/fn/const in traits and impls is potentially inconsistent with Self in free type/fn/const items.

Future compatibility:
If we keep item "barriers" for Self (#2300 (comment)), then we can extend it to type, fn, const and static items in the future.

@eddyb
Copy link
Member

eddyb commented Jan 20, 2018

type. There's a single possible reasonable meaning - it's a type of the type alias item, but this RFC doesn't propose it (btw, why?).

type Foo = ... Foo ...; is always an error (cyclic dependency between types).
Correctly modelling it requires μ types (structural type recursion), which rustc doesn't currently have internally, and which pose some implementation problems (I defer to @nikomatsakis here).

@petrochenkov
Copy link
Contributor

type Foo = ... Foo ...; is always an error (cyclic dependency between types).

Ah, right, I was focused on name resolution too much.
In this case "supporting" or "not supporting" type means whether we report a "cyclic dependency" error or "unresolved name" error (the first one may be more useful).

@joshtriplett
Copy link
Member

I have a concern about how this potentially interacts with nested and unnamed types. Suppose we end up merging #2102 , and for that matter future extensions to that to allow defining struct and union types inline inside other types that contain them. In that case, we'd have two questions:

  1. Does Self refer to the immediately containing struct or union, or does it refer to the top-level struct or union? That ambiguity seems potentially confusing to me.
  2. If the former, that gives access to the type of unnamed fields, which Unnamed fields of struct and union type #2102 carefully went out of its way to avoid doing, to avoid introducing full anonymous struct/union types at this time.

@Centril
Copy link
Contributor Author

Centril commented Jan 20, 2018

@joshtriplett Thanks for bringing that to my attention!

It seems to me that unnamed/nested types in structs and unions are somewhat similar to "structs" inside variants of enums as in enum Foo { Bar { quux: usize, wibble: String }, .. } where Bar is struct-like. In that case, Self refers to Foo and not Bar - so it somewhat flows from that that Self should refer to the top-level struct or union in the case you brought up. I agree it is potentially confusing, but not enough to be make-or-break for this RFC in my opinion. If that sounds acceptable to you, I can include this note with a reference to #2102 inside the text of the RFC.

EDIT: a third option could be to simply not allow Self for now inside unnamed types and revisit at a later point in time, if that is your preference.

@petrochenkov
Copy link
Contributor

petrochenkov commented Jan 20, 2018

@Centril
Enum variants will likely get their own types eventually (postponed RFC 1450).
In this case the third conservative option "do not allow Self inside enum variants" would make the feature significantly less useful.

@Centril
Copy link
Contributor Author

Centril commented Jan 20, 2018

@petrochenkov Right. Tho the third option could possibly not apply to enum variants (that is probably inconsistent?) and instead only to anonymous structs and unions.

In any case, I'd prefer to go with option 2, "Self always applies to the top level" which is simple and easy to remember.

@est31
Copy link
Member

est31 commented Jan 21, 2018

In any case, I'd prefer to go with option 2, "Self always applies to the top level" which is simple and easy to remember.

Sadly we already let Self apply to the innermost level: https://play.rust-lang.org/?gist=19d4ca95e57936c4a78d948f66a7edc9&version=stable

@Centril
Copy link
Contributor Author

Centril commented Jan 21, 2018

Sadly we already let Self apply to the innermost level: https://play.rust-lang.org/?gist=19d4ca95e57936c4a78d948f66a7edc9&version=stable

I don't think that one counts... ^,- In that case it is an item shadowing Self in an impl of a type, not in the type definition itself. I think that impl shadowing Self is intuitive. But since you brought it up, I'll amend the rule and still call it simple and easy to remember.

"Self always applies to the top level inside type definitions"

@Centril
Copy link
Contributor Author

Centril commented Jan 21, 2018

Added notes about RFC 2102 to the reference level explanation.

@0x7CFE
Copy link

0x7CFE commented Jan 25, 2018

In any case, I'd prefer to go with option 2, "Self always applies to the top level" which is simple and easy to remember.

IMO, it's not very intuitive in case of complex type hierarchies with unnamed fields.

I believe, the most reliable and forward compatible solution would be to disallow Self in such case and force the user to write fully clarified types. The same goes for other syntax sugar patterns where ambiguity arises. In that sense, Self expansion is similar to how elided lifetimes work.

So, my opinion would be to preserve the behavior consistent over all sugar expansion patterns.

@Centril
Copy link
Contributor Author

Centril commented Jan 27, 2018

@0x7CFE Please excuse my late reply =)

IMO, it's not very intuitive in case of complex type hierarchies with unnamed fields.

My intuition about this is that the type of unnamed structs and unions are not nameable, so Self can not refer to them, and they can not be constructed in an indirection independent of their outer type - so the only alternative that remains is that it refers to the only new name that has been introduced, that is, the name at the top level. I also think that any initial ambiguity can be solved by teaching the rule I've proposed explicitly both when talking about Self and when talking about unnamed fields.

I think that disallowing Self inside unnamed structs and such will inevitably lead to someone proposing that it be allowed eventually because they expected it to work, which it didn't, and so their intuition was broken. And I don't we can then offer a good rationale for why this was not allowed. The analogy with lifetimes is in my opinion somewhat strained since types and lifetimes are different beasts.

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Jan 30, 2018

@rfcbot fcp concern

I have a concern here. One thing we have talked about doing is permitting more kinds of items in the body of impls. In particular, I believe @aturon and I wanted to permit:

impl Trait for Foo {
    struct Bar { ... }
}

as an equivalent to:

struct _Bar { ... }
impl Trait for Foo {
    type Bar = _Bar;
}

But what would the meaning of Self be in such a case?

This seems to be a direct consequence of trying to give Self a meaning outside of an impl: it means that anything where Self has a meaning cannot then be put into an impl without having two competing meanings.

(Note that abstract type Bar: Trait in an impl is a place where we were going to allow this sort of item to be declared, since that is kind of "shorthand" for defining an abstract type and then setting the type alias Bar to it.)

@Centril
Copy link
Contributor Author

Centril commented Mar 7, 2018

@petrochenkov

The result is actively harmful - you can't e.g. search for Pat and get all the positions in which a pattern can appear in AST anymore.

This is true, and I'll include it in the drawbacks; I should have noted that Self in types is less advantageous in a large industrial compiler, however, with a small example such as:

enum JsonNode {
  Null,
  Bool(bool),
  Number(f64),
  String(String),
  Array(Vec<Self>),
  Map(HashMap<String, Self>),
}

I don't think this is a problem because you are not searching for all positions frequently. My main conclusion from your experiment is that the migration advice should be removed, which I've now done.

@nrc
Copy link
Member

nrc commented Mar 8, 2018

@rfcbot resolved teaching when to use Self vs naming the type

@nikomatsakis
Copy link
Contributor

So, I'm almost ready to mark my concern was resolved, but not quite: I would like an unresolved question to be added:

  • This syntax creates ambiguity if we ever permit types to be declared directly within impls (for example, as the value for an associated type). Do we ever want to support that, and if so, how should we resolve the ambiguity?

I realize that the "syntax" I semi-proposed earlier is only half the battle; the real question has to do with the confusion that will arise if we allow types to be directly declared within impls.

For example, using the new syntax:

impl Foo for SomeType {
    type Bar = struct { value: f32, next: Option<Self> };
    ...
}

What type does that next field have? Is it Option<SomeType>, or Option<SomeType::Bar>?

@nikomatsakis
Copy link
Contributor

Also, why doesn't the RFC permit Self in where-clauses? That seems very useful to me:

struct Foo<T>
    where Bar: SomeTrait<Self>
{
}

@Centril
Copy link
Contributor Author

Centril commented Mar 9, 2018

@nikomatsakis where clauses were intended but it slipped my mind ;) they are now permitted.
I've also added the unresolved question you wanted =)

@nikomatsakis
Copy link
Contributor

@rfcbot resolve self-in-impl-vs-self-in-type

@nikomatsakis
Copy link
Contributor

I still have some reservations about the ever-expanding scope of Self, but I'm willing to give it a try and see how it feels.

@burdges
Copy link

burdges commented Mar 9, 2018

I like declaring type directly within impls, but if you allow that then you might allow type aliases too, so

impl Foo for SomeType {
    type Self0 = Self;
    type Bar = struct { value: f32, next: Option<Self0> };
    ...
}

And maybe an analog of super::, like

impl Foo for SomeType {
    type Bar = struct { value: f32, next: Option<up::Self> };
    ...
}

@Centril
Copy link
Contributor Author

Centril commented Mar 10, 2018

@burdges Unfortunately, within an impl, we wouldn't be able to distinguish between type aliases and associated types syntactically unless you had something like alias Self0 = Self;.

The analog to super:: sounds interesting - tho I think up::Foo would refer to the module up. Perhaps this is all too niche?

@nikomatsakis
Copy link
Contributor

@rfcbot resolved self-in-impl-vs-self-in-type

@rfcbot
Copy link
Collaborator

rfcbot commented Mar 12, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Mar 12, 2018
@rfcbot
Copy link
Collaborator

rfcbot commented Mar 23, 2018

The final comment period is now complete.

@Centril Centril merged commit 28a10d5 into rust-lang:master Mar 23, 2018
@Centril
Copy link
Contributor Author

Centril commented Mar 23, 2018

Huzzah! This RFC has been merged!

Tracking issue: rust-lang/rust#49303

@Centril Centril deleted the rfc/self-in-typedefs branch March 23, 2018 13:25
@Centril Centril added A-data-types RFCs about data-types A-syntax Syntax related proposals & ideas A-resolve Proposals relating to name resolution. labels Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-data-types RFCs about data-types A-resolve Proposals relating to name resolution. A-syntax Syntax related proposals & ideas final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.