Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Structural Records #2584

Closed
wants to merge 15 commits into from
Closed

Conversation

Centril
Copy link
Contributor

@Centril Centril commented Nov 2, 2018

🖼️ Rendered

📝 Summary

Introduce structural records of the form { foo: 1u8, bar: true } of type { foo: u8, bar: bool } into the language. Another way to understand these sorts of objects is to think of them as "tuples with named fields", "unnamed structs", or "anonymous structs".

💖 Thanks

To @kennytm, @alexreg, @Nemo157, and @tinaun for reviewing the draft version of this RFC.
To @varkor and @pnkfelix for good and helpful discussions.

Closure comment

@Diggsey
Copy link
Contributor

Diggsey commented Nov 2, 2018

This looks like a super well thought out and comprehensive RFC.

It might be worth clarifying the behaviour of #[derive(...)] with respect to types like:

struct RectangleTidy {
    dimensions: {
        width: u64,
        height: u64,
    },
    color: {
        red: u8,
        green: u8,
        blue: u8,
    },
}

Presumably, there will be no "magic" here, and you will only be able to derive traits which are implemented for the anonymous structs themselves.

Another question is around trait implementations: at the moment, 3rd party crates can provide automatic implementations of their traits for "all" tuples by using macro expansion to implement them for 0..N element tuples. With anonymous structs this will not be possible. Especially with the likely arrival of const generics in the not too distant future, negating the need for macros entirely, anonymous structs will become second class citizens. Is this a problem, and are there any possible solutions to allow implementing traits for all anonymous structs?

other_stuff(color.1);
...
yet_more_stuff(color.2);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fn do_stuff_with((red, green, blue): (u8, u8, u8)) {
    some_stuff(red);
    other_stuff(green);
    yet_more_stuff(blue);
}

Copy link
Contributor Author

@Centril Centril Nov 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even better, with #2522:

fn do_stuff_with((red: u8, green: u8, blue: u8)) {
    some_stuff(red);
    other_stuff(green);
    yet_more_stuff(blue);
}

However, while if you write it in this way it is clear what each tuple component means, it is not clear at the call site...

let color = (255, 0, 0);
// I can guess that this is (red, green, blue) because that's
// the usual for "color" but it isn't clear in the general case.

blue: u8,
},
}
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This syntax looks very similar to another already accepted RFC, but semantics seems different - #2102.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True; I have to think about if some unification can be done in some way here.

@Centril
Copy link
Contributor Author

Centril commented Nov 2, 2018

@Diggsey

This looks like a super well thought out and comprehensive RFC.

Thanks!

It might be worth clarifying the behaviour of #[derive(...)] with respect to types like:

struct RectangleTidy {
    dimensions: {
        width: u64,
        height: u64,
    },
    color: {
        red: u8,
        green: u8,
        blue: u8,
    },
}

Presumably, there will be no "magic" here, and you will only be able to derive traits which are implemented for the anonymous structs themselves.

This bit is not currently well specified, but it should be so I will fix that (EDIT: fixed)... I see two different ways to do it:

  1. Use the structural records in there as types; this means that #[derive(Default)] would produce:

    impl Default for RectangleTidy {
        fn default() -> Self {
            RectangleTidy {
                dimensions: Default::default(),
                color: Default::default()
            }
        }
    }

    The main benefit of this approach is that it is quite simple to implement. It will also work for all derivable standard library traits because implementations are auto-provided for structural records.

  2. Use the structural records in there as syntax; this means that #[derive(Default)] would produce:

    impl Default for RectangleTidy {
        fn default() -> Self {
            RectangleTidy {
                dimensions: {
                   width: Default::default(),
                   height: Default::default(),
                },
                color: {
                    red: Default::default(),
                    green: Default::default(),
                    blue: Default::default(),
                }
            }
        }
    }

    This would need to be done recursively in the macro but it shouldn't be very hard to implement.

    The main benefit of this is flexibility. More types would be derivable.

    However, due to the automatically provided implementations as noted in 1. there is no benefit to this approach for the standard library so I think the sane behaviour is 1.

    That said, a crate like serde can benefit from using the "magical" approach in 2. and we certainly cannot mandate what a procedural macro must do in the language so it will be up to each derive macro to decide what to do.

Another question is around trait implementations: at the moment, 3rd party crates can provide automatic implementations of their traits for "all" tuples by using macro expansion to implement them for 0..N element tuples.

Yup. Usually up to 12, emulating the way the standard library does this.

Especially with the likely arrival of const generics in the not too distant future,
negating the need for macros entirely, anonymous structs will become second
class citizens.

Nit: to not have to use macros for implementing traits for tuples you need variadic generics, not const generics. :)

Is this a problem, and are there any possible solutions to allow implementing traits for all anonymous structs?

I think it might be possible technically; I've written down some thoughts about it in the RFC. However, the changes needed to make it possible might not be what folks want.

However, not solving the issue might also be good; by not solving the issue you add a certain pressure to gradually move towards nominal typing once there is enough operations and structure that you want on the type.

standard traits that are implemented for [tuples]. These traits are: `Clone`,
`Copy`, `PartialEq`, `Eq`, `PartialOrd`, `Ord`, `Debug`, `Default`, and `Hash`.
Each of these traits will only be implemented if all the field types of a struct
implements the trait.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's... quite a bit of magic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree; this is noted in the drawbacks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideas for possible magic reduction discussed below in #2584 (comment).

+ For `PartialEq`, each field is compared with same field in `other: Self`.

+ For `ParialOrd` and `Ord`, lexicographic ordering is used based on
the name of the fields and not the order given because structural records
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With macros fields can have names with same textual representation, but different hygienic contexts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that change anything wrt. the implementations provided tho? I don't see any problems with hygiene intuitively, but maybe I can given elaboration?

Copy link
Contributor

@petrochenkov petrochenkov Nov 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably relevant to other places where the order of fields needs to be "normalized" as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. :) But I'm not sure if you are pointing out a problem or just noting...
Any sorting / "normalization" is done post expansion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me how to sort hygienic contexts in stable order, especially in cross-crate scenarios. That probably can be figured out somehow though.

```rust
ty ::= ... | ty_srec ;

ty_srec ::= "{" (ty_field ",")+ (ty_field ","?)? "}" ;
Copy link
Contributor

@petrochenkov petrochenkov Nov 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think types can start with { syntactically.
There are multiple already existing ambiguities where parser assumes that types cannot start with {, and upcoming const generics introduce one more big ambiguity ({} is used to disambiguate in favor of const generic arguments).
#2102 uses struct { field: Type, ... } to solve this (and also to discern between struct { ... } and union { ... }).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about expressions/patterns.
Unlimited lookahead may also be required, need to check more carefully.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are multiple already existing ambiguities where parser assumes that types cannot start with {

Such as? (examples please...)

Const generics allow literals and variables as expressions but anything else needs to be in { ... }.
This is not ambiguous with structural records (because the one-field-record requires a comma...) but requires lookahead.

Unlimited lookahead may also be required, need to check more carefully.

Scroll down ;) It's discussed in the sub-section "Backtracking".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's discussed in the sub-section "Backtracking".

It's a huge drawback.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such as? (examples please...)

Nothing that can't be solved by infinite lookahead, but here's "how to determine where where clause ends":

fn f() where PREDICATE1, PREDICATE2, { a: ...

Does { a: ... belong to the function body or to the third predicate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@laaas Because it is inconsistent with Tuples/Tuple-structs and also less ergonomic.

I don't think it's confusing for readers, the one-field case is clearly disambiguated by an extra comma and that is consistent with how tuples are treated. In other words, if { x, } is confusing, then so is (x,).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's going to be drawbacks of this pattern for the structural records, either for slowing parser or cause types can't start with { syntactically, couldn't we use .{ instead?

Inspiration by Zig programming language

Copy link
Contributor Author

@Centril Centril Jan 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I've been thinking about this recently and I've come to the conclusion that if we lean into the ambiguity, then this is an non-issue.

The need for backtracking here stems from ambiguity with type ascription. This can be fixed by simply assuming that { ident: is always going to be a structural record, either in expression or type position (also solves the const generics issue). That is, if given { foo: Vec<u8> } in an expression context, we will attempt parsing this as a structural record, and then fail at < with a parse error (backtracking can be done inside diagnostics only if we want to). This way, we have retained LL(k) (standing at {, the : is 2 tokens away and so therefore k = 2 in this instance).

If a user wants to disambiguate in favor of ascription, they can write { (ident: Type) }. Disambiguating in this way should not be a problem as { ident: Type } is not a very useful thing to write. Specifically, in the case of PREDICATE2, { a: ..., this should not be a problem because the return type itself is a coercion site and informs type inference.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would it fail at <? If you're trying to parse Vec<u8> as an expression, wouldn't the < be parsed as less-than, and you only fail when you see > (and in general arbitrarily far away)?

It seems like the same issue as when we tried to get rid of turbofish...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@comex Oh; yeah, you're right. It should fail at >. Still, we can take a snapshot before trying to parse the expression after ident: and backtrack if it turned out to be a type and provide good MachineApplicable diagnostics here.

It seems like the same issue as when we tried to get rid of turbofish...

Well in the sense that turbofish would also be unnecessary if we are OK with backtracking, but it seems like some people object to backtracking, so I've come up with a solution using LL(k) at the cost of not being able to recognize { binding: Type } as a block with a tail-expression ascribing binding. Because such code is, in my view, pathological, that should be fine. It's much less of an issue than turbofish from a learnability POV imo.

field_init ::= ident | field ":" expr ;
```

Note that this grammar permits `{ 0: x, 1: y }`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#1595 was closed, so I'd expect this to be rejected.
This saves us from stuff like "checking that there are no gaps in the positional fields" as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah probably; I've left an unresolved question about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

struct A(u8);
let _ = A { 0: 0u32 };

is accepted, so this is currently inconsistent as it stands anyway.

Copy link
Contributor Author

@Centril Centril Nov 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@varkor yeah that was accepted to make macro writing easier (and it totally makes it easier...!)

Copy link

@LionsAd LionsAd Aug 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@petrochenkov I am not sure I understand the reasoning why this is not allowed:

struct Foo {
    0: i32, // This is the only thing that fails ...
    arg_1: i32,
    named_1: i32,
}

fn main() {
    let foo = Foo { 0: 0, arg_1: 1, named_1: 42} ;
    println!("{}", foo.0);
}

fails only at the struct definition with "^ expected identifier". Would it not just be necessary to change 'IDENTIFIER' to 'IDENTIFIER_OR_NUMBER' (pseudo-code wise)?

to make it all work as everything else seems to be able to work with '.0', '0: ', etc.?

I am also not seeing the drawback given that it is already inconsistent ...

My personal motivation with that is to mirror named function arguments in structs as they then perfectly match the:

0: arg_0, 1: arg_1, named_1: arg_3

pattern. Also getting that (0: as allowed identifier in structural records) in independent of structural records is one less thing that structural records need to handle.

To be precise: I am only suggesting to allow numbers as identifiers, nothing else - so there is still plenty of distinction from Tuple to Struct, but they can be 1:1 mapped, but as for every other identifier you need to be explicit about the 0, 1, ...:

Tuple (i32, i32) ~= Struct { 0: i32, 1: i32 }

@petrochenkov
Copy link
Contributor

petrochenkov commented Nov 2, 2018

I think the language should be frozen for large additions like this in general, for a couple of years at least.
This is not a crucial feature and not a completion of another crucial feature, motivation seems really insufficient for a change of this scale.

@scottmcm
Copy link
Member

scottmcm commented Nov 3, 2018

The trait auto-implementation magic worries me, mostly from the point of view of things not the compiler wanting to implement traits on things. Today crates can do a macro-based implementation for the important tuples, and I can easily see a path to variadic generics where the impls could easily support any tuple. But I can imagine serde really wanting to be able to support these, so I wouldn't want to accept them without some sort of plan -- especially as we're getting closer to finally having good stories for our two other structural type families.

I'd also be tempted to block this on changing struct literal syntax to using = -- I really don't like the lookahead requirement here.

Overall, I think I feel that this is cool, but not necessary.

@clarfonthey
Copy link
Contributor

I'm very concerned with properties of the struct explicitly relying on lexicographic ordering. I don't think that Ord or PartialOrd should be implemented for structural records-- rather, they should be treated as having unordered fields. If someone wishes to enforce Ord or PartialOrd, I feel that they should be required to either name the struct or use a tuple.

I think that defining lexicographic behaviour for Hash and Debug, however, is completely fine. Hash really only needs to uphold that x == y => hash(x) == hash(y), and any ordering (as long as it's consistent) will satisfy this. Additionally, lexicographic ordering makes the most sense for debugging.

But as a user, I would be surprised to find the below code fail:

let first = { foo: 1, bar: 2 };
let second = { foo: 2, bar: 1};
assert!(first < second);

@Centril
Copy link
Contributor Author

Centril commented Nov 3, 2018

@clarcharr

I've added an unresolved question to be resolved prior to merging (if we do that...) about whether (Partial)Ord should be provided or not. I'm certainly open to your suggestion.

@scottmcm

I'd also be tempted to block this on changing struct literal syntax to using = -- I really don't like the lookahead requirement here.

I would have loved if we had used = instead... but blocking on that seems like blocking on a thing that no one will agree to changing because it would cause churn in more or less every single Rust program/library ever written... ;)

The trait auto-implementation magic worries me, mostly from the point of view of things not the compiler wanting to implement traits on things.

Sure; I entirely agree with that sentiment; it is by far the biggest drawback.

Today crates can do a macro-based implementation for the important tuples, and I can easily see a path to variadic generics where the impls could easily support any tuple. But I can imagine serde really wanting to be able to support these, so I wouldn't want to accept them without some sort of plan -- especially as we're getting closer to finally having good stories for our two other structural type families.

I noted somewhere in the RFC that with the combination of const generics and variadic generics, you may be able to extend this to structural records as well. For example (please note that this is 100% a sketch), using a temporary syntax due to Yato:

impl
    <*(T: serde::Serialize, const F: meta::Field)>
    // quantify a list of pairs with:
    // - a) type variables T all bound by `serde::Serialize`,
    // - b) a const generic variables F standing in for the field of type meta::Field
    //       where `meta` is the crate we reserved and `Field` is a compile time
    //       reflection / polymorphism mechanism for fields.
    serde::Serialize
for
   { *(F: T) }
   // The structural record type; (Yes, you need to extend the type grammar *somehow*..)
{
    // logic...
}

This idea is inspired by Glasgow Haskell's Symbol.

@rfcbot rfcbot closed this Apr 10, 2021
@Stargateur
Copy link

Ah shame, this would be a nice to have for readability in single-use types, per rfc

the amount of work, and problem this RFC come with is not worth the price.

@jtmoon79
Copy link

jtmoon79 commented Mar 25, 2022

For anyone that would still like to use Structural Records, as of this writing there are two crates to try:


Full Disclosure: I haven't tried either. Just suggestions for anyone that ends up here.

@crumblingstatue
Copy link

crumblingstatue commented Sep 15, 2022

Use case: Named arguments for Fn*

mut drawfn: impl FnMut(&mut Vec<Vertex>, f32, f32, &[u8], usize, Color)

This would be clearer as

mut drawfn: impl FnMut({
        vertex_buffer: &mut Vec<Vertex>,
        x: f32,
        y: f32,
        data: &[u8],
        idx: usize,
        color: Color
})

Rust doesn't have direct support for named arguments in Fn*, but this feature makes it very convenient to emulate them, without having to create distinct types (that the user has to click on in documentation to see the arguments, have to import to use, etc.).

@Stargateur
Copy link

Stargateur commented Sep 15, 2022

that the user has to click on in documentation to see the arguments

You would still need to look doc to see name.

have to import to use

not important

etc.

?

without having to create distinct types

I really advice to think about it

struct Draw<'a, 'b> {
  vertex_buffer: &'a mut Vec<Vertex>,
  x: f32,
  y: f32,
  data: &'b [u8],
  idx: usize,
  color: Color,
}

This may seem verbose but that force you to think twice about your api, if this struct look ugly for you this probably mean you could use some structure to better group logical variable (maybe a tuple for (x, y) ?). This also mean your function should reduce the number of argument it's take, these contrains will force you to improve your api.

@crumblingstatue
Copy link

crumblingstatue commented Sep 16, 2022

You would still need to look doc to see name.

Are you taking about the argument names?
Yes, you have to look at the function documentation to see what arguments it takes. But you don't have to leave the page of the function to go to a struct documentation page, just to see what arguments it takes.

not important

That's your opinion. In my opinion ergonomics are important.

This may seem verbose but that force you to think twice about your api, if this struct look ugly for you this probably mean you could use some structure to better group logical variable. This also mean your function could reduce the number of argument it's take, that together will improve your api.

I think my arguments are just fine, thank you. I just want names for them, like you can have for regular functions.
Imagine the documentation nightmare if regular function parameters didn't have names.

@Stargateur
Copy link

Stargateur commented Sep 16, 2022

I just want names for them, like you can have for regular functions.

Oh sorry I miss the fact your case is about FnMut Trait. Generally if you use Fn family trait I would advice to keep it very simple, they are mostly for closure that should have 1 or 2 argument max. Thus I maintain my point but I will add that you may use a trait like:

trait Draw {
    fn draw(
        &mut self,
        vertex_buffer: &mut Vec<Vertex>,
        x: f32,
        y: f32,
        data: &[u8],
        idx: usize,
        color: Color,
    );
}

// this allow auto closure impl and function
impl<T> Draw for T
where
    T: FnMut(&mut Vec<Vertex>, f32, f32, &[u8], usize, Color),
{
    fn draw(
        &mut self,
        vertex_buffer: &mut Vec<Vertex>,
        x: f32,
        y: f32,
        data: &[u8],
        idx: usize,
        color: Color,
    ) {
        self(vertex_buffer, x, y, data, idx, color)
    }
}

This would result in mut drawfn: impl Draw, I'm not sure if that would fill your need but you may like this approach.

@crumblingstatue
Copy link

Generally if you use Fn family trait I would advice to keep it very simple, they are mostly for closure that should have 1 or 2 argument max.

Why this arbitrary restriction? Should regular functions also have 1 or 2 arguments max? If not, what makes closures so different?

but I will add that you may use a trait like

I have no idea what I'm looking at honestly.
It sounds more convoluted than simply taking arguments in a closure. With which the only problem is the lack of naming of arguments, which this feature could help with.
I close my case, as I don't want to derail this thread any further. I presented a potential use case for structural records, everyone can decide for themselves whether it's a good use case or not.

@Stargateur
Copy link

Stargateur commented Sep 16, 2022

Why this arbitrary restriction? Should regular functions also have 1 or 2 arguments max? If not, what makes closures so different?

I think yes, function should have max 2, and on few case 3 arguments. If you look at std I think there are few 3 arguments function and probably very few that have more than 3 (exclude self). If a function require a lot of argument it's become difficult to remember argument order and you can easily make mistake, cause, no named argument. (That also why I think function should not have argument of same type for +3 argument count). That also why I disadvise tuple in public api that have +3 elements.

While named argument is a feature very close to structural record we both lack of them in Rust. I argue that it's not a bad thing cause it's force dev to make greater api. With these features a dev could easily choice the easy way to have function with a lot of argument. Most python lib I used are like that, and honestly I don't like it at all, I already used function that have 10 arguments, it's not something I would like in Rust crate.

Rust is build around structural programing, define struct, define trait is rusty. I think Rust lang is doing dev a favor to force them to be verbose, to force them to use structure or trait that structure the code, force you to have strong typing. I think having naming function argument or naming structure record would make more evil than good.

If I would be a user and I see this:

mut drawfn: impl FnMut({
        vertex_buffer: &mut Vec<Vertex>,
        x: f32,
        y: f32,
        data: &[u8],
        idx: usize,
        color: Color
})

I would clearly not like it, there is too many argument, I would probably need to make a function cause a closure would be too big for that, also there would be no direct documentation. My trait solution would solve most of these problems and would allow you a very nice documentation (bonus rust-analyzer would make it very fast to impl it) but I think you could also refine your api, surely data and idx could be linked, and vertex_buffer x y look also linked.

/// write something useful
struct VertexBuf<'a> {
    buf: &'a mut Vec<Vertex>,
    x: f32,
    y: f32,
}

/// write something useful
struct Data<'a> {
    data: &'a [u8],
    i: usize,
}

trait Draw {
    /// write something useful
    fn draw(&mut self, vertex_buffer: VertexBuf, data: Data, color: Color);
}

fn some_function(mut drawfn: impl FnMut(VertexBuf, Data, Color)) {}
// or using trait
fn some_function(mut draw: impl Draw) {}

I close my case, as I don't want to derail this thread any further. I presented a potential use case for structural records, everyone can decide for themselves whether it's a good use case or not.

I think we are on topic, sorry if I started an off topic conversation.

@MatrixDev
Copy link

I would clearly not like it, there is too many argument

I agree with many arguments in a function is a very bad practice. But forcing user to declare a struct that might be used only in one function is also not an ideal solution.

I think Rust lang is doing dev a favor to force them to be verbose, to force them to use structure or trait that structure the code, force you to have strong typing.

Sometimes arguments are added to a function one-by-one as part of some small commit here and there. In this case, especially on a bigger project, it is very rare what code will be refactored (due to time constraints or other reasons) just to make API look correct. It is probably better to allow at least some way of naming argument in-place to increase overall code readability. Maybe I'm the unlucky one here but on most projects with 30+ people I was on this is the reality.

IMHO named arguments for a function will remove most requirements for the structural records.

@crumblingstatue
Copy link

crumblingstatue commented Sep 16, 2022

I think yes, function should have max 2, and on few case 3 arguments. If you look at std I think there are few 3 arguments function and probably very few that have more than 3 (exclude self).

Counter example: rustc codebase
It has many functions that take a lot of arguments. Here is just 3 picked at random:

The standard library APIs have a fairly simple job of exposing OS APIs and some core data structures.
They don't have to manage complex application state, so they are a bad example.

The rustc developers have better things to do than try to play type tetris to cram everything into structs so everything meets the arbitrary requirement of at most 3 arguments per function.

@Stargateur
Copy link

@crumblingstatue I don't see how that prove anything or change anything to my point, never I said the quality of rustc implementation was great. Your example only reinforce my opinion that such function is a nightmare to maintain. Also, these functions are mostly call once and are totally different that the use case you bring up and about public API design I try to explain since 3 messages.

I have the feeling you don't understand my points. I will so stop arguing more about this further since this go nowhere with the last two messages.

@dnut
Copy link

dnut commented Mar 20, 2023

@pnkfelix:
a feature we should have in the long term...
...not something we can justify investing project time on in the foreseeable future.

This is an unusual reason to close an RFC in an open source project. "This is a feature that we should have" warrants a formally positive response, even if it is too costly for a particular set of contributors to justify implementing at this time. My instinct would be to keep the RFC open, but allocate no resources to it. If a perfect implementation falls in our lap, we'd be happy to accept it. Just don't take time away from other work to implement it.

Rust is open source. If prior non-contributors really want to see this feature happen, they might decide to work on it themselves, despite never previously having interest to work on rust. After working on it, they might become regular contributors. But without leaving this window open, they might never join the project.

There is a feedback mechanism between the resources available to a project and the contributions the project is willing to accept. Clearly communicating an openness to improvements, even if most of the current contributors think the improvements are not worth their effort, could increase the amount of resources available to a project, compensating for any incurred costs.

@digama0
Copy link
Contributor

digama0 commented Mar 21, 2023

@pnkfelix:
a feature we should have in the long term...
...not something we can justify investing project time on in the foreseeable future.

This is an unusual reason to close an RFC in an open source project. "This is a feature that we should have" warrants a formally positive response, even if it is too costly for a particular set of contributors to justify implementing at this time. My instinct would be to keep the RFC open, but allocate no resources to it. If a perfect implementation falls in our lap, we'd be happy to accept it. Just don't take time away from other work to implement it.

At first I agreed with this point, usually such RFC's are closed as postponed rather than closed. But on reading the history of this issue, I think it is an appropriate choice.

The point is that this is not a change that can be accomplished by an RFC, this needs something closer to a lang team initiative or a working group to get done, because it has far reaching effects on all other parts of the language. And there is not enough bandwidth on the team to consider that in the foreseeable future, and by the time it comes around a new "v2" RFC will be needed anyway, even though it may draw heavily on aspects of this one. A contributor dropping an implementation on the team's lap wouldn't solve all the other issues around interaction with other features (present and future) which are part of the current vision for the language.

I think the best way to a motivated contributor to push this forward would be to drum up support for a working group about it.

@scottmcm
Copy link
Member

To extend @digama0's point here, I don't think this is a case where, necessarily,

If a perfect implementation falls in our lap, we'd be happy to accept it.

Adding a new fundamental kind of type is a pervasive impact. It's not just a lang question, it's now got type-team impact, opsem-team impact, miri impact, libs-api impact, proc macro impact, different-backends impact, etc.

I pretty much feel the same now as I did two years ago (#2584 (comment)): the way forward is not to try to make the whole new type kind right away, but to look at the motivation and attempt to address some of those pieces in a lower-impact way. What parts of the goodness of this can be done in a way that they can mostly reuse existing inference, existing MIR, existing derives, and the existing type system?

What makes it hard to define a type? What makes it hard to use a type? Can we introduce shorthands for common cases? Could we allow them in more places? Could this mostly be done with Voldemort types, rather than structural ones? Etc.

(Just speaking for me, not stating a lang team position here.)

@wongjiahau
Copy link

wongjiahau commented Mar 21, 2023

TLDR;

The syntax of strutural and nominal record access should differ so that most changes occur only at the compiler frontend.

Longer version

As far as I know, structural records and tuples carry the same semantics, so why can't we implement structural record as a syntactic sugar of tuples? Doing so requires minimal changes to the backend, i.e. type checking, Miri, code generation etc.

For this to work, the syntax of structural records access has to be different from that of nominal records (the current struct system), so we don't have to incorporate extra mechanisms into the type-checker to disambiguate structural record access from nominal record access, which is probably the main reason why some think this proposal is pervasive.

For the sake of explanation, let's pretend the syntax of structural record access looks like this:

expression '#' identifier

For example:

myObject#hello

Other syntactical aspects of structural records, such as type annotation, construction, and destructuring should already be distinguishable from nominal records, thus they require no syntax changes.

As a proof of concept:

  1. During parsing, desugars structural records construction into tuples construction, for example: { x: 1, y: 2 } desugars into (usize, usize). The value should be sorted by their corresponding key lexicographically.
  2. During type checking, desugars structural record access, for example, expr#x desugars into (expr.0).
    2.1. This translation works because the type of expr must be known before any structural record access can be applied.
  3. Structural record destructuring works the same as 2.
  4. For diagnostics purposes, the intermediate representation of a tuple needs to carry additional information, namely structural_record_keymap: Option<StruturalRecordKeymap>.

In conclusion, as long as we keep structural and nominal record access syntax distinct, no pervasive changes are needed.

@digama0
Copy link
Contributor

digama0 commented Mar 21, 2023

  1. During parsing, desugars structural records construction into tuples construction, for example: { x: 1, y: 2 } desugars into (usize, usize). The value should be sorted by their corresponding key lexicographically.

  2. During type checking, desugars structural record access, for example, expr#x desugars into (expr.0).
    2.1. This translation works because the type of expr must be known before any structural record access can be applied.

If {x: usize, y: usize} is "mere sugar" for (usize, usize), which is to say they are the same type according to the type checker, then {x: usize, y: usize} and {y: usize, z: usize} are the same type. So how would you "desugar" expr#y if expr has the type (usize, usize)? For a more concrete example:

let a = if cond() {
  let b: {x: usize, y: usize} = (0, 1);
  b
} else {
  let c: {y: usize, z: usize} = (2, 3);
  c
};
a#y

Does the last line desugar to a.0 or a.1?

I don't think the expr#x syntax can be sensibly implemented unless structural record types are part of the type system, distinct from tuples (so that we don't lose track of the name -> index mapping). And then we have the indicated challenges wrt type inference, impl coherence, and layout / MIR changes.

@wongjiahau
Copy link

wongjiahau commented Mar 21, 2023

So how would you "desugar" expr#y if expr has the type (usize, usize)? For a more concrete example:

Sorry, I miscommunicated earlier. For Step 2.1, the desugars only happen after type-checking is completed without error. In other words, the type checker is aware of the existence of structural record type.

let a = if cond() {
  let b: {x: usize, y: usize} = (0, 1);
  b
} else {
  let c: {y: usize, z: usize} = (2, 3);
  c
};
a#y

Does the last line desugar to a.0 or a.1?

This question is invalid because the type of a is unknown, as {x: usize, y: usize} cannot be unified with {y: usize, z: usize}, at least not without structural subtyping, which is coherent with the semantics of tuple as of now, as tuple does not support subtying, for example (usize, usize) cannot be unified with (usize, usize, usize).

So again, most of the changes are on frontend, namely parsing and type checking (or type inference).
Everything beyond type checking is the same, except that the Tuple node has to carry an optional name-index mapping.

@digama0
Copy link
Contributor

digama0 commented Mar 21, 2023

If it's a separate type, then this isn't a desugaring at all, as it has to participate in the whole frontend and middle end. The MIR uses the same type system as the type checker (except for erasing regions after borrowck). At that point { x: usize, y: usize } is no more similar to (usize, usize) than struct Foo { x: usize, y: usize } is.

Another issue that arises: Is {x: usize, y: usize} distinct from (usize, usize) for impl coherence purposes?

  • I believe "no" is unsound, because you can use impl coherence to force the type system to think two generic types are equal, and thereby construct a variation on the example I gave earlier, by proving that {x: usize, y: usize} == {y: usize, z: usize} and then projecting e#y with surprising results.
  • "Yes" is also problematic for reasons detailed in the RFC: you can't implement traits on all structural types because there is no way you can do something similar to "macro implement for tuples of length up to N" for structural records.

@scottmcm
Copy link
Member

why can't we implement structural record as a syntactic sugar of tuples?

C# went down this road. It works reasonably well, but it's still risky. For example, this compiles https://dotnetfiddle.net/fYHBJa

	public static (int B, int A) Foo() => (1, 2);
	
	public static void Main()
	{
		(int A, int B) pair = Foo();
		Console.WriteLine(pair);
	}

and doesn't do what you'd want from a proper structural type.

So you can sortof do it, but to make it work well you have to add a ton of random little "oh, wait, we'd better lint that" stuff. Not to mention all the "and what do you get from Any?" and "why don't I get the names in serde?" things.

Thus once you want it to be preserved through closures, say, all the "kinda like a type but isn't" are maybe even worse than having it be a real type in the first place. (This reminds me of all the repr(packed) problems, where there's a bunch of code that has to pass around alignment separately from the type, because fields in packed structs are this kind of "type-like thing but not actually part of the type" problem.)

@programmerjake
Copy link
Member

programmerjake commented Mar 21, 2023

edit:
created an internals thread for discussing my desugaring idea: https://internals.rust-lang.org/t/structural-types-again-desugar-to-named-struct-in-std/18550?u=programmerjake

please discuss on internals since here is waay too cluttered

original post:
a quick idea: instead of desugaring to tuples, how about desugaring to an instance of a named struct in std:

// in std::somewhere
pub struct AnonymousStruct<Fields: std::marker::Tuple, const NAMES: &'static [&'static str]>(pub Fields);

impl<Fields: std::marker::Tuple, const NAMES: &'static [&'static str]> AnonymousStruct<Fields, NAMES> {
    /// Self::get_field_index("abc") = Ok(5) means self.abc desugars to self.0.5
    pub const fn get_field_index(name: &str) -> Result<usize, ()> {
        const { assert!(NAMES.is_sorted(), "NAMES must be sorted"); };
        NAMES.binary_search(&name).map_err(|_| ())
    }
}

pseudo-code:

fn desugar_anonymous_struct_type(fields: Vec<(Member, Type)>) -> Result<Type> {
    // sorts by field name
    let mut fields: BTreeMap<Member, Type> = fields.into_iter().collect();
    let mut tuple_fields: Punctuated<Type, _> = (0..).map_while(|i| fields.remove(&Member::Unnamed(i.into()))).collect();
    if fields.is_empty() {
        // actually a tuple, since all fields are named using successive integers
        return Ok(TupleType { paren: Paren::default(), elems: tuple_fields }.into());
    }
    if !tuple_fields.is_empty() {
        bail!("all fields must be named");
    }
    let mut keys = vec![];
    let mut types = vec![];
    for (k, v) in fields {
        let Member::Named(k) = k else {
            bail!("all fields must be named");
        };
        keys.push(LitStr::new(&k.to_string(), k.span()));
        types.push(v);
    }
    Ok(parse_quote!(::std::somewhere::AnonymousStruct<(#(#types,)*), { &[#(#keys),*] }>))
}

I would expect the code for the field access operator . to be structured such that a special case can be easily added for a lhs of AnonymousStruct since it needs to resolve the lhs type anyway in order to resolve existing types' field names.

@programmerjake
Copy link
Member

programmerjake commented Mar 21, 2023

desugaring to a named struct in std instantiated with a tuple of the sorted field types and a const generic slice of sorted field names for the anonymous struct neatly solves nearly all the implementation and how-to-impl-traits-for-all-anonymous-structs issues afaict.

@programmerjake
Copy link
Member

created an internals thread for discussing my desugaring idea: https://internals.rust-lang.org/t/structural-types-again-desugar-to-named-struct-in-std/18550?u=programmerjake

please discuss on internals since here is waay too cluttered

@the-a-man-006
Copy link

This sounds so right, guys, please implement it...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-data-types RFCs about data-types A-expressions Term language related proposals & ideas A-patterns Pattern matching related proposals & ideas A-product-types Product type related proposals A-structural-typing Proposals relating to structural typing. A-syntax Syntax related proposals & ideas A-typesystem Type system related proposals & ideas closed This FCP has been closed (as opposed to postponed) finished-final-comment-period The final comment period is finished for this RFC. T-lang Relevant to the language team, which will review and decide on the RFC. to-announce
Projects
None yet
Development

Successfully merging this pull request may close these issues.