Bikeshedding the 'gen' keyword & syntax #123731

Closed
dhardy opened this issue Apr 10, 2024 · 17 comments
Labels
C-discussion Category: Discussion or questions that doesn't represent real issues. F-gen_blocks `gen {}` expressions that produce `Iterator`s T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@dhardy
Contributor

dhardy commented Apr 10, 2024

#117078 lists "unresolved" questions:

  • Whether we should use some keyword other than gen.
  • Whether we should reserve gen as a full keyword or as a contextual one.

From the RFC:

@yoshuawuyts For the FCP I started above, the gen keyword and gen { ...} blocks and async gen { ... } blocks are in-scope, while gen fn is out of scope.

So we need a keyword for "gen blocks"; gen fn might get added in the future.


Conflicts

This conflicts with rand::Rng::gen (see rust-random/rand#1435). There are quite a few hits within the rand crate. The rand_distr crate has 3 uses within documentation, 2 within tests and 14 in algorithms.

This isn't a lot of data, but hints at quite a few real uses of Rng::gen outside rand, not counting other possible uses of gen.
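To make the conflict concrete: if `gen` is reserved, a call written as `rng.gen()` stops parsing, but the raw-identifier form `r#gen` keeps naming the same method. The sketch below uses a hypothetical `ToyRng` trait and `Counter` type as stand-ins (they are assumptions for illustration, not the real `rand` API) so that it has no external dependencies.

```rust
// Hypothetical stand-in for a trait with a method named `gen`;
// this is not the real `rand::Rng` API.
trait ToyRng {
    fn next_u32(&mut self) -> u32;

    // `r#gen` is the raw-identifier spelling of `gen`. It is accepted in
    // every edition and stays valid if `gen` is reserved as a keyword.
    fn r#gen(&mut self) -> u32 {
        self.next_u32()
    }
}

// A deterministic counter, so the example needs no randomness source.
struct Counter(u32);

impl ToyRng for Counter {
    fn next_u32(&mut self) -> u32 {
        self.0 += 1;
        self.0
    }
}

fn first_two() -> (u32, u32) {
    let mut rng = Counter(0);
    // Where `gen` is a keyword, `rng.gen()` would no longer parse;
    // `rng.r#gen()` names the same method.
    (rng.r#gen(), rng.r#gen())
}
```

Calling `first_two()` returns the first two counter values. Downstream code can be migrated this way mechanically, at the cost of noisier call sites.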

From the RFC

I haven't read all comments on the RFC, but several indicate that gen fn may not be a good thing.

From @Kray:

IMO this function syntax, while analogous to how async functions work, looks very surprising and non-obvious:

gen fn odd_dup(values: impl Iterator<Item = u32>) -> u32

Without inspecting the opposite end of the line, the function looks like it returns a single u32. I know async fn works the same way, but at least when you see an async fn foo() -> u32 you know it will eventually return a single u32.

Not to mention there are good arguments that async fn might also have been a mistake.

Because the "simple" case of an owned iterator doesn't require complicated type signatures unlike some Futures generated by async fn, wouldn't this be a good time to adopt this function expression syntax suggested many times in the past (including the blog post above)?

fn foo() -> impl Iterator<Item = u32> = gen { ... }
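For comparison, the owned-iterator case can already be written on stable Rust with an explicit `impl Iterator` return type and no `gen` at all. This is a minimal sketch that assumes a body which doubles each odd value; the quote above shows only the signature, so that behaviour is an assumption.

```rust
// Stable-Rust spelling of the signature discussed above. The body is an
// assumption for illustration: it doubles every odd input value.
fn odd_dup(values: impl Iterator<Item = u32>) -> impl Iterator<Item = u32> {
    values.filter(|v| v % 2 == 1).map(|v| v * 2)
}
```

Here the return type states up front that the function produces a sequence of `u32` values rather than a single one, which is the readability point being made.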

From @withoutboats:

(NOT A CONTRIBUTION)

I think there are several good arguments for replacing -> with something like yield or yields. I'm pretty much just summarizing here:

* Educational: it helps users understand that this yields many times, instead of returning once. It ties the declaration syntax to the different return keyword inside the body.

* Forward compatibility: I personally don't think general purpose coroutines should be declared with the same keyword as generators-as-Iterators, but this does leave space for that possibility in the syntax by allowing the syntax to support a separate `->` type declaration.

The only possible difficulty you need to consider is that you want to be forward compatible with the ability to declare the type yielded by a gen { } block. This may lean toward yields; not sure if gen yield i32 could be supported by the parser or if it's ambiguous.

From @emilyyyylime:

I was thinking potentially fn f() gen i32 for 'generates'

There is also discussion on usage of generator instead and discussion on gen as a contextual keyword.


Since I'm not fully read-up on the status of this RFC and its implementation, I'll just leave this here with the question of whether gen will become a keyword.

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Apr 10, 2024
@jieyouxu

This comment was marked as resolved.

@jieyouxu jieyouxu added T-lang Relevant to the language team, which will review and decide on the PR/issue. C-discussion Category: Discussion or questions that doesn't represent real issues. F-gen_blocks `gen {}` expressions that produce `Iterator`s and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Apr 10, 2024
@jieyouxu
Member

Relevant Zulip thread regarding reserving the gen keyword: https://rust-lang.zulipchat.com/#narrow/stream/268952-edition/topic/.60gen.60.20keyword

@PatchMixolydic
Contributor

PatchMixolydic commented Apr 10, 2024

As a user, I'm not too concerned about what the keyword is, as long as it's fairly succinct and suggests either generators or iterators [1]. gener seems a little weird, but would probably avoid most name collisions. generator seems a little too long, at least for Rust.

I'd be highly in favour of providing some sort of gen fn syntax to make it easier to use generators, both as iterators and as restricted coroutines, but I'm not entirely sure what that'd look like. In isolation, gen fn name(...) yield YieldTy { ... } seems fine to me, but that might lead to incongruencies if the language team ever stabilizes coroutines with return types other than () [2].

Footnotes

  1. Coroutines would also work, but might conflict with syntax space for general (semi)coroutines.

  2. Ignoring resume arguments, the obvious choices for general coroutine syntax would be co fn name(...) yield YieldTy -> RetTy, which is internally inconsistent, or co fn name(...) yield YieldTy return RetTy, which is inconsistent with normal fns.

@Yokinman

Yokinman commented Apr 11, 2024

  • I don't think async fn was a mistake. The consistency among keywords that affect a function's syntax is definitely useful, at least documentation-wise (const, async, unsafe, etc.). In my opinion, gen affecting the entire function's control flow should keep consistent with keywords like async.

  • It feels natural to read -> T as, "this produces T", but it is different from the usual meaning. I think this syntax looks confusing when return ends the function without a value.

    It would also make it possible to confuse functions returning single types versus iterator types, just in terms of skimming function return types in documentation or source code. I would prefer anything else like:

    • gen fn() -> yield i32 - Could retain -> T as "produces T". e.g. coroutine fn() -> yield i32, return i32.
    • gen fn() yield i32 - Uses an existing keyword. Slightly awkward to read.
    • gen fn() yields i32 - Requires a new keyword. Reads very nicely.
    • gen fn() ^ i32 - Upwards arrow (yields to the caller, may resume later).
    • gen fn() >> i32 - Two arrows (produces multiple values).
    • gen fn() => i32 - Is -> meaningfully distinct from =>? I suppose -> is usually for types and => is usually for values.

I can't wait for generator functions, I love generator functions.

Your friend,
Yokin

@oli-obk
Contributor

oli-obk commented Apr 11, 2024

One thing that I think is important is that we pick the same keyword for coroutines and generators, as they are very similar. Coroutines take arguments on every resume, even the first one, so they need a closure-style syntax (e.g. coro |x| {...} or gen |x| { ... }). So if gen is considered too breaking, we can instead use co {} or coro {} blocks, which both have significantly less edition breakage.

I personally am not convinced that the extra language surface of gen fn is worth it, and would like to see whether fn foo() -> impl Iterator<Item = i32> is too annoying in practice and use that as information on whether to add gen fn. It's also not clear to me whether gen fn is sugar for fn() -> impl Iterator or fn() -> impl PinnedIterator (doesn't exist yet, but we likely want it), and if we stabilize it as impl Iterator we can't take it back. This is in contrast to gen {} blocks, which can just implement more traits in the future without it being a breaking change.

@ds84182

ds84182 commented Apr 11, 2024

Eh. gen fn |x| {...} ¯⁠\⁠_⁠(⁠ツ⁠)⁠_⁠/⁠¯
use fn to disambiguate, or even require it.

Slightly off topic: fn(x) => expr and fn(x) {} are just 2 to 4 extra characters and share similarity with match. Too late to change the syntax.

@rpjohnst
Contributor

One thing that I think is important is that we pick the same keyword for coroutines and generators, as they are very similar.

I don't think this is something we should commit to at this point. Both generators and async functions are specializations of the more general underlying control flow mechanism, exposing different subsets of that behavior: neither exposes the "resume argument" to the user, and both fix the yield and return types in particular ways.

Coroutines should remain free to change arbitrarily, so long as await and yield can desugar to them, until we're happy with them as a user-facing feature (if ever).

@oli-obk
Contributor

oli-obk commented Apr 14, 2024

Yea, that makes sense. Especially since I was reminded that gen closures could be a different thing, the equivalent to async closures

@oli-obk
Contributor

oli-obk commented Apr 14, 2024

Throwing some options out there:

  • stick with gen, break lots of crates
  • use gener (personal opinion: it's not clear what it means, tho we'll learn)
  • impl Iterator {}, kinda long but very extensible to builtin syntax for other traits (impl Generator, impl Try), conflicts with inherent impls
  • iter/iterator, breaks too much code
  • something!, make it a builtin macro, so local definitions and imports override it
  • generator, wordy, but explicit. We don't need to stick with the "keywords must be short" rule and it likely has no problematic breakage
  • use an attribute: #[gen] {}, a bit odd, but very extensible
  • ask T-lang to revisit gen becoming a contextual keyword
  • yields {}, may be too confusing with yield
  • prefix with some existing keyword (move gen {} could signal the difference to static gen {} which would not create iterator impls, but only some so-far-non-existent PinnedIterator trait). Not really necessary (we can just infer which traits to implement for gen blocks, though this is another nail in the coffin of gen fn). Also it's very confusing as move gen is different from gen move (move all variables instead of letting the generator be movable) and implies the existence of move gen move.

@tmandry
Member

tmandry commented Apr 16, 2024

Right now I'm -1 on any syntax that uses -> T to mean "yields type T"; I think using yield or yields is much better. I'm also okay with deferring the question and using -> impl Iterator<Item = T> for now; I think that's how the RFC ended up.

  • yields {}, may be too confusing with yield

If we reserve yields it could be reused for the above purpose. I guess explicit type annotation for generator blocks would look like this then: yields T { ... }

  • use gener (personal opinion: it's not clear what it means, tho we'll learn)

Seems no worse than gen in that regard; maybe slightly clearer.

@oli-obk
Contributor

oli-obk commented Apr 17, 2024

An insight on the rand edition breakage:

Rng::gen is never implemented by users. So it would be fine to just deprecate it on all editions and add a generate method instead. This is a minor version bump change only.
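The deprecate-and-rename approach can be sketched with a provided trait method. Everything named below (`ToySource`, `Stepper`, `old_and_new`) is hypothetical and only illustrates the shape of the change; it is not the real `rand` API.

```rust
// Hypothetical trait for illustration; not the real `rand` API.
trait ToySource {
    fn next_u32(&mut self) -> u32;

    // The new, keyword-safe name.
    fn generate(&mut self) -> u32 {
        self.next_u32()
    }

    // The old name stays available on every edition but warns at call
    // sites. It is a provided method, so implementors are unaffected.
    #[deprecated(note = "use `generate` instead")]
    fn r#gen(&mut self) -> u32 {
        self.generate()
    }
}

// Deterministic implementor, so the example needs no randomness source.
struct Stepper(u32);

impl ToySource for Stepper {
    fn next_u32(&mut self) -> u32 {
        self.0 += 1;
        self.0
    }
}

// Calls both spellings; the `allow` silences the deprecation warning.
#[allow(deprecated)]
fn old_and_new() -> (u32, u32) {
    let mut src = Stepper(10);
    (src.r#gen(), src.generate())
}
```

Because the old method only forwards to the new one, existing callers keep working and get a warning that points them at the replacement.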

Considering we have two weeks left until the first 2024 edition deadline, I would say we go with gen as T-lang has already decided on the RFC and resolve the gen keyword bikeshed. I am getting the feeling that the general mood is for gen as a keyword, we're just unhappy to different degrees with the side effects that decision gives us.

@yoshuawuyts
Member

yoshuawuyts commented Apr 17, 2024

Apologies to @oli-obk - it was just yesterday that I said I wouldn't post in this thread and would prefer to write a blog instead to explore the implications of the yields keyword. But I'm finding myself short on time this week, so here's my longer-form thinking on the choice of keywords for generator blocks and closures.

Gen keyword

With generator blocks and closures there is a distinction between the logical return type and the logical yield type. While Iterator cannot express any return type other than (), we have to anticipate that at least syntactically generator blocks and closures will want to support distinguishing between the type yielded and the type returned. Using the bare gen {} syntax currently being proposed this would result in the following notations for generator blocks and closures.

// block, inferred
let iter = gen { yield 12; }

// block, yield type annotated
let iter: impl Iterator<Item = u32> = gen { yield 12; }

// closure, inferred
let iter = gen || { yield 12; }

// closure, yield type annotated
let iter: impl Iterator<Item = u32> = gen || { yield 12; }

// closure, return type annotated
let iter = gen || -> () { yield 12; }

We're not yet making a decision on generator functions, though they are available on unstable. But it's worth pointing out that much of the same rationale I've listed here applies to those too, but unlike generator blocks and closures, generator functions must always spell out their yield type - they cannot just rely on inference the way blocks and closures do.

If we were to use -> T to mean "yields type T", that would make it impossible for generator notation to express a different return type, which as an implication means that generalized coroutines would need to use a different keyword with different semantics. This directly means that we'd need to be comfortable having e.g. both a gen || -> T {} and coro || -> T {} where -> T means different things. I believe that would be bad.
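The distinction between the yield type and the return type can be modelled by hand on stable Rust. In this sketch `Step`, `Countdown` and `run_to_completion` are illustrative names rather than any std API; the point is that an `Iterator` view keeps the yielded values but has nowhere to put the final return value.

```rust
// Hand-rolled model of a computation that yields values of type `Y`
// and finally returns a value of type `R`.
enum Step<Y, R> {
    Yielded(Y),
    Complete(R),
}

// Counts down from `remaining`, then returns how many values it yielded.
struct Countdown {
    remaining: u32,
    yielded: usize,
}

impl Countdown {
    fn resume(&mut self) -> Step<u32, usize> {
        if self.remaining == 0 {
            Step::Complete(self.yielded)
        } else {
            let value = self.remaining;
            self.remaining -= 1;
            self.yielded += 1;
            Step::Yielded(value)
        }
    }
}

// Viewed as an `Iterator`, the yield type survives as `Item`, but the
// return value is dropped: `None` carries no payload.
impl Iterator for Countdown {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        match self.resume() {
            Step::Yielded(v) => Some(v),
            Step::Complete(_) => None,
        }
    }
}

// Drives the computation directly and keeps both kinds of output.
fn run_to_completion(mut c: Countdown) -> (Vec<u32>, usize) {
    let mut seen = Vec::new();
    loop {
        match c.resume() {
            Step::Yielded(v) => seen.push(v),
            Step::Complete(r) => return (seen, r),
        }
    }
}
```

A notation that spends `-> T` on the yield type leaves no slot for the second type parameter in a model like this one.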

Yields keyword

On the RFC PR I floated the idea of using yields as the notation for bare generator blocks and closures. I wasn't entirely sure of that idea at the time, but I believe it has merit spelling out the example in full, so let's do that here:

// block, inferred
let iter = yields { yield 12; }

// block, yield type annotated
let iter = yields u32 { yield 12; }

// closure, inferred
let iter = yields || { yield 12; }

// closure, yield type annotated
// NOTE: must be unambiguous with `|| yields {}`
let iter = yields u32 || { yield 12; }

// closure, return type annotated
// NOTE: must be unambiguous with `|| yields {}`
let iter = yields || -> () { yield 12; }

// closure, yields + return type annotated
// NOTE: must be unambiguous with `|| yields {}`
let iter = yields u32 || -> () { yield 12; }

As you can see here, the fact that we're supporting both generator blocks and generator functions ends up leading to syntactic ambiguity. If we were to adopt || yields {} as the notation, it would be unclear whether this was a function returning a generator, or a generator function. These are distinct concepts, and I believe it's worth keeping these separate.

To resolve this ambiguity we'd need to put the yields keyword before the || argument list, which seems a little strange. In effect we're declaring a function output in a position where normally we'd describe function inputs. If we were to consider the syntax for generator functions, it's unlikely we'd want to write: fn yields u32 () {} or similar. We almost certainly would prefer any form of yields to come after the () argument list.

Gen + Yields

One way to still reap the benefits of a yields notation without the parsing ambiguity would be for generator blocks and closures to always require gen to signal it's a generator, but permit yields as an explicit notation after the argument list to explicitly declare the yield type.

// block, inferred
let iter = gen { yield 12; }

// block, yield type annotated
let iter = gen yields u32 { yield 12; }

// closure, inferred
let iter = gen || { yield 12; }

// closure, yield type annotated
let iter = gen || yields u32 { yield 12; }

// closure, return type annotated
let iter = gen || -> () { yield 12; }

// closure, yields + return type annotated
let iter = gen || yields u32 -> () { yield 12; } // either…
let iter = gen || -> () yields u32 { yield 12; } // …or

While a little more verbose than the bare yields example, I believe this makes up for it by being unambiguous. It does introduce a small wart in the sense that generator functions don't require being annotated, so there may be a loss of parity between the two notations.

// Having a `gen fn` notation here would be
// unlikely to carry its weight here since it
// does not resolve any parser ambiguity.
gen fn iter() yields u32 -> () {} // either…
gen fn iter() -> () yields u32 {} // …or

// The `yields` keyword on its own here is enough
// to indicate this is a generator function.
fn iter() yields u32 -> () {} // either…
fn iter() -> () yields u32 {} // …or

Having gone through this exercise, I believe that of the three options this option carries the least worst tradeoffs.

Conclusion

I think gen {} as a keyword here is fine, as long as we eventually pair it up with some form of yields notation to be able to annotate the yield type. I mostly wanted to make sure it was spelled out why yields as a keyword is necessary, and why introducing it without an additional prefix marker for closures and blocks would lead to ambiguity in the parser.

Basically that's a long way to say that I find myself agreeing with what @tmandry expressed earlier.

@Yokinman

Yokinman commented Apr 17, 2024

I could be wrong, but I don't think the syntax can be gen || 🔹 Y -> R since that would be ambiguous with function types.

gen || yields fn() -> i32 {}
gen || yields impl Fn() -> i32 {}

On the other hand, the syntax gen || -> R 🔹 Y is probably unambiguous.

gen || -> fn() yields i32 {}
gen || -> impl Fn() yields i32 {}

I don't really like this second one though, cause in my head it reads "returns the type R which yields the type Y" like some kind of syntax sugar for function pointers (fn() yields i32 meaning fn() -> impl Iterator<Item=i32>). It's also arguably backwards, reading the yield type after the return type.

I think the syntax gen || -> 🔹 Y 🔸 R would solve this, like gen || -> 🔹 Y, return R or gen || -> 🔹 Y; R.

gen || -> yield fn(), return i32 {}
gen || -> yield impl Fn(), return i32 {}
gen || -> yield fn(); i32 {}
gen || -> yield impl Fn(); i32 {}

The first feels a bit more readable, refactorable, extensible. The second flows better as a "mini function body" kind of syntax. Similar to fn() -> i32 { 1 }, you would have the corresponding fn() -> yield i32; i32 { yield 1; 2 }.

@tmandry
Member

tmandry commented Apr 18, 2024

// Having a `gen fn` notation here would be
// unlikely to carry its weight here since it
// does not resolve any parser ambiguity.
gen fn iter() yields u32 -> () {} // either…
gen fn iter() -> () yields u32 {} // …or

// The `yields` keyword on its own here is enough
// to indicate this is a generator function.
fn iter() yields u32 -> () {} // either…
fn iter() -> () yields u32 {} // …or

This is the only part of your comment I disagree with @yoshuawuyts. I think gen fn carries its weight in the same way async fn does, by signaling a transformation to the function signature and body. Specifically, it signals that the return type of the function is something like impl Iterator<Item = u32> rather than the written type, and none of the code inside the body is run until iteration begins.
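The "nothing runs until iteration begins" behaviour can be observed today with an ordinary lazy iterator. This is a rough stable-Rust analogue rather than an actual generator: `lazy_numbers` and its shared counter are hypothetical names, and the closure passed to `std::iter::from_fn` stands in for the generator body.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Returns an iterator plus a shared counter recording how many times the
// body has run. Names here are illustrative only.
fn lazy_numbers() -> (impl Iterator<Item = u32>, Rc<Cell<u32>>) {
    let runs = Rc::new(Cell::new(0));
    let runs_in_body = Rc::clone(&runs);
    let mut next = 0;
    let iter = std::iter::from_fn(move || {
        // This closure plays the role of the generator body.
        runs_in_body.set(runs_in_body.get() + 1);
        next += 1;
        if next <= 3 { Some(next) } else { None }
    });
    (iter, runs)
}
```

Calling `lazy_numbers()` leaves the counter at zero; it only increments once `next()` is called on the returned iterator, which is the transformation a `gen fn` marker would be signalling to the reader.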

There's also the question of consistency with async closures, blocks, and functions.

For completeness though, if we wanted the "minimal syntactic noise" option I guess we could use gen only to introduce generator blocks, and only the yields keyword for everything else. I don't actually think this is a good idea, for all of the reasons above.

// Generator block
let iter = gen { yield 12; }
// Generator closure
let iter = || yields u32 { yield 12; }
// Closure which returns a generator block
let iter = || gen { yield 12; }
let iter = || gen yields i32 { yield 12; }  // (with explicit yield type)
// Generator function
fn iter() yields u32 { yield 12; }

I could be wrong, but I don't think the syntax can be gen || 🔹 Y -> R since that would be ambiguous with function types.

@Yokinman We can resolve such ambiguities, which I would expect to be rare, by requiring parentheses to disambiguate.

gen || yields (fn()) -> i32 {}
gen || yields (impl Fn() -> i32) {}

In fact, we may want to use yields with GenFn and perhaps other traits, which would make the gen || -> R 🔹 Y syntax you raised equally ambiguous.

gen || yields (impl GenFn() yields i32) -> i32 {}

Admittedly, all of this discussion is getting outside the scope of the original issue. I don't think we have to reserve yields today since as long as it always follows gen/fn/||, it works as a contextual keyword.

@oli-obk
Contributor

oli-obk commented Apr 19, 2024

Specifically, it signals that the return type of the function is something like impl Iterator<Item = u32> rather than the written type, and none of the code inside the body is run until iteration begins.

until we know what traits it's supposed to implement (e.g. hypothetical PinnedIterator if there are borrows across yields), we probably shouldn't go down this route. And even then, we basically always have to make it return PinnedIterator, otherwise a seemingly small body change can be a breaking change, because suddenly an Iterator impl disappears.

@yoshuawuyts
Member

yoshuawuyts commented Apr 19, 2024

I think gen fn carries its weight in the same way async fn does, by signaling a transformation to the function signature and body. Specifically, it signals that the return type of the function is something like impl Iterator<Item = u32> rather than the written type, and none of the code inside the body is run until iteration begins. There's also the question of consistency with async closures, blocks, and functions.

@tmandry Oh I absolutely agree about the question about consistency. I see that as the main thing we'd be trading off. I also agree that we need some identifier in the function signature to signal: "Hey this function here, this is a generator function". We could certainly allow people to write the following:

gen fn meow() {}           // This signature…
gen fn meow() yields {}    // …could be a shorthand for this signature…
gen fn meow() yields () {} // …which in turn could be a shorthand for this one.

I was thinking that if we disallowed that first case (gen fn without yields), the presence of yields would be enough to signal that we're dealing with a generator function. At which point, at least from a parser perspective, also having a gen keyword present would not be required. Not sure if you were seeing something different here?

@oli-obk
Contributor

oli-obk commented Apr 29, 2024

as per #123731 (comment) I'm gonna close this as resolved now. We're going with gen

@oli-obk oli-obk closed this as completed Apr 29, 2024

10 participants