Add gen blocks and reserve keyword in Rust 2024 #3513

Merged: 45 commits, Apr 7, 2024

@oli-obk (Contributor) commented Oct 13, 2023

Rendered

Tracking issue: rust-lang/rust#117078

This RFC reserves the gen keyword in the Rust 2024 edition for generators and adds gen { .. } blocks to the language. Similar to how async blocks produce values that can be awaited with .await, gen blocks produce values that can be iterated over with for.

Writing iterators manually can be painful. Many iterators can be written by chaining together iterator combinators, but some need to be written with a manual implementation of Iterator. This can push people to avoid iterators and do worse things such as writing loops that eagerly store values to mutable state. With gen blocks, we can now write a simple for loop and still get a lazy iterator of values. E.g.:

// This example uses `gen` blocks, introduced in this RFC.
fn rl_encode<I: IntoIterator<Item = u8>>(
    xs: I,
) -> impl Iterator<Item = u8> {
    gen {
        let mut xs = xs.into_iter();
        let (Some(mut cur), mut n) = (xs.next(), 0) else { return };
        for x in xs {
            if x == cur && n < u8::MAX {
                n += 1;
            } else {
                yield n; yield cur;
                (cur, n) = (x, 0);
            }
        }
        yield n; yield cur;
    }.into_iter()
}
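
For comparison, here is a rough sketch of how the same run-length encoder might look as a hand-written `Iterator` on stable Rust, without `gen` blocks. The names `RlEncode` and `rl_encode_manual` are invented for this illustration and do not come from the RFC. The output format (number of extra repeats, then the byte) is intended to mirror the example above.

```rust
use std::iter::Peekable;

/// Hand-written state for the encoder (hypothetical name, not from the RFC).
struct RlEncode<I: Iterator<Item = u8>> {
    xs: Peekable<I>,
    /// Byte of the run whose count was just emitted; it is returned on the next call.
    pending: Option<u8>,
}

impl<I: Iterator<Item = u8>> Iterator for RlEncode<I> {
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        // Second half of a (count, byte) pair left over from the previous call.
        if let Some(byte) = self.pending.take() {
            return Some(byte);
        }
        // Start a new run, or finish if the input is exhausted.
        let cur = self.xs.next()?;
        let mut n = 0u8;
        // Absorb repeats of `cur`, capping the count at u8::MAX.
        while n < u8::MAX && self.xs.peek() == Some(&cur) {
            self.xs.next();
            n += 1;
        }
        self.pending = Some(cur);
        Some(n)
    }
}

/// Stable-Rust counterpart of the `gen`-based function (hypothetical name).
fn rl_encode_manual<I: IntoIterator<Item = u8>>(xs: I) -> impl Iterator<Item = u8> {
    RlEncode { xs: xs.into_iter().peekable(), pending: None }
}
```

The state that the `gen` version keeps implicitly in local variables across `yield` points has to be stored in struct fields here, which is the kind of bookkeeping the description refers to.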

@clarfonthey
Contributor

One thing worth noting, which I raised in the Generator tracking issue: unlike iterators, generators naturally have a way of indicating that they're infinite by returning !, and I wonder if we could somehow retain this for gen syntax.

I personally think it would be a good idea to not limit this syntax to just iterators, instead allowing arbitrary generators, but special-casing to iterators when the return value is () or !.

@estebank (Contributor) left a comment

I want to mention iterator_item, which is based on propane and accepts "flexible" syntax. It allows anyone using it to mix and match with the alternatives already mentioned in this document, and it also takes some care to make the diagnostics sensible, even without additional compiler support, so rustc could provide an even better experience.

@ehuss ehuss added the T-lang Relevant to the language team, which will review and decide on the RFC. label Oct 13, 2023
@withoutboats
Contributor

(NOT A CONTRIBUTION)

I'm very enthusiastic about adding something like this to the language, and there's nothing in this RFC I don't like. I don't have a strong opinion about what keyword is chosen for this feature.

In my own head, I think of the general purpose mechanism (called generators in rustc) as coroutines and use the term "generator" just for functions that return iterators. In other words generator is to Iterator as async is to Future.

I also want to highlight a quote from the original proposal to use external Iterators as the definition of Iterators in Rust, way back in 2013:

In the future, Rust can have generators using a yield statement like C#, compiling down to a fast state machine without requiring context switches, virtual functions or even closures. This would eliminate the difficulty of coding recursive traversals by-hand with external iterators.

(https://web.archive.org/web/20140716172928/https://mail.mozilla.org/pipermail/rust-dev/2013-June/004599.html)

The way it turned out, Rust adopted this sort of thing for futures long before it adopted it for iterators. But an RFC like this was always the plan!
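
To illustrate the "difficulty of coding recursive traversals by-hand with external iterators" mentioned in the quote, here is a small sketch of an in-order binary tree traversal written as an external iterator on stable Rust. The `Node`, `InOrder`, and `leaf` names are made up for this example. The explicit stack stands in for the call stack that a recursive function (or a generator) would otherwise manage.

```rust
/// A minimal binary tree node (hypothetical type, for illustration only).
struct Node {
    value: u32,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

/// Convenience constructor for a node without children.
fn leaf(value: u32) -> Node {
    Node { value, left: None, right: None }
}

/// External in-order iterator: the traversal state lives in an explicit stack.
struct InOrder<'a> {
    stack: Vec<&'a Node>,
}

impl<'a> InOrder<'a> {
    fn new(root: Option<&'a Node>) -> Self {
        let mut it = InOrder { stack: Vec::new() };
        it.push_left(root);
        it
    }

    /// Push `node` and its whole chain of left children onto the stack.
    fn push_left(&mut self, mut node: Option<&'a Node>) {
        while let Some(n) = node {
            self.stack.push(n);
            node = n.left.as_deref();
        }
    }
}

impl<'a> Iterator for InOrder<'a> {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        // The top of the stack is the next node in order.
        let n = self.stack.pop()?;
        // Its right subtree must be visited before its ancestors.
        self.push_left(n.right.as_deref());
        Some(n.value)
    }
}
```

With a generator, the same traversal could presumably be written as the obvious recursive function with a `yield` at each node, though the exact form that would take under this RFC is not shown here.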

There were some places with erroneous or unclear punctuation and
capitalization.  Let's fix those and make some related typographic
and linguistic improvements.

@rpjohnst

Based on a DM with boats and inspired by the way borrowck handles non-self-referential coroutines, there does seem to be a way to express leasing coroutines without typestate, though it requires its own extension to the type system.

In a movable coroutine, borrowck treats yield as invalidating local variables. This works, and ensures the coroutine is not self-referential, because borrows of those locals have an inferred lifetime that incorporates those variables' places.

In a leasing coroutine, yield should invalidate anything the coroutine is leasing. Region variables can't really express this, for the same reason you can't write a function fn foo(&'? mut self, field_of_self: &'? T): there is no shared place for these lifetimes to incorporate, and there is no syntax for this kind of lifetime.

If we had a way to quantify over places, we could incorporate those places into a single type to give the step parameter. For example (wildly inventing syntax):

for<place waker, place context> |cx: &'{context} mut Context<'{waker}>| {
    .. cx = yield ..
}

To get a leasing coroutine, something also needs to denote the effect of yield on these places. The maximally expressive approach might use a syntax like coroutine |..| yield(waker, context) { .. } that evokes moving those places.

This all gives unnecessarily complicated types, and presumably nobody wants to write this all down for their custom coroutines. Instead, we could leave this all out of the surface syntax, and introduce a new kind of lifetime elision specifically for coroutine step parameters: quantify over a place for every implicit lifetime, and infer from the coroutine body which ones yield should invalidate.

The resulting implementation of the Coroutine trait would then satisfy bounds like for<'a, 'b: 'a> Coroutine<&'a mut Context<'b>>, without the restriction mentioned in rust-lang/rust#68923 (comment) that each yield produce a distinct type. Instead yield would appear (from the coroutine's perspective) to invalidate and then re-acquire any leases.

Notably I don't think any of this is really tied to the syntax that treats the step argument as the result of yield. The important thing is that yield invalidates these places; the choice of syntax for re-acquiring the borrow afterward is orthogonal. (So of course I would prefer one that relieves the user of juggling multiple step arguments, and leaves the result of yield free for resume arguments.)

We had already opened a tracking issue for this work, so let's fill
that in here.

In addition to giving the file the correct number, let's call this
`gen-blocks` rather than `gen-fn` since we removed `gen fn` from the
main body of this RFC.

The feature name in the draft was a placeholder.  Let's update this to
the actual feature name now in use.

We had a mix between hard wrapped lines of various widths and
unwrapped lines.  Let's unwrap all lines.

The main body of the RFC discusses how we might implement
`FusedIterator` for the iterators produced by `gen` blocks, but this
was not listed as a future possibility.  Let's do that.

There was a statement in the draft about, as a downside, something
needing to be pinned for the entire iteration rather than just for
each call to `next`.  But, of course, under the pinning guarantees,
these are equivalent.  Once something is pinned, unless it is `Unpin`,
it must be treated as pinned until it is destructed.  Let's remove
this statement.

In RFC 3101 we reserved in Rust 2021 prefixed identifiers such as
`prefix#ident`.  For this reason, we can make `gen` blocks available
in Rust 2021 using `k#gen` as was anticipated in the (currently
pending) RFC 3098.

It's less clear what to do about Rust 2015 and Rust 2018, however, so
let's mark this as an open question.

(Thanks to tmandry for raising this point.)

We had been meaning to do some final copyediting prior to this RFC
being merged, so let's do that.  In addition to making the text a bit
more regular and precise, fixing some minor errors, removing outdated
information, and adding references between sections, we've tried to
"tighten it up" a bit where possible.  We've been careful to not
change anything of semantic significance or otherwise of significance
to the consensus.

The Koka language provides an interesting alternative data point for
how generators and other powerful control flow constructs could work
in a typed language such as Rust.  Let's include an example in the
prior art section.

(Thanks to zesterer for asking for this.)

Using the no-op `Waker`, we can express generators and coroutines in
Rust.  Let's close our list of prior art examples with that.
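
As a rough illustration of that idea (not necessarily the code used in the RFC's prior art section), the sketch below drives an `async` block with a waker that does nothing and treats each `Pending` poll as a yielded value. All names here (`NoopWake`, `YieldOnce`, `Yielder`, `FutureIter`, `gen_iter`) are invented for this example, and it assumes the body only ever suspends through `Yielder::put`.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing when woken.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// A future that is pending exactly once; awaiting it acts as a yield point.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Handle the async body uses to hand a value to the consumer.
struct Yielder<T>(Rc<Cell<Option<T>>>);
impl<T> Yielder<T> {
    async fn put(&self, value: T) {
        self.0.set(Some(value));
        YieldOnce(false).await;
    }
}

/// Iterator that polls the future once per call to `next`.
struct FutureIter<T, F> {
    slot: Rc<Cell<Option<T>>>,
    fut: Pin<Box<F>>,
    done: bool,
}

impl<T, F: Future<Output = ()>> Iterator for FutureIter<T, F> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        if self.done {
            return None;
        }
        let waker = Waker::from(Arc::new(NoopWake));
        let mut cx = Context::from_waker(&waker);
        match self.fut.as_mut().poll(&mut cx) {
            // Assumption: every suspension comes from `put`, so the slot is filled.
            Poll::Pending => self.slot.take(),
            Poll::Ready(()) => {
                self.done = true;
                None
            }
        }
    }
}

/// Build an iterator from an async body that receives a `Yielder`.
fn gen_iter<T, F, Fut>(body: F) -> FutureIter<T, Fut>
where
    F: FnOnce(Yielder<T>) -> Fut,
    Fut: Future<Output = ()>,
{
    let slot = Rc::new(Cell::new(None));
    let fut = Box::pin(body(Yielder(slot.clone())));
    FutureIter { slot, fut, done: false }
}
```

A call such as `gen_iter(|y| async move { for i in 0..3u32 { y.put(i).await; } })` then behaves like a lazy iterator over the values passed to `put`. Boxing the future keeps the sketch simple at the cost of an allocation.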
Under this RFC, it's possible to yield one last value concisely with
`return yield EXPR`.  Let's make a note of that.

(Thanks to Nemo157 for pointing this out and to pnkfelix for
suggesting that this be noted in the RFC.)

The motivating example we had given for `gen` blocks admitted too easy
an implementation with existing stable iterator combinators.  Let's
make the example more *motivating* by showing a simple algorithm,
run-length encoding, that's more difficult to implement in other ways.

(Thanks to Ralf Jung for pointing out the need for a better example.)
@traviscross changed the title from "Reserve gen keyword in 2024 edition for Iterator generators" to "Add gen blocks and reserve gen keyword in Rust 2024" on Mar 30, 2024
@traviscross changed the title from "Add gen blocks and reserve gen keyword in Rust 2024" to "Add gen blocks and reserve keyword in Rust 2024" on Mar 30, 2024
@dead-claudia left a comment

Just leaving an informative note about how Kotlin did coroutines so it can be better explained.

[ruby-enumerator]: https://ruby-doc.org/3.2.2/Enumerator.html
[ruby-fiber]: https://ruby-doc.org/3.2.2/Fiber.html

## Kotlin

Will note that the yield points for Kotlin's "coroutines" under the hood are delimited by continuations. Its suspendCoroutine { coroutine -> ... } isn't far off of await new Promise((resolve, reject) => { ... }) in ECMAScript/JavaScript. Likewise, the end accepts a continuation object (it's an interface) to essentially act like ECMAScript's/JavaScript's promise.then(...).

  • Its coroutine.resume(value) is like ECMAScript's resolve(value).
  • coroutine.resumeWithException(value) is almost a carbon copy clone of reject(value).

The way sequence under the hood handles it is by executing the suspend function with a context that stores both the outer and inner continuation objects. So in essence, it's little more than a push-based async/await in the end.

@traviscross traviscross merged commit bc01ed8 into rust-lang:master Apr 7, 2024
@traviscross
Contributor

This RFC has been accepted by the lang team and has now been merged.

Thanks to all those who reviewed this RFC and provided helpful feedback.

To keep up with further work on gen blocks, follow the tracking issue: rust-lang/rust#117078

@RalfJung
Member

@rpjohnst @withoutboats interesting discussion, I hope this won't get lost in this RFC thread!

In particular it may be worth thinking a bit more about this concept of the resume argument "matching" the previous yield point. One thing that occurred to me (and I think is more or less explicit in the discussion above) is that general coroutines with yield type Y and resume type R are a lot like functions with a single algebraic effect where Y is passed to the effect handler and R is passed back (assuming the -- common -- restriction that an effect handler can call the continuation at most once). A function with multiple effects is then something like a coroutine where the yield type is a sum of all the effect argument types, and similar for the resume type -- except that we lose some type safety that way since we can no longer enforce that the resume type matches the previous yield. So in that sense one could imagine adding algebraic effects to Rust via the existing state machine transform, but some type safety would be lost / a runtime check on each resume would be needed to check that the argument has the right type. Neat but probably complete overkill...

However, it turns out in a sense we already have two different kinds of yield points because of laziness! We can view the coroutine as immediately yielding at an "initial" yield point with resume type (), and then all future yield points have resume type R. The interface of a single resume/step function cannot express this, and that's where crutches like Python's requirement to pass None the first time come from. But if we can find a way to actually encode general algebraic effects, where the resume type is guaranteed to match the previous yield, then that would also be able to capture the requirement that the first step cannot take an argument...
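
One way to picture the "two kinds of yield points" observation is a hand-written state machine in which the not-yet-started state and the suspended state are different types, so the first step takes no argument while later steps require one. This is only a sketch with invented names (`Step`, `Unstarted`, `Suspended`); it does not model algebraic effects or the actual `Coroutine` trait.

```rust
/// Outcome of one step: a yielded value plus the state to resume, or the final result.
enum Step {
    Yielded(u32, Suspended),
    Done(u32),
}

/// Not yet started: there is no previous yield, so starting takes no resume argument.
struct Unstarted {
    limit: u32,
}

/// Suspended at a yield point: resuming requires a value of the resume type (u32 here).
struct Suspended {
    total: u32,
    limit: u32,
}

impl Unstarted {
    /// The "initial yield point" with resume type (): runs to the first yield.
    fn start(self) -> Step {
        Step::Yielded(0, Suspended { total: 0, limit: self.limit })
    }
}

impl Suspended {
    /// Later yield points have resume type u32: add it to the running total.
    fn resume(self, add: u32) -> Step {
        let total = self.total + add;
        if total >= self.limit {
            Step::Done(total)
        } else {
            Step::Yielded(total, Suspended { total, limit: self.limit })
        }
    }
}
```

Because `start` and `resume` consume their state and live on different types, passing a resume value on the first step is a type error here rather than a runtime convention. A single `resume(&mut self, arg)` method, by contrast, has to accept the same argument type at every step.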

I've created a Zulip thread.

Labels: A-edition-2024 (Area: The 2024 edition), disposition-merge (This RFC is in PFCP or FCP with a disposition to merge it.), finished-final-comment-period (The final comment period is finished for this RFC.), T-lang (Relevant to the language team, which will review and decide on the RFC.), to-announce