Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ranged-based for for user-defined types #1885

Merged
merged 52 commits into from
Aug 25, 2023

Conversation

Pixep
Copy link
Contributor

@Pixep Pixep commented Aug 3, 2022

The goal of this proposal is to provide a way for user-defined types to support range-based iteration with for.
The current proposed solution exposes 3 interfaces that can be implemented by user types to enable support for
ranged-for loops.

Co-authored-by: josh11b [email protected]

@Pixep Pixep changed the title for statement ranged-based for for user-defined types Aug 3, 2022
@Pixep Pixep added the proposal A proposal label Aug 6, 2022
@Pixep Pixep requested a review from a team August 6, 2022 04:28
@Pixep Pixep added the proposal rfc Proposal with request-for-comment sent out label Aug 6, 2022
@Pixep Pixep self-assigned this Aug 6, 2022
@Pixep Pixep marked this pull request as ready for review August 6, 2022 23:31
Copy link
Contributor

@zygoloid zygoloid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is looking promising!

I think we should have at least a sketch of a plan for mutability in this proposal. Given that our main way of providing mutable handles to objects is via pointers, I think we'd want some way to convert a container into a range of pointers to container elements. I think that can be done without any changes to this proposal, for example:

fn F() {
  var v: Array(i32, 5) = ...;
  for (p: i32* in v.MutableElements()) {
    // ...
  }
}

where MutableElements produces an Iterable where .ElementType == .Self.ElementType*. (Maybe we'd want a different interface for converting the container to a sequence of element pointers, but we don't need to decide on that here.)

proposals/p1885.md Outdated Show resolved Hide resolved
proposals/p1885.md Outdated Show resolved Hide resolved
proposals/p1885.md Outdated Show resolved Hide resolved
docs/design/generics/overview.md Outdated Show resolved Hide resolved
proposals/p1885.md Outdated Show resolved Hide resolved
proposals/p1885.md Outdated Show resolved Hide resolved
The solution highlighted embeds the end condition in `Next()`, returns a null
`Optional` at the last elements. This guarantees that `Iterator`s cannot be used
improperly (code safety goal), and exposes a simple interface, but does have a
performance cost (cost of the `Optional`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like it could also have an interoperability and migration cost, because Carbon ranged-for loops won't work with C++ containers, and vice-versa.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One option for addressing interoperability without binding ourselves to the C++ API would be to have the interop layer automatically inject adopters for language-conforming iterators. e.g., a Carbon Iterable could get adapted to look like a C++ iterator for C++ code, whereas a C++ iterator could have an external impl defined for Carbon code (and likely should need that regardless), letting each language keep its own idioms.

(it may make sense to mark interop as an open question since it's not designed out?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will be challenging to bridge the gap between these two APIs with adapters, because there's a substantial "impedance mismatch" between them that's hard to hide.

  • With this design, a Carbon iterator basically plays the same role as a begin/end iterator pair in C++, so there isn't a one-to-one relationship between the objects on either side of the adaptation. That will be a particular problem for automatic adaptation, but may also cause problems for manual adaptation.
  • A C++ forward iterator (or better) must support calling operator* multiple times, returning the same result so long as the iterator is not advanced. To satisfy that requirement, an adaptor would have to cache the value returned from Next.
  • C++ iterators support mutation via operator*. That compounds the previous problem, because mutation has to apply to the original element, not a cached copy.
  • We could avoid those last two problems by saying that Carbon iterators are exposed in C++ as input iterators, but that excludes mutating use cases, and could dramatically degrade performance even for read-only cases, because many algorithms can't be implemented efficiently (or at all) in terms of the input iterator API.

Copy link
Contributor Author

@Pixep Pixep Oct 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last points definitely highlight the need for a Get function, in place of a simpler Next-only approach. I propose to move in that direction.

Regarding the iterator pairs, it is an option to expose an way to get an end iterator for comparison purposes, and interoperability. From a user standpoint, I would probably prefer to not be forced to rely on this end operator, which can be akward at times (needing the container in scope and such), but as a additional feature I believe it makes sense, and likely to have other uses (reverse and such).

Edit: Proposal amended with Get for Iterator.

The solution highlighted embeds the end condition in `Next()`, returns a null
`Optional` at the last elements. This guarantees that `Iterator`s cannot be used
improperly (code safety goal), and exposes a simple interface, but does have a
performance cost (cost of the `Optional`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One option for addressing interoperability without binding ourselves to the C++ API would be to have the interop layer automatically inject adopters for language-conforming iterators. e.g., a Carbon Iterable could get adapted to look like a C++ iterator for C++ code, whereas a C++ iterator could have an external impl defined for Carbon code (and likely should need that regardless), letting each language keep its own idioms.

(it may make sense to mark interop as an open question since it's not designed out?)

proposals/p1885.md Outdated Show resolved Hide resolved
@geoffromer geoffromer mentioned this pull request Aug 15, 2022
proposals/p1885.md Outdated Show resolved Hide resolved
The solution highlighted embeds the end condition in `Next()`, returns a null
`Optional` at the last elements. This guarantees that `Iterator`s cannot be used
improperly (code safety goal), and exposes a simple interface, but does have a
performance cost (cost of the `Optional`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will be challenging to bridge the gap between these two APIs with adapters, because there's a substantial "impedance mismatch" between them that's hard to hide.

  • With this design, a Carbon iterator basically plays the same role as a begin/end iterator pair in C++, so there isn't a one-to-one relationship between the objects on either side of the adaptation. That will be a particular problem for automatic adaptation, but may also cause problems for manual adaptation.
  • A C++ forward iterator (or better) must support calling operator* multiple times, returning the same result so long as the iterator is not advanced. To satisfy that requirement, an adaptor would have to cache the value returned from Next.
  • C++ iterators support mutation via operator*. That compounds the previous problem, because mutation has to apply to the original element, not a cached copy.
  • We could avoid those last two problems by saying that Carbon iterators are exposed in C++ as input iterators, but that excludes mutating use cases, and could dramatically degrade performance even for read-only cases, because many algorithms can't be implemented efficiently (or at all) in terms of the input iterator API.

@zygoloid zygoloid mentioned this pull request Aug 25, 2022
proposals/p1885.md Outdated Show resolved Hide resolved
proposals/p1885.md Show resolved Hide resolved
proposals/p1885.md Outdated Show resolved Hide resolved
proposals/p1885.md Outdated Show resolved Hide resolved
proposals/p1885.md Outdated Show resolved Hide resolved
proposals/p1885.md Outdated Show resolved Hide resolved
proposals/p1885.md Outdated Show resolved Hide resolved
proposals/p1885.md Show resolved Hide resolved
proposals/p1885.md Show resolved Hide resolved
proposals/p1885.md Show resolved Hide resolved
proposals/p1885.md Outdated Show resolved Hide resolved
proposals/p1885.md Outdated Show resolved Hide resolved
proposals/p1885.md Outdated Show resolved Hide resolved
proposals/p1885.md Outdated Show resolved Hide resolved
proposals/p1885.md Outdated Show resolved Hide resolved
Consumable interface can be handled in a different proposal, and is not
key to moving forward for the range-based for support.
@@ -81,6 +85,12 @@ The original `for` proposal does not define a way for user-defined types to
opt-in to support range-based for loops. An approach is proposed in this
document.

This can be achieved using either a cursor or an iterator. A cursor benefits
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth spelling out the difference between "cursor" and "iterator"; I'm not sure everyone is familiar with that terminology.

proposals/p1885.md Outdated Show resolved Hide resolved
// `container` is now empty
```

### Limitations and future language improvements
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably also mention language support for const, and language support for user-defined l-value expressions, because those will determine how iterators should support mutability.

proposals/p1885.md Outdated Show resolved Hide resolved
proposals/p1885.md Outdated Show resolved Hide resolved
proposals/p1885.md Outdated Show resolved Hide resolved
proposals/p1885.md Outdated Show resolved Hide resolved
proposals/p1885.md Outdated Show resolved Hide resolved
proposals/p1885.md Outdated Show resolved Hide resolved
@Pixep Pixep requested a review from geoffromer April 6, 2023 23:53
proposals/p1885.md Outdated Show resolved Hide resolved
proposals/p1885.md Outdated Show resolved Hide resolved
Comment on lines 628 to 629
- `begin()` and `end()` methods or free functions
- supporting pre-increment operator `++` and indirection operator `*`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things:

  • Something to connect these two requirements to make it clear that the type returned by begin() and end() is what has to support these operators.
  • I believe the iterator type must also support inequality comparison (at least). According to https://en.cppreference.com/w/cpp/language/range-for a C++ range-based for loop gets rewritten to include for ( ; __begin != __end; ++__begin).

We may need to be careful about closely matching C++ so we can be as compatible as we can manage, as long as it doesn't impact the usability for ordinary users. In C++, my understanding is as long as the rewrite type checks, the ranged-based for loop is legal, like a template. This means we may need to support begin() and end() returning different types, as long as != is defined on those two types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely yes, clarified.

proposals/p1885.md Outdated Show resolved Hide resolved
proposals/p1885.md Outdated Show resolved Hide resolved
proposals/p1885.md Outdated Show resolved Hide resolved
proposals/p1885.md Outdated Show resolved Hide resolved
proposals/p1885.md Show resolved Hide resolved
Comment on lines 775 to 779
auto begin() -> IterateIterator<Iterable<C, T>> {
return IterateIterator<Iterable<C, T>>(*this, NewCursor());
}
auto end() -> IterateIterator<Iterable<C, T>> {
return IterateIterator<Iterable<C, T>>::Invalid(*this);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is Iterable<...> here? It looks like IterateIterator expects to be parameterized by a container type. Is Iterable some sort of adapter to define the CursorType and ElementType members? I don't see it defined anywhere else in this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a typo, and should be Iterate<C, T>, which is the C++ equivalent to Carbon's type. Fixed.

Comment on lines +715 to +717
need to copy and cache the last value, to allow for `*it` to be called. This
puts a requirement on `ElementType` to be copyable for this interoperability to
be usable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will still work if ElementType is move-only (but see the comment below), it just means that IterateIterator will be move-only in those cases. That means it won't be a valid iterator according to the standard library, but it should still work with range-based for loops.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just looked back at the code:

     if (const auto v = container_->Next(*cursor_); v) {
      value_ = *std::move(v);
    }

Won't Next() require ElementType to be copyable nonetheless, as we are effectively returning a copy of an element, and not moving anything from the container? Even if we avoid an optional-like wrapper, or just move from it, the ElementType still has to be copied at some point (likely withing Next()).

Maybe I forgot something between now and then.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In recent versions of C++, if Next returns an ElementType rather than a reference, the code you quote won't actually copy or move the return value; it will use v itself as the storage for the returned temporary. Now, you're right that in this context, the body of Next will very likely need to make a copy, but Next is written in pure Carbon, so that problem isn't caused by caching in the interoperability layer, and in fact it's not caused by the interop layer at all -- the same problem exists in pure Carbon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that this is by design, and not caused by the interop. To expand, I think we could say "move-only" in the C++ domain, most that would be ignoring a very likely "copiable" requirement on Carbon side / Next. So all-in-all, "copyable" seem more correct, and the upper bound of reqs, vs the lower bound.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that this doc doesn't explicitly talk about copyability on the Carbon side, so by introducing it here as a new topic that's specifically tied to caching in the interop layer, it really makes it sound like native Carbon will not require copyability, but C++ interop will.

I'd suggest revising the "Large elements and Optional" section to explicitly talk about the fact that this design may force Next to make a copy, and hence require ElementType to be copyable. Then down here you can say that the caching requires ElementType to be movable, and note that this won't matter in practice if Carbon requires it to be copyable anyway.

proposals/p1885.md Outdated Show resolved Hide resolved
proposals/p1885.md Show resolved Hide resolved
Pixep and others added 2 commits April 23, 2023 15:18
Copy link
Contributor

@josh11b josh11b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is about ready for leads review

for (v: my_dict.(Iterate.ElementType) in my_dict.reversed) {}
```

### Large elements and Optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chandlerc I feel like we had a conversation about this and r-value representations. Could you recall where we ended up with this?

proposals/p1885.md Outdated Show resolved Hide resolved
Copy link
Contributor

@josh11b josh11b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is about ready for leads review

Copy link
Contributor

@zygoloid zygoloid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this. It seems likely to me that we'll want to revisit some aspects of this eventually, such as ergonomic improvements for mutable iteration, but this seems like a good foundation to use as the basis for further exploration.

Copy link
Contributor

@zygoloid zygoloid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also approving on behalf of leads. Thank you for working on this!

@zygoloid zygoloid enabled auto-merge July 1, 2023 22:19
@zygoloid zygoloid added this pull request to the merge queue Aug 25, 2023
Merged via the queue into carbon-language:trunk with commit 1d4e9ee Aug 25, 2023
6 checks passed
github-merge-queue bot pushed a commit that referenced this pull request Sep 8, 2023
Incorporates proposals: #1885, #2138, #2188, #2200, #2360, #2760, #2964,
and #3162.

---------

Co-authored-by: Richard Smith <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal rfc Proposal with request-for-comment sent out proposal A proposal
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants