Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ExactSizeIterator impl for iter::Chain #66531

Conversation

LukasKalbertodt
Copy link
Member

In the issue about this impl (#34433), people concluded that it was impossible to add due to usize overflowing: if we create an iterator with Chain that contains more than usize::MAX items, then ExactSizeIterator::len would not be able to return the correct value. This is true, but I don't think that's a reason not to add the impl:

  • There are plenty of ways to create iterators with more than usize::MAX elements without using Chain. Range<u128>, iter::repeat, flat_map, just to name a few.
  • There are already multiple examples of functions involving usize that state they would panic on overflow: Iterator::count, Iterator::enumerate, Iterator::position.

So the "long iterator usize overflow" problem is already a thing and more importantly: it is already well documented how affected methods behave. So I don't think it's a problem to add this impl if it is properly documented (which hopefully this PR does). After that issue was closed, two people mentioned how useful that impl would be and that a panic in the overflow case would be certainly fine. I agree with that.

r? @SimonSapin
CC @oli-obk @RalfJung @bluss (some people involved in the original issue)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 18, 2019
@Centril
Copy link
Contributor

Centril commented Nov 19, 2019

This feels quite reasonable; cc also @scottmcm

@scottmcm scottmcm added needs-fcp This change is insta-stable, so needs a completed FCP to proceed. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Nov 20, 2019
@scottmcm
Copy link
Member

This affects far more than just what's mentioned here. There are a bunch of implementations that rely on DEI+ESI to be able to correctly propagate DEI (because of things like rposition) that this would also break, such as https://doc.rust-lang.org/std/iter/struct.Zip.html#impl-DoubleEndedIterator

Also, this PR doesn't maintain the requirement from the ESI docs that

the implementation of size_hint must return the exact size of the iterator.

So I'm skeptical. It sounds like the actual proposed change here is essentially to change ESI's requirements to those of TrustedLen, just without the unsafe part and with a len method that does .size_hint().1.unwrap()?

I feel like there's a big difference between methods that can panic because they need to iterate (because anything can be .map(|_| panic!())) and methods that look like accessors but which can panic.

@LukasKalbertodt
Copy link
Member Author

Also, this PR doesn't maintain the requirement from the ESI docs that

the implementation of size_hint must return the exact size of the iterator.

Good point, I missed that.

In the issue I linked Range<u32> was brought up: it implements ExactSizeIterator despite Rust supporting 16 bit systems. The impl for Range<u64> was removed here. So in theory we should also remove the Range<u32> impl.

It sounds like the actual proposed change here is essentially to change ESI's requirements

I did not knowingly propose that. But yes, it seems like with the current definitions of the traits, it's indeed impossible to add this impl. But that would be a shame! I think this impl for Chain is something that is simply useful.

I don't know if/how much we could change ExactSizeIterator. Adding an exception to the trait would be helpful:

If the length of the iterator overflows usize, then len must panic and size_hint is not required to report the exact length (though it should return None as upper bound).

But that's technically not a backwards-compatible change as we relax some conditions. So what code would this break? I think the change that len must panic in that case is completely fine, because the trait does not currently guarantee that len would not panic. In general, people have to assume that some methods panic.

Regarding "correct size_hint": you could write code where you bound a type by ExactSizeIterator and then only use size_hint in the assumption that it always returns the exact size. But that code currently also can't deal with iterators that have more than usize elements (because size_hint simply can't return the correct length in those cases).

So I think the only real breakage is in code that uses ExactSizeIterator to infer that an iterator has less than usize elements. This can technically be inferred from the current ESI definition (although it is never explicitly stated!). This honestly doesn't sound like it would affect any real code, really. In particular, the change itself does not break code, it would simply change some minor assumptions, which in turn could lead to changes in user code which then could break things.

@Centril Centril added the relnotes Marks issues that should be documented in the release notes of the next release. label Nov 26, 2019
@SimonSapin
Copy link
Contributor

I’m sorry, I don’t have bandwidth at the moment look into and make a judgment about the consequences of this proposal. Reassigning to someone else in T-libs at random: r? @sfackler

@rust-highfive rust-highfive assigned sfackler and unassigned SimonSapin Nov 29, 2019
@Dylan-DPC-zz
Copy link

r? @KodrAus

@rust-highfive rust-highfive assigned KodrAus and unassigned sfackler Dec 10, 2019
@bors
Copy link
Contributor

bors commented Dec 25, 2019

☔ The latest upstream changes (presumably #67596) made this pull request unmergeable. Please resolve the merge conflicts.

Comment on lines +408 to +409
/// An overflow in those methods leads to a wrong result or a panic. If
/// debug assertions are enabled, a panic is guaranteed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that we could guarantee a panic whether or not debug assertions are enabled? Reserving the right to give a wrong result doesn't seem helpful here.

Copy link
Contributor

@timvermeulen timvermeulen Jan 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reserving this right is "helpful" in the sense that it means we don't have any performance overhead from overflow checks in release builds.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems wrong that len() could return something smaller than size_hint().0 on an iterator that implements TrustedLen. What sort of overhead are we talking about in practice?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to all methods listed in that doc comment, not just len. Iterator::count currently does not check for overflow in release builds because it simply adds 1 repeatedly. In the case of len, I agree that it should probably panic rather than return an incorrect value, if we were to land this change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

len guarantees a panic on overflow. I changed this in response to this comment.

This sentence here is a bit vague, yes. It's still technically correct for len, but maybe I should add something like "ExactIterator::len even guarantees a panic in release mode"? On the other hand, this behavior is already documented on that method (see above).

As for the other methods: I don't think it's in the scope of this PR to change them. And currently they only have debug checks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. In that case I am on board with this change. Want to propose FCP?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dtolnay I'm not quite sure if you want me to add another sentence to this part of the documentation or not. (Will rebase either way soon.)

Regarding FCP: the main concern is this one. I replied to that comment, but no one else did, so I don't think it has been resolved yet. But maybe FCP is a way to get the others to look at this?

@Dylan-DPC-zz Dylan-DPC-zz added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 18, 2020
@KamilaBorowska
Copy link
Contributor

KamilaBorowska commented Jan 18, 2020

I think I like this change, but I think it would be better to change ExactSizeIterator to have the same requirements as TrustedLen (except without allowing unsafe code to depend on it to be correctly implemented because it's a safe trait). Then implement ExactSizeIterator for Chain<A, B>, Cycle<I>, Range<A>, RangeFrom<A>, RangeInclusive<A>, Repeat<A>, and RepeatWith<F>. Curiously, Cycle<I> doesn't have TrustedLen implementation, but I think it was simply missed (I don't think there would be anything preventing it).

Infinite iterators are kind of an edge case anyway, I think panicking or returning weird results is acceptable.

This shouldn't cause libraries that didn't have UB to suddenly have UB because the trait was safe, so libraries that did depend on that trait to be correct were incorrect to begin with.

@timvermeulen
Copy link
Contributor

Peekable, too.

@ollie27
Copy link
Member

ollie27 commented Jan 19, 2020

If you take the iterator:

(0..usize::max_value()).chain(0..10).skip(usize::max_value())

It returns exactly 10 items and has a size_hint of (0, None) but that's fine because it doesn't implement ExactSizeIterator. With this PR it suddenly would which means that ExactSizeIterator would become completely meaningless or Skip wouldn't be able to implement it anymore. Either way this is very much a breaking change.

@Dylan-DPC-zz
Copy link

Any updates @LukasKalbertodt

@KamilaBorowska
Copy link
Contributor

KamilaBorowska commented Feb 3, 2020

Skip seems like a convincing argument to me why this shouldn't be done. It would be a breaking change to remove len method from Skip iterators.

Yes, this particular iterator is an edge case because calling its next() method will go into an infinite loop. However, something like (0..usize::max_value()).chain(0..10).skip(20) doesn't have known length, but wouldn't meet changed ExactSizeIterator requirements.

It would be more practical to make this change if Skip wasn't an ExactSizeIterator, but it cannot be done now.

@LukasKalbertodt
Copy link
Member Author

Thinking about it again, I might agree that this change is too tricky to land. The conditions for when len() is allowed to panic would be pretty complex. Is someone still in favor of this change and would dislike me closing this PR?

Also, I got another idea how one could easily work around limitations like this (i.e. Chain not implementing ExactSizeIterator). I filed this here: #68995.

@LukasKalbertodt
Copy link
Member Author

Closing now, as explained in the previous comment.

@kvark
Copy link
Contributor

kvark commented Aug 17, 2020

I wonder if we can come back to this in the next edition. For example, by no longer implementing ExactSizeIterator for Skip? It still seems like a very narrow soundness hole in the sea of usefulness for ExactSizeIterator.

@KamilaBorowska
Copy link
Contributor

Editions don't allow for different trait implementations, there is only one "core Rust", see https://rust-lang.github.io/rfcs/2052-epochs.html.

@ssomers
Copy link
Contributor

ssomers commented Dec 22, 2021

My gut reaction is that size_hint should panic when len would. But that's an even bigger breaking change for those that didn't care about the ExactSizeIterator-ness before. There's a perfectly working example that shouldn't start panicking because size_hint has the urge to be exact but doesn't know how.

It's possible though to let people opt in to this strict behaviour, if we provide an additional method (for instance, chain_exact) to chain exact iterators and preserve exactness (or panic), as in this branch.

Yes, this particular iterator is an edge case because calling its next() method will go into an infinite loop.

@xfix, which iterator is this about?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-fcp This change is insta-stable, so needs a completed FCP to proceed. relnotes Marks issues that should be documented in the release notes of the next release. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.