Add Iterator::array_chunks()
#92393
(rust-highfive has picked a reviewer for you, use r? to override)
Force-pushed from 002a875 to 66d8831.
See #87776 for a previous PR attempting that, some of the discussion might be relevant.
@the8472 Thanks, I will check it out. I compared the benchmarks, and that implementation is definitely a lot more performant 😅. I will look into improving this one using some ideas from that one.
Force-pushed from 66d8831 to 3ceb0cd.
Okay so I have basically taken #87776 and improved it, most notably the following:
Using the benchmark functions the performance is now the following
Compared to previously
/// Returns a reference to the remaining elements of the original iterator
/// that are not going to be returned by this iterator. The returned slice
/// has at most `N-1` elements.
Does this need to mention that it is only useful after next() returns None?
Edit: Nevermind, my comment was nonsense, I forgot that this doesn't have access to the underlying source.
Does this need to mention that it is only useful after next() is None?
Yeah, this is a good point. I was thinking it might also be useful for this to return an Option so that it is easy to tell the difference between an empty remainder and a not yet known remainder. What do you think?
I'd probably suggest a single fn into_remainder(self) -> Option<array::IntoIter> method, and that'll also let us get rid of the custom remainder struct entirely, I think? (We can directly store the Option<array::IntoIter> as the second field.)
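A minimal safe sketch of the Option-returning remainder discussed above. It assumes a hypothetical adapter, ArrayChunksSketch, that buffers into a Vec instead of an array::IntoIter (building a partially filled array::IntoIter is not straightforward in safe, stable code). into_remainder returns None until the inner iterator has run out, and Some (possibly empty) afterwards, so an empty remainder is distinguishable from one that is not known yet.

```rust
use std::convert::TryFrom;

/// Hypothetical adapter for illustration only; not the std `ArrayChunks`.
struct ArrayChunksSketch<I: Iterator, const N: usize> {
    iter: I,
    // `None` until the inner iterator is exhausted, then `Some(leftover)`.
    remainder: Option<Vec<I::Item>>,
}

impl<I: Iterator, const N: usize> ArrayChunksSketch<I, N> {
    fn new(iter: I) -> Self {
        // With N == 0, `next` would yield empty arrays forever.
        assert!(N != 0, "chunk size must be non-zero");
        ArrayChunksSketch { iter, remainder: None }
    }

    /// Consumes the adapter; `None` means the remainder is not known yet.
    fn into_remainder(self) -> Option<Vec<I::Item>> {
        self.remainder
    }
}

impl<I: Iterator, const N: usize> Iterator for ArrayChunksSketch<I, N> {
    type Item = [I::Item; N];

    fn next(&mut self) -> Option<Self::Item> {
        if self.remainder.is_some() {
            // Already exhausted: do not poll the inner iterator again.
            return None;
        }
        // Pull at most N items from the inner iterator.
        let buf: Vec<I::Item> = self.iter.by_ref().take(N).collect();
        match <[I::Item; N]>::try_from(buf) {
            Ok(chunk) => Some(chunk),
            Err(leftover) => {
                // Fewer than N items were left: record them as the remainder.
                self.remainder = Some(leftover);
                None
            }
        }
    }
}
```

Storing the Option directly as the second field, as suggested, is what removes the need for a separate remainder struct in this sketch.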
struct FrontGuard<T, const N: usize> {
    /// A pointer to the array that is being filled. We need to use a raw
    /// pointer here because of the lifetime issues in the fold implementations.
    ptr: *mut T,
It seems like we could store the array directly inside the guard?
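One way to read the suggestion of storing the array inside the guard: a sketch, assuming a hypothetical ArrayGuard type (not the PR's FrontGuard and not an existing core type), where the guard owns a [MaybeUninit<T>; N] plus a count of initialized slots. Its Drop impl drops only the initialized prefix, so no raw pointer into a separately owned array is needed.

```rust
use std::mem::{self, MaybeUninit};
use std::ptr;

/// Hypothetical guard that owns the array being filled, instead of
/// holding a raw pointer to an array that lives elsewhere.
struct ArrayGuard<T, const N: usize> {
    array: [MaybeUninit<T>; N],
    initialized: usize, // slots [0, initialized) hold valid values
}

impl<T, const N: usize> ArrayGuard<T, N> {
    fn new() -> Self {
        // SAFETY: an array of `MaybeUninit<T>` does not require initialization.
        let array = unsafe { MaybeUninit::<[MaybeUninit<T>; N]>::uninit().assume_init() };
        ArrayGuard { array, initialized: 0 }
    }

    /// Writes the next slot; panics (via the index check) if already full.
    fn push(&mut self, value: T) {
        self.array[self.initialized] = MaybeUninit::new(value);
        self.initialized += 1;
    }

    /// Returns the full array, or gives the guard back if it is not full yet.
    fn finish(self) -> Result<[T; N], Self> {
        if self.initialized < N {
            return Err(self);
        }
        // SAFETY: all N slots are initialized, and `[MaybeUninit<T>; N]` has
        // the same layout as `[T; N]`. The guard is forgotten afterwards so
        // its `Drop` does not drop the values that were just moved out.
        let array = unsafe { ptr::read(&self.array as *const [MaybeUninit<T>; N] as *const [T; N]) };
        mem::forget(self);
        Ok(array)
    }
}

impl<T, const N: usize> Drop for ArrayGuard<T, N> {
    fn drop(&mut self) {
        // Drop only the initialized prefix, e.g. when the source iterator
        // ran out early or panicked mid-fill.
        for slot in &mut self.array[..self.initialized] {
            // SAFETY: every slot below `initialized` was written by `push`.
            unsafe { slot.assume_init_drop() };
        }
    }
}
```

This only illustrates the ownership shape; the thread below argues for reusing the existing helpers in core::array rather than adding more guard types.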
}

#[unstable(feature = "iter_array_chunks", reason = "recently added", issue = "none")]
impl<I, const N: usize> DoubleEndedIterator for ArrayChunks<I, N>
I think it may make sense to not have double ended iterators? It feels a little unclear that the semantics of cutting tail elements are the ones users will expect. Do we have a use case for backwards chunk iteration?
It seems plausible that we could expose a dedicated .skip_remainder() -> ArrayChunksRev<I, N> or similar, perhaps, to more explicitly skip it...
I do think it is worth having, because .rev().array_chunks() is different from .array_chunks().rev(). It is also worth noting that the following APIs have DoubleEndedIterator implementations that behave in this way and cut the tail elements.
However, I do realize that since iterators are often used in a consuming way, this might be strange. Also, it does require an ExactSizeIterator bound. Additionally, I haven't personally found a use case for this, so I am happy to remove this implementation.
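The difference between the two orderings can be seen on stable Rust with slice::chunks_exact, whose DoubleEndedIterator impl likewise sets the trailing remainder aside. This is an analogy for the proposed ArrayChunks, not its implementation; the helper names are hypothetical.

```rust
/// Chunk first, then reverse: `ChunksExact` keeps the trailing remainder
/// aside, so iterating from the back starts at the last *full* chunk.
fn chunk_then_rev(v: &[i32]) -> Vec<Vec<i32>> {
    v.chunks_exact(2).rev().map(|c| c.to_vec()).collect()
}

/// Reverse first, then chunk: the leftover element is now the original first one.
fn rev_then_chunk(v: &[i32]) -> Vec<Vec<i32>> {
    let reversed: Vec<i32> = v.iter().rev().copied().collect();
    reversed.chunks_exact(2).map(|c| c.to_vec()).collect()
}
```

For [1, 2, 3, 4, 5], the first yields [3, 4] then [1, 2] (5 is cut), while the second yields [5, 4] then [3, 2] (1 is left over).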
// SAFETY: `array` will still be valid if `guard` is dropped.
let mut guard = unsafe { FrontGuard::new(&mut array) };

for slot in array.iter_mut() {
Suggestion: for any next method where there's a loop, consider just using self.try_for_each(ControlFlow::Break).break_value() instead. That's simpler code, less unsafe, gives better coverage of your try_fold override, and should be just as fast -- or even better for inner iterators that have their own try_fold overrides.
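A sketch of the loop-free next pattern in user code. Because try_fold cannot be overridden on stable Rust (its signature names the unstable Try trait), this drives the inner iterator's try_for_each rather than self.try_for_each. KeepIf and first are hypothetical names; the match spelled out here does the same conversion as the reviewer's .break_value().

```rust
use std::ops::ControlFlow;

/// Hypothetical filter-like adapter, used only to show the pattern.
struct KeepIf<I, P> {
    iter: I,
    pred: P,
}

impl<I, P> Iterator for KeepIf<I, P>
where
    I: Iterator,
    P: FnMut(&I::Item) -> bool,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        let pred = &mut self.pred;
        // Instead of a hand-written `loop { match self.iter.next() { .. } }`,
        // let the inner iterator drive itself through `try_for_each`, which
        // goes through its (possibly specialized) `try_fold`.
        match self.iter.try_for_each(|x| {
            if pred(&x) {
                ControlFlow::Break(x) // stop and hand the item out
            } else {
                ControlFlow::Continue(()) // keep scanning
            }
        }) {
            ControlFlow::Break(x) => Some(x),
            ControlFlow::Continue(()) => None, // inner iterator exhausted
        }
    }
}

/// The bare idiom from the review: behaves like `it.next()`.
fn first<I: Iterator>(it: &mut I) -> Option<I::Item> {
    match it.try_for_each(ControlFlow::Break) {
        ControlFlow::Break(x) => Some(x),
        ControlFlow::Continue(()) => None,
    }
}
```

For example, collecting KeepIf over 1..10 with a divisible-by-3 predicate gives [3, 6, 9].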
let (lower, upper) = self.iter.size_hint();
// Keep infinite iterator size hint lower bound as `usize::MAX`. This
// is required to implement `TrustedLen`.
if lower == usize::MAX {
This is incorrect. The lower bound can be usize::MAX without being infinite -- take repeat(x).take(usize::MAX), for example, or just [(); usize::MAX].into_iter(). Nor is (usize::MAX, None) certain to be infinite -- especially on 16-bit platforms.
Adapters that make things shorter can propagate ExactSizeIterator, but not TrustedLen. (And, conversely but not relevant here, adapters that make things longer, like Chain, can propagate TrustedLen but not ExactSizeIterator.)
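A sketch of what a plain size_hint for a chunking adapter could look like without the usize::MAX special case. chunked_size_hint is a hypothetical helper: it floor-divides both bounds, which stays a valid bound for any input because len / n never decreases as len grows. It is only a hint, not a TrustedLen promise.

```rust
/// Hypothetical `size_hint` for an adapter yielding chunks of `n` items.
/// No special case for `usize::MAX`: a lower bound of `usize::MAX` does
/// not imply an infinite iterator.
fn chunked_size_hint(hint: (usize, Option<usize>), n: usize) -> (usize, Option<usize>) {
    assert!(n != 0, "chunk size must be non-zero");
    let (lower, upper) = hint;
    (lower / n, upper.map(|u| u / n))
}
```

For instance, std::iter::repeat(0u8).take(usize::MAX).size_hint() is (usize::MAX, Some(usize::MAX)): a finite iterator whose lower bound is usize::MAX, which is exactly the case the lower == usize::MAX check misclassifies.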
Okay thanks, so you are saying we can't implement TrustedLen for this at all because it makes the iterator shorter? I think I got a little confused by this explanation in the TrustedLen documentation. It might need to be extended a little.
/// The iterator reports a size hint where it is either exact
/// (lower bound is equal to upper bound), or the upper bound is [`None`].
/// The upper bound must only be [`None`] if the actual iterator length is
/// larger than [`usize::MAX`]. In that case, the lower bound must be
/// [`usize::MAX`], resulting in an [`Iterator::size_hint()`] of
/// `(usize::MAX, None)`.
///
/// The iterator must produce exactly the number of elements it reported
/// or diverge before reaching the end.
Okay thanks, so you are saying we can't implement TrustedLen for this at all because it makes the iterator shorter?
That's correct.
For a specific example, if the actual length is usize::MAX + 3 or usize::MAX + 5, the size_hint will be (usize::MAX, None). But array_chunks::<2> would need to return a length of exactly usize::MAX/2 + 2 or usize::MAX/2 + 3, respectively, which it can't, and thus it can't be TrustedLen.
(Similar logic explains why https://doc.rust-lang.org/std/iter/struct.Skip.html doesn't implement TrustedLen.)
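The arithmetic in this example can be checked in u128, where lengths past usize::MAX are representable. reported_hint is a hypothetical model of a source that reports exact hints for lengths that fit in usize and (usize::MAX, None) beyond; both chunk counts fit in usize, so a TrustedLen adapter would have to report each one exactly, yet it only ever sees the same inner hint.

```rust
/// Hypothetical model of a `TrustedLen`-style source: exact hints while the
/// length fits in `usize`, `(usize::MAX, None)` once it does not.
fn reported_hint(actual_len: u128) -> (usize, Option<usize>) {
    if actual_len > usize::MAX as u128 {
        (usize::MAX, None)
    } else {
        let n = actual_len as usize;
        (n, Some(n))
    }
}

/// Exact number of full chunks of size `n`, computed in `u128` so that
/// input lengths past `usize::MAX` are representable.
fn exact_chunk_count(actual_len: u128, n: u128) -> u128 {
    actual_len / n
}
```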
// SAFETY: `array` will still be valid if `guard` is dropped.
let mut guard = unsafe { FrontGuard::new(&mut array) };

let result = self.iter.try_fold(init, |mut acc, item| {
This is really useful functionality, so consider whether there's a way to expose it on Iterator directly without the adapter, and then make the adapter really simple. (Vague analogy: https://doc.rust-lang.org/nightly/std/primitive.slice.html#method.array_chunks is a simple wrapper around https://doc.rust-lang.org/nightly/std/primitive.slice.html#method.as_chunks)
For example, imagine you had
    pub trait Iterator {
        fn next_chunk<const N: usize>(&mut self) -> Result<[T; N], array::IntoIter<T, N>>;
    }
That's usable independently, overridable for array and slice iterators that can do it more efficiently, and would be exactly the piece you need to make the ArrayChunks iterator easy to write in safe code.
(It'd be nice if it were -> Result<[T; N], array::IntoIter<T, N-1>>, but that's probably not writable right now.)
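A safe, stable sketch of the next_chunk idea as a free function. It is hypothetical and deviates from the proposed signature: the partial chunk comes back as a Vec rather than an array::IntoIter<T, N>, since constructing a partially filled array::IntoIter is not straightforward in safe, stable code.

```rust
use std::convert::TryFrom;

/// Hypothetical free-function version of `next_chunk`.
/// `Ok` holds exactly N items; `Err` holds the fewer-than-N items that were left.
fn next_chunk<I: Iterator, const N: usize>(iter: &mut I) -> Result<[I::Item; N], Vec<I::Item>> {
    // Pull at most N items; `by_ref` leaves the rest of `iter` untouched.
    let items: Vec<I::Item> = iter.by_ref().take(N).collect();
    // The conversion succeeds only if exactly N items were collected;
    // otherwise the `Vec` is handed back unchanged.
    <[I::Item; N]>::try_from(items)
}
```

With such a primitive, an adapter's next reduces to next_chunk(&mut self.iter).ok(), plus storing the Err payload as the remainder.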
That could be useful to manually unroll an iterator. E.g. we know that a vec goes from 0 to 8 capacity with its default allocation behavior (for small T) and then grows in multiples from there. So we could try unrolling the loop in batches of N <= 8. But this might still be difficult to optimize for Chain adapters if a middle part straddles the chain boundary.
A try_fold_chunked might be easier to optimize because it would split the loop into 3 parts. Going from try_fold_chunked to chunked_next is easier than the other way around.
// SAFETY: The array will still be valid if `guard` is dropped and
// it is forgotten otherwise.
let mut guard = unsafe { FrontGuard::new(&mut array) };
There's lots of unsafe here, as has often been the case in PRs for new methods using const generic arrays.
Please re-use the stuff from core::array for this (see rust/library/core/src/array/mod.rs, line 792 in 498eeb7:
fn try_collect_into_array<I, T, R, const N: usize>(iter: &mut I) -> Option<R::TryType>
array module, or something, but that's better than making more versions of these Guard types in more places.
Agree it makes sense to have this stuff in core::array 👍
Force-pushed from 3ceb0cd to 6adcad8.
ping from triage: FYI: when a PR is ready for review, send a message containing
☔ The latest upstream changes (presumably #94787) made this pull request unmergeable. Please resolve the merge conflicts.
closing this as inactive
…-ou-se Add `Iterator::next_chunk`
See also rust-lang#92393
### Prior art
- [`Itertools::next_tuple()`](https://docs.rs/itertools/latest/itertools/trait.Itertools.html#method.next_tuple)
### Unresolved questions
- Should we also add `next_chunk_back` to `DoubleEndedIterator`?
- Should we rather call this `next_array()` or `next_array_chunk`?
Add `Iterator::array_chunks` (take N+1)
A revival of rust-lang#92393.
r? `@Mark-Simulacrum`
cc `@rossmacarthur` `@scottmcm` `@the8472`
I've tried to address most of the review comments on the previous attempt. The only thing I didn't address is the `try_fold` implementation; I've left the "custom" one for now, as I'm not sure what exactly it should use.
This has been implemented similarly in itertools as .tuples() (Itertools::tuples()). But it makes more sense with arrays, since all elements are the same type. I will update the stability attributes with a tracking issue if accepted.
See also #92394
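To illustrate why arrays fit better than tuples here, a sketch on stable Rust that uses slice::chunks_exact as a stand-in for the proposed Iterator::array_chunks (which works on any iterator, not just slices). The function names and the RGB framing are hypothetical. Because every element of an array chunk has the same type, the chunk itself can be iterated, which a tuple such as (u8, u8, u8) cannot.

```rust
use std::convert::TryFrom;

/// Groups a flat byte buffer into `[u8; 3]` chunks (e.g. RGB pixels),
/// dropping any incomplete trailing chunk.
fn rgb_pixels(bytes: &[u8]) -> Vec<[u8; 3]> {
    bytes
        .chunks_exact(3)
        // Each chunk has exactly 3 elements, so the conversion cannot fail.
        .map(|c| <[u8; 3]>::try_from(c).expect("chunks_exact yields length-3 slices"))
        .collect()
}

/// Homogeneous chunks can be iterated and summed directly.
fn brightness(pixel: [u8; 3]) -> u32 {
    pixel.iter().map(|&b| u32::from(b)).sum()
}
```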