-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expand in-place iteration specialization to Flatten, FlatMap and ArrayChunks #110353
Conversation
r? @cuviper (rustbot has picked a reviewer for you, use r? to override) |
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
|
1152eea
to
4048d32
Compare
|
||
use super::{InPlaceDrop, InPlaceDstBufDrop, SpecFromIter, SpecFromIterNested, Vec}; | ||
|
||
/// Specialization marker for collecting an iterator pipeline into a Vec while reusing the | ||
/// source allocation, i.e. executing the pipeline in place. | ||
#[rustc_unsafe_specialization_marker] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the specialization-thicket is growing more complex and I encountered a bunch of ICEs along the way... relying on one fewer of these hopefully makes things a bit less precarious.
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
⌛ Trying commit 4048d32c608af1520e9fb65b6214fee5e419534a with merge 911084b0a23ea94b199d68b6de70f2a07e858278... |
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (911084b0a23ea94b199d68b6de70f2a07e858278): comparison URL. Overall result: ❌ regressions - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. |
impl<T> KnownExpansionFactor for Once<T> {} | ||
impl<T, const N: usize> KnownExpansionFactor for [T; N] { | ||
const FACTOR: Option<NonZeroUsize> = NonZeroUsize::new(N); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if you're trying to be complete, but I think this could add: option::{Iter, IterMut}
, Result
, result::{IntoIter, Iter, IterMut}
, OnceWith
, and array::IntoIter
.
And then adapters... or would that run into specialization woes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used for in-place iteration which means items will be moved through the pipeline so it's unlikely that items will be by-ref iterators. It's not impossible, e.g. when borrowing from some global state or something, but less likely.
Adapters also wouldn't make much sense for flatten()
. But it could work for flat_map()
. So that's an interesting idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. It's not exhaustive but hopefully covers common shapes.
// e.g. | ||
// - 1 x [u8; 4] -> 4x u8, via flatten | ||
// - 4 x u8 -> 1x [u8; 4], via array_chunks | ||
mem::size_of::<SRC>() * step_merge.get() == mem::size_of::<DEST>() * step_expand.get() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't we also work with a reduction in size? i.e. src_size * merge >= dest_size * expand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that would be possible. We could even go for lower alignment (which would also require a allocator::shrink).
But I'd want to benchmark those separately because the tradeoffs might be different. It could hit realloc more often which could end up being costly or it could waste a more space in reused but mostly empty allocations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, benchmarks still look fine.
1696710
to
4db35cd
Compare
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
⌛ Trying commit 4db35cd1e0fdaa0be1f7a1c80a894a0aa22c91e5 with merge addfbc34bf4fab5c5360d413cbdb421be5165e10... |
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (addfbc34bf4fab5c5360d413cbdb421be5165e10): comparison URL. Overall result: ❌ regressions - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. |
4db35cd
to
d91cd0c
Compare
@bors retry (docker limit) |
…uviper Expand in-place iteration specialization to Flatten, FlatMap and ArrayChunks This enables the following cases to collect in-place: ```rust let v = vec![[0u8; 4]; 1024] let v: Vec<_> = v.into_iter().flatten().collect(); let v: Vec<Option<NonZeroUsize>> = vec![NonZeroUsize::new(0); 1024]; let v: Vec<_> = v.into_iter().flatten().collect(); let v = vec![u8; 4096]; let v: Vec<_> = v.into_iter().array_chunks::<4>().collect(); ``` Especially the nicheful-option-flattening should be useful in real code.
💥 Test timed out |
Hanging while building bootstrap, which happens before building std. @bors retry |
☀️ Test successful - checks-actions |
Finished benchmarking commit (df0295f): comparison URL. Overall result: ❌✅ regressions and improvements - no action needed@rustbot label: -perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 672.848s -> 672.731s (-0.02%) |
Fix in-place collect not reallocating when necessary Regression introduced in rust-lang#110353. This was [caught by miri](https://rust-lang.zulipchat.com/#narrow/stream/269128-miri/topic/Cron.20Job.20Failure.20.28miri-test-libstd.2C.202023-11.29/near/404764617) r? `@saethlin`
Fix in-place collect not reallocating when necessary Regression introduced in rust-lang/rust#110353. This was [caught by miri](https://rust-lang.zulipchat.com/#narrow/stream/269128-miri/topic/Cron.20Job.20Failure.20.28miri-test-libstd.2C.202023-11.29/near/404764617) r? `@saethlin`
…viper Remove alignment-changing in-place collect This removes the alignment-changing in-place collect optimization introduced in rust-lang#110353 Currently stable users can't benefit from the optimization because GlobaAlloc doesn't support alignment-changing realloc and neither do most posix allocators. So in practice it has a negative impact on performance. Explanation from rust-lang#120091 (comment): > > You mention that in case of alignment mismatch -- when the new alignment is less than the old -- the implementation calls `mremap`. > > I was trying to note that this isn't really the case in practice, due to the semantics of Rust's allocator APIs. The only use of the allocator within the `in_place_collect` implementation itself is [a call to `Allocator::shrink()`](https://github.com/rust-lang/rust/blob/db7125f008cfd72e8951c9a863178956e2cbacc3/library/alloc/src/vec/in_place_collect.rs#L299-L303), which per its documentation [allows decreasing the required alignment](https://doc.rust-lang.org/1.75.0/core/alloc/trait.Allocator.html). However, in stable Rust, the only available `Allocator` is [`Global`](https://doc.rust-lang.org/1.75.0/alloc/alloc/struct.Global.html), which delegates to the registered `GlobalAlloc`. Since `GlobalAlloc::realloc()` [cannot change the required alignment](https://doc.rust-lang.org/1.75.0/core/alloc/trait.GlobalAlloc.html#method.realloc), the implementation of [`<Global as Allocator>::shrink()`](https://github.com/rust-lang/rust/blob/db7125f008cfd72e8951c9a863178956e2cbacc3/library/alloc/src/alloc.rs#L280-L321) must fall back to creating a brand-new allocation, `memcpy`ing the data into it, and freeing the old allocation, whenever the alignment doesn't remain exactly the same. > > Therefore, the underlying allocator, provided by libc or some other source, has no opportunity to internally `mremap()` the data when the alignment is changed, since it has no way of knowing that the allocation is the same.
…viper Remove alignment-changing in-place collect This removes the alignment-changing in-place collect optimization introduced in rust-lang#110353 Currently stable users can't benefit from the optimization because GlobaAlloc doesn't support alignment-changing realloc and neither do most posix allocators. So in practice it has a negative impact on performance. Explanation from rust-lang#120091 (comment): > > You mention that in case of alignment mismatch -- when the new alignment is less than the old -- the implementation calls `mremap`. > > I was trying to note that this isn't really the case in practice, due to the semantics of Rust's allocator APIs. The only use of the allocator within the `in_place_collect` implementation itself is [a call to `Allocator::shrink()`](https://github.com/rust-lang/rust/blob/db7125f008cfd72e8951c9a863178956e2cbacc3/library/alloc/src/vec/in_place_collect.rs#L299-L303), which per its documentation [allows decreasing the required alignment](https://doc.rust-lang.org/1.75.0/core/alloc/trait.Allocator.html). However, in stable Rust, the only available `Allocator` is [`Global`](https://doc.rust-lang.org/1.75.0/alloc/alloc/struct.Global.html), which delegates to the registered `GlobalAlloc`. Since `GlobalAlloc::realloc()` [cannot change the required alignment](https://doc.rust-lang.org/1.75.0/core/alloc/trait.GlobalAlloc.html#method.realloc), the implementation of [`<Global as Allocator>::shrink()`](https://github.com/rust-lang/rust/blob/db7125f008cfd72e8951c9a863178956e2cbacc3/library/alloc/src/alloc.rs#L280-L321) must fall back to creating a brand-new allocation, `memcpy`ing the data into it, and freeing the old allocation, whenever the alignment doesn't remain exactly the same. > > Therefore, the underlying allocator, provided by libc or some other source, has no opportunity to internally `mremap()` the data when the alignment is changed, since it has no way of knowing that the allocation is the same.
Rollup merge of rust-lang#120116 - the8472:only-same-alignments, r=cuviper Remove alignment-changing in-place collect This removes the alignment-changing in-place collect optimization introduced in rust-lang#110353 Currently stable users can't benefit from the optimization because GlobaAlloc doesn't support alignment-changing realloc and neither do most posix allocators. So in practice it has a negative impact on performance. Explanation from rust-lang#120091 (comment): > > You mention that in case of alignment mismatch -- when the new alignment is less than the old -- the implementation calls `mremap`. > > I was trying to note that this isn't really the case in practice, due to the semantics of Rust's allocator APIs. The only use of the allocator within the `in_place_collect` implementation itself is [a call to `Allocator::shrink()`](https://github.com/rust-lang/rust/blob/db7125f008cfd72e8951c9a863178956e2cbacc3/library/alloc/src/vec/in_place_collect.rs#L299-L303), which per its documentation [allows decreasing the required alignment](https://doc.rust-lang.org/1.75.0/core/alloc/trait.Allocator.html). However, in stable Rust, the only available `Allocator` is [`Global`](https://doc.rust-lang.org/1.75.0/alloc/alloc/struct.Global.html), which delegates to the registered `GlobalAlloc`. Since `GlobalAlloc::realloc()` [cannot change the required alignment](https://doc.rust-lang.org/1.75.0/core/alloc/trait.GlobalAlloc.html#method.realloc), the implementation of [`<Global as Allocator>::shrink()`](https://github.com/rust-lang/rust/blob/db7125f008cfd72e8951c9a863178956e2cbacc3/library/alloc/src/alloc.rs#L280-L321) must fall back to creating a brand-new allocation, `memcpy`ing the data into it, and freeing the old allocation, whenever the alignment doesn't remain exactly the same. > > Therefore, the underlying allocator, provided by libc or some other source, has no opportunity to internally `mremap()` the data when the alignment is changed, since it has no way of knowing that the allocation is the same.
Pkgsrc changes: * Adapt checksums and patches. Upstream chnages: Version 1.76.0 (2024-02-08) ========================== Language -------- - [Document Rust ABI compatibility between various types] (rust-lang/rust#115476) - [Also: guarantee that char and u32 are ABI-compatible] (rust-lang/rust#118032) - [Warn against ambiguous wide pointer comparisons] (rust-lang/rust#117758) Compiler -------- - [Lint pinned `#[must_use]` pointers (in particular, `Box<T>` where `T` is `#[must_use]`) in `unused_must_use`.] (rust-lang/rust#118054) - [Soundness fix: fix computing the offset of an unsized field in a packed struct] (rust-lang/rust#118540) - [Soundness fix: fix dynamic size/align computation logic for packed types with dyn Trait tail] (rust-lang/rust#118538) - [Add `$message_type` field to distinguish json diagnostic outputs] (rust-lang/rust#115691) - [Enable Rust to use the EHCont security feature of Windows] (rust-lang/rust#118013) - [Add tier 3 {x86_64,i686}-win7-windows-msvc targets] (rust-lang/rust#118150) - [Add tier 3 aarch64-apple-watchos target] (rust-lang/rust#119074) - [Add tier 3 arm64e-apple-ios & arm64e-apple-darwin targets] (rust-lang/rust#115526) Refer to Rust's [platform support page][platform-support-doc] for more information on Rust's tiered platform support. Libraries --------- - [Add a column number to `dbg!()`] (rust-lang/rust#114962) - [Add `std::hash::{DefaultHasher, RandomState}` exports] (rust-lang/rust#115694) - [Fix rounding issue with exponents in fmt] (rust-lang/rust#116301) - [Add T: ?Sized to `RwLockReadGuard` and `RwLockWriteGuard`'s Debug impls.] (rust-lang/rust#117138) - [Windows: Allow `File::create` to work on hidden files] (rust-lang/rust#116438) Stabilized APIs --------------- - [`Arc::unwrap_or_clone`] (https://doc.rust-lang.org/stable/std/sync/struct.Arc.html#method.unwrap_or_clone) - [`Rc::unwrap_or_clone`] (https://doc.rust-lang.org/stable/std/rc/struct.Rc.html#method.unwrap_or_clone) - [`Result::inspect`] (https://doc.rust-lang.org/stable/std/result/enum.Result.html#method.inspect) - [`Result::inspect_err`] (https://doc.rust-lang.org/stable/std/result/enum.Result.html#method.inspect_err) - [`Option::inspect`] (https://doc.rust-lang.org/stable/std/option/enum.Option.html#method.inspect) - [`type_name_of_val`] (https://doc.rust-lang.org/stable/std/any/fn.type_name_of_val.html) - [`std::hash::{DefaultHasher, RandomState}`] (https://doc.rust-lang.org/stable/std/hash/index.html#structs) These were previously available only through `std::collections::hash_map`. - [`ptr::{from_ref, from_mut}`] (https://doc.rust-lang.org/stable/std/ptr/fn.from_ref.html) - [`ptr::addr_eq`](https://doc.rust-lang.org/stable/std/ptr/fn.addr_eq.html) Cargo ----- See [Cargo release notes] (https://github.com/rust-lang/cargo/blob/master/CHANGELOG.md#cargo-176-2024-02-08). Rustdoc ------- - [Don't merge cfg and doc(cfg) attributes for re-exports] (rust-lang/rust#113091) - [rustdoc: allow resizing the sidebar / hiding the top bar] (rust-lang/rust#115660) - [rustdoc-search: add support for traits and associated types] (rust-lang/rust#116085) - [rustdoc: Add highlighting for comments in items declaration] (rust-lang/rust#117869) Compatibility Notes ------------------- - [Add allow-by-default lint for unit bindings] (rust-lang/rust#112380) This is expected to be upgraded to a warning by default in a future Rust release. Some macros emit bindings with type `()` with user-provided spans, which means that this lint will warn for user code. - [Remove x86_64-sun-solaris target.] (rust-lang/rust#118091) - [Remove asmjs-unknown-emscripten target] (rust-lang/rust#117338) - [Report errors in jobserver inherited through environment variables] (rust-lang/rust#113730) This [may warn](rust-lang/rust#120515) on benign problems too. - [Update the minimum external LLVM to 16.] (rust-lang/rust#117947) - [Improve `print_tts`](rust-lang/rust#114571) This change can break some naive manual parsing of token trees in proc macro code which expect a particular structure after `.to_string()`, rather than just arbitrary Rust code. - [Make `IMPLIED_BOUNDS_ENTAILMENT` into a hard error from a lint] (rust-lang/rust#117984) - [Vec's allocation behavior was changed when collecting some iterators] (rust-lang/rust#110353) Allocation behavior is currently not specified, nevertheless changes can be surprising. See [`impl FromIterator for Vec`] (https://doc.rust-lang.org/nightly/std/vec/struct.Vec.html#impl-FromIterator%3CT%3E-for-Vec%3CT%3E) for more details. - [Properly reject `default` on free const items] (rust-lang/rust#117818)
Fix in-place collect not reallocating when necessary Regression introduced in rust-lang/rust#110353. This was [caught by miri](https://rust-lang.zulipchat.com/#narrow/stream/269128-miri/topic/Cron.20Job.20Failure.20.28miri-test-libstd.2C.202023-11.29/near/404764617) r? `@saethlin`
Fix in-place collect not reallocating when necessary Regression introduced in rust-lang/rust#110353. This was [caught by miri](https://rust-lang.zulipchat.com/#narrow/stream/269128-miri/topic/Cron.20Job.20Failure.20.28miri-test-libstd.2C.202023-11.29/near/404764617) r? `@saethlin`
This enables the following cases to collect in-place:
Especially the nicheful-option-flattening should be useful in real code.