optimize implementation of Zip::fold and company #100124

sarah-ek · 2022-08-03T22:05:26Z

adds a more efficient implementation of core::iter::Zip::{fold, rfold, try_fold, try_rfold}
on my machine, this gives a 15% speedup on the iter::bench_skip_cycle_skip_zip_add_sum benchmark, and a 32% speedup on iter::bench_skip_then_zip.

i haven't noticed any performance regressions

rust-highfive · 2022-08-03T22:05:29Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @m-ou-se (or someone else) soon.

Please see the contribution instructions for more information.

rustbot · 2022-08-03T22:05:29Z

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

Stabilizing library features
Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
Changing public documentation in ways that create new stability guarantees
Changing observable runtime behavior of library APIs

timvermeulen

This is an excellent first contribution 👍

I've left some initial feedback. Zip is weird in the sense that it needs to choose which inner iterator to iterate internally and which one externally, but to me choosing the first one to iterate internally seems like a very reasonable choice.
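To make the internal/external distinction concrete, here is a minimal sketch outside of libcore. The free function `zip_fold` is a hypothetical name used only for illustration. It drives the first iterator internally through `try_fold` and pulls from the second one externally with `next()`, which is the same shape as the `fold` hunk under review.

```rust
/// Hypothetical stand-alone version of the idea: `a` is iterated internally
/// (it runs the loop itself via `try_fold`), `b` externally (one `next()` per step).
fn zip_fold<A, B, T, F>(mut a: A, mut b: B, init: T, mut f: F) -> T
where
    A: Iterator,
    B: Iterator,
    F: FnMut(T, (A::Item, B::Item)) -> T,
{
    // `Err` is used purely as an early-exit signal when `b` runs out;
    // it carries the accumulator, not an actual error.
    let result: Result<T, T> = a.try_fold(init, |acc, x| match b.next() {
        Some(y) => Ok(f(acc, (x, y))),
        None => Err(acc),
    });

    // Either side finishing first yields the accumulator built so far.
    match result {
        Ok(acc) | Err(acc) => acc,
    }
}
```

Choosing `b` as the internally iterated side would be symmetric; the sketch only shows that one side has to be picked.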

timvermeulen · 2022-08-04T11:07:14Z

library/core/src/iter/adapters/zip.rs

+        #[inline]
+        default fn fold<T, F>(self, init: T, mut f: F) -> T
+        where
+            F: FnMut(T, Self::Item) -> T,
+        {
+            let mut a = self.a;
+            let mut b = self.b;
+
+            let acc = a.try_fold(init, move |acc, x| match b.next() {
+                Some(y) => Ok(f(acc, (x, y))),
+                None => Err(acc),
+            });
+
+            match acc {
+                Ok(exhausted_a) => exhausted_a,
+                Err(exhausted_b) => exhausted_b,
+            }
+        }


When a fold implementation can't be implemented on top of an inner fold, it's common in libcore to delegate to self's try_fold instead in order to avoid some duplicate logic. See e.g. Take::fold. TakeWhile, MapWhile, and Scan do the same thing. You could see if it affects benchmark results in any way, but I'd hope that it won't.

Side note: closures in the implementation of iterator adapters are typically not written inline, but returned from a nested function; see #62429.
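A rough sketch of both suggestions on stable Rust follows. The function name `fold_via_try_fold` and the helper `ok` are illustrative, and `Infallible` stands in for the never type that libcore can use internally. As I understand the motivation behind the nested function, the returned closure is generic only over the helper's own type parameters rather than over everything in scope of the enclosing function.

```rust
use std::convert::Infallible;

/// Hypothetical helper showing `fold` expressed through `try_fold`
/// with an error type that can never be constructed.
fn fold_via_try_fold<I, T, F>(mut iter: I, init: T, f: F) -> T
where
    I: Iterator,
    F: FnMut(T, I::Item) -> T,
{
    // The closure is built in a nested function, so its type mentions
    // only `T`, `X`, and the wrapped closure, not the iterator type `I`.
    fn ok<T, X>(mut f: impl FnMut(T, X) -> T) -> impl FnMut(T, X) -> Result<T, Infallible> {
        move |acc, x| Ok(f(acc, x))
    }

    match iter.try_fold(init, ok(f)) {
        Ok(acc) => acc,
        // `Infallible` has no values, so this arm is statically unreachable.
        Err(never) => match never {},
    }
}
```

Whether this delegation costs anything compared to a hand-written `fold` is exactly what the benchmarks mentioned above would have to show.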

timvermeulen · 2022-08-04T11:24:36Z

library/core/src/iter/adapters/zip.rs

+            let acc = a.try_fold(init, move |acc, x| match b.next() {
+                Some(y) => {
+                    let result = f(acc, (x, y));
+                    match result.branch() {
+                        ControlFlow::Continue(continue_) => Ok(continue_),
+                        ControlFlow::Break(break_) => Err(R::from_residual(break_)),
+                    }
+                }
+                None => Err(R::from_output(acc)),
+            });


This pattern of either continuing with an output value or breaking with an output value or a residual value can be expressed much more simply with the ControlFlow enum, specifically its (non-public) from_try and into_try methods. Take::try_fold illustrates this really well.
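Since `from_try` and `into_try` are not public, the sketch below approximates them for `Result` only. The helpers `from_result` and `into_result` and the function `zip_try_fold` are hypothetical names, not libcore API. The point is that "break with a finished value" and "break with an error" both become `ControlFlow::Break` carrying the final `Result`.

```rust
use std::ops::ControlFlow;

/// Stand-in for the private `from_try`, specialised to `Result` (assumption).
fn from_result<T, E>(r: Result<T, E>) -> ControlFlow<Result<T, E>, T> {
    match r {
        Ok(v) => ControlFlow::Continue(v),
        Err(e) => ControlFlow::Break(Err(e)),
    }
}

/// Stand-in for the private `into_try`, specialised to `Result` (assumption).
fn into_result<T, E>(flow: ControlFlow<Result<T, E>, T>) -> Result<T, E> {
    match flow {
        ControlFlow::Continue(v) => Ok(v),
        ControlFlow::Break(r) => r,
    }
}

/// Hypothetical zip-style `try_fold` over two borrowed iterators.
fn zip_try_fold<A, B, T, E, F>(a: &mut A, b: &mut B, init: T, mut f: F) -> Result<T, E>
where
    A: Iterator,
    B: Iterator,
    F: FnMut(T, (A::Item, B::Item)) -> Result<T, E>,
{
    let flow = a.try_fold(init, |acc, x| match b.next() {
        // Continue on success, break with the error otherwise.
        Some(y) => from_result(f(acc, (x, y))),
        // `b` is exhausted: break, but with a successful result.
        None => ControlFlow::Break(Ok(acc)),
    });
    into_result(flow)
}
```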

timvermeulen · 2022-08-04T11:28:40Z

library/core/src/iter/adapters/zip.rs

+        #[inline]
+        default fn rfold<T, F>(mut self, init: T, mut f: F) -> T
+        where
+            A: DoubleEndedIterator + ExactSizeIterator,
+            B: DoubleEndedIterator + ExactSizeIterator,
+            F: FnMut(T, Self::Item) -> T,
+        {
+            self.adjust_back();
+            let mut a = self.a;
+            let mut b = self.b;
+
+            let acc = a.try_rfold(init, move |acc, x| match b.next_back() {
+                Some(y) => Ok(f(acc, (x, y))),
+                None => Err(acc),
+            });
+
+            match acc {
+                Ok(exhausted_a) => exhausted_a,
+                Err(exhausted_b) => exhausted_b,
+            }
+        }


Here we know that the inner iterators have the same length (after adjusting), so it might be worth writing this in terms of self.a.rfold instead. We'd have to panic in case b.next_back returns None. I have not benchmarked this, but in theory rfold might lead to more efficient machine code, depending on the underlying iterator.

sarah-ek

this gave me the idea of applying a similar optimization to fold. by checking if both iterators have the same length (which i assumed is a common case for zip), we can use a.fold instead of a.try_fold

timvermeulen

Very cool! I didn't think about doing that with size_hint.
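The equal-length idea for `rfold` might look roughly like the sketch below. `zip_rfold` is a hypothetical free function, and the trimming loop is my own stand-in for `adjust_back`, whose body does not appear in this thread. Once both sides report the same length, the first iterator can be driven with plain `rfold`, and a `None` from the second is treated as a broken `ExactSizeIterator` contract.

```rust
/// Hypothetical sketch: fold a zipped pair from the back using `rfold`
/// on `a` instead of `try_rfold`, after equalising the lengths.
fn zip_rfold<A, B, T, F>(mut a: A, mut b: B, init: T, mut f: F) -> T
where
    A: DoubleEndedIterator + ExactSizeIterator,
    B: DoubleEndedIterator + ExactSizeIterator,
    F: FnMut(T, (A::Item, B::Item)) -> T,
{
    // Drop surplus items from the back of the longer iterator, so that
    // the remaining back elements line up pairwise.
    let (len_a, len_b) = (a.len(), b.len());
    if len_a > len_b {
        for _ in 0..len_a - len_b {
            a.next_back();
        }
    } else {
        for _ in 0..len_b - len_a {
            b.next_back();
        }
    }

    // No early exit is needed any more, so `rfold` suffices.
    a.rfold(init, move |acc, x| {
        let y = b
            .next_back()
            .expect("`b` ended early despite reporting the same length");
        f(acc, (x, y))
    })
}
```

The forward `fold` variant would need a similar length check. The later commit message in this thread ("remove size_hint check in fold") indicates that part was dropped again during review.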

sarah-ek · 2022-08-04T16:10:11Z

thanks for the feedback! i've applied your suggestions to the code, and additionally made some further optimizations. (though they don't show a difference on the current zip benchmarks, which are somewhat limited)

the8472

Are there tests that target the new unsafe TrustedRandomAccess implementations? If not, we need them, because that particular specialization has a long history of unsoundness.
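One possible shape for such a test is a differential check: run `fold` on a zip of slice iterators (which, as far as I know, is a case that takes the TrustedRandomAccess path) and compare it with a plain `next()` loop over an identically prepared iterator. The helper names `advanced` and `fold_matches_next` are hypothetical, and this is only a sketch of the approach, not the tests that were added to the PR.

```rust
use std::iter::Zip;
use std::slice::Iter;

/// Build a zip over two slices and consume `front` items from the front
/// and `back` items from the back, so that folding starts mid-iteration.
fn advanced<'a>(
    a: &'a [i32],
    b: &'a [i32],
    front: usize,
    back: usize,
) -> Zip<Iter<'a, i32>, Iter<'a, i32>> {
    let mut it = a.iter().zip(b.iter());
    for _ in 0..front {
        it.next();
    }
    for _ in 0..back {
        it.next_back();
    }
    it
}

/// Returns true when `fold` visits exactly the pairs that repeated `next()` visits.
fn fold_matches_next(a: &[i32], b: &[i32], front: usize, back: usize) -> bool {
    let via_fold = advanced(a, b, front, back).fold(Vec::new(), |mut acc, (x, y)| {
        acc.push((*x, *y));
        acc
    });

    // Reference behaviour: explicit external iteration.
    let mut via_next = Vec::new();
    let mut it = advanced(a, b, front, back);
    while let Some((x, y)) = it.next() {
        via_next.push((*x, *y));
    }

    via_fold == via_next
}
```

A fuller test would also use iterators with observable side effects or panicking closures, since those are the situations where specialized and general code paths can diverge.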

library/core/src/iter/adapters/zip.rs

- remove size_hint check in fold - add tests for new unsafe fold implementations

sarah-ek · 2022-08-14T01:19:13Z

i've applied the previous suggestions. i tried to rebase on the master branch to be able to get relevant comparisons from benchmarks. i hope i didn't mess up any git stuff

saethlin · 2022-12-28T02:55:48Z

I'm curious
@bors try @rust-timer queue

rust-timer · 2022-12-28T02:55:50Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

rust-log-analyzer · 2022-12-28T02:59:59Z

The job dist-x86_64-linux failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

   Compiling profiler_builtins v0.0.0 (/checkout/library/profiler_builtins)
[RUSTC-TIMING] build_script_build test:false 0.159
[RUSTC-TIMING] build_script_build test:false 0.199
[RUSTC-TIMING] build_script_build test:false 0.432
error[E0599]: no function or associated item named `wrap_mut_2` found for struct `NeverShortCircuit` in the current scope
    |
    |
251 | / macro_rules! zip_impl_general_defaults {
252 | |     () => {
253 | |         default fn new(a: A, b: B) -> Self {
254 | |             Zip {
...   |
298 | |             ZipImpl::try_fold(&mut self, init, NeverShortCircuit::wrap_mut_2(f)).0
    | |                                                                   |
    | |                                                                   |
    | |                                                                   function or associated item not found in `NeverShortCircuit<_>`
    | |                                                                   help: there is an associated function with a similar name: `wrap_mut_2_imp`
332 | |     };
333 | | }
333 | | }
    | |_- in this expansion of `zip_impl_general_defaults!`
...
344 |       zip_impl_general_defaults! {}
    |
   ::: library/core/src/ops/try_trait.rs:379:1
    |
    |
379 |   pub(crate) struct NeverShortCircuit<T>(pub T);
    |   -------------------------------------- function or associated item `wrap_mut_2` not found for this struct

error[E0599]: no function or associated item named `wrap_mut_2` found for struct `NeverShortCircuit` in the current scope
    |
    |
251 | / macro_rules! zip_impl_general_defaults {
252 | |     () => {
253 | |         default fn new(a: A, b: B) -> Self {
254 | |             Zip {
...   |
298 | |             ZipImpl::try_fold(&mut self, init, NeverShortCircuit::wrap_mut_2(f)).0
    | |                                                                   |
    | |                                                                   |
    | |                                                                   function or associated item not found in `NeverShortCircuit<_>`
    | |                                                                   help: there is an associated function with a similar name: `wrap_mut_2_imp`
332 | |     };
333 | | }
333 | | }
    | |_- in this expansion of `zip_impl_general_defaults!`
...
377 |       zip_impl_general_defaults! {}
    |
   ::: library/core/src/ops/try_trait.rs:379:1
    |
    |
379 |   pub(crate) struct NeverShortCircuit<T>(pub T);
    |   -------------------------------------- function or associated item `wrap_mut_2` not found for this struct
For more information about this error, try `rustc --explain E0599`.
[RUSTC-TIMING] core test:false 7.338
error: could not compile `core` due to 2 previous errors
warning: build failed, waiting for other jobs to finish...

bors · 2022-12-28T03:00:03Z

💔 Test failed - checks-actions

saethlin · 2022-12-28T03:19:27Z

Whelp, that didn't work, but that also means it would have failed when approved anyway. @sarah-ek can you rebase this again and see if you can fix the compile error that appears?

JohnCSimon · 2023-01-29T06:55:15Z

@sarah-ek
ping from triage - can you post your status on this PR? There hasn't been an update in a few months. Thanks!

FYI: when a PR is ready for review, send a message containing
@rustbot ready to switch to S-waiting-on-review so the PR is in the reviewer's backlog.

bors · 2023-02-13T13:39:43Z

☔ The latest upstream changes (presumably #107634) made this pull request unmergeable. Please resolve the merge conflicts.

Dylan-DPC · 2023-05-15T06:25:51Z

Closing this as inactive. Feel free to reöpen this pr or create a new pr if you get the time to work on this. Thanks

rust-highfive assigned m-ou-se Aug 3, 2022

rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Aug 3, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 3, 2022

timvermeulen reviewed Aug 4, 2022

the8472 reviewed Aug 4, 2022

sarah-ek requested a review from the8472 August 10, 2022 13:53

the8472 reviewed Aug 11, 2022

scottmcm reviewed Aug 11, 2022

scottmcm reviewed Aug 11, 2022

sarah added 4 commits August 14, 2022 02:50

optimize implementation of Zip::fold and company

ede4616

apply review suggestions, optimize trusted random access impl

1d82bfc

apply review suggestions:

1b6e730

- remove size_hint check in fold - add tests for new unsafe fold implementations

apply review suggestions

8b87335

sarah-ek force-pushed the zip-internal-iteration branch from 2b57dd9 to 8b87335 Compare August 14, 2022 00:51

JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 8, 2022

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 28, 2022

bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 28, 2022

Dylan-DPC removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 22, 2023

Dylan-DPC closed this May 15, 2023

the8472 mentioned this pull request Jan 7, 2024

A more efficient slice comparison implementation for T: !BytewiseEq #116846

Merged
