Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change slice::to_vec to not use extend_from_slice #79186

Merged
merged 1 commit into from
Nov 23, 2020

Conversation

JulianKnodt
Copy link
Contributor

@JulianKnodt JulianKnodt commented Nov 18, 2020

I saw this Zulip thread, and didn't see any update from it, so I thought I'd try to fix it. This converts to_vec to no longer use extend_from_slice, but relies on knowing that the allocated capacity is the same size as the input.

Godbolt new v1
Godbolt new v2 w/ drop guard
Godbolt old version

After some amount of iteration, there are now two specializations for to_vec, one for Copy types that use memcpy, and one for clone types which is the original from this PR.

This is then used inside of impl<T: Clone> FromIterator<Iter::Slice<T>> for Vec<T> which is essentially equivalent to &[T] -> Vec<T>, instead of previous specialization of the extend function. This is because extend has to reason more about existing capacity by calling reserve on an existing vec, and thus produces worse asm.

Downsides: This allocates the exact capacity, so I think if many items are added to this Vec after, it might need to allocate whereas extending may not. I also noticed the number of faults went up in the benchmarks, but not sure where from exactly.

@rust-highfive
Copy link
Collaborator

r? @Mark-Simulacrum

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 18, 2020
@JulianKnodt JulianKnodt force-pushed the str_from branch 3 times, most recently from 3297db7 to e09b5a0 Compare November 19, 2020 00:41
Copy link
Member

@Mark-Simulacrum Mark-Simulacrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extend_from_slice was previously (presumably) optimized to e.g. perform a single memcpy via specialization on T: Copy, is that property preserved here? Can we add a codegen test (src/test/codegen, see https://www.llvm.org/docs/CommandGuide/FileCheck.html for docs on syntax or examples in that dir) to show that we get expected behavior here perhaps (for the T: Copy case at least).

library/alloc/src/slice.rs Outdated Show resolved Hide resolved
@JulianKnodt
Copy link
Contributor Author

JulianKnodt commented Nov 19, 2020

While there's no codegen test yet for that property, from the godbolt link which uses a u8, it looks to be the case that that is preserved.

It might take a bit to add a test since compiling on my machine is slow

@Mark-Simulacrum
Copy link
Member

Mark-Simulacrum commented Nov 19, 2020

FWIW the &str -> String case I was able to better optimize (edit: at least at a "asm looks better" level) than what this PR produces with https://rust.godbolt.org/z/WP7WdK, but I'm not sure if we can e.g. put that directly in a From impl. This PR may still make sense even without that.

@m-ou-se -- did you perhaps have context here beyond the Zulip conversation? Trying to figure out if there's other cases besides &str -> String that we care about here.

@JulianKnodt
Copy link
Contributor Author

JulianKnodt commented Nov 19, 2020

Ah I think that what you have is certainly better for the case of &str -> String, and I agree that it won't fit in the impl From. I do believe this PR still has benefits since it optimizes the impl which is probably used elsewhere, so it might be interesting to do a perf run

@Mark-Simulacrum
Copy link
Member

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Nov 19, 2020

⌛ Trying commit 97ca104b566e8b60191bd40c1103694f1e8cba4f with merge c163a6ecb293f84742f6802fe80a1b745c3e1b4c...

@bors
Copy link
Contributor

bors commented Nov 19, 2020

☀️ Try build successful - checks-actions
Build commit: c163a6ecb293f84742f6802fe80a1b745c3e1b4c (c163a6ecb293f84742f6802fe80a1b745c3e1b4c)

@rust-timer
Copy link
Collaborator

Queued c163a6ecb293f84742f6802fe80a1b745c3e1b4c with parent 8256379, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (c163a6ecb293f84742f6802fe80a1b745c3e1b4c): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot modify labels: +S-waiting-on-review -S-waiting-on-perf

@JulianKnodt
Copy link
Contributor Author

JulianKnodt commented Nov 19, 2020

Interesting, mixed with mostly good results, I've also added an additional specialization since it appears that that on top the new version should hopefully be better, as from prior comments the reason it slowed down was because of missing Copy specialization.

@JulianKnodt JulianKnodt force-pushed the str_from branch 4 times, most recently from bd79187 to 7e9f777 Compare November 19, 2020 08:07
@JulianKnodt
Copy link
Contributor Author

It might be a good idea to have another perf run to see if the additional changes helped or regressed anything.

@m-ou-se
Copy link
Member

m-ou-se commented Nov 19, 2020

extend_from_slice already pre-allocates the right amount of space. I think you can just remove the with_capacity and get the same efficient result:

-        let mut vec = Vec::with_capacity(s.len());
+        let mut vec = Vec::new();
         vec.extend_from_slice(s);
         vec

https://rust.godbolt.org/z/x1W9fr

@JulianKnodt
Copy link
Contributor Author

JulianKnodt commented Nov 19, 2020

Hm, I agree that that is much better than the implementation provided with the drop guard, but I think Simulacrum's impl has fewer instructions when it comes to string types.

I also looked at a comparison to the implementation provided with larger types, and it appears just removing with_capacity generates larger asm than the provided impl. Maybe this means there's an underlying problem in extend_from_slice that should be fixed there rather than here.

https://rust.godbolt.org/z/xP9Wrr
https://rust.godbolt.org/z/zEP4x8

@m-ou-se
Copy link
Member

m-ou-se commented Nov 19, 2020

More assembly instructions doesn't have to be a bad thing. It's optimizing for speed mostly. It looks like it just unrolled the copy loop better in the extend_from_slice case. It doesn't call realloc, which was the main issue.

@JulianKnodt
Copy link
Contributor Author

JulianKnodt commented Nov 19, 2020

Mm that makes sense. I wonder if I could get the drop guard version to unroll some, because it seems to do less work outside of the loop, so taking the best of both worlds

@the8472
Copy link
Member

the8472 commented Nov 19, 2020

Vec's FromIterator has a specialization for T: Copy slices. To avoid code duplication either [T]::to_vec()'s Copy case should just use the iterator or the iterator specialization should delegate to to_vec, whichever is faster.

The specialization:

rust/library/alloc/src/vec.rs

Lines 2287 to 2299 in fe98231

impl<'a, T: 'a> SpecFromIter<&'a T, slice::Iter<'a, T>> for Vec<T>
where
T: Copy,
{
// reuses the extend specialization for T: Copy
fn from_iter(iterator: slice::Iter<'a, T>) -> Self {
let mut vec = Vec::new();
// must delegate to spec_extend() since extend() itself delegates
// to spec_from for empty Vecs
vec.spec_extend(iterator);
vec
}
}

which ultimately should (through some unnecessary indirections) delegate to

rust/library/alloc/src/vec.rs

Lines 1266 to 1274 in fe98231

/// Appends elements to `Self` from other buffer.
#[inline]
unsafe fn append_elements(&mut self, other: *const [T]) {
let count = unsafe { (*other).len() };
self.reserve(count);
let len = self.len();
unsafe { ptr::copy_nonoverlapping(other as *const T, self.as_mut_ptr().add(len), count) };
self.len += count;
}

@JulianKnodt
Copy link
Contributor Author

I would think to_vec would be better in the end, because it's called while creating the vec so it specifies the end length and capacity, whereas this delegates to appending elements and has no prior knowledge of the resulting capacity/length.

@rust-timer
Copy link
Collaborator

Queued da403f94eb6c20c344e2e0a7fb61f0a2c940a930 with parent 20328b5, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (da403f94eb6c20c344e2e0a7fb61f0a2c940a930): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot modify labels: +S-waiting-on-review -S-waiting-on-perf

@JulianKnodt
Copy link
Contributor Author

Thank you! I should've rebased more carefully to keep it clear which was which.

@Mark-Simulacrum
Copy link
Member

Please also update the description of this PR, and feel free to keep commits squashed. I'm not sure about the numerous perf runs here either, if there's some comparison being done or whatever I didn't catch that :) The last perf run looks pretty excellent though!

@Mark-Simulacrum Mark-Simulacrum added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 22, 2020
This also required adding a loop guard in case clone panics

Add specialization for copy

There is a better version for copy, so I've added specialization for that function
and hopefully that should speed it up even more.

Switch FromIter<slice::Iter> to use `to_vec`

Test different unrolling version for to_vec

Revert to impl

From benchmarking, it appears this version is faster
@Mark-Simulacrum Mark-Simulacrum added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 22, 2020
@Mark-Simulacrum
Copy link
Member

@bors try @rust-timer queue

Let's double check there are no regressions from the Copy/Clone switch.

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Nov 23, 2020

⌛ Trying commit a991558 with merge 52c91095e4d4144807832f6fb658bea83f707c50...

@bors
Copy link
Contributor

bors commented Nov 23, 2020

☀️ Try build successful - checks-actions
Build commit: 52c91095e4d4144807832f6fb658bea83f707c50 (52c91095e4d4144807832f6fb658bea83f707c50)

@rust-timer
Copy link
Collaborator

Queued 52c91095e4d4144807832f6fb658bea83f707c50 with parent a0d664b, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (52c91095e4d4144807832f6fb658bea83f707c50): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot modify labels: +S-waiting-on-review -S-waiting-on-perf

@Mark-Simulacrum
Copy link
Member

@bors r+

Ok, latest perf run looks good to me.

@bors
Copy link
Contributor

bors commented Nov 23, 2020

📌 Commit a991558 has been approved by Mark-Simulacrum

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 23, 2020
@bors
Copy link
Contributor

bors commented Nov 23, 2020

⌛ Testing commit a991558 with merge 40cf721...

@bors
Copy link
Contributor

bors commented Nov 23, 2020

☀️ Test successful - checks-actions
Approved by: Mark-Simulacrum
Pushing 40cf721 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Nov 23, 2020
@bors bors merged commit 40cf721 into rust-lang:master Nov 23, 2020
@rustbot rustbot added this to the 1.50.0 milestone Nov 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants