Missed loop optimization for &Vec argument in .zip() #36920
Note: a similar version with `#[no_mangle]`:

```rust
#[no_mangle]
pub fn dot_mut_s(xs: &mut [u32], ys: &mut [u32]) -> u32 {
    let mut s = 0;
    for (x, y) in xs.iter().zip(ys) {
        s += (*x) * (*y);
    }
    s
}
```

Maybe the problem is […]
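For reference, a self-contained sketch of the contrast this issue is about (the function names `dot_slice` and `dot_vec` are mine, not from the thread): the two bodies are identical, but at the time of the report the slice version autovectorized while the `&Vec` version did not.

```rust
// Slice version: zips &[u32] against &[u32]; this one autovectorizes.
pub fn dot_slice(xs: &[u32], ys: &[u32]) -> u32 {
    let mut s = 0;
    for (x, y) in xs.iter().zip(ys) {
        s += x * y;
    }
    s
}

// &Vec version: zip(ys) iterates through a &Vec<u32>, adding an
// extra level of indirection; historically this blocked vectorization.
pub fn dot_vec(xs: &Vec<u32>, ys: &Vec<u32>) -> u32 {
    let mut s = 0;
    for (x, y) in xs.iter().zip(ys) {
        s += x * y;
    }
    s
}

fn main() {
    let a = vec![1u32, 2, 3, 4];
    let b = vec![5u32, 6, 7, 8];
    // Both compute the same dot product: 1*5 + 2*6 + 3*7 + 4*8 = 70.
    assert_eq!(dot_slice(&a, &b), 70);
    assert_eq!(dot_vec(&a, &b), 70);
}
```

The only difference between the two is the argument type; the loop bodies and the `zip`-based iteration are identical.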
It may just be the double indirection in the `&Vec` case.

I've seen double-indirection losses like that in closures too (references captured by reference vs. by move).

Yes! Double indirection is bad. No autovectorization:

```rust
#[no_mangle]
pub fn dot_ref_s(xs: &&[u32], ys: &&[u32]) -> u32 {
    let mut s = 0;
    for (x, y) in xs.iter().zip(*ys) {
        s += x * y;
    }
    s
}
```
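As a hedged illustration of the obvious workaround (not code from the thread; `dot_deref` is a hypothetical name): manually reborrow the `&&[u32]` arguments down to plain slices before the loop, so the optimizer only sees a single level of indirection.

```rust
// Workaround sketch: strip one level of indirection up front.
pub fn dot_deref(xs: &&[u32], ys: &&[u32]) -> u32 {
    // Reborrow && down to & so the loop operates on plain slices.
    let (xs, ys): (&[u32], &[u32]) = (*xs, *ys);
    let mut s = 0;
    for (x, y) in xs.iter().zip(ys) {
        s += x * y;
    }
    s
}

fn main() {
    let a: &[u32] = &[1, 2, 3];
    let b: &[u32] = &[4, 5, 6];
    // 1*4 + 2*5 + 3*6 = 32
    assert_eq!(dot_deref(&a, &b), 32);
}
```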
Inside ndarray I carefully use […]
The problem indeed looks like a dubious null check surviving:

Fixing this is blocked on https://llvm.org/bugs/show_bug.cgi?id=30597.
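For context on why the null check is "dubious": Rust guarantees that slice and `Vec` data pointers are non-null even when empty (an empty `Vec` stores a dangling but non-null pointer), which is exactly the invariant that `!nonnull` metadata would let LLVM exploit to delete the check. A small runnable demonstration of the invariant:

```rust
fn main() {
    // Even an empty Vec has a non-null (dangling) data pointer,
    // so a null check on the loaded pointer can never fire.
    let v: Vec<u32> = Vec::new();
    assert!(!v.as_ptr().is_null());

    // The same holds for the slice view of it.
    let s: &[u32] = &v;
    assert!(!s.as_ptr().is_null());
}
```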
The LLVM PR fixes the […]
cc rust-lang#36920 (in addition to LLVM PR30597, should fix the &&[i32] case)
rustc 0b2c356 plus that LLVM backport revision fix all the test cases in this issue. Awesome. Now to try what more it fixes.
ndarray changes (a selection of benchmarks; the rest seemed unchanged). Benchmarking is a bit noisy, so I had to pick a few and try to verify them. Comparing:

I believe the improvements in mean_axis0, sum_axis0, and iter_sum_2d_transpose_by_row are legitimate, and that iter_sum_2d_transpose_regular genuinely regresses for some reason. I don't think the rest can be verified to change in either direction. It's nice overall, though.
Any code changes in […]
Contains a backport of my patch, "[InstCombine] Transform !range metadata to !nonnull when combining loads". Fixes rust-lang#36920.
I'm not really finding what the difference is. Do you think I should look at the unoptimized or the optimized LLVM IR? It's not a big concern, since it's already the slow case for the iterator.
The optimized IR. Could you post it? |
Optimized IR here: there are two files, "old" and "new", the same versions as in the previous comment (#36920 (comment)).
I haven't spotted the difference yet. There's a lot of noise from what I think is the autovectorization of the fast path, which is not taken for the transposed array. Instead it should run the general/slow case here, which calls […]
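For readers without the ndarray context: the transposed benchmarks sum along the non-contiguous axis of a row-major array, so consecutive elements of a traversal are a full row-stride apart in memory and the contiguous fast path cannot apply. A minimal sketch of such a strided sum (not ndarray's actual code; the name `sum_transposed` is hypothetical):

```rust
// Sum a row-major `rows x cols` buffer column-by-column, i.e. along
// the non-contiguous axis: consecutive accesses are `cols` elements
// apart, which defeats the contiguous fast path.
fn sum_transposed(buf: &[f64], rows: usize, cols: usize) -> f64 {
    assert_eq!(buf.len(), rows * cols);
    let mut s = 0.0;
    for c in 0..cols {
        for r in 0..rows {
            s += buf[r * cols + c]; // stride of `cols` between reads
        }
    }
    s
}

fn main() {
    let buf = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]; // 2 x 3, row-major
    assert_eq!(sum_transposed(&buf, 2, 3), 21.0);
}
```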
The IRs look identical up to α-renaming. |
LLVM: Add triple for Fuchsia. Update subproject commit. Fixes #36920.
This is now in the latest nightly, and all four test cases produce the same nice code. Thanks a lot for working on it.
This is a case where two almost identical functions have very different performance, because the slice version is autovectorized and the Vec version is not. Note that specialization is involved, but both cases use exactly the same code path for the slice iterator.

Using rustc 1.14.0-nightly (289f3a4 2016-09-29).

Note: the Vec vs. slice difference is sometimes visible in actual inline uses of .zip in code, depending on the exact context.