Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Criterion benchmarks hang with target-cpu=znver1 #65618

Closed
novacrazy opened this issue Oct 20, 2019 · 13 comments
Closed

Criterion benchmarks hang with target-cpu=znver1 #65618

novacrazy opened this issue Oct 20, 2019 · 13 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness O-windows-msvc Toolchain: MSVC, Operating system: Windows O-x86_64 Target: x86-64 processors (like x86_64-*) regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@novacrazy
Copy link

novacrazy commented Oct 20, 2019

On my system, a Zen 1 AMD CPU, attempting to run Criterion benchmarks with RUSTFLAGS = "-C target-cpu=znver1" results in the benchmark hanging at the Analyzing stage for a few minutes before I killed the program.

Without the flags, it doesn't hang at all, as expected.

This affects stable, beta and nightly, and is probably related to LLVM 9.

Code that can hang on stable:

#[macro_use]
extern crate criterion;

use criterion::{Criterion, ParameterizedBenchmark};

fn criterion_benchmark(c: &mut Criterion) {
    c.bench(
        "bench",
        ParameterizedBenchmark::new(
            "old",
            |b, _| {
                b.iter(|| {});
            },
            vec![()],
        )
        .with_function("new", |b, _| {
            b.iter(|| {});
        }),
    );
}

criterion_group!(my_benches, criterion_benchmark);
criterion_main!(my_benches);

using cargo +stable bench

No special flags are present in the Cargo.toml

[package]
name = "bench-test"
version = "0.1.0"
authors = ["novacrazy <[email protected]>"]
edition = "2018"

[dev-dependencies]
criterion = "0.3.0"

[[bench]]
name = "benchmark_name"
harness = false
@nikic nikic added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 20, 2019
@jonas-schievink jonas-schievink added the C-bug Category: This is a bug. label Oct 20, 2019
@nagisa
Copy link
Member

nagisa commented Oct 20, 2019

Unable to reproduce on my 1st gen Ryzen 1700 with nightly:

...
   Compiling bench-test v0.1.0 (/tmp/crittest)
     Running `rustc --edition=2018 --crate-name bench_test bench_name.rs --error-format=json --json=diagnostic-rendered-ansi --emit=dep-info,link -C opt-level=3 --cfg test -C metadata=8c71f5ab08445b57 -C extra-filename=-8c71f5ab08445b57 --out-dir /tmp/crittest/target/release/deps -L dependency=/tmp/crittest/target/release/deps --extern bench_test=/tmp/crittest/target/release/deps/libbench_test-0f2595cb3983be9b.rlib --extern criterion=/tmp/crittest/target/release/deps/libcriterion-0d07609906f00047.rlib -Ctarget-cpu=znver1`
    Finished bench [optimized] target(s) in 2.26s
     Running `/tmp/crittest/target/release/deps/bench_test-8c71f5ab08445b57 --bench`
bench/old               time:   [0.0000 ps 0.0000 ps 0.0000 ps]                               
                        change: [-53.418% -6.0379% +85.594%] (p = 0.87 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe
bench/new               time:   [0.0000 ps 0.0000 ps 0.0000 ps]                               
                        change: [-52.325% -7.9366% +80.704%] (p = 0.82 > 0.05)
                        No change in performance detected.
Found 20 outliers among 100 measurements (20.00%)
  6 (6.00%) high mild
  14 (14.00%) high severe

@nagisa
Copy link
Member

nagisa commented Oct 20, 2019

Unable to reproduce with stable compiler on 2400G or 2700 either.

@novacrazy
Copy link
Author

novacrazy commented Oct 20, 2019

I'm on a Threadripper 1950X. Could the number of threads (32 in this case) affect it?

Another thought, does any part of Criterion rely on atomic values? Could be something odd with how code is generated and interacts with the multi-chip Threadripper.

@mati865
Copy link
Contributor

mati865 commented Oct 20, 2019

Cannot reproduce on 2700X using nightly linux-gnu and windows-gnu with the command RUSTFLAGS="-C target-cpu=znver1" cargo bench.
So it's not related to OS, unlike #63959

@novacrazy
Copy link
Author

Ah, I should have specified I'm on the MSVC toolchain (nightly-x86_64-pc-windows-msvc).

I can confirm this does not appear on GNU toolchains, only MSVC.

@Speedy37
Copy link

Speedy37 commented Nov 18, 2019

Same here with an AMD R7 1800X and -C target-cpu=native

rustc 1.39.0 (4560ea788 2019-11-04)
binary: rustc
commit-hash: 4560ea788cb760f0a34127156c78e2552949f734
commit-date: 2019-11-04
host: x86_64-pc-windows-msvc
release: 1.39.0
LLVM version: 9.0

I took a quick look at llvm changes since 9.0.0, but nothing seems related.

llvm/llvm-project@llvmorg-9.0.0...release/9.x

At some point the producer variable is broken in rayon helper method (iter/plumbing/mod.rs).
And there is a lot of AVX2 instructions there.

producer value is : {range={start=0x0000000000000000 end=0x00000021646c7200 } } once the folding core is running. start and end should stay between 0 and 100 000.

Things are working fine with -C target-cpu=core-avx2 but iter/plumbing/mod.rs is not using AVX2 instructions anymore.

Here are the generated set of instructions of iter/plumbing/mod.rs

znver1.txt
core-avx2.txt

I hope I didn't focus on the wrong method.

@Speedy37
Copy link

Speedy37 commented Nov 19, 2019

Couldn't this be related to #64609 ?

@Speedy37
Copy link

I just run cargo bench on the beta and nightly channel and it works fine.

@mati865
Copy link
Contributor

mati865 commented Nov 19, 2019

I think it's different because this one unlike #64609 doesn't appear with windows-gnu target.

@Speedy37
Copy link

@mati865 I think you are right, and the part where I think there is a misscompilation is inlined.

How can we validate beta/nightly actually fixes this and it's not pure luck due to some minor change in how rust output things?
Just printing the "bad" value with println at the top of the helper function, made the program work with rustc stable.

@sanxiyn sanxiyn added the O-x86_64 Target: x86-64 processors (like x86_64-*) label Nov 19, 2019
@mati865
Copy link
Contributor

mati865 commented Nov 19, 2019

Just printing the "bad" value with println at the top of the helper function, made the program work with rustc stable.

Probably println prevented one of optimisations.

I don't know to move forward.
Maybe compare IR and ASM generated for mingw and msvc?

cc @eddyb

@Speedy37
Copy link

Well I already reduced the incriminated code to the helper function which use a set of both xmm and ymm.
Seeing what I saw with the assembly of your test case for #63959, I bet this issue is the same.

@novacrazy
Copy link
Author

This appears to have been caused by #63959, so it's fixed for me now. This could probably be closed now.

@sanxiyn sanxiyn closed this as completed Jan 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness O-windows-msvc Toolchain: MSVC, Operating system: Windows O-x86_64 Target: x86-64 processors (like x86_64-*) regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants