Much higher compile times with -Z threads=8 than with -Z threads=1 #117755

Open
Shnatsel opened this issue Nov 9, 2023 · 10 comments
Labels
A-parallel-queries Area: Parallel query execution C-bug Category: This is a bug. I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-compiler-parallel Working group: Parallelizing the compiler

Comments

@Shnatsel
Member

Shnatsel commented Nov 9, 2023

When compiling cargo audit from git on commit b6baecc0ea4e2d115e4e10b10c2196b33d42c1da, I'm seeing the project build in 19 seconds on my machine with -Z threads=1 but it takes 25 seconds with -Z threads=8.

I am using a 6-core desktop CPU, so no chance of NUMA issues. I'm also seeing 25s compile times with -Z threads=6, matching the CPU core count.
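For anyone wanting to reproduce the comparison, here is a minimal sketch of timing a clean build at each thread count. It assumes a nightly toolchain is the default (`-Z` flags are nightly-only) and that it is run from the project checkout. The helper names `build_env` and `timed_build` are made up for illustration, and the exact cargo invocation behind the numbers above is not stated in the report.

```python
import os
import subprocess
import time


def build_env(threads, base_env=None):
    """Return a copy of the environment with RUSTFLAGS set to -Z threads=N."""
    env = dict(os.environ if base_env is None else base_env)
    # RUSTFLAGS takes precedence over rustflags set in config.toml, so set it
    # for both runs to keep the comparison like-for-like.
    env["RUSTFLAGS"] = f"-Z threads={threads}"
    return env


def timed_build(threads, cargo_args=("build",)):
    """Run `cargo clean`, then build once; return wall-clock seconds."""
    subprocess.run(["cargo", "clean"], check=True)
    start = time.perf_counter()
    subprocess.run(["cargo", *cargo_args], check=True, env=build_env(threads))
    return time.perf_counter() - start
```

Calling `timed_build(1)` and then `timed_build(8)` gives the two wall-clock figures to compare. Repeating each run a few times would help separate the regression from noise.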

I've captured Samply profiles but they are hard to make sense of due to the sheer number of threads (4000 for a single thread, 6000 for multiple threads). They are too big for sharing via firefox.dev, so please find them attached:
profile-1-thread.json.gz
profile-8-threads.json.gz

Meta

rustc --version --verbose:

rustc 1.75.0-nightly (fdaaaf9f9 2023-11-08)
binary: rustc
commit-hash: fdaaaf9f923281ab98b865259aa40fbf93d72c7a
commit-date: 2023-11-08
host: x86_64-unknown-linux-gnu
release: 1.75.0-nightly
LLVM version: 17.0.4
@Shnatsel Shnatsel added the C-bug Category: This is a bug. label Nov 9, 2023
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Nov 9, 2023
@bjorn3 bjorn3 added the A-parallel-queries Area: Parallel query execution label Nov 9, 2023
@Shnatsel
Member Author

Shnatsel commented Nov 9, 2023

@Noratrieb Noratrieb added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-compiler-parallel Working group: Parallelizing the compiler I-compiletime Issue: Problems and improvements with respect to compile times. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Nov 9, 2023
@mjguzik
Contributor

mjguzik commented Nov 10, 2023

I confirm the issue, compiling the same thing:

cargo build -r 606.59s user 33.28s system 1596% cpu 40.078 total
RUSTFLAGS="-Z threads=8" cargo build -r 771.61s user 49.88s system 1660% cpu 49.476 total

Test system has 24 cores.

rustc 1.75.0-nightly (0f44eb3 2023-11-09)
binary: rustc
commit-hash: 0f44eb3
commit-date: 2023-11-09
host: x86_64-unknown-linux-gnu
release: 1.75.0-nightly
LLVM version: 17.0.4

@Kobzol
Contributor

Kobzol commented Nov 10, 2023

Do you have the same flags set for both builds? Using RUSTFLAGS overrides config.toml; did you perhaps have some options there?

@mjguzik
Contributor

mjguzik commented Nov 10, 2023

In my case this is a fresh clone, 0 local changes.

@Shnatsel
Member Author

Shnatsel commented Nov 10, 2023

I did have mold configured as the linker in config.toml. After removing the config, the parallel frontend is still slower, although not by quite as much.

Single thread: 23.29s
8 threads: 25.70s

That's still a 10% regression.

@Shnatsel
Member Author

I've re-measured cargo build --timings and updated the post above to provide a correct baseline, without mold.

I still see syn v1.0.109 compilation time going from 2.3s to 7.4s, tokio v1.29.1 going up from 2.5s to 5.2s, aho-corasick v1.0.2 from 1.1s to 5.1s, and the compilation time of many other crates also increasing. See the full output of --timings for details.
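To put the per-crate numbers side by side, here is a small sketch that ranks the three crates named above by slowdown factor. The before/after pairs are copied from this comment; reading them as `-Z threads=1` versus `-Z threads=8` is an assumption based on the surrounding discussion.

```python
# (before, after) compile times in seconds, as quoted from --timings above.
crates = {
    "syn v1.0.109": (2.3, 7.4),
    "tokio v1.29.1": (2.5, 5.2),
    "aho-corasick v1.0.2": (1.1, 5.1),
}


def slowdown(before, after):
    """Factor by which compile time grew."""
    return after / before


# Worst-affected crate first.
ranked = sorted(crates, key=lambda name: slowdown(*crates[name]), reverse=True)
for name in ranked:
    before, after = crates[name]
    print(f"{name}: {before}s -> {after}s ({slowdown(before, after):.1f}x)")
```

On these figures aho-corasick is hit hardest (about 4.6x), followed by syn (about 3.2x) and tokio (about 2.1x).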

@mjguzik
Contributor

mjguzik commented Nov 10, 2023

I reran with RUSTFLAGS="-Z threads=1", got:

RUSTFLAGS="-Z threads=1" cargo build -r 591.51s user 30.31s system 1561% cpu 39.825 total

which is about the same as without the -Z flag.
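The three timing lines quoted in this thread also report CPU time, not just wall-clock time. A rough sketch of pulling both out follows, assuming the lines are in zsh's `time` output format as they appear here; the names `RUNS`, `parse` and `summary` are illustrative only.

```python
import re

# Timing lines copied from the comments above.
RUNS = {
    "default": "606.59s user 33.28s system 1596% cpu 40.078 total",
    "threads=8": "771.61s user 49.88s system 1660% cpu 49.476 total",
    "threads=1": "591.51s user 30.31s system 1561% cpu 39.825 total",
}

PATTERN = re.compile(
    r"(?P<user>[\d.]+)s user (?P<sys>[\d.]+)s system \d+% cpu (?P<wall>[\d.]+) total"
)


def parse(line):
    """Return (cpu_seconds, wall_seconds) for one zsh-style time line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError(f"unrecognised time line: {line!r}")
    return float(m["user"]) + float(m["sys"]), float(m["wall"])


summary = {name: parse(line) for name, line in RUNS.items()}
for name, (cpu, wall) in summary.items():
    print(f"{name:>10}: {cpu:7.2f}s CPU, {wall:6.3f}s wall")
```

By this reading, the `-Z threads=8` run used about 821s of CPU time against about 622s for `-Z threads=1`, so total CPU time grew by a larger proportion than wall-clock time did.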

@Shnatsel
Member Author

It's curious that a system with a higher core count is seeing a greater regression: 39.825s to 49.476s is a roughly 24% increase in compilation time, compared to a 10% increase on my 6-core CPU.
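Working the quoted wall-clock timings through, as a quick sanity check on the percentages. The machine labels are just shorthand for the two systems described in this thread.

```python
def pct_increase(before, after):
    """Percentage increase going from `before` to `after`."""
    return (after - before) / before * 100.0


# Wall-clock seconds quoted in this thread: (-Z threads=1, -Z threads=8).
measurements = {
    "6-core, no mold": (23.29, 25.70),
    "24-core": (39.825, 49.476),
}

for machine, (t1, t8) in measurements.items():
    print(f"{machine}: {pct_increase(t1, t8):.1f}% slower with -Z threads=8")
```

This gives about 10.3% for the 6-core machine and about 24.2% for the 24-core one.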

@mjguzik
Contributor

mjguzik commented Nov 10, 2023

It's not particularly curious: adding more threads to a workload that suffers from a scalability problem does tend to increase total run time, and I have more cores to exercise the problem at the same time.

I tried to get a differential flamegraph based on perf record output, but perf report ended up running for almost 2h(!) before I killed it, bogged down communicating with addr2line, which kept failing to resolve anything (it was making forward progress, just incredibly slowly, and the result was useless anyway). This is on Debian 12, for interested parties.

@bjorn3
Member

bjorn3 commented Nov 10, 2023

Try perf report --no-inline. That will skip addr2line at the cost of not showing inlined functions.

6 participants