-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tracking the performance of -Zchecksum-freshness
#14722
Comments
Originally posted at #14136 (comment). Here is a result of a performance benchmark that mostly tests the rustc part.
Result: On par if you have zero local code.
Benchmark details
Setup package cargo new foo --lib
cd foo
cargo add aws-sdk-ec2 Benchmark script hyperfine --warmup 1 \
--setup "cargo fetch" \
--min-runs 20 \
--prepare "cargo clean" \
--export-markdown result.md \
"cargo +nightly check --frozen" \
"cargo +nightly check --frozen -Zchecksum-freshness" Platform information
|
Originally posted at #14136 (comment). Here is another performance benchmark that focuses on cargo part.
Result: performance deteriorates.
Benchmark details
Setup package git clone https://github.com/awslabs/aws-sdk-rust.git
cd aws-sdk-rust
git switch -d 505dab66bf0801ca743212678d47d6490d2beba9 Benchmark script hyperfine --warmup 1 \
--setup "cargo fetch && cargo clean" \
--min-runs 20 \
--export-markdown result.md \
"cargo +nightly check -p aws-sdk-ec2 --frozen" \
"cargo +nightly check -p aws-sdk-ec2 --frozen -Zchecksum-freshness" Platform information
Additionally, here is the result of running the same benchmark but for the entire
Important Benchmarkes were done on |
Originally posted by @Xaeroxe at #14136 (comment). Do the slower benchmarks appear to be I/O bound or CPU bound? If they are CPU bound, are all the cores saturated? I knew going into this that performance would suffer. So unless we can improve these numbers a lot I'd be inclined to make this feature opt-in. Use it if you know you need it. |
Originally posted by @epage at #14136 (comment).
Should we avoid passing this flag to rustc for immutable dependencies? I know your benchmark didn't show much of a difference but figure there might be other relevant cases |
@epage My impression is that rustc already computes file checksums, just with a different algorithm. The performance might be less interesting. After some discussions with @Muscraft, I can think of some relevant cases:
|
It's more like I/O bound. @Muscraft has done the same benchmark on their 16 cores machine with a fancy SSD. the performance difference between the two was a bit smaller than this benchmark result #14722 (comment), even Here are their benchmark results
hyperfine --warmup 25 \
--setup "cargo fetch && cargo clean" \
--min-runs 25 \
--export-markdown result.md \
"cargo +nightly check -p aws-sdk-ec2 --frozen" \
"cargo +nightly check -p aws-sdk-ec2 --frozen -Zchecksum-freshness"
hyperfine --warmup 25 \
--setup "cargo fetch && cargo clean" \
--min-runs 25 \
--export-markdown result-all.md \
"cargo +nightly check --frozen" \
"cargo +nightly check --frozen -Zchecksum-freshness"
|
I am curious how often "time to prove build is fresh" is relevant to cargo's user satisfaction. Are users more likely to be upset about an incorrect freshness check, or a long duration to prove the workspace is fresh? I'll admit that the mtime check gets it right most of the time. However thinking from the perspective of a user I think I'd rather wait a little longer and be confident that the check was correct. However I'm biased. I'm curious about other people's thoughts on it. |
Some more benchmark results with a permutation of
I found the result is fairly stable btw. |
@Xaeroxe Personally I am with you. In this case correctness is more important to me than a few seconds lag for no-op builds. Though there are tools heavily calling |
Would it be possible to have a mode where the checksums are computed only when mtime shows that the file is touched to avoid unnecessary recompiles when nothing actually changed? This would still have the same correctness issues as mtime, but speed up some cases that behave badly with mtime. For example in cg_clif's test suite this would be really useful to avoid unnecessarily rebuilding crates whose sources are copied and patched before building. |
The main reason I'm excited about this is because I'm running on CI, and all my mtimes change with each run, so my valid cache contents are never used. I think this is an extremely common problem. If folks decide the overhead of always doing the checksum every time slows down the nothing-to-do case too much, please consider defaulting to an alternative scheme:
|
The current implementation probably doesn't really help if your package have build scripts involved in your dependency graph. This remains unresolved (I may work on it if I find some free time, or someone else).
Not sure about this. Mtime detection works well for most of the time, but fails some corner cases, either excessive rebuilds or not rebuild at all. And yes Cargo could potentially provide a config saying you trust you system mtime and doesn't ever compute checksums. I think that is kinda the last resort. Personally I'd like to drop mtime entirely 😆.
Yes in the first implementation Cargo checks size as an optimization, though in a different (but more reliable) way — if its size changed, it reports the file dirty and don't bother computing checksums. cargo/src/cargo/core/compiler/fingerprint/mod.rs Lines 2023 to 2029 in edd36eb
|
I intend to see this to completion and will do that once |
This issue is intended for tracking performance related discussion and benchmarks for #14136. #14136 itself is for overall progress reports.
Useful cargo script to summarize number and size of files got checksum'd
The text was updated successfully, but these errors were encountered: