Tracking the performance of `-Zchecksum-freshness` #14722

weihanglo · 2024-10-23T13:45:54Z

This issue is intended for tracking performance related discussion and benchmarks for #14136. #14136 itself is for overall progress reports.

Useful cargo script to summarize number and size of files got checksum'd

#!/usr/bin/env cargo
---cargo
[dependencies]
walkdir = "2"
---

use std::collections::HashMap;
use std::ffi::OsStr;
use std::fs::File;
use std::io::prelude::*;
use std::io::BufReader;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let walker = walkdir::WalkDir::new("target/debug/deps")
        .into_iter()
        .filter_entry(|e| e.file_type().is_dir() || e.path().extension() == Some(OsStr::new("d")));

    let mut map = HashMap::with_capacity(8192);

    for entry in walker {
        let entry = entry?;
        if entry.file_type().is_dir() {
            continue;
        }
        for line in BufReader::new(File::open(entry.path())?).lines() {
            let line = line?;
            let Some(rest) = line.strip_prefix("# checksum:") else {
                continue;
            };
            let mut parts = rest.splitn(3, ' ');
            let (_, size, path) = (parts.next(), parts.next().unwrap(), parts.next().unwrap());
            let size: u64 = size.strip_prefix("file_len:").unwrap().parse()?;
            if let Some(old_size) = map.insert(path.to_string(), size) {
                assert_eq!(old_size, size, "size doesn't match for {path}: {old_size} vs {size}");
            }
        }
    }

    let mut stdout = std::io::stdout().lock();
    writeln!(stdout, "number of files: {}", map.len())?;
    writeln!(stdout, "total file size: {}", map.values().sum::<u64>())?;

    Ok(())
}

weihanglo · 2024-10-23T13:51:37Z

^{Originally posted at #14136 (comment).}

Here is a result of a performance benchmark that mostly tests the rustc part.

It adds [email protected] as a dependency and runs cargo clean && cargo check.
rg 'checksum:blake3' target/debug/deps | wc -l shows 8313 matches. It means rustc roughly computes blake3 checksum 8313 times.
Checksum computations on Cargo side are mostly ignored because Cargo assumes registry are immutable and won't do either mtime or checksum check .

Result: On par if you have zero local code.

Command	Mean [s]	Min [s]	Max [s]	Relative
`cargo +nightly check --frozen`	52.986 ± 0.366	52.549	53.681	1.00
`cargo +nightly check --frozen -Zchecksum-freshness`	53.020 ± 0.315	52.442	53.480	1.00 ± 0.01

Benchmark details

Setup package

cargo new foo --lib
cd foo
cargo add aws-sdk-ec2

Benchmark script

hyperfine --warmup 1 \
  --setup "cargo fetch" \
  --min-runs 20 \
  --prepare "cargo clean" \
  --export-markdown result.md \
  "cargo +nightly check --frozen" \
  "cargo +nightly check --frozen -Zchecksum-freshness"

Platform information

cargo 1.82.0 (8f40fc59f 2024-08-21)
release: 1.82.0
commit-hash: 8f40fc59fb0c8df91c97405785197f3c630304ea
commit-date: 2024-08-21
host: x86_64-unknown-linux-gnu
libgit2: 1.8.1 (sys:0.19.0 vendored)
libcurl: 8.9.0-DEV (sys:0.4.74+curl-8.9.0 vendored ssl:OpenSSL/1.1.1w)
ssl: OpenSSL 1.1.1w  11 Sep 2023
os: Amazon Linux AMI 2.0.0 [64-bit] (r7a.24xlarge)

weihanglo · 2024-10-23T13:53:02Z

^{Originally posted at #14136 (comment).}

Here is another performance benchmark that focuses on cargo part.

Clone awslabs/aws-sdk-rust@505dab6 and cargo check -p aws-sdk-ec2 (with pre-filled cache)
rg 'checksum:blake3' target/debug/deps | wc -l shows 8312 matches. So basiclly matches the previous rustc-only benchmark result Tracking Issue for checksum freshness #14136 (comment)
Use the provided cargo script, the sum of file size got checksum'd is 63 MiB.
Different from the rustc-only benchmark, the entire aws-sdk-ec2 is a local package, so every rust file will be blake-checksum'd by Cargo to check its freshness.

Result: performance deteriorates.

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`cargo +nightly check -p aws-sdk-ec2 --frozen`	582.0 ± 4.3	576.3	590.3	1.00
`cargo +nightly check -p aws-sdk-ec2 --frozen -Zchecksum-freshness`	698.1 ± 2.9	691.5	702.2	1.20 ± 0.01

Benchmark details

Setup package

git clone https://github.com/awslabs/aws-sdk-rust.git
cd aws-sdk-rust
git switch -d 505dab66bf0801ca743212678d47d6490d2beba9

Benchmark script

hyperfine --warmup 1 \
  --setup "cargo fetch && cargo clean" \
  --min-runs 20 \
  --export-markdown result.md \
  "cargo +nightly check -p aws-sdk-ec2 --frozen" \
  "cargo +nightly check -p aws-sdk-ec2 --frozen -Zchecksum-freshness"

Platform information

cargo 1.82.0 (8f40fc59f 2024-08-21)
release: 1.82.0
commit-hash: 8f40fc59fb0c8df91c97405785197f3c630304ea
commit-date: 2024-08-21
host: x86_64-unknown-linux-gnu
libgit2: 1.8.1 (sys:0.19.0 vendored)
libcurl: 8.9.0-DEV (sys:0.4.74+curl-8.9.0 vendored ssl:OpenSSL/1.1.1w)
ssl: OpenSSL 1.1.1w  11 Sep 2023
os: Amazon Linux AMI 2.0.0 [64-bit] (r7a.24xlarge)

Additionally, here is the result of running the same benchmark but for the entire aws-sdk-rust workspace.

rg 'checksum:blake3' target/debug/deps | wc -l shows 192323 matches.
Use the provided cargo script, the sum of file size got checksum'd is 1316 MiB.

Command	Mean [s]	Min [s]	Max [s]	Relative
`cargo +nightly check --frozen`	2.070 ± 0.099	2.035	2.492	1.00
`cargo +nightly check --frozen -Zchecksum-freshness`	5.063 ± 0.010	5.046	5.079	2.45 ± 0.12

Important

Benchmarkes were done on r7a.24xlarge, which has 96vCPU and 768GiB RAM. It is expected that on a lower end machine the performance will be worse.

weihanglo · 2024-10-23T13:53:52Z

^{Originally posted by @Xaeroxe at #14136 (comment).}

Do the slower benchmarks appear to be I/O bound or CPU bound? If they are CPU bound, are all the cores saturated?

I knew going into this that performance would suffer. So unless we can improve these numbers a lot I'd be inclined to make this feature opt-in. Use it if you know you need it.

weihanglo · 2024-10-23T13:54:34Z

^{Originally posted by @epage at #14136 (comment).}

rg 'checksum:blake3' target/debug/deps | wc -l shows 8313 matches. It means rustc roughly computes blake3 checksum 8313 times.

Should we avoid passing this flag to rustc for immutable dependencies? I know your benchmark didn't show much of a difference but figure there might be other relevant cases

weihanglo · 2024-10-23T14:36:26Z

@epage My impression is that rustc already computes file checksums, just with a different algorithm. The performance might be less interesting.

After some discussions with @Muscraft, I can think of some relevant cases:

Zbinary-dep-depinfo would hash large binary files. This might be improved by using mmap or rayon update in blake3 crate on rustc side. Not sure if it leaves traces for Cargo to know it is binary files, though we can do that on demand when the file size exceeds a threshold.

As of 42f4143 in Cargo fingerprint, and rust-lang/rust@814df6e50 in rustc, we use 16KiB buffer size for blake3. In blake3 it uses a larger buffer size 64KiB via update_reader function. We might want to switch to it. Here is a list of large files we might want a larger buffer when digesting.

Top 20 largest files in rust-lang/cargo

 308K    CHANGELOG.md
 188K    tests/testsuite/build.rs
 184K    tests/testsuite/package.rs
 156K    tests/testsuite/build_script.rs
 136K    tests/testsuite/test.rs
 120K    tests/testsuite/publish.rs
 120K    tests/testsuite/metadata.rs
 116K    tests/testsuite/registry.rs
 112K    tests/testsuite/git.rs
 112K    src/cargo/util/context/mod.rs
 108K    src/cargo/util/toml/mod.rs
 108K    src/cargo/core/compiler/fingerprint/mod.rs
 100K    tests/testsuite/artifact_dep.rs
 92K     tests/testsuite/freshness_checksum.rs
 92K     src/doc/src/images/auth-level-acl.png
 92K     benches/workspaces/substrate.tgz
 88K     tests/testsuite/freshness.rs
 84K     tests/testsuite/patch.rs
 84K     tests/testsuite/install.rs
 84K     src/doc/src/reference/unstable.md

weihanglo · 2024-10-23T14:38:50Z

Originally posted by @Xaeroxe at #14136 (comment).

Do the slower benchmarks appear to be I/O bound or CPU bound? If they are CPU bound, are all the cores saturated?

I knew going into this that performance would suffer. So unless we can improve these numbers a lot I'd be inclined to make this feature opt-in. Use it if you know you need it.

It's more like I/O bound. @Muscraft has done the same benchmark on their 16 cores machine with a fancy SSD. the performance difference between the two was a bit smaller than this benchmark result #14722 (comment), even r7a.24xlarge is a way powerful machine in terms of cores and RAM.

Here are their benchmark results

hyperfine --warmup 25 \
  --setup "cargo fetch && cargo clean" \
  --min-runs 25 \
  --export-markdown result.md \
  "cargo +nightly check -p aws-sdk-ec2 --frozen" \
  "cargo +nightly check -p aws-sdk-ec2 --frozen -Zchecksum-freshness"

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`cargo +nightly check -p aws-sdk-ec2 --frozen`	445.5 ± 8.1	430.5	461.8	1.00
`cargo +nightly check -p aws-sdk-ec2 --frozen -Zchecksum-freshness`	501.1 ± 10.3	477.8	518.9	1.12 ± 0.03

hyperfine --warmup 25 \
  --setup "cargo fetch && cargo clean" \
  --min-runs 25 \
  --export-markdown result-all.md \
  "cargo +nightly check --frozen" \
  "cargo +nightly check --frozen -Zchecksum-freshness"

Command	Mean [s]	Min [s]	Max [s]	Relative
`cargo +nightly check --frozen`	1.368 ± 0.027	1.328	1.402	1.00
`cargo +nightly check --frozen -Zchecksum-freshness`	2.748 ± 0.054	2.661	2.813	2.01 ± 0.06

Xaeroxe · 2024-10-23T15:29:25Z

I am curious how often "time to prove build is fresh" is relevant to cargo's user satisfaction. Are users more likely to be upset about an incorrect freshness check, or a long duration to prove the workspace is fresh?

I'll admit that the mtime check gets it right most of the time. However thinking from the perspective of a user I think I'd rather wait a little longer and be confident that the check was correct. However I'm biased. I'm curious about other people's thoughts on it.

weihanglo · 2024-10-23T20:36:20Z

Some more benchmark results with a permutation of -Zchecksum-freshness * -Zbinary-dep-depinfo, using the same script from #14722 (comment):

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`check -p aws-sdk-ec2 --frozen`	584.2 ± 2.8	580.5	591.9	1.00
`check -p aws-sdk-ec2 --frozen -Zbinary-dep-depinfo`	588.2 ± 2.6	582.8	593.4	1.01 ± 0.01
`check -p aws-sdk-ec2 --frozen -Zchecksum-freshness`	699.3 ± 3.5	692.0	707.4	1.20 ± 0.01
`check -p aws-sdk-ec2 --frozen -Zchecksum-freshness -Zbinary-dep-depinfo`	773.4 ± 4.2	767.2	786.0	1.32 ± 0.01

number of files: 8435
total file size: 259031218 (~247 MiB)

Command	Mean [s]	Min [s]	Max [s]	Relative
`check --frozen`	2.072 ± 0.061	2.042	2.258	1.00
`check --frozen -Zbinary-dep-depinfo`	2.630 ± 0.010	2.611	2.645	1.27 ± 0.04
`check --frozen -Zchecksum-freshness`	5.096 ± 0.076	5.052	5.275	2.46 ± 0.08
`check --frozen -Zchecksum-freshness -Zbinary-dep-depinfo`	5.854 ± 0.017	5.825	5.906	2.82 ± 0.08

number of files: 192533
total file size: 1664996265 (~1587 MiB)

I found the result is fairly stable btw.

weihanglo · 2024-10-23T20:40:24Z

I am curious how often "time to prove build is fresh" is relevant to cargo's user satisfaction. Are users more likely to be upset about an incorrect freshness check, or a long duration to prove the workspace is fresh?

@Xaeroxe Personally I am with you. In this case correctness is more important to me than a few seconds lag for no-op builds. Though there are tools heavily calling cargo check and it's better to control the perf regression in a reasonable level.

bjorn3 · 2024-10-23T20:48:24Z

Would it be possible to have a mode where the checksums are computed only when mtime shows that the file is touched to avoid unnecessary recompiles when nothing actually changed? This would still have the same correctness issues as mtime, but speed up some cases that behave badly with mtime. For example in cg_clif's test suite this would be really useful to avoid unnecessarily rebuilding crates whose sources are copied and patched before building.

scottlamb · 2024-10-23T20:53:58Z

The main reason I'm excited about this is because I'm running on CI, and all my mtimes change with each run, so my valid cache contents are never used. I think this is an extremely common problem.

If folks decide the overhead of always doing the checksum every time slows down the nothing-to-do case too much, please consider defaulting to an alternative scheme:

do the existing mtime/size/whatever check; if it says the file is unchanged, it's unchanged.
if it says the file is changed, look more closely with the checksum. There's really no harm in this; the IO is likely warming the cache for later rather than doubling the cost. And the CPU used for blake3 is insignificant compared to the actual compile.

weihanglo · 2024-10-23T23:30:22Z

and all my mtimes change with each run, so my valid cache contents are never used.

The current implementation probably doesn't really help if your package have build scripts involved in your dependency graph. This remains unresolved (I may work on it if I find some free time, or someone else).

Would it be possible to have a mode where the checksums are computed only when mtime shows that the file is touched to avoid unnecessary recompiles when nothing actually changed?

Not sure about this. Mtime detection works well for most of the time, but fails some corner cases, either excessive rebuilds or not rebuild at all. And yes Cargo could potentially provide a config saying you trust you system mtime and doesn't ever compute checksums. I think that is kinda the last resort. Personally I'd like to drop mtime entirely 😆.

do the existing mtime/size/whatever check; if it says the file is unchanged, it's unchanged.

Yes in the first implementation Cargo checks size as an optimization, though in a different (but more reliable) way — if its size changed, it reports the file dirty and don't bother computing checksums.

cargo/src/cargo/core/compiler/fingerprint/mod.rs

Lines 2023 to 2029 in edd36eb

    
           if current_file_len != file_len { 
        
               return Some(StaleItem::FileSizeChanged { 
        
                   path: path.to_path_buf(), 
        
                   new_size: current_file_len, 
        
                   old_size: file_len, 
        
               }); 
        
           }

Xaeroxe · 2024-10-25T17:57:17Z

Build scripts

I intend to see this to completion and will do that once build-rs is properly adopted.

weihanglo added Performance Gotta go fast! Z-checksum-freshness Nightly: rebuild detection on file checksum instead of mtime labels Oct 23, 2024

weihanglo mentioned this issue Oct 23, 2024

Tracking Issue for checksum freshness #14136

Open

4 tasks

weihanglo mentioned this issue Oct 23, 2024

initial version of checksum based freshness #14137

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tracking the performance of `-Zchecksum-freshness` #14722

Tracking the performance of `-Zchecksum-freshness` #14722

weihanglo commented Oct 23, 2024 •

edited

Loading

weihanglo commented Oct 23, 2024

weihanglo commented Oct 23, 2024 •

edited

Loading

weihanglo commented Oct 23, 2024

weihanglo commented Oct 23, 2024

weihanglo commented Oct 23, 2024

weihanglo commented Oct 23, 2024 •

edited

Loading

Xaeroxe commented Oct 23, 2024

weihanglo commented Oct 23, 2024

weihanglo commented Oct 23, 2024

bjorn3 commented Oct 23, 2024

scottlamb commented Oct 23, 2024

weihanglo commented Oct 23, 2024

Xaeroxe commented Oct 25, 2024

Tracking the performance of -Zchecksum-freshness #14722

Tracking the performance of -Zchecksum-freshness #14722

Comments

weihanglo commented Oct 23, 2024 • edited Loading

weihanglo commented Oct 23, 2024

weihanglo commented Oct 23, 2024 • edited Loading

weihanglo commented Oct 23, 2024

weihanglo commented Oct 23, 2024

weihanglo commented Oct 23, 2024

weihanglo commented Oct 23, 2024 • edited Loading

Xaeroxe commented Oct 23, 2024

weihanglo commented Oct 23, 2024

weihanglo commented Oct 23, 2024

bjorn3 commented Oct 23, 2024

scottlamb commented Oct 23, 2024

weihanglo commented Oct 23, 2024

Xaeroxe commented Oct 25, 2024

Tracking the performance of `-Zchecksum-freshness` #14722

Tracking the performance of `-Zchecksum-freshness` #14722

weihanglo commented Oct 23, 2024 •

edited

Loading

weihanglo commented Oct 23, 2024 •

edited

Loading

weihanglo commented Oct 23, 2024 •

edited

Loading