feat(perf): Flamegraphs for test program execution benchmarks (#6253)
# Description

## Problem\*

Resolves #6186 

## Summary\*

* Breaks up the benchmark in `nargo_cli/benches/criterion.rs` into two steps: 1) a compilation step, which happens only once, and 2) the execution of the already-compiled program.
* Limits the code in the benchmark loop to just the circuit execution, excluding the parsing of the program and its inputs and the saving of results that the `execute` command would otherwise do.

These are the timings of just the circuit execution:
```text
eddsa_execute           time:   [111.13 ms 111.19 ms 111.25 ms]
regression_execute      time:   [151.27 µs 151.84 µs 152.66 µs]
struct_execute          time:   [602.39 ns 606.05 ns 609.77 ns]
```

The flame graphs look as follows (you can right-click and use "Save Linked File As..." to download one and open it in a browser to make it interactive):
* `eddsa`:
![eddsa](https://github.com/user-attachments/assets/c9f35961-65d9-4ac9-b2a6-1d50d14a9adc)
* `regression`:
![regression](https://github.com/user-attachments/assets/5664ce3a-eb6e-4fe8-a832-0ae539c99881)
* `struct`:
![struct](https://github.com/user-attachments/assets/15ebab47-1d52-4152-8d32-88f124fda525)


To generate them, run the following commands:

```shell
./scripts/benchmark_start.sh
cargo bench -p nargo_cli --bench criterion -- --profile-time=30
./scripts/benchmark_stop.sh
```
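
For context, the flamegraphs come from criterion's `pprof` integration, which only kicks in when the benchmark group is configured with a profiler. That configuration is truncated in the diff below (`criterion_group!`), but based on the imports it presumably looks something like the following sketch; the sampling frequency, sample size, and measurement time here are illustrative assumptions, and `criterion_selected_tests_execution` is the benchmark function defined in the diff:

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use pprof::criterion::{Output, PProfProfiler};
use std::time::Duration;

criterion_group! {
    name = execution_benches;
    config = Criterion::default()
        // Sample the benchmark process ~100 times per second; when run with
        // `--profile-time`, this writes target/criterion/<bench>/profile/flamegraph.svg.
        .with_profiler(PProfProfiler::new(100, Output::Flamegraph(None)))
        .sample_size(20)
        .measurement_time(Duration::from_secs(20));
    targets = criterion_selected_tests_execution
}
criterion_main!(execution_benches);
```

With that in place, `--profile-time=30` skips criterion's statistical analysis and simply runs each benchmark for about 30 seconds while the profiler samples it.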

## Additional Context

The problem with the current `nargo_cli/benches/criterion.rs` was that it executed the `nargo execute --program-dir ... --force` command as a sub-process, and the profiler that creates the flame graph only sampled the criterion process itself, not what the sub-process was doing.

My first idea for getting the flame graphs to include the actual execution was to turn `nargo_cli` into a library _and_ a binary
(af19dfc),
so that we could import the commands into the benchmark and run them in the same process. This resulted in the following flame graphs:

* `eddsa`:
![eddsa](https://github.com/user-attachments/assets/e214653e-c6e3-4614-b763-b35694eeaec8)
* `regression`:
![regression](https://github.com/user-attachments/assets/ade1ef1a-1a1b-4ca4-9c09-62693551a8b0)
* `struct`:
![struct](https://github.com/user-attachments/assets/72838e1c-7c6b-4a0d-8040-acd335007463)

These include the entire `ExecuteCommand::run` command, which unfortunately results in a flame graph dominated by parsing logic, reading inputs, and writing outputs to files. In fact, in all but the `eddsa` example, `execute_circuit` was so fast that I can't even find it on the flame graph.

These are the timings for the command execution per test program:
```text
eddsa_execute           time:   [984.12 ms 988.74 ms 993.95 ms]
regression_execute      time:   [71.240 ms 71.625 ms 71.957 ms]
struct_execute          time:   [68.447 ms 69.414 ms 70.438 ms]
```

For this reason I rolled back the library+binary change and broke up the
execution further by parsing the program into a `CompiledProgram` once
and calling `execute_program` in the benchmark loop without saving the
results.
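
This split leans on criterion's `iter_batched`, where the first closure is untimed setup and only the second closure is measured. A minimal, generic sketch of that pattern (illustrative, not code from this PR):

```rust
use criterion::{criterion_group, criterion_main, BatchSize, Criterion};

fn bench_sum(c: &mut Criterion) {
    c.bench_function("sum", |b| {
        b.iter_batched(
            // Untimed setup, analogous to compiling the program and parsing its inputs.
            || (1..=1_000u64).collect::<Vec<_>>(),
            // Timed routine, analogous to executing the already compiled circuit.
            |input| input.iter().sum::<u64>(),
            BatchSize::SmallInput,
        )
    });
}

criterion_group!(benches, bench_sum);
criterion_main!(benches);
```

Because criterion calls the setup closure for every batch, the benchmark below caches the compiled artifacts in a `RefCell` so the compilation effectively happens only once.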

I copied two utility functions to read the program artifact and the input map. I'm not sure they are worth moving into a shared location, since even the errors they raise are CLI-specific.

## Documentation\*

Check one:
- [ ] No documentation needed.
- [x] Documentation included in this PR.
- [ ] **[For Experimental Features]** Documentation to be submitted in a
separate PR.

# PR Checklist\*

- [x] I have tested the changes locally.
- [ ] I have formatted the changes with [Prettier](https://prettier.io/)
and/or `cargo fmt` on default settings.
aakoshh authored Oct 10, 2024
1 parent 70cbeb4 commit c186791
Showing 3 changed files with 162 additions and 21 deletions.
13 changes: 13 additions & 0 deletions tooling/nargo_cli/benches/README.md
@@ -0,0 +1,13 @@
# Benchmarks

To generate flamegraphs for the execution of a specific program, execute the following commands:

```shell
./scripts/benchmark_start.sh
cargo bench -p nargo_cli --bench criterion <test-program-name> -- --profile-time=30
./scripts/benchmark_stop.sh
```

Afterwards the flamegraph is available at `target/criterion/<test-program-name>_execute/profile/flamegraph.svg`

Alternatively, omit `<test-program-name>` to run profiling on all test programs defined in [utils.rs](./utils.rs).
161 changes: 141 additions & 20 deletions tooling/nargo_cli/benches/criterion.rs
@@ -1,33 +1,154 @@
//! Select representative tests to bench with criterion
use acvm::{acir::native_types::WitnessMap, FieldElement};
use assert_cmd::prelude::{CommandCargoExt, OutputAssertExt};
use criterion::{criterion_group, criterion_main, Criterion};

use paste::paste;
use noirc_abi::{
input_parser::{Format, InputValue},
Abi, InputMap,
};
use noirc_artifacts::program::ProgramArtifact;
use noirc_driver::CompiledProgram;
use pprof::criterion::{Output, PProfProfiler};
use std::hint::black_box;
use std::path::Path;
use std::{cell::RefCell, collections::BTreeMap};
use std::{process::Command, time::Duration};

include!("./utils.rs");

macro_rules! criterion_command {
($command_name:tt, $command_string:expr) => {
paste! {
fn [<criterion_selected_tests_ $command_name>](c: &mut Criterion) {
let test_program_dirs = get_selected_tests();
for test_program_dir in test_program_dirs {
let mut cmd = Command::cargo_bin("nargo").unwrap();
cmd.arg("--program-dir").arg(&test_program_dir);
cmd.arg($command_string);
cmd.arg("--force");

let benchmark_name = format!("{}_{}", test_program_dir.file_name().unwrap().to_str().unwrap(), $command_string);
c.bench_function(&benchmark_name, |b| {
b.iter(|| cmd.assert().success())
});
}
}
/// Compile the test program in a sub-process
fn compile_program(test_program_dir: &Path) {
let mut cmd = Command::cargo_bin("nargo").unwrap();
cmd.arg("--program-dir").arg(test_program_dir);
cmd.arg("compile");
cmd.arg("--force");
cmd.assert().success();
}

/// Read the bytecode(s) of the program(s) from the compilation artifacts
/// from all the binary packages. Pair them up with their respective input.
///
/// Based on `ExecuteCommand::run`.
fn read_compiled_programs_and_inputs(
dir: &Path,
) -> Vec<(CompiledProgram, WitnessMap<FieldElement>)> {
let toml_path = nargo_toml::get_package_manifest(dir).expect("failed to read manifest");
let workspace = nargo_toml::resolve_workspace_from_toml(
&toml_path,
nargo_toml::PackageSelection::All,
Some(noirc_driver::NOIR_ARTIFACT_VERSION_STRING.to_string()),
)
.expect("failed to resolve workspace");

let mut programs = Vec::new();
let binary_packages = workspace.into_iter().filter(|package| package.is_binary());

for package in binary_packages {
let program_artifact_path = workspace.package_build_path(package);
let program: CompiledProgram = read_program_from_file(&program_artifact_path).into();

let (inputs, _) = read_inputs_from_file(
&package.root_dir,
nargo::constants::PROVER_INPUT_FILE,
Format::Toml,
&program.abi,
);

let initial_witness =
program.abi.encode(&inputs, None).expect("failed to encode input witness");

programs.push((program, initial_witness));
}
programs
}

/// Read the bytecode and ABI from the compilation output
fn read_program_from_file(circuit_path: &Path) -> ProgramArtifact {
let file_path = circuit_path.with_extension("json");
let input_string = std::fs::read(file_path).expect("failed to read artifact file");
serde_json::from_slice(&input_string).expect("failed to deserialize artifact")
}

/// Read the inputs from Prover.toml
fn read_inputs_from_file(
path: &Path,
file_name: &str,
format: Format,
abi: &Abi,
) -> (InputMap, Option<InputValue>) {
if abi.is_empty() {
return (BTreeMap::new(), None);
}

let file_path = path.join(file_name).with_extension(format.ext());
if !file_path.exists() {
if abi.parameters.is_empty() {
return (BTreeMap::new(), None);
} else {
panic!("input file doesn't exist: {}", file_path.display());
}
};

let input_string = std::fs::read_to_string(file_path).expect("failed to read input file");
let mut input_map = format.parse(&input_string, abi).expect("failed to parse input");
let return_value = input_map.remove(noirc_abi::MAIN_RETURN_NAME);

(input_map, return_value)
}

/// Use the nargo CLI to compile a test program, then benchmark its execution
/// by executing the command directly from the benchmark, so that we can have
/// meaningful flamegraphs about the ACVM.
fn criterion_selected_tests_execution(c: &mut Criterion) {
for test_program_dir in get_selected_tests() {
let benchmark_name =
format!("{}_execute", test_program_dir.file_name().unwrap().to_str().unwrap());

// The program and its inputs will be populated in the first setup.
let artifacts = RefCell::new(None);

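// Handles foreign (oracle) calls, such as `println`, made by the program during execution; created once and reused across iterations.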
let mut foreign_call_executor =
nargo::ops::DefaultForeignCallExecutor::new(false, None, None, None);

c.bench_function(&benchmark_name, |b| {
b.iter_batched(
|| {
// Setup will be called many times to set a batch (which we don't use),
// but we can compile it only once, and then the executions will not have to do so.
// It is done as a setup so that we only compile the test programs that we filter for.
if artifacts.borrow().is_some() {
return;
}
compile_program(&test_program_dir);
// Parse the artifacts for use in the benchmark routine
let programs = read_compiled_programs_and_inputs(&test_program_dir);
// Warn, but don't stop, if we haven't found any binary packages.
if programs.is_empty() {
eprintln!("\nWARNING: There is nothing to benchmark in {benchmark_name}");
}
// Store them for execution
artifacts.replace(Some(programs));
},
|_| {
let artifacts = artifacts.borrow();
let artifacts = artifacts.as_ref().expect("setup compiled them");

for (program, initial_witness) in artifacts {
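// `black_box` prevents the compiler from optimising away the execution or specialising on its inputs.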
let _witness_stack = black_box(nargo::ops::execute_program(
black_box(&program.program),
black_box(initial_witness.clone()),
&bn254_blackbox_solver::Bn254BlackBoxSolver,
&mut foreign_call_executor,
))
.expect("failed to execute program");
}
},
criterion::BatchSize::SmallInput,
);
});
}
}
criterion_command!(execution, "execute");

criterion_group! {
name = execution_benches;
9 changes: 8 additions & 1 deletion tooling/nargo_cli/benches/utils.rs
@@ -6,6 +6,7 @@ fn get_selected_tests() -> Vec<PathBuf> {
Ok(dir) => PathBuf::from(dir),
Err(_) => std::env::current_dir().unwrap(),
};

let test_dir = manifest_dir
.parent()
.unwrap()
@@ -15,5 +16,11 @@
.join("execution_success");

let selected_tests = vec!["struct", "eddsa", "regression"];
selected_tests.into_iter().map(|t| test_dir.join(t)).collect()
let mut selected_tests =
selected_tests.into_iter().map(|t| test_dir.join(t)).collect::<Vec<_>>();

let test_dir = test_dir.parent().unwrap().join("benchmarks");
selected_tests.extend(test_dir.read_dir().unwrap().filter_map(|e| e.ok()).map(|e| e.path()));

selected_tests
}
