Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add benchmark overhead to frame-omni-bencher #5303

Closed
skunert opened this issue Aug 9, 2024 · 10 comments
Closed

Add benchmark overhead to frame-omni-bencher #5303

skunert opened this issue Aug 9, 2024 · 10 comments

Comments

@skunert
Copy link
Contributor

skunert commented Aug 9, 2024

The benchmark overhead command should be implemented for frame-omni-bencher.

Currently its not clear how to run benchmark overhead for a parachain.

Omni bencher currently does not support the command.

"Only the `v1 benchmark pallet` command is currently supported".into()

Polkadot-parachain does not support it either.

match cmd {
#[cfg(feature = "runtime-benchmarks")]
BenchmarkCmd::Pallet(cmd) => runner.sync_run(|config| {
cmd.run_with_spec::<HashingFor<Block>, ReclaimHostFunctions>(Some(
config.chain_spec,
))
}),
BenchmarkCmd::Block(cmd) => runner.sync_run(|config| {
let node = new_node_spec(&config, &cli.node_extra_args())?;
node.run_benchmark_block_cmd(config, cmd)
}),
#[cfg(feature = "runtime-benchmarks")]
BenchmarkCmd::Storage(cmd) => runner.sync_run(|config| {
let node = new_node_spec(&config, &cli.node_extra_args())?;
node.run_benchmark_storage_cmd(config, cmd)
}),
BenchmarkCmd::Machine(cmd) =>
runner.sync_run(|config| cmd.run(&config, SUBSTRATE_REFERENCE_HARDWARE.clone())),
#[allow(unreachable_patterns)]
_ => Err("Benchmarking sub-command unsupported or compilation feature missing. \
Make sure to compile with --features=runtime-benchmarks \
to enable all supported benchmarks."
.into()),
}
},

And you can not use the polkadot binary because it is not having the same hostfunctions enabled (e.g. ext_storage_proof_size_storage_proof_size_version_1).

Noticed while playing with #5283

@ggwpez
Copy link
Member

ggwpez commented Aug 9, 2024

Polkadot-parachain does not support it either.

Oh... adding it to the omni-bencher would be good long term, yes. We can probably re-use some mechanisms from the try-runtime-cli since that is also building blocks.
The annoying part are the Inherent Data Providers, since they are on the node side and we dont know in advance what inherent data the runtime needs.
So the omni-bencher needs to inject all possible IDPS that we have (and also the ones from ORML etc). Otherwise if one inherent is missing then the block will not build.

@skunert
Copy link
Contributor Author

skunert commented Aug 9, 2024

For a start, the omni-bencher could support what the omni-node will support. That should be fine for most use cases.

@ggwpez
Copy link
Member

ggwpez commented Aug 10, 2024

For a start, the omni-bencher could support what the omni-node will support. That should be fine for most use cases.

WDYM? IIUC the omni-node would just call into the shared omni-bencher code. This de-duplicates all of this logic and puts it into one place.
But now thinking about it, the omni-node would be a superset of the omni-bencher. So maybe the omni-bencher would not be needed anymore 🤔
Currently it is nice to have though, since it compiles faster in CI and does not require any features to be enabled.

@skunert
Copy link
Contributor Author

skunert commented Aug 15, 2024

My impression was that we will have no benchmarking related code in the omni node and people will use the omni bencher for benchmarking exclusively.

This is also what is outlined here: #4966

@ggwpez
Copy link
Member

ggwpez commented Aug 15, 2024

My impression was that we will have no benchmarking related code in the omni node and people will use the omni bencher for benchmarking exclusively.

Yes even better. I thought you meant the opposite with this:

For a start, the omni-bencher could support what the omni-node will support. That should be fine for most use cases.

But nevermind, seems like we think the same thing.

@kianenigma
Copy link
Contributor

On the benchmark side, everything should be possible to do in frame-omni-bencher.

As noted, omni-node will have no old-school, runtime-coupled benchmarking abilities.

As noted here, we could embed frame-omni-bencher as-is into the omni-node as well.

@skunert
Copy link
Contributor Author

skunert commented Aug 30, 2024

I investigated the steps needed for an omni-node friendly benchmark-overhead command.

Context

The benchmark overhead command has the job of providing us with two things:

  1. Base block weight
  2. Base extrinsic weight

To calculate these, it executes a block and monitors its time and the proof recorded during this time. In addition, it will execute a System::remark extrinsic. Historically, only the polkadot binary supported this command; polkadot-parachain never supported it. I am even skeptical if this value was even benchmarked for most runtimes since I only see the same value everywhere. Maybe it was a reasonable guess, and we copied it everywhere?

Moving Parts

There are multiple moving parts that make this work for arbitrary runtimes.

Inherents

Since we want to support multiple runtimes, we need to supply all the inherents the runtime expects. Most parachain runtimes will have similar inherent requirements. The runtime already provides the inherents to the block-builder. All we need to do is populate the InherentData with the required values for the most popular inherents.

Building the Extrinsic

To support multiple runtimes, we must build the extrinsic dynamically based on the runtime's metadata. My first intuition is to employ subxt, which should be able to build this simple extrinsic dynamically. Alternatively, we could let users provide bytes of the extrinsic to execute.

Block Building & Client

The actual block building and client acquisition should be easy. We can use the same FakeRuntimeApi strategy that we already use in the omni-node.

Initial State

Ideally, the runtime should provide the initial state. As previously discussed, we could rely on some dev profile by default. This profile provides us with a valid genesis state on which to build. Eventually, we should inject more accounts into the genesis state to make the benchmark results more accurate. We also need to ensure that the account we signed the remark with is in the genesis state.

Question: How many accounts do we want to inject?

User Facing

The user interface can extend the benchmark overhead command v1. We also need something like --runtime, similar to pallet benchmarking. For now, we can continue to support the --chain arg to support chains that do not provide any genesis presets in the runtime.

One question that remains open for now is how to handle chains that have custom inherents or more exotic setups. We can likely ignore this in the initial version. Still, we could allow parachain teams extension of inherents and transaction building by having a minimal API, similar to what polkadot-parachain-lib is exposing now. This would mean that not everything is possible with omni-bencher, and integration into omni-node is a must.

For an initial iteration, there is also the idea of providing something like a --tx-bytes flag that allows users to pass the bytes of a prepared transaction to the benchmarking.

@ggwpez
Copy link
Member

ggwpez commented Sep 9, 2024

One question that remains open for now is how to handle chains that have custom inherents or more exotic setups. We can likely ignore this in the initial version. Still, we could allow parachain teams extension of inherents and transaction building by having a minimal API, similar to what polkadot-parachain-lib is exposing now. This would mean that not everything is possible with omni-bencher, and integration into omni-node is a must.

Either that, or they fork the omni-bencher, or there is a --inherent-data KEY:VALUE argument that can be used to inject any arbitrary inherent data.

For an initial iteration, there is also the idea of providing something like a --tx-bytes flag that allows users to pass the bytes of a prepared transaction to the benchmarking.

Yea good idea.

Question: How many accounts do we want to inject?

Probably should be a CLI command with a default. Currently the PoV benchmarking assumes a million entries per map - but it does not actually populate the maps, just extrapolates the proof size.

@gui1117
Copy link
Contributor

gui1117 commented Oct 26, 2024

With TransactionExtension being now weighted, we want to change how to calculate the extrinsic overhead not to include the transaction extensions.
The only extrinsic without extensions are bare extrinsics. Unfortunately there is no equivalent of remark but as a valid bare extrinsic in system.
I suggest we have a new valid bare extrinsic only when the runtime is compiled with runtime-benchmarks that we would use for benchmarking the extrinsic overhead.
I am going to start the work on top of the PR.
I will modify ExtrinsicBuilder::build so that we can build signed or bare transaction depending on a new argument.
Add validation for a new no-op call in system when compiled with runtime-benchmarks.
Then benchmark overhead will build bare transaction of this new specific call.
But if we only ever want to benchmark the extrinsic overhead then maybe we can simplify stuff.
If we still want to benchmark like blocks full of signed transfer then current architecture as in the PR is the best.
EDIT: but if we benchmark for a bare extrinsic, we still need to add the weight of signature verification of signed extrinsic :-/, I don't have a full solution yet.

Maybe the easiest will be to just substract the post-dispatch weight of remark + all transaction extensions to the result of the extrinsic overhead benchmark. Then this gets the real overhead without the call.
To get programmatically the post dispatch weight of remark call + all transaction extensions is not easy, but it can be doable.

github-merge-queue bot pushed a commit that referenced this issue Oct 30, 2024
# Benchmark Overhead Command for Parachains

This implements the `benchmark overhead` command for parachains. Full
context is available at:
#5303. Previous attempt
was this #5283, but here
we have integration into frame-omni-bencher and improved tooling.

## Changes Overview

Users are now able to use `frame-omni-bencher` to generate
`extrinsic_weight.rs` and `block_weight.rs` files for their runtime. The
core logic for generating these remains untouched; this PR provides
mostly machinery to make it work for parachains at all.

Similar to the pallet benchmarks, we gain the option to benchmark based
on just a runtime:

```
frame-omni-bencher v1 benchmark overhead --runtime {{runtime}}
```

or with a spec:

```
frame-omni-bencher v1 benchmark overhead --chain {{spec}} --genesis-builder spec
```

In this case, the genesis state is generated from the runtime presets.
However, it is also possible to use `--chain` and genesis builder `spec`
to generate the genesis state from the chain spec.

Additionally, we use metadata to perform some checks based on the
pallets the runtime exposes:

- If we see the `ParaInherent` pallet, we assume that we are dealing
with a relay chain. This means that we don't need proof recording during
import (since there is no storage weight).
- If we detect the `ParachainSystem` pallet, we assume that we are
dealing with a parachain and take corresponding actions like patching a
para id into the genesis state.

On the inherent side, I am currently supplying the standard inherents
every parachain needs.

In the current state, `frame-omni-bencher` supports all system chains.
In follow-up PRs, we could add additional inherents to increase
compatibility.

Since we are building a block during the benchmark, we also need to
build an extrinsic. By default, I am leveraging subxt to build the xt
dynamically. If a chain is not compatible with the `SubstrateConfig`
that comes with `subxt`, it can provide a custom extrinsic builder to
benchmarking-cli. This requires either a custom bencher implementation
or an integration into the parachains node.

Also cumulus-test-runtime has been migrated to provide genesis configs.

## Chain Compatibility
The current version here is compatible with the system chains and common
substrate chains. The way to go for others would be to customize the
frame-omni-bencher by providing a custom extrinsicbuilder. I did an
example implementation that works for mythical:
https://github.com/skunert/mythical-bencher

## Follow-Ups
- After #6040 is finished, we should integrate this here to make the
tooling truly useful. In the current form, the state is fairly small and
not representative.

## How to Review
I recommend starting from
[here](https://github.com/paritytech/polkadot-sdk/pull/5891/files#diff-50830ff756b3ac3403b7739d66c9e3a5185dbea550669ca71b28d19c7a2a54ecR264),
this method is the main entry point for omni-bencher and `polkadot`
binary.

TBD:
- [x] PRDoc

---------

Co-authored-by: Michal Kucharczyk <[email protected]>
@kianenigma
Copy link
Contributor

closed now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants