Metered Weights in the Polkadot-SDK #49

Open
shawntabrizi opened this issue Nov 16, 2023 · 3 comments

@shawntabrizi
Member

shawntabrizi commented Nov 16, 2023

Before creating a full RFC, I want to start a discussion on a potential direction around improving Weights and Benchmarks in the Polkadot SDK.

Problems to solve

  • Weights / Benchmarking are identified as one of the more complex parts of using the Polkadot SDK.
  • Keeping the Polkadot-SDK as general as possible for the runtime, allowing other frameworks to be created.
  • Improving safety and performance of the Polkadot SDK execution environment.

High Level Ideas

The Polkadot-SDK runtime should take a "step backwards", and introduce weight metering as the base level of execution limiting, rather than the pre-measured weight system that exists today.

Weight Metering is compatible with pre-measured weights, but not vice-versa.

One of the goals of the Polkadot-SDK is to be as general as possible, and to allow for customization at each level of the stack, especially the runtime. As I understand it, we have chosen a system where executing a block in the runtime requires knowing the weight of that block ahead of time. This appears to be less flexible than using execution metering.

For example, assuming a system where we did execution metering, the runtime could bypass the metering system and directly inject the weights that it knows are correct for a given execution. However, with pre-measured weights, we have no flexibility to implement a metering system within a custom runtime framework.
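To make the asymmetry concrete, here is a minimal sketch (hypothetical types and names, not an existing Polkadot-SDK API): a metered executor can emulate pre-measured weights by charging the known amount in a single step, while a pre-measured-only executor has no hook for charging incrementally as execution proceeds.

```rust
/// Hypothetical weight meter for this sketch.
struct Meter {
    used: u64,
    max: u64,
}

impl Meter {
    /// Charge `amount`; fail if the limit would be exceeded.
    fn charge(&mut self, amount: u64) -> Result<(), ()> {
        let new = self.used.saturating_add(amount);
        if new > self.max {
            return Err(());
        }
        self.used = new;
        Ok(())
    }
}

/// A runtime that already knows the exact weight of a call can bypass
/// fine-grained metering by charging it in one step...
fn execute_premeasured(meter: &mut Meter, known_weight: u64) -> Result<(), ()> {
    meter.charge(known_weight)?;
    // ... run the call without further metering ...
    Ok(())
}

/// ...whereas a pre-measured-only system offers no equivalent way to
/// charge step by step as execution proceeds.
fn execute_metered(meter: &mut Meter, step_costs: &[u64]) -> Result<(), ()> {
    for cost in step_costs {
        meter.charge(*cost)?; // charge as we go
        // ... perform the step ...
    }
    Ok(())
}
```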

Benchmarking pushes overhead to developers.

Benchmarking is quite a laborious process, especially with more complex pallets.
It is a large blocker between building an idea and deploying a product that is relatively safe to use.

If we want to keep the Polkadot SDK competitive for innovators and builders, we cannot impose this large overhead where other existing platforms do not.

Benchmarking can be extremely pessimistic.

Because we need to assume the worst-case situation for every extrinsic, the final calculated weight for a block can be much higher than the time it actually takes to execute that block. It was previously calculated that a block full of only transactions uses only 60% of its total pre-calculated weight.

Even if extrinsics use weight refunds, it is likely that we won't optimally fill blocks, because we only include an extrinsic in a block if its worst-case weight would allow it to fit, not its final weight.
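As a rough illustration, with numbers invented for the example: given a 2s ref_time budget and transactions benchmarked at 20ms worst case, block building stops at 100 transactions. If each actually executes in 12ms (60% of the worst case), the block runs in 1.2s, and the remaining 0.8s of real capacity is never offered to additional transactions, because admission was decided on the worst-case figure.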

High Level Solutions

At the base Runtime API, support Weight Metering

Block and extrinsic execution in the runtime should take a max_weight parameter, and fail to execute if the metered weight is higher than max_weight.

Perhaps this should be an Option, where None can be provided for backwards compatibility and the runtime is forced to fall back to pre-calculated weights.
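A minimal sketch of what such an API shape could look like (names invented for this sketch; the `Weight` alias stands in for `sp_weights::Weight` to keep it self-contained):

```rust
/// Stand-in for `sp_weights::Weight` to keep the sketch self-contained.
type Weight = u64;

/// Returned when the metered weight went past `max_weight`.
struct OutOfWeight;

/// Hypothetical shape of a metered block-builder runtime API
/// (invented for this sketch; not an existing Polkadot-SDK trait).
trait MeteredBlockBuilder {
    /// Apply an encoded extrinsic, halting if the metered weight exceeds
    /// `max_weight`. `None` preserves today's behaviour: the runtime is
    /// expected to enforce its own pre-calculated weights.
    fn apply_extrinsic_metered(
        extrinsic: Vec<u8>,
        max_weight: Option<Weight>,
    ) -> Result<Weight, OutOfWeight>;
}
```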

Runtime Should Support Panics

It seems that in order for metering to work at all, we would need to be able to halt extrinsic execution abruptly when the metered weight goes beyond the expected max_weight. A panic is the right tool for this, correct?

In any case, allowing panics in the runtime would also improve the developer experience, since accidental panics are a major area where a runtime developer can make a mistake and leave their chain vulnerable to attack.
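A minimal sketch of the panic path (hypothetical; a real design would need host-side trap handling and state rollback):

```rust
/// Hypothetical weight meter that halts execution via panic.
struct WeightMeter {
    used: u64,
    max: u64,
}

impl WeightMeter {
    /// Charge `amount`, aborting the extrinsic if the limit is exceeded.
    fn charge(&mut self, amount: u64) {
        self.used = self.used.saturating_add(amount);
        if self.used > self.max {
            // The host would catch the resulting trap, discard the
            // extrinsic's state changes, and record it as out-of-weight.
            panic!("max_weight exceeded: used {} > max {}", self.used, self.max);
        }
    }
}
```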

Weight Metered Database

It is not my suggestion that we provide full weight metering to all execution in the runtime. This would just bring us back to the performance of smart contracts.

Instead, I suggest we create a special DB layer which provides very specific weight information about database access as it happens during runtime execution.

We know that DB operations account for the majority of weight costs in the runtime, and that usually the number of DB operations is also quite low. (We should do basic analysis of existing pre-metered weights to back this up tangibly).

If we only meter the database, and assume that other execution is nominal, then we can get a very high performance environment with high accuracy.

The DB layer could provide very specific details, like exactly where the item exists in the Merkle trie (depth, size, neighboring children, whether it or other neighboring children have already been cached, etc.). Then, with really comprehensive database benchmarks, we can dynamically meter how much weight each data operation would cost.

Perhaps it is possible to forgo this minimal overhead when pre-calculated weights already exist, or this mechanism could be used to automatically provide weight refunds when we know the DB weights are overestimated.
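A minimal sketch of such a DB layer (hypothetical types; the constant costs are placeholders where real figures would come from the comprehensive database benchmarks mentioned above):

```rust
use std::collections::{HashMap, HashSet};

// Placeholder costs; real values would come from database benchmarks.
const UNCACHED_READ_WEIGHT: u64 = 25_000; // full trie lookup
const CACHED_READ_WEIGHT: u64 = 1_000;    // hit in the node cache
const WRITE_WEIGHT: u64 = 100_000;        // trie write + commit

/// Hypothetical metered storage layer sitting behind runtime execution.
struct MeteredDb {
    storage: HashMap<Vec<u8>, Vec<u8>>,
    cached: HashSet<Vec<u8>>,
    weight_used: u64,
}

impl MeteredDb {
    fn get(&mut self, key: &[u8]) -> Option<Vec<u8>> {
        // Price the access based on what we know about the trie state;
        // a real meter could also account for depth, value size, and
        // neighboring nodes already in the cache.
        let cost = if self.cached.contains(key) {
            CACHED_READ_WEIGHT
        } else {
            UNCACHED_READ_WEIGHT
        };
        self.weight_used += cost;
        self.cached.insert(key.to_vec());
        self.storage.get(key).cloned()
    }

    fn set(&mut self, key: &[u8], value: Vec<u8>) {
        self.weight_used += WRITE_WEIGHT;
        self.cached.insert(key.to_vec());
        self.storage.insert(key.to_vec(), value);
    }
}
```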

Handling Execution Weight

With a metered database, I suspect we will calculate a majority of the weight used in a block / extrinsic.

However, to get full safety, we can provide a few different tools:

Custom Additional Weight

We already provide APIs for runtime developers to manually add more weight during extrinsic execution. This can be used to increase the weight where we know that the metered database is not enough.

In fact, the benchmarking system already splits benchmarking between Wasm execution and the database operations, so we already provide a method for users to actually discover the "missing" weight.
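For reference, an abbreviated call-site sketch of that existing mechanism, as used from inside a pallet call body (assumes a FRAME pallet context with `T: frame_system::Config`; import paths vary between SDK versions, and the figure is a placeholder):

```rust
use frame_support::{dispatch::DispatchClass, weights::Weight};

/// Fragment from inside a pallet call body: top up the metered figure
/// with computation cost the DB meter cannot see.
fn top_up_weight<T: frame_system::Config>() {
    frame_system::Pallet::<T>::register_extra_weight_unchecked(
        Weight::from_parts(10_000, 0), // placeholder: benchmarked Wasm cost
        DispatchClass::Normal,
    );
}
```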

Custom Weight Buffering

We could also allow runtime developers to add their own custom "weight buffer" to keep their extrinsics safer. For example, we could add an additional 20% overhead to the weight returned by the metered database.
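A sketch of such a helper (plain u64 standing in for Weight):

```rust
/// Hypothetical helper: pad a metered weight by a percentage safety margin
/// before charging it, e.g. `buffered(db_weight, 20)` charges 120%.
fn buffered(metered: u64, percent_overhead: u64) -> u64 {
    metered.saturating_mul(100 + percent_overhead) / 100
}
```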

@gui1117

gui1117 commented Aug 17, 2024

> The DB layer could provide very specific details, like exactly where the item exists in the Merkle trie (depth, size, neighboring children, whether it or other neighboring children have already been cached, etc.). Then, with really comprehensive database benchmarks, we can dynamically meter how much weight each data operation would cost.

This would require specifying an abstract database architecture with a specific cache size, and enforcing that all clients have a database compatible with this abstraction.
Maybe just enforcing one cache (without all the neighboring-children, depth, and size information) could already be a huge improvement.

Also, if we have more memory in the runtime with PVM, we could implement some caching inside the runtime itself, I guess.

I think we can do this RFC in multiple steps:

  • 1- Tracking operations: have a better automatic refund (see the sketch after this list).

    • inside the runtime: keep track of the most time-intensive runtime-interface calls: crypto, hashing, Storage, Trie.

    • benchmark them independently from the ref_time. (we already do this for Storage)

    • have the weight related to them separated from ref_time in the dispatch info (e.g. tracked_operation_ref_time).

    • tracked_operation_ref_time gets an automatic precise refund at the end of the transaction by calculating the actual weight from the tracked calls.

    • (maybe we could even track wasm execution in tracked_operation_ref_time, as I heard the expensive part of metering is the branching when gas is exceeded, not actually counting the number of operations, but it feels difficult for less gain, also considering PVM)

  • 2- Tracking Storage more precisely: even better refunds

    • enforce an architecture on the database: maybe just a cache size, or what you propose with neighbors, depth, etc.

    • add this information to the Storage runtime interface.

    • use this information in the tracking and refund.

  • 3- Have metering, and stop execution when the max weight is reached.
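A rough sketch of step 1 (hypothetical names and placeholder costs, including tracked_operation_ref_time): count each expensive runtime-interface call during the transaction, reprice them post-dispatch, and refund the difference against the pre-charged figure.

```rust
/// Hypothetical tracker for time-intensive runtime-interface calls.
#[derive(Default)]
struct TrackedOps {
    storage_reads: u64,
    storage_writes: u64,
    hashes: u64,
    sig_verifications: u64,
}

// Placeholder per-operation costs, benchmarked independently of ref_time.
const READ_COST: u64 = 25_000;
const WRITE_COST: u64 = 100_000;
const HASH_COST: u64 = 2_000;
const SIG_COST: u64 = 50_000;

impl TrackedOps {
    /// Actual cost of the tracked operations, computed post-dispatch.
    fn actual_cost(&self) -> u64 {
        self.storage_reads.saturating_mul(READ_COST)
            .saturating_add(self.storage_writes.saturating_mul(WRITE_COST))
            .saturating_add(self.hashes.saturating_mul(HASH_COST))
            .saturating_add(self.sig_verifications.saturating_mul(SIG_COST))
    }

    /// Automatic refund against the pre-charged tracked_operation_ref_time.
    fn refund(&self, precharged: u64) -> u64 {
        precharged.saturating_sub(self.actual_cost())
    }
}
```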

@ggwpez
Member

ggwpez commented Aug 17, 2024

The PVM will allow for deterministic metering. I think the longer-term goal is to recompile the runtimes to PVM and then use that.
I am not sure it's worth putting a lot of effort in earlier. Wasm is just fundamentally flawed in this regard (being a stack machine). The PVM story is probably a year out, though.

@gui1117

gui1117 commented Aug 17, 2024

This keeps open the question of "Weight Metered Database" or the point (2) in my comment.

Or, if PVM allows more memory in the runtime, we could maybe implement a cache inside the runtime.
