Increase max_pov_size to 10MB (#5334)

Comments
I do not have strong feelings on increasing PoV sizes near-term, but certainly if we put gluttons on Kusama then it's nice to be able to pressure them properly.
Yes, we make storage proofs 4x larger than necessary by inheriting the stupid radix 16 trie from ETH.
So, a rough roadmap could be like that: in the next SDK release, we bump the node-side limit. The other option is to make the proof size configurable, but I'm not sure how much sense that makes, given that it's a complication anyway and changing this constant is such a rare event. CC @dmitry-markin any concerns from the networking layer side? Feel free to CC other relevant people.
Maximum code size is already controlled by the host configuration:
I strongly hope that we have not hardcoded this 5MiB limit anywhere.
As I see from the code, the […]
The MAX_POV_SIZE constant is also present in the req/response protocol configs. I think we should remove all the constant references and use the configuration value. In the future we might want to raise it above 10MB, and I don't see any reason why not if the validators have enough bandwidth and CPU. We'd still be capped at 16MB on the parachain side by the maximum response size of the block request protocol:
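A minimal sketch of the direction suggested here, assuming hypothetical names (`NODE_MAX_POV_SIZE`, `effective_max_pov_size`) rather than the actual polkadot-sdk API: the req/response limit would follow the on-chain value, with the node-side constant acting only as a hard cap with leeway above it.

```rust
// Hypothetical sketch, not the real polkadot-sdk code: derive the
// req/response protocol limit from the runtime configuration instead of
// referencing a hardcoded constant everywhere.

/// Node-side hard cap: an upper bound with leeway above the on-chain value.
const NODE_MAX_POV_SIZE: u32 = 16 * 1024 * 1024; // 16 MiB

/// Effective limit used when configuring the PoV req/response protocol:
/// the on-chain `max_pov_size`, capped by the node-side bound.
fn effective_max_pov_size(runtime_max_pov_size: u32) -> u32 {
    runtime_max_pov_size.min(NODE_MAX_POV_SIZE)
}

fn main() {
    // An on-chain value of 10 MiB stays within the node cap.
    assert_eq!(effective_max_pov_size(10 * 1024 * 1024), 10 * 1024 * 1024);
    // A runtime value above the node cap is clamped, so governance cannot
    // push the network limit past what deployed nodes support.
    assert_eq!(effective_max_pov_size(32 * 1024 * 1024), NODE_MAX_POV_SIZE);
}
```

With this shape, raising the limit on-chain requires no node release as long as the new value stays under the node-side cap.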
Otherwise there is no consensus on this number, which leads to the situation that we need to get all validators to upgrade. This is not really how a decentralized network should work. Using some upper number on the node side makes sense, but it should give some leeway above the on-chain value.
The runtime set code does something like this, so I guess that was the leeway which we now want to increase.
There are two values here:

1. The node-side networking constant (`MAX_POV_SIZE`).
2. The `max_pov_size` value in the runtime `HostConfiguration`.

(1) poses an upper limit on (2). Yes, ideally we would derive the network limit from the runtime configuration, but that would require some refactoring, as we currently configure that value on node startup when setting up the protocol. It is worth double checking that (2) is correctly used everywhere in validation and not the constant. Other than that, the process is as follows: first raise the node-side limit, then the runtime value.
Assuming we use the runtime configuration correctly (and the persisted validation data, which should be derived from it), there can be no consensus issue. We nevertheless must have the majority of validators upgraded to the higher networking limit, otherwise honest nodes would not be able to fetch a large PoV, which could cause a finality stall.
Indeed. First the node-side limit, then the runtime value.
One thing to keep in mind regarding the networking req/resp protocol limits is that the block response limit was set for the minimal supported network bandwidth, i.e., it should not time out even on the slowest supported connections. I don't have the numbers at hand, but a 16 MB block response should not time out in 20 seconds, so the minimum bandwidth is presumably 1 MB/sec in the spec. If we raise this limit, we should also make sure we either increase the request-response protocol timeout (but this can introduce more latency with unresponsive peers) or raise the bandwidth requirements.
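The timeout reasoning above can be made concrete with a back-of-envelope helper; the figures are the ones quoted in the comment (16 MB response, 20 s timeout), not values taken from the spec:

```rust
// Back-of-envelope check: a response of `response_size_mb` megabytes must
// arrive within `timeout_s` seconds, so the minimum sustained bandwidth a
// peer needs is size / timeout.

fn min_bandwidth_mb_per_s(response_size_mb: f64, timeout_s: f64) -> f64 {
    response_size_mb / timeout_s
}

fn main() {
    // 16 MB block response within a 20 s timeout -> 0.8 MB/s, consistent
    // with the ~1 MB/s minimum bandwidth presumed above.
    let bw = min_bandwidth_mb_per_s(16.0, 20.0);
    assert!((bw - 0.8).abs() < 1e-9);

    // Doubling the response limit with the same timeout doubles the
    // bandwidth floor, so either the timeout or the hardware requirement
    // has to move.
    assert!((min_bandwidth_mb_per_s(32.0, 20.0) - 1.6).abs() < 1e-9);
}
```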
From: https://wiki.polkadot.network/docs/maintain-guides-how-to-validate-polkadot#reference-hardware
So at least from a theoretical point of view, validators should have the bandwidth.
Also, regular full nodes must be able to follow the chain.
+1 for this. Moonbeam has adopted asynchronous backing and increased the EVM gas per block, and we're running into difficulties due to this limitation.
Moving comments from #5753. We should consider networking speed limitations if we want to increase the maximum PoV size to 10 MB. The current PoV request timeout is set to 1.2s to handle 5 consecutive requests during a 6s block. With the number of parallel requests set to 10, validators will need the following networking speeds:
The current required speed of 50 MB/s aligns with the 62.5 MB/s specified in the reference hardware requirements. Increasing the PoV size to 10 MB may require a higher networking speed. This is the worst-case scenario, where all the blocks you need to recover are full.
If we have around 6-7 approvals per validator, then we'd need to recover 5 * 7 = 35 MB per relay chain block, that is almost 6 MB/s required to keep up with finality. For backing, assuming the async backing limits (max_candidate_depth 3, max ancestry len 2), there should be at most 5 * 3 * 2 = 30 MB per block, i.e. 5 MB/s. Which means a total of, let's say, 11 MB/s for all PoV fetches.
This would mean 22 MB/s based on the above math, which is well under the spec, leaving room for all other gossip and relay chain block fetching.
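A sketch reproducing the worst-case arithmetic from the two comments above (7 approvals per validator, async backing depth 3 with ancestry 2, 6 s blocks); all figures are the commenters' assumptions, not measured values:

```rust
// Worst-case PoV fetch bandwidth per validator, in MB/s, assuming every
// recovered block carries a full PoV of `pov_mb` megabytes.

fn required_mb_per_s(pov_mb: f64) -> f64 {
    let block_time_s = 6.0;
    let approvals = pov_mb * 7.0;     // ~7 approval checks per relay chain block
    let backing = pov_mb * 3.0 * 2.0; // max_candidate_depth 3 * max ancestry 2
    (approvals + backing) / block_time_s
}

fn main() {
    // 5 MB PoVs: (35 + 30) MB per 6 s block -> ~11 MB/s, as above.
    assert_eq!(required_mb_per_s(5.0).round() as u64, 11);
    // Doubling the PoV size to 10 MB doubles the requirement -> ~22 MB/s.
    assert_eq!(required_mb_per_s(10.0).round() as u64, 22);
}
```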
We think 100 MB/s = 1 Gb/s has good odds of handling 15 approvals per validator then? Or do we think there needs to be a lot more slack somehow?
Networking should probably be enough even now for 15. I'd be worried about CPU usage: you'd have 15 * 2 = 30 s of execution every 6 seconds on top of backing duties. With the updated hw specs, that means 5 cores are busy, leaving 3 more for the relay chain, networking, and parachain consensus. Out of those 3 cores, we'd easily be using one just for erasure coding.
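The CPU concern in the same terms, a rough budget under the stated assumptions (2 s of PVF execution per approval check, 6 s blocks, 8-core reference hardware):

```rust
// Rough CPU budget: `approvals` checks of `exec_s` seconds each must fit
// into every 6 s relay chain block, so this many cores are saturated by
// approval work alone.

fn busy_cores(approvals: u64, exec_s: u64) -> u64 {
    let block_time_s = 6;
    approvals * exec_s / block_time_s
}

fn main() {
    // 15 approvals * 2 s = 30 s of execution per 6 s block -> 5 busy cores,
    // leaving 3 of an 8-core machine for everything else.
    assert_eq!(busy_cores(15, 2), 5);
}
```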
Hey, are there timelines for when we might bump the PoV to 10 MB? One of the current issues is that with Async Backing, we bumped the execution time to 2 seconds, but the PoV was kept the same, so we are not reaping the full benefits of increasing the execution time from 0.5 to 2 seconds. |
The plan is to get it done this year. In terms of code changes this is very little work, but requires significant testing before we go to Polkadot. |
@bkchr After some research, this is what the situation looks like. I was pleased to discover that […]. In the runtime, the hardcoded […]. One thing that could be done is to set […]. Are you okay with such an approach? Or do you have any better ideas?
Yeah, the situation is not the best. I would assume that at least the runtime side on the relay chain is using the value from the […]. For the parachain runtime, using a const sounds sensible. This value doesn't change that often, and the assumption that it only goes up is also fine. So, the only downside would be that a parachain may not be able to use the full PoV size until it has done a runtime upgrade. However, I think this is reasonable.
I'm not sure I'm following your thoughts here. When a validator needs to validate a parablock, be it for backing or approvals or whatever, it comes through the candidate validation subsystem, which, before instantiating the PVF, surely checks that the PoV is not larger than the limit noted in the persisted validation data, but that's an offchain check. Still, we need to change […].
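A hedged sketch of the offchain check described here; the struct and function are illustrative stand-ins, not the actual candidate-validation code:

```rust
// Illustrative stand-in for the persisted validation data carried with a
// candidate; the real type lives in polkadot-primitives.
struct PersistedValidationData {
    max_pov_size: u32,
}

/// Offchain pre-check before instantiating the PVF: the PoV is compared
/// against the limit from the persisted validation data, not a constant.
fn precheck_pov(pov: &[u8], pvd: &PersistedValidationData) -> Result<(), &'static str> {
    if pov.len() as u64 > pvd.max_pov_size as u64 {
        return Err("PoV exceeds max_pov_size from persisted validation data");
    }
    Ok(())
}

fn main() {
    let pvd = PersistedValidationData { max_pov_size: 5 * 1024 * 1024 };
    // A small PoV passes; one above the on-chain limit is rejected before
    // execution ever starts.
    assert!(precheck_pov(&vec![0u8; 1024], &pvd).is_ok());
    assert!(precheck_pov(&vec![0u8; 6 * 1024 * 1024], &pvd).is_err());
}
```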
Perfect! That is what I meant!
That the hardcoded limit is used there again is IMO not correct. The `max_pov_size` should be controlled by the `HostConfiguration` and not limited by some "random" constant.
I would argue that we don't need any RFC. Or better: it depends a little bit on whether the maximum request/response size for the individual protocols is specced. In a perfect world, I would assume that these values are not specced.
Quoting @bkchr (from [here](#5334 (comment))): > That the hardcoded limit is used there again, is IMO not correct. The `max_pov_size` should be controlled by the `HostConfiguration` and not limited by some "random" constant. This PR aims to change the hard limit to a not-so-random constant, allowing more room for maneuvering in the future.
Does that mean the target is to have it live on Polkadot by year end? Since fully adopting asynchronous backing on Moonbeam, we expanded from 15M gas per block to 60M, but since the PoV is the same, the cost of some transactions that are bound by proof size has quadrupled, so we're hoping to get this change as soon as possible.
yes
I was looking at some metrics for Moonbeam proof size and it doesn't seem to be a bottleneck, see below (source: https://www.polkadot-weigher.com/history)
@sandreim, thanks for providing this. I'll forward it internally. However, the issue is that if there is a block with 60M gas of PoV-heavy transactions, you might see these numbers go up to 100%. Our blocks were 15M gas with a 5 MB PoV. Due to the increased execution time, we bumped it to 60M gas, but for the same 5 MB PoV. Hence, we had to penalize PoV-heavy Ethereum transactions from a gas perspective by bumping the gas estimation 4x. Only those transactions are affected. We estimate gas for execution, storage growth, and PoV, and use the worst case of those estimations.
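The estimation approach described here (charge the worst of execution, storage growth, and proof size) can be sketched as follows; the names and gas numbers are purely illustrative, not Moonbeam's actual implementation:

```rust
// Hypothetical sketch of worst-of-three gas estimation: each transaction
// is priced by whichever dimension (execution, storage growth, PoV) it
// stresses the most.

fn estimate_gas(execution_gas: u64, storage_growth_gas: u64, pov_gas: u64) -> u64 {
    execution_gas.max(storage_growth_gas).max(pov_gas)
}

fn main() {
    // A PoV-heavy transaction: the 4x-penalized PoV term dominates, so
    // only this kind of transaction gets more expensive.
    assert_eq!(estimate_gas(21_000, 10_000, 84_000), 84_000);
    // An execution-heavy transaction is unaffected by the PoV penalty.
    assert_eq!(estimate_gas(500_000, 10_000, 84_000), 500_000);
}
```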
@albertov19 I still think that you should build reclaim-like functionality that also pays back fees.
Currently we run the relay chain with a max PoV size of 5MB on Polkadot/Kusama. We've recently discovered during benchmarking that the storage proof overhead increases significantly with the number of keys in storage, such that parachain throughput drops to 50% with 1 million accounts, for example.
Based on the numbers in #4399, we should be able to double the max PoV size and still require only 50% of the hw spec bandwidth in the worst case.
I'd expect the CPU cost of erasure encoding/decoding to increase by up to 100% at worst; we should determine it using subsystem benchmarks and see how it fits with the upcoming new hw specs.
CC @eskimor @burdges