Poor PV performance (missed votes) with v1.12.0 and v1.13.0 #4800
First of all, the drop in performance is small: you are still getting at least A grades, just not A+ as often. With such small margins I would double-check a few times that it is not simply restarts that improve things rather than the version change. As for potential reasons, I don't think it is a problem with those releases, since ~90% of validators are already running them and getting A+ grades. I think it might be a combination of the network or your node being a bit slow, together with something that changed between those versions. Two pull requests touched that area (statement-distribution) between v1.11 and v1.12:
Thanks for helping out with this. I will address the possible issues with our servers and network below.
For versions 1.11.0 and previous, our nodes ran for weeks at a time without restart and achieved very few missed votes. During the "v1.12.0" section of our graph above (snip below), we restarted no less than four times as we moved the node back and forth between our E-2136 and E-2386G servers. Restarts showed no improvement.
To keep network latency to a minimum we use 16-core MikroTik routers capable of handling 100x the traffic we throw at them. Our routers' CPU usage is almost idle.
We thought our E-2136 server might be a bit slow (despite it meeting benchmarks and running v1.11.0 fine), so we switched the DOT node to the E-2386G server, which achieves a score of over 200% on all tests except memory. As discussed above, the faster server was no help.
Can someone check whether parachain block times also take a hit? Maybe some parachains are forking?
Could be it; currently there is a hard limit of 1 concurrent request per peer: #3444 (comment). I would expect that some validators are not upgraded and send more than one request at a time to peers upgraded to v1.12+.
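For context on what such a limit means in practice, here is a toy Rust sketch (not the actual statement-distribution code, and all names are made up) of a per-peer cap of one in-flight request: a new request to a peer is only launched once the previous one has completed.

```rust
use std::collections::HashMap;

// Hypothetical constant mirroring the "1 concurrent request per peer" limit.
const MAX_PARALLEL_REQUESTS_PER_PEER: usize = 1;

#[derive(Default)]
struct RequestTracker {
    // peer id -> number of outstanding requests
    in_flight: HashMap<String, usize>,
}

impl RequestTracker {
    /// Returns true if a new request to `peer` may be launched now.
    fn try_start(&mut self, peer: &str) -> bool {
        let count = self.in_flight.entry(peer.to_string()).or_insert(0);
        if *count < MAX_PARALLEL_REQUESTS_PER_PEER {
            *count += 1;
            true
        } else {
            false
        }
    }

    /// Called when a response (or timeout) arrives, freeing a slot for this peer.
    fn on_response(&mut self, peer: &str) {
        if let Some(count) = self.in_flight.get_mut(peer) {
            *count = count.saturating_sub(1);
        }
    }
}

fn main() {
    let mut tracker = RequestTracker::default();
    assert!(tracker.try_start("peer-a"));
    assert!(!tracker.try_start("peer-a")); // second request to the same peer must wait
    tracker.on_response("peer-a");
    assert!(tracker.try_start("peer-a"));
}
```

A non-upgraded validator that sends several requests at once to an upgraded peer would see everything past the first one queued, which matches the suspicion above.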
Digging a bit more through the data, I concur with your finding; it seems like around session
Doesn't look like it from the metrics, and that makes sense because https://apps.turboflakes.io/?chain=polkadot#/insights gives you the grade based on the relative points you gathered compared with the peers in your group. So in this case, if other peers got slightly more points, it means they backed the candidate and that ended up on chain. Just from the data, it looks like you sometimes lose the race of including your Valid/Backed statements on chain. As of now the most obvious candidate for causing this is #4800 (comment), but this needs more investigation to understand the full conditions.
@aperture-sandi any chance you could provide us with the Grafana values for the following metrics when running v1.12 or v1.13?
Quick update on next steps here:
Looking at the logs from one of our Kusama validators I see this: https://grafana.teleport.parity.io/goto/7QCJDgwIR?orgId=1
And this correlates exactly with the number of missed statements by the validator: https://apps.turboflakes.io/?chain=kusama#/validator/FSkbnjrAgD26m44uJCyvaBvJhGgPbCAifQJYb8yBYz8PZNA?mode=history. It misses 2 statements at session 40001 and 1 statement at 40002. The changes were introduced with #4035, so most likely that's what is causing the behaviour on your validators; I'll dig deeper into why this condition occurs. The good news is that this happens very rarely, and even when it happens the candidate gets backed anyway, so the parachain does not miss any slot; it's just that the validator will not get the backing points for that single parachain block.
OK, I was a bit confused by the logs being in reverse chronological order. This log can appear if the parachain is producing forks. But the resulting number of on-chain backing votes should not be different from prior to #4035. Prior to #4035, different parts of the backing group (let's say half and half) could back different forks. The block author would in the end only pick one of the two (so only half of the votes would end up on chain), but the block author would have both candidates to pick from. After #4035, each half of the validator group would back different forks, but the block author will only ever know one candidate. In the end, the result should be the same (only half of the votes end up on chain). Based on this reasoning, this should not affect backing rewards. However, I think what may be happening is that prior to #4035, each validator in the group would have backed both candidates. That is, even if a validator voted for a candidate, this wouldn't stop it from voting for a competing candidate.
Hi all, I'm out of the office until Tuesday, so I won't be able to get to this until Tuesday or Wednesday, if still needed.
I discussed this further with @alindima offline and we think you are right.
This is almost correct. You would have more than half of the votes, as the validators would have accepted to second the fork, which is what changed with this PR.
A simple fix would be to ramp up the backing threshold to 3 out of 5 on Polkadot. Better, because it always works: deterministic fork selection. Instead of dropping the new import, we compare by CandidateHash and drop the smaller/greater one. If we do this, then please don't forget to add it to the implementer's guide.
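To illustrate the suggestion, here is a minimal Rust sketch (hypothetical types, not the actual polkadot-sdk code) of a deterministic "lower candidate hash wins" rule, so that every validator in the group independently converges on the same fork:

```rust
use std::cmp::Ordering;

// Stand-in for the real CandidateHash type in polkadot-primitives.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct CandidateHash([u8; 32]);

/// Returns the candidate that should win the fork race under a
/// "lower hash wins" rule. Distinct candidates never compare equal.
fn pick_fork(a: CandidateHash, b: CandidateHash) -> CandidateHash {
    match a.cmp(&b) {
        Ordering::Less | Ordering::Equal => a,
        Ordering::Greater => b,
    }
}

fn main() {
    let already_imported = CandidateHash([0xab; 32]);
    let new_import = CandidateHash([0x07; 32]);
    // The already-imported candidate is dropped when the new one has a lower hash.
    assert_eq!(pick_fork(already_imported, new_import), new_import);
}
```

Because the rule depends only on the candidate hashes, all honest validators reach the same choice without extra coordination, and it cannot be easily gamed by the collator.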
Double checked:
How would that fix this situation? If you are on the losing side of the fork, your vote would still not be counted and rewarded.
Yeah, Aura should not fork except from significant gossip delays, wrong clocks, or other consensus-ish issues. Are there equivocations on the parachain? Aura equivocations should look like two blocks with the same slot, which incidentally have the same block producer authority.
At minimum, we should glance at those backing group size charts first, because 2-of-5 chooses 2 < 5/2 for liveness. I'd think 2-of-4 retains liveness, and maybe it reduces this too, but likely the real fix lies elsewhere.
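For reference, a rough sketch of the kind of equivocation check described above (simplified, hypothetical types rather than the real Aura primitives): two distinct blocks claiming the same slot from the same authority.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Slot(u64);

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct AuthorityId(u32);

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct BlockHash([u8; 32]);

struct Header {
    hash: BlockHash,
    slot: Slot,
    author: AuthorityId,
}

/// Two headers are an Aura equivocation if they are distinct blocks
/// produced by the same authority for the same slot.
fn is_aura_equivocation(a: &Header, b: &Header) -> bool {
    a.hash != b.hash && a.slot == b.slot && a.author == b.author
}

fn main() {
    let a = Header { hash: BlockHash([1; 32]), slot: Slot(100), author: AuthorityId(7) };
    let b = Header { hash: BlockHash([2; 32]), slot: Slot(100), author: AuthorityId(7) };
    assert!(is_aura_equivocation(&a, &b));
}
```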
These are the two blocks I'm talking about: [Abandoned, but our validator backs it first] https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Frpc-bridge-hub-kusama.luckyfriday.io#/explorer/query/0x0ee1366da44b3513b8e987edb00727c224fa12f0c842871e38285d4bf4c60c58 Looking at the logs, they don't seem to be produced by the same validator:
Found the root cause of the forks: https://matrix.to/#/!gTDxPnCbETlSLiqWfc:matrix.org/$fO-SigvBAVe1uRY3Kp99yGeUDiHPvfeaL_fCR_tKyoc?via=web3.foundation&via=matrix.org&via=parity.io. It seems that people who run system parachains are running two different instances of a collator with the same authority keys, so when it is the turn of their authority they end up proposing two blocks, which end up racing on the relay chain to get the necessary backing votes. So, if a validator is backing the losing fork it will not get the rewards, for the reasons explained here by @alindima: #4800 (comment). My suggestion for the path forward here would be:
We should start slashing them. Initially I didn't add slashing for equivocations, because there was no way to determine the relay chain block associated with a parachain block. However, now we can do this and we should enable it. Generally the idea is that there is one block per slot from each author.
Sounds complicated to me, as a validator could start signing any kind of blob and pretend they voted for something.
They are literally running two instances with the same keys because they think it is good practice. I will tell them to stop this practice for system parachains, since it does more harm than help.
+1, creating forks will impact parachain block times, so there is a good incentive for parachains to slash their collators if they do it. On the other hand, theoretically you could even have proof-of-work parachains or some custom block production consensus that creates forks. In such a scenario, if your validator is assigned to such a parachain it will get fewer rewards than before, so the problem is not solved in the general case. With the current behaviour of never backing a fork we actually save some CPU that could be used for more useful work. Of course, we trade off some liveness when a fork gets backed with only 2 backers that have bandwidth issues or go offline. IMO the solution seems to be changing prospective-parachains to accept forks. This is limited to just ensuring we have as many backers as possible; the forks would never be taken into consideration to form a chain, so we keep the current behaviour and make validators happy again.
We are backing forks? The point being that you may only back a fork that doesn't get on chain and thus you don't get any rewards, as explained by @alexggh above. Generally I would also say that this is expected; over time the validator will get its normalized rewards. Also, don't we already give more era points for approval voting than for backing? So backing a fork should have even less impact.
@alexggh You've different authority IDs there? It's harmless for Polkadot if the fork gets backed, just some lost rewards for the backer. We do not want collators equivocating of course, but we should first ask them to stop. If we then need non-slashing technical measures, we could break out my CRDT trick for authority key registration, aka only one registered on-chain at a time. We'd then only have real dishonest equivocations from collators, for which parachains would likely eject them. Also, we're not even paying the approval checkers yet, so our backing rewards look really exaggerated right now. Just naively, validator rewards should break down roughly like 70% approval checks, 8% availability, 7% backing, and 15% BABE & GRANDPA. We're missing like 78% of that allocation, so all current rewards shall shrink by a factor of 5 once we account for all this critical work, and then people should care less if they back the wrong thing occasionally.
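A quick back-of-the-envelope check of that split (the percentages are the rough targets stated above, not an implemented reward scheme):

```rust
fn main() {
    // Assumed target shares of validator rewards, per the comment above.
    let approval = 0.70;
    let availability = 0.08;
    let backing = 0.07;
    let babe_grandpa = 0.15;
    assert!((approval + availability + backing + babe_grandpa - 1.0).abs() < 1e-9);

    // Approval checking and availability are not rewarded yet: ~78% of the target split.
    let not_yet_rewarded = approval + availability;
    // The work rewarded today would therefore keep only ~22% of the pie,
    // i.e. current rewards shrink roughly five-fold once everything is paid.
    let remaining_share = 1.0 - not_yet_rewarded;
    println!("shrink factor ≈ {:.1}", 1.0 / remaining_share); // ≈ 4.5, "roughly a factor of 5"
}
```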
Great work @alexggh! Indeed, the fork choice already happens before a candidate is fully backed (actually not good), so 3 out of 5 does not help here. Picking by comparing candidate hashes should do the trick though.
Yeah, that was a red herring; those weren't the authority IDs, they are some keys that are auto-generated at startup for each instance. That's why I thought these are different collators, but they aren't.
Yes, I think that is what would happen if the forks were more or less random, but in this case it looks like the same set of validators find themselves more often on the losing side. I would assume that's because the fork is deterministic and the same validators are losing because of their latency to the collator.
We don't have approval voting rewards implemented yet, but yeah, once we reduce the backing proportion in rewards this should actually be a non-issue; even currently the difference in points is just around 1%.
Asked the system parachain collators to stop running two instances with the same authority keys here: https://matrix.to/#/!gTDxPnCbETlSLiqWfc:matrix.org/$nP852NAScRgGukKtdNHoDEISH_XaqnfgXX5MoSPZy1Q?via=web3.foundation&via=matrix.org&via=parity.io. I'll keep an eye on the dashboards to confirm whether there is any impact.
I've discussed this issue more with @alindima offline... I've discussed this further with @eskimor and @sandreim and we came up with the following solution that we think would work. In #4035, I removed support for parachain forks in prospective parachains, but added support for unconnected candidates (candidates for which we don't yet know the parent), so that chained candidates can be validated in parallel by different backing groups, which is needed for elastic scaling. The proposal is to have prospective-parachains only build the chain with candidates that have been backed. This way, a single validator that seconds an invalid candidate would not prevent a legitimate fork from being backed. The unconnected candidate storage could hold forks, but the actual chain prospective-parachains builds would not allow for forks (which was the source of great complexity before #4035). TL;DR: prospective-parachains will again allow for forks between candidates that are not backed, but will not track them once the candidates are backed. Validators will use a deterministic fork selection rule during backing. This is definitely not a trivial fix that can be done in a couple of days though.
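A very rough sketch of that proposal, using simplified hypothetical types (not the real prospective-parachains data structures): the chain itself only grows with backed candidates, while the potential/unconnected storage may temporarily hold forks and drops the losers once one fork is backed.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct CandidateHash(u64);

#[derive(Clone, Debug)]
struct Candidate {
    hash: CandidateHash,
    parent: CandidateHash,
    backed: bool,
}

#[derive(Default)]
struct FragmentChainSketch {
    /// The actual chain: only backed candidates, no forks.
    chain: Vec<Candidate>,
    /// Seconded-but-not-yet-backed candidates; forks are allowed here.
    potential: HashMap<CandidateHash, Candidate>,
}

impl FragmentChainSketch {
    fn add_seconded(&mut self, c: Candidate) {
        self.potential.insert(c.hash, c);
    }

    /// When a candidate reaches the backing quorum, append it to the chain if it
    /// extends the current head, and drop competing forks of the same parent.
    fn on_backed(&mut self, hash: CandidateHash) {
        if let Some(mut c) = self.potential.remove(&hash) {
            c.backed = true;
            let extends_head = self
                .chain
                .last()
                .map_or(true, |head| c.parent == head.hash);
            if extends_head {
                let backed_parent = c.parent;
                self.chain.push(c);
                // Once one fork is backed, siblings of the backed candidate are no longer accepted.
                self.potential.retain(|_, p| p.parent != backed_parent);
            }
        }
    }
}

fn main() {
    let mut fc = FragmentChainSketch::default();
    let parent = CandidateHash(0);
    fc.add_seconded(Candidate { hash: CandidateHash(1), parent, backed: false });
    fc.add_seconded(Candidate { hash: CandidateHash(2), parent, backed: false }); // a fork
    fc.on_backed(CandidateHash(1));
    assert_eq!(fc.chain.len(), 1);
    assert!(fc.potential.is_empty()); // the competing fork was dropped
}
```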
Can you explain what you mean by this?
I mean that if there is a fork (two different candidates are seconded, having the same parent), the other validators in the backing group will all choose to validate the same candidate, so that they all have the highest chance of getting their backing reward. The block author will also pick the same candidate of the two to back on-chain. This rule could be that we favour the candidate with the lower candidate hash. This is something random that can't be easily manipulated.
Actually, now that I've started prototyping this, I'm not sure the comparison is needed during backing, as long as we allow both forks to be backed. It's up to the block author to pick one, and both of them could have the same number of votes. It'd only be more efficient to give up validation of a candidate that we could reasonably know will not be picked by the block author, but that adds a whole lot of complexity for little benefit, the more I think about it.
Yeah, this was actually the case before the changes you made in #4035. The general idea of choosing the next block to back based on the lower hash sounds reasonable to me. There could be N (number of validators in the backing group) forks, and letting everyone decide in the same way to ensure that one of them gets the required number of votes sounds like a good idea. But then we would maybe need to stop fetching a PoV when we see some smaller hash, or whatever. So, it adds complexity again. Maybe something for when we are bored :P
IIUC this means we need to go back to tracking trees rather than chains + unconnected candidates. The idea with the comparison is to use it so that only the chosen fork makes it to the block author (unless the block author is himself in that group). All other forks remain local to the backing group (cluster topology).
This was always the idea. We've no control over the order in which backers & block authors receive the candidates & statements, and thus cannot reward every backed candidate, because establishing that control costs too much CPU time & bandwidth. That's fine.
Resolves #4800

# Problem

In #4035, we removed support for parachain forks and cycles and added support for backing unconnected candidates (candidates for which we don't yet know the full path to the latest included block), which is useful for elastic scaling (parachains using multiple cores).

Removing support for backing forks turned out to be a bad idea, as there are legitimate cases for a parachain to fork (if it uses another consensus mechanism, for example, like BABE or PoW). This leads to validators getting lower backing rewards (depending on whether they back the winning fork or not) and higher pressure on only half of the backing group (during availability-distribution, for example). Since we don't yet have approval voting rewards, backing rewards are a pretty big deal (which may change in the future).

# Description

A backing group is now allowed to back forks. Once a candidate becomes backed (has the minimum backing votes), we don't accept new forks unless they adhere to the new fork selection rule (have a lower candidate hash). This helps keep the implementation simpler, since forks are only taken into account for candidates which are not backed yet (only seconded). Having this fork selection rule also helps reduce the work backing validators need to do, since they have a shared way of picking the winning fork. Once they see a candidate backed, they can all decide to back a fork and not accept new ones. But they still accept new ones during the seconding phase (until the backing quorum is reached). Therefore, a block author which is not part of the backing group will likely not even see the forks (only the winning one).

Just as before, a parachain producing forks will still not be able to leverage elastic scaling but will still work with a single core. Also, cycles are still not accepted.

## Some implementation details

`CandidateStorage` is no longer a subsystem-wide construct. It previously held candidates from all relay chain forks and complicated the code. Each fragment chain now holds its own candidate chain and its potential candidates. This should not increase storage consumption, since the heavy candidate data is already wrapped in an `Arc` and shared. It does, however, allow for great simplifications and increases readability.

`FragmentChain`s now only build a chain with backed candidates and the fork selection rule. As said before, `FragmentChain`s are now also responsible for maintaining their own potential candidate storage.

Since we no longer have the subsystem-wide `CandidateStorage`, when getting a new leaf update we use the storage of our latest ancestor, which may contain seconded/backed candidates that are still in scope.

When a candidate is backed, the fragment chains which hold it are recreated (due to the fork selection rule, it could trigger a "reorg" of the fragment chain).

I generally tried to simplify the subsystem and not introduce unnecessary optimisations that would otherwise complicate the code and not gain us much (fragment chains wouldn't realistically ever hold many candidates).

TODO:
- [x] update metrics
- [x] update docs and comments
- [x] fix and add unit tests
- [x] tested with fork-producing parachain
- [x] tested with cycle-producing parachain
- [x] versi test
- [x] prdoc
Description of bug
EDITED to add...
Further research seems to indicate an overall increase in missed votes for a significant percentage of validators.
See comment below: #4800 (comment)
Poor PV performance (missed votes) with v1.12.0 and v1.13.0
We have been seeing poor PV performance in the form of missed votes for v1.12.0 and v1.13.0
For our DOT node, on versions v1.11.0 and earlier, our missed vote ratio (MVR) was ~0.3%.
For our DOT node, on versions v1.12.0 and newer, our missed vote ratio (MVR) is ~1.2%.
To confirm v1.12.0+ was the issue we reverted to v1.11.0 for 8+ sessions and saw our MVR return to previous low values.
We have tested this on Xeon(R) E-2136 CPU @ 3.30GHz and Xeon(R) E-2386G CPU @ 3.50GHz bare metal servers running NVME drives.
*NVMe drives are connected to PCIe via an add-on card for direct PCIe access
*Hyperthreading is turned off
(See attached benchmarks)
We compile with RUSTFLAGS="-C target-cpu=native" and --profile production:
RUSTFLAGS="-C target-cpu=native" ~/.cargo/bin/cargo install --force --locked --path . --profile production
Rust Version:
Ubuntu 22.04.3 LTS
See attached image of MVR via TurboFlakes ONE-T report.
Live Report here: https://apps.turboflakes.io/?chain=polkadot#/validator/5F99nAGjKA9cU8JTpuTDWDhqpMxwxmwc8M1ojkKXZAR7Mn2c?mode=history
How can we help to narrow down the cause of this performance issue?
Steps to reproduce
To confirm v1.12.0+ was the issue, we switched back to v1.11.0 for 8+ sessions and saw our MVR return to previous low values.
We have tested this on Xeon(R) E-2136 CPU @ 3.30GHz and Xeon(R) E-2386G CPU @ 3.50GHz bare metal servers running NVME drives.