Synchronization stopped with ErrorPolicySuspendConsumer error. #4405
Comments
This is one of the relays I have.
This block reports a negative TIP.
Just had a brief look at this with a few other engineers on a call we were on. From the Consensus perspective, a 5-minute gap is not in itself cause for alarm; it will happen at random. We (Consensus) would be worried if:
I haven't looked too closely at the log messages you shared. It could be that they specifically suggest some unexpected behavior. Or it could be that the nodes are timing out as planned because the leader schedule was unusually empty (we have semi-random timeouts, and sometimes a large gap in the leader schedule is enough to ensure they are triggered). I did filter your log by
Does that seem like a complete explanation?
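As a rough illustration of how semi-random timeouts interact with a gap in the leader schedule, the Python sketch below assumes that each upstream peer's timeout is drawn uniformly at random from a fixed window, independently per peer. The window bounds `LO_S = 135` and `HI_S = 269` seconds, the uniform distribution, and all helper names are assumptions for illustration only; the real values and mechanism live in the consensus code, not here.

```python
# Sketch: probability that a randomized per-peer timeout fires during a gap
# with no blocks. ASSUMPTIONS (hypothetical, not quoted from the consensus code):
# each peer's timeout is drawn uniformly from [LO_S, HI_S] seconds, independently.
LO_S = 135.0  # assumed lower bound of the randomized timeout, in seconds
HI_S = 269.0  # assumed upper bound of the randomized timeout, in seconds


def timeout_fire_probability(gap_s: float, lo: float = LO_S, hi: float = HI_S) -> float:
    """Probability that one peer's timeout expires before a gap of gap_s seconds ends."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    if gap_s <= lo:
        return 0.0  # gap ends before any possible timeout value
    if gap_s >= hi:
        return 1.0  # gap outlasts every possible timeout value
    return (gap_s - lo) / (hi - lo)  # CDF of the uniform distribution


def expected_timeouts(gap_s: float, n_peers: int) -> float:
    """Expected number of upstream peers whose timeout fires during the gap."""
    return n_peers * timeout_fire_probability(gap_s)


def all_time_out(gap_s: float, n_peers: int) -> float:
    """Probability that all n_peers independent timeouts fire during the gap."""
    return timeout_fire_probability(gap_s) ** n_peers


if __name__ == "__main__":
    # 255 s is the spacing between the 13:40:57 and 13:45:12 entries
    # in the relay log summary further down this thread.
    gap = 255.0
    print(f"p(timeout per peer)              = {timeout_fire_probability(gap):.3f}")
    print(f"expected timeouts among 20 peers = {expected_timeouts(gap, 20):.1f}")
```

Randomizing the timeout, rather than using one fixed value, means a long empty stretch tends to drop some connections rather than all of them at the same instant; under the assumed window, a 255 s gap would make roughly nine in ten peers time out.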
"slot": 70300567 (028df8ce653ec0fcd2cf44fbc37ca79b5bfbca51c038305aebefb528bc4aeb7e)
<During this time, connections on many relay nodes went wrong.>
"slot": 70300592 (My Friend Pool)
My friend's pool was scheduled as slot leader after slot 70300567. However, it failed to receive the previous block correctly and failed to produce a block, perhaps because a negative TIP was reported during propagation of that previous block.
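To relate slot numbers such as 70300567 and 70300592 to the log timestamps in this thread, and to show what a negative delay would mean, the sketch below converts a post-Shelley mainnet slot to UTC. It assumes 1-second slots and that Shelley's first slot, 4492800, began at 2020-07-29T21:44:51Z. `slot_to_utc` and `block_delay` are hypothetical helpers, and reading a "negative TIP" as a negative block delay is an interpretation, not something stated in the thread.

```python
from datetime import datetime, timedelta, timezone

# ASSUMPTIONS: Cardano mainnet, Shelley era or later, 1-second slots,
# with Shelley's first slot (4492800) starting at 2020-07-29T21:44:51Z.
SHELLEY_START_SLOT = 4_492_800
SHELLEY_START_TIME = datetime(2020, 7, 29, 21, 44, 51, tzinfo=timezone.utc)
SLOT_LENGTH = timedelta(seconds=1)


def slot_to_utc(slot: int) -> datetime:
    """Wall-clock start time of a post-Shelley mainnet slot."""
    if slot < SHELLEY_START_SLOT:
        raise ValueError("Byron-era slots (20 s long) are not handled here")
    return SHELLEY_START_TIME + (slot - SHELLEY_START_SLOT) * SLOT_LENGTH


def block_delay(slot: int, seen_at: datetime) -> float:
    """Seconds between a slot's start and the moment a node saw its block.

    A negative value means the block arrived before its slot had started
    according to the observing node's clock (for example, clock skew on
    the producing node or on the observer).
    """
    return (seen_at - slot_to_utc(slot)).total_seconds()


if __name__ == "__main__":
    for s in (70300567, 70300592):
        print(s, slot_to_utc(s).isoformat())
```

Under these assumptions, slot 70300567 starts at 2022-08-30 13:40:58 UTC and slot 70300592 at 13:41:23 UTC, which lines up with the 13:41 to 13:45 window described in the issue summary.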
Passed on to the networking team
Not sure if this is correct, because I have only one 1.34.1 node left. Here is how the same period looked on my 1.35.3 relays and on my 1.34.1 node. The drop and re-establishment of TCP connections may have been caused by a long period without blocks, or by the node dropping connections to the peers that fed it the bad (early) block?
Here's a summary of what I saw in my relays' logs:
2022-08-30T13:40:43
2022-08-30T13:40:55
2022-08-30T13:40:57
2022-08-30T13:45:12: here it seems SCAR's 028df8... was adopted in the meantime (positive blockDelay) and LEO2's 0fb740... was built on SCAR's block.
(reproduced from comment IntersectMBO/ouroboros-network#3984 (comment) on #3984)
There is nothing particularly suspicious or unexpected in a gap of this size. The statistical properties of block leadership produce a negative-exponential inter-arrival time for blocks. A 5-minute gap between blocks has (approximately) 1 in 10^
We would expect such gaps
A node cannot tell the difference between a 'stalled' network connection and this situation. This observation led to the design decision captured in https://github.com/input-output-hk/ouroboros-network/blob/a0e1c2ba64ad479fdd8e0d50d81afa753d76cd00/ouroboros-consensus/src/Ouroboros/Consensus/Node.hs?plain=1#L660
As illustrated above, the change in peer connections is expected (a node cannot tell the difference between a non-responding peer and this situation). The peer selection mechanism seeks out new nodes to connect to as existing ones time out, aiming to maintain the configured number of peers at each 'temperature'.
To me, this looks like a beautiful illustration that the design is working as intended; we've never had this level of evidence before. Thank you.
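The negative-exponential claim above can be made concrete with a small model. The sketch below assumes the mainnet active slot coefficient f = 0.05, 1-second slots, and that all stake participates, so each slot independently has at least one leader with probability f. Under that model, runs of empty slots are geometric (the discrete form of the negative-exponential), and the continuous approximation uses rate f per slot. The function names are hypothetical, and this is a simplified model, not the analysis behind the figure quoted in the comment.

```python
import math

# ASSUMPTIONS (labelled, not taken from the comment): active slot coefficient
# f = 0.05, 1-second slots, all stake participating, so each slot independently
# has at least one slot leader with probability f.
F = 0.05


def p_empty_slots(n_slots: int, f: float = F) -> float:
    """Exact probability that n_slots consecutive slots have no leader.

    Slots are independent trials, so this is the geometric tail (1 - f) ** n.
    """
    return (1.0 - f) ** n_slots


def p_gap_exponential(n_slots: float, f: float = F) -> float:
    """Negative-exponential approximation with rate f blocks per slot."""
    return math.exp(-f * n_slots)


def mean_gap_slots(f: float = F) -> float:
    """Mean number of slots from one block to the next under this model."""
    return 1.0 / f


if __name__ == "__main__":
    gap = 5 * 60  # a 5-minute gap is 300 one-second slots
    print(f"mean gap           : {mean_gap_slots():.0f} slots")
    print(f"exact  P(empty {gap}) : {p_empty_slots(gap):.2e}")
    print(f"approx P(empty {gap}) : {p_gap_exponential(gap):.2e}")
```

The exponential form with rate f slightly overstates the tail, because -ln(1 - f) is a little larger than f; both forms agree that a 5-minute gap is rare per block but will still occur over the many blocks produced on mainnet.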
Internal/External
External
Area
Cardano-node
Mainnet
Summary
2022-08-30, 13:41 to 13:45 UTC
It appears that the relay was unable to fetch blocks during this period and the chain stopped advancing.
A large number of "ErrorPolicySuspendConsumer" errors were reported by the relay node at that time.
No blocks produced during this period are present in the chain.
Steps to reproduce
Synchronization appears to have stopped.
No block production can be confirmed for this period.
Many "ErrorPolicySuspendConsumer" errors were logged.
Could this be a problem with block propagation or topology connections?
System info
OS Name: Ubuntu
OS Version e.g. 20.04
Node version (output of cardano-node --version):
cardano-node 1.35.3 - linux-x86_64 - ghc-8.10
git rev 950c4e2
CLI version (output of cardano-cli --version):
cardano-cli 1.35.3 - linux-x86_64 - ghc-8.10
git rev 950c4e2