This issue is meant to track a set of issues related to a high latency observed between processing a sortition and processing the associated Stacks epochs. In watching a node process blocks both during bootup and steady-state for a few weeks, I've reached the following conclusions:
During bootup, nodes can wait a long time to fetch a Stacks block for a burnchain block, because the node will process all sortitions before it attempts to download blocks (see #2944, "[burnchain-download] download burnchain blocks in parallel to processing sortitions").
During steady-state, there are two principal sources of latency: doing a periodic full block inventory sync, and a tendency for blocks to simply propagate slowly.
Nodes will periodically (every 12 hours) synchronize all block inventories with their neighbors. If there are slow nodes, this can take a while. In my analysis of a node's operation for 3,000 blocks, at least 6 blocks took over 10 minutes to arrive once their sortitions were processed, because the node was spending all that time in the block inventory synchronization step.
Nodes rarely push blocks directly to one another. Instead, they send BlocksAvailable and MicroblocksAvailable messages to remote nodes, with the expectation that the remote node will turn around and request the block and microblock data via the HTTP interface. They only push a block or microblock stream directly when they either (1) mine the block, or (2) notice that a neighbor is missing a block or stream and push it over via the anti-entropy protocol. The latency induced by not pushing blocks has a very wide distribution, and can add as much as 120 seconds of delay between when a sortition is processed and when the block is downloaded.
(via @kantai) If the node is mining, it can spend an inordinate amount of time in the RunTenure step, and in doing so starve the ProcessTenure step (especially if there are many RunTenure steps in the pipeline). The node should immediately broadcast a block or microblock it produces at the end of RunTenure, instead of waiting for ProcessTenure.
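To make the RunTenure starvation concrete, here is a toy model of a single-threaded relayer that handles directives in FIFO order. It is only a sketch: the function name `broadcast_delay`, the directive strings, and the timings are all made up for illustration and are not taken from the node's code.

```python
RUN_TENURE = "RunTenure"
PROCESS_TENURE = "ProcessTenure"


def broadcast_delay(queue, run_secs, process_secs, broadcast_at_end_of_run):
    """Seconds until the first mined block is broadcast, or None if never.

    Hypothetical model: directives run one at a time, in order. A block is
    produced by the first RunTenure, and is broadcast either right away
    (broadcast_at_end_of_run=True) or only once a ProcessTenure runs.
    """
    clock = 0.0
    for directive in queue:
        if directive == RUN_TENURE:
            clock += run_secs
            if broadcast_at_end_of_run:
                return clock
        elif directive == PROCESS_TENURE:
            clock += process_secs
            return clock
    return None


# Three mining attempts are queued ahead of the first ProcessTenure.
pipeline = [RUN_TENURE, RUN_TENURE, RUN_TENURE, PROCESS_TENURE]

deferred = broadcast_delay(pipeline, 30.0, 1.0, broadcast_at_end_of_run=False)
immediate = broadcast_delay(pipeline, 30.0, 1.0, broadcast_at_end_of_run=True)
print(deferred, immediate)  # 91.0 30.0
```

With these assumed timings, the deferred broadcast waits behind every queued RunTenure, while the immediate broadcast only pays for the tenure that produced the block.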
The required fixes are as follows:
Remove the full inventory sync feature. The node can get away with doing a single full inventory sync when it boots up, and then in the unlikely event that a block or microblock stream from over 2 reward cycles ago becomes available, the anti-entropy protocol can take care of propagating it. There is no need to delay block downloads by actively searching for missing data in prior reward cycles.
Forward blocks and microblock streams to outbound peers, unconditionally. Send BlocksAvailable / MicroblocksAvailable messages to inbound peers.
If the miner mined a block in this sortition, then immediately try to push the block to any new neighbors that connect and don't have the block.
Before trying to mine, verify that the target parent block is still the chain tip. Drop RunTenure requests for which this is not true.
Immediately broadcast a mined block or microblock once it is produced, rather than deferring the broadcast to a subsequent relayer loop pass.
Do not query a peer's block inventory more than once per reward cycle during initial block download, since this stalls block downloads.
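Two of the fixes above (forwarding by peer direction, and dropping stale RunTenure requests) can be sketched roughly as follows. This is a minimal illustration under assumptions: the types `Peer` and `RunTenureRequest`, the action strings, and both function names are hypothetical and do not correspond to the node's actual API.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    INBOUND = "inbound"
    OUTBOUND = "outbound"


@dataclass(frozen=True)
class Peer:
    name: str
    direction: Direction


@dataclass(frozen=True)
class RunTenureRequest:
    parent_block_hash: str  # the chain tip this mining attempt builds on


def plan_relay(peers):
    """Choose a relay action per peer (hypothetical action names).

    Outbound peers get the block data pushed unconditionally; inbound
    peers only get an availability announcement.
    """
    plan = {}
    for peer in peers:
        if peer.direction is Direction.OUTBOUND:
            plan[peer.name] = "push_block"
        else:
            plan[peer.name] = "send_blocks_available"
    return plan


def drop_stale_tenures(requests, chain_tip_hash):
    """Keep only the requests whose parent is still the chain tip."""
    return [r for r in requests if r.parent_block_hash == chain_tip_hash]


peers = [Peer("alice", Direction.OUTBOUND), Peer("bob", Direction.INBOUND)]
print(plan_relay(peers))

pending = [RunTenureRequest("aa11"), RunTenureRequest("bb22")]
print(drop_stale_tenures(pending, chain_tip_hash="bb22"))
```

Filtering the queue before mining means the relayer never spends a RunTenure pass on a parent that has already been superseded.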
Hi, @jcnelson
I am observing that my node has a significantly higher number of "Invalid block commit: missed target block" errors compared to others, and this problem appears much more often after the 2.05 upgrade.
Could this issue be a reason for that? And what would you recommend to mitigate it?
Hi, @jcnelson
What should be present in the debug logs when the latency issue occurs?
Is it an option to mitigate it by switching to another mining node?
I can see one of the miners is doing this, and he has far fewer invalid block commits than the others.
I appreciate your help.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.