[net] high latency when fetching blocks in certain cases #2986

jcnelson · 2022-01-07T18:57:41Z

This issue tracks a set of problems related to the high latency observed between processing a sortition and processing the associated Stacks epochs. After watching a node process blocks during both bootup and steady-state for a few weeks, I've reached the following conclusions:

During bootup, nodes can wait a long time to fetch a Stacks block for a burnchain block, because the node will process all sortitions before it attempts to download blocks (see [burnchain-download] download burnchain blocks in parallel to processing sortitions #2944).
During steady-state, there are two principal sources of latency: doing a periodic full block inventory sync, and a tendency for blocks to simply propagate slowly.
- Nodes will periodically (every 12 hours) synchronize all block inventories with their neighbors. If there are slow nodes, this can take a while. In my analysis of a node's operation for 3,000 blocks, at least 6 blocks took over 10 minutes to arrive once their sortitions were processed, because the node was spending all that time in the block inventory synchronization step.
- Nodes rarely push blocks directly to one another. Instead, they send BlocksAvailable and MicroblocksAvailable messages to remote nodes, with the expectation that the remote node will turn around and request the block and microblock data via the HTTP interface. The only times they push a block or microblock stream directly are when they either (1) mine the block, or (2) notice that a neighbor is missing a block or stream and push it over via the anti-entropy protocol. The latency induced by not pushing blocks has a very wide distribution, and can add as much as 120 seconds of delay between when a sortition is processed and when the block is downloaded.
(via @kantai) If the node is mining, it can spend an inordinate amount of time in the RunTenure step and, in doing so, starve itself of time to run the ProcessTenure step (especially if there are many RunTenure steps in the pipeline). The node should immediately broadcast a block or microblock it produces at the end of RunTenure, instead of waiting for ProcessTenure.
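
To illustrate the mining case, here is a minimal sketch of a single-threaded relayer loop that handles queued directives in order. The directive names RunTenure and ProcessTenure come from this issue; the queue model, the fixed per-step costs, and the function name `broadcast_times` are assumptions made for illustration, not the node's actual code.

```python
from collections import deque


def broadcast_times(directives, run_cost, process_cost, immediate):
    """Return the simulated time at which each mined block is first broadcast.

    directives: ordered list of (kind, block_id) pairs, where kind is
                "RunTenure" or "ProcessTenure" (hypothetical queue model).
    run_cost / process_cost: assumed time units spent in each step.
    immediate: if True, broadcast at the end of RunTenure;
               if False, wait for the matching ProcessTenure.
    """
    clock = 0
    sent = {}
    queue = deque(directives)
    while queue:
        kind, block_id = queue.popleft()
        if kind == "RunTenure":
            clock += run_cost
            if immediate:
                sent.setdefault(block_id, clock)
        elif kind == "ProcessTenure":
            clock += process_cost
            if not immediate:
                sent.setdefault(block_id, clock)
        else:
            raise ValueError(f"unknown directive: {kind}")
    return sent


# Three RunTenure requests pile up ahead of the ProcessTenure for block "a".
pipeline = [
    ("RunTenure", "a"),
    ("RunTenure", "b"),
    ("RunTenure", "c"),
    ("ProcessTenure", "a"),
]
deferred = broadcast_times(pipeline, run_cost=30, process_cost=1, immediate=False)
eager = broadcast_times(pipeline, run_cost=30, process_cost=1, immediate=True)
```

With these made-up costs, block "a" goes out at time 91 when the broadcast waits for ProcessTenure, and at time 30 when it happens at the end of RunTenure. The numbers are illustrative only; the point is that every queued RunTenure adds its full cost to the deferred case.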

The required fixes are as follows:

Address [burnchain-download] download burnchain blocks in parallel to processing sortitions #2944
Remove the full inventory sync feature. The node can get away with doing a single full inventory sync when it boots up, and then in the unlikely event that a block or microblock stream from over 2 reward cycles ago becomes available, the anti-entropy protocol can take care of propagating it. There is no need to delay block downloads by actively searching for missing data in prior reward cycles.
Forward blocks and microblock streams to outbound peers, unconditionally. Send BlocksAvailable / MicroblocksAvailable messages to inbound peers.
If the miner mined a block in this sortition, then immediately try to push the block to any new neighbors that connect and don't have the block.
Before trying to mine, verify that the target parent block is still the chain tip. Drop RunTenure requests for which this is not true.
Immediately broadcast a mined block or microblock once it is produced; don't defer this to a subsequent relayer loop pass.
Do not query a peer's block inventory more than once per reward cycle during initial block download, since this stalls block downloads.
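
Two of the fixes above lend themselves to a short sketch: choosing how to relay based on peer direction, and discarding RunTenure requests whose parent is no longer the chain tip. The message names and RunTenure come from this issue; the `Peer` and `RunTenureRequest` types, their fields, and the function names are hypothetical and do not reflect the node's real data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    name: str
    outbound: bool  # True if this node initiated the connection (assumed field)


@dataclass(frozen=True)
class RunTenureRequest:
    parent_block: str  # block the miner intends to build on (assumed field)


def plan_relay(peers):
    """Split peers into those that get the data pushed and those that only
    get a BlocksAvailable / MicroblocksAvailable announcement."""
    push = [p.name for p in peers if p.outbound]
    announce = [p.name for p in peers if not p.outbound]
    return push, announce


def drop_stale(requests, chain_tip):
    """Keep only the requests whose target parent is still the chain tip."""
    return [r for r in requests if r.parent_block == chain_tip]


peers = [Peer("n1", True), Peer("n2", False), Peer("n3", True)]
push, announce = plan_relay(peers)

pending = [
    RunTenureRequest("0xaa"),
    RunTenureRequest("0xbb"),
    RunTenureRequest("0xbb"),
]
live = drop_stale(pending, chain_tip="0xbb")
```

Here `push` holds the outbound peers that would receive the block or microblock stream directly, `announce` holds the inbound peers that would only be told the data is available, and `live` holds the RunTenure requests still worth mining.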

MaksimalistT · 2022-01-11T11:20:12Z

Hi, @jcnelson
I am observing that my node has a significantly higher number of "Invalid block commit: missed target block" errors compared to others, and this problem appears much more often after the 2.05 upgrade.
Could this issue be a reason for that? And what would you recommend to mitigate it?

MaksimalistT · 2022-01-20T13:37:04Z

Hi, @jcnelson
What should be present in the debug logs when the latency issue occurs?
Is it an option to mitigate it by switching to another mining node?
I can see one of the miners is doing that, and he has far fewer invalid block commits than the others.
I appreciate your help.

stale · 2023-01-22T14:34:55Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

jcnelson · 2023-02-22T03:07:40Z

All of these points have been addressed now for some time in 2.05.0.6.0.

jcnelson self-assigned this Jan 7, 2022

jcnelson mentioned this issue Jan 9, 2022

Minimize time between sortitions and block-acceptance #2989

Merged

stale bot added the stale label Jan 22, 2023

jcnelson closed this as completed Feb 22, 2023
