[net] high latency when fetching blocks in certain cases #2986

Closed
6 of 7 tasks
jcnelson opened this issue Jan 7, 2022 · 4 comments · Fixed by #2989

jcnelson (Member) commented Jan 7, 2022

This issue tracks a set of related problems that cause high latency between processing a sortition and processing the associated Stacks epochs. After watching a node process blocks during both bootup and steady-state operation for a few weeks, I've reached the following conclusions:

  • During bootup, nodes can wait a long time to fetch a Stacks block for a burnchain block, because the node will process all sortitions before it attempts to download blocks ([burnchain-download] download burnchain blocks in parallel to processing sortitions #2944)

  • During steady-state, there are two principal sources of latency: doing a periodic full block inventory sync, and a tendency for blocks to simply propagate slowly.

    • Nodes periodically (every 12 hours) synchronize all block inventories with their neighbors. If some neighbors are slow, this step can take a long time. In my analysis of a node's operation over 3,000 blocks, at least 6 blocks took over 10 minutes to arrive after their sortitions were processed, because the node was spending all that time in the block inventory synchronization step.

    • Nodes rarely push blocks directly to one another. Instead, they send BlocksAvailable and MicroblocksAvailable messages to remote nodes, with the expectation that each remote node will turn around and request the block and microblock data via the HTTP interface. The only times a node pushes a block or microblock stream directly are when it either (1) mined the block, or (2) notices that a neighbor is missing a block or stream and pushes it via the anti-entropy protocol. The latency induced by not pushing blocks has a very wide distribution, and can add as much as 120 seconds of delay between when a sortition is processed and when the block is downloaded.

  • (via @kantai) If the node is mining, it can spend an inordinate amount of time in the RunTenure step, and in doing so will starve the ProcessTenure step (especially if there are many RunTenure steps in the pipeline). The node should broadcast a block or microblock it produces immediately at the end of RunTenure, instead of waiting for ProcessTenure.
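As a rough illustration of the relay behavior described above, the Rust sketch below models when a node pushes block data versus only announcing it. The names (`RelaySource`, `RelayAction`, `relay_action`, `network_legs`) are hypothetical and do not come from the stacks-blockchain codebase. The leg counts only capture the message sequence described in this issue; they are not measured latencies.

```rust
/// Why this node is relaying a block or microblock stream (hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RelaySource {
    /// This node mined the block itself.
    MinedLocally,
    /// Anti-entropy noticed that a neighbor is missing the block or stream.
    AntiEntropy,
    /// The block arrived from another peer and is simply being passed along.
    Forwarded,
}

/// What the node sends to a neighbor (hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RelayAction {
    /// Send the block or microblock data directly.
    Push,
    /// Send only a BlocksAvailable / MicroblocksAvailable announcement.
    Announce,
}

/// Behavior as described in this issue: data is pushed only when the node
/// mined it or anti-entropy detects a gap; otherwise it is only announced.
fn relay_action(source: RelaySource) -> RelayAction {
    match source {
        RelaySource::MinedLocally | RelaySource::AntiEntropy => RelayAction::Push,
        RelaySource::Forwarded => RelayAction::Announce,
    }
}

/// One-way network legs before the neighbor holds the data: a push is one
/// leg, while an announcement is followed by the neighbor's HTTP request and
/// our HTTP response. Any time the neighbor waits before issuing that request
/// is not modeled here.
fn network_legs(action: RelayAction) -> u32 {
    match action {
        RelayAction::Push => 1,
        RelayAction::Announce => 3,
    }
}

fn main() {
    for source in [RelaySource::MinedLocally, RelaySource::AntiEntropy, RelaySource::Forwarded] {
        let action = relay_action(source);
        println!("{:?} -> {:?} ({} legs)", source, action, network_legs(action));
    }
}
```

In this model, a block that a node merely forwards always takes the three-leg announce-then-fetch path, which is the path the fixes below try to shorten.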

The required fixes are as follows:

  • Address [burnchain-download] download burnchain blocks in parallel to processing sortitions #2944
  • Remove the full inventory sync feature. The node can get away with doing a single full inventory sync when it boots up; then, in the unlikely event that a block or microblock stream from over 2 reward cycles ago becomes available, the anti-entropy protocol can take care of propagating it. There is no need to delay block downloads by actively searching for missing data in prior reward cycles.
  • Forward blocks and microblock streams to outbound peers, unconditionally. Send BlocksAvailable / MicroblocksAvailable messages to inbound peers.
  • If the miner mined a block in this sortition, then immediately try to push the block to any new neighbors that connect and don't have the block.
  • Before trying to mine, verify that the target parent block is still the chain tip. Drop RunTenure requests for which this is not true.
  • Immediately broadcast a mined block or microblock once it is produced; don't do so in a subsequent relayer loop pass.
  • Do not query a peer's block inventory more than once per reward cycle during initial block download, since this stalls block downloads.
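Several of the fixes listed above reduce to small policy decisions. The Rust sketch below is a hypothetical illustration, not the code merged in #2989: `PeerDirection`, `relay_to`, `push_to_new_neighbor`, `should_start_tenure`, and `InvThrottle` are invented names, block identifiers are modeled as plain strings, and peers as `u64` IDs. It also assumes that "once per reward cycle" means at most one inventory query per (peer, reward cycle) pair.

```rust
use std::collections::HashSet;

/// Whether this node dialed the peer (outbound) or the peer dialed us (inbound).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PeerDirection {
    Outbound,
    Inbound,
}

/// What to send a neighbor for a new block or microblock stream.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RelayAction {
    /// Send the data itself.
    Push,
    /// Send only a BlocksAvailable / MicroblocksAvailable message.
    Announce,
}

/// Forward data to outbound peers unconditionally; announce it to inbound peers.
fn relay_to(direction: PeerDirection) -> RelayAction {
    match direction {
        PeerDirection::Outbound => RelayAction::Push,
        PeerDirection::Inbound => RelayAction::Announce,
    }
}

/// If we mined a block in this sortition, push it to a newly connected
/// neighbor that does not have it yet.
fn push_to_new_neighbor(mined_this_sortition: bool, neighbor_has_block: bool) -> bool {
    mined_this_sortition && !neighbor_has_block
}

/// Only start a tenure if its target parent is still the chain tip;
/// otherwise the RunTenure request should be dropped.
fn should_start_tenure(target_parent: &str, current_tip: &str) -> bool {
    target_parent == current_tip
}

/// Records which (peer, reward cycle) inventories were already queried.
#[derive(Default)]
struct InvThrottle {
    queried: HashSet<(u64, u64)>,
}

impl InvThrottle {
    /// Returns true if the query may proceed. Outside initial block download
    /// the throttle does not apply; during it, `HashSet::insert` returns false
    /// when this (peer, reward cycle) pair was already recorded.
    fn try_query(&mut self, peer_id: u64, reward_cycle: u64, in_initial_download: bool) -> bool {
        if !in_initial_download {
            return true;
        }
        self.queried.insert((peer_id, reward_cycle))
    }
}

fn main() {
    println!("outbound peer: {:?}", relay_to(PeerDirection::Outbound));
    println!("inbound peer: {:?}", relay_to(PeerDirection::Inbound));
    println!("push to new neighbor: {}", push_to_new_neighbor(true, false));
    println!("start stale tenure: {}", should_start_tenure("parent-a", "parent-b"));

    let mut throttle = InvThrottle::default();
    let first = throttle.try_query(7, 30, true);
    let repeat = throttle.try_query(7, 30, true);
    println!("first query: {}, repeat query: {}", first, repeat);
}
```

In this sketch, a repeated inventory query for the same peer and reward cycle during initial block download is refused, so the download loop does not stall re-scanning inventories it has already seen.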

MaksimalistT commented Jan 11, 2022

Hi @jcnelson,
I'm observing that my node logs significantly more "Invalid block commit: missed target block" messages than other nodes do, and the problem has appeared much more often since the 2.05 upgrade.
Could this issue be the cause? What would you recommend to mitigate it?

MaksimalistT commented:

Hi @jcnelson,
What should appear in the debug logs when this latency issue occurs?
Is switching to a different mining node an option for mitigating it?
I can see that one of the miners is doing this, and he has far fewer invalid block commits than the others.
I'd appreciate your help.

stale bot commented Jan 22, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Jan 22, 2023
jcnelson (Member, Author) commented:

All of these points have been addressed for some time now, in 2.05.0.6.0.
