Fix: snappy downloader #5393

jcnelson · 2024-10-28T20:55:11Z

This fixes a few bugs in the relayer and networking stack:

It removes a convoy effect that can happen when the node is under load. Before, the channel between the p2p thread and relayer thread could grow unbounded if the relayer couldn't keep up with bursts of NetworkResults. In this PR, the p2p thread merges outstanding NetworkResults into a single NetworkResult and drops / consolidates obsolete data, which both minimizes the relayer's total workload and minimizes the time between receiving a data-bearing message and processing it.
It fixes the block downloader so that it detects and deprioritizes unhealthy replicas during block download. As a result, most of the time the node only queries replicas that can serve it data. It also improves error and retry logging in the downloader.
To stress-test the downloader, it adds an option to disable block-push altogether, so the node is forced to download everything.
It fixes an off-by-one error in the p2p stack which was preventing it from caching reward sets. Instead, the p2p stack would always fetch reward sets from disk, which led to performance degradation.
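
The coalescing behavior described in the first item can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `PendingResult`, its fields, `merge`, and `Coalescer` are hypothetical stand-ins for `NetworkResult`, `NetworkResult::update()`, and the p2p-side sender. The only parts taken from the description are the single-slot channel and the idea of merging un-sent results so that at most one is outstanding.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical stand-in for a NetworkResult; only the fields needed here.
#[derive(Debug, Clone, Default, PartialEq)]
struct PendingResult {
    /// Burnchain height seen when the result was produced.
    burn_height: u64,
    /// Identifiers of blocks carried by this result.
    blocks: Vec<u64>,
    /// Identifiers of transactions carried by this result.
    txs: Vec<u64>,
}

impl PendingResult {
    /// Fold a newer result into this one: keep the newest chain view and
    /// skip data that is already present.
    fn merge(mut self, newer: PendingResult) -> PendingResult {
        self.burn_height = self.burn_height.max(newer.burn_height);
        for block in newer.blocks {
            if !self.blocks.contains(&block) {
                self.blocks.push(block);
            }
        }
        for tx in newer.txs {
            if !self.txs.contains(&tx) {
                self.txs.push(tx);
            }
        }
        self
    }
}

/// Producer side (assumed shape): a one-slot channel plus at most one
/// merged result that is waiting for the slot to free up.
struct Coalescer {
    sender: SyncSender<PendingResult>,
    unsent: Option<PendingResult>,
}

impl Coalescer {
    fn new() -> (Coalescer, Receiver<PendingResult>) {
        // One slot: the consumer never has more than one queued result.
        let (sender, receiver) = sync_channel(1);
        (Coalescer { sender, unsent: None }, receiver)
    }

    /// Returns true if the (possibly merged) result was handed off.
    /// If the slot is full, the merged result is kept for the next call.
    fn submit(&mut self, result: PendingResult) -> bool {
        let merged = match self.unsent.take() {
            Some(older) => older.merge(result),
            None => result,
        };
        match self.sender.try_send(merged) {
            Ok(()) => true,
            Err(TrySendError::Full(unsent)) => {
                self.unsent = Some(unsent);
                false
            }
            Err(TrySendError::Disconnected(_)) => false,
        }
    }
}
```

In this sketch the producer never blocks. A burst of results while the consumer is busy collapses into one pending value, so the queue stays bounded regardless of how far the consumer falls behind.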

testnet/stacks-node/src/nakamoto_node/relayer.rs

stackslib/src/net/download/nakamoto/tenure_downloader_set.rs

jcnelson · 2024-11-03T04:57:23Z

There's still something weird happening with this PR. My test node has repeatedly gotten itself stuck at the same block height for hours, with not even so much as an attempt to download missing block data (despite it witnessing the Bitcoin chain advancing). Need to dig more into this.

jcnelson · 2024-11-05T03:15:46Z

Okay, this is now working again. The fix was to disconnect from nodes that served seemingly-stale data via their unconfirmed tenure downloader interface. There's at least one Stacks 2.5 node out there still running, and it was consistently replying to the unconfirmed downloader and inadvertently preventing it from making progress (since the bug caused the downloader to wait forever for the remote peer's stale view to be corrected).
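
A rough sketch of the idea behind this fix, for illustration only: rather than waiting indefinitely for a peer's stale view to catch up, count stale replies per peer and disconnect once a limit is reached. The names `StalePeerTracker`, `record_reply`, `max_stale_replies`, and the height comparison used as the staleness test are all assumptions made for this example. The actual criterion used in the PR is not spelled out in this thread.

```rust
use std::collections::HashMap;

/// What the caller should do with a peer after a reply (hypothetical).
#[derive(Debug, PartialEq)]
enum PeerAction {
    Keep,
    Disconnect,
}

/// Tracks how many consecutive stale replies each peer has served.
struct StalePeerTracker {
    /// Assumed policy knob: stale replies tolerated before disconnecting.
    max_stale_replies: u32,
    stale_counts: HashMap<String, u32>,
}

impl StalePeerTracker {
    fn new(max_stale_replies: u32) -> Self {
        StalePeerTracker {
            max_stale_replies,
            stale_counts: HashMap::new(),
        }
    }

    /// Record one reply. A reply counts as stale here if the peer's
    /// reported height is behind our own (an assumed staleness test).
    fn record_reply(&mut self, peer: &str, peer_height: u64, local_height: u64) -> PeerAction {
        if peer_height >= local_height {
            // Peer is up to date: forget any earlier strikes.
            self.stale_counts.remove(peer);
            return PeerAction::Keep;
        }
        let strikes = {
            let count = self.stale_counts.entry(peer.to_string()).or_insert(0);
            *count += 1;
            *count
        };
        if strikes >= self.max_stale_replies {
            // Stop waiting on this peer instead of stalling forever.
            self.stale_counts.remove(peer);
            PeerAction::Disconnect
        } else {
            PeerAction::Keep
        }
    }
}
```

The point of the sketch is that the downloader always has a bounded wait on any single peer, so one persistently stale node cannot block progress.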

obycode

This LGTM! I just had one minor refactoring request.

stackslib/src/net/download/nakamoto/tenure_downloader_set.rs

stackslib/src/net/download/nakamoto/download_state_machine.rs

kantai

LGTM, just a few comments.

stackslib/src/net/download/nakamoto/tenure_downloader_set.rs

obycode

LGTM

jferrant · 2024-11-05T22:27:02Z

I think this breaks the simple_neon_integration test. I don't see it failing anywhere else (it passes on develop with prom metrics enabled). It seems a change to the prometheus metric in this PR is what breaks it.

jferrant

Will reapprove once the simple_neon_integration test is fixed.

jcnelson requested review from kantai, obycode and jferrant October 28, 2024 20:55

jcnelson requested a review from a team as a code owner October 28, 2024 20:55

fix: drain the relayer channel to alleviate block download pressure

28fb59b

jferrant previously approved these changes Oct 28, 2024

kantai reviewed Oct 28, 2024

testnet/stacks-node/src/nakamoto_node/relayer.rs (outdated, resolved)

jcnelson dismissed jferrant’s stale review via 61d701b October 28, 2024 21:36

jcnelson and others added 20 commits October 28, 2024 17:48

chore: only consider block-bearing network results if in ibd mode or if there's download pressure

61d701b

chore: shed network results when in ibd or with download backpressure so that we only forward results that contain blocks (drop tx and stackerdb messages)

e8cb18f

chore: fix compile issues

be96888

fix: drive main loop wakeups when we're backlogged

c5ec5b3

Merge branch 'develop' into fix/relayer-drain-channel

0d26d50

chore: option to disable block pushes

0685670

Merge branch 'fix/relayer-drain-channel' of https://github.com/stacks-network/stacks-blockchain into fix/relayer-drain-channel

9bc3125

feat: make NetworkResults mergeable

babd3d9

chore: make StackerDBSyncResult Debug and PartialEq

596d41d

chore: test NetworkResult::update()

1fc9d72

chore: remove logic to drain the network result channel (it's not needed), and merge un-sent NetworkResult's in order to keep the queue length bound to at most one outstanding NetworkResult

225ada1

chore: remove dead code

07b65cb

chore: count download attempts, and don't download processed tenures, and clean out completed tenures based on whether or not we see them processed

27e7301

chore: p2p --> relayer channel only needs one slot

d1bf24f

chore: pub(crate) visibility to avoid private leakage

72c9f54

chore: log attempt failures, and only start as many downloaders as given

9361bea

chore: log more downloader diagnostics

c88d0e6

chore: deprioritize unreliable peers

fa493d5

Merge branch 'develop' into fix/relayer-drain-channel

71eccc9

Merge branch 'develop' into fix/relayer-drain-channel

26d8a4d

jcnelson changed the title from "Fix: drain relayer channel" to "Fix: snappy downloader" Nov 1, 2024

jcnelson requested a review from jferrant November 1, 2024 21:13

jferrant reviewed Nov 2, 2024

stackslib/src/net/download/nakamoto/tenure_downloader_set.rs (outdated, resolved)

jferrant reviewed Nov 2, 2024

stackslib/src/net/download/nakamoto/tenure_downloader_set.rs (resolved)

jcnelson added 9 commits November 4, 2024 17:23

chore: API sync

ca3b2ae

fix: use burnchain tip reward cycle to infer whether or not to sync to current or next reward cycle

06108f2

chore: API sync

3369de5

chore: store burnchain DB handle in p2p network and load burnchain height

0e1058e

chore: API sync, and test fixes

ad4faaf

chore: API sync

4365ebf

Merge branch 'fix/relayer-drain-channel' of https://github.com/stacks-network/stacks-blockchain into fix/relayer-drain-channel

19b7c94

Merge branch 'develop' into fix/relayer-drain-channel

30292cb

chore: address PR feedback

18be1fe

jcnelson added 2 commits November 4, 2024 22:25

fix: disconnect from neighbors serving unconfirmed tenures that are stale (otherwise we would cease to make progress if the node never caught up), and throttle down unconfirmed download checks

bce9839

Merge branch 'develop' into fix/relayer-drain-channel

5ed737a

obycode reviewed Nov 5, 2024

stackslib/src/net/download/nakamoto/tenure_downloader_set.rs (outdated, resolved)

fix: fix unit tests that broke due to PeerNetwork needing an existing BurnchainDB

ead50a8

kantai reviewed Nov 5, 2024

stackslib/src/net/download/nakamoto/download_state_machine.rs (outdated, resolved)

kantai reviewed Nov 5, 2024

jferrant reviewed Nov 5, 2024

stackslib/src/net/download/nakamoto/tenure_downloader_set.rs (outdated, resolved)

jferrant previously approved these changes Nov 5, 2024

Merge branch 'develop' into fix/relayer-drain-channel

9731059

jcnelson dismissed jferrant’s stale review via 45adc33 November 5, 2024 21:51

jcnelson requested review from obycode, jferrant and kantai November 5, 2024 21:52

obycode approved these changes Nov 5, 2024

chore: address remaining PR feedback and get tests to pass

45adc33

jferrant requested changes Nov 5, 2024
