Fix/5193 stackerdb decoherence #5197
Conversation
…te machine are using
… on irrecoverable error
…as to verify that connection pinning prevents decoherence
LGTM, will approve once we can confirm it resolves the issue.
LGTM, just flagged one typo
…ignore flag set Signed-off-by: Jacinta Ferrant <[email protected]>
…ting for unnecessary signatures Signed-off-by: Jacinta Ferrant <[email protected]>
This allows us to avoid hitting block 240, which is when the stackers get unstacked and the chain stalls, making `partial_tenure_fork` less flaky
I am testing this on mainnet along with my other in-flight PRs, and I think I'm getting OOM'ed. I need to confirm first.
Will also run this branch to see if I can reproduce.
Co-authored-by: Brice Dobry <[email protected]>
…erence Fix/5193 stackerdb decoherence
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
This fixes #5193 by having all p2p state machines (namely, both the epoch 2.x and Nakamoto inv syncs and StackerDB sync) track and report their pinned connections to the peer network, so they won't be pruned. The cause of the decoherence seems to have been that once a peer's outbound neighbor count exceeded `[connection_opts].soft_max_neighbors_per_org` (or one of the other similar limits), the pruner would simply close the newest connections until the number of connections was brought back down. This would often happen during StackerDB sync (and would also happen in inv sync), with the effect that a node with many neighbors would fail to synchronize its StackerDB replicas. I suspect this was also the cause of the decoherence we saw with larger Nakamoto testnets, where the soft limits on the number of neighbors were exceeded.
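For illustration, here is a minimal Rust sketch of the pinning idea described above. The type and method names (`PeerPruner`, `pin`, `unpin`, `prune`) are assumptions made up for this example, not the actual stacks-core identifiers; the point is only that the pruner enforces the soft neighbor cap while never closing a connection that an in-flight state machine has pinned.

```rust
use std::collections::HashSet;

/// Hypothetical pruner sketch -- not the real stacks-core type.
struct PeerPruner {
    /// Event IDs pinned by in-flight state machines (inv sync, StackerDB sync).
    pinned: HashSet<usize>,
    /// Soft cap analogous to `[connection_opts].soft_max_neighbors_per_org`.
    soft_max_neighbors: usize,
}

impl PeerPruner {
    /// A state machine pins a connection before it starts using it.
    fn pin(&mut self, event_id: usize) {
        self.pinned.insert(event_id);
    }

    /// ...and unpins it when it finishes (or hits an irrecoverable error).
    fn unpin(&mut self, event_id: usize) {
        self.pinned.remove(&event_id);
    }

    /// Close connections until we're under the soft limit, but never touch a
    /// pinned one, so in-flight syncs keep their peers mid-conversation.
    fn prune(&mut self, open_events: &mut Vec<usize>) -> Vec<usize> {
        let mut closed = Vec::new();
        while open_events.len() > self.soft_max_neighbors {
            // Newest connections are at the end; close the newest un-pinned one.
            let victim = open_events.iter().rposition(|e| !self.pinned.contains(e));
            match victim {
                Some(pos) => closed.push(open_events.remove(pos)),
                None => break, // everything left is pinned
            }
        }
        closed
    }
}
```

Previously, the pruner had no notion of "pinned", so a connection opened by StackerDB or inv sync looked no different from any other and was the first to go once the soft limits were hit.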
You can see the effect of this PR in `/v2/neighbors` -- inbound and outbound peer entries now report an `age` (in seconds), which should rarely be reset thanks to the pinning. Before, neighbors would come and go very quickly as state machines connected to them and the pruner immediately disconnected them.

Leaving this as a draft for now so I can test it live with the Nakamoto testnet signers.
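One quick way to observe this is to poll the endpoint and watch whether a given neighbor's `age` keeps growing instead of resetting. A rough sketch, assuming the `reqwest` (with the "blocking" and "json" features) and `serde_json` crates; the node URL and the exact JSON field names below are illustrative assumptions, not a confirmed schema:

```rust
use std::{thread, time::Duration};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Poll a few times; an `age` that increases across polls suggests the
    // connection is pinned and surviving the pruner.
    for _ in 0..3 {
        let body: serde_json::Value =
            reqwest::blocking::get("http://localhost:20443/v2/neighbors")?.json()?;
        if let Some(peers) = body["outbound"].as_array() {
            for peer in peers {
                println!("{}:{} age={}s", peer["ip"], peer["port"], peer["age"]);
            }
        }
        thread::sleep(Duration::from_secs(30));
    }
    Ok(())
}
```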