Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/5193 stackerdb decoherence #5197

Merged
merged 23 commits into from
Sep 23, 2024
Merged

Fix/5193 stackerdb decoherence #5197

merged 23 commits into from
Sep 23, 2024

Conversation

jcnelson
Copy link
Member

This fixes #5193 by having all p2p state machines (namely, both epoch 2.x and Nakamoto inv sync and StackerDB) track and report their pinned connections to the peer network, so they won't be pruned. The cause of the decoherency seems to have been that once a peer's outbound neighbor count exceeded [connection_opts].soft_max_neighbors_per_org or one of the other similar limits, the pruner would simply close the newer connections until the number of connections was brought down. This would often happen during StackerDB sync (and would also happen in inv sync), which would have the effect of a node with many neighbors failing to synchronize their StackerDB replicas.

This I suspect was also the cause of the decoherence we would see with larger Nakamoto testnets, where the soft limits on the number of neighbors were exceeded.

You can see the effect of this PR in /v2/neighbors -- inbound and outbound peer entries now report an age (in seconds), which should rarely be reset due to the pinning. Before, neighbors would come and go very quickly as state machines connected to them and the pruner immediately disconnected them.

Leaving as a draft for now so I can test this live with the Nakamoto testnet signers.

jferrant
jferrant previously approved these changes Sep 17, 2024
Copy link
Member

@kantai kantai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, will approve once we can confirm it resolves the issue.

@saralab saralab marked this pull request as ready for review September 17, 2024 20:06
@saralab saralab requested a review from a team as a code owner September 17, 2024 20:06
obycode
obycode previously approved these changes Sep 17, 2024
Copy link
Contributor

@obycode obycode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just flagged one typo

stackslib/src/net/neighbors/comms.rs Outdated Show resolved Hide resolved
@jferrant jferrant dismissed stale reviews from obycode and themself via 5614112 September 19, 2024 18:12
jferrant and others added 3 commits September 19, 2024 13:26
…ting for unnecessary signatures

Signed-off-by: Jacinta Ferrant <[email protected]>
This allows us to avoid hitting block 240, which is when the stackers
get unstacked and the chain stalls, making `partial_tenure_fork` less
flaky
@obycode obycode mentioned this pull request Sep 20, 2024
@jcnelson
Copy link
Member Author

I am testing this on mainnet along with my other in-flight PRs, and I think I'm getting OOM'ed. I need to confirm first.

@wileyj
Copy link
Contributor

wileyj commented Sep 20, 2024

I am testing this on mainnet along with my other in-flight PRs, and I think I'm getting OOM'ed. I need to confirm first.

will also run this branch to see if i can reproduce

@wileyj wileyj added this pull request to the merge queue Sep 23, 2024
github-merge-queue bot pushed a commit that referenced this pull request Sep 23, 2024
Merged via the queue into develop with commit 2ac4df0 Sep 23, 2024
1 check passed
@blockstack-devops
Copy link
Contributor

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@stacks-network stacks-network locked as resolved and limited conversation to collaborators Oct 26, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants