Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/more multi miner fixes jude #5093

Merged
merged 83 commits into from
Sep 12, 2024
Merged

Conversation

jcnelson
Copy link
Member

@jcnelson jcnelson commented Aug 20, 2024

This PR fixes several more bugs and performance regressions in the networking stack and relayer. These bugs were discovered in a six-node topology with three miners and three stacker-signer followers, where each node was paired to its own bitcoin node and each stacker-signer follower was paired to a single follower (see schematic below):

            `bitcoind 0` <-----> `bitcoind 1` <-----> `bitcoind 2` <-----> `bitcoind 3` <-----> `bitcoind 4` <-----> `bitcoind 5`
                 |                    |                    |                    |                    |                    |
                 |                    |                    |                    |                    |                    |
                 V                    V                    V                    V                    V                    V
             `stacks 0`  <----->  `stacks 1`  <----->  `stacks 2`  <----->  `stacks 3`  <----->  `stacks 4`  <----->  `stacks 5`
             `(miner) `           `(miner) `           `(miner) `           `(follower)`         `(follower)`         `(follower)`
                                                                                ^                    ^                    ^
                                                                                |                    |                    |
                                                                                |                    |                    |
                                                                            `signer 0`           `signer 1`           `signer 2`

Fixes included the following:

  • Fixes the wanted tenure and available tenure calculations in the Nakamoto downloader at a reward cycle boundary. Before, the available tenure list could be empty on a reward cycle boundary, which could lead to a downloader stall.
  • Standardizes the first-block-height calculation of a reward cycle in the Nakamoto downloader, leading to the deletion of about 100 lines of code.
  • Fixes the confirmed/unconfirmed download state transition logic by removing a bunch of code.
  • Fixes the Nakamoto tenure downloader to return the tenure-start block of the next tenure as well as the tenure's blocks, so that when the relayer stores them and the chains coordinator processes them, the tenure will be treated as "finished" by the chainstate DB (which the downloader requires in order to mark the tenure as "fully obtained").
  • The relayer will now push StackerDB chunks in addition to the StackerDB state machine. It will forward chunks that it was able to successfully store, indicating that the chunk was novel and it's likely that the node's neighbors need it.
  • The Nakamoto inventory state machine will perform a configurable burst of inventory syncs with its neighbors right after a sortition is detected. This helps the downloader quickly detect when a neighbor has the next tenure-start block, so if the node doesn't get the tenure-start block pushed to it, it will likely find it shortly.
  • Fixes Nakamoto block push by removing the late-stage check to see if the block had already been stored and aborting if so. This was a bug because the caller will have already stored the block before the relayer has a chance to consider it. Instead, the relayer will assume the caller is only trying to relay the block because the caller determined that the block was novel, as had been the case all along.
  • Fixes the Nakamoto main loop to honor burnchain.poll_time_secs.

Things I still need to do:

  • Test the wanted and available tenure calculation at a reward cycle boundary in a unit test
  • Test StackerDB sync in a unit test whereby A synchronizes blocks with B, and B pushes blocks to C (and C doesn't run the StackerDB synchronization)
  • Consolidate some debug output

…a better understanding of when it becomes readable by other threads
… wanted tenure and availability calculations at reward cycle boundaries, and fix the transition logic for confirmed/unconfirmed states
…'t abort a walk because we can't find any always-allowed peers (use the seed peers in that case)
…nakamoto blocks if told to (don't look at the chainstate, since the caller stores the block before relaying it). This latter change fixes nakamoto block push
@saralab saralab requested review from hstove and obycode August 20, 2024 14:39
@kantai kantai dismissed stale reviews from obycode and themself via bfdb6e4 September 12, 2024 17:53
@kantai
Copy link
Member

kantai commented Sep 12, 2024

Pushed bfdb6e4 -- this should fix the unit test failures. The matches!() macro invocations were backwards -- if you match on the enum variant first, it treats the variable afterwards as a binding. Cargo would warn against this, but unused variable warnings are disabled in a lot of the testing codebase especially.

kantai
kantai previously approved these changes Sep 12, 2024
Copy link
Member

@kantai kantai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.

After checking the failing CI tests:

Failed tests:

-- this is new to this PR, and it seems like it just has networking issues in CI? nakamoto_integrations::follower_bootup_across_multiple_cycles

v0::bitcoind_forking_test * fixed downstream in #5138
v0::miner_forking * broken in develop, fixed by another PR against develop

Working locally

v0::forked_tenure_okay
nakamoto_integrations::mock_mining

This seems okay to merge as is -- we can work on making sure the downstream PRs fix the CI issues.

jferrant
jferrant previously approved these changes Sep 12, 2024
@kantai kantai added this pull request to the merge queue Sep 12, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 12, 2024
@kantai kantai dismissed stale reviews from jferrant and themself via ba1ce0c September 12, 2024 20:09
kantai
kantai previously approved these changes Sep 12, 2024
obycode
obycode previously approved these changes Sep 12, 2024
@obycode
Copy link
Contributor

obycode commented Sep 12, 2024

doh! looks like we still need a formatting fix.

@kantai kantai dismissed stale reviews from obycode and themself via 80c069f September 12, 2024 20:18
@kantai kantai added this pull request to the merge queue Sep 12, 2024
Merged via the queue into develop with commit 1f44a69 Sep 12, 2024
1 check passed
@blockstack-devops
Copy link
Contributor

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@stacks-network stacks-network locked as resolved and limited conversation to collaborators Oct 27, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants