
Minimize time between sortitions and block-acceptance #2989

Merged
100 commits merged into develop on Jan 27, 2022

Conversation

jcnelson
Member

@jcnelson jcnelson commented Jan 9, 2022

This PR fixes #2986, #2944, and #2969.

The overarching goal of this PR is to improve chain quality by minimizing the time between when a sortition happens and when the miner can start producing the next block. This is achieved in three principal ways:

  • Aggressive block replication. This PR makes it so that peers directly push blocks and confirmed microblock streams to their outbound neighbors if they detect that those neighbors are missing them from their inventories. The original behavior was to send a BlocksAvailable or MicroblocksAvailable message instead; now, those two messages are only sent to inbound neighbors (whose inventories are not tracked).

  • Minimize time spent doing things besides block downloads. This PR removes the recurring full inventory synchronization path from the inventory state machine; that path could cause a node to miss several sortitions' worth of blocks every 12 hours, which had a detrimental effect on miners' ability to produce blocks on the chain tip. Instead, a node now only does a full inventory synchronization with its neighbors on boot-up, and from then on it only queries the past two reward cycles' inventories.

  • Aggressively cancel stale RunTenure and RunMicroblockTenure events. This PR updates the mining code-paths to check the current chain tip against each issued RunTenure, and to drop the RunTenure if it refers to a now-stale chain tip. It also checks the canonical Stacks chain tip after block assembly but before sending the block-commit, and cancels the operation if the canonical chain tip has changed; a similar set of checks is now done for RunMicroblockTenure. The reason for these changes is that processing a RunTenure or RunMicroblockTenure directive in the relayer thread can take on the order of 30 seconds, during which time the chain tip it targeted can become stale. I believe this was the root cause of Multiple commits for btc block, missed target block #2969. With this change, the node's mock-miner reliably mines atop block-commits on the canonical chain. (A minimal sketch of the staleness check appears after this list.)

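To make the stale-directive cancellation in the last bullet concrete, here is a minimal sketch of the check, under assumed stand-in types (this is not the PR's actual relayer code; the real RunTenure directive and chain-tip types differ):

    // Hypothetical stand-ins for the node's real types; for illustration only.
    #[derive(Clone, PartialEq, Eq, Debug)]
    struct BlockId([u8; 32]);

    struct RunTenure {
        // The chain tip this directive was issued against.
        issued_tip: BlockId,
    }

    // Drop the directive if the canonical tip has moved since it was issued.
    fn filter_stale(directive: RunTenure, canonical_tip: &BlockId) -> Option<RunTenure> {
        if &directive.issued_tip == canonical_tip {
            Some(directive)
        } else {
            None // stale: the tip advanced while the directive sat in the relayer's queue
        }
    }

    fn main() {
        let old_tip = BlockId([1u8; 32]);
        let new_tip = BlockId([2u8; 32]);
        let directive = RunTenure { issued_tip: old_tip };
        // The same kind of check is repeated after block assembly, before the block-commit is sent.
        assert!(filter_stale(directive, &new_tip).is_none());
    }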

In addition to making these improvements, this PR fixes three bugs, one of which is a show-stopper for 2.05.1.0.

  • A denial-of-service bug in the mempool synchronization logic. This must be merged into 2.05.0.1.0. Specifically, the mempool synchronization logic did a full mempool scan on each query to /v2/mempool/query, which could take on the order of 500 ms and in turn stall the network. We didn't notice it before because the effect was small during testing; ALEX's launch revealed that it is a problem. The fix is to make mempool synchronization happen in pages, where each page is only permitted to do a fixed number of mempool database queries. (A rough sketch of the paging pattern appears after this list.)

  • A denial-of-service bug in the Bitcoin indexer whereby it could not recover from a deep reorg. I actually discovered this first with my appchain MVP, and have since noticed that it can happen in Bitcoin as well (indeed, the bitcoind_forking_test occasionally flaps because of it). Basically, there are cases where the reorg is deeper than the maximum number of blocks the indexer is allowed to synchronize in one go. When this happens, the node gets stuck, because it forgets to download the blocks between the height at reorg depth + maximum sync length and the chain tip height. This PR fixes that by having the indexer drop the headers above reorg depth + maximum sync length, so it is forced to re-fetch them and their blocks. (See the sketch after this list.)

  • A lot of I/O paths were needlessly slow due to accidental database scans. This PR adds a set of indexes to the various databases that significantly improve boot-up and steady-state performance; in my testing, they cut boot-up time in half. I've also addressed [burnchain-download] download burnchain blocks in parallel to processing sortitions #2944, so that the node does not need to wait for all of a reward cycle's sortitions to be processed before downloading Stacks blocks. I already merged the highest-impact index update, but these also help. In addition, I've instrumented the test code so that if the BLOCKSTACK_DB_TRACE environment variable is set, the node prints EXPLAIN QUERY PLAN statements in the log file that you can execute yourself against the test's databases. Using this new feature, I systematically went through each query that high-I/O tests were doing, found the ones doing (un-LIMIT-ed) table scans or creating temporary B-trees, and added indexes to prevent them. (A worked example of this workflow follows this list.)

    In order to achieve the above, I refactored neon_node.rs and runloop.rs in order to gather related state into the RunLoop struct, and to break up a lot of path-dependent spaghetti code into single-purpose methods. This should improve maintainability going forward.
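To illustrate the mempool-sync fix from the first bullet above, here is a rough sketch of the paging pattern, with hypothetical names standing in for the real mempool query code: each request does a bounded amount of database work and returns a cursor (the last randomized txid) that the peer sends back to resume.

    // Hypothetical page structure; the real protocol serializes this over /v2/mempool/query.
    struct MempoolPage {
        txids: Vec<[u8; 32]>,
        next_page_id: Option<[u8; 32]>, // cursor the peer sends back to continue
    }

    fn query_mempool_page(
        mempool: &[[u8; 32]],    // stand-in for the mempool database
        after: Option<[u8; 32]>, // cursor from the previous page, if any
        max_queries: usize,      // hard cap on database work per request
    ) -> MempoolPage {
        let start = match after {
            Some(cursor) => mempool
                .iter()
                .position(|txid| *txid == cursor)
                .map(|i| i + 1)
                .unwrap_or(0),
            None => 0,
        };
        let txids: Vec<[u8; 32]> = mempool[start..].iter().take(max_queries).cloned().collect();
        let next_page_id = if start + txids.len() < mempool.len() {
            txids.last().cloned()
        } else {
            None // no more pages; the stream can be corked
        };
        MempoolPage { txids, next_page_id }
    }

    fn main() {
        let mempool: Vec<[u8; 32]> = (0..10u8).map(|i| [i; 32]).collect();
        let page = query_mempool_page(&mempool, None, 4);
        assert_eq!(page.txids.len(), 4);
        assert!(page.next_page_id.is_some()); // the caller issues another request with this cursor
    }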

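For the deep-reorg fix in the second bullet, the essential change is to discard headers above reorg depth + the maximum per-pass sync length, so the indexer re-fetches them (and their blocks) on the next pass instead of skipping them. A minimal sketch under assumed names (not the actual indexer code):

    // Keep only headers at or below the height the indexer can safely re-sync from.
    // Dropping the rest guarantees the "gap" blocks get re-downloaded rather than skipped.
    fn drop_headers_past_resync_window(
        headers: &mut Vec<(u64, [u8; 32])>, // (height, header hash) stand-ins
        reorg_height: u64,
        max_blocks_per_sync: u64,
    ) {
        let cutoff = reorg_height.saturating_add(max_blocks_per_sync);
        headers.retain(|(height, _)| *height <= cutoff);
    }

    fn main() {
        let mut headers: Vec<(u64, [u8; 32])> = (0..100u64).map(|h| (h, [0u8; 32])).collect();
        // Reorg detected at height 10 with a 20-block sync window: heights 31..=99 are dropped.
        drop_headers_past_resync_window(&mut headers, 10, 20);
        assert_eq!(headers.last().unwrap().0, 30);
    }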

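The indexing workflow in the last bullet can be reproduced by hand: take a query that a test logs under BLOCKSTACK_DB_TRACE, run EXPLAIN QUERY PLAN against the test's database, look for SCAN TABLE or USE TEMP B-TREE in the output, and add a covering index. A sketch using rusqlite with an illustrative schema (not the node's actual tables):

    use rusqlite::Connection;

    fn main() -> rusqlite::Result<()> {
        let conn = Connection::open_in_memory()?;
        conn.execute_batch(
            "CREATE TABLE staging_blocks (block_hash TEXT, height INTEGER, processed INTEGER);",
        )?;

        // Before the index, the plan for this query reports a full table scan.
        let mut stmt = conn.prepare(
            "EXPLAIN QUERY PLAN \
             SELECT block_hash FROM staging_blocks WHERE processed = 0 ORDER BY height",
        )?;
        let mut rows = stmt.query([])?;
        while let Some(row) = rows.next()? {
            let detail: String = row.get(3)?; // the "detail" column of the query plan
            println!("{}", detail);
        }

        // Adding an index turns the scan into an index search and avoids a temp B-tree for the sort.
        conn.execute_batch(
            "CREATE INDEX IF NOT EXISTS idx_staging_unprocessed ON staging_blocks(processed, height);",
        )?;
        Ok(())
    }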
I realize that there's a lot here, but time really is of the essence with the mempool being as full as it is. I have three nodes running with this right now (one from boot-up, two from existing chain state), and they're all doing fine.

…ain blocks while processing one reward cycle, so we can process stacks blocks concurrently as well.
…the next reward cycle to sync block data from by looking at the number of sortitions in the remote peer's block inventory, instead of the number of reward cycles in its PoX bit vector
…nce we bind to 0.0.0.0 but sometimes localhost resolves to ::1, leading to HTTP connection failures
…tbound peer does not know about, push it directly to them (don't advertize it via an *Available message). Add tests to verify that push-exclusive behavior works.
@jcnelson jcnelson changed the base branch from master to develop January 9, 2022 03:54
@codecov

codecov bot commented Jan 10, 2022

Codecov Report

Merging #2989 (c395794) into develop (458cc9d) will decrease coverage by 0.14%.
The diff coverage is 61.95%.


@@             Coverage Diff             @@
##           develop    #2989      +/-   ##
===========================================
- Coverage    82.68%   82.54%   -0.15%     
===========================================
  Files          242      242              
  Lines       194487   195380     +893     
===========================================
+ Hits        160821   161278     +457     
- Misses       33666    34102     +436     
Impacted Files Coverage Δ
src/burnchains/db.rs 94.83% <ø> (ø)
src/chainstate/stacks/db/mod.rs 86.78% <ø> (ø)
src/net/download.rs 41.25% <0.00%> (-0.09%) ⬇️
...t/stacks-node/src/burnchains/mocknet_controller.rs 77.05% <0.00%> (-1.71%) ⬇️
testnet/stacks-node/src/burnchains/mod.rs 52.38% <ø> (ø)
testnet/stacks-node/src/main.rs 0.62% <ø> (ø)
src/net/p2p.rs 58.46% <21.57%> (-3.13%) ⬇️
src/net/relay.rs 32.74% <28.52%> (-1.34%) ⬇️
src/net/rpc.rs 30.78% <37.09%> (+0.17%) ⬆️
src/net/mod.rs 67.36% <50.00%> (-0.12%) ⬇️
... and 59 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@kantai
Member

kantai commented Jan 14, 2022

Any progress on this?

The network has been experiencing very high orphan rates, so I think the priority of resolving #2986 should be pretty high. There may be other contributing factors, so it would be useful to figure out how much this PR would improve the current situation.

@jcnelson
Member Author

jcnelson commented Jan 15, 2022

I discovered a couple interesting things today when investigating why bootup was taking longer than I thought.

First, as you know, the chains-coordinator thread will process either a Stacks block, a burnchain block, or neither, but never both at the same time. What this means is that when the node is booting, there's no way to download and process burnchain blocks in parallel to processing Stacks blocks -- the coordinator serializes them regardless of what we do in the sync_with_indexer() method or the Neon run loop. While this PR does change the Neon run loop so that burnchain blocks are downloaded in parallel to Stacks block processing, this does not address this bottleneck.

Second, I discovered that as the bootup progresses, it becomes more and more likely that the chains-coordinator thread spends a lot of time in the handle_new_burnchain_block() loop, which queries multiple unprocessed burnchain blocks: https://github.com/blockstack/stacks-blockchain/blob/master/src/chainstate/coordinator/mod.rs#L570. In the worst cases, this stall can take on the order of 100 seconds, and the returned list of unprocessed burnchain blocks is on the order of 1,000 entries long. This loop returns multiple entries about 5% of the time during bootup, and the lists get longer as the node runs. This could be due to the fact that a slew of Stacks blocks getting processed at once can stall sortition processing, but not stall burnchain block downloads.

My node finished sync'ing yesterday; it spent most of its time processing blocks after height 30,000. I'll keep it running over the weekend. While I don't think we'll see any really interesting outlier behavior until we have a few thousand sortitions tracked, I think we should be able to determine whether or not some of the other latency-lowering tactics in this PR work by comparing it to how develop currently works. However, it will be hard to gauge the effectiveness of one of the main changes in this PR -- having the node push blocks to peers that want them -- without running several public nodes with it applied. Currently, I'm only running this PR on a NAT'ed node. I'll try spinning up a public one as well tomorrow; even if only a few public nodes run this new pushing-behavior PR, we should start to see a reduction in sortition-to-download times.

I haven't had a chance yet to modify the miner run loop behavior to make it coalesce ProcessTenure and RunTenure requests. I'll get to that hopefully tomorrow.

@jcnelson
Member Author

Okay, codecov is misbehaving, which is causing everything to break.

@kantai
Member

kantai commented Jan 21, 2022

> Okay, codecov is misbehaving, which is causing everything to break.

Hmm -- I'd guess possibly due to the GitHub org name change

Comment on lines 1908 to 1926
            } else if let Some(next_txid) = next_last_randomized_txid_opt {
                test_debug!("No rows returned for {}", &query.last_randomized_txid);

                // no rows found
                query.last_randomized_txid = next_txid;

                // send the next page ID
                query.tx_buf_ptr = 0;
                query.tx_buf.clear();
                query.corked = true;

                test_debug!(
-                   "No more txs in query after {:?}",
+                   "Cork tx stream with next page {}",
                    &query.last_randomized_txid
                );
-               break;
+               query
+                   .last_randomized_txid
+                   .consensus_serialize(&mut query.tx_buf)
+                   .map_err(ChainstateError::CodecError)?;
Member

According to code coverage, this branch isn't covered -- is it possible to add coverage for this to the unit tests?

Member Author

Yes; will do

Comment on lines +2183 to +2190
        if let Some(txs) = txs_opt {
            debug!(
                "{:?}: Mempool sync obtained {} transactions from mempool sync, but have more",
                &self.local_peer,
                txs.len()
            );

            return Ok(Some(txs));
Member

The if let Some(...) branch here appears to not be covered

Member Author

@jcnelson jcnelson Jan 24, 2022

I don't think that code is even reachable. I will just remove it instead. Nevermind; this is supposed to be exercised through pagination. Will update the mempool sync test to do so.

Member Author

@jcnelson jcnelson Jan 24, 2022

Hmm, it's definitely getting exercised in test_mempool_sync_2_peers_paginated. I wonder why codecov didn't report it. Is it because it's an #[ignore]'ed test?

Member

Yes -- that'd be why -- I think the #[ignore] tests aren't getting executed by the code coverage unit tests.
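For readers unfamiliar with the mechanics: tests marked #[ignore] are skipped by a plain cargo test run, so a coverage job that doesn't pass -- --ignored (or --include-ignored) never executes them. A tiny illustration with hypothetical test names:

    #[cfg(test)]
    mod tests {
        #[test]
        fn fast_unit_test() {
            // Runs under a plain `cargo test`, so it shows up in coverage.
            assert_eq!(2 + 2, 4);
        }

        #[test]
        #[ignore] // Only runs with `cargo test -- --ignored` or `cargo test -- --include-ignored`.
        fn slow_network_test() {
            // A long-running test like test_mempool_sync_2_peers_paginated would live here.
        }
    }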

@MaksimalistT

Hi @jcnelson, @kantai,
My node is struggling with a high frequency of invalid blocks, and I would be happy to increase the number of txs mined per block to help the network and earn more in fees.
I'm willing to upgrade my node before the official release.
Which branches address these issues and are safe enough to use?

Member

@kantai kantai left a comment

This LGTM, thanks @jcnelson!

Contributor

@pavitthrap pavitthrap left a comment

lgtm - just some minor comments

sql_query.to_string()
};

while let Some(part) = parts.next() {
Contributor

Could you do a replace here?

Member Author

I'm not sure what you're asking. The code is constructing a new string because trying to replace ?X with "mock_arg" would in effect be doing the same thing.

@@ -2833,7 +2862,8 @@ impl PeerNetwork {
            reward_cycle_finish
        );

-       for reward_cycle in reward_cycle_start..reward_cycle_finish {
+       // go from latest to earliest reward cycle
+       for reward_cycle in (reward_cycle_finish..reward_cycle_start).rev() {
Contributor

to go from latest to earliest reward cycle, I think you would need to iterate from reward_cycle_start to reward_cycle_finish

Member Author

Ah, but reward_cycle_finish <= reward_cycle_start. The anti-entropy protocol works backwards, from latest reward cycle to earliest reward cycle.

Contributor

if reward_cycle_finish <= reward_cycle_start, wouldn't that make reward_cycle_start the latest reward cycle?

Member Author

Yes, it would. Here's an execution trace of some of the debug statements from a unit test:

$ grep -E 'Local blocks inventory|over reward cycles' /tmp/test.out
DEBG [1643228629.268898] [src/net/p2p.rs:2857] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4240)(pub=None): AntiEntropy: run protocol for 1 neighbors, over reward cycles 6-5
DEBG [1643228629.273409] [src/net/p2p.rs:2879] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4240)(pub=None): AntiEntropy: Local blocks inventory for reward cycle 5 is BlocksInvData { bitlen: 5, block_bitvec: [31], microblocks_bitvec: [30] }
DEBG [1643228630.151754] [src/net/p2p.rs:2857] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4242)(pub=None): AntiEntropy: run protocol for 1 neighbors, over reward cycles 6-5
DEBG [1643228630.155805] [src/net/p2p.rs:2879] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4242)(pub=None): AntiEntropy: Local blocks inventory for reward cycle 5 is BlocksInvData { bitlen: 5, block_bitvec: [3], microblocks_bitvec: [2] }
DEBG [1643228631.350438] [src/net/p2p.rs:2857] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4240)(pub=None): AntiEntropy: run protocol for 1 neighbors, over reward cycles 5-4
DEBG [1643228631.353977] [src/net/p2p.rs:2879] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4240)(pub=None): AntiEntropy: Local blocks inventory for reward cycle 4 is BlocksInvData { bitlen: 5, block_bitvec: [0], microblocks_bitvec: [0] }
DEBG [1643228632.425082] [src/net/p2p.rs:2857] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4242)(pub=None): AntiEntropy: run protocol for 1 neighbors, over reward cycles 5-4
DEBG [1643228632.428814] [src/net/p2p.rs:2879] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4242)(pub=None): AntiEntropy: Local blocks inventory for reward cycle 4 is BlocksInvData { bitlen: 5, block_bitvec: [0], microblocks_bitvec: [0] }
DEBG [1643228633.348980] [src/net/p2p.rs:2857] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4240)(pub=None): AntiEntropy: run protocol for 1 neighbors, over reward cycles 4-3
DEBG [1643228633.352538] [src/net/p2p.rs:2879] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4240)(pub=None): AntiEntropy: Local blocks inventory for reward cycle 3 is BlocksInvData { bitlen: 5, block_bitvec: [0], microblocks_bitvec: [0] }
DEBG [1643228634.376591] [src/net/p2p.rs:2857] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4242)(pub=None): AntiEntropy: run protocol for 1 neighbors, over reward cycles 4-3
DEBG [1643228634.380029] [src/net/p2p.rs:2879] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4242)(pub=None): AntiEntropy: Local blocks inventory for reward cycle 3 is BlocksInvData { bitlen: 5, block_bitvec: [0], microblocks_bitvec: [0] }
DEBG [1643228635.396177] [src/net/p2p.rs:2857] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4240)(pub=None): AntiEntropy: run protocol for 1 neighbors, over reward cycles 3-2
DEBG [1643228635.399907] [src/net/p2p.rs:2879] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4240)(pub=None): AntiEntropy: Local blocks inventory for reward cycle 2 is BlocksInvData { bitlen: 5, block_bitvec: [0], microblocks_bitvec: [0] }
DEBG [1643228636.350704] [src/net/p2p.rs:2857] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4242)(pub=None): AntiEntropy: run protocol for 1 neighbors, over reward cycles 3-2
DEBG [1643228636.354618] [src/net/p2p.rs:2879] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4242)(pub=None): AntiEntropy: Local blocks inventory for reward cycle 2 is BlocksInvData { bitlen: 5, block_bitvec: [0], microblocks_bitvec: [0] }
DEBG [1643228637.306919] [src/net/p2p.rs:2857] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4240)(pub=None): AntiEntropy: run protocol for 1 neighbors, over reward cycles 2-1
DEBG [1643228637.310311] [src/net/p2p.rs:2879] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4240)(pub=None): AntiEntropy: Local blocks inventory for reward cycle 1 is BlocksInvData { bitlen: 5, block_bitvec: [0], microblocks_bitvec: [0] }
DEBG [1643228638.418983] [src/net/p2p.rs:2857] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4242)(pub=None): AntiEntropy: run protocol for 1 neighbors, over reward cycles 2-1
DEBG [1643228638.423283] [src/net/p2p.rs:2879] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4242)(pub=None): AntiEntropy: Local blocks inventory for reward cycle 1 is BlocksInvData { bitlen: 5, block_bitvec: [0], microblocks_bitvec: [0] }
DEBG [1643228639.353688] [src/net/p2p.rs:2857] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4240)(pub=None): AntiEntropy: run protocol for 1 neighbors, over reward cycles 1-0
DEBG [1643228639.357670] [src/net/p2p.rs:2879] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4240)(pub=None): AntiEntropy: Local blocks inventory for reward cycle 0 is BlocksInvData { bitlen: 5, block_bitvec: [0], microblocks_bitvec: [0] }
DEBG [1643228640.394823] [src/net/p2p.rs:2857] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4242)(pub=None): AntiEntropy: run protocol for 1 neighbors, over reward cycles 1-0
DEBG [1643228640.399527] [src/net/p2p.rs:2879] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4242)(pub=None): AntiEntropy: Local blocks inventory for reward cycle 0 is BlocksInvData { bitlen: 5, block_bitvec: [0], microblocks_bitvec: [0] }
DEBG [1643228641.443450] [src/net/p2p.rs:2857] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4240)(pub=None): AntiEntropy: run protocol for 1 neighbors, over reward cycles 6-5
DEBG [1643228641.452464] [src/net/p2p.rs:2879] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4240)(pub=None): AntiEntropy: Local blocks inventory for reward cycle 5 is BlocksInvData { bitlen: 5, block_bitvec: [31], microblocks_bitvec: [30] }
DEBG [1643228642.554711] [src/net/p2p.rs:2857] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4242)(pub=None): AntiEntropy: run protocol for 1 neighbors, over reward cycles 6-5
DEBG [1643228642.560304] [src/net/p2p.rs:2879] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4242)(pub=None): AntiEntropy: Local blocks inventory for reward cycle 5 is BlocksInvData { bitlen: 5, block_bitvec: [15], microblocks_bitvec: [14] }
DEBG [1643228643.357928] [src/net/p2p.rs:2857] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4240)(pub=None): AntiEntropy: run protocol for 1 neighbors, over reward cycles 5-4
DEBG [1643228643.363271] [src/net/p2p.rs:2879] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4240)(pub=None): AntiEntropy: Local blocks inventory for reward cycle 4 is BlocksInvData { bitlen: 5, block_bitvec: [0], microblocks_bitvec: [0] }
DEBG [1643228644.372366] [src/net/p2p.rs:2857] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4242)(pub=None): AntiEntropy: run protocol for 1 neighbors, over reward cycles 5-4
DEBG [1643228644.376326] [src/net/p2p.rs:2879] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4242)(pub=None): AntiEntropy: Local blocks inventory for reward cycle 4 is BlocksInvData { bitlen: 5, block_bitvec: [0], microblocks_bitvec: [0] }
DEBG [1643228645.404950] [src/net/p2p.rs:2857] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4240)(pub=None): AntiEntropy: run protocol for 1 neighbors, over reward cycles 4-3
DEBG [1643228645.408651] [src/net/p2p.rs:2879] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4240)(pub=None): AntiEntropy: Local blocks inventory for reward cycle 3 is BlocksInvData { bitlen: 5, block_bitvec: [0], microblocks_bitvec: [0] }
DEBG [1643228646.427323] [src/net/p2p.rs:2857] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4242)(pub=None): AntiEntropy: run protocol for 1 neighbors, over reward cycles 4-3
DEBG [1643228646.432209] [src/net/p2p.rs:2879] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4242)(pub=None): AntiEntropy: Local blocks inventory for reward cycle 3 is BlocksInvData { bitlen: 5, block_bitvec: [0], microblocks_bitvec: [0] }
DEBG [1643228647.400470] [src/net/p2p.rs:2857] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4240)(pub=None): AntiEntropy: run protocol for 1 neighbors, over reward cycles 3-2
DEBG [1643228647.405313] [src/net/p2p.rs:2879] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4240)(pub=None): AntiEntropy: Local blocks inventory for reward cycle 2 is BlocksInvData { bitlen: 5, block_bitvec: [0], microblocks_bitvec: [0] }
DEBG [1643228648.366515] [src/net/p2p.rs:2857] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4242)(pub=None): AntiEntropy: run protocol for 1 neighbors, over reward cycles 3-2
DEBG [1643228648.371189] [src/net/p2p.rs:2879] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4242)(pub=None): AntiEntropy: Local blocks inventory for reward cycle 2 is BlocksInvData { bitlen: 5, block_bitvec: [0], microblocks_bitvec: [0] }
DEBG [1643228649.301985] [src/net/p2p.rs:2857] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4240)(pub=None): AntiEntropy: run protocol for 1 neighbors, over reward cycles 2-1
DEBG [1643228649.306562] [src/net/p2p.rs:2879] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4240)(pub=None): AntiEntropy: Local blocks inventory for reward cycle 1 is BlocksInvData { bitlen: 5, block_bitvec: [0], microblocks_bitvec: [0] }
DEBG [1643228650.444922] [src/net/p2p.rs:2857] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4242)(pub=None): AntiEntropy: run protocol for 1 neighbors, over reward cycles 2-1
DEBG [1643228650.449579] [src/net/p2p.rs:2879] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4242)(pub=None): AntiEntropy: Local blocks inventory for reward cycle 1 is BlocksInvData { bitlen: 5, block_bitvec: [0], microblocks_bitvec: [0] }
DEBG [1643228651.346471] [src/net/p2p.rs:2857] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4240)(pub=None): AntiEntropy: run protocol for 1 neighbors, over reward cycles 1-0
DEBG [1643228651.352485] [src/net/p2p.rs:2879] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4240)(pub=None): AntiEntropy: Local blocks inventory for reward cycle 0 is BlocksInvData { bitlen: 5, block_bitvec: [0], microblocks_bitvec: [0] }
DEBG [1643228652.412420] [src/net/p2p.rs:2857] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4242)(pub=None): AntiEntropy: run protocol for 1 neighbors, over reward cycles 1-0
DEBG [1643228652.417380] [src/net/p2p.rs:2879] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4242)(pub=None): AntiEntropy: Local blocks inventory for reward cycle 0 is BlocksInvData { bitlen: 5, block_bitvec: [0], microblocks_bitvec: [0] }
DEBG [1643228653.391679] [src/net/p2p.rs:2857] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4240)(pub=None): AntiEntropy: run protocol for 1 neighbors, over reward cycles 6-5
DEBG [1643228653.400161] [src/net/p2p.rs:2879] [ThreadId(2)] local.80000000://(bind=127.0.0.1:4240)(pub=None): AntiEntropy: Local blocks inventory for reward cycle 5 is BlocksInvData { bitlen: 5, block_bitvec: [31], microblocks_bitvec: [30] }

As you can see, the system starts from the highest reward cycle, looks for data to push, and on the next pass starts from the next-highest reward cycle, and so on until it wraps around. The idea is that other nodes are most likely to be missing recent blocks, so the anti-entropy system should prioritize them first.
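As a tiny standalone illustration of the range direction discussed above (plain Rust, independent of the node's code): with reward_cycle_finish <= reward_cycle_start, the reversed half-open range walks from reward_cycle_start - 1 down to reward_cycle_finish.

    fn main() {
        // Mirrors the "over reward cycles 6-5" window in the trace: start = 6, finish = 5.
        let (reward_cycle_start, reward_cycle_finish) = (6u64, 5u64);
        let visited: Vec<u64> = (reward_cycle_finish..reward_cycle_start).rev().collect();
        assert_eq!(visited, vec![5]); // only reward cycle 5 is scanned in this pass

        // A wider window makes the latest-to-earliest order obvious.
        let wider: Vec<u64> = (0..6u64).rev().collect();
        assert_eq!(wider, vec![5, 4, 3, 2, 1, 0]);
    }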


Successfully merging this pull request may close these issues.

[net] high latency when fetching blocks in certain cases