Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IF: Move vote processing off the net/main thread #2102

Closed
arhag opened this issue Jan 17, 2024 · 7 comments · Fixed by #2385
Closed

IF: Move vote processing off the net/main thread #2102

arhag opened this issue Jan 17, 2024 · 7 comments · Fixed by #2385
Assignees
Labels
👍 lgtm OCI Work exclusive to OCI team

Comments

@arhag
Copy link
Member

arhag commented Jan 17, 2024

In order to prepare for processing votes after a block arrives, move from processing votes immediately as they come in on the net thread and main thread. Also place blocks as soon as the block state is created into the fork database so they are available to be retrieved for vote processing.

@heifner
Copy link
Member

heifner commented Mar 29, 2024

  • Already guaranteed for produced blocks.
    • accepted_block_header emitted before voted_block is emitted. net_plugin posts both of those to dispatcher strand which enforces order.
  • Received blocks there is currently no guarantee and the way it is currently processed seems rather likely the vote would be processed before the block since the block is posted to the dispatcher strand but the vote is processed directly on the net thread.
    • We already believe that these votes need to be processed on a different thread than the net thread to free up net threads to send blocks.
    • We should be able to process blocks directly without posting to the dispatcher strand.
    • Currently we add to the fork db in push_block or accept_block for blocks received while producing.
      • Any reason not to add to fork db immediately after block header validation?
        • Would allow for even faster block propagation.
        • emit accepted_block_header would be from net thread. Is that an issue? We can verify for our plugins, what about external plugins? Could post the emit to the app() thread if we think it is an issue.
          • net_plugin would have to be updated to not call update_chain_info which is fine, probably better to do that in on_accepted_block anyway.
          • producer_plugin would need to make producer_watermarks thread safe.
          • Two uses and they both expected to be called from the main thread, probably best to maintain that invariant.
            • Do not move accepted_block_header add to forkdb before emit?
        • Just ignore_duplicate_t::yes when we add in push_block and accept_block

@arhag arhag changed the title IF: Guarantee net_plugin sends out vote after it has sent out the block the vote was for IF: Guarantee net_plugin processes votes after it has has processed the block the vote was for Apr 1, 2024
@heifner
Copy link
Member

heifner commented Apr 2, 2024

  • 3000 vote_message (size 32+1+96+192) is less than 1MB.
    • Limit to 3000 votes per connection, drop incoming votes if > 3000
    • Maintain count per connection for quick check
  • Purge votes on LIB (index by block_num)
  • Active thread pool for processing
    • We know we don't want to use net threads
    • Chain threads would compete with trx signature verification. Don't want trx sig verification slowing down votes.
    • New thread pool (vote processing)
      • --vote-threads default to 2 4
    • vote_processor in chain library
   atomic<uint32_t> latest_block_num;
   atomic<uint32_t> lib;
   set_lib(lib) { lib = lib; notify_one(); }
   add_vote( v ) { append(v); latest_block_num = std::max(latest_block_num, v.block_num); notify_one(); }

   each thread:
      while (true) {
         _cond.wait([]() {
            if (stopped || !empty) return true;
         }
         if (stopped) break;
         remove_votes_before_lib();
         uint32_t block_num = latest_block_num;
         block_state_ptr bsp = forkdb.get(block_num);
         for each connection by latency {
            if( block_num != latest_block_num) {
               block_num = latest_block_num;
               bsp = forkdb.get(block_num);
            }
            process_votes_for_block(conn_id, bsp);
            block_num = idx.get_latest_block_num();
         }
      }

   process_votes_for_block(conn_id, bsp) {
      for bsp->finalizer_size():
         process_vote(conn_id, bsp);
   }

@greg7mdp
Copy link
Contributor

greg7mdp commented Apr 2, 2024

Are two threads enough? If each verify takes 2ms, this means that processing the 20 votes received would take a minimum of 20ms. Maybe that's OK, but I think 4 threads might be better.

@linh2931
Copy link
Member

linh2931 commented Apr 2, 2024

What's idx in block_num = idx.get_latest_block_num();?

You would always process latest block's votes first? Where do you remove the vote after it is processed?

@heifner
Copy link
Member

heifner commented Apr 2, 2024

What's idx in block_num = idx.get_latest_block_num();?

The multi-index where one of the indexes is on block_num.

You would always process latest block's votes first? Where do you remove the vote after it is processed?

In process_vote.

   void process_vote(conn_id, bsp) {
      auto itr = idx.find(conn_id, bsp->block_num())
      if (itr != idx.end(); auto vote = *itr) {
         vote_status s = bsp->aggregate_vote(vote);
         idx.remove(itr);
         --c->vote_count;
         return s;
      }
   }

@greg7mdp
Copy link
Contributor

greg7mdp commented Apr 2, 2024

  • accesses to the multi-index need to be protected by a mutex
  • that mutex should be released in process_vote around bsp->aggregate_vote(vote);
  • maybe we should track of when lib changes so that we don't do any work in remove_votes_before_lib(); if lib didn't change - could be a thread_local variable.
  • should we use block_id instead of block_num? it seems that there could be a fork switch so we receive votes for a different block, which has the same block_num as the previous one.
  • why is if( block_num != latest_block_num) { ... } in the for each connection loop?
  • when we do bsp = forkdb.get(block_num); what do we do if the block is not found? Probably should process votes for older blocks if these are in fork_db.

@heifner
Copy link
Member

heifner commented Apr 2, 2024

  • why is if( block_num != latest_block_num) { ... } in the for each connection loop?

Because it could change at any time, so check it periodically to see if we need to move to a different block.

heifner added a commit that referenced this issue Apr 2, 2024
heifner added a commit that referenced this issue Apr 3, 2024
heifner added a commit that referenced this issue Apr 3, 2024
heifner added a commit that referenced this issue Apr 3, 2024
heifner added a commit that referenced this issue Apr 4, 2024
heifner added a commit that referenced this issue Apr 4, 2024
…orks now also can apply staged blocks in the fork db.
heifner added a commit that referenced this issue Apr 4, 2024
heifner added a commit that referenced this issue Apr 4, 2024
… to terminate at a block don't put any blocks above that in the fork database.
heifner added a commit that referenced this issue Apr 5, 2024
heifner added a commit that referenced this issue Apr 5, 2024
heifner added a commit that referenced this issue Apr 5, 2024
heifner added a commit that referenced this issue Apr 5, 2024
heifner added a commit that referenced this issue Apr 6, 2024
heifner added a commit that referenced this issue Apr 8, 2024
heifner added a commit that referenced this issue Apr 8, 2024
heifner added a commit that referenced this issue Apr 9, 2024
heifner added a commit that referenced this issue Apr 10, 2024
IF: Populated fork db ASAP; Vote processing off the main thread
@arhag arhag closed this as completed Apr 10, 2024
@heifner heifner changed the title IF: Guarantee net_plugin processes votes after it has has processed the block the vote was for IF: Move vote processing off the main thread Apr 10, 2024
@heifner heifner changed the title IF: Move vote processing off the main thread IF: Move vote processing off the net/main thread Apr 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
👍 lgtm OCI Work exclusive to OCI team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants