Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Block announcements are dropped because all per-peer validation slots are occupied #1929

Open
dmitry-markin opened this issue Oct 18, 2023 · 8 comments
Assignees
Labels
I2-bug The node fails to follow expected behavior.

Comments

@dmitry-markin
Copy link
Contributor

dmitry-markin commented Oct 18, 2023

As reported by @altonen while running a burn-in for #1631, some block announcements are dropped with a warning:

"💔 Ignored block (#{} -- {}) announcement from {} because all validation slots for this peer are occupied.",

This can be happening for several reasons:

  1. Validations are not correctly removed from the queue after they complete.
  2. Validations are not polled enough.
  3. Peers are announcing too many validations for them to be processed concurrently (MAX_CONCURRENT_BLOCK_ANNOUNCE_VALIDATIONS_PER_PEER are currently set to 4).
@dmitry-markin dmitry-markin added the I2-bug The node fails to follow expected behavior. label Oct 18, 2023
@dmitry-markin dmitry-markin self-assigned this Oct 18, 2023
@dmitry-markin
Copy link
Contributor Author

It doesn't seem that this is happening due to 1. or 2. Validation slots are always deallocated once validations complete here:

self.deallocate_slot_for_block_announce_validation(validation.peer_id());

and per-peer slot deallocation is tested here:
#[test]
fn validation_slots_per_peer_deallocated() {
let mut validator =
BlockAnnounceValidator::<Block>::new(Box::new(DefaultBlockAnnounceValidator {}));
let peer_id = PeerId::random();
for _ in 0..MAX_CONCURRENT_BLOCK_ANNOUNCE_VALIDATIONS_PER_PEER {
assert!(matches!(
validator.allocate_slot_for_block_announce_validation(&peer_id),
AllocateSlotForBlockAnnounceValidation::Allocated,
));
}
assert!(matches!(
validator.allocate_slot_for_block_announce_validation(&peer_id),
AllocateSlotForBlockAnnounceValidation::MaximumPeerSlotsReached,
));
validator.deallocate_slot_for_block_announce_validation(&peer_id);
assert!(matches!(
validator.allocate_slot_for_block_announce_validation(&peer_id),
AllocateSlotForBlockAnnounceValidation::Allocated,
));
}

Also, assumption 2. is unlikely, because validations are always polled until the stream is pending (the stream never terminates):

while let Poll::Ready(Some(result)) = self.block_announce_validator.poll_next_unpin(cx) {
self.process_block_announce_validation_result(result);
}
.

@dmitry-markin
Copy link
Contributor Author

It looks like the issue might be caused by major syncing peers announcing imported blocks. By default, we always announce imported blocks:

if announce_imported_blocks {
sync_service.announce_block(notification.hash, None);
}

announce_imported_blocks:
/// Activate or not the automatic announcing of blocks after import
///
/// By default this is `false`.
fn announce_block(&self) -> Result<bool> {
Ok(true)
}

There is no "fuse" in SyncingEngine making block announcement a no-op during major sync either:

pub fn announce_block(&mut self, hash: B::Hash, data: Option<Vec<u8>>) {
let header = match self.client.header(hash) {
Ok(Some(header)) => header,
Ok(None) => {
log::warn!(target: LOG_TARGET, "Trying to announce unknown block: {hash}");
return
},
Err(e) => {
log::warn!(target: LOG_TARGET, "Error reading block header {hash}: {e}");
return
},
};
// don't announce genesis block since it will be ignored
if header.number().is_zero() {
return
}
let is_best = self.client.info().best_hash == hash;
log::debug!(target: LOG_TARGET, "Reannouncing block {hash:?} is_best: {is_best}");
let data = data
.or_else(|| self.block_announce_data_cache.get(&hash).cloned())
.unwrap_or_default();
for (peer_id, ref mut peer) in self.peers.iter_mut() {
let inserted = peer.known_blocks.insert(hash);
if inserted {
log::trace!(target: LOG_TARGET, "Announcing block {hash:?} to {peer_id}");
let message = BlockAnnounce {
header: header.clone(),
state: if is_best { Some(BlockState::Best) } else { Some(BlockState::Normal) },
data: Some(data.clone()),
};
self.last_notification_io = Instant::now();
peer.sink.send_sync_notification(message.encode());
}
}
}

When a peer is major syncing, the announcements might be broadcast faster then they are processed, what triggers the warning.

@bkchr
Copy link
Member

bkchr commented Oct 29, 2023

When we are major syncing the node internally isn't sending block import notification and thus, we also don't announce anything.

@dmitry-markin
Copy link
Contributor Author

#2132 helped with polling block announce validations during versi scaling, but if the issue described in #2152 continues, we'll need to look into it again.

serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Mar 26, 2024
@senseless
Copy link
Contributor

Adding logs for an event that might be related. I brought up a RPC node for Frequency and received a large number of block imports. Nothing special was happening on the nodes at the time. No other details / errors / warnings were indicated. @wilwade believed this issue may be related.

frequency-logs.txt

@wilwade
Copy link
Contributor

wilwade commented May 14, 2024

Thanks @senseless Here are the logs for one of the archive nodes we run that picked up on the error
Ignored-Block-Logs.txt

Frequency is currently using Polkadot SDK v1.1.0 So possible that this has been fixed and we just need to update, but wanted to make sure the logs were here in case.

Also getting 1000 of them in < 1 second feels odd.

@bkchr
Copy link
Member

bkchr commented May 14, 2024

We should probably stop logging this, as this could be literally someone spamming the node.

@dmitry-markin
Copy link
Contributor Author

We should probably stop logging this, as this could be literally someone spamming the node.

Addressed in #4480.

github-merge-queue bot pushed a commit that referenced this issue May 16, 2024
Demote `Ignored block announcement because all validation slots for this
peer are occupied.` message to debug level.

This is mostly an indicator of somebody spamming the node or (more
likely) some node actively keeping up with the network but not
recognizing it's in a major sync mode, so sending zillions of block
announcements (have seen this on Versi).

This warning shouldn't be considered an error by the end user, so let's
make it debug.

Ref. #1929.
hitchhooker pushed a commit to ibp-network/polkadot-sdk that referenced this issue Jun 5, 2024
Demote `Ignored block announcement because all validation slots for this
peer are occupied.` message to debug level.

This is mostly an indicator of somebody spamming the node or (more
likely) some node actively keeping up with the network but not
recognizing it's in a major sync mode, so sending zillions of block
announcements (have seen this on Versi).

This warning shouldn't be considered an error by the end user, so let's
make it debug.

Ref. paritytech#1929.
TarekkMA pushed a commit to moonbeam-foundation/polkadot-sdk that referenced this issue Aug 2, 2024
Demote `Ignored block announcement because all validation slots for this
peer are occupied.` message to debug level.

This is mostly an indicator of somebody spamming the node or (more
likely) some node actively keeping up with the network but not
recognizing it's in a major sync mode, so sending zillions of block
announcements (have seen this on Versi).

This warning shouldn't be considered an error by the end user, so let's
make it debug.

Ref. paritytech#1929.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I2-bug The node fails to follow expected behavior.
Projects
Status: Backlog 🗒
Development

No branches or pull requests

4 participants