Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: fix sync relayer collaboration #4562

Merged

Conversation

driftluo
Copy link
Collaborator

@driftluo driftluo commented Aug 1, 2024

What problem does this PR solve?

HeaderProcess uses the current snapshot to process the one-time message, During the entire processing process, the changes in the db itself are imperceptible. Normally, there is no problem here. But during this process, if a compact block message is received and processed successfully, the chain inserts a new block, and the block's header is in the previous header process message, there will be a concurrency problem.

I use a simple diagram to describe the time series problem here:

sync thread   |          process headers         |

relay thread    |  compact block process  |

The time series here is that the header process holds the old snapshot longer than the compact block process completion time.

This will cause the block status in the old snapshot to differ from the actual status.

ckb/sync/src/types/mod.rs

Lines 2315 to 2335 in f3efbf2

pub fn get_block_status(&self, block_hash: &Byte32) -> BlockStatus {
match self.shared().state().block_status_map.get(block_hash) {
Some(status_ref) => *status_ref.value(),
None => {
if self.shared().state().header_map.contains_key(block_hash) {
BlockStatus::HEADER_VALID
} else {
let verified = self
.snapshot
.get_block_ext(block_hash)
.map(|block_ext| block_ext.verified);
match verified {
None => BlockStatus::UNKNOWN,
Some(None) => BlockStatus::BLOCK_STORED,
Some(Some(true)) => BlockStatus::BLOCK_VALID,
Some(Some(false)) => BlockStatus::BLOCK_INVALID,
}
}
}
}
}

The actual status should be BLOCK_VALID, but because the block cannot be seen in the old snapshot, the returned status will be UNKNOWN

if status.contains(BlockStatus::HEADER_VALID) {
let header_index = shared
.get_header_index_view(
&self.header.hash(),
status.contains(BlockStatus::BLOCK_STORED),
)
.expect("header with HEADER_VALID should exist")
.as_header_index();
state
.peers()
.may_set_best_known_header(self.peer, header_index);
return result;
}
if self.prev_block_check(&mut result).is_err() {
debug!(
"HeadersProcess rejected invalid-parent header: {} {}",
self.header.number(),
self.header.hash(),
);
state.insert_block_status(self.header.hash(), BlockStatus::BLOCK_INVALID);
return result;
}
if let Some(is_invalid) = self.non_contextual_check(&mut result).err() {
debug!(
"HeadersProcess rejected non-contextual header: {} {}",
self.header.number(),
self.header.hash(),
);
if is_invalid {
state.insert_block_status(self.header.hash(), BlockStatus::BLOCK_INVALID);
}
return result;
}
if self.version_check(&mut result).is_err() {
debug!(
"HeadersProcess rejected invalid-version header: {} {}",
self.header.number(),
self.header.hash(),
);
state.insert_block_status(self.header.hash(), BlockStatus::BLOCK_INVALID);
return result;
}
shared.insert_valid_header(self.peer, self.header);

The UNKNOWN status will cause the header to be reinserted into the header map, which will cause the block's status to become HEADER_VALID, the compact blocks received later will all be put into the orphan block pool due to this abnormal state. This will cause synchronization to pause until no compact block messages are received for a while. The sync protocol can then resume synchronization by re-downloading the abnormal block.

This PR forcefully reset the snapshot in get block status to obtain the latest status

Check List

Tests

  • Unit test
  • Integration test

Release note

Title Only: Include only the PR title in the release note.

@driftluo driftluo requested a review from a team as a code owner August 1, 2024 07:12
@driftluo driftluo requested review from quake, zhangsoledad and eval-exec and removed request for a team August 1, 2024 07:12
sync/src/types/mod.rs Outdated Show resolved Hide resolved
@driftluo driftluo force-pushed the fix-sync-relayer-collaboration branch 3 times, most recently from ec11234 to b0af066 Compare August 1, 2024 08:18
@driftluo driftluo force-pushed the fix-sync-relayer-collaboration branch from b0af066 to f28d622 Compare August 1, 2024 08:27
@eval-exec eval-exec added m:sync module: ckb-sync t:test: Integration Test Type: This issue is related to test. labels Aug 1, 2024
@driftluo driftluo added this pull request to the merge queue Aug 1, 2024
Merged via the queue into nervosnetwork:develop with commit b939b84 Aug 1, 2024
32 checks passed
@driftluo driftluo deleted the fix-sync-relayer-collaboration branch August 1, 2024 15:05
@doitian doitian mentioned this pull request Aug 19, 2024
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
m:sync module: ckb-sync t:test: Integration Test Type: This issue is related to test.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants