Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry mechanism when getting a broadcasted block fail on all peers #4271

Merged
merged 57 commits into from
Aug 26, 2022

Conversation

fab-10
Copy link
Contributor

@fab-10 fab-10 commented Aug 17, 2022

PR description

Currently even if we use the RetryingGetBlockFromPeers, that tries to download the block, checking every peer in turn, if all peers fail, the retrieval is not repeated, unless the block is announced again.
This PR improve the block propagation manager, to repeat the download of a block, until some conditions apply: it is still not imported and the distance from the local head is within configured range.

Fixed Issue(s)

relates to #3955

Documentation

  • I thought about documentation and added the doc-change-required label to this PR if
    updates are required.

Changelog

Gabriel-Trintinalia and others added 30 commits August 8, 2022 13:19
Signed-off-by: Gabriel Trintinalia <[email protected]>
Signed-off-by: Gabriel Trintinalia <[email protected]>
Signed-off-by: Gabriel Trintinalia <[email protected]>
Signed-off-by: Gabriel Trintinalia <[email protected]>
Signed-off-by: Gabriel Trintinalia <[email protected]>
Signed-off-by: Gabriel Trintinalia <[email protected]>
Signed-off-by: Gabriel Trintinalia <[email protected]>
Signed-off-by: Gabriel Trintinalia <[email protected]>
…ancestors of Block

Signed-off-by: Gabriel Trintinalia <[email protected]>
Signed-off-by: Gabriel Trintinalia <[email protected]>
…ing pending block to the list

Signed-off-by: Gabriel Trintinalia <[email protected]>
Signed-off-by: Gabriel Trintinalia <[email protected]>
Signed-off-by: Gabriel Trintinalia <[email protected]>
Signed-off-by: Gabriel Trintinalia <[email protected]>
Signed-off-by: Gabriel Trintinalia <[email protected]>
Signed-off-by: Fabio Di Fabio <[email protected]>
RetryingGetBlockFromPeersTask had a problem that prevented to complete
when all the peers were tried without success, and that also had the
consequence to not removing the failed requested block for the internal
caches in BlockPropagationManager, that could cause a stall since that
block will to be tried to be retrieved again.

Signed-off-by: Fabio Di Fabio <[email protected]>
Signed-off-by: Fabio Di Fabio <[email protected]>
Signed-off-by: Fabio Di Fabio <[email protected]>
@fab-10 fab-10 changed the title 3955 logs and retries Retry mechanism when getting a broadcasted block fail on all peers Aug 17, 2022
@fab-10 fab-10 marked this pull request as ready for review August 17, 2022 13:50
Signed-off-by: Fabio Di Fabio <[email protected]>
@fab-10 fab-10 added the mainnet label Aug 17, 2022
Signed-off-by: Fabio Di Fabio <[email protected]>
# Conflicts:
#	CHANGELOG.md
#	ethereum/eth/src/test/java/org/hyperledger/besu/ethereum/eth/sync/AbstractBlockPropagationManagerTest.java
Copy link
Contributor

@jflo jflo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one minor non-blocking suggestion, looks good though.

if (!config.getBlockPropagationRange().contains(distanceFromChainHead)) {
debugLambda(
LOG,
"Not retrying to get block {} since we are too far from local chain head {}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a good place to suggest the user to start over with a checkpoint sync? If we got here during a backwards sync, that might be the only way to recover.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I got this, backward sync does not run along with BlockPropagationManager

@fab-10 fab-10 requested a review from jflo August 24, 2022 15:57
@fab-10
Copy link
Contributor Author

fab-10 commented Aug 24, 2022

@jflo added a timeout and did some refactoring in the last commit, do you mind to review again?

@fab-10 fab-10 merged commit af65c86 into hyperledger:main Aug 26, 2022
@fab-10 fab-10 deleted the 3955-logs-and-retries branch August 26, 2022 09:14
@macfarla macfarla added the 22.7.2 label Sep 4, 2022
garyschulte pushed a commit to garyschulte/besu that referenced this pull request Sep 7, 2022
garyschulte pushed a commit to garyschulte/besu that referenced this pull request Sep 7, 2022
garyschulte pushed a commit to garyschulte/besu that referenced this pull request Sep 7, 2022
eum602 pushed a commit to lacchain/besu that referenced this pull request Nov 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants