-
Notifications
You must be signed in to change notification settings - Fork 837
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[POC] heal the worldstate #4972
Conversation
f173297
to
d440264
Compare
Signed-off-by: Karim TAAM <[email protected]>
750c53b
to
fd31a2d
Compare
Signed-off-by: Karim TAAM <[email protected]>
6593ccd
to
1f1f96b
Compare
Signed-off-by: Karim TAAM <[email protected]>
39f1ed8
to
dead4f8
Compare
Signed-off-by: Karim TAAM <[email protected]>
3103951
to
1311342
Compare
Signed-off-by: Karim TAAM <[email protected]>
Signed-off-by: Karim TAAM <[email protected]>
Signed-off-by: Karim TAAM <[email protected]>
Signed-off-by: Karim TAAM <[email protected]>
Signed-off-by: Karim TAAM <[email protected]>
20f76e2
to
3f3000d
Compare
Signed-off-by: Matt Nelson <[email protected]>
3f3000d
to
5cab1c2
Compare
Signed-off-by: Karim TAAM <[email protected]>
Signed-off-by: Karim TAAM <[email protected]>
Signed-off-by: Karim TAAM <[email protected]>
Signed-off-by: garyschulte <[email protected]>
Signed-off-by: Karim TAAM <[email protected]>
90bfed2
to
960a8bb
Compare
public boolean isInitialSyncPhaseDone() { | ||
return isInitialSyncPhaseDone; | ||
public boolean isResyncNeeded() { | ||
return isResyncNeeded; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we never reset this value afaict. once we need a resync, we always need a resync ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed I switch this flag on markInitialSyncPhaseAsDone
&& protocolContext.getBlockchain().getChainHeadBlockNumber() | ||
!= BlockHeader.GENESIS_BLOCK_NUMBER) { | ||
LOG.info( | ||
"Checkpoint sync was requested, but cannot be enabled because the local blockchain is not empty."); | ||
"Snap sync was requested, but cannot be enabled because the local blockchain is not empty."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should be "checkpoint"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done thanks
trieLogStorage, | ||
fallbackNodeFinder); | ||
final KeyValueStorage trieLogStorage) { | ||
super(accountStorage, codeStorage, storageStorage, trieBranchStorage, trieLogStorage); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -44,7 +44,8 @@ static FinalBlockConfirmation genesisConfirmation(final MutableBlockchain blockc | |||
} | |||
|
|||
static FinalBlockConfirmation ancestorConfirmation(final MutableBlockchain blockchain) { | |||
return firstHeader -> blockchain.contains(firstHeader.getParentHash()); | |||
return firstHeader -> | |||
blockchain.getChainHeadHeader().getHash().equals(firstHeader.getParentHash()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might this break backward sync for a reorg. e.g. blockhain.contains() will return true if we have block A' but it isn't the chain head. This check requires it to be the chain head...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using the blck number. Should be fine now but. if you can verify
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when I pull this branch here I see:
return firstHeader -> blockchain.getChainHeadBlockNumber() + 1 >= firstHeader.getNumber();
I might be missing something, but I read the firstAncestor block as being the earliest block fetched by backward sync. and finalBlockConfirmation.ancestorHeaderReached as the termination condition for fetching further back in BackwardSyncAlgorithm.pickNextStep.
If we are backward syncing because of a fork from 5 or 10 blocks behind head, I think this condition could terminate early. it is almost logically identical to the conditional above it:
return firstHeader -> blockchain.getChainHeadBlockNumber() + 1 >= firstHeader.getNumber();
The first condition makes sense to me b/c we really just need to ensure this backward path resolves to something we have, rather than a particular height.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I think I understand your remark. On the other hand, I have to keep this check because it allows me to have the backardsync works after the heal in case we rewind. Because we will have the blocks but I still want to go back to import them again . So I think I should add the initial condition + this one. In the case of a REORG the first condition will make sure that it does not stop too soon
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I proposed another fix on this PR https://github.com/hyperledger/besu/pull/5059/files#diff-0361d2ff12cd43f507bf88a3cb4d781a537ca765329e0811f60a9507527cbe13R47
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solid, some of the backward sync changes are concerning.
Signed-off-by: Karim TAAM <[email protected]>
Signed-off-by: Karim TAAM <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should ethereum/referencetests/src/reference-test/external-resources
be changing as part of this PR?
ethereum/core/src/main/java/org/hyperledger/besu/ethereum/MainnetBlockValidator.java
Outdated
Show resolved
Hide resolved
@@ -30,13 +31,16 @@ public class ProtocolContext { | |||
private final WorldStateArchive worldStateArchive; | |||
private final ConsensusContext consensusContext; | |||
|
|||
private Optional<Synchronizer> synchronizer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think we should consider alternatives to bloating ProtocolContext, a global object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this is indeed a problem. I tried a lot to do otherwise but this was the cleanest I found with the current code structure
ethereum/core/src/main/java/org/hyperledger/besu/ethereum/bonsai/BonsaiInMemoryWorldState.java
Show resolved
Hide resolved
@@ -85,6 +85,7 @@ protected CompletableFuture<FastSyncState> start(final FastSyncState fastSyncSta | |||
if (worldStateStorage instanceof BonsaiWorldStateKeyValueStorage) { | |||
LOG.info("Clearing bonsai flat account db"); | |||
worldStateStorage.clearFlatDatabase(); | |||
worldStateStorage.clearTrieLog(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we will not be able to go back before this pivot block. In addition, we could have bad trielog so it is preferable to heal them too. Finally keeping old trielog could make a false positive for the isWorldStateAvailable.
protocolContext.getBlockchain().getChainHeadHash().equals(pivotBlockHeader.getHash()); | ||
if (!isValidChainHead) { | ||
if (protocolContext.getBlockchain().contains(pivotBlockHeader.getHash())) { | ||
protocolContext.getBlockchain().rewindToBlock(pivotBlockHeader.getHash()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the pivot block be > the max allowable bonsai trie layers away? If so, could this rewind take an unfeasible amount of time or maybe error? If so, then what state is the user left in and is it recoverable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this part rewind only the blockchain part not the worldstate part.
If the pivot block is before the current head, we do a rewind otherwise we do nothing. In any case the worldstate will be healed with the new pivot block without doing a rollback or rollforward because we cannot do it if it is corrupted
it is important to choose a new pivot block and not to heal the current head. Because we are not sure that block will remain on the canonical chain
@@ -82,9 +81,8 @@ public static Optional<FastSyncDownloader<?>> create( | |||
final FastSyncState fastSyncState = | |||
fastSyncStateStorage.loadState(ScheduleBasedBlockHeaderFunctions.create(protocolSchedule)); | |||
|
|||
if (isResync) { | |||
worldStateStorage.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why no need to clear the worldStateStorage now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a resync there is already a clear in the Worldstate downloader part https://github.com/hyperledger/besu/pull/4972/files#diff-28d8c7a62dddded52099a727bf58aa864d58b055795b95ebfeee6ac8e2a8cea2R173
So it's clearly useless to do it again @garyschulte
yes good catch I fixed that |
Signed-off-by: Karim TAAM <[email protected]>
Because of some release difficulties with this PR , I created another one @garyschulte @siladu -> #5059 |
Signed-off-by: Karim TAAM [email protected]
PR description
This PR add a heal mechanism fo the worldtstate in case of inconsistency (unable to trie node). To fix this, we start a quick heal of the worldstate automatically and once the fix is done we restart the block import.
After the detection of an invalid path
This feature can also heal a node that has been inconsistent for a long time, but it will take longer because there will be more nodes to heal. With this PR the healing will be done as soon as the problem is detected so there will not a lot to heal and it will be fast
Performed tests
Fixed Issue(s)
#4379
#4785
#4768
Documentation
doc-change-required
label to this PR ifupdates are required.
Changelog