fix: re-sync on drifted head + sentry alert #1332

Merged
merged 7 commits into from
Oct 17, 2024


@rodrigo-o rodrigo-o commented Oct 14, 2024

Motivation

We had an issue where the node stopped without any additional information or visible errors. We want to add an alert for this condition that Sentry can pick up.

Description

This PR adds an alert for when state transitions stop and the head starts to drift. It uses the already present :syncing keyword in the LibP2PPort state and logs an error the first time we detect that the head has drifted by a predefined amount, currently set to 12.
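The check described above can be sketched roughly as follows. This is an illustrative Python model, not the project's Elixir code: `PortState`, `on_slot_transition`, and the use of `>=` for the comparison are assumptions, and how the real `:syncing` value is set and cleared is not shown in this PR text. Only the threshold of 12 and the message wording come from the description and the log.

```python
from dataclasses import dataclass

# Threshold from the PR description ("now set to 12").
DRIFT_THRESHOLD = 12


@dataclass(frozen=True)
class PortState:
    # Hypothetical stand-in for the :syncing keyword in the LibP2PPort state.
    syncing: bool = False


def on_slot_transition(state, current_slot, head_slot, threshold=DRIFT_THRESHOLD):
    """Return (new_state, alert). alert is None unless drift is first detected."""
    drift = current_slot - head_slot
    # Alert only on the first detection: once we are marked as syncing,
    # further drifted slots do not raise the error again (assumed behavior).
    if drift >= threshold and not state.syncing:
        return PortState(syncing=True), f"Head slot drifted by {drift} slots."
    return state, None


state = PortState()
state, alert = on_slot_transition(state, current_slot=112, head_slot=100)
print(alert)  # Head slot drifted by 12 slots.
```

Keeping the "already alerted" information in the existing flag avoids adding new state just to de-duplicate the Sentry error.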

It also deals with a small change needed to run the block_processing.exs file after the addition of the StoreStatesSupervisor.

After reproducing #1333 in a Sepolia run on my local machine, I was able to resume execution by re-syncing, so the issue might now be solved.

Resolves #1333 (We need to validate this live and reopen the issue if it happens again)

@rodrigo-o rodrigo-o marked this pull request as ready for review October 14, 2024 22:03
@rodrigo-o rodrigo-o requested a review from a team as a code owner October 14, 2024 22:03
@rodrigo-o
Collaborator Author

While doing a small performance check I ran into an issue very similar to the one reported in #1333: I was running the node and it suddenly stopped for some minutes. Just restarting it solved the issue, because the sync kicked in. Following that logic, I added a quick workaround to re-sync when the drift happens. This worked as expected in a test and might avoid #1333 in the future. Unfortunately we'll need to monitor these changes live and make sure the issue has actually stopped. Here is a small log output showing how it worked in a simple scenario with a small delay.

INFO 19:12:48.001 Pruning states before slot 190875 
INFO 19:12:48.001 [StateDb] Pruning started. slot=6108000
INFO 19:12:48.002 [BlockDb] Pruning started. slot=6108000
INFO 19:12:48.004 [BlobDb] Pruning started. slot=5976928
INFO 19:12:48.004 [BlobDb] Pruning finished. 0 blobs removed. 
INFO 19:12:48.005 [Libp2p] Slot transition slot=6108064
INFO 19:12:48.014 [BlockDb] Pruning finished. 45 blocks removed. 
INFO 19:12:48.019 [StateDb] Pruning finished. 45 states removed. 
INFO 19:13:00.002 [Libp2p] Slot transition slot=6108065
INFO 19:13:00.811 [Gossip] Block received, block.slot: 6108065. 
INFO 19:13:12.002 [Libp2p] Slot transition slot=6108066
ERROR 19:13:12.002 [Libp2p] Head slot drifted by 4 slots. 
INFO 19:13:12.717 [Optimistic sync] Performing optimistic sync between slots 6108063 and 6108066, for a total of 4 slots. 
INFO 19:13:12.719 [Optimistic sync] Sending request for slots 6108063 to 6108066 (request size = 4). 
INFO 19:13:12.719 [Optimistic Sync] Blocks remaining: 4 
INFO 19:13:12.719 [Gossip] Block received, block.slot: 6108066. 
INFO 19:13:24.004 [Libp2p] Slot transition slot=6108067
INFO 19:13:29.988 [Optimistic Sync] Range 6108063 - 6108066 downloaded successfully, with 4 blocks and 0 missing. 
INFO 19:13:30.008 [Optimistic Sync] Sync completed. Subscribing to gossip topics. 
INFO 19:13:31.017 [ForkChoice] Adding new block slot=6108063 root=0x2be..1e90
INFO 19:13:31.296 [ForkChoice] Block applied in 0.272316s 
INFO 19:13:31.301 [ForkChoice] Block processed in 0.283755s 
INFO 19:13:31.301 [ForkChoice] Block processed. Recomputing head. 
INFO 19:13:31.399 [ForkChoice] Head recomputed in 0.09851s 
INFO 19:13:31.400 [ForkChoice] Added new block slot=6108063 root=0x2be..1e90
INFO 19:13:31.400 [ForkChoice] Recomputed head slot=6108063 root=0x2be..1e90
INFO 19:13:31.400 [ForkChoice] Block added in 0.383s 
INFO 19:13:36.002 [Libp2p] Slot transition slot=6108068
INFO 19:13:38.758 [Gossip] Block received, block.slot: 6108068. 
INFO 19:13:42.249 [ForkChoice] Adding new block slot=6108064 root=0x2eb..be63
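The slot arithmetic in the log above can be checked with a small sketch. This is Python for illustration only, not the project's Elixir code, and `resync_range` is a hypothetical helper. It assumes the drift is `current_slot - head_slot` and that the re-sync requests every slot after the head up to the current one. The head slot 6108062 is inferred from the log (drift of 4 at slot 6108066) and is not stated explicitly. The error in this run fires at a drift of 4, so the threshold used for the test was presumably lower than the 12 mentioned in the description.

```python
def resync_range(head_slot, current_slot):
    """Return (first_slot, last_slot, count) for an inclusive re-sync range."""
    first_slot = head_slot + 1          # first slot we have no applied block for
    count = current_slot - head_slot    # equals the reported drift
    return first_slot, current_slot, count


# Values inferred from the log: drift of 4 reported at slot 6108066.
first, last, count = resync_range(head_slot=6108062, current_slot=6108066)
print(f"Performing optimistic sync between slots {first} and {last}, "
      f"for a total of {count} slots.")
```

Under these assumptions the result matches the logged range, 6108063 to 6108066, for a total of 4 slots.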

@rodrigo-o rodrigo-o changed the title from "chore: new sentry alert and block_procesing.exs fix" to "fix: re-sync on drifted head + sentry alert" Oct 15, 2024

@Arkenan Arkenan left a comment

LGTM

@rodrigo-o rodrigo-o merged commit 788e1c6 into main Oct 17, 2024
22 checks passed
@rodrigo-o rodrigo-o deleted the sentry-alert-on-head-not-updated branch October 17, 2024 15:47
Successfully merging this pull request may close these issues.

Node stopped processing new states without errors