
backup eth1 node provider failover does not actually work? #3845

Closed
timothysu opened this issue Mar 10, 2022 · 6 comments · Fixed by #4965
Labels
prio-high Resolve issues as soon as possible.

Comments

@timothysu

timothysu commented Mar 10, 2022

Describe the bug
When the primary eth1 node goes down and the second eth1 node begins serving requests, the logs get littered with messages such as:

error: Error updating eth1 chain cache code=ETH1_ERROR_NON_CONSECUTIVE_LOGS, newIndex=123809, prevIndex=90472

Error: ETH1_ERROR_NON_CONSECUTIVE_LOGS
    at Eth1DepositsCache.add (/usr/app/node_modules/@chainsafe/lodestar/src/eth1/eth1DepositsCache.ts:48:15)
    at Eth1DepositDataTracker.updateDepositCache (/usr/app/node_modules/@chainsafe/lodestar/src/eth1/eth1DepositDataTracker.ts:174:5)
    at Eth1DepositDataTracker.update (/usr/app/node_modules/@chainsafe/lodestar/src/eth1/eth1DepositDataTracker.ts:155:33)
    at Eth1DepositDataTracker.runAutoUpdate (/usr/app/node_modules/@chainsafe/lodestar/src/eth1/eth1DepositDataTracker.ts:129:29)
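
For context, a minimal sketch of the kind of consecutiveness check that raises this error; the names and shapes here are assumptions for illustration only, not the actual Lodestar source:

    // Hypothetical sketch, for illustration only (not the actual Eth1DepositsCache code).
    interface DepositEvent {
      index: number; // deposit index assigned by the deposit contract
      blockNumber: number;
    }

    class DepositsCacheSketch {
      private lastIndex = -1;

      add(events: DepositEvent[]): void {
        for (const event of events) {
          // Deposit indices must increase by exactly one. A jump such as
          // prevIndex=90472 -> newIndex=123809 means some logs were never seen.
          if (this.lastIndex >= 0 && event.index !== this.lastIndex + 1) {
            throw new Error(
              `ETH1_ERROR_NON_CONSECUTIVE_LOGS newIndex=${event.index} prevIndex=${this.lastIndex}`
            );
          }
          this.lastIndex = event.index;
        }
      }
    }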

Expected behavior
No errors (and no DB corruption?)

Steps to Reproduce

  1. Have a fully synced node (unsure if this is required)
  2. Specify two eth1 nodes with --eth1.providerUrls (see the example invocation after this list)
  3. Take the first of the two nodes offline so the beacon node falls back to the second (this can be verified by observing JSON-RPC requests arriving at the secondary)
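
For reference, a hedged example invocation for step 2; the hostnames are placeholders and the exact list syntax accepted by --eth1.providerUrls may differ between Lodestar versions:

    lodestar beacon --network mainnet \
      --eth1.providerUrls http://primary-eth1:8545 http://secondary-eth1:8545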

Screenshots
n/a

Desktop (please complete the following information):

  • OS: Ubuntu 20.04 LTS
  • Version: chainsafe/lodestar:v0.34.1 via docker
  • Branch: n/a
  • Commit hash: n/a
@dapplion
Contributor

@g11tech can you take a look?

@g11tech
Contributor

g11tech commented Mar 14, 2022

@dapplion 👍

@dapplion
Contributor

Marking as HIGH priority since this issue can potentially lead to proposal errors if un-resolved before proposing

@philknows philknows added this to the Sprint July 15 milestone Jun 29, 2022
@dadepo dadepo self-assigned this Jul 25, 2022
@dadepo dadepo moved this from Todo to In Progress in Lodestar Sprint Planning Aug 15, 2022
@philknows philknows removed this from the Sprint: July 15, 2022 milestone Sep 2, 2022
@twoeths
Contributor

twoeths commented Dec 27, 2022

Somehow there is a gap between the new deposit index and the old deposit index. This is strange because we always start from the highest deposit event block number before fetching deposit events.

If we prioritize working on this in a Sprint, we need to prepare 2 public eth1 nodes to reproduce the issue.
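
For illustration, a minimal sketch of the fetch-from-highest-block approach described above; the interfaces are assumptions, not the actual Eth1DepositDataTracker code:

    // Hypothetical sketch, assuming a simplified provider interface.
    interface DepositLog {
      index: number;
      blockNumber: number;
    }

    interface Eth1Provider {
      getBlockNumber(): Promise<number>;
      getDepositLogs(fromBlock: number, toBlock: number): Promise<DepositLog[]>;
    }

    async function fetchNewDeposits(
      provider: Eth1Provider,
      highestFetchedBlock: number
    ): Promise<DepositLog[]> {
      const toBlock = await provider.getBlockNumber();
      if (toBlock <= highestFetchedBlock) return [];
      // The range always starts right after the block of the highest deposit event
      // fetched so far, so in principle no deposit index should be skipped, unless
      // the provider answering this call has a different view of the log history
      // than the one that answered the previous call (e.g. after a failover).
      return provider.getDepositLogs(highestFetchedBlock + 1, toBlock);
    }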

@philknows
Member

Somehow there is a gap between the new deposit index and the old deposit index. This is strange because we always start from the highest deposit event block number before fetching deposit events.

If we prioritize working on this in a Sprint, we need to prepare 2 public eth1 nodes to reproduce the issue.

Would you be able to test this with some of the rescue nodes we have set up for production @tuyennhv? I believe we have two from two different providers available.

@twoeths twoeths self-assigned this Dec 30, 2022
@twoeths
Contributor

twoeths commented Jan 2, 2023

I have a branch (tuyen/eth1_use_fallback_url) that switches between 2 different eth1 provider urls every 5 minutes, and it can still fetch deposits successfully (this is on mainnet).

[screenshot: Screen Shot 2023-01-02 at 19 07 54]

Also, the log does not show the error reported in this issue:

grep -e "ETH1_ERROR_NON_CONSECUTIVE_LOGS" -rn beacon-2023-01-02.log
grep -e "Error updating eth1 chain" -rn beacon-2023-01-02.log
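
For reference, a minimal sketch of the kind of rotation the test branch is described as doing; the URLs and timer wiring are assumptions, not the actual branch code:

    // Hypothetical sketch: alternate between two eth1 provider URLs on a timer.
    const providerUrls = ["http://eth1-a:8545", "http://eth1-b:8545"]; // placeholders
    let activeIndex = 0;

    function getActiveProviderUrl(): string {
      return providerUrls[activeIndex];
    }

    // Switch to the other provider every 5 minutes so the failover path is exercised.
    setInterval(() => {
      activeIndex = (activeIndex + 1) % providerUrls.length;
      console.log(`eth1 requests now routed to ${getActiveProviderUrl()}`);
    }, 5 * 60 * 1000);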

Since this issue has been open for a while and the code has changed, I suppose we don't have it anymore.

@timothysu if you can reproduce, feel free to reopen. Thanks.
