
backup eth1 node provider failover does not actually work? #3845

Closed
timothysu opened this issue Mar 10, 2022 · 6 comments · Fixed by #4965
Labels
prio-high Resolve issues as soon as possible.

Comments

@timothysu

timothysu commented Mar 10, 2022

Describe the bug
When the primary eth1 node goes down and the second eth1 node begins serving requests, the logs get littered with messages such as:

error: Error updating eth1 chain cache code=ETH1_ERROR_NON_CONSECUTIVE_LOGS, newIndex=123809, prevIndex=90472

Error: ETH1_ERROR_NON_CONSECUTIVE_LOGS
    at Eth1DepositsCache.add (/usr/app/node_modules/@chainsafe/lodestar/src/eth1/eth1DepositsCache.ts:48:15)
    at Eth1DepositDataTracker.updateDepositCache (/usr/app/node_modules/@chainsafe/lodestar/src/eth1/eth1DepositDataTracker.ts:174:5)
    at Eth1DepositDataTracker.update (/usr/app/node_modules/@chainsafe/lodestar/src/eth1/eth1DepositDataTracker.ts:155:33)
    at Eth1DepositDataTracker.runAutoUpdate (/usr/app/node_modules/@chainsafe/lodestar/src/eth1/eth1DepositDataTracker.ts:129:29)
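
For context, a minimal sketch of the kind of consecutiveness check that raises this error; the names and shapes here are assumptions for illustration only, not the actual Lodestar source:

    // Hypothetical sketch, for illustration only (not the actual Eth1DepositsCache code).
    interface DepositEvent {
      index: number; // deposit index assigned by the deposit contract
      blockNumber: number;
    }

    class DepositsCacheSketch {
      private lastIndex = -1;

      add(events: DepositEvent[]): void {
        for (const event of events) {
          // Deposit indices must increase by exactly one. A jump such as
          // prevIndex=90472 -> newIndex=123809 means some logs were never seen.
          if (this.lastIndex >= 0 && event.index !== this.lastIndex + 1) {
            throw new Error(
              `ETH1_ERROR_NON_CONSECUTIVE_LOGS newIndex=${event.index} prevIndex=${this.lastIndex}`
            );
          }
          this.lastIndex = event.index;
        }
      }
    }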

Expected behavior
No errors (and no DB corruption?)

Steps to Reproduce

  1. Have a fully synced node (unsure if this is required)
  2. Specify two eth1 nodes with --eth1.providerUrls (see the example invocation after this list)
  3. Take the first of the two nodes offline so the beacon node falls back to the second (this can be verified by observing JSON-RPC requests arriving at the secondary)
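
For reference, a hedged example invocation for step 2; the hostnames are placeholders and the exact list syntax accepted by --eth1.providerUrls may differ between Lodestar versions:

    lodestar beacon --network mainnet \
      --eth1.providerUrls http://primary-eth1:8545 http://secondary-eth1:8545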

Screenshots
n/a

Desktop (please complete the following information):

  • OS: Ubuntu 20.04 LTS
  • Version: chainsafe/lodestar:v0.34.1 via docker
  • Branch: n/a
  • Commit hash: n/a
@dapplion
Contributor

@g11tech can you take a look?

@g11tech
Contributor

g11tech commented Mar 14, 2022

@dapplion 👍

@dapplion
Contributor

Marking as HIGH priority since this issue can potentially lead to proposal errors if un-resolved before proposing

@philknows philknows added this to the Sprint July 15 milestone Jun 29, 2022
@dadepo dadepo self-assigned this Jul 25, 2022
@dadepo dadepo moved this from Todo to In Progress in Lodestar Sprint Planning Aug 15, 2022
@philknows philknows removed this from the Sprint: July 15, 2022 milestone Sep 2, 2022
@twoeths
Contributor

twoeths commented Dec 27, 2022

Somehow there is a gap between the new deposit index and the old deposit index. This is strange because we always start from the highest deposit event block number before fetching deposit events.

If we prioritize working on this in a Sprint, we need to prepare 2 public eth1 nodes to reproduce the issue.
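
For illustration, a minimal sketch of the fetch-from-highest-block approach described above; the interfaces are assumptions, not the actual Eth1DepositDataTracker code:

    // Hypothetical sketch, assuming a simplified provider interface.
    interface DepositLog {
      index: number;
      blockNumber: number;
    }

    interface Eth1Provider {
      getBlockNumber(): Promise<number>;
      getDepositLogs(fromBlock: number, toBlock: number): Promise<DepositLog[]>;
    }

    async function fetchNewDeposits(
      provider: Eth1Provider,
      highestFetchedBlock: number
    ): Promise<DepositLog[]> {
      const toBlock = await provider.getBlockNumber();
      if (toBlock <= highestFetchedBlock) return [];
      // The range always starts right after the block of the highest deposit event
      // fetched so far, so in principle no deposit index should be skipped, unless
      // the provider answering this call has a different view of the log history
      // than the one that answered the previous call (e.g. after a failover).
      return provider.getDepositLogs(highestFetchedBlock + 1, toBlock);
    }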

@philknows
Member

Somehow there is a gap between the new deposit index and the old deposit index. This is strange because we always start from the highest deposit event block number before fetching deposit events.

If we prioritize working on this in a Sprint, we need to prepare 2 public eth1 nodes to reproduce the issue.

Would you be able to test this with some of the rescue nodes we have set up for production @tuyennhv? I believe we have two from two different providers available.

@twoeths twoeths self-assigned this Dec 30, 2022
@twoeths
Contributor

twoeths commented Jan 2, 2023

I have a branch (tuyen/eth1_use_fallback_url) that switches between 2 different eth1 provider urls every 5 minutes, and it can still fetch deposits successfully (this is on mainnet).

[screenshot: Screen Shot 2023-01-02 at 19 07 54]

Also, the log does not show the error reported in this issue:

grep -e "ETH1_ERROR_NON_CONSECUTIVE_LOGS" -rn beacon-2023-01-02.log
grep -e "Error updating eth1 chain" -rn beacon-2023-01-02.log
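
For reference, a minimal sketch of the kind of rotation the test branch is described as doing; the URLs and timer wiring are assumptions, not the actual branch code:

    // Hypothetical sketch: alternate between two eth1 provider URLs on a timer.
    const providerUrls = ["http://eth1-a:8545", "http://eth1-b:8545"]; // placeholders
    let activeIndex = 0;

    function getActiveProviderUrl(): string {
      return providerUrls[activeIndex];
    }

    // Switch to the other provider every 5 minutes so the failover path is exercised.
    setInterval(() => {
      activeIndex = (activeIndex + 1) % providerUrls.length;
      console.log(`eth1 requests now routed to ${getActiveProviderUrl()}`);
    }, 5 * 60 * 1000);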

Since this issue has been open for a while and the code has changed, I suppose we don't have it anymore.

@timothysu if you can reproduce, feel free to reopen. Thanks.
