LND shuts down if HTTP request to BTC node fails #5661
Comments
Are you sure this isn't the healthcheck that is shutting down
@guggero Yeah, I see in stderr (as opposed to stdout where other logs end up) a lnd/lnwallet/btcwallet/blockchain.go Line 156 in 93d12cd
During startup (either initial or after falling behind), this gets called for each block, so the probability of this causing a shutdown compounds with the number of blocks to process. But looking around, it seems that there's currently nothing in place to handle errors from calling any RPC functions, so anything unexpected in that whole path will just bubble up and stop the process. Or am I missing something? FWIW
I actually missed this healthcheck-backoff thing. Correct me if I'm wrong here, but it looks to me like this is a separate goroutine that checks the chain backend and kills the process if the check fails, but it has nothing to do with handling errors during RPC calls?
Are requests failing due to timeouts, or the network itself being unreliable? Typically we see users use a sort of hypervisor to automatically restart
Experiencing this now on a node requiring a wallet rescan. At some point on startup, one of the requests will fail:
This effectively ends up in a restart loop (with a wallet unlock required on each restart), and the node is unable to start up. Adding retry behavior would allow it to complete, as the errors are transient. If I understand the dependency resolution right and it doesn't get dropped, this should be resolved when #6285 is merged, since it brings in btcsuite/btcd#1743.
Background
If any HTTP request to bitcoind fails during sync, the process shuts down. This is also true for intermittent errors like timeouts, network errors, or bitcoind still syncing (related to #1533). Even under good conditions, these kinds of errors are normal when connecting to a bitcoin node over Tor.
I didn't find a clean way to address this in the LND codebase itself.
I attempted to address this with a PR in the btcd/btcwallet codebase: btcsuite/btcd#1743
I have been running with this patch on testnet for some time, and restarts have gone from multiple times per day to zero over the past week.
Expected behaviour
LND retries failed requests with an exponential backoff
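To make the expected behaviour concrete, here is a minimal Go sketch of retrying with exponential backoff. It is not the code from btcsuite/btcd#1743 or anything in LND: `retryWithBackoff`, `backoffDelay`, and `errTransient` are hypothetical names used only for illustration, and a real implementation would need its own way of deciding which errors count as transient.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient is a hypothetical marker for failures worth retrying
// (timeouts, dropped connections, backend still syncing).
var errTransient = errors.New("transient backend error")

// backoffDelay returns base * 2^attempt, capped at max.
func backoffDelay(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= max {
			return max
		}
	}
	if d > max {
		return max
	}
	return d
}

// retryWithBackoff calls op until it succeeds, returns a non-transient
// error, or has been tried maxAttempts times. It sleeps between
// attempts, doubling the delay each time up to max.
func retryWithBackoff(maxAttempts int, base, max time.Duration, op func() error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		err = op()
		if err == nil {
			return nil
		}
		// Anything not marked transient is returned to the caller immediately.
		if !errors.Is(err, errTransient) {
			return err
		}
		// No point sleeping after the final attempt.
		if attempt < maxAttempts-1 {
			time.Sleep(backoffDelay(attempt, base, max))
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}
```

A caller would wrap each backend request, for example `retryWithBackoff(5, 500*time.Millisecond, 30*time.Second, fetchBlock)`, where `fetchBlock` is a placeholder for whatever RPC call currently propagates its error straight up to shutdown.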
Actual behaviour
LND exits, requiring a wallet unlock on restart