Segmentation violation - goroutine panic on shutdown of node #1816

Closed
cryptomeow opened this issue Aug 31, 2018 · 7 comments
@cryptomeow

cryptomeow commented Aug 31, 2018

Background

After requesting the shutdown of the lnd process, a segmentation violation error appeared at the end, followed by a series of goroutine panic stack traces:
https://gist.github.com/cryptagoras/0e2d6ec01f2772a8fc0c039ba2900ac5

Your environment

  • 0.5.0-beta commit=2f1b0246798eab417cf928d720140237bb08d0ff

  • operating system: Ubuntu 16.04 LTS - Linux host 4.4.0-119-generic 143-Ubuntu SMP Mon Apr 2 16:08:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

  • bitcoind: Bitcoin Core v0.16.2

  • Lots of available resources (CPU/RAM)

  • Popular lnd node with 330-350 active channels and peers

  • Many long-lived channels

Steps to reproduce

Logs with stack traces: https://gist.github.com/cryptagoras/0e2d6ec01f2772a8fc0c039ba2900ac5
Possibly requires an active node with a long history; the error occurs during shutdown.
It happened at least twice. I then tried 4 more times and never got the stack traces of doom again; once I got a 1-line segfault error with no stack traces.

Expected behaviour

Shut down cleanly without errors

Actual behaviour

Goroutine stack traces of doom on shutdown

@Roasbeef
Member

The issue here looks to be with boltdb: by the time these calls run during shutdown, the DB has already been closed.

@Roasbeef
Member

How are you trying to shut down lnd?

@cryptomeow
Author

A single Ctrl+C. I think the culprit might be some "corrupted" old channel in combination with the hundreds of channels.

@Roasbeef
Member

The node is able to boot normally though, correct? Do you have any long-running scripts, or scripts that poll certain API endpoints?

@cryptomeow
Author

It boots normally, but 😞 yes, I forgot to mention that I do have a script that polls crudely by calling 6 lncli commands sequentially, with a 10-second delay between each group of calls.

@wpaulino
Contributor

wpaulino commented Sep 7, 2018

Looks like this can be prevented by adding a check to the RPCs so that they exit early if the server is in the process of shutting down, since the RPC server itself is only shut down afterwards.
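
A minimal sketch of such a check, for illustration only. The `rpcServer` type, the `GetInfo` handler, and `errShuttingDown` are hypothetical names and not lnd's actual code; the sketch only assumes that handlers can read a flag that is set once shutdown begins.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// errShuttingDown is a hypothetical error handed back to RPC callers
// once shutdown has started.
var errShuttingDown = errors.New("server is shutting down")

// rpcServer is a hypothetical stand-in for an RPC server, not lnd's type.
type rpcServer struct {
	shuttingDown int32 // set to 1 atomically once shutdown begins
}

// Stop marks the server as shutting down. In a real node this would be
// called before the database is closed.
func (r *rpcServer) Stop() {
	atomic.StoreInt32(&r.shuttingDown, 1)
}

// GetInfo is a hypothetical handler that bails out early instead of
// touching a database that may already be closed.
func (r *rpcServer) GetInfo() (string, error) {
	if atomic.LoadInt32(&r.shuttingDown) == 1 {
		return "", errShuttingDown
	}
	// A real handler would read from the database here.
	return "node info", nil
}

func main() {
	srv := &rpcServer{}

	info, err := srv.GetInfo()
	fmt.Println(info, err)

	srv.Stop()

	info, err = srv.GetInfo()
	fmt.Println(info, err)
}
```

A flag check like this only narrows the window: a call that passes the check just before shutdown starts can still reach the database afterwards, so in-flight calls would also need to be drained before the DB is closed.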

@Roasbeef
Member

I think this will be addressed by #2081 as we now signal the gRPC server itself for graceful shutdown.
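
For illustration, the general idea behind a graceful shutdown is to refuse new calls, wait for in-flight calls to finish, and only then close the database. In grpc-go that behaviour is provided by `(*grpc.Server).GracefulStop`. The sketch below is a standard-library-only analogue with hypothetical names (`server`, `Query`, `errStopped`); it is not the change made in #2081.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errStopped is a hypothetical error for calls arriving after shutdown began.
var errStopped = errors.New("server stopped")

// server is a hypothetical stand-in; dbOpen models an open database handle.
type server struct {
	mu       sync.Mutex
	stopped  bool
	inFlight sync.WaitGroup
	dbOpen   bool
}

func newServer() *server { return &server{dbOpen: true} }

// begin registers an in-flight call, or refuses it if shutdown has started.
// Holding the mutex guarantees no Add happens after stopped is set.
func (s *server) begin() error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.stopped {
		return errStopped
	}
	s.inFlight.Add(1)
	return nil
}

// Query is a hypothetical handler that touches the database.
func (s *server) Query() (string, error) {
	if err := s.begin(); err != nil {
		return "", err
	}
	defer s.inFlight.Done()
	if !s.dbOpen {
		return "", errors.New("database closed")
	}
	return "ok", nil
}

// GracefulStop refuses new calls, waits for in-flight calls to finish,
// and only then closes the database.
func (s *server) GracefulStop() {
	s.mu.Lock()
	s.stopped = true
	s.mu.Unlock()

	s.inFlight.Wait()
	s.dbOpen = false
}

func main() {
	s := newServer()

	res, err := s.Query()
	fmt.Println(res, err)

	s.GracefulStop()

	res, err = s.Query()
	fmt.Println(res, err)
}
```

With this ordering, a polling script that fires a call during shutdown gets an error back instead of reaching a closed database.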
