bug: nwaku randomly fails to start #1866

Closed
mfw78 opened this issue Jul 29, 2023 · 4 comments · Fixed by #1869

mfw78 commented Jul 29, 2023

Problem

When running nwaku v0.18.0 via Docker, the node randomly fails to start. After retrying a couple of times, it starts as expected.

docker run --network host -it statusteam/nim-waku:v0.18.0 --relay=true --filter=true --lightpush=true --rpc-admin=true --rpc-port=8545 --rpc-address=0.0.0.0 --websocket-support=true --nodekey=ca18f11664fed73af38b11ffc3ccdbdd0738c475488dfab18a91f43a1f78711a
INF 2023-07-29 07:37:49.921+00:00 Initializing networking                    tid=1 file=waku_node.nim:175 addrs="@[/ip4/0.0.0.0/tcp/60000, /ip4/0.0.0.0/tcp/8000/ws]"
INF 2023-07-29 07:37:49.924+00:00 mounting relay protocol                    topics="waku node" tid=1 file=waku_node.nim:364
INF 2023-07-29 07:37:49.924+00:00 relay mounted successfully                 topics="waku node" tid=1 file=waku_node.nim:383
INF 2023-07-29 07:37:49.924+00:00 mounting rendezvous discovery protocol     topics="waku node" tid=1 file=waku_node.nim:897
INF 2023-07-29 07:37:49.924+00:00 mounting libp2p ping protocol              topics="waku node" tid=1 file=waku_node.nim:798
INF 2023-07-29 07:37:49.925+00:00 mounting store client                      topics="waku node" tid=1 file=waku_node.nim:605
INF 2023-07-29 07:37:49.925+00:00 mounting light push                        topics="waku node" tid=1 file=waku_node.nim:661
INF 2023-07-29 07:37:49.925+00:00 mounting filter protocol                   topics="waku node" tid=1 file=waku_node.nim:393
INF 2023-07-29 07:37:49.925+00:00 Starting Waku node                         topics="waku node" tid=1 file=waku_node.nim:912 version=v0.18.0
INF 2023-07-29 07:37:49.925+00:00 PeerInfo                                   topics="waku node" tid=1 file=waku_node.nim:915 peerId=16U*eusaXv addrs=@[]
INF 2023-07-29 07:37:49.925+00:00 Listening on                               topics="waku node" tid=1 file=waku_node.nim:922 full=[/ip4/0.0.0.0/tcp/60000/p2p/16Uiu2HAmUyv3ghfFzi9R4Hae36TgDavNYpoAuQcDEVr3RveusaXv][/ip4/0.0.0.0/tcp/8000/ws/p2p/16Uiu2HAmUyv3ghfFzi9R4Hae36TgDavNYpoAuQcDEVr3RveusaXv]
INF 2023-07-29 07:37:49.926+00:00 DNS: discoverable ENR                      topics="waku node" tid=1 file=waku_node.nim:923 enr=enr:-KO4QORs1sM__HZIAZObHpNZ1d9OkudlXeGO0ZBof4husK-AD_0brzrd-eImFj_Ej_ogfrxvKuDN2xha3AF1TpHRm9kBgmlkgnY0gmlwhAAAAACKbXVsdGlhZGRyc4wACgQAAAAABh9A3QOJc2VjcDI1NmsxoQPylCkBmkpBNaOXW6GKvKzCAHxFkRdDMA9a0jBtvcpXFYN0Y3CC6mCFd2FrdTIN
INF 2023-07-29 07:37:49.926+00:00 starting relay protocol                    topics="waku node" tid=1 file=waku_node.nim:333
INF 2023-07-29 07:37:49.926+00:00 relay started successfully                 topics="waku node" tid=1 file=waku_node.nim:354
INF 2023-07-29 07:37:49.956+00:00 Stopping AutonatService                    topics="libp2p autonatservice" tid=1 file=service.nim:202
WRN 2023-07-29 07:37:49.957+00:00 service is already stopped                 topics="libp2p switch" tid=1 file=switch.nim:93
INF 2023-07-29 07:37:49.957+00:00 Stopping AutonatService                    topics="libp2p autonatservice" tid=1 file=service.nim:202
WRN 2023-07-29 07:37:49.957+00:00 service is already stopped                 topics="libp2p switch" tid=1 file=switch.nim:93
WRN 2023-07-29 07:37:49.958+00:00 Stopping relay without starting it         topics="libp2p relay" tid=1 file=relay.nim:382
WRN 2023-07-29 07:37:49.958+00:00 Stopping rendezvous without starting it    topics="libp2p discovery rendezvous" tid=1 file=rendezvous.nim:673
Traceback (most recent call last, using override)
/app/vendor/nimbus-build-system/vendor/Nim/lib/system/excpt.nim(631) signalHandler
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
 ~      docker run --network host -it statusteam/nim-waku:v0.18.0 --relay=true --filter=true --lightpush=true --rpc-admin=true --rpc-port=8545 --rpc-address=0.0.0.0 --websocket-support=true --nodekey=ca18f11664fed73af38b11ffc3ccdbdd0738c475488dfab18a91f43a1f78711a
INF 2023-07-29 07:37:59.679+00:00 Initializing networking                    tid=1 file=waku_node.nim:175 addrs="@[/ip4/0.0.0.0/tcp/60000, /ip4/0.0.0.0/tcp/8000/ws]"
INF 2023-07-29 07:37:59.682+00:00 mounting relay protocol                    topics="waku node" tid=1 file=waku_node.nim:364
INF 2023-07-29 07:37:59.682+00:00 relay mounted successfully                 topics="waku node" tid=1 file=waku_node.nim:383
INF 2023-07-29 07:37:59.683+00:00 mounting rendezvous discovery protocol     topics="waku node" tid=1 file=waku_node.nim:897
INF 2023-07-29 07:37:59.683+00:00 mounting libp2p ping protocol              topics="waku node" tid=1 file=waku_node.nim:798
INF 2023-07-29 07:37:59.683+00:00 mounting store client                      topics="waku node" tid=1 file=waku_node.nim:605
INF 2023-07-29 07:37:59.683+00:00 mounting light push                        topics="waku node" tid=1 file=waku_node.nim:661
INF 2023-07-29 07:37:59.683+00:00 mounting filter protocol                   topics="waku node" tid=1 file=waku_node.nim:393
INF 2023-07-29 07:37:59.684+00:00 Starting Waku node                         topics="waku node" tid=1 file=waku_node.nim:912 version=v0.18.0
INF 2023-07-29 07:37:59.684+00:00 PeerInfo                                   topics="waku node" tid=1 file=waku_node.nim:915 peerId=16U*eusaXv addrs=@[]
INF 2023-07-29 07:37:59.684+00:00 Listening on                               topics="waku node" tid=1 file=waku_node.nim:922 full=[/ip4/0.0.0.0/tcp/60000/p2p/16Uiu2HAmUyv3ghfFzi9R4Hae36TgDavNYpoAuQcDEVr3RveusaXv][/ip4/0.0.0.0/tcp/8000/ws/p2p/16Uiu2HAmUyv3ghfFzi9R4Hae36TgDavNYpoAuQcDEVr3RveusaXv]
INF 2023-07-29 07:37:59.684+00:00 DNS: discoverable ENR                      topics="waku node" tid=1 file=waku_node.nim:923 enr=enr:-KO4QORs1sM__HZIAZObHpNZ1d9OkudlXeGO0ZBof4husK-AD_0brzrd-eImFj_Ej_ogfrxvKuDN2xha3AF1TpHRm9kBgmlkgnY0gmlwhAAAAACKbXVsdGlhZGRyc4wACgQAAAAABh9A3QOJc2VjcDI1NmsxoQPylCkBmkpBNaOXW6GKvKzCAHxFkRdDMA9a0jBtvcpXFYN0Y3CC6mCFd2FrdTIN
INF 2023-07-29 07:37:59.684+00:00 starting relay protocol                    topics="waku node" tid=1 file=waku_node.nim:333
INF 2023-07-29 07:37:59.685+00:00 relay started successfully                 topics="waku node" tid=1 file=waku_node.nim:354
WRN 2023-07-29 07:37:59.686+00:00 Starting gossipsub twice                   topics="libp2p gossipsub" tid=1 file=gossipsub.nim:603
INF 2023-07-29 07:37:59.686+00:00 Setting up AutonatService                  topics="libp2p autonatservice" tid=1 file=service.nim:183
INF 2023-07-29 07:37:59.687+00:00 Node started successfully                  topics="waku node" tid=1 file=waku_node.nim:941
INF 2023-07-29 07:37:59.687+00:00 Relay peer connections                     topics="waku node peer_manager" tid=1 file=peer_manager.nim:677 inRelayConns=0/25 outRelayConns=0/25 totalRelayConns=0 maxConnections=50 notConnectedPeers=0 outsideBackoffPeers=0
INF 2023-07-29 07:37:59.687+00:00 Starting JSON-RPC HTTP server              topics="JSONRPC-HTTP-SERVER" tid=1 file=httpserver.nim:80 url=http://0.0.0.0:8545
INF 2023-07-29 07:37:59.687+00:00 RPC Server started                         topics="wakunode app" tid=1 file=app.nim:747 address=0.0.0.0:8545
INF 2023-07-29 07:37:59.687+00:00 Node setup complete                        topics="wakunode main" tid=1 file=wakunode2.nim:149

Impact

This bug has significant impact as it directly affects the reliability of running wakunode.

To reproduce

  1. Run the docker run command above.

Expected behavior

The node starts consistently, or provides sane error messages indicating where the failure is.

Additional context

Potentially related to #1826 ?

mfw78 added the bug and track:maintenance labels Jul 29, 2023
vpavlin added this to the Release 0.20.0 milestone Aug 1, 2023

vpavlin (Member) commented Aug 1, 2023

@mfw78 I don't think this is "random". I was not able to reproduce it by simply running your docker run command. But since the container uses the host network, starting two containers should make the second one crash when it tries to bind port 8545 a second time. It does crash, but with your error. Is it possible that you already had another container running?

Anyway, it should have just cleanly errored out with the "Address already in use" error, so thanks for catching this!

I believe I was able to find the issue and fix it in #1869
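
For illustration, the clean failure mode described above can be sketched in Python. This is not nwaku's code and not the change in #1869; `try_bind` is a hypothetical helper, and the loopback address and OS-assigned port are assumptions made only to keep the demo self-contained. It shows that a second bind to an already-held port fails with EADDRINUSE, which can be turned into a readable message instead of a crash.

```python
import errno
import socket
from typing import Optional, Tuple


def try_bind(host: str, port: int) -> Tuple[Optional[socket.socket], str]:
    """Hypothetical helper: bind a listening TCP socket, return (socket, message)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            # The case discussed in this thread: another process holds the port.
            return None, f"Address already in use: {host}:{port}"
        return None, f"bind failed on {host}:{port}: {exc}"
    return sock, f"listening on {host}:{sock.getsockname()[1]}"


# Port 0 asks the OS for any free port, so the demo does not depend on 8545 being free.
first, first_msg = try_bind("127.0.0.1", 0)
port = first.getsockname()[1]

# A second bind to the same address and port stands in for the second container.
second, second_msg = try_bind("127.0.0.1", port)

print(first_msg)
print(second_msg)
first.close()
```

The point of the sketch is only that the failure is detectable at bind time, so a server can report it and exit rather than continue with an uninitialized handle.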

vpavlin self-assigned this Aug 1, 2023

mfw78 (Author) commented Aug 2, 2023

I'm pretty sure that I didn't have anything else bound to local port 8545. At all times the container was started interactively, so when I killed it with Ctrl+C, it terminated, then I'd restart it.

Nonetheless, I can't guarantee that I didn't have a second container running. If I observe this fail condition again, I'll re-open this issue.
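
One way to rule out a leftover container before the next start is to probe the ports the node uses. The sketch below is a hedged example, not part of nwaku: `ports_in_use` is a hypothetical helper, and the port list (60000 and 8000 from the "Initializing networking" log line, 8545 from `--rpc-port`) assumes the same command line as above. It only detects TCP listeners reachable on the loopback address.

```python
import socket
from typing import Iterable, List


def ports_in_use(ports: Iterable[int], host: str = "127.0.0.1") -> List[int]:
    """Hypothetical helper: return the ports where something accepts TCP connections."""
    busy = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            probe.settimeout(0.5)
            # connect_ex returns 0 when a listener accepted the connection.
            if probe.connect_ex((host, port)) == 0:
                busy.append(port)
    return busy


if __name__ == "__main__":
    # libp2p TCP, websocket, and JSON-RPC ports from the logs and flags above.
    print(ports_in_use([60000, 8000, 8545]))
```

An empty list suggests nothing is listening on those ports at that moment; a non-empty list names the ports a new node would fail to bind.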

vpavlin (Member) commented Aug 2, 2023

Well, to be fair, there could be other unchecked calls, so the same error could also have a different cause. But I ran the container many times and it never failed until I started a second one, so that was the only way I could replicate the error.

vpavlin (Member) commented Aug 3, 2023

@mfw78 You can try to use this container image (built from the commit containing the fix) to see if you encounter the error again: statusteam/nim-waku:0b2cfae5

vpavlin modified the milestones: Release 0.20.0, Release 0.19.0 Aug 8, 2023