Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] Can't reconnect on network restart + Crash afterwards #736

Closed
juligasa opened this issue Sep 22, 2022 · 2 comments
Closed

[bug] Can't reconnect on network restart + Crash afterwards #736

juligasa opened this issue Sep 22, 2022 · 2 comments
Assignees
Labels
platform: linux Relevant for Linux priority: high Critical to have scope: daemon Daemon and p2p networking type: bug Something isn't working

Comments

@juligasa
Copy link
Collaborator

juligasa commented Sep 22, 2022

Describe the bug
When the network connectivity goes off and on. Next sync/connection loop iterations won't work.

To Reproduce Steps to reproduce the behavior:

  1. Start daemon
  2. Connect to someone using all addresses (tcp + udp)
  3. Write a draft and publish (i don't know if this makes any difference)
  4. switch network connection (WiFi/Wire) off
  5. Wait for the next sync cycle to fail. This is expected (usually 1 minute, since the sync loop triggers every 60s). One can see the daemon logs
  6. Switch network connection back on
  7. Wait for the next sync cycle to fail. This should not be expected since no we have internet connection. Note that the error (see Screenshots section) suggest there is an error dialing the relay through the udp/quic connection which should not be the default. I can't use that address even for pinging a node. However, if only provided tcp addresses in step 2 the error is different but the green status led in the frontend goes green)
  8. If the daemon is restarted, it chrashes every time we run it on the same ~/.mtt folder. Note that this only happens if step 2 was with tcp + udp addresses

Expected behavior
When network connectivity is back, the node should connect again to the relays, reserving new slots as if it was the first time connected. (note that when a node goes offline, the relay notices and clears any reservation it may have)

Screenshots
sync failing after step 7 with connections to udp+tcp

2022-09-22T09:38:38.381+0200	INFO	mintter/syncing	syncing/syncing.go:176	SyncLoopStarted	{"traceID": 1663832318}
2022-09-22T09:38:38.381+0200	DEBUG	mintter/network	mttnet/connect.go:38	ConnectStarted	{"device": "bafzaajaiaejcbx54fi7vzqppxyvgp6leq56bozxr5gae7gsecevt6tg6zi3ju7sx"}
2022-09-22T09:38:38.381+0200	DEBUG	mintter/network	mttnet/connect.go:38	ConnectStarted	{"device": "bafzaajaiaejcbobscxbgajtodc5dec4pm3whuem6nqv4tvwn3kcmac3d64xjqvko"}
2022-09-22T09:38:38.381+0200	DEBUG	mintter/network	mttnet/connect.go:38	ConnectStarted	{"device": "bafzaajaiaejcafkj2kfjxjyucfdocc5fv67s2t6md542wfnrsr5u25m5e7ilxxqc"}
2022-09-22T09:38:38.381+0200	DEBUG	mintter/network	mttnet/connect.go:38	ConnectStarted	{"device": "bafzaajaiaejcaybfdh3xmzmauq6mg3zlqbopnapanq65qosg2yzfpqg2jinojabx"}
2022-09-22T09:38:38.381+0200	DEBUG	mintter/network	mttnet/connect.go:38	ConnectStarted	{"device": "bafzaajaiaejcb472txdxwhtfvk74fekdimwcytjsdjtgnnovggpc352ugtitsh2o"}
2022-09-22T09:38:38.387+0200	DEBUG	mintter/network	mttnet/connect.go:40	ConnectFinished	{"device": "bafzaajaiaejcafkj2kfjxjyucfdocc5fv67s2t6md542wfnrsr5u25m5e7ilxxqc", "error": "failed to connect to peer 12D3KooWBFU25VMZWDfnaovDaiwKQPYqUMVhhSGp1XqSsd8A4Sqw: failed to dial 12D3KooWBFU25VMZWDfnaovDaiwKQPYqUMVhhSGp1XqSsd8A4Sqw:\n  * [/ip4/23.20.24.146/udp/4002/quic/p2p/12D3KooWNmjM4sMbSkDEA6ShvjTgkrJHjMya46fhZ9PjKZ4KVZYq/p2p-circuit] error opening hop stream to relay: failed to dial 12D3KooWNmjM4sMbSkDEA6ShvjTgkrJHjMya46fhZ9PjKZ4KVZYq:\n  * [/ip4/0.0.0.0/udp/4002/quic] no transport for protocol\n  * [/ip4/23.20.24.146/udp/4002/quic] no transport for protocol\n  * [/ip4/0.0.0.0/tcp/4002] dial tcp4 0.0.0.0:4002: connect: connection refused", "isMintterPeer": false}
2022-09-22T09:38:48.561+0200	DEBUG	mintter/network	mttnet/connect.go:40	ConnectFinished	{"device": "bafzaajaiaejcb472txdxwhtfvk74fekdimwcytjsdjtgnnovggpc352ugtitsh2o", "error": "failed to connect to peer 12D3KooWSEktKoE2uZwedR1yXxAsqutheQDBGnH7Q171uTnc72VK: routing: not found", "isMintterPeer": false}
2022-09-22T09:38:48.646+0200	DEBUG	mintter/network	mttnet/connect.go:40	ConnectFinished	{"device": "bafzaajaiaejcaybfdh3xmzmauq6mg3zlqbopnapanq65qosg2yzfpqg2jinojabx", "error": "failed to connect to peer 12D3KooWGHg7wajdG3Fur5jRZgTs58rN8esDwiTLVSTHkc7nVhBc: routing: not found", "isMintterPeer": false}
2022-09-22T09:38:52.037+0200	DEBUG	mintter/network	mttnet/connect.go:40	ConnectFinished	{"device": "bafzaajaiaejcbx54fi7vzqppxyvgp6leq56bozxr5gae7gsecevt6tg6zi3ju7sx", "error": "failed to connect to peer 12D3KooWQsjVvdQQDB5MhtPfiiLqWnQht6fdrj4pGKE44mfW2Ct6: routing: not found", "isMintterPeer": false}
2022-09-22T09:38:59.123+0200	DEBUG	mintter/network	mttnet/connect.go:40	ConnectFinished	{"device": "bafzaajaiaejcbobscxbgajtodc5dec4pm3whuem6nqv4tvwn3kcmac3d64xjqvko", "error": "failed to connect to peer 12D3KooWNDPUq58ijsqWa7pyc4jVo5yjJvgZ65Svdc6nkarFCNnq: routing: not found", "isMintterPeer": false}
2022-09-22T09:38:59.124+0200	DEBUG	mintter/syncing	syncing/syncing.go:189	SyncLoopError	{"traceID": 1663832318, "device": "bafzaajaiaejcafkj2kfjxjyucfdocc5fv67s2t6md542wfnrsr5u25m5e7ilxxqc", "error": "failed to connect to peer 12D3KooWBFU25VMZWDfnaovDaiwKQPYqUMVhhSGp1XqSsd8A4Sqw: failed to dial 12D3KooWBFU25VMZWDfnaovDaiwKQPYqUMVhhSGp1XqSsd8A4Sqw:\n  * [/ip4/23.20.24.146/udp/4002/quic/p2p/12D3KooWNmjM4sMbSkDEA6ShvjTgkrJHjMya46fhZ9PjKZ4KVZYq/p2p-circuit] error opening hop stream to relay: failed to dial 12D3KooWNmjM4sMbSkDEA6ShvjTgkrJHjMya46fhZ9PjKZ4KVZYq:\n  * [/ip4/0.0.0.0/udp/4002/quic] no transport for protocol\n  * [/ip4/23.20.24.146/udp/4002/quic] no transport for protocol\n  * [/ip4/0.0.0.0/tcp/4002] dial tcp4 0.0.0.0:4002: connect: connection refused"}
2022-09-22T09:38:59.124+0200	DEBUG	mintter/syncing	syncing/syncing.go:189	SyncLoopError	{"traceID": 1663832318, "device": "bafzaajaiaejcbobscxbgajtodc5dec4pm3whuem6nqv4tvwn3kcmac3d64xjqvko", "error": "failed to connect to peer 12D3KooWNDPUq58ijsqWa7pyc4jVo5yjJvgZ65Svdc6nkarFCNnq: routing: not found"}
2022-09-22T09:38:59.124+0200	DEBUG	mintter/syncing	syncing/syncing.go:189	SyncLoopError	{"traceID": 1663832318, "device": "bafzaajaiaejcaybfdh3xmzmauq6mg3zlqbopnapanq65qosg2yzfpqg2jinojabx", "error": "failed to connect to peer 12D3KooWGHg7wajdG3Fur5jRZgTs58rN8esDwiTLVSTHkc7nVhBc: routing: not found"}
2022-09-22T09:38:59.124+0200	DEBUG	mintter/syncing	syncing/syncing.go:189	SyncLoopError	{"traceID": 1663832318, "device": "bafzaajaiaejcb472txdxwhtfvk74fekdimwcytjsdjtgnnovggpc352ugtitsh2o", "error": "failed to connect to peer 12D3KooWSEktKoE2uZwedR1yXxAsqutheQDBGnH7Q171uTnc72VK: routing: not found"}
2022-09-22T09:38:59.124+0200	DEBUG	mintter/syncing	syncing/syncing.go:189	SyncLoopError	{"traceID": 1663832318, "device": "bafzaajaiaejcbx54fi7vzqppxyvgp6leq56bozxr5gae7gsecevt6tg6zi3ju7sx", "error": "failed to connect to peer 12D3KooWQsjVvdQQDB5MhtPfiiLqWnQht6fdrj4pGKE44mfW2Ct6: routing: not found"}
2022-09-22T09:38:59.124+0200	INFO	mintter/syncing	syncing/syncing.go:196	SyncLoopFinished	{"traceID": 1663832318, "failures": 5, "successes": 0}

fail after step 7 when connection tcp only

2022-09-22T10:10:25.312+0200	INFO	mintter/syncing	syncing/syncing.go:176	SyncLoopStarted	{"traceID": 1663834225}
2022-09-22T10:10:25.312+0200	DEBUG	mintter/network	mttnet/connect.go:38	ConnectStarted	{"device": "bafzaajaiaejcb472txdxwhtfvk74fekdimwcytjsdjtgnnovggpc352ugtitsh2o"}
2022-09-22T10:10:25.312+0200	DEBUG	mintter/network	mttnet/connect.go:38	ConnectStarted	{"device": "bafzaajaiaejcbobscxbgajtodc5dec4pm3whuem6nqv4tvwn3kcmac3d64xjqvko"}
2022-09-22T10:10:25.312+0200	DEBUG	mintter/network	mttnet/connect.go:38	ConnectStarted	{"device": "bafzaajaiaejcaybfdh3xmzmauq6mg3zlqbopnapanq65qosg2yzfpqg2jinojabx"}
2022-09-22T10:10:36.428+0200	DEBUG	mintter/network	mttnet/connect.go:40	ConnectFinished	{"device": "bafzaajaiaejcaybfdh3xmzmauq6mg3zlqbopnapanq65qosg2yzfpqg2jinojabx", "error": "failed to connect to peer 12D3KooWGHg7wajdG3Fur5jRZgTs58rN8esDwiTLVSTHkc7nVhBc: routing: not found", "isMintterPeer": false}
2022-09-22T10:10:41.394+0200	DEBUG	mintter/network	mttnet/connect.go:40	ConnectFinished	{"device": "bafzaajaiaejcbobscxbgajtodc5dec4pm3whuem6nqv4tvwn3kcmac3d64xjqvko", "error": "failed to connect to peer 12D3KooWNDPUq58ijsqWa7pyc4jVo5yjJvgZ65Svdc6nkarFCNnq: routing: not found", "isMintterPeer": false}
2022-09-22T10:10:49.936+0200	DEBUG	mintter/network	mttnet/connect.go:40	ConnectFinished	{"device": "bafzaajaiaejcb472txdxwhtfvk74fekdimwcytjsdjtgnnovggpc352ugtitsh2o", "error": "failed to connect to peer 12D3KooWSEktKoE2uZwedR1yXxAsqutheQDBGnH7Q171uTnc72VK: routing: not found", "isMintterPeer": false}
2022-09-22T10:10:49.936+0200	DEBUG	mintter/syncing	syncing/syncing.go:189	SyncLoopError	{"traceID": 1663834225, "device": "bafzaajaiaejcaybfdh3xmzmauq6mg3zlqbopnapanq65qosg2yzfpqg2jinojabx", "error": "failed to connect to peer 12D3KooWGHg7wajdG3Fur5jRZgTs58rN8esDwiTLVSTHkc7nVhBc: routing: not found"}
2022-09-22T10:10:49.936+0200	DEBUG	mintter/syncing	syncing/syncing.go:189	SyncLoopError	{"traceID": 1663834225, "device": "bafzaajaiaejcb472txdxwhtfvk74fekdimwcytjsdjtgnnovggpc352ugtitsh2o", "error": "failed to connect to peer 12D3KooWSEktKoE2uZwedR1yXxAsqutheQDBGnH7Q171uTnc72VK: routing: not found"}
2022-09-22T10:10:49.936+0200	DEBUG	mintter/syncing	syncing/syncing.go:189	SyncLoopError	{"traceID": 1663834225, "device": "bafzaajaiaejcbobscxbgajtodc5dec4pm3whuem6nqv4tvwn3kcmac3d64xjqvko", "error": "failed to connect to peer 12D3KooWNDPUq58ijsqWa7pyc4jVo5yjJvgZ65Svdc6nkarFCNnq: routing: not found"}
2022-09-22T10:10:49.936+0200	INFO	mintter/syncing	syncing/syncing.go:196	SyncLoopFinished	{"traceID": 1663834225, "failures": 3, "successes": 2}

crash after restarting in step 8 when step 2 was with tcp + udp addresses

julio@workstation:~/Documents/mintter$ ./dev run-backend
2022-09-22T09:39:17.842+0200	DEBUG	mintter/repo	ondisk/ondisk.go:183	AccountInitialized	{"accountID": "bahezrj4iaqacicabciqejzrtjik6l5zdbozawzvpug6bamw3vgktan5lwvfvihtfbh2eypa"}
2022-09-22T09:39:17.848+0200	INFO	mintter/daemon	daemon/daemon.go:154	DaemonStarted	{"grpcListener": "[::]:55002", "httpListener": "[::]:55001", "repoPath": "/home/julio/.mtt"}
2022-09-22T09:39:17.851+0200	DEBUG	mintter/network	mttnet/mttnet.go:209	P2PNodeStarted
2022-09-22T09:39:19.599+0200	DEBUG	mintter/wallet	wallet/wallet.go:253	Only one type of lndhub.go wallet is allowed: Only one lndhub.go wallet is allowed and we already had one
2022-09-22T09:39:32.857+0200	INFO	mintter/network	mttnet/mttnet.go:276	BootstrapFinished	{"peersTotal": 6, "failedConnectionsTotal": 4}
2022-09-22T09:39:32.857+0200	DEBUG	mintter/network	mttnet/mttnet.go:287	BootstrapConnectionError	{"peer": "QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN", "error": "no good addresses"}
2022-09-22T09:39:32.857+0200	DEBUG	mintter/network	mttnet/mttnet.go:287	BootstrapConnectionError	{"peer": "QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa", "error": "no good addresses"}
2022-09-22T09:39:32.857+0200	DEBUG	mintter/network	mttnet/mttnet.go:287	BootstrapConnectionError	{"peer": "QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb", "error": "no good addresses"}
2022-09-22T09:39:32.857+0200	DEBUG	mintter/network	mttnet/mttnet.go:287	BootstrapConnectionError	{"peer": "QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt", "error": "no good addresses"}
2022-09-22T09:39:32.857+0200	DEBUG	mintter/syncing	syncing/syncing.go:143	SyncingServiceStarted
2022-09-22T09:39:32.857+0200	INFO	mintter/daemon	daemon/daemon.go:176	P2PNodeStarted	{"listeners": ["/p2p-circuit","/ip4/127.0.0.1/tcp/55000","/ip4/192.168.1.68/tcp/55000","/ip4/172.17.0.1/tcp/55000","/ip4/172.19.0.1/tcp/55000","/ip4/172.20.0.1/tcp/55000","/ip4/172.29.0.1/tcp/55000","/ip6/::1/tcp/55000"]}
2022-09-22T09:39:33.885+0200	DEBUG	autorelay	autorelay/relay_finder.go:652	starting relay finder
2022-09-22T09:39:34.408+0200	DEBUG	autorelay	autorelay/relay_finder.go:335	node supports relay protocol	{"peer": "12D3KooWNmjM4sMbSkDEA6ShvjTgkrJHjMya46fhZ9PjKZ4KVZYq", "supports circuit v2": true}
2022-09-22T09:39:34.493+0200	DEBUG	autorelay	autorelay/relay_finder.go:335	node supports relay protocol	{"peer": "12D3KooWGvsbBfcbnkecNoRBM7eUTiuriDqUyzu87pobZXSdUUsJ", "supports circuit v2": true}
2022-09-22T09:39:34.493+0200	DEBUG	autorelay	autorelay/relay_finder.go:286	processed all static relays
2022-09-22T09:39:34.597+0200	DEBUG	autorelay	autorelay/relay_finder.go:483	adding new relay	{"id": "12D3KooWNmjM4sMbSkDEA6ShvjTgkrJHjMya46fhZ9PjKZ4KVZYq"}
2022-09-22T09:39:34.736+0200	DEBUG	autorelay	autorelay/relay_finder.go:483	adding new relay	{"id": "12D3KooWGvsbBfcbnkecNoRBM7eUTiuriDqUyzu87pobZXSdUUsJ"}
2022-09-22T09:40:32.857+0200	INFO	mintter/syncing	syncing/syncing.go:176	SyncLoopStarted	{"traceID": 1663832432}
2022-09-22T09:40:32.858+0200	DEBUG	mintter/network	mttnet/connect.go:38	ConnectStarted	{"device": "bafzaajaiaejcbx54fi7vzqppxyvgp6leq56bozxr5gae7gsecevt6tg6zi3ju7sx"}
2022-09-22T09:40:32.858+0200	DEBUG	mintter/network	mttnet/connect.go:38	ConnectStarted	{"device": "bafzaajaiaejcbobscxbgajtodc5dec4pm3whuem6nqv4tvwn3kcmac3d64xjqvko"}
2022-09-22T09:40:32.858+0200	DEBUG	mintter/network	mttnet/connect.go:38	ConnectStarted	{"device": "bafzaajaiaejcb472txdxwhtfvk74fekdimwcytjsdjtgnnovggpc352ugtitsh2o"}
2022-09-22T09:40:32.858+0200	DEBUG	mintter/network	mttnet/connect.go:38	ConnectStarted	{"device": "bafzaajaiaejcafkj2kfjxjyucfdocc5fv67s2t6md542wfnrsr5u25m5e7ilxxqc"}
2022-09-22T09:40:32.858+0200	DEBUG	mintter/network	mttnet/connect.go:38	ConnectStarted	{"device": "bafzaajaiaejcaybfdh3xmzmauq6mg3zlqbopnapanq65qosg2yzfpqg2jinojabx"}
2022-09-22T09:40:35.013+0200	DEBUG	mintter/network	mttnet/connect.go:40	ConnectFinished	{"device": "bafzaajaiaejcafkj2kfjxjyucfdocc5fv67s2t6md542wfnrsr5u25m5e7ilxxqc", "isMintterPeer": false}
panic: proof not found

goroutine 10569 [running]:
mintter/backend/mttnet.(*Node).handshakeInfo.func1()
	/home/julio/Documents/mintter/plz-out/tmp/backend/mintterd._build/backend/mttnet/connect.go:177 +0x206
sync.(*Once).doSlow(0xc0006fd900?, 0x19ebda0?)
	/nix/store/dlpbgh2hqzk83gs4xhz3r1b0x24hfcpw-go-1.18.2/share/go/src/sync/once.go:68 +0xc2
sync.(*Once).Do(...)
	/nix/store/dlpbgh2hqzk83gs4xhz3r1b0x24hfcpw-go-1.18.2/share/go/src/sync/once.go:59
mintter/backend/mttnet.(*Node).handshakeInfo(0xc0000f1e60?, {0x19ebda0?, 0xc00470a660?})
	/home/julio/Documents/mintter/plz-out/tmp/backend/mintterd._build/backend/mttnet/connect.go:157 +0x6e
mintter/backend/mttnet.(*Node).Connect(0xc0000f1e60, {0x19ebda0, 0xc00470a660}, {{0xc0043495f0, 0x26}, {0x2692e60, 0x0, 0x0}})
	/home/julio/Documents/mintter/plz-out/tmp/backend/mintterd._build/backend/mttnet/connect.go:66 +0x4dd
mintter/backend/mttnet.(*Node).Client(0xc0000f1e60, {0x19ebda0, 0xc00470a660}, {{0xc004349530?, 0x5c?}})
	/home/julio/Documents/mintter/plz-out/tmp/backend/mintterd._build/backend/mttnet/mttnet.go:174 +0x11a
mintter/backend/syncing.(*Service).syncWithPeer(0xc0011fc640, {0x19ebda0, 0xc00470a660}, {{0xc004349530?, 0xc0006c6f60?}})
	/home/julio/Documents/mintter/plz-out/tmp/backend/mintterd._build/backend/syncing/syncing.go:371 +0x63
mintter/backend/syncing.(*Service).Sync.func1(0x1, {{0xc0043d0fc0, 0x26, 0x26}, 0x2, 0x2})
	/home/julio/Documents/mintter/plz-out/tmp/backend/mintterd._build/backend/syncing/syncing.go:254 +0x225
created by mintter/backend/syncing.(*Service).Sync
	/home/julio/Documents/mintter/plz-out/tmp/backend/mintterd._build/backend/syncing/syncing.go:240 +0x61c

Desktop (please complete the following information):

  • OS: Ubuntu 22.04

Additional context Add any other context about the problem here.

@juligasa juligasa added type: bug Something isn't working scope: daemon Daemon and p2p networking priority: high Critical to have platform: linux Relevant for Linux languages::Go labels Sep 22, 2022
@juligasa juligasa changed the title Can't reconnect on network restart Can't reconnect on network restart + Crash afterwards Sep 22, 2022
@burdiyan
Copy link
Collaborator

I think we could try 2 options:

  1. Enable QUIC back in the daemon. The new update should fix the stuff that was failing for us.
  2. Disable QUIC in the relay and only leave TCP. That would be a bit unfortunate, because QUIC seems to be the way to go, but if it does the trick, then it's OK.

@juligasa
Copy link
Collaborator Author

After removing the udp/4002quic addresses from the relay the panic never happened. The rest of the issue is related to #774

@GaboHBeaumont GaboHBeaumont changed the title Can't reconnect on network restart + Crash afterwards [bug] Can't reconnect on network restart + Crash afterwards Oct 3, 2022
@juligasa juligasa closed this as completed Oct 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
platform: linux Relevant for Linux priority: high Critical to have scope: daemon Daemon and p2p networking type: bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants