Can't start market node after upgrade #177

Closed
benjaminh83 opened this issue Feb 15, 2022 · 4 comments
@benjaminh83

Issue: The market node crashes on startup with the following error:

2022-02-15T10:28:56.826+0100    INFO    dagstore.migrator       dagstore/wrapper.go:354 deal not ready; skipping        {"deal_id": 0, "piece_cid": "baga6ea4seaqk54tcailxwsaqkfsmacu2h43e4z2ytn2xqwj762guubryalaaega"}
2022-02-15T10:28:56.826+0100    INFO    dagstore.migrator       dagstore/wrapper.go:354 deal not ready; skipping        {"deal_id": 0, "piece_cid": "baga6ea4seaqhmohnt4uukhxf3enrgoyn5yldhe6hk3ebltr7jan57j6kkytokfq"}
2022-02-15T10:28:56.826+0100    INFO    dagstore.migrator       dagstore/wrapper.go:358 registering deal in dagstore with lazy init     {"deal_id": 2985231, "piece_cid": "baga6ea4seaqgj2d6h7pzqlvld5qdc5kfnjuxsbxagn2qk253vwvhpxbwehrukmq"}
ERROR: creating node: starting node: failed to connect index provider host with the full node: failed to connect index provider host with the full node: failed to dial 12D3KooWNCcog7KWPsjWa1FmKyqeTeBxxc5cJf27X6vKBVqm1mW3:
  * [/ip4/127.0.0.1/tcp/10231] dial tcp4 127.0.0.1:10231: connect: connection refused
  * [/ip4/192.168.x.x/tcp/10231] failed to negotiate security protocol: read tcp4 192.168.x.x:35397->192.168.x.x:10231: read: connection reset by peer
  * [/ip4/x.x.x.x/tcp/10231] failed to negotiate security protocol: read tcp4 192.168.x.x:35397->x.x.x.x:10231: read: connection reset by peer
  * [/ip4/x.x.x.x/tcp/10231] dial tcp4 0.0.0.0:35397->x.x.x.x:10231: i/o timeout

Note: My market node runs on a separate physical server and needs to connect to the daemon over the network.
Network connectivity is fine, and the system was running perfectly on v1.14.0.
Connections are only allowed on the internal network, so I believe the relevant error is on this line:

[/ip4/x.x.x.x/tcp/10231] failed to negotiate security protocol: read tcp4 192.168.x.x:35397->x.x.x.x:10231: read: connection reset by peer

The Lotus daemon and miner are running the same version as the market node: master-spx.idxprov.rc-1

$ lotus-miner version
Daemon:  1.15.0-dev+mainnet+git.1bf7e6a40+api1.3.0
Local: lotus-miner version 1.15.0-dev+mainnet+git.1bf7e6a40 

The lotus-miner also connects to the daemon over the network, so I know for sure that the daemon is reachable: the lotus-miner is running and currently connected to it.

masih added a commit to filecoin-project/lotus that referenced this issue Feb 15, 2022
Add the peer ID of index provider host to the list of protected peers
before connecting to full node. Otherwise, it is possible for the
connection to be reset by full node before we reach the line that adds
the ID to list of protected peers via JsonRPC API.

Relates to:
 - ipni/index-provider#177
masih added a commit to filecoin-project/lotus that referenced this issue Feb 15, 2022
Update to the head of the PR that introduces indexing integration in
`go-fil-markets` so that failure to connect to full node is logged only
instead of crashing markets process.

Relates to:
 - filecoin-project/go-fil-markets#673
 - ipni/index-provider#177
@masih
Member

masih commented Feb 15, 2022

Suspected root cause:

  • The markets process connects to the daemon over libp2p before the markets peer ID is added as a protected peer.
  • The connection is reset before the code reaches the JSON-RPC call that protects the connection.
  • The failure to connect causes a hard crash of the markets process.
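
The ordering problem in the list above can be illustrated with a toy model. This is only a sketch under stated assumptions: `connManager`, `connectThenProtect`, and `protectThenConnect` are hypothetical names, and the trimming rule (keep every protected peer plus the oldest unprotected peers up to a limit) is a simplification, not the behaviour of go-libp2p or lotus.

```go
package main

import "fmt"

// connManager is a toy stand-in for a connection manager.
// Assumption: it is NOT the go-libp2p or lotus implementation.
type connManager struct {
	limit     int             // max unprotected connections kept after a trim
	protected map[string]bool // peers exempt from trimming
	conns     []string        // open connections, oldest first
}

func newConnManager(limit int) *connManager {
	return &connManager{limit: limit, protected: map[string]bool{}}
}

// Protect marks a peer as exempt from trimming.
func (m *connManager) Protect(peer string) { m.protected[peer] = true }

// trim keeps all protected peers and the oldest unprotected peers up to limit.
func (m *connManager) trim() {
	kept := m.conns[:0]
	unprotected := 0
	for _, p := range m.conns {
		if m.protected[p] {
			kept = append(kept, p)
			continue
		}
		if unprotected < m.limit {
			unprotected++
			kept = append(kept, p)
		}
	}
	m.conns = kept
}

// Connect accepts a connection, trims, and reports whether it survived.
func (m *connManager) Connect(peer string) bool {
	m.conns = append(m.conns, peer)
	m.trim()
	for _, p := range m.conns {
		if p == peer {
			return true
		}
	}
	return false
}

// connectThenProtect models the suspected faulty order: dial first, protect later.
func connectThenProtect(m *connManager, peer string) bool {
	ok := m.Connect(peer)
	m.Protect(peer) // too late if the connection was already dropped
	return ok
}

// protectThenConnect models the order described in the referenced commit message.
func protectThenConnect(m *connManager, peer string) bool {
	m.Protect(peer)
	return m.Connect(peer)
}

func main() {
	busy := newConnManager(1)
	busy.Connect("other") // the daemon is already at its limit
	fmt.Println("connect then protect:", connectThenProtect(busy, "markets"))

	busy2 := newConnManager(1)
	busy2.Connect("other")
	fmt.Println("protect then connect:", protectThenConnect(busy2, "markets"))
}
```

In this model the first call reports `false` and the second `true`, which matches the reasoning above: the protection has to be in place before the dial, not after it.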

Mitigation:

New release with fixes tagged: https://github.com/filecoin-project/lotus/releases/tag/master-spx.idxprov.rc-2

@benjaminh83, could you please attempt a redeployment with the rc-2 tag?

@benjaminh83
Author

Now running rc-2:

$ lotus-miner version
Daemon:  1.15.0-dev+mainnet+git.031bfaf12+api1.3.0
Local: lotus-miner version 1.15.0-dev+mainnet+git.031bfaf12

The market process exits with the same error as above :(

@benjaminh83
Author

Solved for now by manually inserting the following into .lotus/config.toml:

[Libp2p]
   ListenAddresses = ["/ip4/0.0.0.0/tcp/10231"]
   ProtectedPeers = ["12D3KooWR8nkLuyBc4VFsN5r8EYWoVHuE9z8SZJRq85z562Hpw5J"]

Running the command lotus net protect 12D3KooWR8nkLuyBc4VFsN5r8EYWoVHuE9z8SZJRq85z562Hpw5J did not seem to have any effect.

@masih
Member

masih commented Feb 15, 2022

The issue seems to be that the NetProtectAdd API does not update the list of protected peers on the connection manager of the daemon's libp2p host.
After manually adding the peer ID of the markets process to ProtectedPeers in the [Libp2p] section of the daemon config, the connection from markets to the daemon was persistent and successful.
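
To make the symptom concrete, here is a toy sketch of how such a gap could look. Everything in it is hypothetical (`manager`, `netAPI`, `protectAdd`, the `forward` flag); it does not reproduce the lotus NetAPI or the go-libp2p connection manager, and the actual cause was still to be investigated at this point.

```go
package main

import "fmt"

// manager is a toy connection manager: only peers in its own protected set
// survive a trim. Assumption: not go-libp2p's connection manager.
type manager struct {
	protected map[string]bool
}

func (m *manager) Protect(peer string) { m.protected[peer] = true }

func (m *manager) SurvivesTrim(peer string) bool { return m.protected[peer] }

// netAPI is a hypothetical RPC layer sitting in front of the manager.
type netAPI struct {
	mgr     *manager
	list    map[string]bool // what the API itself reports as protected
	forward bool            // whether protectAdd actually reaches the manager
}

func newNetAPI(forward bool) *netAPI {
	return &netAPI{
		mgr:     &manager{protected: map[string]bool{}},
		list:    map[string]bool{},
		forward: forward,
	}
}

// protectAdd records the peer and, only if forward is set, tells the manager.
func (a *netAPI) protectAdd(peer string) {
	a.list[peer] = true
	if a.forward {
		a.mgr.Protect(peer)
	}
}

func main() {
	broken := newNetAPI(false)
	broken.protectAdd("markets")
	// The API call appears to succeed, but the manager never hears about it.
	fmt.Println("listed:", broken.list["markets"], "survives trim:", broken.mgr.SurvivesTrim("markets"))

	wired := newNetAPI(true)
	wired.protectAdd("markets")
	fmt.Println("listed:", wired.list["markets"], "survives trim:", wired.mgr.SurvivesTrim("markets"))
}
```

In this model a static entry in the daemon config would correspond to calling `Protect` on the manager directly at startup, which is consistent with the config workaround succeeding while the API call had no visible effect.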

The next step is to investigate how NetAPI interacts with the connection manager on the lotus daemon node.

Many thanks @benjaminh83 for bearing with me as I debugged this issue.

masih closed this as completed Feb 15, 2022