autonat (?): node frequently changes its mind about its reachability status #2046
Labels
effort/days
Estimated to take multiple days, but less than a week
kind/bug
A bug in existing code (including security flaws)
need/analysis
Needs further analysis before proceeding
Using the event bus metrics (#2038) on a Kubo node with the Accelerated DHT client enabled, it looks like the node changes its mind about its reachability status fairly frequently. The event metrics don't tell us which reachability status the node thinks it has, but on a public node I wouldn't expect any changes once it has been running for more than 5 minutes or so.
This can have interesting consequences on higher layers. For example, when a node goes private, it will leave the DHT by switching to client mode. I'm wondering how much of the observed churn can be attributed to this.
As expected, this is accompanied by a change in supported protocols, presumably the DHT switching back and forth between client and server mode.
Unexpectedly, we don't observe any change in local addresses. It's not clear to me why we're not obtaining a relay reservation. Maybe we're switching back and forth too quickly to actually obtain the reservation? Alternatively, there could be a bug in AutoRelay.
This issue suggests that it would be valuable to pick up the AutoNAT metrics (#2017) next. This will hopefully give us a better understanding of what's going on.
cc @Jorropo @dennis-tra @yiannisbot