
autorelay: not discarding static relays on connectivity failures #1790

Closed
wants to merge 7 commits

Conversation


@juligasa juligasa commented Sep 30, 2022

Static relays are a special kind of relay, since the client has spent some time bootstrapping them. That's why, if the client (autorelay) loses its connection to such a relay, the relay should not be treated like a regular one (i.e. discarded). Two cases come to mind:

  1. If a static relay goes down (e.g. for maintenance), the client typically has more than one static relay, so autorelay will start making slot reservations on the ones that are online. However, once the first static relay is up again, it should rejoin the candidate list. The current implementation discards a failing static relay right away.
  2. If the client switches networks, or the network simply goes down for a while (without restarting the client, since that would rescan the static relays), the client discards all relays, and when the network is back up it won't have any static relays available (unless autorelay gets reset).

This can be solved in two ways: either by not discarding a static relay on connection errors (sending it back to the candidates instead), or by implementing a timer that periodically calls handleStaticRelay. I have implemented both, since the former responds more quickly in case 1 and the latter periodically rescues forgotten static relays in case 2.

This PR solves #1782
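The second strategy (periodically rescuing forgotten static relays) could be sketched roughly like this. Note that peerID, relayFinder, and rescueStaticRelays below are simplified stand-ins for the real relay_finder.go internals (which operate on peer.AddrInfo), not the actual implementation:

```go
package main

import "fmt"

// peerID stands in for libp2p's peer.ID.
type peerID string

// relayFinder is a minimal sketch of the state involved: the configured
// static relays, the current candidate set, and the connected relays.
type relayFinder struct {
	staticRelays []peerID
	candidates   map[peerID]struct{}
	connected    map[peerID]struct{}
}

// rescueStaticRelays re-adds every static relay that is neither connected
// nor already a candidate, and reports how many were rescued. In the PR
// this role is played by periodically calling handleStaticRelay; here it
// would be driven by a time.Ticker.
func (rf *relayFinder) rescueStaticRelays() int {
	rescued := 0
	for _, p := range rf.staticRelays {
		if _, ok := rf.connected[p]; ok {
			continue
		}
		if _, ok := rf.candidates[p]; ok {
			continue
		}
		rf.candidates[p] = struct{}{}
		rescued++
	}
	return rescued
}

func main() {
	rf := &relayFinder{
		staticRelays: []peerID{"relayA", "relayB"},
		candidates:   map[peerID]struct{}{},
		connected:    map[peerID]struct{}{"relayA": {}},
	}
	fmt.Println(rf.rescueStaticRelays()) // relayB is rescued: prints 1
}
```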

If the client loses network connectivity, it needs to reconnect to relays once the network is back online.
@MarcoPolo MarcoPolo linked an issue Sep 30, 2022 that may be closed by this pull request
@p-shahi p-shahi added this to the Best Effort Track milestone Oct 3, 2022
@p-shahi p-shahi requested a review from MarcoPolo October 3, 2022 16:03
@p-shahi p-shahi added the P1 High: Likely tackled by core team if no one steps up label Oct 3, 2022
@MarcoPolo MarcoPolo left a comment

First off, thanks a ton for the contribution, and apologies for the delay in the review. Follow-ups will be much faster from here on out.

I don't follow why we need both strategies of trying to connect to all static relays but also keeping them in our candidates. Wouldn't keeping them in our candidates suffice for both scenarios?

Either way, I think we can do something much simpler here. What if we added a new channel similar to peerChan in func findNodes and called it staticRelayChan? We do the same logic (with backoffs and checking if we're at our max) and have a separate, similar timer for this. This way we really only need changes in findNodes. After the new timer fires, we can fill staticRelayChan with the static relay nodes that we aren't currently connected to. Backoffs will work similarly to how normal candidates are handled, and we don't risk polluting our candidates list with nodes that are unresponsive (if we hit maxCandidates of unresponsive static relay nodes, we're stuck). It also keeps all the other logic the same: candidates keep getting removed, and we don't keep unresponsive relays in the candidate list.

This is a bit like giving static relays priority when sourcing new relays.

What do you think?
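The staticRelayChan idea above could be sketched as follows. The names staticRelayChan and refillStaticRelayChan mirror the discussion, but the types are stand-ins, not go-libp2p's:

```go
package main

import "fmt"

// peerID stands in for libp2p's peer.ID.
type peerID string

// refillStaticRelayChan fills the channel with the static relays we are
// not currently connected to. In the proposed design a timer would call
// this periodically, and findNodes would drain the channel, giving
// static relays priority over discovered candidates.
func refillStaticRelayChan(static []peerID, connected map[peerID]bool, ch chan<- peerID) {
	for _, p := range static {
		if !connected[p] {
			select {
			case ch <- p:
			default: // channel full; try again on the next timer tick
			}
		}
	}
}

func main() {
	staticRelayChan := make(chan peerID, 4)
	connected := map[peerID]bool{"relayA": true}
	refillStaticRelayChan([]peerID{"relayA", "relayB", "relayC"}, connected, staticRelayChan)
	close(staticRelayChan)
	for p := range staticRelayChan {
		fmt.Println(p) // prints relayB, then relayC
	}
}
```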

p2p/host/autorelay/relay_finder.go (outdated)
} else if evt.Connectedness == network.Connected {
for _, r := range rf.conf.staticRelays {
if r.ID == evt.Peer {
rf.notifyMaybeConnectToRelay()
@MarcoPolo (Collaborator)

I don't follow why we need this logic now. Aren't we already connected?

@juligasa (Author)

That was the attempt to re-reserve slots when we are back online (when the client had an intermittent network connection). In that case, the relay would have dropped all our reservations and we need to re-reserve.

juligasa commented Oct 20, 2022

I don't follow why we need both strategies of trying to connect to all static relays but also keeping them in our candidates. Wouldn't keeping them in our candidates suffice for both scenarios?

The reason for the re-scanning is that if the client goes down, the relay will drop all of that client's reservations, and the next time the client is up it needs to re-reserve slots. Even if we did not drop any candidates while we were offline, we would still need to reserve slots again.
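The point about reservations can be illustrated with a toy model. The relay and client types below are illustrative stand-ins for the circuit-relay reservation exchange, not go-libp2p types:

```go
package main

import "fmt"

// relay models the server side: it tracks which clients hold a
// reservation, and drops a client's reservation when it goes offline.
type relay struct {
	reservations map[string]bool
}

func (r *relay) reserve(client string) { r.reservations[client] = true }

func (r *relay) clientWentOffline(client string) { delete(r.reservations, client) }

func main() {
	r := &relay{reservations: map[string]bool{}}
	r.reserve("client1")
	r.clientWentOffline("client1") // network outage: relay drops the reservation
	fmt.Println(r.reservations["client1"]) // false: keeping the relay as a candidate is not enough
	r.reserve("client1") // the client must explicitly re-reserve once back online
	fmt.Println(r.reservations["client1"]) // true
}
```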

What do you think?

Yes, that could work. I could implement something along those lines.

@MarcoPolo (Collaborator)

Yes, that could work. I could implement something along those lines.

Great! Very much appreciated if you do, but no expectation here. If you don't get around to it, I'll probably tackle this Thursday or Friday of this week.


juligasa commented Nov 3, 2022

I didn't have time at all to get it done, but I was in the process of making the required changes. I have pushed them so you can build on top of them if you want.

c.staticRelayMinInterval = time.Hour
rand.Seed(time.Now().UnixNano())
var static_candidates = static
rand.Shuffle(len(static_candidates), func(i, j int) {
@MarcoPolo (Collaborator)

Why the shuffle? Users may have a preferred order, no?

@juligasa (Author)

The rationale here was: if asked for only 1 or 2 candidates (< len(static_candidates)), and those candidates happened to be always down, then one wouldn't be able to connect to any. Randomizing the process guarantees that you will pull different candidates as long as len(static_candidates) > num. But yes, if order is important, the user can just increase num and get more candidates if the previous ones haven't succeeded.

rand.Shuffle(len(static_candidates), func(i, j int) {
static_candidates[i], static_candidates[j] = static_candidates[j], static_candidates[i]
})
for i := 0; i < num%c.minCandidates; i++ {
@MarcoPolo (Collaborator)

hmm so if num == c.minCandidates we do nothing?

@juligasa (Author)

You're right, it should have been for i := 0; i <= (num-1)%c.minCandidates-1; i++ given that num > 0. However, maybe it's simpler to just return num candidates, with a maximum of len(static_candidates). Although WithPeerSource says it MUST return at least numPeers, and here it should be pretty much the same, right? If num > len(static_candidates) we can't do much about it.
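The simpler alternative discussed here (shuffle, then return num candidates capped at len(static_candidates)) might look like the sketch below; pickStaticCandidates is an illustrative name, not the PR's code:

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickStaticCandidates returns up to num static relays in random order.
// Randomizing avoids repeatedly picking the same (possibly dead) relays
// when num < len(static); capping at len(static) avoids the modulo bug
// where num == minCandidates yielded zero candidates.
func pickStaticCandidates(static []string, num int) []string {
	shuffled := append([]string(nil), static...) // copy so the caller's order is preserved
	rand.Shuffle(len(shuffled), func(i, j int) {
		shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
	})
	if num > len(shuffled) {
		num = len(shuffled) // can't return more candidates than we have
	}
	return shuffled[:num]
}

func main() {
	static := []string{"relayA", "relayB", "relayC"}
	fmt.Println(len(pickStaticCandidates(static, 2))) // 2
	fmt.Println(len(pickStaticCandidates(static, 5))) // 3: capped at len(static)
}
```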


MarcoPolo commented Nov 11, 2022

I got a chance to look at this. Possibly dumb thought, but why not simply use WithPeerSource to supply the static relays and get rid of the separate code path to handle static relays? Essentially, static relays are just a special type of peer source: #1875
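The peer-source idea can be sketched as below. Here addrInfo and staticPeerSource are stand-ins (the real option is autorelay.WithPeerSource, whose callback also takes a context and returns peer.AddrInfo values); the key property is that every call re-serves the same static relays, so a relay dropped after a failure becomes a candidate again on the next call:

```go
package main

import "fmt"

// addrInfo stands in for libp2p's peer.AddrInfo.
type addrInfo struct{ ID string }

// staticPeerSource builds a peer source that always serves the same
// static relays, capped at the requested count. Because autorelay calls
// the source again after losing candidates, static relays are retried
// automatically with no separate code path.
func staticPeerSource(static []addrInfo) func(num int) <-chan addrInfo {
	return func(num int) <-chan addrInfo {
		ch := make(chan addrInfo, len(static))
		if num > len(static) {
			num = len(static)
		}
		for _, p := range static[:num] {
			ch <- p
		}
		close(ch)
		return ch
	}
}

func main() {
	src := staticPeerSource([]addrInfo{{"relayA"}, {"relayB"}})
	count := 0
	for range src(5) {
		count++
	}
	fmt.Println(count) // 2: capped at the number of static relays
}
```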

@juligasa (Author)

I got a chance to look at this. Possibly dumb thought, but why not simply use WithPeerSource to supply the static relays and get rid of the separate code path to handle static relays? Essentially, static relays are just a special type of peer source: #1875

Yes, that could be done. However, in order for a relay to be retried when closed, the peer source must be called again and serve the same static relays. This is now done in the refactor of WithStaticRelays. I could have deleted all the static-relay functions, but then the burden of handling the peer source for static relays would fall on the user. But that can be changed. Let me know what you think @MarcoPolo

@MarcoPolo (Collaborator)

I merged #1875. It includes a helper for creating a peer source from static relays. It should retry static candidates automatically when we lose a candidate.

@MarcoPolo MarcoPolo closed this Nov 22, 2022
Labels
P1 High: Likely tackled by core team if no one steps up

Linked issues
autorelay: don't discard static relays on disconnections

3 participants