Split peer slots between full and light nodes #10688
Conversation
"each big refactoring has lead to months of tedious debugging sessions" -> I hope that was not your best motivation speech ever 😝 Anyways, I will go through the code, but I guess it needs a few commits to get all CI checks green yet. |
This change is typical of what I see around this part of the codebase. If you look at this single patch, it is easy to reason about. But you have to look below and above it to find a simple invariant:
in:u32 + out:u32 = in_light + (in_full + out) = light:usize + full:usize
There is a lot of accidental complexity in the logic where we construct objects from the configuration. One could argue that parts of it exist to stay backward compatible, but we still break that every now and then.
I have been staring at the codebase for a month now, trying to come up with a way to simplify this. Every protocol needs its own config parameters, and extra_sets does not allow providing them. I mention this because I wonder whether there is light-client-specific configuration that is also relevant for the protocols in extra_sets.
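For concreteness, here is that invariant as a tiny self-contained check. All values are illustrative (only the light-client default of 100 comes from this PR), and none of these names are the real configuration fields:

```rust
fn main() {
	// in:u32 + out:u32 = in_light + (in_full + out) = light:usize + full:usize
	let in_full: u32 = 25; // inbound slots for full nodes (illustrative)
	let in_light: u32 = 100; // inbound slots for light clients (the PR's default)
	let out: u32 = 25; // outbound slots, full nodes only (illustrative)

	let in_total = in_full + in_light; // the `in:u32` side
	let full = (in_full + out) as usize; // `full:usize`
	let light = in_light as usize; // `light:usize`

	assert_eq!((in_total + out) as usize, light + full);
}
```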
client/network/src/protocol.rs
Outdated
debug!(target: "sync", "Too many full nodes, rejecting {}", who);
self.behaviour.disconnect_peer(&who, HARDCODED_PEERSETS_SYNC);
return Err(())
} else if status.roles.is_light() &&
This else is redundant, please get rid of it. I love that you used roles.is_light() here; it is actually always !roles.is_full(), but having it made the code easier to read. Why are we not worried here about overflow?
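A sketch of the suggested cleanup, using hypothetical condition names (`too_many_full_peers` and `too_many_light_peers` are placeholders, not the real code): since the first branch ends in `return Err(())`, the `else` can simply be dropped.

```rust
if status.roles.is_full() && too_many_full_peers {
	debug!(target: "sync", "Too many full nodes, rejecting {}", who);
	self.behaviour.disconnect_peer(&who, HARDCODED_PEERSETS_SYNC);
	return Err(())
}
// No `else` needed: the branch above never falls through.
if status.roles.is_light() && too_many_light_peers {
	debug!(target: "sync", "Too many light nodes, rejecting {}", who);
	self.behaviour.disconnect_peer(&who, HARDCODED_PEERSETS_SYNC);
	return Err(())
}
```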
self.peers contains both full and light peers, but self.sync only contains full peers, so the subtraction should always succeed.
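In other words, the light-peer count is derived by subtraction, and the containment of self.sync's peers in self.peers is what rules out underflow. A sketch (the exact method names are illustrative, following the discussion above):

```rust
// Every peer tracked by `self.sync` is a full peer and is also present in
// `self.peers`, so `self.peers.len() >= self.sync.num_peers()` always holds.
let num_full = self.sync.num_peers();
let num_light = self.peers.len() - num_full; // cannot underflow
```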
I swear I opened an issue about this, but I can't find it. There's no need to "inject" a set of accept/deny conditions into the networking; you just have to add a new state to a peer, which is "waiting for acceptance/denial by the upper layers". This is not really doable right now because all new peer-substream events are reported to all the listeners, whereas making this accept/deny system work requires a specific higher-level owner for each peer-substream tuple.
I agree it's not great, but the fact that we're doing peer-to-peer networking is really helpful here, because nodes don't have to give a reason to deny a peer or close a substream.
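A rough sketch of the peer-state idea described above, purely illustrative (no such enum exists in the codebase as-is):

```rust
/// Hypothetical per-(peer, substream) state for an accept/deny flow.
enum SubstreamState {
	/// Substream is open, but the higher-level owner hasn't decided yet.
	PendingApproval,
	/// The owner accepted the peer; traffic may flow.
	Accepted,
	/// The owner denied the peer; the substream will be closed
	/// (no reason needs to be given to the remote).
	Denied,
}
```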
Merging, as we want to fork off Polkadot 0.9.16 today.
bot merge
* Split peer slots between full and light nodes
* Rustfmt
* Oops, accidentally removed a comma
* Remove else
Close #8610
This PR adds a new CLI option --in-peers-light whose value defaults to 100. The "existing slots" (--out-peers and --in-peers) are now reserved only for full nodes, and these 100 new additional slots are reserved for light clients.
Note that this concerns only the syncing, transactions, and GrandPa/Beefy protocols/peer sets. The parachains-related code and any other user-added protocols are not concerned.
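An illustrative invocation (the binary name and the full-node slot values are placeholders; only --in-peers-light and its default of 100 come from this PR):

```sh
substrate --in-peers 25 --out-peers 25 --in-peers-light 100
```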
Motivation
The problem that this PR solves is that, currently, all the "in peers" slots of all nodes are always occupied. Since there is an equal number of "out peers" and "in peers", each full node on the network brings as many slots as it consumes.
Because some nodes are unfortunately behind a NAT, there are actually fewer available "in peers" than there are "out peers" being consumed, making the problem worse.
The obvious solution to that problem would be to increase the number of "in peers". While this solves the problem in theory, in practice older nodes are more likely to be discovered, so their slots would fill up while younger nodes would not receive as many "in peers".
Having, for example, 125 full nodes connected to you also consumes a huge amount of resources, as a node sends all transactions and GrandPa messages to every full node it is connected to.
This PR proposes another solution: the number of "in peers" for full nodes doesn't change, but we add new "in peers" specifically for light nodes. Having a light node connected to you consumes significantly fewer resources.
Actual behavior
For pragmatic reasons, the actual behavior of this PR is not what I've just described.
The code that assigns slots (the peerset) isn't aware of in/out slots, and I am too scared to do the necessary refactoring (paritytech/polkadot-sdk#556), as each big refactoring has led to months of tedious debugging sessions.
Instead, in practice the node can, right now, also open outbound substreams towards light clients.
This means that a certain number of slots, both out and in, are dedicated to full nodes, and the rest is dedicated to light nodes.
I think that this is fine, especially because light clients are typically unreachable and will be purged quite rapidly, and because light clients are not discoverable through the DHT but only because they've connected to us in the past.
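A minimal sketch of the resulting accounting, with illustrative names (the real fields live in the network configuration; these exact definitions are assumptions):

```rust
/// Hypothetical view of the default peer set after this PR.
struct DefaultPeersSet {
	/// --in-peers + --out-peers: slots usable by full nodes,
	/// in either direction.
	num_full: u32,
	/// --in-peers-light: slots usable by light clients,
	/// in either direction as well.
	num_light: u32,
}

impl DefaultPeersSet {
	/// Total slots handed to the peerset, which itself does not
	/// distinguish inbound from outbound.
	fn total(&self) -> u32 {
		self.num_full + self.num_light
	}
}
```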