Light clients shouldn't insert themselves in the DHT #3303
Comments
This seems fairly simple, but it's not super clear where a client adds itself to kademlia. substrate/core/network/src/discovery.rs Lines 105 to 108 in eda5fe5
No. Nodes insert you in their buckets when you connect to them. That code is in libp2p-kad. This issue is IMO extremely complex to tackle and requires some research efforts. |
Oh ok, good to know! Do you think you could mark this issue as |
To give some context. We use the DHT in order to discover nodes to connect to. Right now, full and light nodes are both present in the DHT. What can happen at the moment is that, by randomly walking through the DHT to discover nodes, if the number of light nodes compared to the number of full nodes is high enough, then we might only discover light nodes. This is bad, because what we need is connections to full nodes in order to function properly. Light nodes are purely parasites at the moment. We also have no way to know whether a node is full or light when we find it in the DHT. We have to connect to it and ask. In order to solve that problem, the solution that this issue title implies is that only full nodes are present in the DHT. Both light nodes and full nodes would find only full nodes to connect to through the DHT. Light nodes would also therefore never connect to each other directly. |
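To put a rough number on the ratio argument above, here is a back-of-the-envelope sketch. It assumes (this is not stated in the thread) that a random walk returns peers uniformly and independently, so each discovered peer is light with probability `light / (light + full)`. Real Kademlia walks are not uniform, and the function name `prob_only_light` is made up for illustration.

```rust
/// Probability that `samples` discovered peers are all light nodes.
///
/// Assumption: every discovered peer is drawn uniformly and independently
/// from a population of `light` light nodes and `full` full nodes.
pub fn prob_only_light(light: u64, full: u64, samples: u32) -> f64 {
    let total = light + full;
    if total == 0 {
        // No peers at all: nothing can be discovered.
        return 0.0;
    }
    let p_light = light as f64 / total as f64;
    // Independent draws: multiply the per-draw probability `samples` times.
    p_light.powi(samples as i32)
}
```

Under these assumptions, with 100 light nodes per full node and 25 discovered peers, the chance of finding no full node at all is about 0.78. With the ratio reversed, the same probability is negligible.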
This issue isn't urgent at the moment because the ratio of light nodes per full node on the existing networks is something like 1 to 100. Nobody uses light clients at the moment, except occasionally someone who tries if they work. However if we start advertising light clients, or releasing a UI containing a light client for example, then this issue will need to be tackled first. |
So I suppose a solution would be to allow nodes to send a message saying that they shouldn't be added into the DHT? |
What are the key and record for this DHT? If our key or record includes a cryptographic key, then we could ask that DHT entries be signed by their key, so nodes must explicitly ask for inclusion. We're maybe worried about adversarial spam DHT entries eventually, which makes everything harder. We could however privilege buckets whose key played some on-chain role and randomly drop others when the DHT came under excessive load, but.. We have many roles for the relay chain already, with some that sound tricky to recognize, à la fishermen. And parachain-specific roles make this much worse. If this becomes our approach then we could still punt on classifying the roles for quite a while. We cannot easily recognize a "full node" in such an adversarial setting. I could find some tricks like using H(KEX(bucket_holder_key,bucket_maker_key) || time) to identify some chain state the bucket maker must tell the bucket holder that the bucket holder should already know and can verify as correct. Anyways, my first question is simply: How far will simply asking that the DHT entries be signed go? Even if we ask for nothing about the signing key? |
We're not using the key/value system of Kademlia, but only the The keys are therefore the identities of the nodes, and there's no associated value. |
I see. You want nodes to make claim about their roles or desired roles when introducing themselves initially then? |
Yes, that's one possibility. I also feel like there should be a way to extend this mechanism for nodes belonging/collating for parachains for example. I would therefore put this issue in the "DHT research" bucket. |
I thought w3f/parity wrote its own version of libp2p in rust, isn't this simply a case of pinging the author of that and ask him to provide a knob to do what this issue requires (i.e. for certain nodes to not add themselves to the DHT)? |
We change rust-libp2p as we see fit :) which makes @tomaka the relevant author. |
Ultimately the DHT needs an access policy that prevents light clients from adding themselves even if they tried to. |
The first problem is that it's not as trivial as you seem to think. Adding a handshake saying whether we are full or light ties Kademlia to Substrate and adds lots of additional roundtrips compared to right now. Also, any modification to the libp2p-kad code would obviously violate the specs. |
@tomaka I meant a local-only option, at least for the time being. A light client knows it's a light client and can just omit the DHT step. |
Nodes don't choose to insert themselves in the DHT, they get immediately inserted by others when they open a connection. |
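The point above can be illustrated with a toy model. This is only a sketch: `PeerId` is a plain integer and `RoutingTable` is a flat set, both hypothetical stand-ins rather than the libp2p-kad types. What it shows is that insertion happens on the side that receives the connection, so the connecting node has nothing to opt out of locally.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a libp2p peer identity.
pub type PeerId = u64;

/// Toy routing table: a flat set instead of real k-buckets.
#[derive(Default)]
pub struct RoutingTable {
    peers: HashSet<PeerId>,
}

impl RoutingTable {
    /// Runs on the node that *receives* the connection. The remote is
    /// inserted unconditionally; it never asked to be added.
    pub fn on_incoming_connection(&mut self, remote: PeerId) {
        self.peers.insert(remote);
    }

    /// Runs when the local node finds out that a peer is unreachable.
    pub fn on_unreachable(&mut self, remote: PeerId) {
        self.peers.remove(&remote);
    }

    /// Whether `peer` is currently in the table.
    pub fn contains(&self, peer: PeerId) -> bool {
        self.peers.contains(&peer)
    }
}
```

In this model a light client that connects briefly stays in the table until the local node notices it is gone.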
I agree we should do this in a "compelling" and "correct" way. :) At the libp2p layer, we should ideally provide Protocol Labs at least some solid technical reasons to follow our lead because our Go implementation should actually use go-libp2p, so Protocol Labs accepting the Go team's PRs helps us. |
Further discussion of restricting peers in the Kademlia routing table happens in libp2p/rust-libp2p#1560. The restriction ability in combination with the |
We didn't consider this issue when doing #6549, but the foundations should now be in place. |
Substrate already supports multiple DHTs. Would it be possible to have light clients insert themselves in a different DHT, thus allowing them to still connect to each other? |
I suppose but it sounds non-scalable if you mean true unaffiliated light clients. We'll already want this for more structured stuff like parachains. |
According to [0] it is possible to achieve subsecond lookups with a median latency of 200ms in a Kademlia DHT with 9.5 million nodes. How many nodes is it reasonable to expect? Maybe we can deal with more than 9.5 million nodes when we get there? |
There are over 1 billion Visa cards in the world, many affiliated with some phone, and like 30 million Visa merchants. ;) We want nodes to play specific roles in chains or in layer two systems, and different roles obtain different evidence from chains, so the term "light client" alone makes little sense. We're working on techniques for one chain's full nodes to track another chain, especially for parachain nodes to track the relay chain, and for validators assigned to a parachain to talk to collators from that parachain. We do ask parachain collators to be full nodes of the relay chain right now, which limits things considerably. |
Why? There's much more incentive to attack the Polkadot DHT than the bittorrent DHTs. The main thing we're concerned about here is spam - we don't want the validator address book to get spammed, which prevents people from finding validators. |
That's why I suggested adding them in a separate DHT. Practical applications require some way for mobile or web applications that use Substrate to be able to communicate with each other. So while you might build something like YouTube on Substrate, to handle micropayments to content producers, it's not a good idea for content producers to try to insert their content into a transaction. So in that example, they could add their content to IPFS and publish the CID on chain, and users could query the chain for the content they want. Whether this particular example is a good fit for blockchains in general, I don't know, but obviously it's very limiting in terms of what you can do if you only have thin clients. |
To add some context, Substrate exposes the NetworkService and a generic request-response protocol as a public API. Maybe it's only intended to be used in very specific ways by Polkadot. In that case it should be documented as a Polkadot API that should not be used by other applications. |
A big concern with light clients is that they might briefly connect to the network just to send a transaction, then disappear. We don't want this type of ephemeral node to play a role in the DHT. The problem at the moment is that as soon as we receive an incoming connection from a node of the same chain, we add this node to our local k-buckets. While it is not a huge problem, as the node will eventually be removed from its k-bucket when we realize it is unreachable, it is still somewhat of a pollution. I don't think we should have a code that says |
Update on the issue: after #6549, the only thing left to do is for light clients to no longer advertise support for the Kademlia protocol. Marking as easy, might need some minor changes in libp2p. |
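As the update above implies, the remaining step relies on the receiving side only inserting peers that advertise the Kademlia protocol. A minimal sketch of that idea follows, under stated assumptions: `KAD_PROTOCOL` is a made-up protocol name, `PeerId` is a plain integer, and `FilteredRoutingTable` is a hypothetical type, not the rust-libp2p or Substrate API.

```rust
use std::collections::HashSet;

/// Made-up protocol name for illustration only.
pub const KAD_PROTOCOL: &str = "/example-chain/kad";

/// Hypothetical stand-in for a libp2p peer identity.
pub type PeerId = u64;

/// Toy routing table that filters on advertised protocols.
#[derive(Default)]
pub struct FilteredRoutingTable {
    peers: HashSet<PeerId>,
}

impl FilteredRoutingTable {
    /// Called once the remote has reported which protocols it supports.
    /// Inserts the peer only if it advertises `KAD_PROTOCOL`; returns
    /// `true` when the peer was newly inserted.
    pub fn on_identified(&mut self, remote: PeerId, advertised: &[&str]) -> bool {
        if advertised.contains(&KAD_PROTOCOL) {
            self.peers.insert(remote)
        } else {
            false
        }
    }

    /// Whether `peer` is currently in the table.
    pub fn contains(&self, peer: PeerId) -> bool {
        self.peers.contains(&peer)
    }
}
```

With a filter like this, a light client stays out of other nodes' tables simply by leaving the Kademlia protocol off the list it advertises, while still being able to query the DHT as a client.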
Cross-referencing libp2p's Kademlia client mode here libp2p/rust-libp2p#2025 (comment). |
Hey, is anyone still working on this? Due to the inactivity this issue has been automatically marked as stale. It will be closed if no further activity occurs. Thank you for your contributions. |
This is implemented for the listening side, however the light client still advertises support for Kademlia. |
Light client support has been removed from Substrate altogether, so this is irrelevant. |
EDIT: (see description below)