Replies: 4 comments 4 replies
-
@mxinden @thomaseizinger Sorry in advance for the tag if you already get notifications on discussions, but I wanted to make sure the SMEs were notified 😄 Thanks in advance!
-
I see it as a safe default. As a node in a large network, one only needs to store a small amount of data.
I don't think this use-case ever came up. Actually, I don't think
:D Not at all.
-
While we're talking about modifying
-
Closing this in favor of #3035. Open to discussing new APIs and use cases there: #3035
-
I'll start by suggesting that this may be a naive question as I am not intimately familiar with the overall Kademlia protocol...so please bear with me :)
Premise
I'm looking to implement a DHT that will have an arbitrarily large number of keys... say, in the millions. In practice, I expect the key space to be distributed over a wide number of nodes, but initially, and in contrived testing scenarios, the keys will live on only a handful of nodes. Additionally, this DHT will only be storing provider information, i.e. "for key X, you can connect to peer Y to go get it." It will not be storing, nor obviously serving, the data records themselves. So think of it as only being used as a routing table for keys.
Situation
Looking at the rust-libp2p implementation of KAD-DHT, I'm seeing this concept of a `RecordStore` that stores both `Record`s and `ProviderRecord`s. This trait has one canned implementation, `MemoryStore`. Looking at the definition of `RecordStore` and the implementation of `MemoryStore` and its default configuration, I noticed a few things:
- `MemoryStore`'s default configuration is fairly constrained, i.e. a small max number of keys and providers.
- `RecordStore` operations are defined to only be fallible in ways that exceed configured constraints (number of keys/providers, size of value). It doesn't allow for any other implementation-defined fallibility semantics.
Problem
Due to my premise, where I will have many many keys, and any particular node could serve an arbitrarily high number of these keys (including all of them), I am running into a few issues with the current implementation.
First, `MemoryStore`'s default config is obviously too low. Bumping those numbers up is easy enough. But it begs the question: why have integrated constraints like this at all? Are there protocol-level semantics for "value too large" and "max keys reached", or was that just a rust-libp2p decision, and really there could be any implementation-defined way a key may fail to be stored/retrieved/etc.?
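For context, bumping those defaults looks roughly like this in my setup. This is only a sketch: the `MemoryStoreConfig` field names and constructors are the ones I see in the version I'm using and may differ in other releases, and `build_kademlia` is just an illustrative helper of my own.

```rust
use libp2p::kad::record::store::{MemoryStore, MemoryStoreConfig};
use libp2p::kad::Kademlia;
use libp2p::PeerId;

fn build_kademlia(local_peer_id: PeerId) -> Kademlia<MemoryStore> {
    // Start from the defaults and raise only the limits that matter for a
    // provider-only DHT with millions of keys.
    let mut store_config = MemoryStoreConfig::default();
    store_config.max_records = 10_000_000;       // plain records (we barely use these)
    store_config.max_provided_keys = 10_000_000; // provider records, which we rely on heavily
    // max_value_bytes / max_providers_per_key are left at their defaults since
    // we only store provider information, not the data itself.

    let store = MemoryStore::with_config(local_peer_id.clone(), store_config);
    Kademlia::new(local_peer_id, store)
}
```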
Second, as mentioned, each node will potentially be providing a large number of keys. It is safe to assume that this list of keys will be fairly expensive to re-load into memory every time the node starts up and joins the network. So rather than using the `MemoryStore`, I wanted to implement my own `RecordStore` backed by something persistent, like RocksDB. A node will always have a consistent view of what keys it can provide to the rest of the DHT network, and I thought it might be prudent to have a `RecordStore` implementation that could persist and stay consistent with that view; if the node is recycled (e.g. an upgrade to the software) it can pick back up where it left off without an expensive startup to prime the `RecordStore`.
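To make that concrete, here is the rough shape of the persistence layer I have in mind. It's only a sketch: `ProviderDb` is a made-up name, the RocksDB calls and the `Key`/`ProviderRecord` accessors are written from memory, and the encoding is a placeholder; the real thing would sit behind a `RecordStore` implementation that delegates to it.

```rust
use libp2p::kad::record::{Key, ProviderRecord};
use rocksdb::{IteratorMode, DB};

/// Hypothetical persistent backing store for provider records.
struct ProviderDb {
    db: DB,
}

impl ProviderDb {
    fn open(path: &str) -> Result<Self, rocksdb::Error> {
        Ok(Self { db: DB::open_default(path)? })
    }

    /// Persist one provider record under its DHT key.
    fn put_provider(&self, record: &ProviderRecord) -> Result<(), rocksdb::Error> {
        // Placeholder encoding: key bytes -> provider peer id bytes. A real
        // store would keep (key, provider) pairs plus expiry and addresses.
        self.db.put(record.key.to_vec(), record.provider.to_bytes())
    }

    /// Iterate all persisted keys at startup instead of re-priming from scratch.
    fn provided_keys(&self) -> impl Iterator<Item = Key> + '_ {
        self.db
            .iterator(IteratorMode::Start)
            .filter_map(Result::ok)
            .map(|(k, _v)| Key::from(k.to_vec()))
    }
}
```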
Well, that isn't very conducive given the current `RecordStore` trait definition, going back to my earlier point. The trait methods are defined as if they could never fail, with the exception of the "too large"/"too many" constraints. So if I fail to load up my underlying persistent store, fail to retrieve a key, fail to store a key, etc., with the current trait definition those errors would have to be silently ignored. And given that a DHT has to be robust and self-correcting anyways, that's probably fine conceptually. But still, I wondered why the trait is designed the way it currently is, if not for that exact reason?
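For anyone reading along, this is roughly the shape I'm reacting to. It is a condensed paraphrase written from memory, not the actual trait (the iterator associated types and lifetimes are omitted, and the exact signatures vary across releases):

```rust
use std::borrow::Cow;
use libp2p::kad::record::{store::Error, Key, ProviderRecord, Record};
use libp2p::PeerId;

// Not the real trait: a stand-in to show which operations can fail.
trait RecordStoreShape {
    // The only fallible operations, and the error type is limited to the
    // constraint violations mentioned above (too many records / provided
    // keys, value too large).
    fn put(&mut self, r: Record) -> Result<(), Error>;
    fn add_provider(&mut self, record: ProviderRecord) -> Result<(), Error>;

    // Everything else is infallible by signature, so an I/O error from a
    // persistent backend has nowhere to go except a log line.
    fn get(&self, k: &Key) -> Option<Cow<'_, Record>>;
    fn providers(&self, key: &Key) -> Vec<ProviderRecord>;
    fn remove(&mut self, k: &Key);
    fn remove_provider(&mut self, k: &Key, p: &PeerId);
}
```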
Lastly, I wonder if what I am doing is completely ludicrous for KAD-DHT. I figured if it can power IPFS, it would certainly work for my use case. Surely IPFS has some notion of persistent state for its DHT and what keys a node advertises as a provider. Is there another approach I should take with KAD-DHT instead? I essentially am only going to use it for two operations, roughly as sketched after this list:
- A node that can provide the data for a key will call `Kademlia::start_providing`. And many nodes will do the same for the same key.
- A node that wants the data for a key will issue a `Kademlia::get_providers` query for the key, and if one is found, will contact that node using a separate protocol to attempt to retrieve the data (and if that fails, try another provider, and so on and so forth).
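Condensed into code, the intended usage is something like the sketch below. `start_providing` and `get_providers` are the methods I mean; the event and result variant names are from the release I'm reading and may differ in others, and the handlers are placeholders of my own.

```rust
use libp2p::kad::record::{store::MemoryStore, Key};
use libp2p::kad::{GetProvidersOk, Kademlia, KademliaEvent, QueryResult};

/// Announce that this node can serve `key` (each providing node does this).
fn announce(kad: &mut Kademlia<MemoryStore>, key: Key) {
    // start_providing is fallible only via the store constraints discussed above.
    if let Err(e) = kad.start_providing(key) {
        eprintln!("failed to start providing: {e:?}");
    }
}

/// Ask the DHT who can serve `key`; the answer arrives later as a behaviour event.
fn lookup(kad: &mut Kademlia<MemoryStore>, key: Key) {
    let _query_id = kad.get_providers(key);
}

/// Handle the eventual query result (variant names may differ by version).
fn on_event(event: KademliaEvent) {
    if let KademliaEvent::OutboundQueryCompleted {
        result: QueryResult::GetProviders(Ok(GetProvidersOk { providers, .. })),
        ..
    } = event
    {
        for peer in providers {
            // Placeholder: dial `peer` over a separate data-transfer protocol
            // and try to fetch the value; on failure, move on to the next one.
            println!("candidate provider: {peer}");
        }
    }
}
```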
Thanks in advance for any help and advice!