DHT Optimistic Provide #25
Copy-pasting my first comment from the Slack thread for continuity: “Optimistic providing” is a very promising direction and the repo presentation is nice and clear. The provide process visualizations are great to have going forward, and even the distribution graphs for provide and find would be awesome to have as occasional automated reports in our infra. The project proposes a procedure for estimating the number of peers closer to a key than a given peer, assuming knowledge of the total count of peers. This procedure should be investigated more carefully:
The “optimistic provide” ideation also suggests a slightly different approach to "optimistic providing", one that avoids the imprecision of the estimation procedure (above). I would investigate this direction, and I think it has good potential. Provide optimistically, using the following algorithm:
The key point here is that when the condition "the two queries intersect" is met, we have probabilistically verified that if a client is to search for the key, they would also discover the intersection peers. The second query in the provide algorithm essentially simulates a future client. Doing it this way avoids making estimates, which can be imprecise.
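A toy simulation of this estimator-free idea (my own sketch under strong simplifying assumptions, not code from the thread: each query sees only a random subset of the network to model imperfect routing tables; `lookup` and the peer model are hypothetical):

```python
import random

def xor_distance(a: int, b: int) -> int:
    return a ^ b

def lookup(key: int, peers: list[int], k: int = 20, sample_frac: float = 0.8) -> set[int]:
    """Toy stand-in for a DHT query: each query only sees a random subset
    of the network and returns the k closest peers it encountered."""
    seen = random.sample(peers, int(len(peers) * sample_frac))
    return set(sorted(seen, key=lambda p: xor_distance(p, key))[:k])

def optimistic_provide(key: int, peers: list[int], k: int = 20) -> set[int]:
    """Run two independent queries and store records at the peers both found.
    The intersection probabilistically simulates what a future client's
    lookup would also discover."""
    q1, q2 = lookup(key, peers, k), lookup(key, peers, k)
    return q1 & q2

random.seed(1)
peers = [random.getrandbits(32) for _ in range(5000)]
key = random.getrandbits(32)
targets = optimistic_provide(key, peers)
print(len(targets), "peers selected for the provider record")
```

Strengthening the guarantee with three queries, as suggested below in the thread, would just intersect a third `lookup` result as well.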
@yiannisbot mentioned that additional graphs (not yet shown here) suggest that under some parameterization of the "optimistic provide" algorithm (the one that uses an estimator), clients find the peers where content was provided optimistically in 1 to 2 hops. This is good news; however, we need a bit more to trigger a potential implementation:
Hi @petar, thanks for the quick and thoughtful reply! My overall takeaways are:
Another thing that I'm still thinking about: would the provide process stop when 20 records (or ten in your case) were put optimistically, or would the process still continue? I can imagine that there are cases where peers are not able to find content because of the optimistic nature. I guess the success metrics measurements could help answer that question. [1] https://eighty-twenty.org/2013/03/26/span-lengths-in-chord-style-dhts
This is precisely the estimator which suffers from large inaccuracies due to the uneven distribution of random IDs (some IDs are log N-times closer together than others). You are welcome to investigate different estimators and I would be happy to follow your findings. However, my theoretical experience suggests that this is likely not a fruitful direction: I don't think it is possible to estimate network size sufficiently accurately based on local observations (i.e. things like "my distance to nearest peer"). In order to understand this phenomenon of "varying density" of peers in the key space, you could do the following experiment:
This is an alternative way of seeing that the "distance to my closest peer, using the normed XOR distance" varies wildly from peer to peer. (The normed XOR distance between two peers equals 2^{-d}, where d is the depth of the peers in the trie.)
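The "varying density" experiment can be simulated in a few lines (my own sketch, not from the thread): generate random 32-bit IDs, compute each peer's normed XOR distance to its nearest neighbour by brute force, and look at the spread.

```python
import random

def normed_xor(a: int, b: int, bits: int = 32) -> float:
    """Normed XOR distance: XOR of the two IDs scaled into [0, 1)."""
    return (a ^ b) / 2**bits

random.seed(7)
n = 1000
ids: set[int] = set()
while len(ids) < n:                 # distinct random peer IDs
    ids.add(random.getrandbits(32))
peers = list(ids)

# For each peer, brute-force the normed distance to its closest neighbour.
nearest = [min(normed_xor(p, q) for q in peers if q != p) for p in peers]

print(f"min nearest-neighbour distance: {min(nearest):.2e}")
print(f"max nearest-neighbour distance: {max(nearest):.2e}")
print(f"spread (max/min): {max(nearest) / min(nearest):.0f}x")
```

Even with IDs drawn uniformly, the nearest-neighbour distance typically varies by orders of magnitude across peers, which is exactly the inaccuracy an estimator based on "my distance to nearest peer" inherits.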
The process would stop. It's key to understand the idea driving this algorithm. At a high level, both algorithms are trying to ensure that the predicate "if someone looks for the CID I am providing, their search for the CID should lead them to the nodes where I decided to provide the records" is true probabilistically with respect to the key space. The estimator-free algorithm ensures this directly by "probabilistically proving" that the provided records will be found by other users:
In fact, this guarantee can be strengthened if necessary by running 3 (instead of 2) queries in parallel and placing the records where they all intersect.
The big picture here is this: Both estimator-based and estimator-free provide algorithms would work (in theory), when the correct parameters are used. The question is which one is less costly for the benefit of achieving better latency.
The two costs need to be compared to pick a winner. Both algorithms must be proven to work correctly, which must be demonstrated by showing that records that are provided optimistically in most places in the key space can be found by a subsequent lookup.
Thanks again, @petar! I did some simulations based on your input and the links that I've posted above. I definitely could get a feeling for the inaccuracies of the proposed estimator. I basically recreated similar plots to [1]. I'm leaving here some research about how to locally estimate the size of a peer-to-peer network [2] [3]. Regarding performance indicators/success metrics I would consider:
I'll have a chat with @yiannisbot about pushing this topic forward next week. I'm definitely motivated to further work on this and also come up with statistical models that could assess the incurred costs of either approach. [1] https://eli.sohl.com/2020/06/05/dht-size-estimation.html
For anyone watching this discussion, this functionality is rolling out into the Kubo IPFS implementation (see https://protocollabs.notion.site/Optimistic-Provide-2c79745820fa45649d48de038516b814 and ipfs/kubo#9753)
Copying the forum post to this repository.
Hi everyone,
Over the last weeks, some ResNetLab collaborators (including me) conducted IPFS uptime and content routing measurements with great support from @yiannisbot. Some results can be found here, here and here. In parallel, I went ahead and built another DHT performance measurement tool for the particular use case of investigating the provide performance as it lags far behind the discovery performance.
While investigating how the whole machinery works I had an idea of how to speed things up that I would like to share with you and also would love to get feedback on.
I would call it an "Optimistic Provide" and the TL;DR would be:
For the long version, I'm copying relevant parts from this repository
https://github.com/dennis-tra/optimistic-provide
which includes more (but partially irrelevant) plots than I'm sharing here in this post.
Motivation
When IPFS attempts to store a provider record in the DHT, it tries to find the 20 closest peers to the corresponding `CID` using the XOR distance. To find these peers, IPFS sends `FIND_NODES` RPCs to the closest peers in its routing table and then repeats the process for the set of returned peers. There are two termination conditions for this process:
This can lead to huge delays if some of the 20 closest peers don't respond in a timely manner or are outright unreachable.
The following graph shows the latency distribution of the whole provide process for 1,269 distinct provide operations.
In other words, it shows the distribution of how long it takes for the `kaddht.Provide(ctx, CID, true)` call to return. At the top of the graph, you can find the percentiles and the total sample size. There is a huge spike at around 10s, which is probably related to an exceeded context deadline - not sure though.
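The lookup loop described above can be sketched as a toy model (my own illustration with fully known routing tables and no network latency; `iterative_lookup` and the parameters are assumptions, not the go-libp2p-kad-dht implementation):

```python
import random

def xor(a: int, b: int) -> int:
    return a ^ b

def iterative_lookup(key, routing_tables, start, k=20, alpha=3):
    """Toy Kademlia-style lookup: repeatedly query the alpha closest
    unqueried candidates until the k closest known peers have all been
    queried (one of the termination conditions of the real process)."""
    candidates = set(start)
    queried = set()
    while True:
        closest = sorted(candidates, key=lambda p: xor(p, key))[:k]
        to_query = [p for p in closest if p not in queried][:alpha]
        if not to_query:              # k closest peers all queried -> done
            return closest
        for p in to_query:
            queried.add(p)
            # each queried peer replies with the k closest peers *it* knows
            candidates.update(sorted(routing_tables[p], key=lambda q: xor(q, key))[:k])

random.seed(3)
peers = random.sample(range(2**32), 500)
tables = {p: random.sample(peers, 30) for p in peers}
result = iterative_lookup(random.getrandbits(32), tables, start=random.sample(peers, 3))
print(len(result), "closest peers found")
```

In the real network, each round of this loop is a network round trip, which is why slow or unreachable peers near the key can stall the whole provide call.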
If, on the other hand, we look at how long it took to find the peers that we eventually attempted to store the provider records at, we see that it takes less than 1.6s in the vast majority of cases.
Again, the sample size and percentiles are given in the figure title. The sample size corresponds to `1269 * 20`, as in every `Provide` run we attempt to save the provider record at 20 peers. The same point can be made if we take a look at how many hops it took to find a peer that we eventually attempted to store the provider records at:
Note the log scale of the `y`-axis. Over 98 % of the time, an appropriate peer to store the provider record at was found in 3 hops or less.
Optimistic Provide
The discrepancy between the time the provide operation takes and the time it could have taken led to the idea of just storing provider records optimistically at peers.
This would trade storing these records on potentially more than 20 peers for decreasing the time until content becomes available in the network.
Further, it requires a priori information about the current network size.
Procedure
Let's imagine we want to provide content with the CID `C` and start querying our closest peers. When finding a new peer with peer ID `P`, we calculate the distance to the CID `C` and derive the expected number of peers `μ` that are even closer to the CID (than the peer with peer ID `P`). If we norm `P` and `C` to the range from `0` to `1`, this can be calculated as:

μ = N · ||P − C||

where `N` is the current network size and `|| . ||` corresponds to the normed XOR distance metric. The logic would be that if the expected value `μ` is less than 20 peers, we store the provider record at the peer `P`. This threshold could also take the standard deviation into account and could generally be tuned to minimize falsely selected peers (peers that are not in the set of the 20 closest peers).
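The calculation above can be sketched in a few lines (the 256-bit key width and the 20-peer threshold follow the text; function names are my own illustrations):

```python
def normed_xor_distance(c: int, p: int, bits: int = 256) -> float:
    """Normed XOR distance: XOR of the two IDs scaled into [0, 1)."""
    return (c ^ p) / 2**bits

def expected_closer_peers(c: int, p: int, network_size: int, bits: int = 256) -> float:
    """mu = N * ||P - C||: with N uniformly distributed peer IDs, the
    expected number of peers closer to C than P."""
    return network_size * normed_xor_distance(c, p, bits)

def store_optimistically(c: int, p: int, network_size: int,
                         threshold: int = 20, bits: int = 256) -> bool:
    """Store the provider record at P if fewer than `threshold` peers are
    expected to be closer to the CID."""
    return expected_closer_peers(c, p, network_size, bits) < threshold

# Worked numbers from the example below: ||P - C|| = 0.1 % and N = 7000.
p = 2**256 // 1000          # an ID at normed distance ~0.001 from C = 0
print(expected_closer_peers(0, p, 7000))   # ≈ 7
print(store_optimistically(0, p, 7000))    # True: 7 < 20
```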
Example
The following graph shows the distribution of normed XOR distances of the peers that were selected to store the provider record to the CID that was provided.
The center of mass of this distribution is roughly at
0.1 %
. So if we find a peer that has a distance of|| P - C || = 0.1 %
while the network has a size ofN = 7000
peers we would expect to find7
peers that are closer than the one we just found.Therefore, we would store a provider record at that peer right away.
Network Size
Since the calculation above needs information about the current network size there is the question of how to get to that information locally on every node. I could come up with three strategies:
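One plausible local approach (an illustrative assumption on my part, not necessarily one of the three strategies) is to estimate `N` from the distances of the closest peers to a random key: for `N` uniformly distributed IDs, the expected normed distance of the `i`-th closest peer is `i / (N + 1)`, which can be fitted by least squares. A minimal sketch:

```python
import random

def estimate_network_size(distances: list[float]) -> float:
    """Least-squares fit of N from the sorted normed distances of the k
    closest peers to a key: E[d_i] = i / (N + 1) for uniform IDs, so we
    fit x = 1 / (N + 1) minimizing sum_i (d_i - i * x)^2."""
    num = sum(i * d for i, d in enumerate(distances, start=1))
    den = sum(i * i for i in range(1, len(distances) + 1))
    return den / num - 1

# Sanity check against a simulated network of known size. We model normed
# distances as uniform order statistics, measured from key 0 w.l.o.g.
random.seed(5)
true_n = 10_000
ids = sorted(random.random() for _ in range(true_n))
closest = ids[:20]                      # distances of the 20 closest "peers"
print(round(estimate_network_size(closest)))
```

A single sample is still noisy (as the varying-density discussion above suggests), so in practice such an estimate would need to be averaged over many lookups.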
As said above, measurement methodology and more information are given in this repository.
Generally, I would love to hear what you think about it, if I made wrong assumptions, if I got my maths wrong, if anything else seems fishy, etc 👍
Best,
Dennis
PS: @yiannisbot pointed me to this IPFS Content Providing Proposal. If the proposal outlined in this post proves to be viable, I would consider it a fifth option among the techniques to speed things up.