RFM17 - Provider Record Liveness #16
Conversation
Very good and well-detailed report!
Great description of the Hoarder and very interesting results. Would it make sense to add a small section between the results and the conclusion discussing which changes could be made to improve IPFS, as you briefly mention in the conclusion? This section could contain recommendations for republish intervals, the K replication parameter, and Hydra nodes, along with the trade-offs associated with these suggestions. The conclusion could then sum up the preferred changes described in this section.
The boxplots contain a lot of outliers stacking on top of each other. Would it be possible to use circles of different sizes according to the number of stacked outliers? This would help visualize the outliers a bit better.
Could you update the folder name implementations/rfm-17-provider-record-liveness to implementations/rfm17-provider-record-liveness?
In Figure 1, we can observe that the CDF follows a linear pattern, with an average of 39.06 CIDs (shown below) in each of the 256 normalized bins displayed. Although the distribution is fairly homogeneous, we can still appreciate in Figure 2 that the PDF's maximum and minimum values are 58 CIDs at 0.38 and 22 CIDs at 0.82, respectively.
Despite the randomness of the bytes used to generate the _CIDs_ and the homogeneity that the _SHA256_ hash function provides, the distribution might be affected by the relatively small size of the _CID_ dataset (10,000 CIDs).
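For reference, the 39.06 average per bin follows directly from spreading the 10,000 CIDs over the 256 normalized bins:

$$ \frac{10{,}000 \ \text{CIDs}}{256 \ \text{bins}} \approx 39.06 \ \text{CIDs per bin} $$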
Just as a note: you could have directly generated random CIDs without having to generate random content
I didn't know that; do you have a link to the method or an example?
I haven't seen an implementation, but generating a random multihash (instead of hashing random data) should work too
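A minimal sketch of that idea, assuming the go-cid and go-multihash libraries (this is not the code used by the Hoarder, just an illustration): wrap random bytes directly in a SHA2-256 multihash and build a CIDv1 around it, skipping the content-generation and hashing steps.

```go
package main

import (
	"crypto/rand"
	"fmt"

	"github.com/ipfs/go-cid"
	mh "github.com/multiformats/go-multihash"
)

// randomCID builds a CIDv1 around random bytes treated as a SHA2-256 digest,
// instead of hashing randomly generated content.
func randomCID() (cid.Cid, error) {
	digest := make([]byte, 32) // SHA2-256 digest length
	if _, err := rand.Read(digest); err != nil {
		return cid.Undef, err
	}
	// Wrap the random bytes in a multihash envelope with the SHA2-256 code.
	encoded, err := mh.Encode(digest, mh.SHA2_256)
	if err != nil {
		return cid.Undef, err
	}
	// Use the raw codec, since there is no actual content behind the CID.
	return cid.NewCidV1(cid.Raw, encoded), nil
}

func main() {
	c, err := randomCID()
	if err != nil {
		panic(err)
	}
	fmt.Println(c.String())
}
```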
**In-degree ratio**
Following the same comparison between K=20 with Hydras and the current K=20 without Hydras, Figure 42 shows the comparison of the _PR Holders_' in-degree ratio percentage when the Hydra filter is on and off. In the figure, we can see that participation follows a similar distribution, with the dataset that includes the Hydras showing a roughly 5% higher in-degree ratio. In both cases, the median never drops below 70%, a level the "with-hydras" dataset only reaches after ~32 hours.
In the figure, we can appreciate that the participation follows a similar distribution, where the data set that includes the hydras has a slightly 5% more in-degree ratio
Phrasing not very clear
The steady in-degree ratio measured for K=20 over more than 80 hours showed that the initial closest peers remain the closest ones for more than 48 hours. This measurement dismisses any existing doubt about the health of existing _PRs_, and it opens the possibility of decreasing the overhead of the network by increasing the _PR republish interval_.
In a currently over-saturated network, where running a DHT Server is far more CPU- and bandwidth-consuming than running a DHT Client, any window of improvement has to be taken. Although reducing the _K_ value to 15 would imply a 25% overhead reduction, it carries a performance risk that should be considered more carefully. Increasing the _PR republish interval_, however, seems a far more reasonable way to reduce the overhead without compromising performance and reliability.
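Just to spell out the arithmetic behind the 25% figure (taking the current default of K=20 replicas per provider record, as used throughout the report):

$$ \frac{K_{\text{current}} - K_{\text{new}}}{K_{\text{current}}} = \frac{20 - 15}{20} = 0.25 = 25\% $$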
Although reducing the K value to 15 would imply a 25% overhead reduction
25% of the number of Provider Records stored on each DHT server node?
I would say a reduction in PR-related processes (connection, bandwidth, CPU, and storage related to sending and storing PRs).
@yiannisbot I would say that it is only the storage that gets affected. Connections and bandwidth are expected to stay the same: each node will store 25% fewer Provider Records, but it will have to respond to proportionally more requests for each of them, as the records are stored on 25% fewer peers. Overall, the number of requests for the content stays the same.
There will be a small bandwidth reduction for the publish operation (~25%), as the content is published to 15 peers instead of 20, but I don't think that the publish operation is a big share of all requests.
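A rough way to see the per-holder effect (where R is a hypothetical total number of retrieval requests reaching the PR Holders of a given CID, introduced here just for illustration):

$$ \text{per-holder load: } \frac{R}{15} = \frac{20}{15}\cdot\frac{R}{20} \approx 1.33\cdot\frac{R}{20}, \qquad \text{total load: } R \ \text{(unchanged)} $$

So each of the remaining holders serves roughly a third more requests, while the aggregate request load for the content does not change.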
Co-authored-by: Guillaume Michel - guissou <[email protected]>
This is just excellent work @cortze! I've made quite a few suggested edits - please commit directly if you agree. Conclusions are very informative.
One important point that would be great to address is adding a TL;DR, or "Summary of Findings/Results", at the top of the report, probably just before the "Methodology" section. This would roughly be a one-sentence summary of every paragraph in the conclusions section. The report is rather long, so a reader would have to spend a lot of time reading and understanding before getting to the results. Some people might not even be interested in the details and just want the "Take Home" message. Please address this in the next iteration.
In Figure 13, we can perceive the expected abrupt drop at hour 24 that we previously introduced. As with the activity of the peers, during those first 24 hours we can find a certain stability, with the lower Q1 quartile set at 12 peers sharing the _PRs_. The disposition of the outliers also shows that none of the _CIDs_ reached a point where the _PRs_ were not retrievable.
This last statement is a bit trivial. Although we can assume that the _CIDs_ are reachable if the _PR Holders_ keep the records over the 24 hours, adversarial events in the peer-to-peer network, such as sudden high node churn, could leave a node isolated from the rest. The bigger impact of this isolation is, in fact, that a given _PR Holder_ might not be included in the other peers' routing tables. Therefore, no one would reach out to that isolated peer asking for the _CID_.
I agree that the phrasing can be softened here to include the cases when that would be possible (i.e., severe network fragmentation, or similar).
![img](../implementations/rfm-17-provider-record-liveness/plots/kcomparison/active_total_pr_holders.png)
<p style="text-align: center;">Figure 22. Comparison of the active PR Holders for the different K values (median on the left, average on the right)</p>
This pattern becomes even more evident when displaying the percentage of active _PR Holders_ (see Figure 23). Here we can clearly see that the difference between the K=15 and K=40 medians is on the order of 5% more active peers when we increase the _K_ value to 40 peers. In the graph displaying the averages, we can distinguish with higher resolution the initial drop (first 10 hours) and the subsequent catch-up (hours 20 to 25) previously mentioned. This pattern has been observed across all the different K values, and we attribute it to a specific set of users in the same time zone disconnecting for a few hours each day (it could be users shutting down their PCs during the night).
This is for active PR Holders, so it's intuitive AFAIU. The more peers you store the record with, the more peers will be active (and hold/provide it) after a period of time.
Co-authored-by: Yiannis Psaras <[email protected]>
Thank you very much @yiannisbot and @guillaumemichel for the support and the feedback!
Great work!
This is the first draft of the report for RFM17.
All feedback is more than welcome, so please go ahead and leave some comments. 😄
I still have to merge a few branches of the hoarder, so I didn't add the GitHub submodule yet (please be patient 😅, it will come soon).