[RFC 0122] IPFS CID optionally on narinfo in binary caches #122
Conversation
Signed-off-by: lucasew <[email protected]>
We should be able to use the existing CA field for this. That has many other benefits, too. That is what we did in our IPFS Nix work.
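For context, a minimal sketch of the `CA` field as it already appears in narinfos for content-addressed paths (the hash value here is a placeholder, not a real hash):

```
CA: fixed:r:sha256:<base32 NAR hash>
```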
IPFS is still not a present reality in the mainstream Nix ecosystem. Although it's not reliable for storing data long term, it can reduce bandwidth costs for both servers and clients; the question is where the NAR file could be obtained on IPFS.
It's not expected that, for example, cache.nixos.org would run an IPFS daemon for seeding, but it could just calculate the hash using `ipfs add -nq $file` and provide it in the narinfo so other nodes can figure out alternative places to download the NAR files, possibly even closer than a CDN could be.
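A minimal sketch of what that server-side step could look like, assuming the `ipfs` CLI is available (`-n`/`--only-hash` computes the CID without storing the data, `-q` prints only the hash; the `IpfsCid` field name is hypothetical, not an existing narinfo field):

```sh
# Compute the CID of a NAR without adding it to the local IPFS repo.
cid=$(ipfs add -nq /path/to/example.nar.xz)

# Advertise it in the corresponding narinfo (hypothetical field name).
echo "IpfsCid: $cid" >> example.narinfo
```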
One little concern is that a given file doesn't have exactly one CID: depending on how you chunk the file, you can get effectively unlimited different CIDs. This isn't a problem when the CID distributor starts the seed and the CID stays live on the network, because whatever CID is advertised will be fetched. However, for a case like this it matters a lot, because different settings will result in a would-be seeder generating the wrong CID.
IIUC the current default for `ipfs add` is fixed-size blocks of 262144 B each (aka `size-262144`). However, for a nixpkgs cache where subsequent versions of a derivation may be largely similar, it may make more sense to use a smarter chunker based on a rolling hash.
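This is easy to see with the `ipfs` CLI; hashing the same file with different chunkers yields different CIDs (a sketch, assuming a local `ipfs` installation):

```sh
# Same file, different chunkers, different CIDs:
ipfs add -nq --chunker=size-262144 example.nar.xz
ipfs add -nq --chunker=size-1024 example.nar.xz
ipfs add -nq --chunker=rabin-2048-65536-131072 example.nar.xz
# Each command prints a different CID for identical content.
```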
Anyways, the exact chunking mechanism is bikeshedding, but what do we want to do about this? I see a few main options.
1. Put the chunker into the narinfo so it can be reproduced; see the sketch after this list. (I don't know if there is a well-defined standard format, but current go-ipfs uses strings like `size-262144` and `rabin-2048-65536-131072`, which are pretty easy to understand and unlikely to be ambiguous.)
2. Declare a chunker upfront and expect people to use it. (We can revert to 1 in the future by adding the chunker information later.)
3. Convince cache.nixos.org to also run an IPFS node that advertises the CIDs that are advertised in the narinfo files.
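For option 1, a would-be seeder needs both the CID and the chunker that produced it. A minimal sketch of how a cache could emit both, with entirely hypothetical field names:

```sh
# Record the chunker alongside the CID so the result is reproducible
# (field names are illustrative, not part of any narinfo spec):
chunker=size-262144
cid=$(ipfs add -nq --chunker="$chunker" example.nar.xz)
cat >> example.narinfo <<EOF
IpfsCid: $cid
IpfsChunker: $chunker
EOF
```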
rsync has a pretty interesting algorithm for syncing files (https://stackoverflow.com/questions/1535017/rolling-checksums-in-the-rsync-algorithm); there may be something in that, though it's probably not directly portable to IPFS and its chunking.
I'd vote for 3, and get that working today (or perhaps tomorrow), and think about options 1/2 for the day after tomorrow (or some point in the future).
Thanks for your detailed analysis of this; my understanding of NARs on IPFS has increased!
This is basically equivalent to Rabin chunking. But the biggest problem isn't which algorithm to use; it's how to know which algorithm was used.
For this we could do the same thing we already do with hashes, e.g. `sha256:something`.
AFAIK IPFS has symbol-friendly names for the chunking methods.
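A sketch of that encoding, by analogy with `sha256:...` hashes (the `IpfsCid` field name and the CID value are placeholders, not anything specified):

```sh
# Hypothetical narinfo line:
#   IpfsCid: size-262144:bafybeig...placeholder
# A substituter could split the value on the first colon:
value="size-262144:bafybeig...placeholder"
chunker=${value%%:*}   # -> size-262144
cid=${value#*:}        # -> bafybeig...placeholder
```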
Other possibilities of chunking with casync: https://discourse.nixos.org/t/nix-casync-a-more-efficient-way-to-store-and-substitute-nix-store-paths/16539
I really don't care about the chunking algorithm. Please stop discussing this here.
What I care about is that we record the chunking algorithm in a way that someone who wishes to advertise this path can do so.
This RFC is now open for shepherd nominations!
I suppose I could shepherd this, but really I want, and should soon be able, to write a counter-proposal RFC for the work we did in 2020. So perhaps there ought to be one shepherd team for the two "competing" RFCs (though it's really more about prioritizing features than actual disagreement).
I'll volunteer as shepherd. (Note from RFCSC: we need a few more nominations in the next few weeks, otherwise this will be put on standby.)
I am now thinking this is probably fine as a complement. We did a lot of different things in our 2020 IPFS × Nix saga, but the thing I would like to focus on first is distributing and archiving source code. Conversely, this is mainly about build artifacts. Thus, no conflict! I am confident the two approaches will bore the "tunnel" from both ends, and so there will be a grand meeting in the middle eventually. The one thing I would do is generalize: instead of thinking of IPFS in particular, we think "narinfo".
You might take a look at NixOS/nix#3727, which I locally fixed conflicts with. (Tests, however, are broken; still debugging, so I didn't push yet.) That goes a few steps further in trying to put the narinfos in IPFS as IPLD rather than as files, but this should be complementary: if we do that, we can also share lots of code between both approaches.
I nominate myself as a shepherd.
Looks like we have the required number! :)
#nix-rfc-122:matrix.org
Any updates on the status of this RFC?
We (or I) need to build a proof of concept. Maybe we will pivot this RFC to an LRU-based cache proxy approach at the beginning and iterate toward a p2p approach if necessary, but I have no time to test it now; I am very busy because of the end of the semester. The plan is to apply that prototype in an organization to reduce internet usage for things people often need, so the prototype should be working by the end of the year, or I definitely will not get my degree by the end of the year xD.
Sounds good! On behalf of the Steering Committee, I'd like to suggest moving the RFC to draft status until then. Any objections?
The idea is to optionally provide the CID of the NAR file in the binary cache, to reduce bandwidth costs and, in some cases, increase efficiency by allowing users to download the binary cache's NAR files over IPFS.
Rendered
This RFC was abandoned by the author: their primary goal was saving upstream bandwidth in a controlled, very limited network with a lot of computers, and simpler solutions using the existing binary cache infrastructure, such as a local cache, were found.