Switch to raw multihashes in datastores #6815

Closed
9 of 12 tasks
Stebalien opened this issue Jan 7, 2020 · 13 comments

@Stebalien
Member

Stebalien commented Jan 7, 2020

Part of #4143.

Currently, we store blocks by CID in the datastore. However, a single block can have multiple CIDs:

  • CIDv0/CIDv1 -- blocks created as CIDv0 can be referenced with CIDv1 CIDs
  • CIDv1-*/CIDv1-raw -- we can treat any block as a "raw" block
  • CIDv1-cbor/CIDv1-dag-cbor -- theoretically, we have non-"dag" versions of our codecs.
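
To make the list above concrete, here is a minimal sketch (standard library only, not the go-cid or go-ipfs-ds-help code) that builds three CIDs for one block and shows that they all reduce to the same multihash. The helper names and the `dsKey` encoding (unpadded base32 with a leading slash) are assumptions for illustration; only well-formed sha2-256 CIDs with single-byte codecs are handled.

```go
package main

import (
	"crypto/sha256"
	"encoding/base32"
	"fmt"
)

// Multicodec codes; each fits in a single varint byte.
const (
	codeSHA256 = 0x12 // sha2-256 hash function
	codecRaw   = 0x55 // raw bytes
	codecDagPB = 0x70 // dag-pb
)

// multihash returns <hash code><digest length><digest> for data.
func multihash(data []byte) []byte {
	sum := sha256.Sum256(data)
	return append([]byte{codeSHA256, byte(len(sum))}, sum[:]...)
}

// cidV0 is the bare sha2-256 multihash; the dag-pb codec is implied.
func cidV0(mh []byte) []byte { return mh }

// cidV1 is <version 1><codec><multihash>.
func cidV1(codec byte, mh []byte) []byte {
	return append([]byte{0x01, codec}, mh...)
}

// multihashOf strips the version and codec, leaving only the multihash.
// It assumes a well-formed CID as produced by cidV0 or cidV1 above.
func multihashOf(cid []byte) []byte {
	if len(cid) == 34 && cid[0] == codeSHA256 && cid[1] == 32 {
		return cid // CIDv0 is already a bare multihash
	}
	return cid[2:] // CIDv1: skip the version and codec bytes
}

// dsKey is a hypothetical datastore key: unpadded base32 of the multihash.
func dsKey(mh []byte) string {
	return "/" + base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(mh)
}

func main() {
	mh := multihash([]byte("hello"))
	cids := [][]byte{cidV0(mh), cidV1(codecDagPB, mh), cidV1(codecRaw, mh)}
	for _, cid := range cids {
		// The CID bytes differ, but the derived key is identical each time.
		fmt.Printf("cid=%x key=%s\n", cid, dsKey(multihashOf(cid)))
	}
}
```

Keying the datastore on `multihashOf(cid)` rather than on the full CID is what lets all three references resolve to one stored block.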

Plan:

TODO: put this in the right spot: ipfs/fs-repo-migrations#95


2021-12-02 context on why we're doing this:

  • Helps us to move towards CIDv1 by default
  • Deduplicates data that’s stored multiple times (e.g. as CIDv0, CIDv1-DAG-PB, CIDv1-Raw)
  • Removes the technical debt and tribal knowledge around the different interfaces used by go-ipfs and anything newer (lotus, venus, estuary, …)
    • Unlocks bigger refactors around blockstores going forward
  • Enables any pinning services backed by go-ipfs to serve content for unknown IPLD codecs
    • e.g. all those CAR files filled with Bitcoin blockchain data could be stored by services that don’t have the Bitcoin codec
    • Allows for greater experimentation from groups making their own codecs even before they land their codec in go-ipfs by default
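
As a rough illustration of the deduplication and unknown-codec points above, the following sketch uses a hypothetical in-memory `Blockstore` keyed by multihash only. The `CID` struct is a simplified stand-in, not the go-cid type, and none of these names come from go-ipfs-blockstore.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// CID is a simplified stand-in for a real CID: version, codec, multihash.
type CID struct {
	Version   int
	Codec     uint64
	Multihash string // raw multihash bytes, held as a string so it can be a map key
}

// Blockstore is a hypothetical in-memory store keyed by multihash only.
type Blockstore struct {
	blocks map[string][]byte
}

func NewBlockstore() *Blockstore {
	return &Blockstore{blocks: map[string][]byte{}}
}

// Put ignores the version and codec: only the multihash selects the slot.
func (bs *Blockstore) Put(c CID, data []byte) { bs.blocks[c.Multihash] = data }

// Get likewise looks up by multihash, so any codec can be served.
func (bs *Blockstore) Get(c CID) ([]byte, bool) {
	data, ok := bs.blocks[c.Multihash]
	return data, ok
}

// Len reports how many distinct blocks are physically stored.
func (bs *Blockstore) Len() int { return len(bs.blocks) }

// sha256Multihash returns the sha2-256 multihash of data as a string.
func sha256Multihash(data []byte) string {
	sum := sha256.Sum256(data)
	return string(append([]byte{0x12, 0x20}, sum[:]...))
}

func main() {
	data := []byte("some block")
	mh := sha256Multihash(data)
	bs := NewBlockstore()

	// The same bytes written under CIDv0 (dag-pb) and CIDv1-raw occupy one slot.
	bs.Put(CID{Version: 0, Codec: 0x70, Multihash: mh}, data)
	bs.Put(CID{Version: 1, Codec: 0x55, Multihash: mh}, data)

	// A request using a codec the node cannot decode (0xb0, bitcoin-block
	// in the multicodec table) still finds the block.
	_, ok := bs.Get(CID{Version: 1, Codec: 0xb0, Multihash: mh})
	fmt.Println(bs.Len(), ok)
}
```

The store never needs to understand the codec to return the bytes; interpreting them is left to whoever requested the block.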
Stebalien added the kind/feature (A new feature) and epic labels Jan 7, 2020
Stebalien added a commit to ipfs/go-ipfs-ds-help that referenced this issue Jan 7, 2020
Stebalien added a commit to ipfs/go-ipfs-ds-help that referenced this issue Jan 7, 2020
Stebalien added a commit to ipfs/go-ipfs-blockstore that referenced this issue Jan 7, 2020
Stebalien added a commit that referenced this issue Jan 7, 2020
Stebalien added a commit that referenced this issue Jan 7, 2020
Stebalien added a commit to ipfs/go-ipfs-blockstore that referenced this issue Jan 7, 2020
@Stebalien
Member Author

@lidel if you find yourself with some time, it would be really nice to get this in before 0.5.0. We're already doing one migration, so doing a second isn't a big deal.

@Stebalien
Member Author

The final pieces here are to:

  1. Write the actual migration: https://github.com/ipfs/fs-repo-migrations
  2. Take this the rest of the way (write a migration test, write a sharness test to make sure we don't end up storing blocks twice, etc.).
  3. Go over everything I've done to make sure it makes sense. This was thrown together rather quickly.
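
For orientation only, the rekeying step of such a migration might look like the sketch below. This is not the code in ipfs/fs-repo-migrations: the datastore is modeled as a plain map, keys are raw CID bytes held as strings, and `toMultihashKey` and `migrate` are hypothetical names. It assumes sha2-256 CIDv0 keys and CIDv1 keys with a single-byte version and codec.

```go
package main

import "fmt"

// toMultihashKey converts an old CID-based key to a multihash key.
func toMultihashKey(cidKey string) string {
	if len(cidKey) == 34 && cidKey[0] == 0x12 && cidKey[1] == 0x20 {
		return cidKey // CIDv0 keys are already bare multihashes
	}
	return cidKey[2:] // CIDv1: drop the version and codec bytes
}

// migrate rekeys the store in place. It returns how many entries were moved
// to a new key and how many were dropped as duplicates of an existing block.
func migrate(store map[string][]byte) (rekeyed, deduped int) {
	// Snapshot the keys so the map is not modified while ranging over it.
	keys := make([]string, 0, len(store))
	for k := range store {
		keys = append(keys, k)
	}
	for _, old := range keys {
		newKey := toMultihashKey(old)
		if newKey == old {
			continue // nothing to do
		}
		data := store[old]
		delete(store, old)
		if _, exists := store[newKey]; exists {
			deduped++ // same block was already stored under another CID
			continue
		}
		store[newKey] = data
		rekeyed++
	}
	return rekeyed, deduped
}

func main() {
	mh := "\x12\x20" + string(make([]byte, 32)) // placeholder multihash, all-zero digest
	store := map[string][]byte{
		mh:              []byte("block"), // stored via CIDv0
		"\x01\x55" + mh: []byte("block"), // same block stored via CIDv1-raw
	}
	rekeyed, deduped := migrate(store)
	fmt.Println(rekeyed, deduped, len(store))
}
```

A test for the "blocks stored twice" case can assert, as here, that the store holds one entry per multihash after the migration runs.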

Stebalien added a commit to ipfs/go-ipfs-ds-help that referenced this issue Jan 7, 2020
Stebalien added a commit to ipfs/go-ipfs-blockstore that referenced this issue Jan 7, 2020
Stebalien added a commit to ipfs/go-ipfs-blockstore that referenced this issue Jan 7, 2020
Stebalien added a commit that referenced this issue Jan 7, 2020
Stebalien added a commit that referenced this issue Jan 7, 2020
@ianopolous
Member

Just a note that this affects us at Peergos as we have just implemented our own blockstore which needs to be compatible with IPFS's. I see your logic and it shouldn't be a problem for us.

@ianopolous
Member

One small comment is that GC can significantly benefit in speed by knowing that a block is raw and hence has no links in it - i.e. it doesn't need to retrieve the block at all to follow the links. This benefit is lost in this model. This affects remote blockstores like S3 a lot, especially as the genuinely raw blocks tend to be much larger than cbor blocks.

Although I might be wrong because you are still theoretically starting from a GC root which is a cid, so I think that means you don't lose the benefit I just stated?

@Stebalien
Member Author

Thanks for the heads up.

> Although I might be wrong because you are still theoretically starting from a GC root which is a cid, so I think that means you don't lose the benefit I just stated?

Correct. GC already implements this optimization, as far as I know. It works in three steps:

  1. It'll start at the typed root (CID) and walk the graph. If it encounters a "raw" node, it won't bother fetching it.
  2. It'll list all blocks in the datastore (but won't fetch them).
  3. It'll delete everything from step 2 not found in step 1.
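
The three steps above can be sketched as a small mark-and-sweep over a hypothetical in-memory store. This is an illustration of the idea, not the go-ipfs GC: `Ref`, `Store`, and `GC` are made-up names, and blocks are modeled only by their outgoing links. Traversal is tracked per typed reference while liveness is tracked per key, so a block first reached through a raw reference is still expanded if a later reference names it with a linking codec.

```go
package main

import "fmt"

const codecRaw = 0x55 // raw blocks carry no links

// Ref is a simplified typed reference: a codec plus the key of the block.
type Ref struct {
	Codec uint64
	Key   string
}

// Store is a hypothetical blockstore; each block is reduced to its links.
type Store struct {
	links   map[string][]Ref
	fetches int // counts how many blocks GC had to read
}

// Fetch reads a block and returns its links.
func (s *Store) Fetch(key string) []Ref {
	s.fetches++
	return s.links[key]
}

// AllKeys lists keys without reading any block contents.
func (s *Store) AllKeys() []string {
	keys := make([]string, 0, len(s.links))
	for k := range s.links {
		keys = append(keys, k)
	}
	return keys
}

func (s *Store) Delete(key string) { delete(s.links, key) }

// GC marks everything reachable from roots, then sweeps the rest.
func GC(s *Store, roots []Ref) (deleted int) {
	visited := map[Ref]bool{}   // typed references already walked
	marked := map[string]bool{} // keys that must be kept

	var walk func(r Ref)
	walk = func(r Ref) {
		if visited[r] {
			return
		}
		visited[r] = true
		marked[r.Key] = true
		if r.Codec == codecRaw {
			return // step 1: a raw node has no links, so skip the fetch
		}
		for _, child := range s.Fetch(r.Key) {
			walk(child)
		}
	}
	for _, r := range roots {
		walk(r)
	}

	// Steps 2 and 3: list every key, delete the ones that were not marked.
	for _, k := range s.AllKeys() {
		if !marked[k] {
			s.Delete(k)
			deleted++
		}
	}
	return deleted
}

func main() {
	s := &Store{links: map[string][]Ref{
		"root":   {{Codec: 0x70, Key: "child"}, {Codec: codecRaw, Key: "big"}},
		"child":  nil,
		"big":    nil,
		"orphan": nil,
	}}
	deleted := GC(s, []Ref{{Codec: 0x70, Key: "root"}})
	fmt.Println("deleted:", deleted, "fetches:", s.fetches)
}
```

Because the codec travels with the reference rather than with the stored key, the raw-block shortcut keeps working when the datastore is keyed by multihash.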

@Stebalien
Member Author

Update: @lidel is busy.

@hsanjuan how would you feel about picking this up? The main pieces left are in #6815 (comment).

@Stebalien
Member Author

Note: #6817 should make fixing the tests easier.

Stebalien added a commit that referenced this issue Jan 8, 2020
@danimesq

@Stebalien, could IPFS take inspiration from git for data deduplication?

@Stebalien
Member Author

Stebalien commented Jan 13, 2020 via email

@hsanjuan
Contributor

Sorry @Stebalien, this only came to my attention now. I would not block the release for it or make it depend on me, as I have very little availability in the next two weeks. That said, I could work on it (and would like to), but we are looking at the end of January...

@Stebalien
Member Author

That's fine. We'll see if someone has time before then but that's unlikely.

hsanjuan self-assigned this Jan 27, 2020
Stebalien added a commit to ipfs/go-ipfs-blockstore that referenced this issue Feb 2, 2020
BigLep modified the milestones: go-ipfs 0.11, go-ipfs 0.12 Aug 14, 2021
aschmahmann pushed a commit that referenced this issue Dec 2, 2021
guseggert mentioned this issue Dec 7, 2021
BigLep closed this as completed Dec 14, 2021
BigLep moved this to Done in IPFS Shipyard Team Mar 2, 2022
Jorropo pushed a commit to ipfs/go-libipfs-rapide that referenced this issue Mar 23, 2023
Jorropo pushed a commit to ipfs/go-libipfs-rapide that referenced this issue Mar 23, 2023