Reorganized and implementation of a basic dag store #2
Conversation
Quick reorganization. Implementation of a simple dag backed store.
var stream = require('stream')
var multihash = require('multihashes')
var base58 = require('base58-native')
var merkledag = require('node-ipfs-mdag')
where is this module coming from? symlink to node-ipfs or something?
Not supposed to be there. I was using it, and might go back to it, but for now it wasn't doing what I wanted. I guess I forgot to delete it.
Wrapped streams might work with: https://github.com/nfroidure/StreamQueue. The idea is to create a front stream which basically contains:
The actual data stream would come in the middle here, and then the trailing stream.
I have no idea how well this will work with JSON, but it should work really nicely with protobufs.
The problem with using protobufs for that is that nested fields are always length delimited, you could do
Ya, after thinking about it for a bit I kind of decided it's not really a great idea, and it's not hard to buffer something small like this either way.
if (typeof opts === 'string') opts = {key: opts}

var bufferStream = new stream.PassThrough()
node.object.data((opts.root || root_hash) + '/' + opts.key, function (err, stream) {
You want node.object.get here, right? If the write stream stores a JSON object, the read stream should probably output one as well.
@travisperson can I get some enlightenment on how you structured this? Namely, is the code in the PR running and passing some of the tests? I'm trying to but unsuccessfully:

TAP version 13
# piping a blob into a blob write stream
ok 1 no setup err
/Users/david/Documents/code/ipfs/ip-npm/ipfs-blob-store/lib/dag.js:47
node.object.stat(res.Hash, function(err, stat) {
^
TypeError: node.object.stat is not a function
at /Users/david/Documents/code/ipfs/ip-npm/ipfs-blob-store/lib/dag.js:47:23

And btw, there are a bunch of codestyle errors, the use of blob as a global variable being the most concerning, not sure if it is intended:

lib/dag.js|39 col 7 warning| "blob" is not defined. (no-undef)
lib/dag.js|40 col 7 warning| "blob" is not defined. (no-undef)
lib/dag.js|43 col 47 warning| "blob" is not defined. (no-undef)

Also, what is the purpose of
What version of
As far the
I had 1.2.0 (installed following the package.json semver)
But now updated to 2.3.2 and all tests pass:
This is a good sign, right? :)
What about block.js? Doesn't look like it is used for anything.
See: https://github.com/ipfs/ipfs-blob-store/blob/feat/dag-store/index.js#L14

They are two different implementations. This isn't a fully featured blob store; it was a quick hack to get something working. I have a third that I hacked together a while back that implements a blob store using unixfs via a patch system, similar to @whyrusleeping's patch/update code (whatever it is called now).
I'm confused by what is called a 'fully featured blob store', when both of these are passing 100% of the tests. I was expecting that anything that implements the abstract-blob-store interface and passes the tests would be 'fully' compatible.

Would this be 'enough' to unblock ipfs/notes#2? If not, what is missing?

Also, if we have 3 implementations, can we list pros and cons? I guess that one of them is the fact that dag and block use the HTTP API, while the third one, not present in the repo, would use system calls and let IPFS handle the syncing. Thanks :)
They are not passing 100% though, are they? I haven't looked at this code in a while. Both block and dag implement the API, but I'm pretty sure that if you run the tests, block will not run them all. I don't fully understand tape, but it appears that it's not running through all the tests. Block always returns false on a remove, which to me would say that it is not passing all tests.

Each of these has pitfalls. The dag implementation only stores links to blobs on the root dag node, leading to an absolutely massive root object. The block implementation simply stores basic block objects. It doesn't do anything fancy, and since it's just a raw block it has an upper limit on its size (if imposed by the backing daemon/api).

The third implementation is almost a pure unixfs implementation (I'd have to double check that), but in the blob store itself (it does self-patching of objects). Instead of sending objects to a daemon to patch and update, it handles that itself. Since it's a unixfs object, you can mount and traverse the blob store. It's kind of in my head and I have a few ideas; tonight I can push the code up (it's on my laptop and I don't have that with me at the moment). I'll be free in about 2 hours.
ok, product of this etherpad: https://etherpad.mozilla.org/97sGEBwwkH

// what we want is the "dag store" in ipfs-blob-store
// to treat the "keys" it receives as a merkledag path,
// and more specifically, as a unixfs path. so something
// like this:
var dagStore = require('ipfs-blob-store').dagStore({ ... })
var ws = dagStore.createWriteStream("foo/bar/baz")
ws.write("hello")
ws.write(null)
// so the above should make a total of 4 objects:
// [ root ] ---foo--> [ A ] ---bar--> [ B ] ---baz--> [ "hello" ]
// so that we can do it again:
var ws = dagStore.createWriteStream("foo/bar/quux")
ws.write("world")
ws.write(null)
// and only _adds one new object, updating the link in the third
// object, and bubbling the changes up. And it's all nicely
// sharded as a filesystem:
// [ root2 ] ---foo--> [ A2 ] ---bar--> [ B2 ] ---baz--> [ "hello" ]
// \--quux-> [ "world" ]
// ((NOTE: though of course, the change bubbles up the merkle
// dag, so technically, four objects are created. typical merkle
// dag update semantics. this means that the "dagStore" thing
// has to always keep the latest _root_.))
// all of this can be done with "ipfs object patch" in a concurrency
// safe manner.
o/ o/ o/ o/ o/ o/ o/ o/ o/ o/ o/ o/ o/ o/
o/ o/ o/ o/ o/ o/ o/ o/ o/ o/ o/ o/ o/ o/
--------
// btw, the way it is now, it makes only TWO objects:
// [ root ] ---foo/bar/baz---> [ "hello" ]
// adding a link with name "foo/bar/baz", which does not scale at all,
// because the root object would get __enormous__.
So while using
@travisperson wait so this does handle mutations in a full dag? i thought this only has one massive root object? (i don't see the splitting on a "/", etc) |
btw, in shell i would do this:

# keep root, initialized to:
root=$(ipfs object new unix-fs)
blobHash=$(cat $data | ipfs add -q | tail -n1)
root=$(ipfs object patch $root add-link $dataPath $blobHash)

one thing i'm not sure on is whether ipfs object patch can take paths for $dataPath and bubble the changes up. we may need to look deeper into it and pull out the object we want. (like walk the dag component by component. this is what the "mfs/files" tool is supposed to do, but it hasn't been finalized or merged in go-ipfs. so may need to walk it manually if ipfs-object-patch doesn't do this right now)
update: apparently it does work with ipfs object patch now, just need to use

@travisperson is right though that this can only have one update in flight at a time.
The current dag implementation here has that pitfall; the other implementation I mentioned (never got around to pushing it, sorry) does not have this issue. It takes the resulting hash of a blob and uses it to fan out over the directory structure. So a bit different from what you describe above, where it would be on the user to properly distribute their keys. Which is totally fine, and makes it so the user can specify keys more easily.
(the best solution is to implement the mfs/files tool, which would allow concurrent access, ordered by the daemon itself)
As I see it we have four options:
I'd say 1 and 3 are one and the same in a way. My 1 is basically mfs, or at least it would be very close to it if implemented properly.
UPDATE: Now we have a more robust js-ipfs-api and a working version of ipfs-blob-store with mfs, which you can find in this PR: #5
Quick reorganization.
Implementation of a simple dag backed store.
I'm not a huge fan of the way this is currently set up, so this PR is a work in progress. A few things I want to change:
See if there is a way to wrap streams in bytes.
Right now we have to buffer all the data before we can add the dag. I think it would be really neat to have a stream that is wrapped in bytes.
Namespaced keys

The cool thing about the dag is that "leafs" (or in this case, values) are still dags which can contain links. So we can have a key foo, which resolves to some value, as well as a key foo/bar, which would resolve to something different. foo would be the root hash of bar.

Note: Right now I store the last root hash in the data field of the "root". This is mostly because if you try to remove a link from a dag with a single link (leaving a dag with zero links and no data), ipfs does not like it. Also, it's kind of cool because it provides history.