Reorganized and implementation of a basic dag store #2
Conversation
Quick reorganization. Implementation of a simple dag backed store.
var stream = require('stream')
var multihash = require('multihashes')
var base58 = require('base58-native')
var merkledag = require('node-ipfs-mdag')
where is this module coming from? symlink to node-ipfs or something?
Not supposed to be there. I was using it, and might go back to it, but for now it wasn't doing what I wanted. I guess I forgot to delete it.
Wrapped streams might work with: https://github.com/nfroidure/StreamQueue. The idea is to create a front stream which basically contains:
The actual data stream would come in the middle here, and then the trailing stream.
I have no idea how well this will work with JSON, but it should work really nicely with protobufs.
The problem with using protobufs for that is that nested fields are always length delimited, you could do
Ya, after thinking about it for a bit I kind of decided it's not really a great idea, and it's not hard to buffer something small like this either way.
if (typeof opts === 'string') opts = {key: opts}

var bufferStream = new stream.PassThrough()
node.object.data((opts.root || root_hash) + '/' + opts.key, function (err, stream) {
You want node.object.get here, right? If the write stream stores a JSON object, the read stream should probably output one as well.
@travisperson can I get some enlightenment on how you structured this? Namely, is the code in the PR running and passing some of the tests? I'm trying to but unsuccessfully:

TAP version 13
# piping a blob into a blob write stream
ok 1 no setup err
/Users/david/Documents/code/ipfs/ip-npm/ipfs-blob-store/lib/dag.js:47
node.object.stat(res.Hash, function(err, stat) {
^
TypeError: node.object.stat is not a function
at /Users/david/Documents/code/ipfs/ip-npm/ipfs-blob-store/lib/dag.js:47:23

And btw, there are a bunch of codestyle errors, the use of blob as a global variable being the most concerning, not sure if it is intended:

lib/dag.js|39 col 7 warning| "blob" is not defined. (no-undef)
lib/dag.js|40 col 7 warning| "blob" is not defined. (no-undef)
lib/dag.js|43 col 47 warning| "blob" is not defined. (no-undef)

Also, what is the purpose of
What version of
As far the
I had 1.2.0 (installed following the package.json semver)
But now updated to 2.3.2 and all tests pass:
This is a good sign, right? :)
What about block.js? Doesn't look like it is used for anything.
See: https://github.com/ipfs/ipfs-blob-store/blob/feat/dag-store/index.js#L14

They are two different implementations. This isn't a fully featured blob store; it was a quick hack to get something working. I have a third that I hacked together a while back that implements a blob store using unixfs via a patch system, similar to @whyrusleeping's patch/update code (whatever it is called now).
I'm confused by what is called a 'fully featured blob store', when both of these are passing 100% of the tests. I was expecting that anything that implements the abstract-blob-store interface and passes the tests would be 'fully' compatible.

Would this be 'enough' to unblock ipfs/notes#2? If not, what is missing?

Also, if we have 3 implementations, can we list pros and cons? I guess that one of them is the fact that dag and block use the HTTP API, while the third one, not present in the repo, would use system calls and let IPFS handle the syncing. Thanks :)
They are not passing 100% though, are they? I haven't looked at this code in a while. Both block and dag implement the API, but I'm pretty sure that if you run the tests, block will not run them all. I don't fully understand tape, but it appears that it's not running through all the tests. Block always returns false on a remove, which to me would say that it is not passing all tests.

Each of these has pitfalls. The dag implementation only stores links to blobs on the root dag node, leading to an absolutely massive root object. The block implementation simply stores basic block objects. It doesn't do anything fancy, and since it's just a raw block it has an upper limit on its size (if imposed by the backing daemon/api).

The third implementation is almost a pure unixfs implementation (I'd have to double check that), but in the blob store itself (it does self-patching of objects). Instead of sending objects to a daemon to patch and update, it handles that itself. Since it's a unixfs object, you can mount and traverse the blob store. It's kind of in my head and I have a few ideas; tonight I can push the code up (it's on my laptop and I don't have that with me at the moment). I'll be free in about 2 hours.
ok, product of this etherpad: https://etherpad.mozilla.org/97sGEBwwkH

// what we want is the "dag store" in ipfs-blob-store
// to treat the "keys" it receives as a merkledag path,
// and more specifically, as a unixfs path. so something
// like this:
var dagStore = require('ipfs-blob-store').dagStore({ ... })
var ws = dagStore.createWriteStream("foo/bar/baz")
ws.write("hello")
ws.write(null)
// so the above should make a total of 4 objects:
// [ root ] ---foo--> [ A ] ---bar--> [ B ] ---baz--> [ "hello" ]
// so that we can do it again:
var ws = dagStore.createWriteStream("foo/bar/quux")
ws.write("world")
ws.write(null)
// and only _adds one new object, updating the link in the third
// object, and bubbling the changes up. And it's all nicely
// sharded as a filesystem:
// [ root2 ] ---foo--> [ A2 ] ---bar--> [ B2 ] ---baz--> [ "hello" ]
// \--quux-> [ "world" ]
// ((NOTE: though of course, the change bubbles up the merkle
// dag, so technically, four objects are created. typical merkle
// dag update semantics. this means that the "dagStore" thing
// has to always keep the latest _root_.))
// all of this can be done with "ipfs object patch" in a concurrency
// safe manner.
o/ o/ o/ o/ o/ o/ o/ o/ o/ o/ o/ o/ o/ o/
o/ o/ o/ o/ o/ o/ o/ o/ o/ o/ o/ o/ o/ o/
--------
// btw, the way it is now, it makes only TWO objects:
// [ root ] ---foo/bar/baz---> [ "hello" ]
// adding a link with name "foo/bar/baz", which does not scale at all,
// because the root object would get __enormous__.
So while using
@travisperson wait so this does handle mutations in a full dag? i thought this only has one massive root object? (i don't see the splitting on a "/", etc) |
btw, in shell i would do this:

# keep root, initialized to:
root=$(ipfs object new unix-fs)
blobHash=$(cat $data | ipfs add -q | tail -n1)
root=$(ipfs object patch $root add-link $dataPath $blobHash)

one thing i'm not sure on is whether ipfs object patch can take paths for $dataPath and bubble the changes up. we may need to look deeper into it and pull out the object we want. (like walk the dag component by component. this is what the "mfs/files" tool is supposed to do, but it hasn't been finalized or merged in go-ipfs. so may need to walk it manually if ipfs-object-patch doesn't do this right now)
update: apparently it does work with ipfs object patch now, just need to use

@travisperson is right though that this can only have one update in flight at a time.
The current dag implementation here has that pitfall; the other implementation I mentioned (never got around to pushing it, sorry) does not have this issue. It takes the resulting hash of a blob and uses it to fan out over the directory structure. So a bit different from what you describe above, where it would be on the user to properly distribute their keys. Which is totally fine, and makes it so the user can specify keys more easily.
(the best solution is to implement the mfs/files tool, which would allow concurrent access, ordered by the daemon itself)
As I see it we have four options:
I'd say 1 and 3 are one and the same in a way. My 1 is basically mfs, or at least it would be very close to it if implemented properly.
UPDATE: Now we have a more robust js-ipfs-api and a working version of ipfs-blob-store with mfs, which you can find in this PR: #5
Quick reorganization.
Implementation of a simple dag backed store.
I'm not a huge fan of the way this is currently set up, so this PR is a work in progress. A few things I want to change:
See if there is a way to wrap streams in bytes.
Right now we have to buffer all the data before we can add the dag. I think it would be really neat to have a stream that is wrapped in bytes.
Namespaced keys

The cool thing about the dag is that "leafs" (or in this case, values) are still dags which can contain links. So we can have a key foo, which resolves to some value, as well as a key foo/bar, which would resolve to something different. foo would be the root hash of bar.

Note: Right now I store the last root hash in the data field of the "root". This is mostly because if you try to remove a link from a dag with a single link (leaving a dag with zero links and no data), ipfs does not like it. Also, it's kind of cool because it provides history.