Add ADL/single-node-view of a full unixFS file. #14
EXCITING
This is pretty grand and I'm excited about high level interfaces like this emerging.
I wonder if we'll eventually want something like this generalized on datamodel.Node. Perhaps a way to get an io.ReadSeeker. (Whatever that would be: also in an optional way, of course, because we don't want to break existing code that implements that interface. Feature detection would be important.) Would love @mvdan's opinions on the long game there.
Despite mentioning generalizing, I also don't see a major reason we can't merge this sooner rather than later; we can worry about sussing out and finalizing a generalized form later.
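To make the "optional, feature-detected" idea concrete, here is a minimal sketch using only the standard library. The `Node` interface is a cut-down stand-in for datamodel.Node, and `ReadSeekerNode`, `AsReadSeeker`, and `openReader` are hypothetical names invented for illustration; none of them are go-ipld-prime API.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// Node is a stand-in for datamodel.Node, reduced to the one method this
// sketch needs. It is not the real go-ipld-prime interface.
type Node interface {
	AsBytes() ([]byte, error)
}

// ReadSeekerNode is a hypothetical optional interface. Nodes that can stream
// their bytes implement it; existing Node implementations stay untouched.
type ReadSeekerNode interface {
	Node
	AsReadSeeker() (io.ReadSeeker, error)
}

// plainNode implements only Node.
type plainNode struct{ data []byte }

func (n plainNode) AsBytes() ([]byte, error) { return n.data, nil }

// streamNode implements both Node and the optional ReadSeekerNode.
type streamNode struct{ data string }

func (n streamNode) AsBytes() ([]byte, error) { return []byte(n.data), nil }

func (n streamNode) AsReadSeeker() (io.ReadSeeker, error) {
	return strings.NewReader(n.data), nil
}

// openReader detects the optional interface with a type assertion and falls
// back to buffering the whole value when it is absent. The bool reports
// whether the streaming path was taken.
func openReader(n Node) (io.ReadSeeker, bool, error) {
	if rs, ok := n.(ReadSeekerNode); ok {
		r, err := rs.AsReadSeeker()
		return r, true, err
	}
	b, err := n.AsBytes()
	if err != nil {
		return nil, false, err
	}
	return bytes.NewReader(b), false, nil
}

func main() {
	for _, n := range []Node{plainNode{data: []byte("plain")}, streamNode{data: "stream"}} {
		r, streamed, err := openReader(n)
		if err != nil {
			panic(err)
		}
		b, _ := io.ReadAll(r)
		fmt.Printf("%s streamed=%v\n", b, streamed)
	}
}
```

Callers only ever ask for a `Node`; the type assertion is the feature detection, so code that never implements the extra method keeps compiling and working.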
file/file.go
Outdated
}

func (s *shardNodeFile) Prototype() ipld.NodePrototype {
	return basicnode.Prototype__Bytes{}
This should return something more self-descriptive.
What exactly "self-descriptive" means here, I don't know -- it might be something that has to be somewhat best-effort. Maybe it should peek at the root blocks of the thing and try to see if there's any sharding parameters, and return something with those if possible? If it can't infer all the parameters that were used creating this ADL the first time, still returning a best guess that's at least the same ADL but with default parameters is probably better than nothing.
The overall purpose of .Prototype() is so that things like traversal.FocusedTransform (or the traversal.TransformFn that the user provides, if this is the leaf thing being replaced) can regenerate things correctly while making a copy-on-write update. So if it forgets that something is an ADL we get very different behavior in a way that's probably not desired!
file/file.go
Outdated
		return 0, err
	}
	if pbl, ok := lnk.(dagpb.PBLink); ok {
		target, err := s.lsys.Load(ipld.LinkContext{Ctx: s.ctx}, pbl.Hash.Link(), basicnode.Prototype.Any)
This could probably use a prototype that's specific to dag-pb structures, since we know it's that all the way down in unixfsv1? (Or possibly a separate prototype for bytes, sometimes, if the CID indicates raw mode? I'm not familiar with exactly how that part of the protocol works; maybe the dag-pb structures already account for that.)
Leaf nodes can potentially be bytes, depending on the type of file encoding. We should be able to tell from the CID codec.
I wonder if there's a way to avoid having this type of logic every time the link system is used?
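One way to keep that decision in a single place is a small helper that maps the codec to a prototype and wraps the load. The sketch below is dependency-free on purpose: `protoKind`, `loader`, `prototypeFor`, and `loadByCodec` are hypothetical names, and strings stand in for real NodePrototype values. The only facts relied on are the multicodec codes for raw (0x55) and dag-pb (0x70).

```go
package main

import "fmt"

// Multicodec codes for the two block types a unixfsv1 file can link to.
const (
	codecRaw   uint64 = 0x55
	codecDagPB uint64 = 0x70
)

// protoKind names which prototype a load should use. Real code would return
// a NodePrototype; strings keep this sketch free of external packages.
type protoKind string

const (
	protoBytes protoKind = "bytes"
	protoDagPB protoKind = "dag-pb"
	protoAny   protoKind = "any"
)

// prototypeFor picks a prototype from the link's codec: raw leaves are
// plain bytes, dag-pb blocks get the dag-pb prototype, anything else
// falls back to the generic one.
func prototypeFor(codec uint64) protoKind {
	switch codec {
	case codecRaw:
		return protoBytes
	case codecDagPB:
		return protoDagPB
	default:
		return protoAny
	}
}

// loader stands in for a call to the link system's Load method.
type loader func(codec uint64, proto protoKind) (string, error)

// loadByCodec centralizes the prototype choice so call sites do not
// repeat the codec switch.
func loadByCodec(load loader, codec uint64) (string, error) {
	return load(codec, prototypeFor(codec))
}

func main() {
	// A fake loader that just reports which prototype it was handed.
	fake := func(codec uint64, proto protoKind) (string, error) {
		return fmt.Sprintf("codec 0x%x loaded as %s", codec, proto), nil
	}
	for _, c := range []uint64{codecRaw, codecDagPB, 0x71} {
		out, err := loadByCodec(fake, c)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
}
```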
file/file.go
Outdated
			return 0, err
		}

		asFSNode, err := NewUnixFSFile(s.ctx, target, s.lsys)
An interesting way of recursing. This will work correctly, but may cause long chains of Read calls delegating to Read calls delegating to Read calls if it's a deeply recursive structure for a large file, right?
(I'd have no objection to merging that as a first pass anyway. It's probably dominated by IO costs, etc, and it would be something easy to revisit later without external API change if necessary.)
It only recurses as the read reaches subsequent nodes that need to be accessed, so the tree is loaded lazily as it's read.
I can probably do a bit better at deleting references to the multi-reader upon hitting EOF to help the garbage collector clean up better for big files.
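A rough sketch of that idea, using only the standard library: children are opened only when the read reaches them, and both the opener and the finished reader are dropped as soon as they are no longer needed. `lazyChain`, `opener`, and `chainOf` are hypothetical names; the PR's actual multi-reader is not shown in this thread.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// opener stands in for "load the linked block and wrap it as a reader".
type opener func() (io.Reader, error)

// lazyChain reads its children in order, opening each one only when the
// previous child is exhausted.
type lazyChain struct {
	pending []opener
	cur     io.Reader
}

func (c *lazyChain) Read(p []byte) (int, error) {
	if len(p) == 0 {
		return 0, nil
	}
	for {
		if c.cur == nil {
			if len(c.pending) == 0 {
				return 0, io.EOF
			}
			r, err := c.pending[0]()
			if err != nil {
				return 0, err
			}
			c.pending[0] = nil // drop the closure so it can be collected
			c.pending = c.pending[1:]
			c.cur = r
		}
		n, err := c.cur.Read(p)
		if err == io.EOF {
			c.cur = nil // release the finished child reader
			err = nil
		}
		if n > 0 || err != nil {
			return n, err
		}
	}
}

// chainOf builds a lazyChain over in-memory parts and counts, through
// opened, how many children have been opened so far.
func chainOf(opened *int, parts ...string) *lazyChain {
	c := &lazyChain{}
	for _, s := range parts {
		s := s
		c.pending = append(c.pending, func() (io.Reader, error) {
			*opened++
			return strings.NewReader(s), nil
		})
	}
	return c
}

func main() {
	opened := 0
	c := chainOf(&opened, "abc", "def", "ghi")
	p := make([]byte, 2)
	n, _ := c.Read(p)
	fmt.Println(string(p[:n]), opened) // only the first child has been opened
	rest, _ := io.ReadAll(c)
	fmt.Println(string(rest), opened)
}
```

Because each child is itself just an io.Reader, a child can be another `lazyChain`, which mirrors the recursive shape discussed above while keeping at most one open reader per level.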
I'm more worried about keeping the context associated with the New event attached to the node for its whole lifetime. I wonder what the right way to manage that context lifetime is.
The code implementation should be complete now. Remaining work is to find a test fixture to help validate the ADL over cidv0 non-raw leaf files.
I think the interface API exposed here is good enough for the foreseeable future. We might want to expose this datamodel.Node interface superset in go-ipld-prime at some point, but we also don't need to.
file/file.go
Outdated
		return newWrappedNode(substrate)
	}

	return &shardNodeFile{ctx, lsys, substrate, false, nil}, nil
use keyed composite literals for readability? :)
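For illustration, a keyed version might look like the sketch below. The field names (`ctx`, `lsys`, `substrate`, `done`, `rdr`) and the placeholder field types are assumptions; the real struct definition is not shown in this thread.

```go
package main

import (
	"context"
	"fmt"
	"io"
)

// shardNodeFile here is a stand-in with hypothetical field names and
// simplified field types; it is not the struct from the PR.
type shardNodeFile struct {
	ctx       context.Context
	lsys      string // placeholder for the link system
	substrate string // placeholder for the root node
	done      bool
	rdr       io.Reader
}

func newShardNodeFile(ctx context.Context, lsys, substrate string) *shardNodeFile {
	// Keyed fields document themselves, survive field reordering, and let
	// zero-valued fields (done, rdr) be left out entirely.
	return &shardNodeFile{
		ctx:       ctx,
		lsys:      lsys,
		substrate: substrate,
	}
}

// demo builds one value so the result can be inspected.
func demo() *shardNodeFile {
	return newShardNodeFile(context.Background(), "lsys", "root")
}

func main() {
	f := demo()
	fmt.Println(f.lsys, f.substrate, f.done, f.rdr == nil)
}
```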
file/file.go
Outdated
		return nil, err
	}
	lli := links.ListIterator()
	if lli.Done() {
you could just do if links.Length() == 0, I think
file/file.go
Outdated
type singleNodeFile struct {
	ipld.Node
	ptr int
perhaps offset or read, since this is not really a pointer
}

func (f *singleNodeFile) Read(p []byte) (int, error) {
	buf, err := f.Node.AsBytes()
are we worried about repeated calls to AsBytes? it won't be a problem for basicnode.Bytes, but I don't think AsBytes is guaranteed to be cheap. Perhaps we could call it on the first Read call, and hold onto the result in a field until we hit EOF.
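A sketch of that caching, with the field renamed to `offset` as suggested earlier. `bytesNode` is a stand-in for the one ipld.Node method involved, and `countingNode` and `drain` are hypothetical helpers that exist only to show how often AsBytes gets called.

```go
package main

import (
	"fmt"
	"io"
)

// bytesNode is a stand-in for the part of ipld.Node this sketch uses.
type bytesNode interface {
	AsBytes() ([]byte, error)
}

// countingNode records how often AsBytes is called.
type countingNode struct {
	data  []byte
	calls int
}

func (n *countingNode) AsBytes() ([]byte, error) {
	n.calls++
	return n.data, nil
}

// singleNodeFile calls AsBytes once, on the first Read, and keeps the
// result until the data is fully consumed.
type singleNodeFile struct {
	node   bytesNode
	buf    []byte
	loaded bool
	offset int
}

func (f *singleNodeFile) Read(p []byte) (int, error) {
	if !f.loaded {
		buf, err := f.node.AsBytes()
		if err != nil {
			return 0, err
		}
		f.buf, f.loaded = buf, true
	}
	if f.offset >= len(f.buf) {
		f.buf = nil // release the cached bytes once everything is read
		return 0, io.EOF
	}
	n := copy(p, f.buf[f.offset:])
	f.offset += n
	return n, nil
}

// drain reads r to EOF using a buffer of the given size.
func drain(r io.Reader, chunk int) (string, error) {
	var out []byte
	p := make([]byte, chunk)
	for {
		n, err := r.Read(p)
		out = append(out, p[:n]...)
		if err == io.EOF {
			return string(out), nil
		}
		if err != nil {
			return string(out), err
		}
	}
}

func main() {
	node := &countingNode{data: []byte("hello world")}
	got, err := drain(&singleNodeFile{node: node}, 4)
	if err != nil {
		panic(err)
	}
	fmt.Println(got, node.calls)
}
```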
file/wrapped.go
Outdated
		// an empty degenerate one.
		return &singleNodeFile{
			Node: basicnode.NewBytes(nil),
			ptr: 0,
nit: these can be omitted
file/shard.go
Outdated
	if err != nil {
		return 0, err
	}
	if pbl, ok := lnk.(dagpb.PBLink); ok {
hm, why do we use LookupByString("Links") above, and then a type assertion directly on a dagpb type here? seems like we should either consistently be flexible and allow any IPLD node satisfying the same shape, or consistently assume dagpb types for simplicity/speed. Probably the former, to avoid unnecessary ties to go-codec-dagpb.
Relevant for @warpfork:
StreamableByteNode interface for extending a Kind_Bytes ipld.Node to also be usable as an io.Reader
Remaining work:
NewUnixFSFile to the top level Reify when files are encountered