Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Files API root as best-effort pin #2872

Merged
merged 2 commits into from
Jun 22, 2016

Conversation

kevina
Copy link
Contributor

@kevina kevina commented Jun 18, 2016

This fixes #2697 and #2698.

if !set.Has(k) {
set.Add(k)
child, err := ds.Get(ctx, k)
if err == ErrNotFound {
Copy link
Member

@whyrusleeping whyrusleeping Jun 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets do something like

switch err {
case ErrNotFound:
continue
default:
return err
case nil:
break
}

@kevina kevina changed the title Add Files API root as best-effort pin [WIP] Add Files API root as best-effort pin Jun 20, 2016
@kevina
Copy link
Contributor Author

kevina commented Jun 20, 2016

@whyrusleeping see my line note on your second comment. There are two versions for you to chose from.

Please do not merge this as is. I will squash this all down to one commit once we decide what works best.

@whyrusleeping
Copy link
Member

@kevina i think the latter option of passing the array of best effort root keys is the better of the two for now. My other thought was to enumerate the best effort set before calling gc, and pass that into the gc call as a 'extra keys that arent garbage' parameter. But i think that might be a little more difficult and uglier overall

Closes ipfs#2697.  Closes ipfs#2698.

License: MIT
Signed-off-by: Kevin Atkinson <[email protected]>
@kevina kevina changed the title [WIP] Add Files API root as best-effort pin Add Files API root as best-effort pin Jun 20, 2016
@kevina
Copy link
Contributor Author

kevina commented Jun 20, 2016

@whyrusleeping, okay, ready to merge once you are happy with it.

test_expect_success "object not removed after gc" '
echo "hello world" | ipfs files write --create /hello.txt &&
ipfs repo gc &&
ipfs cat QmVib14uvPnCP73XaCDpwugRuwfTsVbGyWbatHAmLSdZUS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should also add another test_expect_success to make sure that /hello.txt is still accessible through the files api here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So... if this is "best effort", under what conditions should gc remove these, and can we make sure to test that here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All "best effort" means is not to abort if a child is unreadable. For the GC to remove the object will need to be detached somehow from the MFS Root.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevina want to take this one on and add a few more tests?

  • Test that the read test below outputs the right text
  • Exercise a situation that uses the best effort pin:
    • Add a directory containing a file with --pin=false
    • pin the hash directly
    • run a gc so that file is no longer accessible
    • ipfs files cp <dirhash> /somewhere (this should work just fine)
    • ipfs repo gc should succeed
    • re-add the file that was originally in that directory with --pin=false
    • run ipfs repo gc again, that file shouldnt be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK Will do.

Copy link
Contributor Author

@kevina kevina Aug 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevina want to take this one on and add a few more tests?

@whyrusleeping See #3151

@whyrusleeping whyrusleeping added this to the Ipfs 0.4.3 milestone Jun 21, 2016
@whyrusleeping
Copy link
Member

This LGTM. @Kubuxu wanna take a look over too?

@Kubuxu
Copy link
Member

Kubuxu commented Jun 21, 2016

SGTM

@whyrusleeping whyrusleeping merged commit 472deb0 into ipfs:master Jun 22, 2016
@kevina kevina deleted the best-effort-pins branch June 22, 2016 05:38
@kevina kevina mentioned this pull request Aug 20, 2016
@@ -69,7 +70,7 @@ func GC(ctx context.Context, bs bstore.GCBlockstore, pn pin.Pinner) (<-chan key.
return output, nil
}

func Descendants(ctx context.Context, ds dag.DAGService, set key.KeySet, roots []key.Key) error {
func Descendants(ctx context.Context, ds dag.DAGService, set key.KeySet, roots []key.Key, bestEffort bool) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i strongly dislike how this is implemented. Adding bestEffort here to this function ruins the whole abstraction. You can implement this "best effort" notion outside of this function, just in the GC function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbenet in this function (Descendants) and EnumerateChildren all that bestEffort means is to not abort if a child is unreadable. In order to collect the set of keys to keep on a "best effort" bases I needed a way to to collected all the children of the initial best effort set that are local on the node.

I am not entirely clear what abstraction I am ruining so I am not 100% sure how to fix this. I could create a new function to find the Descendants of a keySet that doesn't abort on an error, but to me it seamed better to extend the functionally of the existing function.

@whyrusleeping I could appreciate any insights you can offer.

@jbenet
Copy link
Member

jbenet commented Aug 27, 2016

  • The idea of "best effort" pins is very good and we already want to incorporate it into the pinning model.
  • The idea of making the files api "best effort" makes sense. it solves the problems we had with making it proper full pins.
  • That said, the way this PR is implemented is not great. It muddles the abstractions, threading in a "best effort" flag into things, instead of just treating that set of roots as just another set to mark and consider.
  • It's also not clear when GC will actually remove these "best effort" pins.
  • This is not good quality, and I'm sad this merged as is.
  • To be clear, the idea is very good, and we SHOULD have such an abstraction. It should be implemented better though.

@kevina
Copy link
Contributor Author

kevina commented Aug 28, 2016

So to be clear all "best effort" means (as agreed to by @whyrusleeping and @Kubuxu) is to simply keep all children of the best effort pin that are reachable from the local node. (In other words if a child in unavailable simply skip it, rather than aborting or attempting to download it). Nothing More. The GC will never remove "best effort" pins.

@jbenet jbenet mentioned this pull request Aug 28, 2016
58 tasks
@jbenet
Copy link
Member

jbenet commented Aug 29, 2016

(Thinking out loud)

Oh wow, that means something totally different, then. "best effort" in computer science tends to mean "up to best abilities, but not unreliable". Specifically, "best effort" in networks. Applied to pins, i took it to mean "pin a subgraph while you can, but GC if you need to", sort of like a "gc these last" type of thing.

What you describe ("pin the graph objects currently present in the node")

It's a bit concerning, too because it has unpredictable guarantees and consequences, in a very different way than "unreliable (best effort)"-- because how much is pinned forever depends entirely on order of events, which is very hard to reason about.

  • What happens if more objects in that graph are downloaded, as a result of other operations? are they suddenly pinned too? forever? and why forever?
  • It seems to me this is trying to approximate some use case that is better served by other mechanisms. For example, i might "pin local objects" that are part of a VM i want to boot, leaving the rest of the objects to be fetched later and gc-ed differently (i.e. i want to pin only the critical parts). Or similarly, I might pin the "index sub graph" and first few objects in a video, so i can always start playing it quickly and seek quickly, but let the retrieval of the big data blocks to when they're needed.
  • Though in these use cases, i dont actually want the mechanism you describe, i want a different pin which can be configured to pin a chosen subgraph (i.e. what i really want is a pin with an associated IPLD selector).

As it is now, I think your "objects that are currently local" pins can be turned into a other tools, and use regular pin sets:

  • first, create a tool that list all local refs for a graph. eg. ipfs refs local [-r] <root> or ipfs refs [-r] --local <root> which would list out all refs from <root> available locally. Can pin those directly.
  • Potentially create a recursive pin with a depth parameter (again, covered by IPLD selectors later), so all you need to do is create one object that points to all the local refs you listed, and pin that with the appropriate depth.

Thinking about it another way, let's rename "best effort" into "incremental". Because that's what it really is, right? it's an incremental pin that over time will pin more and more of a subgraph, as it's accessed and retrieved. This has different use cases than i mentioned above (eg "incrementally pin" all my local maps' area in a google maps like thing, or "incrementally pin" a dataset i'm exploring, or a dungeon in a dungeon crawler). Is this more what you mean?

Would be very useful to me to hear what use cases you have in mind for this.

@kevina
Copy link
Contributor Author

kevina commented Aug 29, 2016

@jbenet, please see #2698 for context. @whyrusleeping actually came up with the name "best effort" and I just used it.

@whyrusleeping
Copy link
Member

@jbenet this isnt a separate pin type. All this is is a way to make the files api space not break when you run a GC, but also not require that everything in the files API be local.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Files added via files API get garbage collected
4 participants