Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document and clarify the virtual object implementation #1960

Merged
merged 2 commits into from
Nov 3, 2020

Conversation

FUDCo
Copy link
Contributor

@FUDCo FUDCo commented Nov 2, 2020

No description provided.

@FUDCo FUDCo added documentation Improvements or additions to documentation SwingSet package: SwingSet labels Nov 2, 2020
@FUDCo FUDCo requested a review from warner November 2, 2020 21:26
@FUDCo FUDCo self-assigned this Nov 2, 2020
@FUDCo FUDCo force-pushed the document-and-clarify-virtual-objects branch from e07c615 to 9efd717 Compare November 2, 2020 21:30
packages/SwingSet/docs/virtual-objects.md Outdated Show resolved Hide resolved

A virtual object is an object with a durable identity whose state is state is automatically and transparently backed up in secondary storage. This means that when a given virtual object is not in active use, it need not occupy any memory within the executing vat. Thus a vat can manage an arbitrarilly large number of such objects without concern about running out of memory.

A virtual object has a "kind", which defines what sort of behavior and state it will possess. You can think of a kind somewhat like a data type, but without direct linguistic support.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the following is too detailed or obscure for what you're trying to do, ignore. But "data type" is wrong, as a type can have multiple implementations. The "kind" here is a specific implementation. In conventional oo terms, it is more like a concrete class than like an interface type.

Suggested change
A virtual object has a "kind", which defines what sort of behavior and state it will possess. You can think of a kind somewhat like a data type, but without direct linguistic support.
A virtual object has a "kind", which defines what sort of behavior and state it will possess. You can think of a kind somewhat like an object implementation in the objects-as-closure pattern, but with explicit state management rather than lexically captured variables.

Copy link
Contributor Author

@FUDCo FUDCo Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was a little hesitant about my phrasing also. You're right that it's not a data type, which is why I described it as like a data type, hoping to draw an analogy. It is a type of type, but it's not abstract. One might almost want to use the world "class" if it weren't already hopelessly overloaded. "Kind" means "the things with this implementation", but I don't care for the description in terms of the objects-as-closure pattern, because I don't want to have to describe the objects-as-closure pattern nor assume people know what that means (a lot of people know it, but don't have a name for it, and this doesn't seem like the right place to try to first establish that connection). I think this paragraph still needs work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try this on for size:

A virtual object has a "kind", which defines what sort of behavior and state it will possess. A kind is not exactly a data type, since it comes with a concrete implementation, but it indicates a family of objects that share a set of common behaviors and a common state template.


The `bodyMaker` function is passed the virtual object's state as a parameter, and is expected to return a new Javascript object with methods that close over this state.

One distinguished method, named `initialize`, is called when new instances are first created, and is expected to initialize the state object. Any parameters passed to the maker function returned by `makeKind` are passed directly to the `initialize` method.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
One distinguished method, named `initialize`, is called when new instances are first created, and is expected to initialize the state object. Any parameters passed to the maker function returned by `makeKind` are passed directly to the `initialize` method.
One distinguished method, named `initialize`, is called when new instances are first created, and is expected to initialize the state object. Any parameters passed to the maker function returned by `makeKind` are passed directly to the `initialize` method.
Although this `initialize` method is a method of the object returned by the `bodyMaker` it is not a method of the actual object being made. The maker returned by `makeKind` on makes objects with the remaining methods.

The initialize method is so special that I wonder whether it should be called, for example __initialize__. It is weird that you cannot use this system to make objects that happen to have a normal method named initialize.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we had some way to declare initialize as private that would do 90% of the job, as the main purpose of the magic is to ensure that handing out a reference to the object is not also handing out a capability to reset all its state. If nobody tries to call initialize a second time they'll never notice the magic is even there. (The other magic is the convention that the initialize method is named initialize, but that's more of an aesthetic concern. I'd be open to marking it in some other way, though the specific suggestion of __initialize__ is so horribly butt ugly it makes me want to scream.)

Also (and this is really more of a pedantic point) initialize is a method of the actual object being made; it's just that it's removed after it's used for safety reasons. In particular, if you had some kind of weird initialization dance that worked by having initialize hand the object to some external worker code that called initialize recursively, you could do that and it would work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's just that it's removed after it's used

With delete? God that's ugly. We should definitely find a way to express the initialization function other than by making it a method of the object.

In particular, if you had some kind of weird initialization dance that worked by having initialize hand the object to some external worker code that called initialize recursively, you could do that and it would work.

That implies that you start executing before deleting it. That in turn implies that you start executing before hardening. We cannot do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's just that it's removed after it's used

With delete? God that's ugly. We should definitely find a way to express the initialization function other than by making it a method of the object.

I'm open to any suggestion that preserves reasonable ergonomics. Keep in mind that, as @warner has pointed out, the initializer needs to able to see the object that the body maker returns (or will return), not just the state, because it may need to pass it to somebody else as part of initialization.

In particular, if you had some kind of weird initialization dance that worked by having initialize hand the object to some external worker code that called initialize recursively, you could do that and it would work.

That implies that you start executing before deleting it. That in turn implies that you start executing before hardening. We cannot do that.

What do you mean by "start executing"? Keep in mind that I'm talking here about what happens during initialization, when the state is still malleable because it hasn't been entirely established yet. We're still on the initialize call stack.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "start executing"?

That you call initialize while the object still has an initialize method. This is fatally dangerous if

initializer needs to able to see the object that the body maker returns (or will return), not just the state, because it may need to pass it to somebody else as part of initialization.

because then it is giving other things access to the object a) before it is hardened, and b) while it still has an initialize method that may be abused.

Copy link
Member

@erights erights Nov 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__initialize__ is so horribly butt ugly it makes me want to scream

That problem goes away if we stop confusing the namespaces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized a very important implementation detail that you missed (maybe because you're looking at the documentation PR instead of the code?) and that I straight up forgot (meaning, among other things, that my remarks above about whacky recursive initialization are just totally, utterly, and embarrassingly wrong): the initialize function is extracted and deleted from the object, and then the object is hardened, before initialize is called. Nobody gets to see the object before it's hardened.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, looking at the code, just found a bug in same, which I will now proceed to fix. Grrr.

packages/SwingSet/docs/virtual-objects.md Outdated Show resolved Hide resolved

However, the vat secondary storage system's `makeWeakStore` augments the one provided by the `@agoric/store` package in two important ways:

- Weak store operations may use virtual object representatives as keys, and if you do this the corresponding underlying virtual object instance identities will be used for indexing rather than the object identities of the representative objects. (I.e., two object representives for which `sameKey` is true will operate on the same weak store entry).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we decided to fix this using WeakRefs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgive me, but I totally don't understand what you mean by that. Fix what using weakrefs how?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only need to weaken object identity if we're avoiding weakrefs. If we are now allowing ourselves to use weakrefs as long as we don't leak non-determinism, for each virtual object, we can ensure there is at most one javascript instance. When we unserialize a reference to that virtual object, we reuse the existing javascript object if it has not yet been collected. Virtual object identity and JavaScript object identity can remain exactly one-to-one, which helps us avoid a wide range of JavaScript programming hazards.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will still rename sameStructure to sameKey and explain it as a less precise equivalent class. But it is less precise for pass-by-copy objects, comparing them only by contents, not identity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I recall the discussion now. That would allow us to reduce the current 3-layer implementation to a 2-layer implementation, which would simplify some things (though it would also require doing delicate WeakRef things in exchange, so I'm not sure it's a net win on implementation complexity -- though enabling users to use === probably pays for it). Did we eventually decide we were OK committing to Node 14 as the minimum required version?

BTW, I've implemented a placeholder sameKey function in the virtual object manager (PR forthcoming) but it only understands the inner key magic for virtual objects (everything else goes through object identity) and I was wondering about the work of integrating it with the current sameStructure implementation. Presumably this effort can disappear in a puff of logic if === works?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would indeed disappear.

It's not just the user using ===. It is all the hidden ways users depend on JS object identity which we'd need to teach them to avoid.

SES-shim needs to support earlier versions of Node. agoric-sdk does not. Endo will be somewhere between the two attn @kriskowal . This persistence system is right on the edge between Endo and agoric-sdk. Not sure yet which way it should go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This persistence system is very tightly integrated with the SwingSet kernel and liveslots, so I don't think it winds up as an Endo thing per se. It's plausible we'll want something like it for Endo, and also plausible that the something like it, whatever it proves to be, can cannibalize the current implementation, but I think we first have to answer some questions about the role of CapTP in Endo. Off topic here, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@warner I think we need to have a conversation about the priority of reworking things using weak refs, relative to the other things on our plate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we now have the option of mapping one-to-one with JS object identity, I strongly encourage us to.


The `bodyMaker` function is passed the virtual object's state as a parameter, and is expected to return a new Javascript object with methods that close over this state.

One distinguished method, named `initialize`, is called when new instances are first created, and is expected to initialize the state object. Any parameters passed to the maker function returned by `makeKind` are passed directly to the `initialize` method.
Copy link
Member

@erights erights Nov 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__initialize__ is so horribly butt ugly it makes me want to scream

That problem goes away if we stop confusing the namespaces.

Comment on lines +31 to +56
function counterBody(state) {
return {
initialize(name = 'thing') {
state.counter = 0;
state.name = name;
},
inc() {
state.counter += 1;
},
dec() {
state.counter -= 1;
},
reset() {
state.counter = 0;
},
rename(newName) {
state.name = newName;
},
getCount() {
return state.counter;
},
getName() {
return state.name;
},
};
}
Copy link
Member

@erights erights Nov 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
function counterBody(state) {
return {
initialize(name = 'thing') {
state.counter = 0;
state.name = name;
},
inc() {
state.counter += 1;
},
dec() {
state.counter -= 1;
},
reset() {
state.counter = 0;
},
rename(newName) {
state.name = newName;
},
getCount() {
return state.counter;
},
getName() {
return state.name;
},
};
}
function makeCounterKit(state) {
const init = (name = 'thing') => {
state.counter = 0;
state.name = name;
};
const body = {
inc() {
state.counter += 1;
},
dec() {
state.counter -= 1;
},
reset() {
state.counter = 0;
},
rename(newName) {
state.name = newName;
},
getCount() {
return state.counter;
},
getName() {
return state.name;
},
};
return harden({
init,
body,
});
}

makeCounterKit makes a counter kit, which is a record with an init function and a body object. You can write it as above, which also gives a name to the body object that the init function and body methods can use. Since none of these do, this one could be written with everything inline

function makeCounterKit(state) {
  return harden({
    init(name = 'thing') {
      state.counter = 0;
      state.name = name;
    },
    body: {
      inc() {
        state.counter += 1;
      },
      dec() {
        state.counter -= 1;
      },
      reset() {
        state.counter = 0;
      },
      rename(newName) {
        state.name = newName;
      },
      getCount() {
        return state.counter;
      },
      getName() {
        return state.name;
      },
    },
  });
}

Both of these harden before anything happens, protecting ourselves from undesired mutation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The kit properties should probably be named {init, self} rather than {init, body}.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still too complicated and ugly to my eyes and full of places to make mistakes.

Would you feel better if instead of deleting initialize we just replaced it with ()=>{}?

BTW, this example illustrates an interesting difference between our respective API intuitions. I tend to see any place where the user needs to add a harden before passing to our SDK as an ergonomics problem. Basically, if they need to put in a harden to be safe and they don't, then they've unwittingly made themselves vulnerable. If the context is such that the harden should always be there, then the SDK should do it for them. (Better would be a language where every API surface is hardened at birth, but that's not the language we have.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you feel better if instead of deleting initialize we just replaced it with ()=>{}?

Worse. This goes in the wrong direction. Digs the hole deeper.

Better would be a language where every API surface is hardened at birth, but that's not the language we have

Agreed on that. I also agree it is not clear how to best recover from this mistake. But the recovery should not require subtle reasoning on the part of the programmer to stay safe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code should say what it means. In that regard, my code is much simpler than your's because they both describe an object with no initialize method but only my code looks like it does that. Your code is smaller but misleads.

@erights
Copy link
Member

erights commented Nov 3, 2020

__initialize__ is so horribly butt ugly it makes me want to scream

That problem goes away if we stop confusing the namespaces.

Patient: Doctor, it makes me want to scream when I confuse namespaces.
Doctor: ...

@FUDCo FUDCo force-pushed the document-and-clarify-virtual-objects branch from f9990b1 to e407e6b Compare November 3, 2020 20:56
@FUDCo
Copy link
Contributor Author

FUDCo commented Nov 3, 2020

Since this is a documentation PR that documents what's currently on master, I'm going to go ahead and merge it, but doing this is not a claim that we've resolved the API design discussion.

@FUDCo FUDCo merged commit 76e0723 into master Nov 3, 2020
@FUDCo FUDCo deleted the document-and-clarify-virtual-objects branch November 3, 2020 21:10
@erights
Copy link
Member

erights commented Nov 3, 2020

Since this is a documentation PR that documents what's currently on master, I'm going to go ahead and merge it, but doing this is not a claim that we've resolved the API design discussion.

@FUDCo
Gotcha. Please file two issues with pointers into the threads above. One on mixing initialization with object methods, and one on canonicalizing object identity. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation SwingSet package: SwingSet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants