Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document and clarify the virtual object implementation #1960

Merged
merged 2 commits into from
Nov 3, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
133 changes: 133 additions & 0 deletions packages/SwingSet/docs/virtual-objects.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,133 @@
# Vat Secondary Storage

The kernel currently provides two secondary storage mechanisms for the use of (application) code running in a vat:

- Virtual objects
- Persistent weak stores

Each of these is accessed via a global made available to vat code.

## Virtual Objects

A virtual object is an object with a durable identity whose state is automatically and transparently backed up in secondary storage. This means that when a given virtual object is not in active use, it need not occupy any memory within the executing vat. Thus a vat can manage an arbitrarily large number of such objects without concern about running out of memory.

A virtual object has a "kind", which defines what sort of behavior and state it will possess. A kind is not exactly a data type, since it comes with a concrete implementation, but it indicates a family of objects that share a set of common behaviors and a common state template.

A vat can define new kinds of virtual object by calling the `makeKind` function provided as a vat global:

`makeKind(bodyMaker)`

The return value from `makeKind` is a maker function which the vat can use to create instances of the newly defined object kind.

The `bodyMaker` parameter is a user provided function that binds an in-memory instance of the virtual object with the virtual object's state, associating such instances with the virtual object's behavior. It is called whenever a new virtual object instance is created, whenever such an instance is swapped in from secondary storage, and whenever a reference to a virtual object is received as a parameter of a message and deserialized.

The `bodyMaker` function is passed the virtual object's state as a parameter, and is expected to return a new Javascript object with methods that close over this state.

One distinguished method, named `initialize`, is called when new instances are first created, and is expected to initialize the state object. Any parameters passed to the maker function returned by `makeKind` are passed directly to the `initialize` method.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
One distinguished method, named `initialize`, is called when new instances are first created, and is expected to initialize the state object. Any parameters passed to the maker function returned by `makeKind` are passed directly to the `initialize` method.
One distinguished method, named `initialize`, is called when new instances are first created, and is expected to initialize the state object. Any parameters passed to the maker function returned by `makeKind` are passed directly to the `initialize` method.
Although this `initialize` method is a method of the object returned by the `bodyMaker` it is not a method of the actual object being made. The maker returned by `makeKind` on makes objects with the remaining methods.

The initialize method is so special that I wonder whether it should be called, for example __initialize__. It is weird that you cannot use this system to make objects that happen to have a normal method named initialize.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we had some way to declare initialize as private that would do 90% of the job, as the main purpose of the magic is to ensure that handing out a reference to the object is not also handing out a capability to reset all its state. If nobody tries to call initialize a second time they'll never notice the magic is even there. (The other magic is the convention that the initialize method is named initialize, but that's more of an aesthetic concern. I'd be open to marking it in some other way, though the specific suggestion of __initialize__ is so horribly butt ugly it makes me want to scream.)

Also (and this is really more of a pedantic point) initialize is a method of the actual object being made; it's just that it's removed after it's used for safety reasons. In particular, if you had some kind of weird initialization dance that worked by having initialize hand the object to some external worker code that called initialize recursively, you could do that and it would work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's just that it's removed after it's used

With delete? God that's ugly. We should definitely find a way to express the initialization function other than by making it a method of the object.

In particular, if you had some kind of weird initialization dance that worked by having initialize hand the object to some external worker code that called initialize recursively, you could do that and it would work.

That implies that you start executing before deleting it. That in turn implies that you start executing before hardening. We cannot do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's just that it's removed after it's used

With delete? God that's ugly. We should definitely find a way to express the initialization function other than by making it a method of the object.

I'm open to any suggestion that preserves reasonable ergonomics. Keep in mind that, as @warner has pointed out, the initializer needs to able to see the object that the body maker returns (or will return), not just the state, because it may need to pass it to somebody else as part of initialization.

In particular, if you had some kind of weird initialization dance that worked by having initialize hand the object to some external worker code that called initialize recursively, you could do that and it would work.

That implies that you start executing before deleting it. That in turn implies that you start executing before hardening. We cannot do that.

What do you mean by "start executing"? Keep in mind that I'm talking here about what happens during initialization, when the state is still malleable because it hasn't been entirely established yet. We're still on the initialize call stack.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "start executing"?

That you call initialize while the object still has an initialize method. This is fatally dangerous if

initializer needs to able to see the object that the body maker returns (or will return), not just the state, because it may need to pass it to somebody else as part of initialization.

because then it is giving other things access to the object a) before it is hardened, and b) while it still has an initialize method that may be abused.

Copy link
Member

@erights erights Nov 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__initialize__ is so horribly butt ugly it makes me want to scream

That problem goes away if we stop confusing the namespaces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized a very important implementation detail that you missed (maybe because you're looking at the documentation PR instead of the code?) and that I straight up forgot (meaning, among other things, that my remarks above about whacky recursive initialization are just totally, utterly, and embarrassingly wrong): the initialize function is extracted and deleted from the object, and then the object is hardened, before initialize is called. Nobody gets to see the object before it's hardened.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, looking at the code, just found a bug in same, which I will now proceed to fix. Grrr.


For example:

```javascript
function counterBody(state) {
return {
initialize(name = 'thing') {
state.counter = 0;
state.name = name;
},
inc() {
state.counter += 1;
},
dec() {
state.counter -= 1;
},
reset() {
state.counter = 0;
},
rename(newName) {
state.name = newName;
},
getCount() {
return state.counter;
},
getName() {
return state.name;
},
};
}
Comment on lines +31 to +56
Copy link
Member

@erights erights Nov 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
function counterBody(state) {
return {
initialize(name = 'thing') {
state.counter = 0;
state.name = name;
},
inc() {
state.counter += 1;
},
dec() {
state.counter -= 1;
},
reset() {
state.counter = 0;
},
rename(newName) {
state.name = newName;
},
getCount() {
return state.counter;
},
getName() {
return state.name;
},
};
}
function makeCounterKit(state) {
const init = (name = 'thing') => {
state.counter = 0;
state.name = name;
};
const body = {
inc() {
state.counter += 1;
},
dec() {
state.counter -= 1;
},
reset() {
state.counter = 0;
},
rename(newName) {
state.name = newName;
},
getCount() {
return state.counter;
},
getName() {
return state.name;
},
};
return harden({
init,
body,
});
}

makeCounterKit makes a counter kit, which is a record with an init function and a body object. You can write it as above, which also gives a name to the body object that the init function and body methods can use. Since none of these do, this one could be written with everything inline

function makeCounterKit(state) {
  return harden({
    init(name = 'thing') {
      state.counter = 0;
      state.name = name;
    },
    body: {
      inc() {
        state.counter += 1;
      },
      dec() {
        state.counter -= 1;
      },
      reset() {
        state.counter = 0;
      },
      rename(newName) {
        state.name = newName;
      },
      getCount() {
        return state.counter;
      },
      getName() {
        return state.name;
      },
    },
  });
}

Both of these harden before anything happens, protecting ourselves from undesired mutation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The kit properties should probably be named {init, self} rather than {init, body}.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still too complicated and ugly to my eyes and full of places to make mistakes.

Would you feel better if instead of deleting initialize we just replaced it with ()=>{}?

BTW, this example illustrates an interesting difference between our respective API intuitions. I tend to see any place where the user needs to add a harden before passing to our SDK as an ergonomics problem. Basically, if they need to put in a harden to be safe and they don't, then they've unwittingly made themselves vulnerable. If the context is such that the harden should always be there, then the SDK should do it for them. (Better would be a language where every API surface is hardened at birth, but that's not the language we have.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you feel better if instead of deleting initialize we just replaced it with ()=>{}?

Worse. This goes in the wrong direction. Digs the hole deeper.

Better would be a language where every API surface is hardened at birth, but that's not the language we have

Agreed on that. I also agree it is not clear how to best recover from this mistake. But the recovery should not require subtle reasoning on the part of the programmer to stay safe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code should say what it means. In that regard, my code is much simpler than your's because they both describe an object with no initialize method but only my code looks like it does that. Your code is smaller but misleads.

```

This function defines the body for a simple counter object with two properties in its state, a count and a name.

You'd define the corresponding virtual object kind with:

`const counterMaker = makeKind(counterBody);`

and then use it like this:

```javascript
const fooCounter = counterMaker('foo');
const barCounter = counterMaker('bar');

fooCounter.inc();
fooCounter.inc();
barCounter.inc();
barCounter.rename('new bar');
console.log(`${fooCounter.getName()} count is ${fooCounter.getCount()`); // "foo count is 2"
console.log(`${barCounter.getName()} count is ${barCounter.getCount()`); // "new bar count is 1"
```

Additional important details:

- The set of state properties is determined by the `initialize` method. The properties that exist when `initialize` returns _are_ the properties of the object. State properties cannot thereafter be added or removed.

- The values a state property may take are limited to things that are
serializable and which may be hardened (and, in fact, _are_ hardened and
serialized the moment they are assigned). That is, you can replace what value
a state property _has_, but you cannot modify a state property's value in
place. In other words, you can do things like:

```state.zot = [1, 2, 3];```

but you can't do things like:

```state.zot.push(4);```

- A virtual object can be passed as a parameter in messages to other vats. It will be passed by presence, just like any other non-data object you might send in a message parameter.

- A virtual object's state may include references to other virtual objects.

- The in-memory manifestations of a virtual object (which we call its "representatives") are not necessarily (or even usually) `===` to each other, even when they represent the same object, since a new representative is created for each act of deserialization. They do, however, all operate on the same state, so changes to the state made by a method call to one representative will be visible to the other representatives. A soon-to-arrive future version of the virtual object system will add a `sameKey` predicate that will allow you to compare two virtual object representatives to determine if they refer to the same virtual object instance.

- The object returned by the `bodyMaker` function _is_ the representative, so you can capture it inside any of the body methods (including `initialize`) in order to pass the virtual object to somebody else. So you can write things like the following:

```javascript
function thingBody(state) {
const self = {
initialize(someParam) {
otherObject.doSomethingWithMe(self, someParam);
state.foo = 47;
},
sendMeTo(somebodyElse) {
somebodyElse.hereIAm(self);
},
... // and so on
};
return self;
}
```

## Persistent Weak Stores

The vat secondary storage system also includes a `makeWeakStore` function, which is identical in API to the `makeWeakStore` function provided by the `@agoric/store` package. I.e., you call:

`const ws = makeWeakStore();`

to obtain a new weak store instance, which you can then access using the `has`, `init`, `set`, `get`, and `delete` methods just as you would if you had imported `makeWeakStore` from `@agoric/store`.

However, the vat secondary storage system's `makeWeakStore` augments the one provided by the `@agoric/store` package in two important ways:

- Weak store operations may use virtual object representatives as keys, and if you do this the corresponding underlying virtual object instance identities will be used for indexing rather than the object identities of the representative objects. (I.e., two object representives for which `sameKey` is true will operate on the same weak store entry).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we decided to fix this using WeakRefs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgive me, but I totally don't understand what you mean by that. Fix what using weakrefs how?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only need to weaken object identity if we're avoiding weakrefs. If we are now allowing ourselves to use weakrefs as long as we don't leak non-determinism, for each virtual object, we can ensure there is at most one javascript instance. When we unserialize a reference to that virtual object, we reuse the existing javascript object if it has not yet been collected. Virtual object identity and JavaScript object identity can remain exactly one-to-one, which helps us avoid a wide range of JavaScript programming hazards.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will still rename sameStructure to sameKey and explain it as a less precise equivalent class. But it is less precise for pass-by-copy objects, comparing them only by contents, not identity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I recall the discussion now. That would allow us to reduce the current 3-layer implementation to a 2-layer implementation, which would simplify some things (though it would also require doing delicate WeakRef things in exchange, so I'm not sure it's a net win on implementation complexity -- though enabling users to use === probably pays for it). Did we eventually decide we were OK committing to Node 14 as the minimum required version?

BTW, I've implemented a placeholder sameKey function in the virtual object manager (PR forthcoming) but it only understands the inner key magic for virtual objects (everything else goes through object identity) and I was wondering about the work of integrating it with the current sameStructure implementation. Presumably this effort can disappear in a puff of logic if === works?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would indeed disappear.

It's not just the user using ===. It is all the hidden ways users depend on JS object identity which we'd need to teach them to avoid.

SES-shim needs to support earlier versions of Node. agoric-sdk does not. Endo will be somewhere between the two attn @kriskowal . This persistence system is right on the edge between Endo and agoric-sdk. Not sure yet which way it should go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This persistence system is very tightly integrated with the SwingSet kernel and liveslots, so I don't think it winds up as an Endo thing per se. It's plausible we'll want something like it for Endo, and also plausible that the something like it, whatever it proves to be, can cannibalize the current implementation, but I think we first have to answer some questions about the role of CapTP in Endo. Off topic here, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@warner I think we need to have a conversation about the priority of reworking things using weak refs, relative to the other things on our plate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we now have the option of mapping one-to-one with JS object identity, I strongly encourage us to.


- The values set for weak store entries will be saved persistently to secondary storage. This means that such weak stores can grow to be very large without impacting the vat's memory footprint.

An important caveat is that the latter entries are currently not actually weak, in the sense that if a virtual object becomes unreferenced, any corresponding weak store entries will not go away. This is not semantically visible to vat code, but does impact the disk storage footprint of the vat. Since virtual object representatives may come and go depending on the working state of the vat, there is no graceful way to garbage collect the underlying objects or corresponding weak store entries; a future system which can do robust distributed garbage collection should eventually rectify this, but for now you should not rely on any such garbage collection happening.
Loading