Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

syscall.resolve* removes liveslots+clist entries (sending side) #767

Closed
warner opened this issue Mar 24, 2020 · 5 comments
Closed

syscall.resolve* removes liveslots+clist entries (sending side) #767

warner opened this issue Mar 24, 2020 · 5 comments
Assignees
Labels
SwingSet package: SwingSet

Comments

@warner
Copy link
Member

warner commented Mar 24, 2020

The second step of #675 is to remove table entries from the sending side of the Promise resolution format. When a vat calls one of the resolution flavors (syscall.fulfillToPresence, syscall.fulfillToData, syscall.reject, and the as-yet-unimplemented syscall.forward), the liveslots code should first serialize the body of the resolution, which uses the valToSlot table to map Promise objects into vpid (vat promise identifier) strings. Then liveslots should delete the entries from valToSlot and slotToVal (and probably from importedPromisesByPromiseID too). Finally liveslots can invoke the syscall. This allows the vat to uphold its pledge that, once resolved, it will never speak of the vpid identifier again.

The kernel, after translating the slots through the vat's c-list (mapping each vpid to a kpid), must delete the c-list entry for the target of the resolution (which may or may not also appear in the body). This lets the kernel uphole its own pledge.

Note that the same javascript Promise object might be sent out of the Vat again in the future. If so, liveslots will assign it a new vpid (as the Promise is not present in valToSlot), and the kernel will see it as a brand new promise, which just so happens to get resolved on the very next turn.

@warner
Copy link
Member Author

warner commented Mar 25, 2020

One test case to keep in mind: while the recipient of a message does not directly get access to the Promise that represents their result, the vat which sent the message could always send it to them. And they might send it elsewhere, and/or subscribe to hear about it's resolution. We think this will work out ok, but we can imagine some improvements or optimizations which might change that, so we should be sure to have a good test case.

Alice does:

const p1 = bob~.one();
bob~.two(p1);

while Bob does :

let p;
let pr;
function one() {
  pr = makePromise();
  return pr.p;
}
function two(p1) {
  pr.res(value); //  bob resolves one() result
  carol~.foo(p1); // bob sends it elsewhere
  p1.then(stuff); // wants to hear about resolution
  Promise.resolve().then(carol~.bar(p1));
}

Bob's liveslots layer will see a dispatch.deliver() with a result= for p1 in the call to one(), and then it will see an argument referencing the same vpid in the call to two(). Within Bob, pr.p and p1 are not the same Promise object, but they're both associated with the same abstract "Promise Unum" (Chip's term, which I'm starting to love), and they'll both be resolved by the same party (Bob). A fancier implementation might manage to use the same actual Promise for both.

The full transcript for bob should look like:

  • dispatch.deliver(one, result=vpid1)
    • (internally, something like pr.p.then(thenResolve(vpid1), thenReject(vpid1)) is called to prepare for the result's eventual resolution)
    • no syscalls are made
  • dispatch.deliver(two, args.slots=[vpid1])
    • syscall.subscribe(vpid1) (happens during deserialization)
    • syscall.send(carol, foo, args.slots=[vpid1])
    • on the second turn of the two crank:
    • syscall.resolveToData(vpid1, value)
    • on the third turn
    • syscall.send(carol, bar, args.slots=[??])

The tricky bit is that the vat is supposed to stop talking about vpid1 right after the syscall.resolveToData is sent, but the Promise object (p1) is stlll hanging around, and could be referenced in subsequent messages.

With our current plan, p1 is removed from all the liveslots tables just before the second turn calls syscall.resolveToData, so when it does carol~.bar(p1) (on the third turn), liveslots doesn't recognize the promise, and allocates a new vpid2 for it.

But in our current implementation, Bob's p1.then must wait for a kernel roundtrip (indeed it must wait for everything else currently on the run-queue) before the callback can be run. If we were to optimize this by having thenResolve/thenReject notice that the vpid it's mentioning to the kernel is also in the liveslots tables, and calling the stored resolver directly, then something might happen in a different order. I don't know that this would cause a problem, but it's a case we should examine closely if/when we want to change this stuff.

@warner
Copy link
Member Author

warner commented Mar 25, 2020

Another thing to pay attention to is that subscribe. We currently subscribe during deserialization, because we have no way to sense when/if someone does a .then later on, so we must always subscribe just in case they do. #745 will change this to remove syscall.subscribe and have the kernel auto-subscribe vats when the c-list entry is added. We must then decide whether Bob is auto-subscribed to vpid1 just because it appeared as a result of an inbound dispatch.deliver (which seems wasteful in the normal case, where Bob never cares about the promise resolving), or Bob is only subscribed if the promise appears as an argument or as the result of a syscall.send. I think I can imagine a case where Bob resolves the result, deletes vpid1 as they should, and then a crank later gets a useless syscall.notifyResolve with a brand-new vpid2 that nobody cares about.

@warner
Copy link
Member Author

warner commented Mar 28, 2020

Chip and I walked through two approaches for retiring the resolver side of these tables, and we're coming around to preferring the second of the two.

As a refresher, any number of vats (or none) might have a c-list entry that references a given kernel promise identifier (kpid). If a vat has one, it will only have one: we never have two vpids for the same kpid in a single vat. Zero or one of the referencing vats will be the Decider. The identity of the Decider will move around over time, since vats can transfer their decision-making authority to a given promise by putting that promise ID in the result= field of a syscall.send. They can also consume that authority by putting it in the target= of one of the syscall.resolve* calls.

Vats currently notify the kernel (via syscall.subscribe(vpid)) if/when they care to hear about a promise being resolved. Vats currently do this the moment a new Promise is deserialized in the body of a dispatch.deliver() or dispatch.notifyResolve*() flavor, because this is when the real javascript Promise object is created and revealed to userspace code (which might call .then() on it). A vat which receives a promise via result= will not immediately subscribe, because that does not cause a Promise object to be created. If, however, the same vpid arrives later, in the body of a message (or the resolution-to-data of some other promise), now the vat is both Decider and follower, and it will subscribe(). And the vat might then shed its Decider status, while retaining its follower status. The vat has no way to shed its follower status: the javascript Promise API has no way to undo a .then() call.

I'm assuming that we'll follow Dean's recommendation (#745) to remove this syscall and treat any c-list reference to a kpid as meaning that vat should be notified about resolution. This will necessarily include vats who only learned about the promise in the result= of an inbound dispatch.deliver. This presents an awkward situation: the vat is itself the one to resolve the promise, so having the kernel notify it a moment later about the same resolution is wasteful.

Our performance goal is to remove any record of the Promise once it is no longer needed. Vats and kernels follow matching rules about when they can use a vpid in their syscalls and dispatches, and we'll achieve the goal by updating those rules to have both sides agree to stop using a vpid after certain events. We must ensure that the table-pruning this implements is done exactly once.

In the first approach, the vat-kernel rules state that c-list entries are removed during dispatch.notifyResolve*, and nowhere else. If Alice sends a message to Bob, both Alice and Bob will be subscribed to the promise. Later, when Bob resolves it, the kernel will queue dispatch.notifyResolve* to both vats. Bob's c-list entry will be removed as that notification is delivered. If Bob had liveslots table entries for the vpid, they will be removed upon receipt of the notifyResolve*, just after deserialization of the body (which could reference the same vpid). Bob may not have these table entries, if bob never received the Promise outside of a result= field. If the same vat is resolving and following, the local Promise object isn't resolved until receipt of the dispatch.notifyResolve*, which necessarily happens on a later Crank than the one in which the vat called syscall.resolve*.

In the second approach, we remove c-list entries during both dispatch.notifyResolve* and syscall.resolve*. The kernel subscribes vats as the kpid lands in their c-lists as before, but the kernel also unsubscribes the vat that calls syscall.resolve* as it removes the c-list entry. Vats which are both resolvers and followers now do not receive a dispatch.notifyResolve*, which means their liveslots layer must be responsible for resolving their local Promise objects themselves. This allows those resolution callbacks to happen during the same Crank as the resolution itself, rather than a later Crank (although of course they will happen in a different turn). This is an observable ordering difference between the two approaches.

I think we can assume the vast majority of Promises will be created as the result of a syscall.send, and will only be followed by the sending vat (so it will be rare for a single vat to both follow and resolve). The first approach imposes an extra Crank on this common case, for which the spurious dispatch.notifyResolve* will be ignored. The second approach is a bit more complicated, but avoids the extra Crank.

If we go with the second approach, we must start with a test case that sets up Bob as both follower and resolver, and makes sure Bob's .then gets called correctly. Then the changes to make are:

  • kernel should remove c-list entries just after translating/mapping the slots array inside syscall.resolve* (note that we check the invoking vat is really the Decider first)
  • kernel should remove the resolving vat from the subscriber list (if we find a clever way to synthesize the subscriber list from the c-lists entries, this will happen automatically)
  • as liveslots calls syscall.resolve*, it should check importedPromisesByPromiseID and invoke any resolver for the vpid it just resolved. Then it should remove that vpid from both importedPromisesByPromiseID and the valToSlot/slotToVal tables

@warner
Copy link
Member Author

warner commented Apr 7, 2020

Chip and I talked about the second approach and realized that we can make it better by replacing "unsubscribe the deciding vat" to just "don't notify the deciding vat". The kernel already knows which vat is calling syscall.resolve*, because it has to check that they're the decider anyways, so at that point it can notify all other subscribers and omit the decider. The decider won't always be a subscriber, in fact that will be pretty rare (but possible).

So our planned approach is:

  • For now, liveslots continues to call syscall.subscribe when they receive a vpid as an argument (in a dispatch.deliver* or dispatch.notify*), or when they emit a vpid as the result= of a syscall.send. They do not subscribe in response to receiving a vpid as the result= of an inbound dispatch.delive. They never call subscribe more than once.
  • liveSlots changes thenResolve() and thenReject() to check importedPromisesByPromiseID for the promiseID being resolved, and if present, they call the associated resolver/rejector. Both functions add a retirePromiseID(promiseID) call at the end, which removes the entry from all three tables: importedPromisesByPromiseID, vatToSlot, and slotToVat.
  • The kernel removes the clist entry upon receipt of syscall.resolve* and syscall.reject. The kernel queues notifications for all subscribed vats other than the decider vat (i.e. whichever one called syscall.resolve*). (dispatch.notifyResolve* removes clist+liveslots entries (receiving side) #766 should already be landed by this point, which means the kernel is also removing clist entries upon delivery of dispatch.notify*)

This avoids the need to unsubscribe/resubscribe as the decision-making authority wanders from vat to vat.

Later, when #745 removes syscall.subscribe, the kernel will auto-subscribe all vats the first time their c-list acquires a reference to the given kernel promise, and the only change to liveSlots will be the removal of syscall.subscribe() calls in importPromise and queueMessage. This will subscribe slightly more vats than before (since it will include the decider-only vats who only heard about the promise as an inbound result=), but it won't affect behavior because the kernel is already skipping notifications of the decider. We'll have an invariant that the set of subscribers is exactly equal to the set of vats which have the kpid in their c-list. As a result, we can optimize kernel state a bit and maybe keep that mapping in RAM, populating it from c-lists at kernel startup. Or we can come up with a more-clever way to store the kernel state, in some normalized form that can efficiently satisfy c-list lookups, subscriber identification, and reference counts (to then delete kpid entries when nothing else references them).

warner added a commit that referenced this issue May 7, 2020
rewrite liveslots use of HandledPromise, remove deliver() stall

refs #894
refs #930 (should unblock, although rewrites will be needed)
refs #767 refs #766 (should unblock)
closes #1053 (includes rebased form of the PR)
closes #769 (incidentally)
@FUDCo
Copy link
Contributor

FUDCo commented May 20, 2020

Closing as part of refactoring the tickets for #675. The underlying work has largely been done, except for promises that resolve to data that reference other promises, which is a project unto itself (see #1049)

@FUDCo FUDCo closed this as completed May 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
SwingSet package: SwingSet
Projects
None yet
Development

No branches or pull requests

2 participants