Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor kernel->vat promise notification interface to support aggregate notifications #2065

Closed
FUDCo opened this issue Dec 4, 2020 · 1 comment
Assignees
Labels
bug Something isn't working enhancement New feature or request

Comments

@FUDCo
Copy link
Contributor

FUDCo commented Dec 4, 2020

What is the Problem Being Solved?

When the kernel notifies the vat of the resolution of a promise that is resolved to data (or rejected with data), in some cases promise cross-references contained in the resolution value can trigger a pathological chain of promise retirements followed by subscribe syscalls followed by re-notifications of the resolution, that continue ad infinitum. And, of course, infinite loops are bad.

Description of the Design

The cure is to atomically notify the vat of the resolution of a promise together with the (previously existing) resolutions of any other promises contained in the resolution value's reference graph that are unknown to the vat (i.e., not in the vat's c-list) at the time of notification. Currently the notifications for the other promises in the reference graph are triggered by the vat noticing, in course of processing the initial notification, that some of the promises referenced in the data are unknown, and in response issuing subscribe syscalls for those promises. This results in the notifications for these dependent promises being delivered in later cranks, in turn resulting in the cascading failure mentioned above, since other intermediate promises may have been retired in the meantime.

Delivering a collection of promise resolutions in a single notification requires a change to the kernel->vat notification interface. A complication is that we currently have three different kinds of resolution (resolve-to-data, resolve-to-presence, and reject) realized by three different notification calls, but since an aggregate resolution notification referencing multiple promises might reference promises that fall into more than one of these cases, the three cases will need to be unified. (Such unification is a cleanup notion we've been considering for a while, but largely on aesthetic grounds. Since there wasn't any pressing need driving it, it has been low enough on our priority list to never make it to an actionable state.)

More specifically, a notify delivery to a vat currently only contains the kpid of the single promise whose resolution is being notified; the implementation then interrogates the kernel's known state for that promise to decide what kind of action to take. This unfolds into one of three calls, notifyFulfillToPresence, notifyFulfillToData, or notifyReject, according to the state of the promise, extracting the remaining necessary parameters directly from the kernel's record of the promise itself. Each of the three individual notification functions takes different parameters specific to the kind of resolution that it is performing; in particular, both notifyFulfillToData and notifyReject represent the resolution value as a capdata object, whereas notifyFulfillToPresence simply passes the id of the presence directly, since that form is easier to handle and considerably more compact than the capdata representation.

Each of the three handlers both handles the specific case it implements and then does some post notification cleanup (such as retiring the now-resolved promise, which is specifically the thing that's getting us into the trouble motivating this refactoring). The actual case-specific handling logic can most likely be retained with minimal adaptation, but the overall processing of the aggregate notification will need to first handle the resolutions of each the notified promises and only when this is complete do the subsequent cleanup for all of them.

The plan for implementing this is as follows:

  1. Change both sides of the kernel->vat notification interface to allow the passage of a list of kpids in place of the single kpid passed now. The kernel-side notification code will behave as it did before by passing a list of length 1. The vat-side notify handler will iterate over the list but simply invoke the existing, otherwise unchanged, individual case handlers as it currently does.

  2. Split the individual case handlers and the code that iterates over the notified promise list so that we do all the notifications in one pass then all the post-notification cleanups in a second pass.

  3. Modify the kernel side of the notification (i.e., the code that issues the notify request) to walk the reference graph of the promise that it is given, remembering all the promises it finds that are not presently in the target vat's c-list. This entire list of promise ids will then be included in the notify request it sends.

Security Considerations

Since the intended semantics of notification remain unchanged, this refactoring should not introduce any new security considerations. However, the infinite request sequence that can be triggered using the older (i.e., current) implementation represents a possible denial of service attack vector. If we implement this refactoring successfully and it works the way we intend, that potential attack vector should be eliminated. We should regard that elimination as a primary goal of this effort and a requirement for its successful completion.

Both @warner and I have an intuition that the mechanism outlined here may only address one side of the recursive-runaway-on-promise-resolution problem, namely the kernel->vat side. We think there may be a corresponding issue in the vat->kernel side, wherein a single vat generating a set of circularly linked promise resolutions and then communicating them outward may trigger a similar runaway. We do not understand this case nearly as well, not even well enough to know if such a problem actually exists -- what we have at this point is just a suspicion. It is our hope that by implementing the fix described here we will improve our understanding of the problem space. Additionally, it may turn out that some of the examples we have in hand of runaway promise resolutions are actually manifestations of a vat->kernel problem; if so, we will discover this.

Test Plan

Obviously, all the existing tests of promise resolution, promise retirement, and notification should continue to pass successfully. In addition, I have a collection of swingset-runner demos that manifest the problematic behavior we are attempting to address here. Once we have verified that these examples no longer produce a runaway, they will be converted to unit tests and added to our test repertoire.

@warner
Copy link
Member

warner commented Feb 8, 2021

closed by #2087

@warner warner closed this as completed Feb 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants