Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finish kernel-side retirement of C-list entries for resolved promises #2110

Merged
merged 3 commits into from
Jan 22, 2021

Conversation

FUDCo
Copy link
Contributor

@FUDCo FUDCo commented Dec 21, 2020

These changes implement step 3 of the plan laid out in #2065

@FUDCo FUDCo added enhancement New feature or request SwingSet package: SwingSet labels Dec 21, 2020
@FUDCo FUDCo requested a review from warner December 21, 2020 21:48
@FUDCo FUDCo self-assigned this Dec 21, 2020
@FUDCo FUDCo marked this pull request as draft December 22, 2020 23:41
@FUDCo FUDCo marked this pull request as ready for review December 24, 2020 08:15
@FUDCo FUDCo marked this pull request as draft December 24, 2020 08:15
Copy link
Member

@warner warner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work. The main change I want to make is to remove the clist entries before giving the vat control, which will also cause a few of the tests to change slightly (expectRetirement). The secondary change would be to simplify/augment the unit tests to exercise the resolution cycles more directly (using a technique I didn't think of the first time through, so we'll have to write some new code together).

}

scanKernelPromise(rootKPID, rootKernelData);
return Array.from(seen);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This got me thinking about determinism and how this list will be ordered, and whether we ought to consider sorting the seen array. I believe Array.from(set) will use the insertion order from the set, which means the depth-first traversal order of the resolved-promise graph (filtered by promises known to the vat). This should be consistent across replicas, and doesn't reveal any information that the vat isn't about to learn anyways. But in terms of a specification (and thinking about a non-JavaScript implementation of the kernel), is the insertion-order behavior sufficiently well-established that we could replicate it in something other than javascript?

My hunch is that we don't need to sort this, but I wanted to hear your thoughts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this is also being used to build the resolutions object. And we need to think about the order in which the vat observes the resolutions it processes: the vat needs to hear about a new promise before it hears about its resolution. If it got the resolution first (for a promise it didn't know about), .. hm, I don't know what that would do, it might work, but I think all of our existing code expects to be told about a promise in one crank, and then get the fulfillment in a subsequent one.

In the new world, a single crank gets a batch of resolutions, but the vat is still processing those resolutions in a particular order, and the same expectations apply.

This might be a reason to change resolutions from an object (in which the vpids are the keys) into an array (of [vpid, rejectionFlag, resolutionData] 3-tuples), just to make it clear that there's an expected order in which they should be processed. I don't know that I'd want to rely upon JS's insertion order being carried all the way through (JSON serialization, separate processes, potentially multiple languages, etc).

Copy link
Contributor Author

@FUDCo FUDCo Jan 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Array.from(set) is actually Array.from(iterable) and relies on Set's iteration order for the array order. The iteration order of a Set (as well as the various other collection classes and collection-like things) is specified in ECMA-262 to be insertion order, which is the order we want here because it corresponds to graph traversal order, which also means dependency order, which I think is necessary for correct functioning. If you were to sort the array as a separate step, you'd need a comparator that could compare dependency order, which I don't see how to implement short of keeping an explicit record of the depth-first traversal order, which Set already does (you could, for example, accumulate the array as you go instead of doing an explicit sort at the end, but the scanning algorithm needs to do random-access lookups, which would require maintaining the Set also for queries, but if you do that you don't need the array due to the aforementioned Set iteration semantics).

I'd note that while depth-first graph traversal order yields a dependency ordering, depth-first order is unique while dependency order is not necessarily unique since dependency order is a lattice, so if you were to implement this in another language that didn't have something like Set with JavaScript's ordering rules, you might have to do the work yourself -- so we might need to explicitly specify that the intervat representation of a notification presents the promises in some defined dependency ordering like depth-first, even though that's just what the JavaScript path of least resistance delivered us for free.

My intuition now leans towards thinking the resolutions object should be changed to an array. (I actually had the intuition it should be an array at the beginning but I let you talk me out of it because my sense that an array was better was due to a sort of cargo-cult notion that all the other things we send are arrays so this one should be too; I wasn't really thinking about the importance of ordering -- and random-access is convenient.) As long as we are in the JavaScript universe, we are OK because JavaScript objects also obey the insertion-order-is-iteration-order rule, but once we encode things to JSON it's possible that a non-JavaScript JSON implementation might re-order things, since JSON objects are explicitly unordered (or, more pedantically, the JSON spec takes no position on whether or not a JSON implementation will impute meaning to the ordering of the properties in a JSON object) whereas JSON arrays are explicitly ordered.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aye. I guess it's fine to have the order determined by the graph being traversed, that's plenty deterministic.

I suppose the one benefit from using an Object or Map or something else key-value-ish is the enforced property that each promise-id can only be included once, whereas with an Array a misbehaving function might produce one with duplicate keys. But I think the fact that ordering matters here should push us into preferring the Array despite that minor drawback.. using an Array makes it abundantly clear to the reader that the creator of resolutions has a specific order in mind, whereas relying upon JS's (well-specified but non-obvious) insertion ordering is not nearly as discoverable. Some auditor down the road will save time if we use the Array for this, and that's enough reason to do it.

@@ -472,7 +472,16 @@ export default function buildKernel(
kernelKeeper.incStat(statNameForNotify(p.state));
const kd = harden(['notify', kpid, p]);
const vd = vat.translators.kernelDeliveryToVatDelivery(kd);
await deliverAndLogToVat(vatID, kd, vd);
if (vd) {
await deliverAndLogToVat(vatID, kd, vd);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should delete the clist entries here: after translation but before calling deliverAndLogToVat, because otherwise a vat which attempts to reference the vrefs that shall not be uttered will succeed, when that ought to be a fatal error. We don't have to trigger the GC walk of what the perhaps-now-entirely-unreferenced promises pointed to, but I think we need to at least delete the clist entries before giving control to the vat.

It might also make sense to augment kernelDeliveryToVatDelivery somehow, to return a list of kpids rather than digging the vpids out of the vatDelivery object and then translating them back into krefs. Or maybe vatKeeper.deleteCListEntry could take just one kind and look up the matching one on its own. OT3H our vat-side nondeterministic GC code might introduce unidirectional clist entries (deleting one direction but leaving the other in place until some further event happens), so maybe it's better to not introduce an assumption that both entries are always present.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, the re-lookup of the kpid in the vatKeeper's vat-to-kernel mapping table kind of bugged me too, but it seemed like a lot of work to get rid of it so I didn't bother. But if it's done through a lookup like this, it strikes me that once we are properly handling the cleanup on the liveslots side, the code here simply won't work for deleting the clist entries after calling deliverAndLogToVat because after it is done, the requisite mapping table entries will be gone and the mapVatSlotToKernelSlot lookups will fail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, as discussed in the meeting we should change vatKeeper to have a method likedeleteCListsForKpids(kpids) or something, to allow the translation to take place as close as possible to the data it needs

Copy link
Contributor Author

@FUDCo FUDCo Jan 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I've been working through this change, I discovered that the notion of deleting the clist entries after translation but before delivery will not work given the current implementation of liveslots, because during the processing of the notify, the vat generates ephemeral subscribe requests that mention promises that are going away. If the rule is "never speak of this again after this crank", then deleting the clist entries after the notify delivery (i.e., the way it works now) is fine, but if we want to have the vpids immediately become unmentionable, we'll have to devise a different way to manage these ephemeral subscribe syscalls. I'm inclined to want to do that, since these subscribes are kinda stupid, but the mechanism is not obvious.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we worked this out, by having liveslots keep track of which promises were introduced during deserialization and which are being resolved by the crank, and only subscribing to the remainder. So the vat should never be subscribing to vpids that are introduced-and-resolved.

@@ -79,12 +79,27 @@ function makeTranslateKernelDeliveryToVatDelivery(vatID, kernelKeeper) {

function translateNotify(kpid, kp) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm vaguely unsettled that "translate" now changes the shape of the thing so completely. I wonder if it would be tidyier to build a kernelDelivery(type=notify) with a full resolutions batch (all using kpids), by crawling the Promise graph and checking kpids against the target vat's clist (for the "do we need to tell them now" question), and then give translateNotify the one job of converting all the krefs to vrefs.

It would be kind of silly, in that we'd be checking the same vat's clist twice (once from the kref side and again in the translation function), but it would maybe make things easier to read. And the first clist check would only be looking for presence-vs-absence ("is this kpid in this vat's clist?"), while the second check is the one that actually translates krefs to vrefs.

Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would also remove the slightly-squirrly double "do nothing" pathways in kernel.js processNotify(), where it bails out either because 1: the target vat is dead, or 2: the translator discovered that the target vat no longer cares. If we rewrote this as above, processNotify would learn that resolutions.length === 0 before calling the translator, so the translator would no longer have a null return case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the things I discovered when I was writing the slogulator was that it turns out that lots of these translations change the shape of the thing being translated. At first it bugged me and I was going to "fix" them, but the phenomenon was sufficiently widespread that changing the situation proved difficult and pointless and so I gave up. So adding one more example of the phenomenon didn't seem to me to be incrementally worse. The notify which does nothing (because the thing it was going to notify already has been) is a little whacky, but I don't see that it's a problem per se that needs to be fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the shape we came up with in the meeting, let's see how that feels once it gets implemented. I kind of wish the translators were less mutating, but of course their primary purpose is to add entries to the clist when necessary. I do like the thing we came up with where the translator might add an entry, but never deletes them on its own; rather the non-vat-specific kernel code which creates the delivery object is responsible for any deletion that needs to happen afterwards.

}
resolveFromKernel(promiseID, resolution);
// XXX question: do we need to call retirePromiseIDIfEasy (or some special
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will be ok for now: if the kernel retires the vpid as it resolves the promise into the comms vat, but the comms vat doesn't forget the vpid the way it's supposed to, and then later somebody on the far end of the wire re-mentions that same promise, the comms vat will re-introduce it to the kernel. That will re-export the promise.

The main problem I can imagine is if the comms vat doesn't realize that the kernel expects a resolution for the new copy of the old promise. We'll be doing a pass through the comms vat before this whole task is done, and I think we'll fix any problems during that pass.

// get a bogus dispatch.notify. Currently we throw an error, which is
// currently ignored but might prompt a vat shutdown in the future.

// TODO: The following goofiness, namely taking apart the capdata object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

during our upcoming "make the comms protocol support batched resolutions too" task (or however we do it), this goofiness might be replaced with something less goofy

@@ -595,7 +578,7 @@ async function doTest4567(t, which, mode) {
t.is(clistKernelToVat(kernel, vatA, p1kernel), undefined);
t.is(clistVatToKernel(kernel, vatA, p1VatA), undefined);
} else {
t.is(inCList(kernel, vatA, p1kernel, p1VatA), true);
t.is(inCList(kernel, vatA, p1kernel, p1VatA), false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems weird.. if the item should be missing from the kernel clist in both expectRetirement and !expectRetirement cases, then let's move this clause out of the if block, rather than making the same test in both arms. But how is it that inCList() is expected to return false at the same time that clistKernelToVat/clistVatToKernel are expected to return values? I don't see how that could ever pass, which makes me think that we're no longer exercising the !expectRetirement case at all (which is what we want in the long run: we should retire every resolved promise, so sooner or later we should remove support for testing the non-retiring case).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, expectRetirement is now unconditionally true, so this second clause isn't being exercised, so the mismatch is not causing a failure.

Let's remove the expectRetirement flag entirely, and delete else arm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expectRetirement flag is a vestigial leftover from when we were first implementing promise retirement stuff, so that we had the option to run with and without promise retirement (without breaking the tests) until we were satisfied that it was solid. Now that we've firmly embraced it, I think it can be safely discarded.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we discard it now? or do you have a followup PR to do that?

@@ -575,7 +558,7 @@ async function doTest4567(t, which, mode) {
};
onDispatchCallback = function odc1(d) {
t.deepEqual(d, resolutionOf(p1VatA, mode, targetsA));
t.is(inCList(kernel, vatA, p1kernel, p1VatA), !expectRetirement);
t.is(inCList(kernel, vatA, p1kernel, p1VatA), expectRetirement);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems weird, and I suspect that expectRetirement is no longer being exercised correctly. If we expect the item to be retired, why would we assert that it's still in the clist?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right, this is related to my recommendation that we remove the clist entry before giving control to the vat. Since the current branch is not doing that, this callback (which runs while the vat has control) will observe the clist entry to be present (independent of expectRetirement). Then, a few lines later, after the dispatch has completed, we test the clist again, and expect the entry to be removed.

To match the behavior of the code, this line should change to use true instead of expectRetirement.

If we can remove the clist entry before giving control to the vat, then this test should change to false instead of expectRetirement.

data: capargs([slot0arg], [pbA]),
});
t.deepEqual(log, []);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this is incomplete.. let's talk this afternoon about what needs to be exercised here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question, did we write a more complete test for this? I remember we walked through some code, just want to make sure that effort didn't get lost.

@@ -608,3 +591,240 @@ for (const caseNum of [4, 5, 6, 7]) {
});
}
}

test(`kernel vpid handling crossing resolutions`, async t => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can simplify this to use just a single vat, and build promise data up in the kernel directly (rather than using a separate vat to construct it). Let's walk through it today.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did we write up a simpler test to exercise this?

@FUDCo FUDCo force-pushed the promise-gc-phase2-step3 branch 2 times, most recently from a031034 to f3d0c76 Compare January 12, 2021 00:01
@FUDCo FUDCo marked this pull request as ready for review January 12, 2021 23:13
@FUDCo FUDCo requested a review from warner January 12, 2021 23:15
Copy link
Member

@warner warner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, a few minor comment suggestions. I would like to make sure we aren't losing the new/simpler tests we walked through, though.

@@ -595,7 +578,7 @@ async function doTest4567(t, which, mode) {
t.is(clistKernelToVat(kernel, vatA, p1kernel), undefined);
t.is(clistVatToKernel(kernel, vatA, p1VatA), undefined);
} else {
t.is(inCList(kernel, vatA, p1kernel, p1VatA), true);
t.is(inCList(kernel, vatA, p1kernel, p1VatA), false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we discard it now? or do you have a followup PR to do that?

@@ -608,3 +591,240 @@ for (const caseNum of [4, 5, 6, 7]) {
});
}
}

test(`kernel vpid handling crossing resolutions`, async t => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did we write up a simpler test to exercise this?

data: capargs([slot0arg], [pbA]),
});
t.deepEqual(log, []);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question, did we write a more complete test for this? I remember we walked through some code, just want to make sure that effort didn't get lost.

@FUDCo FUDCo merged commit dd5f7f7 into master Jan 22, 2021
@FUDCo FUDCo deleted the promise-gc-phase2-step3 branch January 22, 2021 01:55
@FUDCo
Copy link
Contributor Author

FUDCo commented Jan 22, 2021

Closes #2065

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request SwingSet package: SwingSet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants