use method schemas to limit inbound argument shapes #5955

Closed

warner opened this issue Aug 13, 2022 · 6 comments
Labels: enhancement (New feature or request), SwingSet (package: SwingSet)
warner commented Aug 13, 2022

What is the Problem Being Solved?

#5954 seeks to reduce memory-consumption attacks by unilaterally rejecting old unresolved inbound argument promises. It assumes that most inbound argument promises are short-lived, so any that remain unresolved long enough are presumed to be part of an attack. If userspace never saw the doomed promise, it will not notice the (surprising, not kernel-sanctioned) rejection.

If an attacker sends:

function makeEmptyPromise() {
  return new Promise(resolve => undefined);
}
const emptyPromises = [];
for (let i = 0; i < 1000; i++) {
  emptyPromises.push(makeEmptyPromise());
}
E(victim).normalAPIMethod(normalArgument, ...emptyPromises);

(or, to avoid a per-message size limit constraint):

for (let i = 0; i < 1000; i++) {
  E(victim).normalAPIMethod(normalArgument, makeEmptyPromise());
}

and the victim implements:

const root = Far('root', {
  normalAPIMethod(normalArgument) {
    // ... handles only normalArgument, ignoring the extra promises
  },
});

then those extra promises will be ignored, and we use the #5954 technique to eventually reject them and shed the memory obligations. This wouldn't kick in until some number (perhaps 10,000) of promises had accumulated that way, to reduce (but not eliminate) the risk of rejecting an innocent promise that just took too long to resolve.

But, it would be great if we could learn that they were unused by userspace earlier, so we could remove them from importedVPIDs right away, during the dispatch.deliver that unserialized them a moment before.

@erights is working on a scheme to allow Far objects to declare a schema. This would name all of their exposed methods, and for each one it would define a pattern limiting the arguments it accepts. On one level, this acts as a standardized declaration of argument preconditions: the kind of thing we usually implement with a batch of asserts at the top of the function. The primary goal is safety and brevity: methods could declare their schema instead of writing a set of asserts (which might be incomplete). It might be possible to extract TypeScript data about the method and semi-automatically construct the schema record.

But if this structure is visible to liveslots (or other holders of the Far object), then it could also be used to test recently-unserialized arguments before invoking the target. If liveslots knew that the schema was not met, it would also know that the userspace function was never invoked. Therefore any new promises in those arguments could not possibly have had their .then invoked, removing liveslots' obligation to track and resolve them. The liveslots dispatch.deliver code would look something like:

  • unserialize arguments, identify new promises (instrument makePipelinedPromise somehow)
  • look up target object, target method, method schema
  • check arguments against schema
  • if (target method missing) or (schema check fails):
    • unsubscribe or abandon new promises
  • else:
    • subscribe to new promises
    • deliver message to target
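
The steps above could be sketched as follows. All names here (`unserialize`, the `syscall.subscribe`/`syscall.unsubscribe` pair, and the `schema`/`fn` fields on the target) are illustrative assumptions, not the real liveslots API:

```javascript
// Hypothetical sketch of the schema-gated dispatch.deliver flow.
function deliver(target, methodName, capdata, unserialize, syscall) {
  // unserialize arguments, identifying any newly-imported promises
  const { args, newVpids } = unserialize(capdata);

  // look up the target method and its declared schema
  const method = target[methodName];
  const schemaOk = method !== undefined && method.schema(args);

  if (!schemaOk) {
    // method missing or schema violated: userspace will never run, so
    // .then can never be invoked; shed the promise obligations now
    newVpids.forEach(vpid => syscall.unsubscribe(vpid));
    return false;
  }
  // schema met: track the new promises, then invoke userspace
  newVpids.forEach(vpid => syscall.subscribe(vpid));
  method.fn(...args);
  return true;
}
```

The key property is that the unsubscribe branch runs before any userspace code sees the arguments, so liveslots can prove the promises are unreferenced.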

This could either complement the #5954 work, or serve as an alternative. If attackers cannot submit extraneous arguments or properties to userspace, they can't trick it into retaining extra data (depending upon how tight the schema limits are), and if liveslots can learn that it doesn't need to track the promise, they can't force liveslots to retain it either. If both were implemented, we might hope that innocent promises would never get rejected.

One concern is how practical it is for a vat to fully-schema-limit every externally-visible object. These are all defined by Far, but since schemas aren't even implemented yet, it may be a while until all vats (and contracts, and supporting libraries) have added schemas for all their Far objects. Until that point, any such unadorned object would be a vector for spurious arguments, and thus excess memory consumption.

Foolscap "Constraints"

Incidentally, this overlaps precisely with my own Foolscap serialization work, some 15-ish years ago. I designed the constraints system to work closely with the deserializer, so the inbound wire parser (tokenizing length-prefixed atoms) could reject a token that was too large to meet the constraints of the next node of the expected object graph. This allowed the deserializer to compute the largest memory burden an attacker could impose before either 1: the token was accepted as meeting the schema and added to the growing argument object graph, or 2: the connection was dropped.

In practice, I found it difficult and annoying to impose meaningful constraints on enough exposed objects to keep this largest-burden limit below infinity. And tight constraints were more likely to accidentally fire on moderate-sized lists than to catch actual attackers. I was advised to replace the one-token-at-a-time defense with a per-user memory-limited proxy process: basically pre-deserializing the arguments for any particular method in process A (which operated under a ulimit memory constraint), then, only if that didn't kill the process, the original serialized data would be sent to the real process B (where now we know an upper bound on the memory required to deserialize).

I never got around to implementing that scheme (nor ripping out the earlier one entirely), partially because it required coordination beyond a mere communications library (spawning processes, setting OS-specific ulimits), and partially because it exposed a difficult question of who the guilty principal should be. Which messages should share a process A, such that one oversized message would prevent the others from being delivered?

In Foolscap the easy answer was the connection: Tub X sends messages to Tub Y, and Y would react to an oversized message by severing the connection from X. And that's how the Constraint/per-token scheme did it. But that gets us into the same sort of provoked-attack-hurts-innocent-victim issue that #5953 is concerned about. If X is relaying (or incorporating data from) messages from W, and W is the one who decides to attack, then X may lose the connection even though W is to blame. We could say that X should look at its messages more carefully, but then we're also obligated to provide tools to help it do so.

In Foolscap, this was improved somewhat because each link between Tubs applied the constraints on both ends: during serialization on the sender, as well as during deserialization on the receiver. A well-behaving client would get a local error if it violated the constraints, which would reject the message but not drop the connection. (The object graph was serialized in a streaming fashion, so the good tokens were already sent over the wire, but the protocol included a cancellation/pop-the-stack atom, so the recipient could be told to discard the partially-received-but-not-yet-schema-violating object graph before the memory limits were exceeded: this salvaged the connection for later messages). Of course, the real protection is applied when the message is checked again, by the remote side during receipt, and the consequences for violation at that point are more severe/traumatic. In that scheme an unwitting relay could still be saved against connection drops by their own outgoing checks.

Description of the Design

  • underlying mechanism
  • update contracts to use it
  • liveslots (later)

I don't know what @erights's schema is going to look like, but what I heard sounded like it would overlap with the Patterns we currently use on Stores (either to limit what data can be put in them, or to write queries for .keys() and .values() that only return matching items). And I think they would be added as part of an augmented Far() function, either as an extra argument, or the iface argument might grow from just a string to some larger string+schema object. And there would be some API for taking a Remotable and submitting arguments for consideration.
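
To make the idea concrete, here is a toy illustration (not the real Patterns API) of how a Store-style pattern could vet a method's argument list. The `M` helpers below are hypothetical stand-ins implemented as plain predicate functions:

```javascript
// Hypothetical pattern helpers, modeled loosely on Store-style Patterns.
const M = {
  string: () => x => typeof x === 'string',
  // call(p1, p2, ...) matches an argument list of exactly that arity/shape
  call: (...argPatterns) => args =>
    Array.isArray(args) &&
    args.length === argPatterns.length &&
    argPatterns.every((p, i) => p(args[i])),
};

// A schema naming each exposed method and its acceptable arguments:
const schema = {
  normalAPIMethod: M.call(M.string()),
};

// A holder of the Remotable could then test arguments before delivery:
schema.normalAPIMethod(['hello']);                        // matches
schema.normalAPIMethod(['hello', new Promise(() => {})]); // extra promise rejected
```

The exact-arity check is what rejects the attacker's extraneous promise arguments from the examples above.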

Liveslots would need to know how much to trust this schema-checking facility. Is it under userspace control? Could it call back into liveslots (reentrantly)? Will it hold onto the arguments we submit for conformance? Is it guaranteed to be synchronous?

This will influence how (and when) liveslots will call it. Obviously we must wait until after we've identified the target (which might be a promise) and deserialized the arguments. If we aren't worried about reentrance or async, then we'd probably wait until after the schema check to subscribe to the surviving imported promises. If we can't do that, then we'll need some sort of syscall.unsubscribe to unwind the kernel-facing obligations that a rejected message should not have incurred.

Security Considerations

On the whole this should improve security, both by making it easier (and more legible/auditable) to define and enforce a method's expectations, and to give code earlier in the delivery process (e.g. liveslots) an opportunity to drop or reject things that won't end up being used. In the long run, we might expose these schemas to the kernel (they're a great candidate for #2069 auxdata, assuming we could avoid tracking a separate copy for each instance), from which we could push them upstream into other vats (as part of their Presence), which could do a schema check before even sending the message. That would result in faster (local) errors, better-correlated with the sender (still not in the same turn, but at least on the same vat).

This is unlikely to help with return values, or the resolution values of inbound Promises. Or, if the method schema includes things like "This argument must be a Promise that resolves to an Amount record", enforcement of the schema will cause a delay and/or ordering difference: we can't deliver that message until after the input promise is resolved. So, do we allow messages to be delivered out-of-order, to reduce the buffering we do? Or do we delay all subsequent messages, so they'll all be delivered in order? How many messages are we willing to buffer? Does that open up a memory attack against the schema-checking code itself?

Performance Considerations

I am worried about the performance cost of doing this schema enforcement. I think we have a history of adding nominally-invisible mechanisms like this, assuming they'll be cheap enough (based on our experience with "big" engines like V8/Node.js), and then not measuring the actual performance impact on a "small" engine like XS (which, in the end, is the one we actually care about). So I'd want us to build a benchmark and look at the numbers before enabling this in production.

The Constraint checking I added to Foolscap had a pretty significant performance impact (admittedly, some of that was overengineering in the serialization code).

Test Plan

Lots of unit tests, first for the underlying mechanism, and then extensive tests of the liveslots integration.


warner commented Aug 17, 2022

idea: liveslots gives the schema checker the serialized arguments and a function named subscribe, the checker either throws or invokes subscribe() with all of the Promises that were in the args. Liveslots defers its syscall.subscribe() until that function is called.
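
A sketch of that protocol, with hypothetical names: the checker receives the arguments plus a `subscribe` function, and liveslots only issues `syscall.subscribe()` for the argument promises if the checker calls it.

```javascript
// Sketch: liveslots defers promise subscription until the schema
// checker opts in. `promiseVpids` are the promises found in the args.
function runSchemaCheck(checker, args, promiseVpids, syscall) {
  let subscribed = false;
  const subscribe = () => {
    subscribed = true;
    for (const vpid of promiseVpids) {
      syscall.subscribe(vpid);
    }
  };
  checker(args, subscribe); // throws if the schema is violated
  return subscribed;
}
```

A checker that throws, or simply never calls `subscribe`, leaves liveslots with no tracking obligation for those promises.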

One downside is a reduction in the generality of our platform: promises-in-arguments become the exception, rather than the rule, and userspace must do something explicit to accept them. Objects without a schema would receive lame Promises, which would never fire, and that would be hard to discover.

OTOH, if the schema is expressed as part of the Far() declaration, we could say that the default schema just accepts everything and subscribes to all promises it gets, maintaining the status quo (and local method-bearing objects cannot be serialized at all unless they're explicitly marked as Far). We'd be increasing the coupling of liveslots and userspace a bit: liveslots currently accepts anything that marshal will accept, and inbound messages are delivered with HandledPromise, but this would change liveslots to look more closely at the target, so it could extract a schema to interrogate (or pass the subscribe power to).

Ideally all of this would be driven by userspace calling .then on the promises, but as we've discussed, await (which bypasses .then in favor of the internal implementation slot) blocks one approach. Giving a non-Promise thenable to userspace had other problems (Promise.resolve(x) === x reports false), but maybe not insurmountable.

@dckc dckc added the pso label Aug 22, 2022

dckc commented Aug 22, 2022

Absent mailbox access, the scope goes down:
users send strings to the smart wallet, and then it's our code.


turadg commented Aug 31, 2022

> So I'd want us to build a benchmark and look at the numbers before enabling this in production.

From Wallet meeting today:

  • we need a perf measure comparing objects with vs without input validation
  • @Tartuffo will create an issue to track this need, blocking this issue

@Tartuffo
Contributor

Done: #6098

@Tartuffo Tartuffo added this to the Mainnet 1 RC0 milestone Aug 31, 2022

Tartuffo commented Sep 5, 2022

@erights Is your side of this done?


erights commented Sep 5, 2022

In review at #6057

@Tartuffo Tartuffo closed this as completed Sep 7, 2022