feat: value quota based idl decoder limiting #4657

crusso · 2024-08-09T12:05:14Z

Simplifies #4624 to a simple linear limit on the number of decoded values as a function of decoded payload size,
instead of using two linear functions on perfcounter (simulated or real) and allocation counter.

The function is:

value_quota(blob) : Nat64 = blob.size() * (numerator/denominator) + bias

where blob is the candid blob to be decoded, and numerator (default 1), denominator (default 1) and bias (default 1024) are Nat32s.
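As a rough illustration of the formula above, here is a minimal Rust sketch. The `CandidLimits` struct and the `value_quota` function are hypothetical names, not the runtime's actual code. The sketch also assumes integer arithmetic that multiplies before dividing, saturating operations, and rejection of a zero denominator; the PR description does not specify any of these details.

```rust
/// Hypothetical mirror of the three tunable constants (all Nat32 in the PR).
#[derive(Clone, Copy, Debug)]
pub struct CandidLimits {
    pub numerator: u32,
    pub denominator: u32,
    pub bias: u32,
}

impl Default for CandidLimits {
    fn default() -> Self {
        // Defaults quoted in the PR description.
        CandidLimits { numerator: 1, denominator: 1, bias: 1024 }
    }
}

/// value_quota(blob) = blob.size() * (numerator / denominator) + bias
///
/// Assumptions of this sketch (not confirmed by the PR): the product is
/// computed before the division, arithmetic saturates instead of wrapping,
/// and a zero denominator yields `None`.
pub fn value_quota(blob_size: u64, limits: CandidLimits) -> Option<u64> {
    if limits.denominator == 0 {
        return None;
    }
    let scaled =
        blob_size.saturating_mul(limits.numerator as u64) / limits.denominator as u64;
    Some(scaled.saturating_add(limits.bias as u64))
}
```

With the default constants, an empty payload still gets a budget of 1024 values, and every additional payload byte adds one more.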

Much simpler than #4624, and it doesn't depend on the vagaries of instruction metering and byte allocation, which vary with GC and compiler options. But is it good enough?

The constants can be (globally) modified/inspected using prims (Prim.getCandidLimits/Prim.setCandidLimits) which will need to get exposed in base eventually.

The quota is decremented on every call to deserialise or skip a value in vanilla candid mode (destabilization is not metered).
The quota is eagerly checked before deserializing or skipping arrays.
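The two rules above (one unit charged per value, plus an eager check before arrays) might look roughly like the following Rust sketch. `ValueQuota`, `QuotaExceeded` and `skip_array` are hypothetical names introduced for illustration. The sketch returns an error where the runtime would presumably abort decoding.

```rust
/// Hypothetical stand-in for the decoder's remaining value budget.
pub struct ValueQuota {
    remaining: u64,
}

/// Error used by this sketch when the budget runs out.
#[derive(Debug, PartialEq)]
pub struct QuotaExceeded;

impl ValueQuota {
    pub fn new(quota: u64) -> Self {
        ValueQuota { remaining: quota }
    }

    /// Charge one unit for each deserialized or skipped value.
    pub fn charge_value(&mut self) -> Result<(), QuotaExceeded> {
        self.remaining = self.remaining.checked_sub(1).ok_or(QuotaExceeded)?;
        Ok(())
    }

    /// Eager check before an array: fail up front, without consuming
    /// anything, if `len` elements cannot fit in the remaining budget.
    pub fn check_array(&self, len: u64) -> Result<(), QuotaExceeded> {
        if len > self.remaining {
            Err(QuotaExceeded)
        } else {
            Ok(())
        }
    }

    pub fn remaining(&self) -> u64 {
        self.remaining
    }
}

/// Skip an array of `len` scalar elements, charging one unit per element.
pub fn skip_array(quota: &mut ValueQuota, len: u64) -> Result<(), QuotaExceeded> {
    quota.check_array(len)?;
    for _ in 0..len {
        quota.charge_value()?;
    }
    Ok(())
}
```

The eager check lets a hostile array length fail immediately instead of after the decoder has already walked part of the array.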

One possible refinement would be to combine the value quota with a memory quota (though the latter would still vary with GC flavour and perhaps word size, unless we count logical words).

Disable for destabilization (iff Registers.get_rel_buf_opt is zero)
Port new candid spacebomb test suite to drun-tests, to test against real perf counter provided by drun.
Bump candid dependency to most recent
Pass new spacebomb tests, both in candid test suite on wasmtime using value counter.


rts/motoko-rts/src/idl.rs

src/codegen/compile.ml

src/mo_values/prim.ml

luc-blaeser · 2024-08-13T09:15:26Z

One thing that I am unsure about, or have probably missed (sorry for the potentially stupid question): Does the metering limit also need to be checked on ordinary decoding of Candid when it is not skipped and not recursively called?

luc-blaeser

Very nice PR!
I have only some minor comments and one question (probably not relevant).

crusso · 2024-08-13T16:04:07Z

One thing that I am unsure about, or have probably missed (sorry for the potentially stupid question): Does the metering limit also need to be checked on ordinary decoding of Candid when it is not skipped and not recursively called?

Actually, I'm not sure I understand the question (which worries me). I think all calls (initial, skip and recursive) should be doing the check, but maybe you've seen something I haven't.

crusso · 2024-08-13T16:19:30Z

@luc-blaeser PTAL (and thanks for the review!)

chenyan-dfinity

Can we document the cost model somewhere, so that people can roughly know how to tune the parameters?

chenyan-dfinity · 2024-08-14T00:53:20Z

rts/motoko-rts/src/idl.rs

@@ -314,6 +319,7 @@ unsafe fn skip_any_vec(buf: *mut Buf, typtbl: *mut *mut u8, t: i32, count: u32)
        // makes no progress. No point in calling it over and over again.
        // (This is easier to detect this way than by analyzing the type table,
        // where we’d have to chase single-field-records.)
+        idl_limit_check((count - 1) as u64);


Can count be 0?

No, because we return on previous line 311 when count == 0
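The point of the exchange above is that `count - 1` on an unsigned `u32` is only safe when `count` is at least 1. A minimal Rust sketch of that guard follows; `extra_elements_to_check` is a hypothetical helper, not a function in `idl.rs`, and why the check uses `count - 1` rather than `count` is not explained in this thread.

```rust
/// Hypothetical helper: the amount passed to the limit check for a vector
/// of `count` elements, or `None` when there is nothing to skip.
pub fn extra_elements_to_check(count: u32) -> Option<u64> {
    if count == 0 {
        // Mirrors the early return mentioned in the reply: nothing to do.
        return None;
    }
    // Safe: count >= 1 here, so `count - 1` cannot underflow.
    Some((count - 1) as u64)
}
```

Without the early return, a zero count would underflow: a panic in debug builds and a wrap to `u32::MAX` in release builds.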

luc-blaeser · 2024-08-14T09:27:18Z

@luc-blaeser PTAL (and thanks for the review!)

Not that I have any concrete concern, I was just curious whether I got all aspects.
If I understood correctly, the metering covers skipping and recursive calls. Also, the recursive call is done for each decoded Candid value.
I was wondering about the following cases of whether metering could be relevant there or would already be implicitly handled (by the recursion metering).

If a value would have dynamic encoded length such as an array of listed elements, I guess the recursive function would be called for each sub-element. This would be fine.
Would there be a need for size-dependent metering of blobs? I guess there is no recursive call involved per blob byte.
IIRC, some constant Candid values can inflate to variable-sized objects (arrays, maybe also blob). This would be a non-linear effort (initialization) that is maybe not metered.
These are just thoughts of mine, probably not relevant.

crusso · 2024-08-14T10:11:13Z

@luc-blaeser PTAL (and thanks for the review!)

Not that I have any concrete concern, I was just curious whether I got all aspects. If I understood correctly, the metering covers skipping and recursive calls. Also, the recursive call is done for each decoded Candid value. I was wondering about the following cases of whether metering could be relevant there or would already be implicitly handled (by the recursion metering).

If a value would have dynamic encoded length such as an array of listed elements, I guess the recursive function would be called for each sub-element. This would be fine.

Yes, that's the idea.

Would there be a need for size-dependent metering of blobs? I guess there is no recursive call involved per blob byte.

Fortunately, the blob decoder first checks that the blob does not exceed the length of the candid payload, which is limited to 10MB by the IC (in the worst case). The Nat and Int decoders also check they don't decode past the candid payload, so I was hoping that would limit the allocation.
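The bounds check described above can be sketched as follows. This is an illustrative Rust reader, not the code in `idl.rs`: `DecodeError`, `read_leb128` and `read_blob` are hypothetical names, and the five-byte cap on the length prefix is an assumption of the sketch. It relies only on the Candid convention that a blob is a LEB128 length followed by that many bytes.

```rust
/// Errors used by this sketch only.
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    Truncated,
    Overflow,
    LengthExceedsPayload,
}

/// Read an unsigned LEB128 number, accepting at most five bytes
/// (an assumption made to keep the sketch simple).
pub fn read_leb128(input: &[u8], pos: &mut usize) -> Result<u64, DecodeError> {
    let mut result: u64 = 0;
    let mut shift: u32 = 0;
    loop {
        let byte = *input.get(*pos).ok_or(DecodeError::Truncated)?;
        *pos += 1;
        if shift > 28 {
            return Err(DecodeError::Overflow);
        }
        result |= ((byte & 0x7f) as u64) << shift;
        if byte & 0x80 == 0 {
            return Ok(result);
        }
        shift += 7;
    }
}

/// Read a length-prefixed blob, refusing any length that points past
/// the end of the payload, so nothing is allocated for a bogus length.
pub fn read_blob<'a>(input: &'a [u8], pos: &mut usize) -> Result<&'a [u8], DecodeError> {
    let len = read_leb128(input, pos)?;
    let remaining = (input.len() - *pos) as u64;
    if len > remaining {
        return Err(DecodeError::LengthExceedsPayload);
    }
    let start = *pos;
    *pos += len as usize;
    Ok(&input[start..*pos])
}
```

Because the declared length can never exceed the bytes actually present, the payload size itself bounds the work done per blob.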

IIRC, some constant Candid values can inflate to variable-sized objects (arrays, maybe also blob). This would be a non-linear effort (initialization) that is maybe not metered.

Hmm, for arrays, maybe one could decrement the quota by the number of elements before allocation to trigger the check, and then bulk-increment the quota before deserializing the elements (to compensate for the earlier bulk check). Not sure.
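One way to read that suggestion is the following Rust sketch. It is only an interpretation of the idea floated above, not something implemented in this PR; `decode_array_with_precharge` is a hypothetical name, and pushing the index stands in for decoding a real element.

```rust
/// Sketch of "bulk decrement, then refund" for arrays.
/// `remaining` is the value quota still available.
pub fn decode_array_with_precharge(
    remaining: &mut u64,
    len: u64,
) -> Result<Vec<u64>, &'static str> {
    // 1. Bulk-decrement before allocating, so an oversized length fails
    //    here and never reaches the allocation below.
    *remaining = remaining.checked_sub(len).ok_or("value quota exceeded")?;

    // 2. The allocation is now known to be within budget.
    let mut out = Vec::with_capacity(len as usize);

    // 3. Refund the bulk charge so the per-element charges below do not
    //    count each element twice.
    *remaining += len;

    for i in 0..len {
        // The usual one-unit charge per decoded value.
        *remaining = remaining.checked_sub(1).ok_or("value quota exceeded")?;
        out.push(i); // placeholder for decoding one element
    }
    Ok(out)
}
```

The net cost of an array stays at one unit per element; the only change is that the failure moves ahead of the allocation.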

These are just thoughts of mine, probably not relevant.

github-actions · 2024-08-14T15:50:39Z

Comparing from 367145d to 8e07a63:
In terms of gas, 5 tests regressed and the mean change is +0.0%.
In terms of size, 5 tests regressed and the mean change is +0.2%.

crusso · 2024-08-16T16:07:24Z

Can we document the cost model somewhere, so that people can roughly know how to tune the parameters?

Done in Changelog for now. Will add to base eventually...

crusso added 30 commits July 22, 2024 18:54

perf counter based idl decoder limiting

4c48d22

meter skip; add test

b7a4df2

more tests (some broken?)

7c87222

more tests

8d0e223

improver earlier tests (reporting cycles)

868d7e6

finish port of tests

1831132

assert low cost

d2cb2d8

cleanup tests

4d31ba4

radical refactoring

f63dfa8

update test output, refactr

cbb3d0b

don't meter during destabilization

7c8955c

up limits

f132d3b

fix test values acc to dfinity/candid#564

e6a75ed

cleanup test; add blurb

73a8667

add simulated (idl) perf counter on non-IC targets|

443d366

set up registers just once, including for skipped arguments (tricky bug); modify skip_any_vec to do limit checks

adaea7c

update test output

f9a8fdb

Rust formatting

809b5e2

allowing option fields of type Null too, to pass extended candid-tests

97bea78

update test output

c15b6e2

Update rts/motoko-rts/src/idl.rs

7053400

test we can decode reasonable binary sizes

b79337c

set limit as multiple of payload size

9dc5a3c

add test output

7cb0afd

up the limit to 1K

c7d51c5

add bias term

bcc059c

update bench numbers

ba4c267

update idl-sub test to allow optional fields of type Null

8ebe1f9

remove noisy output

4279856

test get/setCandidLimits

06dfcfe