-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement limiting WebAssembly execution with fuel #2611
Conversation
I'll take a look -- thanks for putting this together so quickly! A few quick notes on the isel thoughts:
The former is an RMW compound op and could be done pretty reasonably with the current load-op merging framework; we haven't gotten around to it yet, but it's clear how to do so if we decide it's important. The latter is actually a bit more interesting: I recently fixed a bug (#2576) that arose because of certain combinations of load-and-compare that would sink a load beyond its first use by simply disallowing compare-from-memory. That too can be re-optimized but would take more machinery. Anyway, all that to say that there are reasons you couldn't get either of those instructions to come out the backend :-) |
Oh also to clarify, I don't think it's just isel that can be improved, but rather I think the frontend codegen could also be improved if we decide the time should be invested. For example an empty function 0000000000000000 _wasm_function_4:
;; prologue stack-check to ensure we're not out of stack space
;; note that this is only here because function calls are made internally
0: 55 pushq %rbp
1: 48 89 e5 movq %rsp, %rbp
4: 4c 8b 17 movq (%rdi), %r10
7: 4d 8b 12 movq (%r10), %r10
a: 49 39 e2 cmpq %rsp, %r10
d: 0f 86 02 00 00 00 jbe 2 <_wasm_function_4+0x15>
13: 0f 0b ud2
;; allocating stack space, saving registers
15: 48 83 ec 10 subq $16, %rsp
19: 4c 89 24 24 movq %r12, (%rsp)
;; load vminterrupts_ptr into r12
1d: 4c 8b 27 movq (%rdi), %r12
;; increment the fuel consumed by 1 where 8(%r12) is where the previous count is
20: be 01 00 00 00 movl $1, %esi
25: 49 03 74 24 08 addq 8(%r12), %rsi
;; check if fuel is gerater than zero
2a: 48 31 c0 xorq %rax, %rax
2d: 48 39 c6 cmpq %rax, %rsi
30: 0f 8c 13 00 00 00 jl 19 <_wasm_function_4+0x49>
;; save the fuel count, load the intrinsic address, call it
36: 49 89 74 24 08 movq %rsi, 8(%r12)
3b: 48 8b b7 f0 06 00 00 movq 1776(%rdi), %rsi
42: ff d6 callq *%rsi
;; restore the fuel_var for this function (unnecessary instruction, happens uncondititonally after call)
44: 49 8b 74 24 08 movq 8(%r12), %rsi
;; save fuel_var at the end of the function (unnecessary instruction, happens unconditionally at function end)
49: 49 89 74 24 08 movq %rsi, 8(%r12)
;; function epilogue
4e: 4c 8b 24 24 movq (%rsp), %r12
52: 48 83 c4 10 addq $16, %rsp
56: 48 89 ec movq %rbp, %rsp
59: 5d popq %rbp
5a: c3 retq Larger functions like https://github.com/bytecodealliance/sightglass/blob/faf101e50f7af6243fe3e910cf9455eecd9e1368/benchmarks-next/shootout-ackermann/benchmark.wat#L25-L49 end up having a lot more code generated with instrumentation enabled. I haven't yet gone through instruction-by-instruction yet, though, to see what could be improved. |
Subscribe to Label Actioncc @fitzgen, @peterhuene
This issue or pull request has been labeled: "cranelift", "cranelift:wasm", "fuzzing", "wasmtime:api"
Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very nice overall!
My main question below has to do with the way in which we actually handle out-of-fuel conditions, and whether this implementation as-is allows us to resume to do timeslicing. It's possible that more will come later, in which case, I'm perfectly happy to see this landed now; just want to understand how things will (eventually) work.
crates/wasmtime/src/store.rs
Outdated
pub fn set_fuel_remaining(&self, remaining: u64) { | ||
assert!(self.engine().config().tunables.consume_fuel); | ||
let remaining = i64::try_from(remaining).unwrap(); | ||
self.inner.fuel_adj.set(remaining); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason we can't add to the existing fuel_adj
value here, in order to continue accumulating the consumed-fuel count and return the true total from fuel_consumed()
?
(In that case I might also call this add_fuel()
, and adjust existing rather than overwrite the fuel-consumed counter...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a tiny bit worried that a long-lived store might overflow the i64
counter but that may not be too realistic. Do you think that'd be rare enough that we should just switch this to an add instead of a set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO it's nicer to not have the "fuel-used value that we return is only since the last set" property -- it has the potential to become a subtle stats bug later.
Doing some quick math, a 2^63 max count, at 1B Wasm ops per second, gives us 2^33 or 8B seconds of runtime before overflow, which is ~250 years. Sometime before the year 2270 we can come back and upgrade to an i128
:-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh good point!
This commit adds codegen infrastructure necessary to instrument wasm code to consume fuel as it executes. Currently nothing is really done with the fuel, but that'll come in later commits. The focus of this commit is to implement the codegen infrastructure necessary to consume fuel and account for fuel consumed correctly.
This commit enables wasm code to periodically check to see if fuel has run out. When fuel runs out an intrinsic is called which can do what it needs to do in the result of fuel running out. For now a trap is thrown to have at least some semantics in synchronous stores, but another planned use for this feature is for asynchronous stores to periodically yield back to the host based on fuel running out. Checks for remaining fuel happen in the same locations as interrupt checks, which is to say the start of the function as well as loop headers.
The location of the shared interrupt value and fuel value is through a double-indirection on the vmctx (load through the vmctx and then load through that pointer). The second pointer in this chain, however, never changes, so we can alter codegen to account for this and remove some extraneous load instructions and hopefully reduce some register pressure even maybe.
Use fuel to time out modules in addition to time, using fuzz input to figure out which.
Ok this should be rebased and updated to have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This PR lifts a feature from Lucet to wasmtime where generated code can count instructions or account for "fuel" during execution. The purpose of this feature is similar to the interrupt support via
InterruptHandle
, but it mainly allows deterministically interrupting a wasm module instead of relying on a timer. Additionally a future goal of this PR is to extend theasync
support of wasmtime to leverage fuel to periodically "interrupt" executing wasm code to yield back to the host. This would enable wasmtime futures to never take "too long" inFuture::poll
since if they would otherwise take awhile they'd yield back to the host and allow preemption and/or other things like future timeouts.Thee implementation here is nearly copied verbatim from Lucet itself, with tweaks as appropriate for the different vmctx representation in Wasmtime. The main difference is that Wasmtime's fuel counter is two levels of indirection away from the vmctx rather than one in Lucet. To help with this a new
Variable
stores theVMInterrupts
pointer value to avoid reloading the same value each time from the vmctx.Support for this feature is exposed through a few new APIs:
Config::consume_fuel
- enables codegen options for wasm to consume fuel, and behaves similar tointerruptable
.Store::set_fuel_remaining
- this is how fuel is injected into aStore
for execution of wasm. Note that stores always start with 0 fuel so this is required to be called.Store::fuel_consumed
- this can be used to check how much fuel has been consumed so far.The current behavior, which cannot be changed, is that when fuel runs out a wasm trap is generated. I hope to make this configurable in the future so that for async stores when fuel runs out it's automatically re-injected with fuel but only after a yield back to the host happens.
I've done a bit of benchmarking with this using criterion and the benchmarks here -- https://github.com/bytecodealliance/sightglass/tree/main/benchmarks-next. The benchmarks are relatively limited at this time but were able to produce some useful data in the meantime. This shows to be a 35-45% slowdown on my personal laptop for the runtime execution of the benchmarked porttion of the code for blake3-scalar and shootout-ackermann. At least for ackermann this is somewhat expected because the loops/function calls are all tiny, so the overhead is quite noticeable. For blake3-scalar I assume it's similar but haven't dug in yet. Note that these numbers were with the new backend since the old x86 backend seems significantly worse than the x64 one.
I do think there might be some relatively low-hanging fruit with respect to performance, but further tweaks would require changes to cranelift itself to optimize instruction selection. For example one optimization might be to not have a
fuel_var
and instead periodically doaddq $fuel_consumed, offset(%vminterrupts_ptr)
which avoids consuming extra registers. Similarlycmpq $0, offset(%vminterrupts_ptr)
could be generated as well. I couldn't get the x64 backend to emit those forms of instructions at this time though. I'm also not 100% certain that it'll be faster.Note that for now this doesn't depend on the
async
PR, but I plan on having a future PR after these two land which implements the periodically-yield option.