Initial Fuzzing Infrastructure #611
Comments
In Lucet, I wrote a simple fuzzing script that uses Csmith-generated C programs: https://github.com/bytecodealliance/lucet/blob/master/lucet-wasi-fuzz/src/main.rs The approach is to run each program via Lucet on WASI:
Then compare the
It's pretty bare-bones, other than the ability to run a `creduce` loop when a failure is found, but it should be possible to hook it up to libFuzzer and Wasmtime.
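The reduction loop mentioned above works by repeatedly deleting pieces of a failing input and keeping each deletion that still reproduces the failure. `creduce` itself is a standalone tool for C sources. The following is only a minimal, dependency-free Rust sketch of the same idea: greedy chunk removal, a simplified form of delta debugging. It is not the Lucet harness's code. The names `reduce` and `still_fails` are hypothetical, and the sketch assumes the initial input already fails.

```rust
/// Hypothetical reducer: shrink `input` while `still_fails` keeps reporting
/// the failure. Tries deleting chunks of decreasing size (simplified delta
/// debugging). Assumes `still_fails(input)` is true on entry. The result is
/// 1-minimal: removing any single remaining element makes the failure go away.
pub fn reduce<T: Clone>(input: &[T], still_fails: impl Fn(&[T]) -> bool) -> Vec<T> {
    let mut current = input.to_vec();
    let mut chunk = (current.len() / 2).max(1);
    loop {
        let mut removed_any = false;
        let mut start = 0;
        while start < current.len() {
            let end = (start + chunk).min(current.len());
            // Candidate = current input with current[start..end] deleted.
            let mut candidate = current[..start].to_vec();
            candidate.extend_from_slice(&current[end..]);
            if still_fails(&candidate[..]) {
                // Keep the smaller failing input and retry at the same offset.
                current = candidate;
                removed_any = true;
            } else {
                start += chunk;
            }
        }
        if !removed_any {
            if chunk == 1 {
                break; // No single-element deletion preserves the failure: done.
            }
            chunk = (chunk / 2).max(1);
        }
    }
    current
}
```

For example, reducing the integers `0..10` with the predicate "contains both 3 and 7" leaves exactly `[3, 7]`.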
@fitzgen I haven't done a lot of fuzzing in the past, but I'll be more than happy to learn on the job and help out any way I can with testing our WASI implementation. @acfoltzer let me know if you need any help reusing your Lucet fuzzing harness in Wasmtime!
For interface types specifically, I suspect that the generator won't be too different from whatever Wasm generator we might have, unless we heavily base it on
For an oracle, I think our best bet will be to have someone entirely disconnected from the Wasmtime interface types work write an interpreter, and then compare the two implementations against each other. I suspect we'd discover bugs in both, but I don't think we have much of an oracle otherwise right now.
Thanks, @kubkon! I'm actually not going to have time to work on this for at least a few weeks, so if you're feeling eager, don't hesitate to jump in, and ping me if you need any support.
On the topic of oracles, the
Another option is to use
Good point that there are a few different tools at our disposal to observe syscalls. There are probably some eBPF APIs too, and I would lean towards whatever is both
Unless I'm mistaken,
This crate is intended to hold all of our various test case generators and oracles. The fuzz targets we have at `wasmtime/fuzz/fuzz_targets/*` will eventually be ~one-liner glue code calling into this crate. Part of bytecodealliance#611
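Here is a minimal, dependency-free sketch of the decoupling this crate is meant to provide: any generator can be paired with any oracle whose input type matches. The trait names `Generator` and `Oracle`, the toy `BytesAsI32s` generator, the `Differential` oracle, and the `run` helper are all hypothetical illustrations. They are not the actual contents of the crate.

```rust
/// Turns raw fuzzer input bytes into a structured test case.
pub trait Generator {
    type TestCase;
    fn generate(&self, data: &[u8]) -> Self::TestCase;
}

/// Decides whether executing a test case uncovered a bug.
pub trait Oracle<T> {
    fn check(&self, test_case: &T) -> Result<(), String>;
}

/// Toy generator: reinterprets each input byte as a signed operand.
pub struct BytesAsI32s;

impl Generator for BytesAsI32s {
    type TestCase = Vec<i32>;
    fn generate(&self, data: &[u8]) -> Vec<i32> {
        data.iter().map(|&b| b as i8 as i32).collect()
    }
}

/// Toy differential oracle: a candidate implementation must agree with a
/// reference implementation on every generated input.
pub struct Differential {
    pub reference: fn(&[i32]) -> i64,
    pub candidate: fn(&[i32]) -> i64,
}

impl Oracle<Vec<i32>> for Differential {
    fn check(&self, ops: &Vec<i32>) -> Result<(), String> {
        let expected = (self.reference)(ops);
        let actual = (self.candidate)(ops);
        if expected == actual {
            Ok(())
        } else {
            Err(format!("reference returned {expected}, candidate returned {actual}"))
        }
    }
}

/// The glue a fuzz target needs: generate a test case, then check it.
pub fn run<G, O>(generator: &G, oracle: &O, data: &[u8]) -> Result<(), String>
where
    G: Generator,
    O: Oracle<G::TestCase>,
{
    oracle.check(&generator.generate(data))
}
```

With `libfuzzer-sys`, the body of a `fuzz_target!` could then be a single line such as `run(&BytesAsI32s, &oracle, data).unwrap()`. Swapping in a different generator or oracle would not touch the other half.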
This is a good idea. We did something similar for a cranelift fuzz target. One downside to this approach is that the fuzz target cannot be seeded with a distilled corpus of valid-ish Wasm modules (since the input is a bitstring). Likewise, corpora that are accumulated as fuzzers run will not be readily recyclable between generators that consume bitstrings (IIUC). These aren't reasons to avoid this approach, just something to consider for future work. Overall this looks great. I like the equivalence checking idea.
FYI, some discussion about this over here: rust-fuzz/cargo-fuzz#194
Hi guys, I'm planning to do some fuzzing on lightbeam in the next few weeks ;) Just to give you a bit of context about me: I'm the guy behind webassembly-security.com, and I teach WebAssembly security and Rust security. I'm focused on fuzzing and vulnerability research on both WebAssembly (modules and VMs) and Rust code, so don't hesitate to ping me if needed ;) I agree with @jfoote, regarding using binaryen
@fitzgen Regarding where to store the fuzzing corpus, I would suggest a dedicated repo or server not linked to this one, to prevent users from downloading all those files accidentally. Also, the corpus needs to be minimized before being pushed to this storage repo. In general, you should have one fuzz target per API and per backend, since the corpus will evolve differently depending on the code it triggers.
I've set up a repo for the libFuzzer corpora here: https://github.com/bytecodealliance/wasmtime-libfuzzer-corpus
Right ;) Regarding the libFuzzer corpus, have you evaluated the actual code coverage?
Right. I've never seen the same input to
I have not. So far, I haven't been focused on doing the fuzzing itself so much as setting up the infrastructure, implementing oracles, etc.
Hello all. I looked into using oss-fuzz for continuous fuzzing of libFuzzer/
Here are my notes on a few basic gaps that would need to be addressed to integrate with oss-fuzz as-is:
Last year there was some discussion in the oss-fuzz project of supporting Rust targets directly, where a maintainer (kcc) mentioned deviating from the norm and not supporting coverage builds, etc. If we want to pursue oss-fuzz for fuzzing Rust targets directly, we could engage with the team to see if we might be able to do something less than ideal to get started, or if they are planning to change the interface to support
This seemed like the right place to share and discuss this; if I am off-topic here just let me know (and please pardon me!).
Thanks for looking into this @jfoote!! There are a couple of projects already that use
I think we can work around this in the
`export CUSTOM_LIBFUZZER_PATH="$LIB_FUZZING_ENGINE"`
See https://github.com/rust-fuzz/libfuzzer-sys/blob/master/build.rs#L2 for details.
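For context on how an environment variable like this can steer linking: a Cargo build script tells `rustc` what to link by printing `cargo:` directives. The sketch below is a hypothetical, simplified version of that pattern. It is not libfuzzer-sys's actual `build.rs`; the link above points to the real one. It assumes `$LIB_FUZZING_ENGINE` expands to the path of a static archive named `lib<name>.a`. The function names `link_directives` and `emit_from_env` are illustrative only.

```rust
use std::path::Path;

/// Hypothetical helper: the Cargo directives needed to statically link the
/// archive at `lib_path` (e.g. `/usr/lib/libFuzzingEngine.a`). Returns `None`
/// if the path has no parent directory or file stem, or is not valid UTF-8.
pub fn link_directives(lib_path: &str) -> Option<Vec<String>> {
    let path = Path::new(lib_path);
    let dir = path.parent()?.to_str()?;
    let stem = path.file_stem()?.to_str()?;
    // `static=NAME` expects NAME without the `lib` prefix or `.a` suffix.
    let name = stem.strip_prefix("lib").unwrap_or(stem);
    Some(vec![
        format!("cargo:rerun-if-changed={lib_path}"),
        format!("cargo:rustc-link-search=native={dir}"),
        format!("cargo:rustc-link-lib=static={name}"),
    ])
}

/// What a build script's `main` might call: link the custom library if the
/// variable is set. Otherwise this sketch does nothing; a real build script
/// would fall back to another option (e.g. compiling libFuzzer from source).
pub fn emit_from_env() {
    println!("cargo:rerun-if-env-changed=CUSTOM_LIBFUZZER_PATH");
    if let Ok(lib_path) = std::env::var("CUSTOM_LIBFUZZER_PATH") {
        let directives = link_directives(&lib_path)
            .expect("CUSTOM_LIBFUZZER_PATH should point at a library file");
        for directive in directives {
            println!("{directive}");
        }
    }
}
```

Under these assumptions, setting `CUSTOM_LIBFUZZER_PATH=/usr/lib/libFuzzingEngine.a` would add `/usr/lib` to the native search path and link `FuzzingEngine` statically.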
Yep, we should fix this issue by adding a new
Overall, for our next steps, I think it makes sense to
Sounds like a plan? I can take the first bullet point, and also continue working on the other bits mentioned in this issue. Can you take on the last two bullet points, @jfoote?
At first blush my sense was these projects might be using cargo to build non-instrumented dependencies that are linked into the fuzz targets. I didn't dive into them though.
Excellent, TIL.
SGTM. Even if the compile/instrumentation flags are not passed as expected I think a basic PR will be a good way to get the conversation started with the oss-fuzz team.
Sure thing. I am in a pre-US-holiday crunch right now so there might be a little delay, but I will get to this ASAP.
Great -- thanks! I don't think there is any giant rush here, so if this gets bumped to after the holidays, that seems 100% OK with me :)
This is done, and part of the new
Quick update here: I was able to link the oss-fuzz build environment libfuzzer library (
Building with asan (the default) is OK, but specifying
The other sanitizer that oss-fuzz can optionally build with is ubsan, but it is not supported by our toolchain here at this time AFAIK.
My recommendation (and plan at this point, unless directed otherwise) is to ignore the sanitizer flag supplied by oss-fuzz, set the fuzz target configs to use only asan for good measure, and proceed to write a build script for the wasmtime/fuzz targets. I'll then make a PR to oss-fuzz after rust-fuzz/libfuzzer#56 is merged to get the conversation started.
To support oss-fuzz PoC, see bytecodealliance#611
Hello @fitzgen! I have the strawman PR for the wasmtime oss-fuzz integration staged. Before we move forward with that, can you take a look at the project acceptance PR diff (jfoote/oss-fuzz@06542db) and see if it looks OK to you? Basically I set myself as the maintainer for now and added an email alias for you as well as [email protected]. Those addresses are used to get notifications when the fuzzers find something or the build breaks. Note that if aliases listed there have associated google accounts they will get access to the oss-fuzz dashboard and bug tracker. Should we add anyone else initially?
@jfoote looks great! 👍 I left a couple comments on the draft text. Everything else looks ready to go!
Quick update for posterity and onlookers: we've successfully integrated the wasmtime fuzz targets with oss-fuzz, with the caveats outlined in the comments and referenced PRs above. Thanks to @fitzgen and @alexcrichton for making this happen!
@fitzgen Sir, this was a GSoC 2020 project idea; I worked on it during the application period and submitted a proposal. Given the time I had at hand, I wasn't able to get a complete picture of the different vulnerability areas, like ABI abstractions and heap and stack safety. I'd like to contribute to the idea voluntarily, but I need to clear up some doubts first.
https://bytecodealliance.zulipchat.com/ is the primary discussion channel.
I think this can be closed.
I plan on laying out some foundational fuzzing infrastructure for Wasmtime in the next few weeks. I'd like to use this issue as a kind of meta issue to keep track of this work. I'd also appreciate feedback on the plan from anyone with experience fuzzing or domain knowledge of a particular thing we plan on fuzzing.
Goals
Find bugs!
Make bugs (fuzzer-found or otherwise) easier to debug via automatic test case reduction.
Strategy
Breadth not Depth
At least initially, let's build out a few different fuzzing approaches enough that they start identifying bugs, but not spend a ton of time building bespoke tools tailored for exactly the problems we have at hand.
My assumptions are that
Therefore, by making a bunch of different just-good-enough fuzzers, we will repeatedly discover new, unique low-hanging fruit bugs.
Additionally, this gives us a nice foundation that we can springboard off of in the future when we decide to go deeper in any particular direction.
Decouple Generators and Oracles
A generator creates test cases (usually given an RNG or a random byte stream input). An oracle determines if executing a test case uncovered a bug. In general, it is good software engineering to separate concerns, but separating these two parts specifically allows us to:
… `creduce` …), and
Implementation
In general, I recommend that we use `libFuzzer` to drive our fuzzing. It is coverage-guided, which means it can find interesting code paths more quickly than testing purely random inputs will. It also has a nice Rust interface in the form of `cargo-fuzz`.
Any custom generators we create should take `libFuzzer`-provided input bytes and then re-interpret them as a sequence of random values to drive choices inside the generator. This lets us combine the benefits of smart, structure-aware generators with those of coverage-guided fuzzing. We can do this by implementing our custom generators in terms of the `arbitrary` crate's `Arbitrary` trait.
As far as test case reduction goes, when a generator is creating Wasm files, it should be relatively easy to use binaryen's `wasm-reduce` on the Wasm file, or use `creduce` on the WAT disassembly. We can, however, do some small things to make the process turnkey:
… `wasm-reduce` and/or `creduce` on a Wasm test case with any of our various oracles
For generators that are creating custom in-memory data structures by implementing the `Arbitrary` trait, test case reduction requires us to implement some custom logic. The `Arbitrary` trait supports defining a custom `shrink` method that takes `&self` and returns an iterator of smaller instances of `Self`. We can use this to create custom test case reduction for each of our custom test case generators.
Finally, any custom generator we create (and any generator we wrap that supports turning the generation of individual test case features on/off) should support swarm testing. Swarm testing is where we randomly turn on/off the generation of various test case features (such as: should a generator create Wasm test cases that use `call_indirect` or not?) so that we are more likely to generate pathological test cases where bugs are more likely to be found. This is relatively easy to implement and should yield
Fuzzing Wasmtime's Embedding API
This is a case where, unfortunately, we can't really use existing off-the-shelf solutions.
Generators
Oracles
Wasm Execution Fuzzing
We should fuzz our execution of Wasm. Yes, Cranelift has some fuzzing in SpiderMonkey, but we should also make sure that all of our Wasmtime-specific JIT'ing machinery is well fuzzed, as well as our WASI implementation and sandboxing.
Generators
Use `wasm-opt -ttf` to generate random, valid Wasm files.
Write a custom generator that creates Wasm files that make sequences of WASI syscalls.
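The custom WASI generator could follow the approach proposed above for custom generators. It would reinterpret libFuzzer's bytes as a stream of choices and spend the first few choices on a swarm configuration that switches whole syscall kinds on or off. The sketch below is dependency-free and stops at an abstract list of calls. `ByteSource` is a hand-rolled stand-in for the `arbitrary` crate's `Unstructured`, and the `Syscall` subset of WASI functions is only an example. Turning the list into an actual Wasm module is left out.

```rust
/// Hand-rolled stand-in for `arbitrary::Unstructured`: hands out fuzzer
/// bytes one at a time and yields 0 once the input is exhausted.
pub struct ByteSource<'a> {
    data: &'a [u8],
}

impl<'a> ByteSource<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        ByteSource { data }
    }

    pub fn byte(&mut self) -> u8 {
        match self.data.split_first() {
            Some((&b, rest)) => {
                self.data = rest;
                b
            }
            None => 0,
        }
    }

    pub fn bool(&mut self) -> bool {
        self.byte() & 1 == 1
    }
}

/// A few WASI functions a generated module might call (example subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Syscall {
    PathOpen,
    FdRead,
    FdWrite,
    FdClose,
}

const ALL_SYSCALLS: [Syscall; 4] =
    [Syscall::PathOpen, Syscall::FdRead, Syscall::FdWrite, Syscall::FdClose];

/// Exclusive upper bound on sequence length (an arbitrary choice).
const MAX_LEN: usize = 16;

pub fn generate(data: &[u8]) -> Vec<Syscall> {
    let mut src = ByteSource::new(data);

    // Swarm testing: one coin flip per syscall kind decides whether this
    // test case may use that kind at all.
    let mut enabled = Vec::new();
    for &kind in ALL_SYSCALLS.iter() {
        if src.bool() {
            enabled.push(kind);
        }
    }
    if enabled.is_empty() {
        return Vec::new();
    }

    // Remaining bytes pick the length, then each call from the enabled set.
    let len = src.byte() as usize % MAX_LEN;
    (0..len)
        .map(|_| enabled[src.byte() as usize % enabled.len()])
        .collect()
}
```

The swarm bits are read first, so a libFuzzer mutation to one of the first four bytes can flip an entire syscall kind on or off. That tends to produce the skewed mixes of features that swarm testing is after.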
Oracles
Execute the file and ensure Wasmtime doesn't panic, fail any `assert!(..)`s, or segfault, regardless of whether executing the Wasm traps.
`strace` the process or something and ensure it doesn't do any syscalls outside the preopened directory given to the WASI sandbox or something?
Differential fuzzing where we compare the observable results of execution between:
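Once the accessed paths have been pulled out of the trace, the `strace` oracle idea above could come down to a simple check. The sketch below assumes that extraction already happened (parsing `strace` output is not shown) and that all paths are absolute. It normalizes `.` and `..` lexically and then requires every path to stay under the preopened directory. `first_escape` is a hypothetical name. Because the sketch never touches the filesystem, it ignores symlinks, which a real oracle would have to resolve.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically resolve `.` and `..` in an absolute path without touching the
/// filesystem. Symlinks are NOT resolved.
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            Component::CurDir => {}
            // Popping at the root is a no-op, so `/..` stays `/`.
            Component::ParentDir => {
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// Returns the first observed path that escapes `preopen`, or `None` if
/// every path stays inside it. `Path::starts_with` compares whole
/// components, so `/sandboxed` does not count as inside `/sandbox`.
pub fn first_escape<'a>(preopen: &Path, observed: &[&'a str]) -> Option<&'a str> {
    let root = normalize(preopen);
    observed
        .iter()
        .copied()
        .find(|p| !normalize(Path::new(p)).starts_with(&root))
}
```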
More Stuff to Explore in the Future
Add support for code-coverage in Cranelift and leverage it to build equivalence-module-inputs testing and coverage-guided fuzzing for Wasmtime
Create test case generators and oracles for our Wasm interface types support? What would be involved here is not super clear to me yet.
Questions
Should the fuzzing corpus be committed into the git repo? Or perhaps should it be a separate repo that we include as a git submodule?
What work here should we prioritize?
Is there anything here you think we should not implement?
Are there any other WASI-targeted oracles we can create? The `strace` idea is pretty half-baked right now. I'd appreciate some more ideas from folks more involved in the WASI side of things than I am...