Questions: Structure aware fuzzing, using internal functions and more #32
I think that given everything thus far is in Automake, adding CMake is going to be a tricky sell. I have no issue with moving towards a more RAII-style setup, however; when I wrote this originally I was coming from a very C-oriented background.
Unsure about this. You might be used to this but most libcurl examples use select(). I'm basically worried about maintaining and extending any code added here.
We have, I think, one internal fuzzer already (curl_fnmatch) which doesn't need any special patching in libcurl. I'd obviously like to see more internal fuzzers! Hopefully these can be added in a similar way to the fnmatch fuzzer.
At this point in time it might be worth looking into data formats that are easier to fuzz. One of the nice things about TLVs obviously is that they allow null bytes, which are historically fun to fuzz with. Are you talking about adding a custom mutator as per https://github.com/google/fuzzing/blob/master/docs/structure-aware-fuzzing.md ? If so, that is quite interesting. Would love to see a PR on that.
Agreed
Honestly, not a fan of this. This just appears to be adding cmake into the mix because you want to use cmake, not for any technical reason. Having had exposure to all sorts of build systems, I'm reasonably certain that anything you can do in cmake you can do in Automake.
I think we should start with http, and in automake we just create a
Assuming we can write these in such a way that doesn't require curl hackery, it'd be good to get these into the ossfuzz build regardless.
I can perhaps update the corpus in the repository; it's been a while since I've done that. Alternatively you could just monitor the coverage report from oss-fuzz: https://storage.googleapis.com/oss-fuzz-coverage/curl/reports/20190927/linux/report.html. These URLs are apparently unsecured, so you could monitor the coverage in response to any fixes that get made.
The internal fuzzers
These have to go "inside" curl, since they need internal headers which are otherwise hidden (I might be wrong, but I was completely new to curl until I started fuzzing it). I use functions from doh.h, for instance, which are inside UNITTEST macros. I think I tried to hack around it, but then I needed curlconfig.h etc., so I did it this way. No big harm, see the CMake section below. I took a look at the fnmatch fuzzer and it seemed to be copied; I tried to mimic that for the doh fuzzer but could not get it working. Instead I went for adding to the existing CMake system in curl.

CMake
There is already CMake support in place in curl; it is a bit sketchy, but it works well enough. To get this to work with building the fuzzers from inside curl instead of outside like the existing ones, I had to do the following:
In curl/CMakeLists.txt I added

option(ENABLE_FUZZING "enables fuzzing" Off)

and wrapped the fuzzer inclusion in

if(ENABLE_FUZZING)
  include(${CMAKE_SOURCE_DIR}/../curl-fuzzer/intree_fuzzer/CMakeLists.txt)
else()

The changes above can even be patched in dynamically by the fuzzer build script, if one does not want to change curl in any way because of this. I do not know how many users the CMake curl build has and whether they would be upset by adding C++ as a language. The CMake for building the fuzzers (in curl-fuzzer/intree_fuzzer/CMakeLists.txt) is not hard at all; it is about 100 sparse lines of vanilla CMake.

I understand the resistance to yet another build system, but CMake enables easy working in an IDE as well as separated build/source directories. It makes developing fuzzers much easier when debugging, IDE support and code completion work straight out of the box. Now, curl already has CMake going, and if we agree that the existing fuzzers should continue living in parallel, I do not see much harm in having these extra fuzzers build with CMake? The benefit for me is huge. I use autotools in another project I am the author and maintainer of, and it has taught me that CMake is a better option for me. Will an autotools-based solution be less than 100 lines of code? Considering that the new fuzzers hook into the existing build system of curl instead of being a parallel build configuration like the existing fuzzers, is there any benefit in writing a new autotools-based build to avoid 100 lines of CMake?

How do you develop and debug the fuzzers: do you use an IDE, or perhaps an emacs/gdb style setup? Everyone is different; I settled on using an IDE and that works well for me, and it won't stop anyone using gdb on the command line or some other editor of their choice.

TLV and custom mutation
Boost Asio
I understand that the select-based approach is good for maintenance. But if it makes https fuzzing and higher performance possible, wouldn't it be worth it? I imagine this would be a separate fuzzer, keeping the existing ones as they are, so it is more about adding than replacing. Asio is not just another random library; it is being standardized into C++ and it is very well proven.

Corpus
Thanks for that link, I did not know those were public! That helps a lot, but is there anything preventing you from downloading the corpus and providing me with a copy? It would make things much easier for me. Or you could allow me to add myself as cc on the oss-fuzz config, then I can make a PR on oss-fuzz and download it myself. I am not sure having the corpus in the repo is a good idea; I do this myself and it tends to produce huge diffs which make it harder to navigate the git history (suddenly there are 1000 pages of corpus changes I have to ignore).

Http fuzzer
I can see from the link you sent that the following areas are not fuzzed
I have added proxy and time myself, but I need a good corpus to know whether I can't reach the right places because of a poor corpus or lack of CPU time, or because the fuzzer target is not correct.
I just saw that the existing fuzzers have made doh requests, which I added 14 days ago in #30, but still have not gotten a working reply! That says something about how long it takes to fuzz. It would be lovely to see whether structure-aware fuzzing would be able to find that faster.
If you can provide the source of one of these fuzzers I can see if it can be added as is?
Unknown, @bagder might know.
Sorry, but personally I don't think it flies to introduce a new build system which is the only way of building a set of fuzzers. I'd be happy if you wanted to provide a CMake file as well as Automake rules, because that's how it's done in the main curl repository. I'm also happy if you want to provide the source of the fuzzers and have me write the Automake rules to compile them. I'm not entirely sure why CMake requires you to hack around the
I think it's more like 5 lines per fuzzer assuming the fuzzers can be written like fnmatch.
Personally, gdb and Sublime.
If it's a separate fuzzer then I don't really see a problem here apart from maintenance. Almost certainly worth getting in.
Probably worth getting @bagder's input here because I don't know what the policies are surrounding oss-fuzzing and HackerOne bounties. I've personally recused myself from submitting faults in curl due to fuzzing-discovered issues; if I didn't, I'd certainly have a head start on the competition by having access to the corpus.
In general I hand craft corpus entries myself if I know I'm trying to hit something specific; have you tried doing this?
As per above - have you tried handcrafting a corpus file here?
Thanks, that works fine with me!
I understand your position on this. Not much to do then. In case you have an old corpus lying around, that would help as well.
Yes, and while that is possible, I prefer the fuzzer to be able to discover new input for itself - if it can't find this input easily, it will probably not find other interesting input easily either. Which was shown by the doh response example, which still had not found the reply even after 14 days.
I might misremember since I have worked with X things in different directions, but I think I temporarily restricted the fuzzer to force the use of doh and to fuzz only the response. This is a prime example of how setting a breakpoint in the debugger at the doh parse function makes it very easy to confirm that the fuzzer actually reaches there.
And while polishing up the fuzz data, I discovered another issue I had not seen before:
Thanks for this - sorry I haven't responded sooner. The autoformatter has made comparing the code a bit more tricky than expected; the current code is "curl"-style, so I don't know whether we want to continue to enforce that in this repository or not (I think we do; I should see if there's a clang-format that will enforce "curl"-style). Regardless, I think there's definitely a bunch of solid ideas here that can be worked on. From my perspective I think I'd like to see a starting series of PRs here:
Separately, I'd like to see if these internal function fuzzers can simply be added within the current architecture.

Actions: I think I may have some spare time this week on Wednesday evening to look at the internal function fuzzers. I'd like to see whether it's possible to add them in a similar way to the existing fnmatch fuzzer (which is currently disabled, but should still compile). If so, getting these in ASAP as examples feels like a really good start.
No worries, I have been busy myself!
Did you have a go at this? I guess the outcome of that influences decisions on going forward with cmake? I am stuck on curl/curl#4463; I just added a much smaller test case to make it easier to troubleshoot (I don't know how to fix it myself). I think it is good to have proxy fuzzing, so I do not want to disable it if I can avoid it, but I suspect it does not fit very well into the current structure of the fuzzer since the number of connections may be exhausted, especially if doh is used. With the asio fuzzer it should work more easily, but that has other problems (the ad hoc structure of the input data, since I wanted to try some ideas).
Will follow up shortly here - have not been well recently so am only just getting back to tech things.
Feeling well is more important than fuzzing curl. Hope you get better soon, take your time!
Unsure to be honest. I have only used the perl script that does style-checking.
It'd be great if you could - it makes it a lot easier to reason about changes in small batches of well-defined functional commits!
The curl-fuzzer repo. Ideally we'd have some way of compiling using CMakeLists if possible. Unfortunately this repository does share runners with Travis, so we'll still hit the problem of having too many runners in use.
Aiming to do this today, I know I've been slack on this.
I've had a look at the UT fuzzing; it looks like the fnmatch code was done pretty hackily (copying a header file into a shared folder). What I might propose here is actually adding a feature to curl so that we expose an interface to a bunch of internal functions when a feature flag is passed at configuration time. This would require a (hopefully) simple bit of curl functionality, with a header file that gets installed to access all of these fuzzing interfaces. I'll propose it on IRC and see if @bagder is interested.
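To make that idea a bit more concrete, here is a rough sketch of what such an installed header could look like. The file name, include guard and wrapper function names are all hypothetical, invented for illustration; this is not an existing curl interface.

/* fuzz_interface.h - hypothetical header, installed only when curl is
 * configured with the fuzzing feature flag. Everything here is an
 * illustration, not an actual curl API. */
#ifndef CURL_FUZZ_INTERFACE_H
#define CURL_FUZZ_INTERFACE_H

#include <stddef.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Thin wrappers around internal functions (doh decoding, fnmatch, ...) so
 * that fuzz targets can call them without reaching into private headers. */
int curl_fuzz_doh_decode(const unsigned char *resp, size_t len);
int curl_fuzz_fnmatch(const char *pattern, const char *string);

#ifdef __cplusplus
}
#endif

#endif /* CURL_FUZZ_INTERFACE_H */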
Hi!
I have played around with the fuzzers and have several variations and new fuzzers in my private repo. I wanted to file an issue here to have a discussion about if and how to upstream this. I also have some thoughts, so I figured I'd post everything here in case anyone wants to comment.
To prove this is not just hot air, here are the issues I have found, all luckily pretty harmless:
CMake and C++
To make debugging and development easy, I use C++ (it's my primary language) and CMake, because it is easy to get an IDE (I use Qt Creator) to navigate, debug and build the code. It is easy to work with, since one can easily set a breakpoint in the debugger when trying to get the fuzzer to reach a certain place in the code. I also refactored the fuzzer to use RAII style so the cleanup is easier.
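As an illustration of the RAII style meant here (a minimal sketch, not the actual code from the private repo), a wrapper that owns a CURL easy handle could look like this:

#include <curl/curl.h>
#include <stdexcept>

// Owns a CURL easy handle and guarantees cleanup even if the fuzz target
// returns early or throws.
class EasyHandle {
public:
  EasyHandle() : handle_(curl_easy_init()) {
    if(!handle_)
      throw std::runtime_error("curl_easy_init failed");
  }
  ~EasyHandle() { curl_easy_cleanup(handle_); }
  EasyHandle(const EasyHandle &) = delete;
  EasyHandle &operator=(const EasyHandle &) = delete;
  CURL *get() const { return handle_; }
private:
  CURL *handle_;
};

With something like this, every early return in a fuzz target cleans up the handle automatically instead of needing goto-style cleanup paths.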
Using Boost Asio instead of C-style select
I use Asio to asynchronously wait for network events. Since I am used to it and it is the most well-known way to do networking in C++, it is a good choice. I think I also managed to handle timeouts in a way that improves the fuzzing speed, but this is a tricky topic since it involves the interaction between curl, Asio and waiting for the OS.
I think it is a good foundation to build on, since it is possible to, for instance, interface OpenSSL through Asio, which would open up fuzzing the contents of https instead of the encrypted layer as the current fuzzers do.
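For illustration, a minimal sketch of how Asio can wait for socket readiness with a timeout instead of a blocking select(). The function name, the timeout handling and the surrounding harness are my own assumptions, not the code from the private repo:

#include <boost/asio.hpp>
#include <chrono>

namespace asio = boost::asio;

// Wait until the socket is readable or the timeout fires, whichever comes
// first, and tell the caller which one it was. A fuzz harness could feed
// the next canned response to libcurl only when this returns true.
bool wait_readable(asio::io_context &ctx, asio::ip::tcp::socket &sock,
                   std::chrono::milliseconds timeout)
{
  bool readable = false;
  asio::steady_timer timer(ctx, timeout);

  sock.async_wait(asio::ip::tcp::socket::wait_read,
                  [&](const boost::system::error_code &ec) {
                    if(!ec)
                      readable = true;
                    timer.cancel();   // data arrived (or error): stop the timer
                  });
  timer.async_wait([&](const boost::system::error_code &) {
                     sock.cancel();   // timeout: abort the pending socket wait
                   });

  ctx.restart();
  ctx.run();                          // run both handlers to completion
  return readable;
}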
Internal fuzzers
The existing fuzzers use curl just like any libcurl user. I wanted to stress test some functions directly, so I wrote some fuzzers which access curl from within. I do this by adding an optional section inside a curl CMake file, which includes files from the curl-fuzzer repo. This uglifies curl a bit, but it is off by default and placed within

if(ENABLE_FUZZING)

sections. The benefit is that these fuzz targets can be very focused, and the input does not need to be invented by the TLV fuzzer and sent over a socket. Not all functions can be reached by the existing fuzzers.
I add internal fuzzers for
I think it may be a good idea to fuzz the setting of curl options here too, so one does not have to rely only on the existing fuzzers for that.
Fuzzing a single function like this, for instance the doh parser, is what libFuzzer is excellent at. It is not limited by network timeouts or by setting up and tearing down sockets.
These internal fuzzers cover almost everything within a few hours, so they are nice to have, but they could perhaps run in a CI build instead of taking up slots at oss-fuzz (unless oss-fuzz schedules its fuzzing effort to avoid that; I do not know).
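As an illustration of how small and focused such a target can be, here is a sketch; fuzz_doh_resp_decode is a hypothetical stand-in for whatever internal entry point ends up being exposed under ENABLE_FUZZING, not an actual curl symbol:

#include <cstdint>
#include <cstddef>

// Hypothetical wrapper around the internal doh response parser, visible
// only when curl is built with ENABLE_FUZZING.
extern "C" int fuzz_doh_resp_decode(const unsigned char *resp, size_t len);

// libFuzzer entry point: hand the raw bytes straight to the parser, with
// no sockets, no TLV unpacking and no event loop in between.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
  fuzz_doh_resp_decode(data, size);
  return 0;
}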
Structure aware fuzzing
The fuzz data uses TLV (type-length-value), which means the default mutation strategy will very likely break the content. Most of the input the fuzzer engine generates is garbage and has to be rejected while unpacking the TLV when setting curl options.
This is inefficient, and it also makes it difficult for the fuzzing engine to create meaningful input, since the type, the content and the alignment with other blocks all have to match.
So I wrote a custom mutator which parses the TLV, throwing away parts that don't make sense. With a list of blocks, it can then apply mutation to a single block and serialize the result again. It also implements the custom crossover, for mixing two test cases.
This works, but I have a hard time evaluating it since it takes a very long time to build up the corpus for the existing fuzzers. It certainly finds new paths when starting from an empty corpus, so I figure that if it can do that, it will also be good at exposing cases not yet known.
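To show the shape of such a mutator, here is a condensed sketch. The TLV layout assumed here (2-byte type and 4-byte length, big-endian) and the helper names are illustrative assumptions, not the exact format or code used by curl-fuzzer:

#include <cstdint>
#include <cstddef>
#include <cstring>
#include <vector>

// Provided by libFuzzer: mutates a buffer in place, returns the new size.
extern "C" size_t LLVMFuzzerMutate(uint8_t *data, size_t size, size_t max_size);

namespace {
struct Block {
  uint16_t type;
  std::vector<uint8_t> value;
};

// Parse the TLV stream, dropping truncated or oversized records instead of
// keeping them as garbage.
std::vector<Block> parse(const uint8_t *data, size_t size)
{
  std::vector<Block> blocks;
  size_t pos = 0;
  while(pos + 6 <= size) {
    uint16_t type = (uint16_t)((data[pos] << 8) | data[pos + 1]);
    uint32_t len = ((uint32_t)data[pos + 2] << 24) | ((uint32_t)data[pos + 3] << 16) |
                   ((uint32_t)data[pos + 4] << 8) | (uint32_t)data[pos + 5];
    pos += 6;
    if(len > size - pos)
      break;
    blocks.push_back(Block{type, std::vector<uint8_t>(data + pos, data + pos + len)});
    pos += len;
  }
  return blocks;
}

// Re-serialize the blocks, recomputing every length field.
size_t serialize(const std::vector<Block> &blocks, uint8_t *out, size_t max_size)
{
  size_t pos = 0;
  for(const Block &b : blocks) {
    if(pos + 6 + b.value.size() > max_size)
      break;
    out[pos++] = (uint8_t)(b.type >> 8);
    out[pos++] = (uint8_t)(b.type & 0xff);
    uint32_t len = (uint32_t)b.value.size();
    out[pos++] = (uint8_t)(len >> 24);
    out[pos++] = (uint8_t)(len >> 16);
    out[pos++] = (uint8_t)(len >> 8);
    out[pos++] = (uint8_t)(len & 0xff);
    if(!b.value.empty())
      memcpy(out + pos, b.value.data(), b.value.size());
    pos += b.value.size();
  }
  return pos;
}
} // namespace

// Structure-aware mutation: mutate the value of a single block and put the
// framing back together, so the TLV structure itself stays valid.
extern "C" size_t LLVMFuzzerCustomMutator(uint8_t *data, size_t size,
                                          size_t max_size, unsigned int seed)
{
  std::vector<Block> blocks = parse(data, size);
  if(blocks.empty())
    return LLVMFuzzerMutate(data, size, max_size); // nothing parsed: default mutation

  Block &victim = blocks[seed % blocks.size()];
  victim.value.resize(victim.value.size() + 16, 0); // headroom for growth
  size_t newlen = LLVMFuzzerMutate(victim.value.data(),
                                   victim.value.size() - 16, victim.value.size());
  victim.value.resize(newlen);

  return serialize(blocks, data, max_size);
}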
A proposal
I think it is good to discuss before sending pull requests. Here is my suggestion:
What do I want help with now
I would like to get access to the corpus from oss-fuzz, or a copy of it, so I can see which parts of curl are not fuzzed and evaluate the new fuzzers I write. I think the link is this one, based on another project I work on. It needs a login through one of the admins: https://console.cloud.google.com/storage/browser/curl-backup.clusterfuzz-external.appspot.com
Thanks,
Paul