Speeding up instrumentation #25

Open
jwilk opened this issue Aug 24, 2022 · 3 comments

Comments

@jwilk
Owner

jwilk commented Aug 24, 2022

As indicated in the README, the instrumentation is slow at the moment.

Here are some rough ideas on how to speed it up:

  • Replace sys.settrace() with the lower-level PyEval_SetTrace().

  • Rewrite bytecode to inject instrumentation. (Perhaps use the bytecode module?)

  • Rewrite AST to inject instrumentation. (See how it's done in pytest.)

(I don't plan to work on any of these, unless there's funding for the work.)
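
As a rough illustration of the AST-rewriting idea, the sketch below uses the standard `ast` module to insert a probe call before every statement. The hook name `__probe__` and the helper `instrument()` are hypothetical and not part of python-afl. A real implementation would also have to preserve docstrings and `from __future__` imports, which this sketch ignores.

```python
import ast

PROBE = "__probe__"  # hypothetical hook name, chosen for this sketch only


def _probe_call(stmt):
    """Build the statement `__probe__(<lineno>)` located at `stmt`."""
    call = ast.Call(
        func=ast.Name(id=PROBE, ctx=ast.Load()),
        args=[ast.Constant(value=stmt.lineno)],
        keywords=[],
    )
    return ast.copy_location(ast.Expr(value=call), stmt)


class ProbeInjector(ast.NodeTransformer):
    """Insert a probe before each statement in every statement list."""

    def generic_visit(self, node):
        super().generic_visit(node)  # instrument nested blocks first
        for field in ("body", "orelse", "finalbody"):
            stmts = getattr(node, field, None)
            # Skip expression-valued fields such as IfExp.body or Lambda.body.
            if isinstance(stmts, list) and stmts and isinstance(stmts[0], ast.stmt):
                instrumented = []
                for stmt in stmts:
                    instrumented.append(_probe_call(stmt))
                    instrumented.append(stmt)
                setattr(node, field, instrumented)
        return node


def instrument(source, filename="<src>"):
    """Parse, instrument, and compile `source` (hypothetical helper)."""
    tree = ProbeInjector().visit(ast.parse(source, filename))
    ast.fix_missing_locations(tree)
    return compile(tree, filename, "exec")


# Usage: record which lines run instead of updating a coverage map.
hits = []
namespace = {PROBE: hits.append}
exec(instrument("x = 1\nif x:\n    y = 2\nelse:\n    y = 3\n"), namespace)
```

In this sketch the probe only appends line numbers to a list; an instrumented fuzz target would presumably update the shared coverage map instead.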

@tovmeod

tovmeod commented Feb 23, 2023

From what I understand, the trace function determines which files are covered, and the current implementation tracks coverage for all Python code, including installed libraries and the standard library.
I believe it could skip libraries by default (or at least make this configurable), or it could allow configuring which modules should be covered.
How should this be defined? Through an environment variable that gives the path prefix of the module?
MODULE_TO_COVER='/home/user/projectsrc/myapp'
and then the trace function would have:
if not filename.startswith(os.environ["MODULE_TO_COVER"]): return None

I think this would speed up the run and give afl a smaller attack area.
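
A minimal pure-Python sketch of this prefix filter, assuming the check is done in a `sys.settrace()` trace function. Returning `None` from the `"call"` event tells the interpreter not to trace that frame's lines at all, so the prefix is tested once per call rather than once per line. The helpers `make_tracer`, `run_traced`, and `load` are hypothetical, and the prefix is hard-coded here where the proposal would read `MODULE_TO_COVER` from the environment.

```python
import sys

# In the proposal this would come from os.environ["MODULE_TO_COVER"].
PREFIX = "/home/user/projectsrc/myapp"


def make_tracer(prefix, hits):
    """Return a trace function that records lines only for files under prefix."""
    def tracer(frame, event, arg):
        filename = frame.f_code.co_filename
        if event == "call":
            # None disables line tracing for this frame entirely.
            return tracer if filename.startswith(prefix) else None
        if event == "line":
            hits.append((filename, frame.f_lineno))
        return tracer
    return tracer


def run_traced(func, tracer):
    """Call func() with tracer installed, then restore the previous tracer."""
    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        return func()
    finally:
        sys.settrace(previous)


def load(source, filename):
    """Compile source under a made-up filename and return its function f."""
    namespace = {}
    exec(compile(source, filename, "exec"), namespace)
    return namespace["f"]


inside = load("def f():\n    return 1\n", PREFIX + "/mod.py")
outside = load("def f():\n    return 2\n", "/usr/lib/python3/other.py")

hits = []
tracer = make_tracer(PREFIX, hits)
run_traced(inside, tracer)   # recorded
run_traced(outside, tracer)  # skipped: its frame is never line-traced
```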

@jwilk
Owner Author

jwilk commented Feb 24, 2023

> From what I understand, the trace function determines which files are covered, and the current implementation tracks coverage for all Python code, including installed libraries and the standard library.

This is correct.

> I believe it could skip libraries by default (or at least make this configurable), or it could allow configuring which modules should be covered.

There's a TODO for this:

    # TODO: make it configurable which modules are instrumented, and which are not

But I'm afraid that the cost of the extra check could easily exceed the savings from skipping instrumentation.
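
One way to keep the extra check cheap is to decide once per filename and cache the answer, so later events for the same file cost a single cache lookup. The sketch below assumes a hypothetical `PREFIXES` setting and a hypothetical `should_instrument()` helper; whether it actually beats unconditional instrumentation would have to be measured.

```python
import functools

# Hypothetical configuration: path prefixes of the code to instrument.
PREFIXES = ("/home/user/projectsrc/myapp",)


@functools.lru_cache(maxsize=None)
def should_instrument(filename):
    """Return True if filename lies under one of PREFIXES (cached per file)."""
    # str.startswith accepts a tuple and checks every prefix in it.
    return filename.startswith(PREFIXES)
```

A trace function would call `should_instrument(frame.f_code.co_filename)` and return early when the result is false.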

@tovmeod

tovmeod commented Mar 7, 2023

I'm running some tests with a sample project. This is basically what I changed in the trace function:

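    if _module_path is not None:
        if filename.startswith("."):
            # relative path such as "./pkg/mod.py": drop the leading dot
            filename = filename[1:]
        elif filename.startswith(_module_path):
            # absolute path: drop the project root prefix
            filename = filename[len(_module_path):]
        elif not filename.endswith("fuzzer.py"):
            # outside the project and not the fuzzer wrapper: skip instrumentation
            return trace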

Here _module_path is a global variable whose value is passed to _init; it is expected to be something like "/home/user/projroot".

Note that I remove the prefix from the filename. I noticed that the filename is sometimes a full path and sometimes starts with ./, which means the traces for the same file would differ. I didn't debug this to understand why or when it happens.
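
The prefix stripping described above can also be expressed as a path normalization step, so that a relative and an absolute spelling of the same file yield the same name. This is only a sketch: `canonical_name` is a hypothetical helper, the working directory is passed in explicitly, and `posixpath` is used because the paths in this thread are POSIX-style.

```python
import posixpath


def canonical_name(filename, root, cwd):
    """Map a relative or absolute filename to a path relative to root.

    Files outside root are returned as normalized absolute paths.
    """
    # join() keeps filename unchanged when it is already absolute.
    absolute = posixpath.normpath(posixpath.join(cwd, filename))
    root = posixpath.normpath(root)
    if absolute == root or absolute.startswith(root + "/"):
        return posixpath.relpath(absolute, root)
    return absolute


ROOT = "/home/user/projroot"
relative_form = canonical_name("./pkg/mod.py", ROOT, cwd=ROOT)
absolute_form = canonical_name("/home/user/projroot/pkg/mod.py", ROOT, cwd=ROOT)
```

Both calls return the same name, so the two spellings would hash to the same map location.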

"fuzzer.py" is the file passed to py-afl-fuzz. I don't really care about coverage for the wrapper, but if it is not traced, afl thinks the binary has no instrumentation.

I'm getting exec speeds of up to ~5k, but the stability is very low (less than 5%), and it says "no new instrumentation output" for a lot of the initial seed corpus. Maybe I'm doing something wrong here.

I also changed the trace to something more naive:
afl_area[location] += 1
hoping it would find some interesting inputs faster, which I could then use as input for a run with the regular trace.
It does improve coverage faster, but I still only get 30 favored items and 34 new edges after 3.5M execs.
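
To make the difference between the two update rules concrete, here is a small sketch. The edge rule follows the scheme described in AFL's technical documentation (index `cur ^ prev`, then `prev = cur >> 1`), and `MAP_SIZE` is set to classic AFL's default of 64 KiB. The functions take already-computed location ids; `location_id` uses `zlib.crc32` purely as a stand-in hash and is not what python-afl uses.

```python
import zlib

MAP_SIZE = 1 << 16  # classic AFL default (MAP_SIZE_POW2 = 16)


def location_id(filename, lineno):
    """Stand-in hash of filename:lineno into the map (crc32 is an assumption)."""
    return zlib.crc32(f"{filename}:{lineno}".encode()) % MAP_SIZE


def block_coverage(locations):
    """Naive rule: count each visited location, ignoring order."""
    area = bytearray(MAP_SIZE)
    for cur in locations:
        area[cur] = (area[cur] + 1) & 0xFF  # 8-bit counters wrap around
    return area


def edge_coverage(locations):
    """AFL-style rule: count transitions between consecutive locations."""
    area = bytearray(MAP_SIZE)
    prev = 0
    for cur in locations:
        index = cur ^ prev
        area[index] = (area[index] + 1) & 0xFF
        prev = cur >> 1  # the shift keeps A->B distinct from B->A
    return area


path_a = [1, 2, 3]
path_b = [1, 3, 2]  # same locations, different order
same_blocks = block_coverage(path_a) == block_coverage(path_b)
same_edges = edge_coverage(path_a) == edge_coverage(path_b)
```

The naive rule cannot tell the two paths apart, while the edge rule can, which is one possible reason the naive variant reports few new edges.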

The project has ~90k LOC, so I'm thinking I should increase the map size. I see python-afl uses a 32-bit uint (a lot less than 90k).
I couldn't find what the default map size for afl is, or how to set its size.
I also see I should use a 64-bit hash function.

From what I understand, afl expects to map blocks of code, not individual lines. Could we use a deterministic way to map each filename:lineno instead of hashing and truncating the hash?
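
One deterministic mapping, sketched under the assumption that the set of source files and their line counts is known before fuzzing starts: give each file a contiguous range of indices, in sorted filename order, and index lines within that range. The helpers `build_offsets` and `line_index` are hypothetical. The mapping is collision-free only while the total fits in the map, so ~90k lines would not fit into a 65,536-entry map without enlarging it or restricting the table to executable lines.

```python
def build_offsets(line_counts):
    """Assign each file a starting index, in sorted filename order.

    line_counts maps filename -> number of lines in that file.
    Returns (offsets, total), where total is the number of indices needed.
    """
    offsets = {}
    total = 0
    for filename in sorted(line_counts):
        offsets[filename] = total
        total += line_counts[filename]
    return offsets, total


def line_index(offsets, filename, lineno):
    """Return the unique index of filename:lineno (lineno is 1-based)."""
    return offsets[filename] + lineno - 1


offsets, total = build_offsets({"b.py": 2, "a.py": 3})
indices = [
    line_index(offsets, "a.py", 1),
    line_index(offsets, "a.py", 3),
    line_index(offsets, "b.py", 1),
    line_index(offsets, "b.py", 2),
]
```

Because the order is derived from sorted filenames rather than from execution order, the same line gets the same index in every run.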

Maybe I'm thinking about all this wrong. I'm currently fuzzing the whole project; should I be fuzzing each function separately?
