make literal collection more precise #247

josharian · 2019-05-19T18:41:30Z

I have work in progress improving literal collection. This issue is to discuss design decisions in advance of sending PRs.

The current design converts int literals to strings during go-fuzz-build. I'd like to change that, so that the metadata json contains strings and ints, and do the int-to-string conversion lazily on the go-fuzz side. This gives us flexibility about encodings (little-endian, big-endian, varint, ascii, hex) without having to decode and re-encode. Step one would be no behavioral changes but simply moving the conversion. Thoughts or concerns?
The current design encodes ints in the smallest number of bytes possible. Thus a uint64 with value 1 gets encoded as a uint8. Now that we use go/packages, we have type information available, so we could encode that 1 as a uint64. Is that preferable? It might mean having multiple 1s of various widths, but it might also increase the chance of matching the underlying structure of the program. It would also mean having to track more precise type information in the metadata.
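To make the first question concrete, here is a minimal sketch of what lazy conversion on the go-fuzz side could look like, assuming the metadata stores the raw integer. The helper name `encodings` and the fixed 8-byte width are assumptions for illustration, not go-fuzz's actual API.

```go
package main

import (
	"encoding/binary"
	"strconv"
)

// encodings is a hypothetical helper: given an integer literal as stored
// in the metadata, it derives several byte-level spellings on demand.
// The fixed-width forms use 8 bytes here purely for illustration.
func encodings(v uint64) map[string][]byte {
	le := make([]byte, 8)
	binary.LittleEndian.PutUint64(le, v)

	be := make([]byte, 8)
	binary.BigEndian.PutUint64(be, v)

	vi := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(vi, v)

	return map[string][]byte{
		"little-endian": le,
		"big-endian":    be,
		"varint":        vi[:n],
		"ascii":         []byte(strconv.FormatUint(v, 10)),
		"hex":           []byte(strconv.FormatUint(v, 16)),
	}
}
```

Because the integer survives in the metadata, none of these forms requires decoding a previously chosen string encoding first.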

That's a start. I may add questions as I work on the PRs.

cc @dvyukov


dvyukov · 2019-05-20T08:47:11Z

The current design converts int literals to strings during go-fuzz-build. I'd like to change that, so that the metadata json contains strings and ints, and do the int-to-string conversion lazily on the go-fuzz side.

No concerns.
But I think we should mostly ignore the declared literal type. This means that if it's a string, but is actually an integer/float, we should encode it as integer/float if we are going to rely on that type during fuzzing in any way.

dvyukov · 2019-05-20T08:49:50Z

The current design encodes ints in the smallest number of bytes possible.

Why can it increase chances of matching the underlying structure of the program? I think we should ignore the exact type in the program. This means that if we have, say int16(42), we should consider we actually have all of int64(42), int32(42), int16(42) and int8(42). It means there is little point in storing more than 1 version of 42 in the file. What am I missing?

josharian · 2019-05-20T18:42:22Z

This means that if it's a string, but is actually an integer/float, we should encode it as integer/float if we are going to rely on that type during fuzzing in any way.

I don't understand what this means. Can you expand or give an example?

josharian · 2019-05-20T18:52:37Z

This means that if we have, say int16(42), we should consider we actually have all of int64(42), int32(42), int16(42) and int8(42).

Sounds good to me. This significantly increases the number of literals, but that's ok.

dvyukov · 2019-05-21T11:32:47Z

Sounds good to me. This significantly increases the number of literals, but that's ok.

I think we should not put them all into the file. There is no point. We should just apply transformations at runtime as if they all are there.
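A rough sketch of that idea: store the value once and derive the width variants when mutating. The helper name `widthVariants`, the restriction to unsigned values, and the little-endian byte order are all assumptions made for this example.

```go
package main

import "encoding/binary"

// widthVariants is a hypothetical helper: it returns the little-endian
// encoding of v at every width (1, 2, 4, 8 bytes) that can hold v without
// truncation, so a single stored literal stands in for all its widths.
func widthVariants(v uint64) [][]byte {
	var out [][]byte
	for _, w := range []int{1, 2, 4, 8} {
		if w < 8 && v>>(8*uint(w)) != 0 {
			continue // v does not fit in w bytes
		}
		buf := make([]byte, 8)
		binary.LittleEndian.PutUint64(buf, v)
		out = append(out, buf[:w])
	}
	return out
}
```

With this, the file needs only one entry for 42, and the 1-, 2-, 4- and 8-byte forms are produced at runtime.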

dvyukov · 2019-05-21T11:34:20Z

This means that if it's a string, but is actually an integer/float, we should encode it as integer/float if we are going to rely on that type during fuzzing in any way.

I don't understand what this means. Can you expand or give an example?

I mean that the set of transformations we apply to a literal should not depend on the spelled type of the literal. So int8(42), int64(42) and "42" should be transformed the same way.
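One way to read this, as a sketch only: normalize every literal to the value it denotes before choosing transformations. The function name `asInt` and the decimal-only string parsing are assumptions for illustration.

```go
package main

import "strconv"

// asInt is a hypothetical normalizer: it reports the integer a literal
// denotes regardless of how it was spelled in the source, so that
// int8(42), int64(42) and "42" all feed the same set of transformations.
func asInt(lit interface{}) (int64, bool) {
	switch x := lit.(type) {
	case int8:
		return int64(x), true
	case int16:
		return int64(x), true
	case int32:
		return int64(x), true
	case int64:
		return x, true
	case int:
		return int64(x), true
	case string:
		// A string literal that parses as a decimal integer is treated as one.
		n, err := strconv.ParseInt(x, 10, 64)
		return n, err == nil
	}
	return 0, false
}
```

A string such as "foo" yields false and would presumably stay on the plain string path.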

dvyukov · 2019-05-21T11:38:11Z

I am not sure if this literal collection is a good idea at all.
The alternative would be to extract constants from comparison operations at runtime. That way (1) we extract only the ones that are actually used (rather than thousands of uninteresting literals that just happen to be in some dependencies, or literals that are relevant but sit in a part of the program we have not reached yet); and (2) it may simplify integration with some build systems in some contexts. This .zip artifact is a bit weird; if we had just a binary, it would be a much more normal output of a build system.
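For illustration, runtime extraction could be sketched roughly as follows, assuming the instrumentation rewrites an integer comparison `a == b` into a call to a hook. The names `traceEq` and `candidates` are hypothetical, and recording both operands is a simplification chosen for this sketch; none of this is existing go-fuzz code.

```go
package main

import (
	"sort"
	"sync"
)

var (
	mu   sync.Mutex
	seen = map[uint64]struct{}{} // operands observed in executed comparisons
)

// traceEq is a hypothetical hook that instrumentation could substitute
// for `a == b`. It records both operands, then returns the real result.
func traceEq(a, b uint64) bool {
	mu.Lock()
	seen[a] = struct{}{}
	seen[b] = struct{}{}
	mu.Unlock()
	return a == b
}

// candidates returns the recorded operands in ascending order, for the
// mutator to use in place of a statically collected literal list.
func candidates() []uint64 {
	mu.Lock()
	defer mu.Unlock()
	out := make([]uint64, 0, len(seen))
	for v := range seen {
		out = append(out, v)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}
```

Only comparisons that actually execute contribute values, which matches point (1) above.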
