-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Code fastpath scanning for valid jump destinations #348
Conversation
saves 5% on the maze benchmark
…implify saves 1.2s on the maze benchmark
no effect on benchmark
|
shaves another second from the maze bench (52.56s -> 51.67s)
profiling showed that the last remaining major cost in jumpi_id was actually the cost of unbox_int This improves the maze bench by another 1.5s (51.67s -> 50.10s), about 3% faster
@@ -297,9 +297,6 @@ def unbox_int(x: Any) -> Any: | |||
if is_bv_value(x): | |||
return x.as_long() | |||
|
|||
if is_bv(x): | |||
x = simplify(x) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is fine if the result of unbox_int is immediately pushed to the stack. but, there are cases where a simplified result is needed, e.g.,:
Line 760 in c3f45dd
and unbox_int(ex.context.output.data) == ASSERT_FAIL Line 704 in c3f45dd
return unbox_int(data) get_word()
is better to be simplified, as it is used in many cheatcodes, and passed through further logic.
simplifying symbolic terms into their normalized form is leveraged by several other routines, so passing around unsimplified terms may cause unexpected issues, which may be difficult to debug later. this also increases code maintenance cost, while we still need further large iterations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's weird for unbox_int
to try to convert to an int, but also simplify if it's not.
I think simplify should happen at well defined boundaries, like:
- in and out of the stack
- in and out of memory
- not in an intermediate place like unbox_int
unbox_int(ex.context.output.data) == ASSERT_FAIL
☝️ this is not a great use of unbox_int, if the result is a bv we produce a new cond rather than evaluating a bool. In this case we probably want to call simplify first, then try to unbox, and skip if the unboxed output is still a bv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the result of get_word() is better to be simplified, as it is used in many cheatcodes, and passed through further logic
I agree, that's why the output of unwrap
is simplified
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case we probably want to call simplify first, then try to unbox, and skip if the unboxed output is still a bv
i agree. could you please fix that as you suggested? btw, for the long term, we'd need to generalize this condition to cover arbitrary Panic(k)
where k
is given by users.
I agree, that's why the output of unwrap is simplified
thanks for clarification, now i see that it's already simplified before being passed into unbox_int
--
i agree that it would make more sense to not simplify inside unbox_int. then, let me double-check other places to make sure:
-
Lines 727 to 728 in c3f45dd
def current_opcode(self) -> UnionType[int, BitVecRef]: return unbox_int(self.pgm[self.pc]) self.pgm[self.pc]
is already simplified, right? -
Lines 265 to 269 in c3f45dd
def extract_funsig(calldata: Bytes) -> Any: """Extracts the function signature (first 4 bytes) from calldata""" if hasattr(calldata, "__getitem__"): return unbox_int(calldata[:4]) return extract_bytes(calldata, 0, 4) calldata[:4]
may not be simplified, right? for now, it's fine because the extract_funsig() result is immediately passed toint_of
, which would revert anyway if it's a bv. (there is another use of extract_funsig(), but it will be removed anyway in feat: branching over symbolic call addresses #349, so it doesn't matter.) however, for the longer term, should we simplifycalldata[:4]
here? wdty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you please fix that as you suggested?
yea 👍
self.pgm[self.pc] is already simplified, right?
it defers to ByteVec.get_byte()
, which calls chunk.get_byte()
- for a concrete chunk, we get an int
- for a symbolic chunk, it relies on
extract_bytes
, which does return simplified bitvecs
so rn yes, but it's not strongly part of the contract (it just so happens that extract_bytes does simplify)
should we simplify calldata[:4] here?
yes, instead of "trying" to unbox it, returning calldata[:4].unwrap()
would do it (return either a concrete value or a simplified bv)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you please fix that as you suggested?
more robust version in af3df48 and test in afd5c11. I confirmed that unbox_int(...) == ASSERT_FAIL
does produce a BoolRef of the form Concat(1313373041, p_x_uint256) == 152078208365357342262005707660225848957176981554335715805457651098985835139029979365377
, which for some reason evaluates to False in the condition (instead of generating an exception as I was expecting).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we simplify calldata[:4] here?
yes, instead of "trying" to unbox it, returning calldata[:4].unwrap() would do it (return either a concrete value or a simplified bv)
great, let's do it! actually, i realized that unbox_int() calls unwrap() inside, so it's already simplified there. also, if i understand correctly, unwrap() may return a bv_value, so calldata[:4].unwrap()
may not be fully equivalent to unbox_int(calldata[:4])
. while it might be fine for extract_funsig() to return a bv_value, i'd like to just leave it as is.
|
||
# bytevec equality check, will take care of length check, bv vs symbolic, etc. | ||
return returndata == expected | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice! but why do we need to compare only the prefix of data, not the full data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
equality check fails if length is different, do you think that's closer to what we want?
jumpi_id()
keeps showing up when I profile tests, because it causes us to eagerly computevalid_jump_destinations()
Fundamentally,
valid_jump_destinations()
does a linear scan of the whole code for a contract. Even though ByteVec avoids z3 operations for this, it's still not ideal to do 10k+ individual byte lookups in a bytevec for almost contiguous data.The insight here is that it is quite common to process contracts that have a large concrete prefix, sometimes followed by some symbolic data. In this case, we extract the concrete prefix in
_fastcode
and scan that first for valid jump destinations.With this change, the regression tests run about 15% faster for me (44s to 37s)