Code fastpath scanning for valid jump destinations #348

karmacoma-eth · 2024-08-16T00:06:32Z

jumpi_id() keeps showing up when I profile tests, because it causes us to eagerly compute valid_jump_destinations()

Fundamentally, valid_jump_destinations() does a linear scan of the whole code for a contract. Even though ByteVec avoids z3 operations for this, it's still not ideal to do 10k+ individual byte lookups in a bytevec for almost contiguous data.

The insight here is that it is quite common to process contracts that have a large concrete prefix, sometimes followed by some symbolic data. In this case, we extract the concrete prefix in _fastcode and scan that first for valid jump destinations.
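To make the idea concrete, here is a minimal sketch of scanning a concrete prefix for jump destinations. It is not the halmos implementation: `concrete_prefix` and `scan_jumpdests` are hypothetical names, and the code is modeled as a plain sequence whose concrete bytes are Python ints and whose symbolic bytes are any other object.

```python
from typing import Sequence

JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F


def concrete_prefix(code: Sequence[object]) -> bytes:
    """Return the longest leading run of concrete (int) bytes."""
    prefix = bytearray()
    for item in code:
        if not isinstance(item, int):
            break  # the first symbolic byte ends the fast path
        prefix.append(item)
    return bytes(prefix)


def scan_jumpdests(code: bytes) -> set[int]:
    """Collect offsets of JUMPDEST opcodes, skipping PUSHn immediates."""
    dests = set()
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == JUMPDEST:
            dests.add(pc)
        # PUSHn (0x60..0x7f) is followed by n bytes of data, not opcodes
        pc += 1 + (op - PUSH1 + 1 if PUSH1 <= op <= PUSH32 else 0)
    return dests


# PUSH1 0x5b hides a 0x5b byte that is data, not a jump destination
code = [0x5B, 0x60, 0x5B, 0x5B, object()]
print(sorted(scan_jumpdests(concrete_prefix(code))))
```

Iterating over a `bytes` object avoids a per-byte lookup through a wrapper type, which is where the saving would come from under these assumptions. Any symbolic tail would still need the slower path.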

With this change, the regression tests run about 15% faster for me (44s to 37s)


src/halmos/sevm.py

karmacoma-eth · 2024-08-19T22:46:38Z

main baseline for my maze benchmark: 62.89s
new time: ~~52.56s (-16.4%)~~ 50.10s (-20.3%)


daejunpark · 2024-08-21T01:13:53Z

src/halmos/utils.py

@@ -297,9 +297,6 @@ def unbox_int(x: Any) -> Any:
    if is_bv_value(x):
        return x.as_long()

-    if is_bv(x):
-        x = simplify(x)
-


this is fine if the result of unbox_int is immediately pushed to the stack. but there are cases where a simplified result is needed, e.g.:

halmos/src/halmos/__main__.py

Line 760 in c3f45dd

and unbox_int(ex.context.output.data) == ASSERT_FAIL

the structural equality test may fail against unsimplified results, while it would pass with simplified ones.

halmos/src/halmos/bytevec.py

Line 704 in c3f45dd

return unbox_int(data)

the result of get_word() is better to be simplified, as it is used in many cheatcodes, and passed through further logic.

simplifying symbolic terms into their normalized form is leveraged by several other routines, so passing around unsimplified terms may cause unexpected issues, which may be difficult to debug later. this also increases code maintenance cost, while we still need further large iterations.

I think it's weird for unbox_int to try to convert to an int, but also simplify if it's not.

I think simplify should happen at well defined boundaries, like:

in and out of the stack

in and out of memory

not in an intermediate place like unbox_int

> unbox_int(ex.context.output.data) == ASSERT_FAIL

☝️ this is not a great use of unbox_int: if the result is a bv, we produce a new cond rather than evaluating a bool. In this case we probably want to call simplify first, then try to unbox, and skip if the unboxed output is still a bv
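A toy sketch of the "simplify first, then unbox, and skip if still symbolic" pattern. It does not use z3: `Const`, `Sym`, `Add`, this `simplify`, and `is_assert_fail` are hypothetical stand-ins that only fold constant additions, so they illustrate the control flow rather than the real halmos code.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Sym:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Term"
    right: "Term"


Term = Union[Const, Sym, Add]


def simplify(t: Term) -> Term:
    """Fold constant additions (a toy stand-in for a real simplifier)."""
    if isinstance(t, Add):
        left, right = simplify(t.left), simplify(t.right)
        if isinstance(left, Const) and isinstance(right, Const):
            return Const(left.value + right.value)
        return Add(left, right)
    return t


def unbox_int(t: Term) -> Union[int, Term]:
    """Convert a concrete term to an int; leave anything else untouched."""
    return t.value if isinstance(t, Const) else t


def is_assert_fail(output: Term, expected: int) -> bool:
    value = unbox_int(simplify(output))
    if not isinstance(value, int):
        return False  # still symbolic: skip instead of building a new condition
    return value == expected


unsimplified = Add(Const(1), Const(2))
print(unbox_int(unsimplified) == 3)     # structural comparison on the raw term
print(is_assert_fail(unsimplified, 3))  # simplify, unbox, then compare ints
```

In this toy model the first comparison is `False` because an unsimplified `Add` is never equal to an int, while the second is `True` once the term has been folded to a constant.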

> the result of get_word() is better to be simplified, as it is used in many cheatcodes, and passed through further logic

I agree, that's why the output of unwrap is simplified

> In this case we probably want to call simplify first, then try to unbox, and skip if the unboxed output is still a bv

i agree. could you please fix that as you suggested? btw, for the long term, we'd need to generalize this condition to cover arbitrary Panic(k) where k is given by users.

> I agree, that's why the output of unwrap is simplified

thanks for clarification, now i see that it's already simplified before being passed into unbox_int

--

i agree that it would make more sense to not simplify inside unbox_int. then, let me double-check other places to make sure:

halmos/src/halmos/sevm.py

Lines 727 to 728 in c3f45dd

    def current_opcode(self) -> UnionType[int, BitVecRef]:
        return unbox_int(self.pgm[self.pc])

this is fine because self.pgm[self.pc] is already simplified, right?

halmos/src/halmos/utils.py

Lines 265 to 269 in c3f45dd

    def extract_funsig(calldata: Bytes) -> Any:
        """Extracts the function signature (first 4 bytes) from calldata"""
        if hasattr(calldata, "__getitem__"):
            return unbox_int(calldata[:4])
        return extract_bytes(calldata, 0, 4)

if i understand correctly, calldata[:4] may not be simplified, right? for now, it's fine because the extract_funsig() result is immediately passed to int_of, which would revert anyway if it's a bv. (there is another use of extract_funsig(), but it will be removed anyway in feat: branching over symbolic call addresses #349, so it doesn't matter.) however, for the longer term, should we simplify calldata[:4] here? wdyt?

> could you please fix that as you suggested?

yea 👍

> self.pgm[self.pc] is already simplified, right?

it defers to ByteVec.get_byte(), which calls chunk.get_byte()

for a concrete chunk, we get an int

for a symbolic chunk, it relies on extract_bytes, which does return simplified bitvecs

so rn yes, but it's not strongly part of the contract (it just so happens that extract_bytes does simplify)

> should we simplify calldata[:4] here?

yes, instead of "trying" to unbox it, returning calldata[:4].unwrap() would do it (return either a concrete value or a simplified bv)

> could you please fix that as you suggested?

more robust version in af3df48 and test in afd5c11. I confirmed that unbox_int(...) == ASSERT_FAIL does produce a BoolRef of the form Concat(1313373041, p_x_uint256) == 152078208365357342262005707660225848957176981554335715805457651098985835139029979365377, which for some reason evaluates to False in the condition (instead of generating an exception as I was expecting).

> > should we simplify calldata[:4] here?
>
> yes, instead of "trying" to unbox it, returning calldata[:4].unwrap() would do it (return either a concrete value or a simplified bv)

~~great, let's do it!~~ actually, i realized that unbox_int() calls unwrap() inside, so it's already simplified there. also, if i understand correctly, unwrap() may return a bv_value, so calldata[:4].unwrap() may not be fully equivalent to unbox_int(calldata[:4]). while it might be fine for extract_funsig() to return a bv_value, i'd like to just leave it as is.

src/halmos/sevm.py

daejunpark · 2024-08-22T04:17:03Z

src/halmos/sevm.py

+
+        # bytevec equality check, will take care of length check, bv vs symbolic, etc.
+        return returndata == expected
+


nice! but why do we need to compare only the prefix of data, not the full data?

equality check fails if length is different, do you think that's closer to what we want?
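The difference between the two options can be sketched with plain `bytes`. This assumes concrete data only; `matches_exactly` and `matches_prefix` are hypothetical helpers, not halmos functions. The expected value is the ABI encoding of `Panic(1)`, whose selector `0x4e487b71` is the 1313373041 that appears in the BoolRef quoted earlier in the thread.

```python
# ABI encoding of Panic(1): 4-byte selector followed by a 32-byte uint256
PANIC_SELECTOR = bytes.fromhex("4e487b71")
ASSERT_FAIL = PANIC_SELECTOR + (1).to_bytes(32, "big")


def matches_exactly(returndata: bytes, expected: bytes) -> bool:
    """Full equality: any length mismatch makes the check fail."""
    return returndata == expected


def matches_prefix(returndata: bytes, expected: bytes) -> bool:
    """Prefix comparison: trailing bytes after the expected payload are ignored."""
    return returndata[: len(expected)] == expected


padded = ASSERT_FAIL + b"\x00"
print(matches_exactly(ASSERT_FAIL, ASSERT_FAIL))  # same bytes, same length
print(matches_exactly(padded, ASSERT_FAIL))       # one extra trailing byte
print(matches_prefix(padded, ASSERT_FAIL))        # extra byte ignored
```

Full equality is the stricter choice: revert data with extra trailing bytes is rejected, whereas a prefix comparison would accept it.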

karmacoma-eth added 5 commits August 15, 2024 16:58

WIP

523a7b3

delete unused definitions

981fb45

add optional dependencies for benchmarking

fddfc1c

refactor fast path and insn_len

8c80246

Contract instances are immutable, avoid copies

ca12f87

karmacoma-eth changed the title ~~Code fastpath rebased~~ Code fastpath scanning for valid jump destinations Aug 16, 2024

karmacoma-eth marked this pull request as draft August 16, 2024 00:19

karmacoma-eth added 7 commits August 15, 2024 17:25

cleanup

a7ff795

fix tests (since we removed contract iteration)

fca8066

unused import

f891e82

DUPn: avoid resimplifying stack elements

85d963f

saves 5% on the maze benchmark

stack push: fast path for concrete values, no need to check size of s…

8316f62

…implify saves 1.2s on the maze benchmark

use constant ZERO instead of con(0)

dc85678

no effect on benchmark

avoid str(cond) pattern

b29b766


simplify PUSHn execution

c878a86

karmacoma-eth marked this pull request as ready for review August 19, 2024 22:22

karmacoma-eth requested a review from daejunpark August 19, 2024 22:22


karmacoma-eth added 8 commits August 19, 2024 16:07

remove spurious simplify call from unbox_int

4079178

shaves another second from the maze bench (52.56s -> 51.67s)

perf: new method to compute jumpi_id

616a8c4

profiling showed that the last remaining major cost in jumpi_id was actually the cost of unbox_int. This improves the maze bench by another 1.5s (51.67s -> 50.10s), about 3% faster

fix tests

a292df6

valid_jumpdests are enumerated when symbolic jumps are enabled, so let's separate it into an int set and a string set

better type annotations

de8d7c2

fix type syntax for 3.11

20ea62a

fix HalmosLogs.bounded_loops

b279253

fix more uses of HalmosLogs.bounded_loops

6ebdaaa

consistency: con(1) -> ONE

61343e2

daejunpark reviewed Aug 21, 2024


karmacoma-eth added 3 commits August 21, 2024 14:54

clean up types a bit

6dd088d

fix revert condition check

af3df48

add symbolic revert test

afd5c11

daejunpark reviewed Aug 22, 2024


daejunpark approved these changes Aug 22, 2024


daejunpark merged commit 536299a into main Aug 22, 2024
57 checks passed

daejunpark deleted the code-fastpath-rebased branch August 22, 2024 04:43
