feat: improve batch copy performance #3483

Merged
merged 52 commits into from
Jul 15, 2023
Changes from 44 commits
56ae5f8
feat: use mcopy for copy bytes
charles-cooper Jun 8, 2023
75ee17a
add mcopy optimization to ir optimizer
charles-cooper Jun 8, 2023
ffadff9
fix lint
charles-cooper Jun 8, 2023
a4d1515
update test_opcodes
charles-cooper Jun 8, 2023
d82970a
fix versioning for MCOPY opcode
charles-cooper Jun 8, 2023
fdbffa2
Merge branch 'master' into feat/mcopy
charles-cooper Jun 8, 2023
393f2a6
remove `-v` from era tester
charles-cooper Jun 8, 2023
6e4b221
remove dead note
charles-cooper Jun 8, 2023
d31a2e8
fix typo
charles-cooper Jun 8, 2023
ee2a57a
fix tload/tstore availability
charles-cooper Jul 6, 2023
719419c
fix abi decoder
charles-cooper Jul 6, 2023
1fbe59c
add load/mstore optimization for dload
charles-cooper Jul 6, 2023
8339d41
don't always use copy_bytes
charles-cooper Jul 6, 2023
8877e6c
fix lint
charles-cooper Jul 6, 2023
a1b97fa
don't use copy_bytes with storage
charles-cooper Jul 6, 2023
a988712
fix lint
charles-cooper Jul 6, 2023
8ed424b
feat: add optimization flag to vyper compiler
charles-cooper Jul 7, 2023
017a19f
fix lint
charles-cooper Jul 7, 2023
37d7e64
fix typo
charles-cooper Jul 7, 2023
df8d642
fix some tests
charles-cooper Jul 7, 2023
0640d7e
source code pragma for compiler modes
charles-cooper Jul 7, 2023
2c7d696
fix mypy and some lint
charles-cooper Jul 10, 2023
be5f36a
fix tests
charles-cooper Jul 10, 2023
d2be8f5
remove evm_version from bitwise op tests
charles-cooper Jul 10, 2023
524c50f
fix lint
charles-cooper Jul 10, 2023
a6caacf
relax a test
charles-cooper Jul 10, 2023
6084de4
raise instead of warning
charles-cooper Jul 10, 2023
3091ae3
fix lint
charles-cooper Jul 10, 2023
3cb1d5c
update mypy
charles-cooper Jul 10, 2023
6658dae
use `OptimizationLevel.default()` in some places
charles-cooper Jul 10, 2023
48c2611
fix no-optimize tests
charles-cooper Jul 10, 2023
1d3dc48
fix a comment
charles-cooper Jul 10, 2023
214274d
add some tests for new pragma directives
charles-cooper Jul 10, 2023
a102aca
fix test_grammar.py
charles-cooper Jul 11, 2023
01910df
update docs
charles-cooper Jul 11, 2023
38951d3
update docs
charles-cooper Jul 11, 2023
a3bc3c2
docs: formatting
charles-cooper Jul 11, 2023
11e678b
fix lint
charles-cooper Jul 11, 2023
49006f6
Merge branch 'feat/optimize-codesize' into feat/mcopy
charles-cooper Jul 11, 2023
dc261e1
improve batch copy heuristic depending on opt mode
charles-cooper Jul 11, 2023
2ff2a25
Merge branch 'master' into feat/mcopy
charles-cooper Jul 11, 2023
90a65fc
move slice tests from `fuzzing` to `not fuzzing` to ensure testing
charles-cooper Jul 11, 2023
cbd2aed
reduce num examples
charles-cooper Jul 11, 2023
9065689
fix a slice test
charles-cooper Jul 11, 2023
6e70dad
improve cost function for storage batch copy
charles-cooper Jul 14, 2023
e40da91
use property instead of cached_property
charles-cooper Jul 14, 2023
b383e41
improve cost estimate for pre-cancun memory copies
charles-cooper Jul 14, 2023
6ac32d7
add --optimize "none"
charles-cooper Jul 14, 2023
87fb202
add comments on costing
charles-cooper Jul 14, 2023
ea2cbb1
update a comment
charles-cooper Jul 14, 2023
bfa53aa
fix fuzzer test
charles-cooper Jul 14, 2023
1c64f2c
add a sanity check
charles-cooper Jul 15, 2023
4 changes: 2 additions & 2 deletions .github/workflows/era-tester.yml
@@ -101,11 +101,11 @@ jobs:
if: ${{ github.ref != 'refs/heads/master' }}
run: |
cd era-compiler-tester
cargo run --release --bin compiler-tester -- -v --path=tests/vyper/ --mode="M0B0 ${{ env.VYPER_VERSION }}"
cargo run --release --bin compiler-tester -- --path=tests/vyper/ --mode="M0B0 ${{ env.VYPER_VERSION }}"

- name: Run tester (slow)
# Run era tester across the LLVM optimization matrix
if: ${{ github.ref == 'refs/heads/master' }}
run: |
cd era-compiler-tester
cargo run --release --bin compiler-tester -- -v --path=tests/vyper/ --mode="M*B* ${{ env.VYPER_VERSION }}"
cargo run --release --bin compiler-tester -- --path=tests/vyper/ --mode="M*B* ${{ env.VYPER_VERSION }}"
1 change: 0 additions & 1 deletion setup.cfg
@@ -31,7 +31,6 @@ addopts = -n auto
--cov-report html
--cov-report xml
--cov=vyper
--hypothesis-show-statistics
python_files = test_*.py
testpaths = tests
markers =
7 changes: 5 additions & 2 deletions tests/compiler/test_opcodes.py
@@ -59,5 +59,8 @@ def test_get_opcodes(evm_version):
assert "PUSH0" in ops

if evm_version in ("cancun",):
assert "TLOAD" in ops
assert "TSTORE" in ops
for op in ("TLOAD", "TSTORE", "MCOPY"):
assert op in ops
else:
for op in ("TLOAD", "TSTORE", "MCOPY"):
assert op not in ops
82 changes: 44 additions & 38 deletions tests/parser/functions/test_slice.py
@@ -1,4 +1,6 @@
import hypothesis.strategies as st
import pytest
from hypothesis import given, settings

from vyper.exceptions import ArgumentException

@@ -9,14 +11,6 @@ def _generate_bytes(length):
return bytes(list(range(length)))


# good numbers to try
_fun_numbers = [0, 1, 5, 31, 32, 33, 64, 99, 100, 101]


# [b"", b"\x01", b"\x02"...]
_bytes_examples = [_generate_bytes(i) for i in _fun_numbers if i <= 100]


def test_basic_slice(get_contract_with_gas_estimation):
code = """
@external
@@ -31,12 +25,16 @@ def slice_tower_test(inp1: Bytes[50]) -> Bytes[50]:
assert x == b"klmnopqrst", x


@pytest.mark.parametrize("bytesdata", _bytes_examples)
@pytest.mark.parametrize("start", _fun_numbers)
# note: optimization boundaries at 32, 64 and 320 depending on mode
_draw_1024 = st.integers(min_value=0, max_value=1024)
_draw_1024_1 = st.integers(min_value=1, max_value=1024)
_bytes_1024 = st.binary(min_size=0, max_size=1024)


@pytest.mark.parametrize("literal_start", (True, False))
@pytest.mark.parametrize("length", _fun_numbers)
@pytest.mark.parametrize("literal_length", (True, False))
@pytest.mark.fuzzing
@given(start=_draw_1024, length=_draw_1024, length_bound=_draw_1024_1, bytesdata=_bytes_1024)
@settings(max_examples=25, deadline=None)
def test_slice_immutable(
get_contract,
assert_compile_failed,
@@ -46,47 +44,48 @@ def test_slice_immutable(
literal_start,
length,
literal_length,
length_bound,
):
_start = start if literal_start else "start"
_length = length if literal_length else "length"

code = f"""
IMMUTABLE_BYTES: immutable(Bytes[100])
IMMUTABLE_SLICE: immutable(Bytes[100])
IMMUTABLE_BYTES: immutable(Bytes[{length_bound}])
IMMUTABLE_SLICE: immutable(Bytes[{length_bound}])

@external
def __init__(inp: Bytes[100], start: uint256, length: uint256):
def __init__(inp: Bytes[{length_bound}], start: uint256, length: uint256):
IMMUTABLE_BYTES = inp
IMMUTABLE_SLICE = slice(IMMUTABLE_BYTES, {_start}, {_length})

@external
def do_splice() -> Bytes[100]:
def do_splice() -> Bytes[{length_bound}]:
return IMMUTABLE_SLICE
"""

def _get_contract():
return get_contract(code, bytesdata, start, length)

if (
(start + length > 100 and literal_start and literal_length)
or (literal_length and length > 100)
or (literal_start and start > 100)
(start + length > length_bound and literal_start and literal_length)
or (literal_length and length > length_bound)
or (literal_start and start > length_bound)
or (literal_length and length < 1)
):
assert_compile_failed(
lambda: get_contract(code, bytesdata, start, length), ArgumentException
)
elif start + length > len(bytesdata):
assert_tx_failed(lambda: get_contract(code, bytesdata, start, length))
assert_compile_failed(lambda: _get_contract(), ArgumentException)
elif start + length > len(bytesdata) or (len(bytesdata) > length_bound):
# deploy fail
assert_tx_failed(lambda: _get_contract())
else:
c = get_contract(code, bytesdata, start, length)
c = _get_contract()
assert c.do_splice() == bytesdata[start : start + length]


@pytest.mark.parametrize("location", ("storage", "calldata", "memory", "literal", "code"))
@pytest.mark.parametrize("bytesdata", _bytes_examples)
@pytest.mark.parametrize("start", _fun_numbers)
@pytest.mark.parametrize("literal_start", (True, False))
@pytest.mark.parametrize("length", _fun_numbers)
@pytest.mark.parametrize("literal_length", (True, False))
@pytest.mark.fuzzing
@given(start=_draw_1024, length=_draw_1024, length_bound=_draw_1024_1, bytesdata=_bytes_1024)
@settings(max_examples=25, deadline=None)
def test_slice_bytes(
get_contract,
assert_compile_failed,
@@ -97,9 +96,10 @@ def test_slice_bytes(
literal_start,
length,
literal_length,
length_bound,
):
if location == "memory":
spliced_code = "foo: Bytes[100] = inp"
spliced_code = f"foo: Bytes[{length_bound}] = inp"
foo = "foo"
elif location == "storage":
spliced_code = "self.foo = inp"
@@ -120,31 +120,37 @@ def test_slice_bytes(
_length = length if literal_length else "length"

code = f"""
foo: Bytes[100]
IMMUTABLE_BYTES: immutable(Bytes[100])
foo: Bytes[{length_bound}]
IMMUTABLE_BYTES: immutable(Bytes[{length_bound}])
@external
def __init__(foo: Bytes[100]):
def __init__(foo: Bytes[{length_bound}]):
IMMUTABLE_BYTES = foo

@external
def do_slice(inp: Bytes[100], start: uint256, length: uint256) -> Bytes[100]:
def do_slice(inp: Bytes[{length_bound}], start: uint256, length: uint256) -> Bytes[{length_bound}]:
{spliced_code}
return slice({foo}, {_start}, {_length})
"""

length_bound = len(bytesdata) if location == "literal" else 100
def _get_contract():
return get_contract(code, bytesdata)

length_bound = len(bytesdata) if location == "literal" else length_bound
if (
(start + length > length_bound and literal_start and literal_length)
or (literal_length and length > length_bound)
or (literal_start and start > length_bound)
or (literal_length and length < 1)
):
assert_compile_failed(lambda: get_contract(code, bytesdata), ArgumentException)
assert_compile_failed(lambda: _get_contract(), ArgumentException)
elif len(bytesdata) > length_bound:
# deploy fail
assert_tx_failed(lambda: _get_contract())
elif start + length > len(bytesdata):
c = get_contract(code, bytesdata)
c = _get_contract()
assert_tx_failed(lambda: c.do_slice(bytesdata, start, length))
else:
c = get_contract(code, bytesdata)
c = _get_contract()
assert c.do_slice(bytesdata, start, length) == bytesdata[start : start + length], code
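The branching in `test_slice_bytes` above can be read as a small decision table. The following is a minimal sketch of that table as a pure function, assuming the non-literal locations (where `length_bound` is the declared `Bytes[...]` bound). The function name `expected_slice_outcome` and the string labels are hypothetical and do not appear in the PR.

```python
def expected_slice_outcome(
    data_len: int,
    start: int,
    length: int,
    length_bound: int,
    literal_start: bool,
    literal_length: bool,
) -> str:
    """Classify what the test expects for one drawn example.

    Hypothetical helper: it mirrors the order of the checks in the test,
    but is not part of the Vyper test suite.
    """
    # Out-of-range values are only caught at compile time when they are literals.
    if (
        (start + length > length_bound and literal_start and literal_length)
        or (literal_length and length > length_bound)
        or (literal_start and start > length_bound)
        or (literal_length and length < 1)
    ):
        return "compile_fail"
    # Constructor input longer than the declared bound: deployment reverts.
    if data_len > length_bound:
        return "deploy_fail"
    # The slice runs past the end of the actual data: the call reverts.
    if start + length > data_len:
        return "call_fail"
    return "ok"
```

The order of the checks matters: a compile-time failure takes precedence over a deploy-time failure, which in turn takes precedence over a failing call.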


12 changes: 3 additions & 9 deletions tests/parser/types/test_dynamic_array.py
@@ -2,7 +2,6 @@

import pytest

from vyper.compiler.settings import OptimizationLevel
from vyper.exceptions import (
ArgumentException,
ArrayIndexException,
@@ -1585,14 +1584,9 @@ def bar2() -> uint256:
newFoo.b1[1][0][0].a1[0][1][1] + \\
newFoo.b1[0][1][0].a1[0][0][0]
"""

if optimize == OptimizationLevel.NONE:
# fails at assembly stage with too many stack variables
assert_compile_failed(lambda: get_contract(code), Exception)
else:
c = get_contract(code)
assert c.bar() == [[[3, 7], [7, 3]], [[7, 3], [0, 0]]]
assert c.bar2() == 0
c = get_contract(code)
assert c.bar() == [[[3, 7], [7, 3]], [[7, 3], [0, 0]]]
assert c.bar2() == 0


def test_tuple_of_lists(get_contract):
100 changes: 84 additions & 16 deletions vyper/codegen/core.py
@@ -1,6 +1,11 @@
import contextlib
from typing import Generator

from vyper import ast as vy_ast
from vyper.codegen.ir_node import Encoding, IRnode
from vyper.compiler.settings import OptimizationLevel
from vyper.evm.address_space import CALLDATA, DATA, IMMUTABLES, MEMORY, STORAGE, TRANSIENT
from vyper.evm.opcodes import version_check
from vyper.exceptions import CompilerPanic, StructureException, TypeCheckFailure, TypeMismatch
from vyper.semantics.types import (
AddressT,
@@ -19,13 +24,7 @@
from vyper.semantics.types.shortcuts import BYTES32_T, INT256_T, UINT256_T
from vyper.semantics.types.subscriptable import SArrayT
from vyper.semantics.types.user import EnumT
from vyper.utils import (
GAS_CALLDATACOPY_WORD,
GAS_CODECOPY_WORD,
GAS_IDENTITY,
GAS_IDENTITYWORD,
ceil32,
)
from vyper.utils import GAS_COPY_WORD, GAS_IDENTITY, GAS_IDENTITYWORD, ceil32

DYNAMIC_ARRAY_OVERHEAD = 1

@@ -90,12 +89,16 @@ def _identity_gas_bound(num_bytes):
return GAS_IDENTITY + GAS_IDENTITYWORD * (ceil32(num_bytes) // 32)


def _mcopy_gas_bound(num_bytes):
return GAS_COPY_WORD * ceil32(num_bytes) // 32


def _calldatacopy_gas_bound(num_bytes):
return GAS_CALLDATACOPY_WORD * ceil32(num_bytes) // 32
return GAS_COPY_WORD * ceil32(num_bytes) // 32


def _codecopy_gas_bound(num_bytes):
return GAS_CODECOPY_WORD * ceil32(num_bytes) // 32
return GAS_COPY_WORD * ceil32(num_bytes) // 32


# Copy byte array word-for-word (including layout)
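To make the gas bounds in this hunk concrete, here is a minimal standalone sketch of the arithmetic. The constant values (15 base and 3 per word for the identity precompile, 3 per word for the copy opcodes) and the body of `ceil32` are assumptions that stand in for `vyper.utils`. The function names without the leading underscore are illustrative only.

```python
# Assumed values standing in for vyper.utils; not copied from the PR.
GAS_IDENTITY = 15      # base cost of the identity precompile
GAS_IDENTITYWORD = 3   # per-word cost of the identity precompile
GAS_COPY_WORD = 3      # per-word cost of MCOPY / CALLDATACOPY / CODECOPY


def ceil32(x: int) -> int:
    """Round x up to the next multiple of 32 (one EVM word)."""
    return x if x % 32 == 0 else x + 32 - (x % 32)


def identity_gas_bound(num_bytes: int) -> int:
    # Same shape as _identity_gas_bound in the diff.
    return GAS_IDENTITY + GAS_IDENTITYWORD * (ceil32(num_bytes) // 32)


def copy_gas_bound(num_bytes: int) -> int:
    # Same shape as _mcopy_gas_bound, _calldatacopy_gas_bound and
    # _codecopy_gas_bound in the diff, which now share one constant.
    return GAS_COPY_WORD * ceil32(num_bytes) // 32
```

Under these assumed constants the two bounds differ only by the fixed identity base cost, and a 33-byte copy is charged as two full words in both.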
@@ -258,7 +261,6 @@ def copy_bytes(dst, src, length, length_bound):
assert src.is_pointer and dst.is_pointer

# fast code for common case where num bytes is small
# TODO expand this for more cases where num words is less than ~8
if length_bound <= 32:
copy_op = STORE(dst, LOAD(src))
ret = IRnode.from_list(copy_op, annotation=annotation)
@@ -268,8 +270,12 @@ def copy_bytes(dst, src, length, length_bound):
# special cases: batch copy to memory
# TODO: iloadbytes
if src.location == MEMORY:
copy_op = ["staticcall", "gas", 4, src, length, dst, length]
gas_bound = _identity_gas_bound(length_bound)
if version_check(begin="cancun"):
copy_op = ["mcopy", dst, src, length]
gas_bound = _mcopy_gas_bound(length_bound)
else:
copy_op = ["staticcall", "gas", 4, src, length, dst, length]
gas_bound = _identity_gas_bound(length_bound)
elif src.location == CALLDATA:
copy_op = ["calldatacopy", dst, src, length]
gas_bound = _calldatacopy_gas_bound(length_bound)
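The memory-to-memory branch above picks between two IR shapes. The sketch below isolates that choice using plain Python lists in place of `IRnode`. The helper name `select_memory_copy_op` and the boolean `mcopy_available` (standing in for `version_check(begin="cancun")`) are assumptions made for illustration.

```python
def select_memory_copy_op(dst, src, length, mcopy_available: bool) -> list:
    """Return the IR-like list used for a memory-to-memory batch copy.

    Hypothetical helper; the PR makes this choice inline in copy_bytes.
    """
    if mcopy_available:
        # Cancun and later: a single MCOPY instruction.
        return ["mcopy", dst, src, length]
    # Earlier EVM versions: staticcall to the identity precompile (address 4),
    # which copies its input buffer to its output buffer.
    return ["staticcall", "gas", 4, src, length, dst, length]
```

Note the argument order: `mcopy` takes the destination first, whereas the `staticcall` form passes the source (input) buffer before the destination (output) buffer.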
@@ -876,6 +882,38 @@ def make_setter(left, right):
return _complex_make_setter(left, right)


_opt_level = OptimizationLevel.GAS


@contextlib.contextmanager
def anchor_opt_level(new_level: OptimizationLevel) -> Generator:
"""
Set the global optimization level variable for the duration of this
context manager.
"""
assert isinstance(new_level, OptimizationLevel)

global _opt_level
try:
tmp = _opt_level
_opt_level = new_level
yield
finally:
_opt_level = tmp


def _opt_codesize():
return _opt_level == OptimizationLevel.CODESIZE


def _opt_gas():
return _opt_level == OptimizationLevel.GAS


def _opt_none():
return _opt_level == OptimizationLevel.NONE


def _complex_make_setter(left, right):
if right.value == "~empty" and left.location == MEMORY:
# optimized memzero
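The `anchor_opt_level` helper added in this hunk follows the usual pattern of a context manager that temporarily overrides a module-level setting. Below is a self-contained sketch of that pattern. The `OptimizationLevel` enum here is a stand-in with assumed member values (the real one lives in `vyper.compiler.settings`), and `current_level` is a hypothetical accessor added only so the behavior can be observed.

```python
import contextlib
import enum
from typing import Iterator


class OptimizationLevel(enum.Enum):
    # Stand-in enum; the numeric values are assumptions.
    NONE = 1
    GAS = 2
    CODESIZE = 3


_opt_level = OptimizationLevel.GAS


@contextlib.contextmanager
def anchor_opt_level(new_level: OptimizationLevel) -> Iterator[None]:
    """Set the module-level optimization level for the duration of the block."""
    assert isinstance(new_level, OptimizationLevel)

    global _opt_level
    tmp = _opt_level  # remember the previous level before overriding it
    _opt_level = new_level
    try:
        yield
    finally:
        # Restore even if the body raises.
        _opt_level = tmp


def current_level() -> OptimizationLevel:
    # Hypothetical accessor, not part of the PR.
    return _opt_level
```

A caller would wrap code generation in `with anchor_opt_level(OptimizationLevel.CODESIZE): ...`. Inside the block `current_level()` reports `CODESIZE`, and the previous level is back in place afterwards.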
@@ -891,11 +929,41 @@ def _complex_make_setter(left, right):
assert is_tuple_like(left.typ)
keys = left.typ.tuple_keys()

# if len(keyz) == 0:
# return IRnode.from_list(["pass"])
if left.is_pointer and right.is_pointer and right.encoding == Encoding.VYPER:
# both left and right are pointers, see if we want to batch copy
# instead of unrolling the loop.
assert left.encoding == Encoding.VYPER
len_ = left.typ.memory_bytes_required

has_storage = STORAGE in (left.location, right.location)
if has_storage:
if _opt_codesize():
# note a single sstore(dst (sload src)) is 8 bytes,
# sstore(add (dst ofst), (sload (add (src ofst)))) is 16 bytes,
# whereas loop overhead is 17 bytes.
should_batch_copy = len_ >= 32 * 3
elif _opt_gas():
# kind of arbitrary, but cut off when code used > ~160 bytes
should_batch_copy = len_ >= 32 * 10
else:
# don't care, just generate the most readable version
should_batch_copy = True
else:
# 10 words is the cutoff for memory copy where identity is cheaper
# than unrolled mloads/mstores
# if MCOPY is available, mcopy is *always* better (except in
# the 1 word case, but that is already handled by copy_bytes).
if right.location == MEMORY and _opt_gas():
should_batch_copy = len_ >= 32 * 10 or version_check(begin="cancun")
# calldata to memory, code to memory, or prioritize codesize -
# batch copy is always better.
else:
should_batch_copy = True

if should_batch_copy:
return copy_bytes(left, right, len_, len_)

# general case
# TODO use copy_bytes when the generated code is above a certain size
# general case, unroll
with left.cache_when_complex("_L") as (b1, left), right.cache_when_complex("_R") as (b2, right):
for k in keys:
l_i = get_element_ptr(left, k, array_bounds_check=False)
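The batch-copy heuristic in this hunk can be summarized as a pure function of the copy size, the address spaces involved, the optimization mode, and whether MCOPY is available. The sketch below restates it with plain arguments. The function signature, the string mode names, and the boolean flags are assumptions for illustration; the PR reads these from `left`/`right`, `_opt_level`, and `version_check`.

```python
def should_batch_copy(
    len_: int,
    has_storage: bool,
    src_is_memory: bool,
    opt: str,               # assumed labels: "gas", "codesize" or "none"
    mcopy_available: bool,
) -> bool:
    """Decide between one batch copy and an unrolled per-element copy."""
    if has_storage:
        if opt == "codesize":
            # The loop overhead pays for itself from 3 words upward.
            return len_ >= 32 * 3
        if opt == "gas":
            # Cut off unrolling once it would emit roughly 160 bytes of code.
            return len_ >= 32 * 10
        # No optimization: always batch copy.
        return True
    if src_is_memory and opt == "gas":
        # Before Cancun the identity precompile only wins from 10 words;
        # with MCOPY the batch copy is always chosen.
        return len_ >= 32 * 10 or mcopy_available
    # Calldata or code source, or a non-gas mode: always batch copy.
    return True
```

For example, a 64-byte memory-to-memory copy in gas mode is unrolled before Cancun but batch-copied once MCOPY is available.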