Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(forge): isolated execution #7186

Merged
merged 26 commits into from
Feb 28, 2024

Conversation

klkvr
Copy link
Member

@klkvr klkvr commented Feb 20, 2024

Motivation

aka authentic execution described in #6910

The idea is to execute all test/script-level calls as separate transactions initialized with empty journaled state and triggering all pre/post transaction actions and clean-ups, such as: including calldata, base 21k gas cost, clearing transient storage, correctly processing selfdestructs, etc

Solution

All calls of depth 1 are getting caught in InspectorStack and executed as following:

  1. Commit all changes from journal to db.
  2. Create a separate EVMImpl with TxEnv for the current call/create context.
  3. Transact
  4. Collect state changeset
  5. Merge changeset to the existing journal

When we are transacting the inner EVM, the call depth decreases from 1 to 0, and because of that, I've added logic to adjust journaled_state.depth which inspectors receive if we are in inner EVM context.

revm will also call InspectorStack hooks for the second time for the call that's being delegated (first time in main context with depth 1, second time for inner context with depth 0), but we want it to be processed only in the main context.

Open questions

I've been testing it on several codebases with a lot of various tests and it seems to work, however, there are some issues with that approach (and ci will probably show more):

  1. Performance. Earlier we could have ran some tests without ever requiring mutable access to db, because everything was kept in the journal. Right now, we have to commit changes -> clone and it becomes more expensive. I've observed 40-50% test execution time increase in some cases.
  2. Pranks. Currently, by using prank cheatcodes it's possible to craft test-level contract interaction where tx.origin != msg.sender. It's not possible when the call is a top-level transaction. Some of foundry and forge-std tests breake because of that
  3. Gas reports. Right now we collect gas usage data for function calls on all depths. With this impl, test-level calls will cost more because of 21k+other additional costs, thus we should probably only use gas cost of those calls.
  4. Commiting. Right now my impl always commits changes in-between calls and it seems to be working fine. But I am wondering if we have any logic relying on assumption that InspectorStack does not commit?

@klkvr klkvr marked this pull request as draft February 20, 2024 10:52
@Philogy
Copy link

Philogy commented Feb 21, 2024

This solution seems overly complicated, why clone and merge states? Wouldn't it be simpler to maintain two separate VMs: the test runner & the actual environment. Then you map calls from the test environment as transactions in the actual and vice versa map results/receipts as return values.

@klkvr
Copy link
Member Author

klkvr commented Feb 21, 2024

@Philogy we'd have to clone VMs if we want to commit between calls

I agree that approach with test EVM + N other EVMs might feel more intuitive, but I don't think that it's less complicated

for example, if we wouldn't merge states, we'd have to figure out workarounds for correct processing of BALANCE, DELEGATECALL, EXTCODESIZE, etc opcodes on the level of test env as changes to actual EVM won't appear in its journal

atm this PR is a draft I'm experimenting with to find out all implicit assumtions we and users have about foundry EVM behavior. once it's figured out ths impl will probably change a lot and might become closer to the approach you've mentioned

@roninjin10
Copy link
Contributor

Thanks for kicking off this work klkvr. Commenting so I am subscribed

@@ -11,13 +11,13 @@ contract Issue3653Test is DSTest {
Token token;

constructor() {
fork = vm.createSelectFork("rpcAlias", 10);
fork = vm.createSelectFork("rpcAlias", 1000000);
Copy link
Member Author

@klkvr klkvr Feb 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

block 10 had gas limit of 5000 which is not enough to deploy a contract in isolated mode

@klkvr
Copy link
Member Author

klkvr commented Feb 22, 2024

So right now CI passess except for 1 test from sablier v2 which I believe has incorrect assumptions about the foundry EVM. On latest nightly that call reverts with MemoryOOG which I believe is unrealted to gas limit, and with isolated execution call executes without halts and reaches revert in contract with custom error type, causing revert reason mismatch in expectRevert

The current impl has several workarounds to make its behavior closer to how non-isolated EVM works now. I believe that it was important to create a PoC showing that we can switch to isolated mode without breaking changes, however, not sure if all semantics of non-isolated mode should be kept:

  1. Nonces. When executing CALL as a separate transaction, it increases sender nonce, which does not happen when CALL is performed in scope of another transaction. This was breaking some external tests hardcoding addresses.
  2. tx.origin. Right now tx.origin is patched to match the tx.origin which we had before starting isolated call. This is needed for tx.origin pranks to work correctly and for some external tests with hardcoded tx.origin assertions (optimism has some)
  3. Gas limits. Not sure how we should treat tests using .transfer and .send or simply hardcoded .call{gas: XXX}() relying on default 21K cost not being present. Right now tx gas limit is being manually increased by 21K so that we have all our tests pass, but this does not feel as a decent fix
    data.env.tx.gas_limit = std::cmp::min(gas_limit + 21000, data.env.block.gas_limit.to());

In general, the final impl of inspector call chain looks like this:

-> root call/create hook is called (depth 0)
-> inspectors are called
    -> test-level call/create hook is called (depth 1)
        -> inspectors are called
        -> journaled state is commited to db
        -> tx env is adjusted and transaction is initiated
            -> call/create hook is called for test-level call again, this time in scope of isolated tx (depth 0 is adjusted to 1)
                -> inspectors are not called, because we already called them for depth 1 outside of isolation
                -> tx.origin and sender_nonce are patched to match values outside of isolation
                    ... deeper calls are processed as usual, with only difference being depth adjustment
                -> call_end/create_end hook is called for test-level call
                     -> inspectors are not called, we simply skip this inspector invocation as we want to process it outside of isolation
         -> After transaction is completed, we merge new isolated journaled state into previous state and return tx result (Halt/Success/Revert) from the inspector's `call/create` hook
     -> test-level call_end/create_end hook is called (depth 1)
         -> Here we notify inspectors about the end of the call, at this point we already have updated gas data received from call/create hook which executed tx in isolation
       
```

@klkvr
Copy link
Member Author

klkvr commented Feb 22, 2024

I will do more experiments with optimism codebase as there are some tests that are failing, and I am still not sure why.

After that, I plan to make this an opt-in, look into performance and gas metering

upd: seems like failing optimist tests are caused by hardcoded gas limits

@klkvr
Copy link
Member Author

klkvr commented Feb 22, 2024

Performance

snekmate:
latest nightly:
image
this pr:
image

solady:
latest nightly:
image
this pr:
image

optimism (with failing tests excluded):
latest nightly:
image
this pr:
image

That way, we are looking at 20-30% execution time increase

Gas reports

I've made isolated execution an opt-in, currently it is enabled only when running tests via --gas-report. Also, we are now only filtering out traces on depth 1 when constructing gas report, to avoid deeper calls executed in a non-isolated manner affecting reports.

@klkvr
Copy link
Member Author

klkvr commented Feb 23, 2024

I've added a couple tests for tstore/tload and selfdestructs which are only passing with isolation mode.

It's possible now to enable isolation by either using --isolate flag for forge test or forge script or via setting isolate = true parameter in config.

Isolation is enabled automatically when gas report is requested for tests via --gas-report

@klkvr klkvr changed the title [wip] feat(forge): isolated execution feat(forge): isolated execution Feb 23, 2024
Copy link
Member

@mattsse mattsse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only have a few pedantic nits,

imo this is still fairly readable

ptal @DaniPopes

crates/common/src/evm.rs Show resolved Hide resolved
crates/evm/evm/src/inspectors/stack.rs Show resolved Hide resolved
crates/evm/evm/src/inspectors/stack.rs Show resolved Hide resolved
@klkvr klkvr requested a review from mattsse February 26, 2024 15:03
Copy link
Member

@onbjerg onbjerg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

data.journaled_state.state().get_mut(&broadcast.new_origin).unwrap();

account.info.nonce += 1;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this change for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently we are increasing nonce before broadcasting CALL to simulate nonce increase during on-chain transaction

this is incorrect for--isolate because we need nonce to be up-to-date at the point when we are creating a transaction

so the change is to increase nonce after the CALL

however, thinking of it now, with new workaround when we explicitly decrease nonces in isolation, this is not really needed as long as we touch the account when pre-increase its nonce, updated in 33814e5

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah can we also doc that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call

Comment on lines +79 to +88
// Only include top-level calls which accout for calldata and base (21.000) cost.
// Only include Calls and Creates as only these calls are isolated in inspector.
if trace.depth != 1 &&
(trace.kind == CallKind::Call ||
trace.kind == CallKind::Create ||
trace.kind == CallKind::Create2)
{
return;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great

Comment on lines +442 to +450
fn transact_inner<DB: DatabaseExt + DatabaseCommit>(
&mut self,
data: &mut EVMData<'_, DB>,
transact_to: TransactTo,
caller: Address,
input: Bytes,
gas_limit: u64,
value: U256,
) -> (InstructionResult, Option<Address>, Gas, Bytes) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would love if we unit tested this with a simple regression test, but can do in follow up

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't really run any tests for isolation rn besides two I've added for selfdestruct/tstore, not sure how we can address this without running all tests for both modes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a flag to run these on CI perhaps every 24hrs on --isolate? Something like that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that could work, we'd have to filter some tests out for such runs because we have some fixtures with hardcoded gas usage which doesn't match for --isolate and also that one failing sablier-v2 test

@gakonst gakonst merged commit 551bcb5 into foundry-rs:master Feb 28, 2024
19 checks passed
@frontier159
Copy link
Contributor

@klkvr When running tests with --gas-report, I think I've hit a 30mm gas limit in setUp(), which I wasn't before:

    ├─ [30000000] → new <unknown>@0x5615dEB798BB3E4dFa0139dFa1b3D433Cc23b72f
    │   └─ ← 0 bytes of code

There are a bunch of contracts being forked and setup

Adding forge test --gas-report --block-gas-limit 10000000000000000000 didn't help.

Any ideas?

@klkvr
Copy link
Member Author

klkvr commented Mar 1, 2024

So this can be reproduced with the following test:

import "forge-std/Test.sol";

contract C is Test {}

contract GasWaster {
    function waste() public {
        for (uint256 i = 0; i < 100; i++) {
            new C();
        }
    }
}

contract GasLimitTest is Test {
    function test() public {
        vm.createSelectFork("mainnet");
        
        GasWaster waster = new GasWaster();
        waster.waste();
    }
}

What happening here is that we are setting block gas limit to the real gas limit of the forked chain:

gas_limit: block.header.gas_limit,

However, without isolation we are never really validating gas usage because such checks are performed in revm::EVMImpl::preverify_transaction_innner.

With isolation, we are capping transaction gas limit at block gas limit value to not hit CallerGasLimitMoreThanBlock:

data.env.tx.gas_limit = std::cmp::min(gas_limit + 21000, data.env.block.gas_limit.to());

IMO, revert here is a correct behavior, that one failing sablier v2 test tries to test exactly this (tx exceeding block gas limit)

However, users might need to disable those checks because it's pretty easy to go beyond 30M with various utility testing contracts.

I can see 3 approaches for this:

  1. Expose CfgEnv::disable_block_gas_limit flag to allow disabling block gas limit checks.
  2. Set CfgEnv::disable_block_gas_limit to true by default to fully avoid any reverts due to block gas limit violations. Allow enabling those checks for tests which rely on that via cheatcode/config.
  3. Always override block gas limit with --block-gas-limit value, if provided.

All approaches are pretty similar, it will be easier to decide once we figure out if we want this to become a breaking change or keep backwards compatibility

@mattsse wdyt?

@frontier159
Copy link
Contributor

Thanks for the simple reproduction. imo this is already a breaking change in master since isolated-execution is on by default with --gas-report (this broke some other tests in my repo where I wanted to explicitly check gaslimit() was under a threshold)

@Evalir
Copy link
Member

Evalir commented Mar 1, 2024

fwiw @frontier159 foundry is not stable and breaking changes can happen—albeit we try hard not to rip the bandaid if necessary. But we do understand the frustration 😄

I think adding a flag to disable the block gas limit here might be the move @klkvr. I'm also for the opposite (disable it by default, enable with flag) if we want to "remove" the breaking change.

@frontier159
Copy link
Contributor

Allg - I enjoy living my namesake and on the Frontier. Appreciate the work you kings are doing here

zeroXbrock pushed a commit to flashbots/suavex-foundry that referenced this pull request Apr 2, 2024
* [wip] feat(forge): isolated execution

* small fixes

* don't panic on transaction error + fixture fix

* stricter call scheme check

* refactor and more fixes

* wip

* fix

* wip

* wip

* rm cheatcodes check

* clippy

* update commit logic

* opt-in

* enable in gas reports

* --isolate

* isolation tests

* smaller diff

* fmt

* simplify logic

* docs

* fmt

* enable isolation properly for --gas-report

* change nonce incrementing

* document why we touch
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants