
feat: forge coverage #1576

Merged (36 commits, Jun 22, 2022)

Conversation


@onbjerg onbjerg commented May 10, 2022

Adds test coverage support to Forge.

Usage

This is not currently easy to run locally.

This PR depends on gakonst/ethers-rs#1201, so to run it locally you must clone ethers-rs from that branch
and update the patch section in Cargo.toml to point to your patched ethers checkout. Then run foundryup -p ..

Simply run forge coverage to get a coverage summary.

An LCOV reporter is also included: forge coverage --report lcov. Currently the LCOV file is written to ./lcov.info, but this location will change.

Background

I've had to rewrite this feature many times: I would write what I thought was a good structure, only to find out later that something would not work or fit into it, so I scrapped most of it and tried again.

Initially, the thought was to analyse JUMPIs and JUMPDESTs to find branches (i.e. require, assert, if etc.). However, that would not get us all the information we want: functions, branches, executable lines and statements.

The primary mode of operation for the coverage tool will be to collect the following pieces of information (a rough sketch of how these items could be modeled follows the list):

  • Functions (done)
  • Executable lines: lines, excluding lines that do nothing (whitespace, comments, brackets on single lines etc.)
  • Branches: if statements, calls to require and assert (WIP)
  • Statements: executable statements, which differ from lines in that you can have multiple statements per executable line (sort of done)
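
As a rough illustration only (these are not the PR's actual types; the names below are made up), the four item kinds above could be modeled in Rust along these lines:

/// Byte range of an AST node in the original source file.
#[derive(Debug, Clone, Copy)]
pub struct SourceRange {
    pub offset: usize,
    pub length: usize,
}

/// One thing we want to report coverage for.
#[derive(Debug, Clone)]
pub enum CoverageItem {
    /// A function definition.
    Function { name: String, loc: SourceRange, hits: u64 },
    /// A source line that contains executable code.
    Line { line_number: usize, hits: u64 },
    /// One path of a branch (if/require/assert), identified by a branch ID
    /// and whether it is the true or the false path.
    Branch { branch_id: usize, is_true_path: bool, loc: SourceRange, hits: u64 },
    /// A single executable statement; a line may contain several of these.
    Statement { loc: SourceRange, hits: u64 },
}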

This information will be collected from the AST of the Solidity files. Then the tests are run with a special coverage collector that marks opcodes as hit as they are executed, which we later use (along with source maps) to map back to the coverage items.
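
As a minimal sketch of the collection side only (the PR's actual collector is an EVM inspector with different types; HitMap and its methods here are made-up names), the core idea is a map from program counter to hit count that is filled while the tests execute and consulted afterwards when mapping back through the source maps:

use std::collections::HashMap;

/// Records how many times each program counter (instruction offset in the
/// deployed bytecode) was executed while the tests ran.
#[derive(Default)]
pub struct HitMap {
    hits: HashMap<usize, u64>,
}

impl HitMap {
    /// Called from the EVM step hook for every executed instruction.
    pub fn record(&mut self, pc: usize) {
        *self.hits.entry(pc).or_insert(0) += 1;
    }

    /// How often the instruction at `pc` was executed (0 if never).
    pub fn hits(&self, pc: usize) -> u64 {
        self.hits.get(&pc).copied().unwrap_or(0)
    }
}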

Currently, I am working off of a combination of solidity-coverage and this Brownie blog post to figure out what to include in the reports and relevant data structures.

Report types

The above information will be output either to stdout as a simple report, or as an LCOV file that you can upload to services like Codecov. A stretch goal would be to add an HTML report as well, but since tools already exist for converting LCOV to HTML, it isn't a deal breaker if it is not in the first version.
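
For reference, this is roughly what emitting one LCOV section per source file could look like. The record types (SF/FN/FNDA/DA/LF/LH/end_of_record) are standard LCOV, but the function below is only a sketch, not the reporter implemented in this PR, and branch records (BRDA) are omitted:

use std::fmt::Write;

/// Emit a minimal LCOV section for one source file.
/// `functions` is (name, line, hit_count); `lines` is (line_number, hit_count).
pub fn write_lcov_section(
    path: &str,
    functions: &[(&str, usize, u64)],
    lines: &[(usize, u64)],
) -> String {
    let mut out = String::new();
    writeln!(out, "TN:").unwrap();
    writeln!(out, "SF:{path}").unwrap();
    for (name, line, hits) in functions {
        writeln!(out, "FN:{line},{name}").unwrap();
        writeln!(out, "FNDA:{hits},{name}").unwrap();
    }
    for (line, hits) in lines {
        writeln!(out, "DA:{line},{hits}").unwrap();
    }
    let covered = lines.iter().filter(|(_, hits)| *hits > 0).count();
    writeln!(out, "LF:{}", lines.len()).unwrap();
    writeln!(out, "LH:{covered}").unwrap();
    writeln!(out, "end_of_record").unwrap();
    out
}

A report is then just the concatenation of one such section per covered source file.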

Compared to other coverage tools

  • DappTools' coverage tool is limited (see comments above): it does not support outputting to standard reporting formats like LCOV, and it does not treat require and assert as branches.
  • Hardhat Coverage is closer to what we want, but it's slow and it distorts gas usage, since it instruments the contract code.

Current status

Good

  • All functions are detected
  • Most statements are detected; however, there are some TODOs in the source code with questions about how deep into the AST we want to go, specifically around subexpressions.
  • I am collecting hit data for individual instructions

Missing

  • Executable lines are not detected: I left this for last because (fingers crossed) this should be the easiest item type to detect.
  • Not all branches are detected: I am detecting if statements, but I wonder whether it works as intended (e.g. should chained else-ifs share the same branch ID?). I also do not currently detect assert- and require-type branches.
  • HTML reporter: there is no HTML reporter, and I consider it a stretch goal.
  • Hit data is NOT hooked up yet, so every item shows up as uncovered. My current thinking is to get detection right first and then hook up the hit data, although it could be done in the other order.

Needs change

  • LCOV reports do not have a fixed location yet: I need to figure out where to put report data in general. I've thought about out/coverage (e.g. out/coverage/lcov.info, out/coverage/html/index.html), but I'm not sure yet.
  • Dependencies and test contracts are not filtered out: we don't want to generate coverage for dependencies or for the test contracts themselves.
  • The summary reporter is verbose: currently the full file paths are displayed, but it should work more like what @mds1 had in mind in dapptools-style coverage #99.

Needs clarification

  • Subexpression clarifications: I'm not sure how deep to go for statement coverage in terms of subexpressions. See the relevant TODOs in the source code. I am also not sure whether try/catch counts as a branch, or whether Conditional nodes (e.g. ternary expressions) count as branches.
  • Probably only works for Solidity >= 0.8.0, but I'm not sure yet. There have been some breaking changes in how the AST is structured (specifically, the nodeType field used to be called name, and function names etc. used to live under node.attributes.name).
  • LCOV block IDs: LCOV branches (specifically the BRDA item) have a branch ID and a block ID. I'm not sure what the block ID is supposed to mean.

Task list

Branches

  • Detect assert as a branch
  • Detect require as a branch
  • Figure out if try/catch is a branch
  • Figure out if Conditional nodes are branches

Lines

  • Detect executable lines

Statements

  • Figure out what subexpressions to include as statement items

Reporters

  • Add an HTML reporter like solidity-coverage
  • Bring summary reporter more in line with @mds1's idea in dapptools-style coverage #99
  • Figure out if the LCOV reporter is broken or not
    I've tried running the LCOV report through genhtml, but it complained that the file was malformed. I'm not sure if this is to be expected, or if it is just complaining because we don't map the relevant items to source lines yet.

Misc

  • Hook up hit data
  • Figure out what Solidity versions are supported and what versions we should support
  • Filter out dependency contracts
  • Filter out test contracts
  • Try to break it and fix or handle those breakages

Other

Companion PR: gakonst/ethers-rs#1201

Closes #99

@onbjerg onbjerg added the T-feature Type: feature label May 10, 2022
@onbjerg onbjerg mentioned this pull request May 10, 2022
!artifact
    .get_abi()
    .map(|abi| abi.functions().any(|f| f.name.starts_with("test")))
    .unwrap_or_default()

I feel like we should add an extension trait for this
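
A sketch of what such an extension trait could look like; the trait name is hypothetical, and implementing it directly on ethers' Abi type (rather than on the artifact) is an assumption made here for illustration:

use ethers::abi::Abi;

/// Hypothetical extension trait wrapping the "does this ABI contain test
/// functions?" check shown above.
pub trait TestFilterExt {
    /// Returns true if any function name starts with "test".
    fn has_test_functions(&self) -> bool;
}

impl TestFilterExt for Abi {
    fn has_test_functions(&self) -> bool {
        self.functions().any(|f| f.name.starts_with("test"))
    }
}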

Comment on lines +538 to +399
// TODO: Maybe support coverage for fuzz tests
coverage: None,
@gakonst gakonst May 11, 2022

Ref discussion here: #99 (comment), plus context from @mds1.

I think "default no fuzz, flag to include it" is my preferred approach, because contrary to my comment there, I often find myself writing concrete tests to cover edge cases I'm worried the fuzzer might not find, or for gas profiling purposes.

@brockelmore

Random thought, likely shouldn't be in the current PR, but:

It would be cool to get info on storage slot writes/checks, i.e. if funcA writes to slots 1 and 2, any test that calls funcA should check slots 1 and 2. This could help users identify potentially missed state transitions in their tests.

@brockelmore

The optimizer is always off, which can cause issues when using Solidity <0.8.13: you can hit "stack too deep" with the optimizer off, whereas forge test with the optimizer on will compile and run fine (no stack too deep). I'm not sure if turning the optimizer off is required or not, but if we can avoid ignoring the optimizer settings, it will save users some pain.


gakonst commented May 30, 2022

I think if the optimizer is on we get discrepancies between the bytecode and the source map?

@brockelmore

Yes, we do (kind of; the bytecode/source map is accurate for the most part, but the bytecode/source map/AST conversions can get mangled, IIRC). I think the best solution is to try compiling with the optimizer off and, if we get a stack-too-deep error and the user has specified to use the optimizer, compile with the optimizer on and report a warning that we had to turn it on, so coverage may not be accurate. Otherwise coverage is impossible to run (and a slightly incorrect coverage report is better than no coverage report, imo).


gakonst commented May 31, 2022

I do wonder why solc does not emit an "optimized" source map when the bytecode is also optimized.

@brockelmore

I do wonder why solc does not emit an "optimized" source map when the bytecode is also optimized.

To be clear, it tries its best to do so, so it technically does. I think the optimizer just mangles the bytecode to the point that much of it doesn't actually map to anything in the source. The primary issue, I believe, is that a lot of the optimizations are "peephole" optimizations performed on the bytecode itself, which means they don't track back to the AST, so the AST is only partially optimized relative to the bytecode.

@onbjerg onbjerg force-pushed the onbjerg/coverage branch 2 times, most recently from 0b091f3 to b3d2aaf on June 3, 2022 04:18

gakonst commented Jun 3, 2022

👀 it's happening


onbjerg commented Jun 5, 2022

Hey guys, an update.

First of all, to everyone following the issue, sorry for the radio silence and lack of activity. I attempted something more complex (in order to address an issue I think might come up), which I've put on hold for now, and I've also had some family-related issues that impacted my mental health.

On the latest commit before this comment I see that a lot of things in lil-web3 are covered:

[screenshot: coverage summary for lil-web3]

I still need to:

  • Detect executable lines
  • Collect branch hit data
  • Test the LCOV reporter

That may seem like a short to-do list, but the current thinking is that I get this into a usable experimental state and create separate issues for the remaining items, such as the HTML reporter, questions around some of the statement nodes, and figuring out what we support, what we should support, and what we can support.

I've noticed some things that I need to address before this can be merged:

  • Libraries are currently not covered. I think this is because source maps are missing for them for some reason, so I'm trying to figure out why that is.
  • receive() functions are not marked as covered. I think this is because of some optimization Solidity does that I don't currently handle correctly.
  • The dependency and test contract checks are not very robust, and I probably need to filter test contracts at the AST level instead of at the file level: projects like Solmate usually create concrete implementations of their abstract contracts in their test files in order to test functionality, and this is not currently handled.


onbjerg commented Jun 9, 2022

Update again: I'm currently working on hooking up branch hit data and concurrently checking out the second issue I wrote about (the receive() case). I think the receive() function is optimized away in some situations, since an empty one essentially just disables the value-transfer check, and in those cases there is no source mapping. It might be fine to leave that case out for now, but I want to make sure it works as expected for receive() functions that have a function body.


gakonst commented Jun 14, 2022

I think we need a rebase per the recent refactor by @mattsse. Should we make 0/0s display as not-green so that it's not confusing?

That may seem like a short to-do list, but the current thinking is that I get this into a usable experimental state and create separate issues for the remaining items, such as the HTML reporter, questions around some of the statement nodes, and figuring out what we support, what we should support, and what we can support.

I'm supportive of this, especially the HTML reporter and the edge cases around various nodes. E.g. in the receive() case, if you think it's a lot of work, let's track it separately and document it as a known limitation.


onbjerg commented Jun 14, 2022

[screenshot: updated coverage summary]

Getting there. Some items are considered uncovered even though they are covered; this is related to how I pick anchors (see 3bb07ce and 5742f53), since I am only considering the runtime source map.

Additionally, branch coverage isn't really accurate in this current iteration, since:

  • It considers if statements with no else block to be fully covered even if the "false" branch is never hit (we need to create a "virtual" one)
  • It doesn't consider require and assert... These are more complicated


gakonst commented Jun 14, 2022

Every time I see a commit with progress on this PR I jump out of my seat lol

Do you have a 1-paragraph explanation of "anchors" and why they help here?


onbjerg commented Jun 14, 2022

@gakonst Each AST node has an attached source range: a start byte and (optionally) a length. Previously I would take each opcode we executed, look it up in the source map and see if the offset described in the source map matched the start byte of one of our coverage items.

This doesn't always work: sometimes the opcode that should mark an item as covered is somewhere else in the range, and sometimes one opcode actually marks multiple items as covered because of the optimizer. Anchors are essentially opcodes that we pick ahead of time, and whose execution marks an item as covered.

Another benefit of the anchor approach is that we can change which anchor we pick for more complicated items such as branches. For example, we might want to pick a specific type of opcode within the source range (like a JUMPI or something).

Hope what I wrote made sense; it's a little hard to explain lol
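
To make the anchor idea concrete, here is a minimal sketch with simplified, made-up types for source-map elements and item ranges (the real types in the PR differ); it just picks the first instruction whose mapped source range falls inside the item's range:

/// One source-map entry: the instruction at `pc` maps to the source bytes
/// [offset, offset + length).
struct SourceElement {
    pc: usize,
    offset: usize,
    length: usize,
}

/// The source range of a coverage item (an AST node).
struct ItemRange {
    offset: usize,
    length: usize,
}

/// Pick an anchor: the first instruction whose mapped source range lies
/// entirely within the item's range. Executing that instruction once is
/// enough to mark the item as covered.
fn find_anchor(source_map: &[SourceElement], item: &ItemRange) -> Option<usize> {
    source_map
        .iter()
        .find(|el| el.offset >= item.offset && el.offset + el.length <= item.offset + item.length)
        .map(|el| el.pc)
}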


gakonst commented Jun 14, 2022

This makes sense (also copying @jpopesculian and @rkrasiuk, who seem to understand the AST stuff better than I do from the work on forge fmt, in case they have thoughts/comments).

Separately:

It considers if statements with no else block to be fully covered even if the "false" branch is never hit (we need to create a "virtual" one)

This makes sense.

It doesn't consider require and assert... These are more complicated

What makes them more complicated? Doesn't each require/assert(bool) translate to an if (bool) statement and act like a branch?


onbjerg commented Jun 14, 2022

What makes them more complicated? Doesn't each require/assert(bool) translate to an if (bool) statement and act like a branch?

Ideally, for branches we want to have three states: not executed, one path executed (either true or false), or both paths executed (i.e. covered).

Currently, for if statements with no else block there are only two states: not executed, or true branch executed, which is not ideal. I think we probably want to signal to the developer if they only covered one possible path in their tests.
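
A sketch only (not the PR's data structures) of the three-state model described above:

/// Hit counts for the two paths of a branch (an if statement, or a
/// require/assert check).
#[derive(Default)]
pub struct BranchHits {
    pub true_hits: u64,
    pub false_hits: u64,
}

pub enum BranchCoverage {
    /// Neither path was ever executed.
    NotExecuted,
    /// Exactly one of the two paths was executed.
    PartiallyCovered,
    /// Both the true and the false path were executed.
    Covered,
}

impl BranchHits {
    pub fn coverage(&self) -> BranchCoverage {
        match (self.true_hits > 0, self.false_hits > 0) {
            (false, false) => BranchCoverage::NotExecuted,
            (true, true) => BranchCoverage::Covered,
            _ => BranchCoverage::PartiallyCovered,
        }
    }
}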

For require and assert, though, all of this is handled by specific opcodes, and I need to figure out which ones. The general layout seems to be:

<some assertion>
PUSH <pc at which we continue execution if assertion holds>
JUMPI
<setup revert and revert>

So my assumption is that the false branch would be right after JUMPI. For the true branch, I'm not sure - maybe I need to record what value is on the stack every time we encounter a JUMPI so I can look it up later and check if we both got a 0 and a non-0 value? :/

Edit: I checked it out on Godbolt, by the way; here is my contract:

// SPDX-License-Identifier: UNLICENSED
pragma solidity >=0.4.0;

contract Test {
    function test(uint256 a) public pure returns (uint256) {
        require(a > 1, "abc");
    }
}

And here is the bytecode for require:

  PUSH 1
  DUP3 
  GT 
  PUSH [tag] test_uint256_6
  JUMPI 
  PUSH 40 # if we end up here, we will end up reverting
# load revert string
  MLOAD 
  PUSH 8C379A000000000000000000000000000000000000000000000000000000000
  DUP2 
  MSTORE 
  PUSH 4
  ADD 
# set return addr
  PUSH [tag] test_uint256_5
  SWAP1
# some abi encoding whatever
  PUSH [tag] abi_encode_tuple_t_stringliteral_4e03657aea45a94fc7d47ba826c8d667c0d1e6e33a64a036ec44f58fa12d6c45__to_t_string_memory_ptr__fromStack_reversed_0
  JUMP [in]
tag test_uint256_5
# after abi encoding we end up here
  JUMPDEST 
  PUSH 40
  MLOAD 
  DUP1 
  SWAP2 
  SUB 
  SWAP1 
  REVERT 
tag test_uint256_6
  JUMPDEST # if we end up here we didn't revert

Previously coverage was based on the start of AST nodes
in the source code. With the anchoring system, opcodes in
relevant source maps are searched, and the first one
within the source range of the AST node is selected
as an anchor.

If the anchor is executed once, then the coverage item
is considered to be covered. This works very well, but
false positives and edge cases still need to be found.
Consider the full range instead of just the offset.

Unfortunately breaks detection of some items; suspected to
be related to not using constructor source maps.
@onbjerg onbjerg requested a review from mattsse June 22, 2022 15:04

@mattsse mattsse left a comment


smol nit

lgtm!

how can we test this properly?


fn visit_contract(&mut self, node: Node) -> eyre::Result<()> {
    let is_contract =
        node.attribute("contractKind").map_or(false, |kind: String| kind == "contract");

I see, sounds reasonable. We could, however, think of adding an enum (Known)Attribute that is AsRef<str>, so we can do node.attribute(Attribute::Abstract).

But there's no real benefit to that, I'd assume.
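
For illustration, a sketch of what that (Known)Attribute enum could look like; the variant set here is an assumption, and it relies on Node::attribute accepting anything that is AsRef<str>:

/// Hypothetical enum of well-known AST attribute keys, so call sites don't
/// have to pass raw strings.
pub enum Attribute {
    ContractKind,
    Abstract,
    Name,
}

impl AsRef<str> for Attribute {
    fn as_ref(&self) -> &str {
        match self {
            Attribute::ContractKind => "contractKind",
            Attribute::Abstract => "abstract",
            Attribute::Name => "name",
        }
    }
}

Call sites could then write node.attribute(Attribute::ContractKind) instead of node.attribute("contractKind").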


onbjerg commented Jun 22, 2022

smol nit

lgtm!

how can we test this properly?

Not sure how best to do automated testing for it yet; my best guess is some sort of snapshot testing against external repos: we run forge coverage on external repos, snapshot the output, and if it differs we manually check that it still looks OK. For some parts of the code, though, we can probably be a bit smarter (e.g. the format of the reporters, parts of the visitor).

There's still some way to go before this is dependable (see above), but those changes will come in other PRs.


mattsse commented Jun 22, 2022

SGTM
