
feat: forge coverage #1576

Merged (36 commits, Jun 22, 2022)

Conversation


@onbjerg onbjerg commented May 10, 2022

Adds test coverage support to Forge.

Usage

This is not currently easy to run locally.

This PR depends on gakonst/ethers-rs#1201, so to run it locally you must clone ethers-rs from that branch
and update the patch section in Cargo.toml to point to your patched ethers checkout. Then run foundryup -p ..

Simply run forge coverage to get a coverage summary.

An LCOV reporter is also included: forge coverage --report lcov. Currently the LCOV file is written to ./lcov.info, but this location will change.

Background

I've had to rewrite this feature many times: I would write what I thought was a good structure, only to find out later that something would not work or fit into it, so I scrapped most of it and tried again.

Initially, the thought was to analyse JUMPIs and JUMPDESTs to find branches (i.e. require, assert, if etc.). However, that would not get us all the information we want: functions, branches, executable lines and statements.

The primary mode of operation for the coverage tool will be to collect the following pieces of information (a rough sketch of how these items could be modeled follows the list):

  • Functions (done)
  • Executable lines: lines, excluding lines that do nothing (whitespace, comments, brackets on single lines etc.)
  • Branches: if statements, calls to require and assert (WIP)
  • Statements: executable statements, which differ from lines in that you can have multiple statements per executable line (sort of done)
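
As a rough illustration only (these are not the PR's actual types; the names below are made up), the four item kinds above could be modeled in Rust along these lines:

/// Byte range of an AST node in the original source file.
#[derive(Debug, Clone, Copy)]
pub struct SourceRange {
    pub offset: usize,
    pub length: usize,
}

/// One thing we want to report coverage for.
#[derive(Debug, Clone)]
pub enum CoverageItem {
    /// A function definition.
    Function { name: String, loc: SourceRange, hits: u64 },
    /// A source line that contains executable code.
    Line { line_number: usize, hits: u64 },
    /// One path of a branch (if/require/assert), identified by a branch ID
    /// and whether it is the true or the false path.
    Branch { branch_id: usize, is_true_path: bool, loc: SourceRange, hits: u64 },
    /// A single executable statement; a line may contain several of these.
    Statement { loc: SourceRange, hits: u64 },
}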

This information will be collected from the AST of the Solidity files. Then the tests are run with a special coverage collector that marks opcodes as hit as they are executed, which we later use (along with source maps) to map back to the coverage items.
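
As a minimal sketch of the collection side only (the PR's actual collector is an EVM inspector with different types; HitMap and its methods here are made-up names), the core idea is a map from program counter to hit count that is filled while the tests execute and consulted afterwards when mapping back through the source maps:

use std::collections::HashMap;

/// Records how many times each program counter (instruction offset in the
/// deployed bytecode) was executed while the tests ran.
#[derive(Default)]
pub struct HitMap {
    hits: HashMap<usize, u64>,
}

impl HitMap {
    /// Called from the EVM step hook for every executed instruction.
    pub fn record(&mut self, pc: usize) {
        *self.hits.entry(pc).or_insert(0) += 1;
    }

    /// How often the instruction at `pc` was executed (0 if never).
    pub fn hits(&self, pc: usize) -> u64 {
        self.hits.get(&pc).copied().unwrap_or(0)
    }
}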

Currently, I am working off of a combination of solidity-coverage and this Brownie blog post to figure out what to include in the reports and relevant data structures.

Report types

The above information will be output either to stdout as a simple report, or as an LCOV file that you can upload to services like Codecov. A stretch goal would be to add an HTML report as well, but since tools already exist for converting LCOV to HTML, it isn't a deal breaker if it is not in the first version.
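
For reference, this is roughly what emitting one LCOV section per source file could look like. The record types (SF/FN/FNDA/DA/LF/LH/end_of_record) are standard LCOV, but the function below is only a sketch, not the reporter implemented in this PR, and branch records (BRDA) are omitted:

use std::fmt::Write;

/// Emit a minimal LCOV section for one source file.
/// `functions` is (name, line, hit_count); `lines` is (line_number, hit_count).
pub fn write_lcov_section(
    path: &str,
    functions: &[(&str, usize, u64)],
    lines: &[(usize, u64)],
) -> String {
    let mut out = String::new();
    writeln!(out, "TN:").unwrap();
    writeln!(out, "SF:{path}").unwrap();
    for (name, line, hits) in functions {
        writeln!(out, "FN:{line},{name}").unwrap();
        writeln!(out, "FNDA:{hits},{name}").unwrap();
    }
    for (line, hits) in lines {
        writeln!(out, "DA:{line},{hits}").unwrap();
    }
    let covered = lines.iter().filter(|(_, hits)| *hits > 0).count();
    writeln!(out, "LF:{}", lines.len()).unwrap();
    writeln!(out, "LH:{covered}").unwrap();
    writeln!(out, "end_of_record").unwrap();
    out
}

A report is then just the concatenation of one such section per covered source file.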

Compared to other coverage tools

  • DappTools' coverage tool is limited (see comments above): it does not support outputting to standard reporting formats like LCOV, and it does not treat require and assert as branches.
  • Hardhat Coverage is closer to what we want, but it's slow and it distorts gas usage, since it instruments the contract code.

Current status

Good

  • All functions are detected
  • Most statements are detected; however, there are some TODOs in the source code with questions about how deep into the AST we want to go, specifically around subexpressions.
  • I am collecting hit data for individual instructions

Missing

  • Executable lines are not detected: I left this for last because (fingers crossed) this should be the easiest item type to detect.
  • Not all branches are detected: I am detecting if statements, but I wonder whether it works as intended (e.g. should chained else-ifs share the same branch ID?). I also do not currently detect assert- and require-type branches.
  • HTML reporter: there is no HTML reporter, and I consider it a stretch goal.
  • Hit data is NOT hooked up yet, so every item shows up as uncovered. My current thinking is to get detection right first and then hook up the hit data, although it could be done in the other order.

Needs change

  • LCOV reports do not have a fixed location yet: I need to figure out where to put report data in general. I've thought about out/coverage (e.g. out/coverage/lcov.info, out/coverage/html/index.html), but I'm not sure yet.
  • Dependencies and test contracts are not filtered out: we don't want to generate coverage for dependencies or for the test contracts themselves.
  • The summary reporter is verbose: currently the full file paths are displayed, but it should work more like what @mds1 had in mind in dapptools-style coverage #99.

Needs clarification

  • Subexpression clarifications: I'm not sure how deep to go for statement coverage in terms of subexpressions. See the relevant TODOs in the source code. I am also not sure whether try/catch counts as a branch, or whether Conditional nodes (e.g. ternary expressions) count as branches.
  • Probably only works for Solidity >= 0.8.0, but I'm not sure yet. There have been some breaking changes in how the AST is structured (specifically, the nodeType field used to be called name, and function names etc. used to live under node.attributes.name).
  • LCOV block IDs: LCOV branches (specifically the BRDA item) have a branch ID and a block ID. I'm not sure what the block ID is supposed to mean.

Task list

Branches

  • Detect assert as a branch
  • Detect require as a branch
  • Figure out if try/catch is a branch
  • Figure out if Conditional nodes are branches

Lines

  • Detect executable lines

Statements

  • Figure out what subexpressions to include as statement items

Reporters

  • Add an HTML reporter like solidity-coverage
  • Bring summary reporter more in line with @mds1's idea in dapptools-style coverage #99
  • Figure out if the LCOV reporter is broken or not
    I've tried running the LCOV report through genhtml, but it complained that the file was malformed. I'm not sure if this is to be expected, or if it is just complaining because we don't map the relevant items to source lines yet.

Misc

  • Hook up hit data
  • Figure out what Solidity versions are supported and what versions we should support
  • Filter out dependency contracts
  • Filter out test contracts
  • Try to break it and fix or handle those breakages

Other

Companion PR: gakonst/ethers-rs#1201

Closes #99

@onbjerg onbjerg added the T-feature Type: feature label May 10, 2022
@onbjerg onbjerg mentioned this pull request May 10, 2022
!artifact
    .get_abi()
    .map(|abi| abi.functions().any(|f| f.name.starts_with("test")))
    .unwrap_or_default()

I feel like we should add an extension trait for this
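
A sketch of what such an extension trait could look like; the trait name is hypothetical, and implementing it directly on ethers' Abi type (rather than on the artifact) is an assumption made here for illustration:

use ethers::abi::Abi;

/// Hypothetical extension trait wrapping the "does this ABI contain test
/// functions?" check shown above.
pub trait TestFilterExt {
    /// Returns true if any function name starts with "test".
    fn has_test_functions(&self) -> bool;
}

impl TestFilterExt for Abi {
    fn has_test_functions(&self) -> bool {
        self.functions().any(|f| f.name.starts_with("test"))
    }
}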

Comment on lines +538 to +399
// TODO: Maybe support coverage for fuzz tests
coverage: None,
@gakonst gakonst May 11, 2022

Ref discussion here: #99 (comment), plus context from @mds1.

I think "default no fuzz, flag to include it" is my preferred approach, because contrary to my comment there, I often find myself writing concrete tests to cover edge cases I'm worried the fuzzer might not find, or for gas profiling purposes.

@brockelmore

Random thought, likely shouldn't be in the current PR, but:

It would be cool to get info on storage slot writes/checks, i.e. if funcA writes to slots 1 and 2, any test that calls funcA should check slots 1 and 2. This could help users identify potentially missed state transitions in their tests.

@brockelmore

The optimizer is always off, which can cause issues when using Solidity <0.8.13: you can hit "stack too deep" with the optimizer off, whereas forge test with the optimizer on will compile and run fine (no stack too deep). I'm not sure if turning the optimizer off is required or not, but if we can avoid ignoring the optimizer settings, it will save users some pain.


gakonst commented May 30, 2022

I think if the optimizer is on we get discrepancies between the bytecode and the source map?

@brockelmore

Yes, we do (kind of; the bytecode/source map is accurate for the most part, but the bytecode/source map/AST conversions can get mangled, IIRC). I think the best solution is to try compiling with the optimizer off and, if we get a stack-too-deep error and the user has specified to use the optimizer, compile with the optimizer on and report a warning that we had to turn it on, so coverage may not be accurate. Otherwise coverage is impossible to run (and a slightly incorrect coverage report is better than no coverage report, imo).


gakonst commented May 31, 2022

I do wonder why solc does not emit an "optimized" source map when the bytecode is also optimized.

@brockelmore

I do wonder why solc does not emit an "optimized" source map when the bytecode is also optimized.

To be clear, it tries its best to do so, so it technically does. I think the optimizer just mangles the bytecode to the point that much of it doesn't actually map to anything in the source. The primary issue, I believe, is that a lot of the optimizations are "peephole" optimizations performed on the bytecode itself, which means they don't track back to the AST, so the AST is only partially optimized relative to the bytecode.

@onbjerg onbjerg force-pushed the onbjerg/coverage branch 2 times, most recently from 0b091f3 to b3d2aaf on June 3, 2022 04:18

gakonst commented Jun 3, 2022

👀 it's happening


onbjerg commented Jun 5, 2022

Hey guys, an update.

First of all, to everyone following the issue, sorry for the radio silence and lack of activity. I attempted something more complex (in order to address an issue I think might come up), which I've put on hold for now, and I've also had some family-related issues that impacted my mental health.

On the latest commit before this comment I see that a lot of things in lil-web3 are covered:

[screenshot: coverage summary for lil-web3]

I still need to:

  • Detect executable lines
  • Collect branch hit data
  • Test the LCOV reporter

That may seem like a short to-do list, but the current thinking is that I get this into a usable experimental state and create separate issues for the remaining items, such as the HTML reporter, questions around some of the statement nodes, and figuring out what we support, what we should support, and what we can support.

I've noticed some things that I need to address before this can be merged:

  • Libraries are currently not covered. I think this is because source maps are missing for them for some reason, so I'm trying to figure out why that is.
  • receive() functions are not marked as covered. I think this is because of some optimization Solidity does that I don't currently handle correctly.
  • The dependency and test contract checks are not very robust, and I probably need to filter test contracts at the AST level instead of at the file level: projects like Solmate usually create concrete implementations of their abstract contracts in their test files in order to test functionality, and this is not currently handled.


onbjerg commented Jun 9, 2022

Update again: I'm currently working on hooking up branch hit data and concurrently checking out the second issue I wrote about (the receive() case). I think the receive() function is optimized away in some situations, since an empty one essentially just disables the value-transfer check, and in those cases there is no source mapping. It might be fine to leave that case out for now, but I want to make sure it works as expected for receive() functions that have a function body.


gakonst commented Jun 14, 2022

I think we need a rebase per the recent refactor by @mattsse. Should we make 0/0s display as not-green so that it's not confusing?

That may seem like a short to-do list, but the current thinking is that I get this into a usable experimental state and create separate issues for the remaining items, such as the HTML reporter, questions around some of the statement nodes, and figuring out what we support, what we should support, and what we can support.

I'm supportive of this, especially the HTML reporter and the edge cases around various nodes. E.g. in the receive() case, if you think it's a lot of work, let's track it separately and document it as a known limitation.


onbjerg commented Jun 14, 2022

[screenshot: updated coverage summary]

Getting there. Some items are considered uncovered even though they are covered; this is related to how I pick anchors (see 3bb07ce and 5742f53), since I am only considering the runtime source map.

Additionally, branch coverage isn't really accurate in this current iteration, since:

  • It considers if statements with no else block to be fully covered even if the "false" branch is never hit (we need to create a "virtual" one)
  • It doesn't consider require and assert... These are more complicated


gakonst commented Jun 14, 2022

Every time I see a commit with progress on this PR I jump out of my seat lol

Do you have a 1-paragraph explanation of "anchors" and why they help here?


onbjerg commented Jun 14, 2022

@gakonst Each AST node has an attached source range: a start byte and (optionally) a length. Previously I would take each opcode we executed, look it up in the source map and see if the offset described in the source map matched the start byte of one of our coverage items.

This doesn't always work: sometimes the opcode that should mark an item as covered is somewhere else in the range, and sometimes one opcode actually marks multiple items as covered because of the optimizer. Anchors are essentially opcodes that we pick ahead of time, and whose execution marks an item as covered.

Another benefit of the anchor approach is that we can change which anchor we pick for more complicated items such as branches. For example, we might want to pick a specific type of opcode within the source range (like a JUMPI or something).

Hope what I wrote made sense; it's a little hard to explain lol
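
To make the anchor idea concrete, here is a minimal sketch with simplified, made-up types for source-map elements and item ranges (the real types in the PR differ); it just picks the first instruction whose mapped source range falls inside the item's range:

/// One source-map entry: the instruction at `pc` maps to the source bytes
/// [offset, offset + length).
struct SourceElement {
    pc: usize,
    offset: usize,
    length: usize,
}

/// The source range of a coverage item (an AST node).
struct ItemRange {
    offset: usize,
    length: usize,
}

/// Pick an anchor: the first instruction whose mapped source range lies
/// entirely within the item's range. Executing that instruction once is
/// enough to mark the item as covered.
fn find_anchor(source_map: &[SourceElement], item: &ItemRange) -> Option<usize> {
    source_map
        .iter()
        .find(|el| el.offset >= item.offset && el.offset + el.length <= item.offset + item.length)
        .map(|el| el.pc)
}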


gakonst commented Jun 14, 2022

This makes sense (also copying @jpopesculian and @rkrasiuk, who seem to understand the AST stuff better than I do from the work on forge fmt, in case they have thoughts/comments).

Separately:

It considers if statements with no else block to be fully covered even if the "false" branch is never hit (we need to create a "virtual" one)

This makes sense.

It doesn't consider require and assert... These are more complicated

What makes them more complicated? Doesn't each require/assert(bool) translate to an if (bool) statement and act like a branch?


onbjerg commented Jun 14, 2022

What makes them more complicated? Doesn't each require/assert(bool) translate to an if (bool) statement and act like a branch?

Ideally, for branches we want to have three states: not executed, one path executed (either true or false), or both paths executed (i.e. covered).

Currently, for if statements with no else block there are only two states: not executed, or true branch executed, which is not ideal. I think we probably want to signal to the developer if they only covered one possible path in their tests.
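
A sketch only (not the PR's data structures) of the three-state model described above:

/// Hit counts for the two paths of a branch (an if statement, or a
/// require/assert check).
#[derive(Default)]
pub struct BranchHits {
    pub true_hits: u64,
    pub false_hits: u64,
}

pub enum BranchCoverage {
    /// Neither path was ever executed.
    NotExecuted,
    /// Exactly one of the two paths was executed.
    PartiallyCovered,
    /// Both the true and the false path were executed.
    Covered,
}

impl BranchHits {
    pub fn coverage(&self) -> BranchCoverage {
        match (self.true_hits > 0, self.false_hits > 0) {
            (false, false) => BranchCoverage::NotExecuted,
            (true, true) => BranchCoverage::Covered,
            _ => BranchCoverage::PartiallyCovered,
        }
    }
}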

For require and assert, though, all of this is handled by specific opcodes, and I need to figure out which ones. The general layout seems to be:

<some assertion>
PUSH <pc at which we continue execution if assertion holds>
JUMPI
<setup revert and revert>

So my assumption is that the false branch would be right after JUMPI. For the true branch, I'm not sure - maybe I need to record what value is on the stack every time we encounter a JUMPI so I can look it up later and check if we both got a 0 and a non-0 value? :/

Edit: I checked it out on Godbolt, by the way; here is my contract:

// SPDX-License-Identifier: UNLICENSED
pragma solidity >=0.4.0;

contract Test {
    function test(uint256 a) public pure returns (uint256) {
        require(a > 1, "abc");
    }
}

And here is the bytecode for require:

  PUSH 1
  DUP3 
  GT 
  PUSH [tag] test_uint256_6
  JUMPI 
  PUSH 40 # if we end up here, we will end up reverting
# load revert string
  MLOAD 
  PUSH 8C379A000000000000000000000000000000000000000000000000000000000
  DUP2 
  MSTORE 
  PUSH 4
  ADD 
# set return addr
  PUSH [tag] test_uint256_5
  SWAP1
# some abi encoding whatever
  PUSH [tag] abi_encode_tuple_t_stringliteral_4e03657aea45a94fc7d47ba826c8d667c0d1e6e33a64a036ec44f58fa12d6c45__to_t_string_memory_ptr__fromStack_reversed_0
  JUMP [in]
tag test_uint256_5
# after abi encoding we end up here
  JUMPDEST 
  PUSH 40
  MLOAD 
  DUP1 
  SWAP2 
  SUB 
  SWAP1 
  REVERT 
tag test_uint256_6
  JUMPDEST # if we end up here we didn't revert

Previously coverage was based on the start of AST nodes
in the source code. With the anchoring system, opcodes in
relevant source maps are searched, and the first one
within the source range of the AST node is selected
as an anchor.

If the anchor is executed once, then the coverage item
is considered to be covered. This works very well, but
false positives and edge cases still need to be found.
Consider the full range instead of just the offset.

Unfortunately breaks detection of some items; suspected to
be related to not using constructor source maps.
@onbjerg onbjerg requested a review from mattsse June 22, 2022 15:04

@mattsse mattsse left a comment


smol nit

lgtm!

how can we test this properly?


fn visit_contract(&mut self, node: Node) -> eyre::Result<()> {
    let is_contract =
        node.attribute("contractKind").map_or(false, |kind: String| kind == "contract");

I see, sounds reasonable. We could, however, think of adding an enum (Known)Attribute that is AsRef<str>, so we can do node.attribute(Attribute::Abstract).

But there's no real benefit to that, I'd assume.
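
For illustration, a sketch of what that (Known)Attribute enum could look like; the variant set here is an assumption, and it relies on Node::attribute accepting anything that is AsRef<str>:

/// Hypothetical enum of well-known AST attribute keys, so call sites don't
/// have to pass raw strings.
pub enum Attribute {
    ContractKind,
    Abstract,
    Name,
}

impl AsRef<str> for Attribute {
    fn as_ref(&self) -> &str {
        match self {
            Attribute::ContractKind => "contractKind",
            Attribute::Abstract => "abstract",
            Attribute::Name => "name",
        }
    }
}

Call sites could then write node.attribute(Attribute::ContractKind) instead of node.attribute("contractKind").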


onbjerg commented Jun 22, 2022

smol nit

lgtm!

how can we test this properly?

Not sure how best to do automated testing for it yet; my best guess is some sort of snapshot testing against external repos: we run forge coverage on external repos, snapshot the output, and if it differs we manually check that it still looks OK. For some parts of the code, though, we can probably be a bit smarter (e.g. the format of the reporters, parts of the visitor).

There's still some way to go before this is dependable (see above), but those changes will come in other PRs.


mattsse commented Jun 22, 2022

SGTM
