dapptools-style coverage #99

gakonst · 2021-10-12T11:54:37Z

Implemented at: https://github.com/dapphub/dapptools/blob/728a9245fa5f78589b0cedec0ade2da5433ad792/src/hevm/src/EVM/UnitTest.hs#L251-L382

Example: https://twitter.com/dapptools/status/1435973810545729536

transmissions11 · 2021-10-12T16:14:54Z

sadly theirs is pretty bad tbh, it doesn't have any branching info for == statements, etc

gakonst · 2021-10-12T16:43:14Z

can you recommend an algorithm, and we can start scoping it out along with how we'd utilize sourcemaps?

transmissions11 · 2021-10-12T17:18:31Z

only good one i know is solidity-coverage's but it works by instrumenting the solidity whereas we'd probably want to do it at the VM level?

gakonst · 2021-10-12T17:50:26Z

Yes, I didn't love the injected events approach...what is the hevm approach missing? is it hard?

onbjerg · 2022-01-17T12:17:45Z

More notes:

What's missing: HEVM approach does not account for branching in logical statements (||, <, &&, >, ==, etc) in requires (and in other places?)
HEVM approach is this: On each step, look up where we are in the bytecode, then map that onto the relevant line in the source code and increment the "visited" counter on that line by 1. Any line that has been visited at least once is considered covered
Solidity Coverage approach: Parsing the contract into an AST, walking the AST and finding points of interest and then injecting function calls or events they can hook into later
We should generate the coverage report in two formats: in-line text report for the CLI and an LCOV file for CI purposes/additional tooling
We should probably investigate doing the HEVM approach but making up for their shortcomings (i.e. only do the solidity-coverage emit-events approach as a last resort)

onbjerg · 2022-01-19T00:37:57Z

Notes on how we can achieve coverage:

We parse the source files using solang-parser and take a note of where we want to check for coverage in three buckets: branches, functions and lines. These ranges are saved somewhere so we can reference them later (solidity-coverage has a file with the relevant AST nodes in it we can reference)
We hook into Handler::pre_validate in Sputnik, lookup where we are in the source and check if that is a range we are watching. If it is we increment the visited count for that piece of code by 1 (regardless of if it's a branch, function or line).
Each test generates its own coverage report. At the end of testing, we merge them
This coverage report should be easily converted into at least the lcov format, since we can use that to either: dump it in a file, generate a HTML report or print status on the CLI.

Points where I'm stuck:

I'm not sure how to get the source map in the MultiContractRunnerBuilder. It seems like I somehow need to switch the artifact output type?
Generally, is there any real reason we have different representations of the artifacts as a consumer of the ethers-solc package as opposed to a generalized struct?
It seems like the debugger used a manual compile step and I think if we need to do that for coverage too we should consider at least some sort of refactoring so we don't end up having a lot of different compile pipelines scattered around as it is hard to maintain

mds1 · 2022-01-19T14:54:10Z

One thing to note is when improving fuzzing as part of #387, we may want to hook into the coverage results to enable coverage-guided fuzzing, i.e. the fuzzer would know what parts of code have or have not been covered and use that to guide input mutations. I'm not familiar enough with the internals to know if the above plan enables that but just wanted to mention it in case it affects coverage implementation

maurelian · 2022-04-05T19:47:41Z

Just want to add a strong vote for this feature. For context, our monorepo relies heavily on solidity coverage, and CI checks will fail when a change would reduce coverage below a certain threshold.

mds1 · 2022-04-05T19:50:19Z

@maurelian I'm planning to post an initial UX proposal for coverage later today (though deferring to @gakonst on actual implementation timelines), which will be based on my experience with dapptools + Instanbul style coverage reports, since those are the main coverage tools I've used. If you have any UX suggestions or tools who's reports/workflows you like/can link to, that would be great!

gakonst · 2022-04-05T21:49:12Z

ACK - coverage is next up on our prio list.

mds1 · 2022-04-05T22:25:33Z

UX Proposal

This is largely based on how Istanbul's JS coverage reports work, since that's what I've used the most and haven't had any big issues with it's approach/formatting (whereas the dapptools format is clunky and not great IMO). Istanbul is what Hardhat's coverage tool uses to generate output data/reports.

As a result, this initial proposal pretty much just describes how Istanbul works, but happy to iterate and change things based on feedback.

Standard Syntax

Coverage should not be the default since the added instrumentation will likely significantly slow down tests, so instead we should run coverage with:

forge test --coverage

It would be great if this also supported watch mode and the various match flags, so you can do

forge test --coverage -w --match-contract ContractWereIncreasingCoverageFor

CI Syntax

I'd be curious to hear more from @maurelian about how this should work, but a CI mode would be great so you can configure your CI to fail if aggregate coverage drops below some threshold. One possible syntax is instead of running the above command you'd run:

forge test --coverage-threshold 95

This will cause the process to exit with an error if coverage is < 95%

Terminal Output

At the bottom of the terminal output, append a table summary of coverage to the standard test output. Something like the below code block, where >= 80% coverage is colored green, 65% <= x < 80% coverage is yellow, and < 65% coverage is red (I believe these are the thresholds used by Istanbul).

$ forge test --coverage

[⠆] Compiling...
No files changed, compilation skipped

Running 1 test for src/test/MyContract.t.sol:MyContractConstructor
[PASS] testConstructor() (gas: 31072)
Test result: ok. 1 passed; 0 failed; finished in 10.32ms

-------------------|-----------|-----------|-----------|-----------|
File               |   % Stmts |% Branches |   % Funcs |   % Lines |
-------------------|-----------|-----------|-----------|-----------|
src/               |      100  |     100   |     100   |      100  |
  MyContract.sol   |      100  |     100   |     100   |      100  |
-------------------|-----------|-----------|-----------|-----------|
All files          |      100  |     100   |     100   |      100  |
-------------------|-----------|-----------|-----------|-----------|

=============================== Coverage summary ===============================
Statements   : 100% ( 350/350 )
Branches     : 100% ( 176/176 )
Functions    : 100% ( 81/81 )
Lines        : 100% ( 317/317 )
================================================================================

File Output

In addition to the terminal output, output files containing more details should be saved to a coverage folder. Istanbul saves this as HTML/JS/CSS files where:

The top level HTML file is a similar summary to the terminal output
You can click into each file to see summary tables for each folder/subfolder, then can click on specific files for coverage details about that file.
The JS files are so you can sort tables and navigate the pages more easily.
The CSS files are unsurprisingly for styling the report.

The syntax used by Istanbul in these reports can be found here. To summarize:

The following coverage criteria are reported: function coverage, statement coverage, branch coverage, and line coverage.
When viewing source code in an HTML file, the lines are marked with E or I to specify whether the else path or if path was not taken.
A number such as 291x indicates the line was executed 291x times.
Statements, functions, branches, and lines that were not covered are colored pink, orange, yellow, and red, respectively

Screenshots from random reports I found online are below:

onbjerg · 2022-04-05T22:41:53Z

Coverage should generate LCOV files at the very least so it is uploadable to tools like CodeCov. The LCOV file can also be used to generate the HTML reports and so on, so really, we "only" need the LCOV file and the rest can be built on top.

For coverage thresholds, I think people mostly defer to other tools that also take LCOV?

mds1 · 2022-04-05T22:56:48Z

Coverage should generate LCOV files at the very least so it is uploadable to tools like CodeCov. The LCOV file can also be used to generate the HTML reports and so on, so really, we "only" need the LCOV file and the rest can be built on top.

Ah perfect that makes sense, thanks.

For coverage thresholds, I think people mostly defer to other tools that also take LCOV?

Seems this is what Optimism does. Though is there a reason we wouldn't want to integrate at least a simple coverage threshold natively so it can be used to trigger a CI failure? Seems nice to not need a third-party tool, and it'd facilitate having a better forge init out of the box. But I haven't used Codecov or other similar tools, so I'm not sure what nice-to-have features they may have that a native approach won't have.

One thing I forgot to mention is how fuzz/invariant tests should affect coverage reports. I think there's a few ways we can handle this:

Ignore fuzz/invariant tests from coverage reports so reports are deterministic.
Include them, and settle for non-deterministic coverage. This is arguably ok since it means you may be relying too heavily on broad fuzz tests and not enough targeted unit tests?
Include fuzz runs, but only after implementing feat: enable saving and loading of corpus for deterministic fuzzing #991 so coverage can be deterministic.
Add a flag so the user can determine whether or not fuzz test runs should be included in coverage reports.

maurelian · 2022-04-06T03:47:38Z

@mds1:

I'd be curious to hear more from @maurelian about how this should work, but a CI mode would be great so you can configure your CI to fail if aggregate coverage drops below some threshold.

Yeah, I can't find it now that we've migrated from GHA to Circle CI, but we previously dependedon tooling from CodeCov to cause the CI failure.

One thing I forgot to mention is how fuzz/invariant tests should affect coverage reports. I think there's a few ways we can handle this.

When I think of coverage I think of unit tests, and I'd want to be able to use the report to direct me to where further tests are needed. So, my default is to not mix the two kinds of coverage up, at most have a separate report for each.

mds1 · 2022-04-18T01:48:53Z

When I think of coverage I think of unit tests, and I'd want to be able to use the report to direct me to where further tests are needed. So, my default is to not mix the two kinds of coverage up, at most have a separate report for each.

Hmm, interesting! I find sometimes I only write fuzz tests for certain portions of code, since having a separate concrete test for the same code is redundant. But I see your use case, so perhaps an --include-fuzz flag is the best option here. And if you always run forge coverage --include-fuzz, but see a regression in coverage, that's an indication you probably should e.g. add a concrete test to ensure consistent coverage.

(Side note: Above I suggested forge test --coverage, but when writing this I naturally went to forge coverage. I think either is fine fwiw)

Willyham · 2022-04-18T14:09:53Z

Hello, I'm very interested in helping out with this feature! I posted some discussion here #1348 before I realised this issue existed.

Initial thoughts after catching up:

Are we settled on the implementation at the solidity level via AST manipulation rather than at the VM level?
I'm definitely in favour of the tooling just producing a standardised format and leaving the reporting/html generation/etc to other tools. There are many of them which already exist. Does rust have a native coverage format? Ideally using a format which doesn't require another package manager (e.g. npm) to install dev tooling would be nice.
I would personally include fuzzing in the coverage but I can see the argument for a flag. Would prefer if the flag was opt-out though.
I would lean towards forge test --cover[age] because all the same options (e.g. --match) will be useful/needed as well. forge cover could be an alias for forge test --cover.

And finally, the gold standard here would be to integrate the coverage report with the solidity plugins for editors, so that you can see the coverage results directly in your editor every time you save. This massively plays to forge's strength as tests are feasible to run on save, which isn't really possible with other environments.

The Go plugin does this and it makes the feedback loop for tests incredibly fast

I am completely new to the codebase but keen to help out. I think the best way to parallelise would be to try and understand any dependencies or refactors needed and possibly try to create some distinct issues 🙏

onbjerg · 2022-04-18T14:39:43Z

Are we settled on the implementation at the solidity level via AST manipulation rather than at the VM level?

I think the current thinking is that we would not use the AST. We would instead on a VM-level check jump instructions and map those to the source code using source maps, but the approach hasn't been validated so that might change.

And finally, the gold standard here would be to integrate the coverage report with the solidity plugins for editors, so that you can see the coverage results directly in your editor every time you save. This massively plays to forge's strength as tests are feasible to run on save, which isn't really possible with other environments.

Do you have any additional info on how this works? I'd assume it uses LCOV - if that's the case, then integration should be simple. We probably just need a way to output the raw LCOV to stdout

Willyham · 2022-04-18T15:32:11Z

I think the current thinking is that we would not use the AST. We would instead on a VM-level check jump instructions and map those to the source code using source maps, but the approach hasn't been validated so that might change.

Ah right, thanks. I had a question on this; if the compiler makes optimisations like removing branches, can we still map the bytecode back to source lines? Users clearly operate on the source-line level, so seeing missing coverage on a line because of something the compiler does would be confusing. I've seen that some languages seem to implement coverage with LLVM so I assume this is a solved problem, but not sure how it works.

Do you have any additional info on how this works? I'd assume it uses LCOV - if that's the case, then integration should be simple. We probably just need a way to output the raw LCOV to stdout

Go has its own native coverage format (which is an extremely simple text file). The editor runs the tests and the file is output to a tmp directory, which is then read, parsed and rendered in the editor. I'd be happy to try and take this part on because I might be more use there, as I'm very unfamiliar with the vm code and rust.

onbjerg · 2022-04-18T15:52:10Z

Ah right, thanks. I had a question on this; if the compiler makes optimisations like removing branches, can we still map the bytecode back to source lines? Users clearly operate on the source-line level, so seeing missing coverage on a line because of something the compiler does would be confusing. I've seen that some languages seem to implement coverage with LLVM so I assume this is a solved problem, but not sure how it works.

Probably not. The source maps in Solidity are really flaky when you turn the optimizer on, so we would probably need the optimizer to be off. I'm not sure, though, I'll figure that out as I implement the feature

Go has its own native coverage format (which is an extremely simple text file). The editor runs the tests and the file is output to a tmp directory, which is then read, parsed and rendered in the editor. I'd be happy to try and take this part on because I might be more use there, as I'm very unfamiliar with the vm code and rust.

Ah, I misunderstood as well - you don't want Forge to necessarily support Go's coverage format, just a coverage format that can be used in editors? If so, I think most editors should be able to support LCOV (used by C and others), or at least something proximal to it

Willyham · 2022-04-18T16:32:02Z

Running it with the optimizer off makes sense 👍

For the coverage; yes LCOV format would be good. The Go description was just an example of the how the output might be used in editors. As long as forge can output the LCOV file to a specified location, I should be able to integrate it with editor plugins relatively easily.

maurelian · 2022-04-21T13:46:42Z

forge coverage feels more consistent to me with the existence of forge snapshot.

onbjerg · 2022-04-21T18:59:26Z

I also think forge coverage makes more sense to not clutter the interface of forge test with even more options. For example, we want at least --coverage-threshold (or just --threshold if on forge coverage), report types (--report <type>) and more.

There are also some clashes between forge test and the coverage behaviour; for example, we can't run coverage with optimizations turned on, but since you could conceivably do forge test --gas-report --coverage this may lead to "bug reports" even though it is intentional, since it might be unexpected behavior

gitcoinbot · 2022-04-29T03:49:00Z

Issue Status: 1. Open 2. Started 3. Submitted 4. Done

This issue now has a funding of 20000.0 DAI (20000.0 USD @ $1.0/DAI) attached to it as part of the synapsecns fund.

If you would like to work on this issue you can 'start work' on the Gitcoin Issue Details page.
Want to chip in? Add your own contribution here.
Questions? Checkout Gitcoin Help or the Gitcoin's Discord
$4,415,122.51 more funded OSS Work available on the Gitcoin Issue Explorer

0xvv · 2022-04-29T10:28:09Z

Trying to summarize the TODOs

add forge coverage
add --threshold to coverage to enforce minimal coverage
add --include-fuzz to coverage (or --exclude-fuzz if we chose opt-out)
parse sources with solang-parser and creates buckets
hook into Sputnik Handler::pre_validate and increment visited count
generate each test coverage report
merge all tests reports
generate LCOV file
generate terminal output (from LCOV ?)

Anyone already started working on this ?
I'm down to work on it but may lack some knowledge on the VM side

onbjerg · 2022-04-29T17:07:38Z

Yes, I am actively working on it. Will post a status update tomorrow (I am not home)

The todo list is no longer valid since that was based on Sputnik, and we've since moved to REVM :)

gitcoinbot · 2022-04-30T03:31:56Z

Issue Status: 1. Open 2. Started 3. Submitted 4. Done

Work has been started.

These users each claimed they can complete the work by 264 years, 6 months from now.
Please review their action plans below:

1) bastek8 has applied to start work (Funders only: approve worker | reject worker).

Program i wszystlo z hun zwiazane
2) onbjerg has been approved to start work.

I was already working on this feature before the bounty was added
3) malavekevin36 has applied to start work (Funders only: approve worker | reject worker).

I work plan and whoever don’t want to come back to work they can find a new job

Learn more on the Gitcoin Issue Details page.

onbjerg · 2022-04-30T16:16:32Z

So, as promised, here is a status update.

I was already working on this issue prior to the bounty (which is why I am assigned).

Initially, the thought was to analyse JUMPI and JUMPDESTs to find branches (i.e. require, assert, if etc.), however it would not get us all the way in terms of the information we want - we want functions, branches, executable lines and statements.

Background

The primary mode of operation for the coverage tool will be to collect the following pieces of information:

Functions
Executable lines: lines, excluding lines that do nothing (whitespace, comments, brackets on single lines etc.)
Branches: If statements, calls to require and asserts
Statements: executable statements, which differ from lines in that you can have multiple statements per executable line.

This information will be collected from the AST of the Solidity files. Then the tests are run with a special coverage collector that marks opcodes as hit as they are executed, which we later use (along with source maps) to map back to the coverage items.

Report types

The above information will be output either to stdout as a simple report, or as an LCOV file that you can upload to services like CodeCov. A stretch goal would be to add a HTML report as well, but since tools already exist for converting LCOV to HTML, it isn't a deal breaker if it is not in the first version.

Compared to other coverage tools

DappTools's coverage tool is not super great (see comments above) as it does not support outputting to standard reporting formats like LCOV, and it does not treat require and assert as branches.
Hardhat Coverage is closer to what we want, but it's slow and it distorts gas since it instruments the code of the contracts

Current status

I went with the jump analysis first, but as I found it insufficient, I had to scrap most of my work and start over with the method described above. The coverage collector is done, but I am currently having some issues constructing the higher level coverage data that the reports will use (see below)

Current issues

The AST implementation in ethers-solc is not entirely complete yet, so I am implementing nodes as I go
The AST implementation in ethers-solc only works for Solidity >=0.8.0 currently since the AST had a breaking change there, so we need to add support for <=0.8.0 but it is not a blocker since this work can be done in parallel
Source maps in Solidity identify source files with a source ID. This source ID is required to be unique, and it is if you compile all files in one go, but ethers-solc currently tries to compile multiple files in parallel, leading to source ID clashes. This is a blocker, since we need to be able to map bytecode of any contract to the coverage items using the source map.

onbjerg · 2022-05-10T16:21:21Z

Sorry guys, this is taking longer than expected. I've opened up a draft PR with a status update, and I intend to update that PR as I go with new status updates if relevant. #1576

onbjerg · 2022-06-22T17:43:01Z

Reopening this - still some edge cases and missing stuff. See this label: Cmd-forge-coverage Command: forge coverage

maurelian · 2022-09-29T12:56:56Z

May I ask what the status is with coverage generation of fuzz tests? It looks to me as though they are not yet included and I don't yet see an --include-fuzz flag as discussed above.

lucas-manuel · 2023-05-18T14:51:32Z

Close-able? @mds1

mds1 · 2023-05-19T15:30:57Z

Let's leave this open for now, eventually all coverage stuff will be aggregated into #4442 and I want to make sure things in here don't get missed

hellwolf · 2024-02-18T12:58:38Z

I have a question. I still see branch coverage misses for assertions. E.g.

I don't think it is ideal. But what is the expected behavior as of now?

zerosnacks · 2024-06-26T09:45:42Z

Marking as complete as the feature has been implemented (minus some small bugs and follow-up features). Closing in favor of #4442

onbjerg added T-feature Type: feature C-forge Command: forge Cmd-forge-test Command: forge test labels Jan 19, 2022

onbjerg self-assigned this Apr 18, 2022

onbjerg mentioned this issue May 10, 2022

feat: forge coverage #1576

Merged

14 tasks

onbjerg closed this as completed in #1576 Jun 22, 2022

onbjerg reopened this Jun 22, 2022

onbjerg added Cmd-forge-coverage Command: forge coverage and removed Cmd-forge-test Command: forge test labels Jun 28, 2022

onbjerg added this to the v1.0.0 milestone Jul 1, 2022

onbjerg modified the milestones: v1.0.0, v0.3.0 Aug 15, 2022

onbjerg modified the milestones: v0.3.0, v1.0.0 Aug 23, 2022

zerosnacks mentioned this issue Jun 26, 2024

meta(coverage): tracking issue for coverage improvements #4442

Open

11 tasks

zerosnacks closed this as completed Jun 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dapptools-style coverage #99

dapptools-style coverage #99

gakonst commented Oct 12, 2021

transmissions11 commented Oct 12, 2021

gakonst commented Oct 12, 2021

transmissions11 commented Oct 12, 2021

gakonst commented Oct 12, 2021

onbjerg commented Jan 17, 2022

onbjerg commented Jan 19, 2022 •

edited

Loading

mds1 commented Jan 19, 2022

maurelian commented Apr 5, 2022 •

edited

Loading

mds1 commented Apr 5, 2022

gakonst commented Apr 5, 2022

mds1 commented Apr 5, 2022

onbjerg commented Apr 5, 2022

mds1 commented Apr 5, 2022

maurelian commented Apr 6, 2022 •

edited

Loading

mds1 commented Apr 18, 2022

Willyham commented Apr 18, 2022

onbjerg commented Apr 18, 2022

Willyham commented Apr 18, 2022

onbjerg commented Apr 18, 2022

Willyham commented Apr 18, 2022

maurelian commented Apr 21, 2022

onbjerg commented Apr 21, 2022 •

edited

Loading

gitcoinbot commented Apr 29, 2022

0xvv commented Apr 29, 2022

onbjerg commented Apr 29, 2022 •

edited

Loading

gitcoinbot commented Apr 30, 2022 •

edited

Loading

onbjerg commented Apr 30, 2022 •

edited

Loading

onbjerg commented May 10, 2022

onbjerg commented Jun 22, 2022

maurelian commented Sep 29, 2022

lucas-manuel commented May 18, 2023

mds1 commented May 19, 2023

hellwolf commented Feb 18, 2024

zerosnacks commented Jun 26, 2024

dapptools-style coverage #99

dapptools-style coverage #99

Comments

gakonst commented Oct 12, 2021

transmissions11 commented Oct 12, 2021

gakonst commented Oct 12, 2021

transmissions11 commented Oct 12, 2021

gakonst commented Oct 12, 2021

onbjerg commented Jan 17, 2022

onbjerg commented Jan 19, 2022 • edited Loading

mds1 commented Jan 19, 2022

maurelian commented Apr 5, 2022 • edited Loading

mds1 commented Apr 5, 2022

gakonst commented Apr 5, 2022

mds1 commented Apr 5, 2022

UX Proposal

Standard Syntax

CI Syntax

Terminal Output

File Output

onbjerg commented Apr 5, 2022

mds1 commented Apr 5, 2022

maurelian commented Apr 6, 2022 • edited Loading

mds1 commented Apr 18, 2022

Willyham commented Apr 18, 2022

onbjerg commented Apr 18, 2022

Willyham commented Apr 18, 2022

onbjerg commented Apr 18, 2022

Willyham commented Apr 18, 2022

maurelian commented Apr 21, 2022

onbjerg commented Apr 21, 2022 • edited Loading

gitcoinbot commented Apr 29, 2022

0xvv commented Apr 29, 2022

onbjerg commented Apr 29, 2022 • edited Loading

gitcoinbot commented Apr 30, 2022 • edited Loading

onbjerg commented Apr 30, 2022 • edited Loading

Background

Report types

Compared to other coverage tools

Current status

Current issues

onbjerg commented May 10, 2022

onbjerg commented Jun 22, 2022

maurelian commented Sep 29, 2022

lucas-manuel commented May 18, 2023

mds1 commented May 19, 2023

hellwolf commented Feb 18, 2024

zerosnacks commented Jun 26, 2024

onbjerg commented Jan 19, 2022 •

edited

Loading

maurelian commented Apr 5, 2022 •

edited

Loading

maurelian commented Apr 6, 2022 •

edited

Loading

onbjerg commented Apr 21, 2022 •

edited

Loading

onbjerg commented Apr 29, 2022 •

edited

Loading

gitcoinbot commented Apr 30, 2022 •

edited

Loading

onbjerg commented Apr 30, 2022 •

edited

Loading