Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add verification loggers and report generator #38

Merged
merged 42 commits into from
Mar 11, 2022
Merged

Conversation

robin-aws
Copy link
Member

@robin-aws robin-aws commented Mar 2, 2022

This change adds a best practice we recommend for all Dafny projects: measuring the cost of each individual verification task and setting an upper bound in CI, to guard against future verification instability (see dafny-lang/dafny#1582).

This is done with two steps:

  1. Add the /verificationLogger:csv parameter when invoking dafny, which outputs a CSV file with verification results.
  2. Invoke the dafny-reportgenerator tool with a configured maximum cost, which will fail the build if any task's cost crosses that threshold.

There are more details on the latter tool here: https://github.com/dafny-lang/dafny-reportgenerator

The good news is that (at least after #36), this codebase is in good shape! :) All tasks take less than 5 seconds across a few runs, so I've set 10 seconds as the upper bound for now. I would prefer to bound the resource count instead since that is a more predictable metric, but Dafny doesn't record the resource count when splitting happens until the upcoming 3.5 release.

Note that because the /verificationLogger options cause extra output, I've also changed the lit configuration to no longer assert the exact output of dafny when verifying everything. This will make the build more stable by not depending on the exact console output, as all of the code here is always expected to verify successfully.

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

sarahc7 and others added 30 commits September 2, 2021 11:03
Dummy verification-breaking change
…ation-logger

# Conflicts:
#	.github/workflows/tests.yml
@robin-aws robin-aws self-assigned this Mar 3, 2022
@robin-aws robin-aws changed the title Drop lit, add verification logger Add verification logger and report generator Mar 10, 2022
@robin-aws robin-aws marked this pull request as ready for review March 10, 2022 19:24
@robin-aws robin-aws requested review from RustanLeino and a team March 10, 2022 19:28
Copy link
Member

@atomb atomb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this won't work with the current version of Dafny since it can't generate both CSV and TRX reports at the same time. That should be easy to fix, though.

.github/workflows/tests.yml Outdated Show resolved Hide resolved
@robin-aws robin-aws changed the title Add verification logger and report generator Add verification loggers and report generator Mar 10, 2022
atomb
atomb previously approved these changes Mar 10, 2022
@cpitclaudel
Copy link
Member

Invoke the dafny-reportgenerator tool with a configured maximum cost, which will fail the build if any task's cost crosses that threshold.

Naive question: why is the bounding done by analyzing the CSV after the fact, rather than just passing timeLimit? Is that because /timeLimit doesn't work at the task level? Or because /timeLimit sometimes just doesn't work?

@@ -0,0 +1,16 @@
name: 'Test Report'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test sounds like a verb here to me. Maybe Generate test report? Same in the file name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point! This is from the template at https://github.com/dorny/test-reporter so I opened a PR to fix it there too. :)

- uses: dorny/test-reporter@v1
with:
artifact: verification-results
name: Verification Tests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using the name "test" here? Isn't it reporting verification metrics? Or is it related to Dafny's :test somehow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, I just did a poor job of editing the template. :) I'm changing it to "Verification Results"

run: lit --time-tests -v --param 'dafny_params=/verificationLogger:trx /verificationLogger:csv' .

- name: Generate Report
run: find . -name '*.csv' | xargs -t dafny-reportgenerator summarize-csv-results --max-duration-seconds 10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
run: find . -name '*.csv' | xargs -t dafny-reportgenerator summarize-csv-results --max-duration-seconds 10
run: find . -name '*.csv' -print0 | xargs -0 --verbose dafny-reportgenerator summarize-csv-results --max-duration-seconds 10

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if there are too many .csv reports?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'm a bit concerned that will be an issue in the future and cut an issue here: dafny-lang/dafny-reportgenerator#4. The current command length is 1505 so we've got a fair bit of headroom under the 4K limit at least (if it applies to the commands that xargs generates).

@keyboardDrummer
Copy link
Member

keyboardDrummer commented Mar 11, 2022

Invoke the dafny-reportgenerator tool with a configured maximum cost, which will fail the build if any task's cost crosses that threshold.

Naive question: why is the bounding done by analyzing the CSV after the fact, rather than just passing timeLimit? Is that because /timeLimit doesn't work at the task level? Or because /timeLimit sometimes just doesn't work?

I have a related question. If the report-generator is something customers should always use for CI, then isn’t it better to have it be part of the dafny CLI ?

Co-authored-by: Clément Pit-Claudel <[email protected]>
@robin-aws
Copy link
Member Author

Invoke the dafny-reportgenerator tool with a configured maximum cost, which will fail the build if any task's cost crosses that threshold.

Naive question: why is the bounding done by analyzing the CSV after the fact, rather than just passing timeLimit? Is that because /timeLimit doesn't work at the task level? Or because /timeLimit sometimes just doesn't work?

I tried to address that in the README for the tool here (https://github.com/dafny-lang/dafny-reportgenerator): "This is better than setting a more aggressive verification cost bound through options like /timeLimit directly, as it allows users to know that their code is still correct, but still blocks code changes that are too expensive to verify and hence likely to break in the future."

@robin-aws
Copy link
Member Author

I have a related question. If the report-generator is something customers should always use for CI, then isn’t it better to have it be part of the dafny CLI ?

Fair question, and especially with the limited functionality the tool is providing so far you could imagine it just being yet more options on the dafny CLI. But we expect that it will gain more functionality over time, and the kind of features that can be strongly decoupled from the core functionality of the dafny tool. It's also very useful to record the results of verification tasks once and then analyze in multiple ways after the fact, whereas if you had to invoke dafny repeatedly you could get different non-deterministic results (at least in terms of duration).

For precedent, the .NET platform also has a similar architecture, where data is written out as a result of options like --logger (for test results) and --collect (for things like testing coverage), and other tools are used to report on that data in various ways.

Copy link
Member

@atomb atomb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. The improvements @cpitclaudel suggested definitely seem nice to include.

@keyboardDrummer
Copy link
Member

keyboardDrummer commented Mar 11, 2022

But we expect that it will gain more functionality over time, and the kind of features that can be strongly decoupled from the core functionality of the dafny tool.

What would those features be? When I think of a feature like detecting proofs with a high verification time variability, that seems like something useful to have in the Boogie CLI.

It's also very useful to record the results of verification tasks once and then analyze in multiple ways after the fact, whereas if you had to invoke dafny repeatedly you could get different non-deterministic results (at least in terms of duration).

What do you mean by analyse in multiple ways? Do you mean take the measurements from different proof runs and apply different statistics to them? I imagine in the general case it would be fine to have Boogie apply a common statistic after which you wouldn't want to do any further analysis, which you'd use in your CI.

For precedent, the .NET platform also has a similar architecture, where data is written out as a result of options like --logger (for test results) and --collect (for things like testing coverage), and other tools are used to report on that data in various ways.

I'm all for other tools using Dafny's output, but I think that any Dafny related features customers should use in their CI should be in the Dafny CLI.

@robin-aws robin-aws merged commit 44f5891 into master Mar 11, 2022
@robin-aws robin-aws deleted the verification-logger branch March 11, 2022 18:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants