Add verification loggers and report generator #38

robin-aws · 2022-03-02T19:54:08Z

This change adds a best practice we recommend for all Dafny projects: measuring the cost of each individual verification task and setting an upper bound in CI, to guard against future verification instability (see dafny-lang/dafny#1582).

This is done in two steps:

1. Add the /verificationLogger:csv parameter when invoking dafny, which outputs a CSV file with verification results.
2. Invoke the dafny-reportgenerator tool with a configured maximum cost, which will fail the build if any task's cost crosses that threshold.

There are more details on the latter tool here: https://github.com/dafny-lang/dafny-reportgenerator
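As a rough illustration of what the second step checks, here is a minimal sketch of a duration bound applied to a results file. It is an awk stand-in, not the dafny-reportgenerator implementation. The function name `check_max_duration` and the three-column layout (name, outcome, duration in seconds) are assumptions made for the example; the CSV that /verificationLogger:csv actually writes may be laid out differently.

```shell
# Hypothetical stand-in for the threshold check; NOT dafny-reportgenerator itself.
# Assumed CSV layout (illustrative only): name,outcome,duration_seconds
# Usage: check_max_duration MAX_SECONDS FILE...
check_max_duration() {
  max_seconds="$1"
  shift
  # Skip each file's header row, report every task whose duration exceeds
  # the bound, and exit non-zero if at least one such task was found.
  awk -F, -v max="$max_seconds" '
    FNR == 1 { next }
    $3 + 0 > max + 0 { printf "too expensive: %s (%ss)\n", $1, $3; bad = 1 }
    END { exit bad }
  ' "$@"
}
```

A CI step could call `check_max_duration 10 results.csv` and rely on the non-zero exit status to fail the build. Task names containing commas would need a real CSV parser.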

The good news is that (at least after #36), this codebase is in good shape! :) All tasks take less than 5 seconds across a few runs, so I've set 10 seconds as the upper bound for now. I would prefer to bound the resource count instead since that is a more predictable metric, but Dafny doesn't record the resource count when splitting happens until the upcoming 3.5 release.

Note that because the /verificationLogger options cause extra output, I've also changed the lit configuration to no longer assert the exact output of dafny when verifying everything. This will make the build more stable by not depending on the exact console output, as all of the code here is always expected to verify successfully.

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

atomb

It looks like this won't work with the current version of Dafny since it can't generate both CSV and TRX reports at the same time. That should be easy to fix, though.

.github/workflows/tests.yml

cpitclaudel · 2022-03-11T01:03:17Z

> Invoke the dafny-reportgenerator tool with a configured maximum cost, which will fail the build if any task's cost crosses that threshold.

Naive question: why is the bounding done by analyzing the CSV after the fact, rather than just passing timeLimit? Is that because /timeLimit doesn't work at the task level? Or because /timeLimit sometimes just doesn't work?

cpitclaudel · 2022-03-11T01:03:51Z

.github/workflows/test-report.yml

@@ -0,0 +1,16 @@
+name: 'Test Report'


"Test" sounds like a verb here to me. Maybe "Generate test report"? Same in the file name.

Fair point! This is from the template at https://github.com/dorny/test-reporter so I opened a PR to fix it there too. :)

cpitclaudel · 2022-03-11T01:04:37Z

.github/workflows/test-report.yml

+    - uses: dorny/test-reporter@v1
+      with:
+        artifact: verification-results
+        name: Verification Tests


Why are we using the name "test" here? Isn't it reporting verification metrics? Or is it related to Dafny's :test somehow?

Nope, I just did a poor job of editing the template. :) I'm changing it to "Verification Results"

cpitclaudel · 2022-03-11T01:06:53Z

.github/workflows/tests.yml

+        run: lit --time-tests -v --param 'dafny_params=/verificationLogger:trx /verificationLogger:csv' .
+
+      - name: Generate Report
+        run: find . -name '*.csv' | xargs -t dafny-reportgenerator summarize-csv-results --max-duration-seconds 10


Suggested change

-        run: find . -name '*.csv' | xargs -t dafny-reportgenerator summarize-csv-results --max-duration-seconds 10
+        run: find . -name '*.csv' -print0 | xargs -0 --verbose dafny-reportgenerator summarize-csv-results --max-duration-seconds 10

What happens if there are too many .csv reports?

Yeah I'm a bit concerned that will be an issue in the future and cut an issue here: dafny-lang/dafny-reportgenerator#4. The current command length is 1505 so we've got a fair bit of headroom under the 4K limit at least (if it applies to the commands that xargs generates).
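To make the two concerns in this thread concrete, the sketch below shows (a) that NUL-delimited names keep a file name with a space as one argument, and (b) what it looks like when xargs splits its input across several invocations. The split is forced here with `-n 2` purely for demonstration; xargs does the same thing on its own when a command line would exceed its size limit. The helper name `count_per_invocation` and the scratch files are made up for the example, and `sh -c 'echo "$#"'` stands in for the real tool.

```shell
# Print how many arguments each xargs-launched command receives.
# $1 is the directory to search; any further arguments go to xargs itself.
count_per_invocation() {
  dir="$1"
  shift
  find "$dir" -name '*.csv' -print0 | xargs -0 "$@" sh -c 'echo "$#"' sh
}

# Scratch data: three CSV files, one with a space in its name.
demo_dir=$(mktemp -d)
touch "$demo_dir/a b.csv" "$demo_dir/c.csv" "$demo_dir/d.csv"

count_per_invocation "$demo_dir"        # a single invocation receives all three files
count_per_invocation "$demo_dir" -n 2   # forced split: the command runs twice
rm -rf "$demo_dir"
```

When a split happens, each run of the downstream command only sees its own subset of the files, which is why a very large set of reports could matter. `getconf ARG_MAX` prints the system-wide limit on the current machine.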

keyboardDrummer · 2022-03-11T09:57:25Z

> Invoke the dafny-reportgenerator tool with a configured maximum cost, which will fail the build if any task's cost crosses that threshold.
>
> Naive question: why is the bounding done by analyzing the CSV after the fact, rather than just passing timeLimit? Is that because /timeLimit doesn't work at the task level? Or because /timeLimit sometimes just doesn't work?

I have a related question. If the report-generator is something customers should always use for CI, then isn’t it better to have it be part of the dafny CLI?

robin-aws · 2022-03-11T16:52:41Z

> Invoke the dafny-reportgenerator tool with a configured maximum cost, which will fail the build if any task's cost crosses that threshold.
>
> Naive question: why is the bounding done by analyzing the CSV after the fact, rather than just passing timeLimit? Is that because /timeLimit doesn't work at the task level? Or because /timeLimit sometimes just doesn't work?

I tried to address that in the README for the tool here (https://github.com/dafny-lang/dafny-reportgenerator): "This is better than setting a more aggressive verification cost bound through options like /timeLimit directly, as it allows users to know that their code is still correct, but still blocks code changes that are too expensive to verify and hence likely to break in the future."
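The distinction in that README passage can be sketched as a three-way classification over recorded results: a task that failed to verify, a task that verified but exceeded the budget, and a task that is fine. This is only an illustration. The function name `classify_tasks`, the column layout (name, outcome, duration in seconds), and the outcome string `Passed` are all assumptions for the example, not the tool's actual format or logic.

```shell
# Illustrative only: separate "incorrect" from "correct but too expensive".
# Assumed CSV layout (hypothetical): name,outcome,duration_seconds
# Usage: classify_tasks MAX_SECONDS FILE...
classify_tasks() {
  max_seconds="$1"
  shift
  awk -F, -v max="$max_seconds" '
    FNR == 1 { next }                                       # skip header row
    $2 != "Passed" { print $1 ": failed to verify"; next }  # code is not known to be correct
    $3 + 0 > max + 0 { print $1 ": verified, but over budget"; next }
    { print $1 ": ok" }
  ' "$@"
}
```

The middle category is the one the after-the-fact bound is meant to surface: the proof went through, so the code is known to be correct, but the change is still flagged as too costly.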

robin-aws · 2022-03-11T17:54:11Z

> I have a related question. If the report-generator is something customers should always use for CI, then isn’t it better to have it be part of the dafny CLI?

Fair question, and especially with the limited functionality the tool is providing so far, you could imagine it just being yet more options on the dafny CLI. But we expect that it will gain more functionality over time, and the kind of features that can be strongly decoupled from the core functionality of the dafny tool. It's also very useful to record the results of verification tasks once and then analyze in multiple ways after the fact, whereas if you had to invoke dafny repeatedly you could get different non-deterministic results (at least in terms of duration).

For precedent, the .NET platform also has a similar architecture, where data is written out as a result of options like --logger (for test results) and --collect (for things like testing coverage), and other tools are used to report on that data in various ways.
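As a small sketch of "record once, analyze in multiple ways", the two helpers below compute different summaries from the same recorded file without re-running verification. The helper names and the column layout (name, outcome, duration in seconds) are assumptions for the example, not features of dafny-reportgenerator.

```shell
# Two independent analyses over one recorded results file.
# Assumed CSV layout (hypothetical): name,outcome,duration_seconds

# Largest duration across all tasks, in seconds.
max_duration() {
  awk -F, 'FNR > 1 && $3 + 0 > m { m = $3 + 0 } END { print m + 0 }' "$@"
}

# Mean duration across all tasks, in seconds, to two decimal places.
mean_duration() {
  awk -F, 'FNR > 1 { s += $3; n++ } END { if (n) printf "%.2f\n", s / n }' "$@"
}
```

Because both read the same file, their answers describe the same verification run, which would not be guaranteed if each statistic required invoking the verifier again.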

atomb

Looks good. The improvements @cpitclaudel suggested definitely seem nice to include.

keyboardDrummer · 2022-03-11T18:15:16Z

> But we expect that it will gain more functionality over time, and the kind of features that can be strongly decoupled from the core functionality of the dafny tool.

What would those features be? When I think of a feature like detecting proofs with a high verification time variability, that seems like something useful to have in the Boogie CLI.

> It's also very useful to record the results of verification tasks once and then analyze in multiple ways after the fact, whereas if you had to invoke dafny repeatedly you could get different non-deterministic results (at least in terms of duration).

What do you mean by analyse in multiple ways? Do you mean take the measurements from different proof runs and apply different statistics to them? I imagine in the general case it would be fine to have Boogie apply a common statistic after which you wouldn't want to do any further analysis, which you'd use in your CI.

> For precedent, the .NET platform also has a similar architecture, where data is written out as a result of options like --logger (for test results) and --collect (for things like testing coverage), and other tools are used to report on that data in various ways.

I'm all for other tools using Dafny's output, but I think that any Dafny-related features customers should use in their CI should be in the Dafny CLI.

sarahc7 and others added 30 commits September 2, 2021 11:03

Add NatSeq and Io libraries

aed044e

Style changes

59bb5ad

Remove explicit triggers

7f246e0

Add copyright headers

8781a8d

Update LICENSE

c6c0764

Fix copyright headers

13cf6c2

Remove Io

7906775

Change expect file

2c3bd36

Remove {:nativeType} attributes

30a8c34

Name changes

d6a2aaf

Add example with custom small and large widths

6013e6e

More name changes

0c527a2

Pick up Dafny 3.3, add verification results report

0d0649e

See if using an environment allows CI to post results from a fork

6c5d6ec

No luck, try documented workaround

03d2c9e

Merge branch 'master' of github.com:dafny-lang/libraries

caf667a

Merge pull request #1 from robin-aws/verification-logger

1fc5c6d

Dummy verification-breaking change

0d19733

Fix wrong workflow symbol

6460f3d

Revert "Dummy verification-breaking change"

0f2e495

This reverts commit 0d19733.

Merge pull request #2 from robin-aws/test-pr

089c493

Dummy verification-breaking change

Merge branch 'master' of github.com:dafny-lang/libraries into verific…

043a867

…ation-logger

Merge branch 'master' of github.com:robin-aws/libraries into verifica…

e56325b

…tion-logger

Move to Dafny 3.4 and add CSV verification output

b52462d

Merge branch 'master' of github.com:dafny-lang/libraries into verific…

f1c4890

…ation-logger # Conflicts: # .github/workflows/tests.yml

Fix bad quotes

045408a

Trying to fix ** globbing

3be0e51

Avoid NonLinearArithmetic for now

f675930

Ah Sequences is hiding some /noNLArith flags as well

9bc2e6d

Typo

aa1927e

robin-aws self-assigned this Mar 3, 2022

robin-aws added 5 commits March 9, 2022 15:57

Add Dafny verification report

548a9df

Trying to get globbing to work

717d820

Missing flag

f6ad164

Remove dafny option that overrides dafny exit code

5fc4b19

This way there’s no need to explicitly check the output for errors, so we can nuke all the .expect files

Fix jobs (hopefully)

94f9189

robin-aws changed the title ~~Drop lit, add verification logger~~ Add verification logger and report generator Mar 10, 2022

robin-aws added 2 commits March 10, 2022 09:58

Put lit back

27c350c

Set bound of 1 second

ab26e78

robin-aws marked this pull request as ready for review March 10, 2022 19:24

Too aggressive, fall back to 10 seconds for now

519f1c9

robin-aws requested review from RustanLeino and a team March 10, 2022 19:28

atomb reviewed Mar 10, 2022

Fix upload path pattern, fail if nothing matches

a410700

robin-aws changed the title ~~Add verification logger and report generator~~ Add verification loggers and report generator Mar 10, 2022

atomb previously approved these changes Mar 10, 2022

cpitclaudel reviewed Mar 11, 2022

Naming improvement

feedfa8

robin-aws dismissed atomb’s stale review via feedfa8 March 11, 2022 16:50

Update .github/workflows/tests.yml

48340eb

Co-authored-by: Clément Pit-Claudel <[email protected]>

atomb approved these changes Mar 11, 2022

cpitclaudel approved these changes Mar 11, 2022

robin-aws merged commit 44f5891 into master Mar 11, 2022

robin-aws deleted the verification-logger branch March 11, 2022 18:36
