-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add verification loggers and report generator #38
Conversation
This reverts commit 0d19733.
Dummy verification-breaking change
…ation-logger # Conflicts: # .github/workflows/tests.yml
This way there’s no need to explicitly check the output for errors, so we can nuke all the .expect files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this won't work with the current version of Dafny since it can't generate both CSV and TRX reports at the same time. That should be easy to fix, though.
Naive question: why is the bounding done by analyzing the CSV after the fact, rather than just passing |
.github/workflows/test-report.yml
Outdated
@@ -0,0 +1,16 @@ | |||
name: 'Test Report' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test
sounds like a verb here to me. Maybe Generate test report
? Same in the file name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point! This is from the template at https://github.com/dorny/test-reporter so I opened a PR to fix it there too. :)
.github/workflows/test-report.yml
Outdated
- uses: dorny/test-reporter@v1 | ||
with: | ||
artifact: verification-results | ||
name: Verification Tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we using the name "test" here? Isn't it reporting verification metrics? Or is it related to Dafny's :test
somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, I just did a poor job of editing the template. :) I'm changing it to "Verification Results"
.github/workflows/tests.yml
Outdated
run: lit --time-tests -v --param 'dafny_params=/verificationLogger:trx /verificationLogger:csv' . | ||
|
||
- name: Generate Report | ||
run: find . -name '*.csv' | xargs -t dafny-reportgenerator summarize-csv-results --max-duration-seconds 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
run: find . -name '*.csv' | xargs -t dafny-reportgenerator summarize-csv-results --max-duration-seconds 10 | |
run: find . -name '*.csv' -print0 | xargs -0 --verbose dafny-reportgenerator summarize-csv-results --max-duration-seconds 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if there are too many .csv
reports?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'm a bit concerned that will be an issue in the future and cut an issue here: dafny-lang/dafny-reportgenerator#4. The current command length is 1505 so we've got a fair bit of headroom under the 4K limit at least (if it applies to the commands that xargs generates).
I have a related question. If the report-generator is something customers should always use for CI, then isn’t it better to have it be part of the dafny CLI ? |
Co-authored-by: Clément Pit-Claudel <[email protected]>
I tried to address that in the README for the tool here (https://github.com/dafny-lang/dafny-reportgenerator): "This is better than setting a more aggressive verification cost bound through options like /timeLimit directly, as it allows users to know that their code is still correct, but still blocks code changes that are too expensive to verify and hence likely to break in the future." |
Fair question, and especially with the limited functionality the tool is providing so far you could imagine it just being yet more options on the For precedent, the .NET platform also has a similar architecture, where data is written out as a result of options like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. The improvements @cpitclaudel suggested definitely seem nice to include.
What would those features be? When I think of a feature like detecting proofs with a high verification time variability, that seems like something useful to have in the Boogie CLI.
What do you mean by analyse in multiple ways? Do you mean take the measurements from different proof runs and apply different statistics to them? I imagine in the general case it would be fine to have Boogie apply a common statistic after which you wouldn't want to do any further analysis, which you'd use in your CI.
I'm all for other tools using Dafny's output, but I think that any Dafny related features customers should use in their CI should be in the Dafny CLI. |
This change adds a best practice we recommend for all Dafny projects: measuring the cost of each individual verification task and setting an upper bound in CI, to guard against future verification instability (see dafny-lang/dafny#1582).
This is done with two steps:
/verificationLogger:csv
parameter when invokingdafny
, which outputs a CSV file with verification results.dafny-reportgenerator
tool with a configured maximum cost, which will fail the build if any task's cost crosses that threshold.There are more details on the latter tool here: https://github.com/dafny-lang/dafny-reportgenerator
The good news is that (at least after #36), this codebase is in good shape! :) All tasks take less than 5 seconds across a few runs, so I've set 10 seconds as the upper bound for now. I would prefer to bound the resource count instead since that is a more predictable metric, but Dafny doesn't record the resource count when splitting happens until the upcoming 3.5 release.
Note that because the
/verificationLogger
options cause extra output, I've also changed the lit configuration to no longer assert the exact output ofdafny
when verifying everything. This will make the build more stable by not depending on the exact console output, as all of the code here is always expected to verify successfully.By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.