Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Verification diagnostics (core version) #1942

Merged
merged 127 commits into from
May 11, 2022

Conversation

MikaelMayer
Copy link
Member

Real-time verification diagnostics for the language server. Illustration on an example put in the documentation:
image

Also fixes #1918.

Features implemented:

  • Record the real-time verification diagnostics provided by Boogie into the DafnyDocument
  • Send verification diagnostics along with per-line status to the client for display
  • Migrates verification diagnostics on document change
  • Activate/deactivate verification diagnostics

Try this in VSCode IDE yourself:

  1. Checkout the branch verification-diagnostics of Dafny
  2. in dafny/, clone boogie and checkout the commit 6c703b1070fe3c96965289b7cb43c8fc79c5b2e3 (currently two above master)
  3. Compile Dafny. It automatically depends on your installed nested boogie.
  4. Clone verification-diagnostics-gutter from the VSCode extension, run code . in a terminal to open VSCode on it (or just run vSCode and open the ide-vscode folder), and press F5 to run another instance of VSCode.
  5. Ensure in the settings that the paths to DafnyLanguageServer.dll points to the dafny/Binaries/DafnyLanguageServer.dll compiled in step 3.

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

WIP: send intermediate verification results
Added information to publish verification diagnostics on the gutter
All unvisited methods are set to verified by default.
Children are updated once everything finishes.
@keyboardDrummer
Copy link
Member

Personally I find that the API makes more choices about the visuals than I'd like.

It seems that he gutter icons combine information from different dimensions: status of verification pipeline, errorness, reachability of assertions. Personally I have trouble understanding all the combinations, and I think I'd prefer to see these dimensions separately. For example, I wouldn't mind having a single icon show the status of the verification pipeline, possibly on the method header. For the errorness, I don't understand why we need more information than what the diagnostics currently provide. Is that because for postcondition violations, the violated postcondition is in a related location and that doesn't get a red underline by default? For the reachability dimensions I don't understand the use-case well yet, but I guess that's on me.

Would it be possible to keep the different dimensions of information separate in the API and leave the precise choice of how to visualise things to the IDE ?

Something like:

Status of verification pipeline:

  public record FileVerificationStatus(
    DocumentUri Uri,
    int? Version,
    IReadOnlyList<MethodVerificationStatus> Methods) : IRequest;

  public record MethodVerificationStatus(Location Name, Location Body, VerificationStatus Status);

Reachability of assertions:

  public record FileAssertionInformation(
    DocumentUri Uri,
    int? Version,
    IReadOnlyList<AssertionInformation> Assertions) : IRequest;

  public record AssertionInformation(Location Body, bool WasVerified);

For the errorness, I guess that the information contained in ImmediatelyRelatedRanges is already available in the IDE through diagnostics.

For the counterexample trace, I would prefer a pull based API where the IDE has to request the information before it gets it.

/// A verification tree is an abstraction over the code to represent the verification
/// status of a region of the document, useful for IDE verification inspection.
/// A verification tree can contain other child trees.
/// It can currently be rendered linearly, e.g. for gutter display, or used as a tree in a test-like display.
Copy link
Member

@keyboardDrummer keyboardDrummer May 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or used as a tree in a test-like display

Did the scope of this PR increase? For a 1800 line PR, I suggest we only try to cut the scope.

I'm unable to understand what this tree is doing. I feel like it's supposed to provide information for many different features. I'd like to have separate APIs for different features and to be able to verify that each API sends the minimal amount of required information, but with this large record I find it intractable to do that.

@MikaelMayer MikaelMayer requested a review from robin-aws May 3, 2022 20:16
@MikaelMayer
Copy link
Member Author

Personally I find that the API makes more choices about the visuals than I'd like.

Thanks for expressing your personal concern. I'll do my best to address it.

It seems that he gutter icons combine information from different dimensions: status of verification pipeline, errorness, reachability of assertions.

Errorness and reachability of assertions are the same dimension: Error, Verified, or not run.

Personally I have trouble understanding all the combinations, and I think I'd prefer to see these dimensions separately. For example, I wouldn't mind having a single icon show the status of the verification pipeline, possibly on the method header.

The icons on the gutter make it possible to see the verification status of the entire method only with the peripheral vision, which is less constraining than the focus that would be required by a single icon on the method header, icon which is almost always more far away than the gutter (and not viewing horizontally).

For the errorness, I don't understand why we need more information than what the diagnostics currently provide. Is that because for postcondition violations, the violated postcondition is in a related location and that doesn't get a red underline by default?

Yes, that's one point, violated postconditions don't get a red underline by default. And we have no interest in duplicating diagnostics (that would be required to get underlines on two different things).
But moreover, and this was even more important, since diagnostics were published at the very end, all these errors were not available until verification was finished, which was extremely frustrating. This might have changed with your recent incremental diagnostic publication PR, but I haven't checked yet.

For the reachability dimensions I don't understand the use-case well yet, but I guess that's on me.

The way the verifier works is that it will first figure out if an assertion fails, and it will normally be the first one. Then it will assume it, and figure out if other assertions fail, up to 5 times. Hence, if there are less than 5 errors, we can put a green check on all remaining assertions, because they are verified (conditionally to all assertions before them, but I assume users know that). If not, I don't display anything (to represent that the assertion wasn't even checked). Hence, this is the same dimension as the errorness.

Would it be possible to keep the different dimensions of information separate in the API and leave the precise choice of how to visualise things to the IDE ?

Something like:

Status of verification pipeline:

  public record FileVerificationStatus(
    DocumentUri Uri,
    int? Version,
    IReadOnlyList<MethodVerificationStatus> Methods) : IRequest;

  public record MethodVerificationStatus(Location Name, Location Body, VerificationStatus Status);

Reachability of assertions:

  public record FileAssertionInformation(
    DocumentUri Uri,
    int? Version,
    IReadOnlyList<AssertionInformation> Assertions) : IRequest;

  public record AssertionInformation(Location Body, bool WasVerified);

There is also the notion of assertion batch that I want to expose, so the format above seems limited. Did you have in mind two send two records?
Also, since down the road we want the tests to be triggerables, we also need the relationship between methods and assertions, so I don't think we should put them in two different buckets from an API's perspective.

For the errorness, I guess that the information contained in ImmediatelyRelatedRanges is already available in the IDE through diagnostics.

Yes, indirectly. Making it available directly makes it easier for IDE extensions not to reimplement the wheel.

For the counterexample trace, I would prefer a pull based API where the IDE has to request the information before it gets it.

Counterexample trace is another topic to be discussed in this other PR:
#1977

@robin-aws
Copy link
Member

Working on catching up and just offering high-level thoughts for now.

I agree with @keyboardDrummer's difficulty in keeping track of all the different verification statuses, but I don't think it's beneficial to separate dimensions more because they are causally linked. You can't simultaneously have a "currently verifying" pipeline state and a "failed" verification status on the same line.

Instead to simplify things, I'd suggest not keeping around obsolete data. The important thing to communicate is whether verification obligations exist (i.e. the difference between Nothing and Scheduled or any other state), and where they are currently being determined or there is an answer. I don't think there's value in knowing that a line WAS correct/invalid previously if it may or may not change once verification catches up (either automatically or once you save).

I think having a more generic tree of data works better here to a point. Methods are not THAT significant a type here, given that each gets two assertion batches anyway, and there are lots of assertions that don't live in methods. Splitting up per file/document certainly makes sense though.


namespace Microsoft.Dafny.LanguageServer.IntegrationTest.Util;

public class AssertWithDiff {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't you add a dependency from DafnyLanguageServer.Test to XUnitExtensions instead of duplicating code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two projects do not have the same testing framework (one throws an XUnitException, the other one is Microsoft.VisualStudio.TestTools.UnitTesting).
Alternatively, I could refactor both codes, but I'm worried that it could cause trouble to have two different testing frameworks in the same project.

Source/DafnyLanguageServer/README.md Outdated Show resolved Hide resolved
Comment on lines +174 to +176
/// The verification status consists of two orthogonal concepts:
/// - StatusVerification: Nothing (initial), Error, Verified, or Inconclusive
/// - StatusCurrent: Current (Up-to-date), Obsolete (outdated), and Verifying (as notified by the verifier)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these are orthogonal, especially if you agree to drop Obsolete. You just end up with Nothing (initial), Verifying, Error, Verified, or Inconclusive

Copy link
Member Author

@MikaelMayer MikaelMayer May 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's value in knowing that a line WAS correct/invalid previously if it may or may not change once verification catches up (either automatically or once you save).

The problem of removing the previous verification result is that, during the time it verifies (which are a few seconds), what you suggests really means hiding the previous state. Why would you hide the previous state way before you have some more up-to-date information? Users trying to fix an error might be thinking in parallel about the code and the previous verification result in other places which they believe would not have been impacted. Removing this information would result in a frustrating experience for them.

There are precedents about that: We decided to still display previous verification diagnostics (migrate them) although there might be parse/resolution errors. Moreover, in general, diagnostics are "updated" rather than "removed" and then "re-added" a few seconds later.
So, in short, it's something that users would expect and it's desirable.

To illustrate my point, let's consider a CALC statement with four statements, where Q2 is failing but Q3 and Q4 are succeeding. Let's say a user makes an edit on Q2, which makes it verify, but now Q3 does not verify anymore. The trace under this PR would be the following:

[ ][I][S][ ]: calc {
[ ][I][S][ ]:   Q1;
[=][-][~][O]:   Q2;
[O][o][Q][=]:   Q3;
[O][o][Q][O]:   Q4;
[ ][I][S][ ]: }

With your approach of not making statuses orthogonal, I don't think how it would look better than this:

[ ] I  S [ ]: calc {
[ ] I  S [ ]:   Q1;
[=] I  S [O]:   Q2;
[O] I  S [=]:   Q3;
[O] I  S [O]:   Q4;
[ ] I  S [ ]: }

Here are notable differences in the two approaches:

  • In the first version, the switching state from "error" to "verified" is instant on the fourth update (still giving the hint to the user that the error might be obsolete, and then actively re-checked), whereas in the second version, there is an intermediate state that completely hides all the previous information.
  • If I did not pay attention about the verification result between Q3 and Q4 (which can be farther away) because I was focused on Q1 to Q2, during the whole time the verifier runs (a few seconds), I would be blind about having reasons to believe that Q3 is still equals to Q4 (maybe I missed it, and the moment I want to know about whether Q3 == Q4, the verifier hid the previous result).
  • A few seconds of completely removing the previous information would have the effect to remove alignment information for the eye, so instead of seeing if an error has been fixed and the user can move on, the eye will have to start its course again from the line being edited to the left where the gutter is.
  • During the UX development with a UX designer, she agreed with me that it's better to minimize the frequency of full visual changes made to the gutter, because it should be supportive for the eye, not distract it.
  • Ideally, the eye should be able to stay on the code and only use peripheral vision to know the verification status. With this PR, the eye will be able to see an error "going down" with the peripheral vision, and since the user previously saw that this error was on the line being edited, they won't even need to look from right to left to see if the line has an error or not. With your alternative, after few seconds during which that information disappears, the eye will not be able to use peripheral vision to determine if a line verifies or not until it does the actual movement from right to left.

I can go on and on, and also explain you why displaying the error context is important as well, rather than just repeating the verification status on every line. Just ask me.

foreach (var topLevelDecl in module.TopLevelDecls) {
if (topLevelDecl is TopLevelDeclWithMembers topLevelDeclWithMembers) {
foreach (var member in topLevelDeclWithMembers.Members) {
if (member.tok.filename != documentFilePath) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what scenario is this if branch true ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, when we include a module from another file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you document that by extracting the condition into a variable: var memberWasIncluded = member.tok.filename != documentFilePath ?

continue;
}
if (member is Field) {
if (member.BodyEndTok.line == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what scenario is this if branch true?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking deeper, it currently means that there is no body, so the constant does not have a value. See here:
https://github.com/dafny-lang/dafny/blob/verification-diagnostics-core/Source/Dafny/Dafny.atg#L1374
However, the goal of this if was to prevent the GetLspRange next line to raise an exception

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to replace if (member is Field) { if (member.BodyEndTok.line == 0).. with if (member is ConstantField) ?

Alternatively, you could extract this into a variable to explain the condition, like var constantHasNoBody = member.BodyEndTok.line == 0.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since that ConstantFields can have bodies, I'll go the second route.

Copy link
Member

@keyboardDrummer keyboardDrummer May 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since that ConstantFields can have bodies, I'll go the second route.

My impression was that only ConstantFields have bodies and they always have them, so if you check for ConstantField you no longer need another check. Is that not the case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can declare Constant fields without a body. it's just that the compiler won't accept it, but it works for verification purposes, I think.

Source/DafnyLanguageServer/Workspace/TextDocumentLoader.cs Outdated Show resolved Hide resolved
document.Version,
document.VerificationTree.Children.Select(child => child.GetCopyForNotification()).ToArray(),
errors,
linesCount,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this used for? I expect the IDE to know how many lines are in a document.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

linescount is not sent to the IDE, it's only used for internal computation (i.e. creating the array of LineVerificationStatus)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I would prefer it if VerificationStatusGutter was a record with a primary constructor, to make it more clear how much data it contains. However, I don't have good arguments for this so feel to leave it if you prefer that ^,^

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it was a record, indeed, every initialization variable would be public, so that's why I did not use a record.

Copy link
Member

@keyboardDrummer keyboardDrummer May 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that you can also add non-primary constructors to a record with a primary constructor:

record Computed(int TotalLength) {
  Computed(int bodyHeight, int shoeHeight) : this(bodyHeight + shoeHeight) { }
}

But for a DTO I would prefer having the logic that computes the DTO be in another class altogether, like

record GutterIcons(IReadOnlyList<GutterIconEnum> Icons);

class GutterIconsRenderer {
  public GutterIcons Render(..)
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, as a viable alternative, add a static method to GutterIcons that will do this computation and use the primary constructor, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That works

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Or, as a viable alternative, add a static method to GutterIcons that will do this computation and use the primary constructor, right?
  • That works

Implemented here
#2143

Source/DafnyLanguageServer/Workspace/DocumentDatabase.cs Outdated Show resolved Hide resolved
Source/DafnyLanguageServer/Workspace/TextDocumentLoader.cs Outdated Show resolved Hide resolved
Source/DafnyLanguageServer/Workspace/DocumentDatabase.cs Outdated Show resolved Hide resolved
Source/DafnyLanguageServer/Workspace/TextDocumentLoader.cs Outdated Show resolved Hide resolved
@keyboardDrummer
Copy link
Member

keyboardDrummer commented May 9, 2022

Working on catching up and just offering high-level thoughts for now.

I agree with @keyboardDrummer's difficulty in keeping track of all the different verification statuses, but I don't think it's beneficial to separate dimensions more because they are causally linked. You can't simultaneously have a "currently verifying" pipeline state and a "failed" verification status on the same line.

Instead to simplify things, I'd suggest not keeping around obsolete data. The important thing to communicate is whether verification obligations exist (i.e. the difference between Nothing and Scheduled or any other state), and where they are currently being determined or there is an answer. I don't think there's value in knowing that a line WAS correct/invalid previously if it may or may not change once verification catches up (either automatically or once you save).

I think having a more generic tree of data works better here to a point. Methods are not THAT significant a type here, given that each gets two assertion batches anyway, and there are lots of assertions that don't live in methods. Splitting up per file/document certainly makes sense though.

I've discussed this PR with @MikaelMayer and @cpitclaudel and we decided to keep the gutter icon enumeration as is, so that we move towards getting customer feedback on this feature.

We're accepting the risk that we might have to change the API in the future in a backwards incompatible way. Because this is a push API, we can update the API by adding a version number dafny/verification/status/gutterv2, and that means existing IDEs will simply not pick it up anymore while newer IDEs that only listen to dafny/verification/status/gutterv2 will simply not pick up the dafny/verification/status/gutter from older language servers.

@cpitclaudel expressed performance concerns with the added LSP API in the context of Emacs, so we might also change the API later to address that. Since this PR only publishes the new notification when this feature is turned on by the client, we're not worried about performance issues affecting customers that don't use the feature.

@robin-aws
Copy link
Member

I've discussed this PR with @MikaelMayer and @cpitclaudel and we decided to keep the gutter icon enumeration as is, so that we move towards getting customer feedback on this feature.

We're accepting the risk that we might have to change the API in the future in a backwards incompatible way. Because this is a push API, we can update the API by adding a version number dafny/verification/status/gutterv2, and that means existing IDEs will simply not pick it up anymore while newer IDEs that only listen to dafny/verification/status/gutterv2 will simply not pick up the dafny/verification/status/gutter from older language servers.

This sounds like a great compromise to me. :)

@MikaelMayer MikaelMayer merged commit 52f7552 into master May 11, 2022
@MikaelMayer MikaelMayer deleted the verification-diagnostics-core branch May 11, 2022 17:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: enhancement Enhancements, feature requests, etc. These are NOT bugs, but ways we can improve Dafny part: language server Support for LSP in Dafny (server part; client is in ide-vscode repo)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash of the language server for while *
4 participants