Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: Dafny format file.dfy and IDE extension #2399

Merged
merged 368 commits into from
Feb 7, 2023

Conversation

MikaelMayer
Copy link
Member

@MikaelMayer MikaelMayer commented Jul 12, 2022

This PR makes it possible to "format" one or many Dafny files on the command-line, which for now means only changing the indentation of every single line. This is a step towards #1658

Supported syntax:

  • dafny format file1.dfy file2.dfy file3.dfy... Format each file provided in arguments.
  • dafny format --print file1.dfy Format each file but instead of writing files, output them formatted in stdout
  • dafny format --check file1.dfy Gives a summary of what remains to format, with a non-zero error code if not everything is formatted.
  • dafny format . Look up for all the Dafny files recursively in the current repository and format them.

Insights

Coming up with an opinionated formatter wasn't an easy task. We want the guarantee that the reprinted program should be the same as the previous one, modulo whitespace.
Our approach in this PR consists of three phases:

  1. Convert tokens to a double linked list so that we can traverse them from the beginning to the end and store the first token in the program (Scanner.frame, and all AST nodes)
  2. Traverse the program on pre-resolved nodes, and decide on the indentation of each token when it's associated to a declaration, a node, a statement, etc.
  3. Reprint the tokens in their original order by replacing their leading and trailing whitespace by their correctly indented counterpart

Step 3. has been entirely written in Dafny, and offers the interesting guarantee that the final reprinted program will contain all the "val" of every token reachable from the first token provided in input.

What is the indentation of a "token"?

This PR considers that each token has to be associated with 3 types of indentation.

  1. The indentation about comments that are before this token
  2. The indentation of this token itself if it starts a new line
  3. The indentation of comments that are after this token.

Because the token printer will traverse the tokens in order, the indentation 1. is used only for the trivia associated to the token in the leading whitespace of that token, like /* Comment about X */ in the following example, which is in the leading trivia of const

datatype Y :=
   | C
     // Comment about C
   | D
     // Comment about D

/* Comment about X */
const X :=

Note that in this same example, the indentation of Comment about D is possible because of the indentation stored in 3.

The indentation of 2. makes is possible to have the token to be a delimiter of two indented regions, like this:

if X then
  Y
else // Here "else" has its own indentation
  Z

How this PR has been tested?

  • I added the test Formatter.cs in the pipeline with 64 different tests, where almost each test starts from a totally unindented program, verifies the new indent, and reindent it once more to ensure the indentation is idempotent.
  • I added the test FormatterTest.cs in the language server to ensure it behaves as expected.

Rules implemented:

  • Indentation is based on spaces only. Tabs are replaced with spaces. One indentation level consists of 2 spaces

Interesting cases

Assignments

The basic rule is that, after := , there is indentation, except in one case that I found were quite common: Single calls on the RHS don't have extra indentation:

y := functionCall(
  arg1,
  arg2
);

However, if there are multiple calls, there is extra indentation so that we know where this finishes

y := functionCall(
    arg1,
    arg2
  ),
  functionCall2(
    arg3
  );

Moreover, if there is a single RHS but it's on a separate line, it has to be indented.

y :=
  functionCall(
    arg1,
    arg2
  );

And, if the RHS is a closure, it should still be indented,
as the arrow "=>" is not a delimiter of an end of region

x := () reads A
        reads B // We want clauses to be aligned
  => // The arrow should still be indented

If the first right-hand-side is one one line, all right-hand-side are aligned

var y,
    z
  := OneFunctionCall(n),
     OneFunctionCall(n+1);

However, if the first right-hand-side spans multiple lines, the subsequent ones do not need to be aligned, just indented

var y,
    z
  := OneFunctionCall(
    n
  ),
  OneFunctionCall(
    n+1
  );

Comprehensions

Comprehensions like ``imap` make an extra effort to indent their inner expressions when the keyword is alone on its line.

  var a :=
    imap
      t: int ::  t % 3
              == 4
             := 5;

Otherwise, they keep the surrounding indentation. In any case, they align ":=" and "::" as much as possible and indent their arguments, if they are placed at the beginning of the line.

  b := imap
    i: int
    ::
      i % 6 == 7
    :=
      8;

  b := imap
    i: int ::
    i % 6 == 7 :=
    8;

  d := imap i: int
    |  i % 12 == 0
    :: i % 13 == 0
    := 14;

Aligned //

var x := y; // Line 1
            // Line 2

The formatter keeps the alignment of consecutive and already aligned comments starting with //, like the one above.
Adding or removing one space in the indentation of // Line 2 will make the formatter take the default indentation:

var x := y; // Line 1
// Line 2

Adding spaces to avoid changing blocks

module X {
  class A {
      const x := 2;    /* start of comment
These words aligned:      start */
}

Here const needs to be de-indented, so we need to add spaces in front of the /*

Commented cases and datatypes constructors

datatype Main =
  | Child1
    // Comment about child1
  | Child2
  //| ForgottenChild
  | Child4 

match X {
  case 1 =>
     blablah
     //Comment about blablah
  
  // Comment about case2
  case 2 =>
    something
  //case 3 =>
  case 4 =>
}

Normally, comments after a constructor are aligned with this constructor. But above, it's obvious that one comment is not about Child2, but a commented case.
Here we add the rule that comment starting with //| in such cases are indented like the other | around, Same for commented case 3, in the match statement above.

Alignment of binary comparison operators

assert (x
     == y
      < z)

In many places, multi-line equality and comparison align their arguments. The formatter implements this behavior, so even if normally things in parentheses are a single block, there is an extra de-indent for these operators. This works for chaining expressions and calc statements as well.

Else alignment

var z := if x == y then blabla
                   else blibli;
if z == 1 then bloblo
else blublu

Here we have two common cases: if-then-else where the "else" and the "then" are on two different lines, and they have contradictory indentation. We resolved this contradiction by deciding that, if the "then" is aligned with the "else" before, the formatter will keep this alignment no matter what. Otherwise, it will indent the "else" next to the "if"

Case with opening parentheses

match X
case Z => (
  match U
  case P1 =>
  case P2 =>
  )
case W =>           

The formatter assumes that ( always defines a block in which everything is in (meaning, their column is equal or greater to the column of (. But in the case above, it was obvious that the parentheses are there only to disambiguate.
The formatter will not consider that ( creates a full block if it's followed by a newline and it's preceeded by "=>"

Clear disambiguation

The Dafny formatter helps distinguishing otherwise ambiguous programs, such as this one:

method J(xs : List<int>) returns (z : int) {
  match xs
    case Cons(y,ys) =>
      z := y;
    case Nil        =>
      z := 0;

  if {
    case true => z := z;
    case true => z := 1;
  }
}

The Dafny formatter will indent it like this:

method J(xs : List<int>) returns (z : int) {
  match xs
    case Cons(y,ys) =>
      z := y;
    case Nil        =>
      z := 0;

      if {
        case true => z := z;
        case true => z := 1;
      }
}

because the last if is parsed as part of the last case. Of course, it's always better to use the {} for the match statement to avoid these ambiguities, but at least the formatter will help.

Blocks and hints.

Usually, blocks delimited with {} have a prelude, so it makes sense that comments right before a { are indented one extra level.

while X
  invariant X
  // Comment about the loop
{
}

Except in one case: Hints in calc statements

calc {
    A;
    // Comment in the calc statement
    { lemma(); }
== B
}

Expected follow-up PRs:

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

Comment on lines 1 to 11
module {:extern @" Helpers {
public class HelperString {
public static readonly System.Text.RegularExpressions.Regex NewlineRegex =
new System.Text.RegularExpressions.Regex(new string(new char[]
{'(','?','m',')','(','^',')','\\','s','*'}));

public static string Reindent(string input, string newIndent) {
return NewlineRegex.Replace(input, newIndent);
}
}
} /*/"} DefaultCode {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for the record, this is super clever, but reminds me way too much of SQL injection attacks. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clearer about my opinion: I don't actually want to see any real Dafny code using this hack. It just illustrates how under-specified {:extern} is and relies on behavior that could very easily break in the future. You're much better off putting target language source code in target language source code files.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will surely do, until we have something like:

unsafe "csharp" {
  // blablah
}

:-)

@MikaelMayer MikaelMayer force-pushed the feat-add-trivia-next-token-formatter branch from 5491f43 to b16651f Compare July 14, 2022 19:41
Base automatically changed from feat-add-trivia to master July 19, 2022 17:05
@MikaelMayer MikaelMayer changed the title Feat add trivia next token formatter Feat: Dafny format file.dfy and IDE extension Jul 29, 2022
@MikaelMayer MikaelMayer dismissed davidcok’s stale review February 2, 2023 21:56

Because informally David told me he was fine now.

@MikaelMayer MikaelMayer enabled auto-merge (squash) February 2, 2023 21:57
davidcok
davidcok previously approved these changes Feb 2, 2023
@@ -0,0 +1,2 @@
// RUN: %exits-with 2 %baredafny format --stdin 2> "%t"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test should have an exit code of 1 as it is a command-line error.

@@ -12,7 +12,7 @@ Failing2: FAILED
RunAllTestsOption.dfy(29,2): expectation violation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid the changes in RunAllTestsOption.dfy.expect and in contract-wrappers?

Copy link
Member

@keyboardDrummer keyboardDrummer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It builds!! Yay. LGTM except a few changes in expect files

@MikaelMayer MikaelMayer merged commit 6cb5c5f into master Feb 7, 2023
@MikaelMayer MikaelMayer deleted the feat-add-trivia-next-token-formatter branch February 7, 2023 14:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants