Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Comment optimization should not cause automatic-semi insert for single-line block #326

Merged

Conversation

jackschu
Copy link
Contributor

@jackschu jackschu commented Jun 30, 2024

Problem

Today the following piece of code is valid but it is parsed incorrectly:

class C {
  name/*comment*/() {
  }
}

Here's a link to the TypeScript Playground showing that the snippet above is valid JavaScript or TypeScript:

The output of tree-sitter parse is the following:

program [0, 0] - [4, 0]
  class_declaration [0, 0] - [3, 1]
    name: identifier [0, 6] - [0, 7]
    body: class_body [0, 8] - [3, 1]
      member: field_definition [1, 2] - [1, 6]
        property: property_identifier [1, 2] - [1, 6]
      comment [1, 6] - [1, 17]
      member: method_definition [1, 17] - [2, 3]
        name: MISSING property_identifier [1, 17] - [1, 17]
        parameters: formal_parameters [1, 17] - [1, 19]
        body: statement_block [1, 20] - [2, 3]

This is because name/*comment*/() { was being parsed as having an automatic semicolon between name and (), but this isn't right.

This was because there was an optimization that would cause automatic semicolons to be considered valid if a comment was started. This overlooked single line block comments.


Diff summary

Turns out the problem was really nuanced:

There are two callers of scan_whitespace_and_comments,

  1. an optimizing caller who wants to decide that having seen a comment is enough to consider auto-semi-insertion legal.
  2. another who just wants to march the lexer forward maximally.

This PR basically gives the ability for the first of these to be invoked (ie because the automatic semicolon scanner saw a comment) and not result in a decision on semicolon-legality in particularly the case where only a single-line-block-comment is seen.

It does this by

  1. changing scan_whitespace_and_comments to have an enum return value to indicate its three results (no-decision, reject, accept)
  2. giving scan_whitespace_and_comments a consume:false mode where it stops as soon as it pops out of (a series of) block comments

Also added tests to the corpus

@jackschu jackschu changed the title Fix: Comment optimization should not validate automatic semi insert for single-line block comments Fix: Comment optimization should not cause automatic-semi insert for single-line block Jun 30, 2024
jackschu and others added 7 commits June 30, 2024 21:37
…decide that the first comment lookahead is not enough to return true or false, introduce an enum to represent this, also we need to stop our walk forward as soon as the comment block closes if we're going to return control flow to the remainder of the scanner
@amaanq
Copy link
Member

amaanq commented Jul 5, 2024

this is really good - thank you! just pushed a couple formatting changes and typedef'd the enum

@amaanq amaanq merged commit f5af62c into tree-sitter:master Jul 5, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants