Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP][CS2] Block comments in array literals #4541

Closed

Conversation

GeoffreyBooth
Copy link
Collaborator

Closes #4290. Now it’s possible to put multiline comments in array literals, just like was already possible for objects:

arr = [
  a
  ###
    Comment
  ###
  b
]

var arr;

arr = [
  a, /*
    Comment
   */, b
];

As part of this, I redid the grammar for array elements so that they now mirror the grammar for object properties; they were already very similar, but now they match.

The big change is that block comments are now expressions, not statements. This is probably a change worth making on its own merits, as embedded JavaScript blocks are expressions and both tokens should probably behave the same way. The only breaking change that I can see as a result of this is that block comments are no longer ignored in directive prologues (the area where you put use strict). They really shouldn’t be, since if you type a block comment there you’re probably expecting it to appear in the output. I don’t think we should be engaging in special-casing anymore to create exceptions for the use strict directive. If people need it, they can either use modules, use classes, or use the Babel transform that adds it everywhere.

…them parity with embedded JavaScript; no longer prepend a newline to block comments; make sure block comments still are output without a trailing semicolon; support block comments in arrays
…you’ll need to find another solution if you need a comment above ‘use strict’
…onger evaluates into empty braces; this is just a breaking change, though it’s a bit ridiculous why someone would rely on block comments triggering an empty block when regular comments don’t
…put undefined if the if or else blocks are nothing but a block comment expression
Copy link
Collaborator

@lydell lydell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really cool, but currently allows emitting invalid JS:

$ ./bin/coffee -bpe 'a + ###a###'
a + /*a */;

Also, it feels odd that your array example results in [a, undefined, b]

@GeoffreyBooth
Copy link
Collaborator Author

Also, it feels odd that your array example results in [a, undefined, b]

Good catch. Don’t know why arrayEq didn’t catch that, but I’ve fixed it.

As for invalid JS, this PR allows emitting invalid JS the same way backticks do:

$ ./bin/coffee -bpe 'a + `/ 1`'
a + / 1;

I guess the question is, what to do about it. It is what the user typed, so there’s an argument to be made that the compiler should just give them what they asked for. Though I can see the point that embedded JS is assumed to interact with the code around it, and therefore can complete a hanging expression the way a comment can’t.

What would you suggest?

@vendethiel
Copy link
Collaborator

The error is expected, that's literal js.

@GeoffreyBooth
Copy link
Collaborator Author

GeoffreyBooth commented May 7, 2017

I just don’t know what the solution would be. If I change it back to a statement, the comments in arrays get wrapped:

a = [
      (function() {
        /*
          Comment
         */
      })(), 3
    ];

I think ideally we would tell the lexer to just skip over these tokens as if they didn’t exist, similar to what it’s doing with line comments (although it discards those). Then a block comment could go anywhere, and its behavior would mimic how block comments work in JavaScript. I just don’t know if such behavior is possible.

If we can’t figure out a way to achieve that, is it preferable to not allow block comments in arrays (and probably other places) so that the compiler can throw errors? I agree that we should catch a + ###a### if we can, but I’m inclined to think of that as its own issue, not one that necessarily needs to be solved in this PR.

For future reference, here’s the test:

test "Multiline comment not treated as value", ->
  throws -> CoffeeScript.compile 'a + ### Comment ###'

@GeoffreyBooth
Copy link
Collaborator Author

So I thought I had a solution for comments in general:

  1. Currently, inside lexer.coffee commentToken, both line and block comments are parsed. Nothing happens to line comments, and for block comments a HERECOMMENT token is created.

  2. What if instead of discarding the line comment, or creating a token for the block comment, we save the comment data as a property on the previous token? And if there is no previous token, e.g. because we’re at the beginning of the file, we create a TERMINATOR newline token to hold the data (or some other better placeholder).

  3. The tokens get parsed as usual, but just after generating the JavaScript for a token, we check to see if the token has stowaway comment data attached to it. If it does, add some extra JavaScript for the line or block comment.

I said “thought” because while I’ve figured out how to save the comment data to the previous token, like our current spaced or newLine token properties, I can’t seem to figure out how to retrieve it again later. It doesn’t seem accessible in nodes.coffee, I guess because the trip through Jison’s parser.js discards it. Is there some way to preserve it all the way through, like we’re currently doing for location data? Or is location data a special case, supported as understood properties by Jison, and we can’t pass through arbitrary properties?

@jashkenas
Copy link
Owner

Yeah, this particular change is a step backwards — but comment parsing is really tricky in general.

We want them to be invisible to the AST, but at the same time, maintain their position in the AST for later printing.

Your new plan sounds more correct than what we're doing now. It would be a big job, but it would work for both line comments and block comments. If you really want to tackle this change, here's what I'd suggest:

  1. Inside lexer.coffee, line comments and block comments are tokenized. But they're never added to the token stream. Instead, they're pushed onto a .comments array attached to the previous token.

  2. The AST is walked, and the output JavaScript generated as normal.

  3. The AST is walked again by a new function, this time looking only for .comments. The comments are formatted and inserted at the next valid comment location (probably after the next semicolon, before the end of the line.)

  4. (Perhaps the sourcemap location information can be leveraged to make this easier? Or perhaps doing this in a second pass will destroy all of the sourcemapping?)

In any case, this particular PR isn't a great idea. If they're parsed at all, comments are statements, not expressions.

@GeoffreyBooth GeoffreyBooth changed the title [CS2] Block comments in array literals; block comments are now expressions [CS2] Block comments in array literals May 8, 2017
@connec
Copy link
Collaborator

connec commented May 8, 2017

@jashkenas would it make more sense to attach the comments to the next token instead?

I imagine it'd be a bit more fiddly in the lexer (remember comments until you see a non-comment token), but in general comments tend to relate to the following expression(s) rather than the preceding one(s). Having that association would probably enable better output formatting and potentially be useful to intermediate tooling (thinking type annotations etc.).

@jashkenas
Copy link
Owner

As a note -- if we're going to fix block comments in array literals, we need to make sure they don't create holes in the array, or extra commas, etc.

@connec Attaching the comments to the next token would be fine too. I don't think it makes 'em much easier or harder.

What would make this all easier and more informed would be for someone to research how other tools solve this problem. There are probably other JS transpilers out there that do a good job preserving comments.

@GeoffreyBooth GeoffreyBooth changed the title [CS2] Block comments in array literals [WIP][CS2] Block comments in array literals May 9, 2017
@GeoffreyBooth
Copy link
Collaborator Author

At the moment I’m trying to get this to work with block comments back as statements, leaving the larger comments refactor for another PR. I’m struggling to get the grammar to detect what I want. Using this test:

  a = [
    ###
      Comment
    ###
    3
  ]

The debugging line in the rewriter returns these tokens:

IDENTIFIER/a =/= [/[ INDENT/2 HERECOMMENT/
  Comment
 TERMINATOR/
 NUMBER/3 OUTDENT/2 TERMINATOR/
 ]/] TERMINATOR/

And yet this grammar doesn’t seem to detect them:

  Array: [
    o '[ ArgList OptComma ]',               -> new Arr $2
  ]

  ArgList: [
    o 'Arg',                                              -> [$1]
    o 'ArgList , Arg',                                    -> $1.concat $3
    o 'ArgList OptComma TERMINATOR Arg',                  -> $1.concat $4
    o 'INDENT ArgList OptComma OUTDENT',                  -> $2
    o 'Comment TERMINATOR Arg',                           -> $1.concat $3
  ]

This befuddles me. If Comment TERMINATOR Arg is an ArgList, and INDENT ArgList OptComma OUTDENT is an ArgList, then the former inside the latter should also be an ArgList: INDENT Comment TERMINATOR Arg OptComma OUTDENT. And since the OptComma is optional, this matches the token string reported by the rewriter. So why do I get this error?

failed 1 and passed 907 tests in 2.91 seconds

  test/comments.coffee:417:8: error: unexpected newline
    ###
       ^

The line in question is the closing ### of the comment in my test.

@jashkenas
Copy link
Owner

Honestly, I wouldn't bother trying to get this to "work" with statements — it's a hacky model, and a statement cannot logically exist within an array.

If you want to fix this ticket, I'd suggest this:

  • Detect whenever a block comment node is forced to become an expression by function wrapping, and throw an error.

…’re not true statements; remove test that tries to get comments to work in arrays (basically revert back to the beginning)
@GeoffreyBooth
Copy link
Collaborator Author

When block comments are treated as statements, a block comment in an array always throws the unexpected newline error—it doesn’t get past the parser for nodes.coffee to try to compile it. So checking for it to be wrapped in a closure doesn’t do anything, as it never gets that far.

With block comments reverted to statements, there’s not much point to this PR anymore. It doesn’t solve the original problem, and all it does is make the grammar for arrays mirror the grammar for objects; which is something, but probably not enough of an improvement to justify a PR.

@xixixao
Copy link
Contributor

xixixao commented Jun 4, 2017

I'd just add that whether the comments get attached to the next or the previous token is probably an important issue that the proper PR will need to tackle correctly, to make sure the comments end up in the right place after we insert implicit tokens into the stream.

@GeoffreyBooth
Copy link
Collaborator Author

I've made a lot of progress on this, I hope to submit a new PR soon. My approach has been to treat comments as a property of a token, like the location data, and neither inline nor multiline comments become tokens of their own.

@GeoffreyBooth GeoffreyBooth mentioned this pull request Jun 14, 2017
@GeoffreyBooth
Copy link
Collaborator Author

Attempting a new approach in #4572.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants