[WIP][CS2] Block comments in array literals #4541

GeoffreyBooth · 2017-05-07T09:28:47Z

Closes #4290. Now it’s possible to put multiline comments in array literals, just like was already possible for objects:

arr = [
  a
  ###
    Comment
  ###
  b
]

var arr;

arr = [
  a, /*
    Comment
   */, b
];

As part of this, I redid the grammar for array elements so that they now mirror the grammar for object properties; they were already very similar, but now they match.

The big change is that block comments are now expressions, not statements. This is probably a change worth making on its own merits, as embedded JavaScript blocks are expressions and both tokens should probably behave the same way. The only breaking change that I can see as a result of this is that block comments are no longer ignored in directive prologues (the area where you put use strict). They really shouldn’t be, since if you type a block comment there you’re probably expecting it to appear in the output. I don’t think we should be engaging in special-casing anymore to create exceptions for the use strict directive. If people need it, they can either use modules, use classes, or use the Babel transform that adds it everywhere.

…them parity with embedded JavaScript; no longer prepend a newline to block comments; make sure block comments still are output without a trailing semicolon; support block comments in arrays

…you’ll need to find another solution if you need a comment above ‘use strict’

…onger evaluates into empty braces; this is just a breaking change, though it’s a bit ridiculous why someone would rely on block comments triggering an empty block when regular comments don’t

… for block comments in array literals

…put undefined if the if or else blocks are nothing but a block comment expression

lydell

This is really cool, but currently allows emitting invalid JS:

$ ./bin/coffee -bpe 'a + ###a###'
a + /*a */;

Also, it feels odd that your array example results in [a, undefined, b]

GeoffreyBooth · 2017-05-07T19:50:58Z

Also, it feels odd that your array example results in [a, undefined, b]

Good catch. Don’t know why arrayEq didn’t catch that, but I’ve fixed it.

As for invalid JS, this PR allows emitting invalid JS the same way backticks do:

$ ./bin/coffee -bpe 'a + `/ 1`'
a + / 1;

I guess the question is, what to do about it. It is what the user typed, so there’s an argument to be made that the compiler should just give them what they asked for. Though I can see the point that embedded JS is assumed to interact with the code around it, and therefore can complete a hanging expression the way a comment can’t.

What would you suggest?

vendethiel · 2017-05-07T20:21:23Z

The error is expected, that's literal js.

GeoffreyBooth · 2017-05-07T22:40:29Z

I just don’t know what the solution would be. If I change it back to a statement, the comments in arrays get wrapped:

a = [
      (function() {
        /*
          Comment
         */
      })(), 3
    ];

I think ideally we would tell the lexer to just skip over these tokens as if they didn’t exist, similar to what it’s doing with line comments (although it discards those). Then a block comment could go anywhere, and its behavior would mimic how block comments work in JavaScript. I just don’t know if such behavior is possible.

If we can’t figure out a way to achieve that, is it preferable to not allow block comments in arrays (and probably other places) so that the compiler can throw errors? I agree that we should catch a + ###a### if we can, but I’m inclined to think of that as its own issue, not one that necessarily needs to be solved in this PR.

For future reference, here’s the test:

test "Multiline comment not treated as value", ->
  throws -> CoffeeScript.compile 'a + ### Comment ###'

GeoffreyBooth · 2017-05-08T07:38:22Z

So I thought I had a solution for comments in general:

Currently, inside lexer.coffee commentToken, both line and block comments are parsed. Nothing happens to line comments, and for block comments a HERECOMMENT token is created.
What if instead of discarding the line comment, or creating a token for the block comment, we save the comment data as a property on the previous token? And if there is no previous token, e.g. because we’re at the beginning of the file, we create a TERMINATOR newline token to hold the data (or some other better placeholder).
The tokens get parsed as usual, but just after generating the JavaScript for a token, we check to see if the token has stowaway comment data attached to it. If it does, add some extra JavaScript for the line or block comment.

I said “thought” because while I’ve figured out how to save the comment data to the previous token, like our current spaced or newLine token properties, I can’t seem to figure out how to retrieve it again later. It doesn’t seem accessible in nodes.coffee, I guess because the trip through Jison’s parser.js discards it. Is there some way to preserve it all the way through, like we’re currently doing for location data? Or is location data a special case, supported as understood properties by Jison, and we can’t pass through arbitrary properties?

jashkenas · 2017-05-08T14:41:15Z

Yeah, this particular change is a step backwards — but comment parsing is really tricky in general.

We want them to be invisible to the AST, but at the same time, maintain their position in the AST for later printing.

Your new plan sounds more correct than what we're doing now. It would be a big job, but it would work for both line comments and block comments. If you really want to tackle this change, here's what I'd suggest:

Inside lexer.coffee, line comments and block comments are tokenized. But they're never added to the token stream. Instead, they're pushed onto a .comments array attached to the previous token.
The AST is walked, and the output JavaScript generated as normal.
The AST is walked again by a new function, this time looking only for .comments. The comments are formatted and inserted at the next valid comment location (probably after the next semicolon, before the end of the line.)
(Perhaps the sourcemap location information can be leveraged to make this easier? Or perhaps doing this in a second pass will destroy all of the sourcemapping?)

In any case, this particular PR isn't a great idea. If they're parsed at all, comments are statements, not expressions.

…d in closures

connec · 2017-05-08T16:19:49Z

@jashkenas would it make more sense to attach the comments to the next token instead?

I imagine it'd be a bit more fiddly in the lexer (remember comments until you see a non-comment token), but in general comments tend to relate to the following expression(s) rather than the preceding one(s). Having that association would probably enable better output formatting and potentially be useful to intermediate tooling (thinking type annotations etc.).

jashkenas · 2017-05-08T16:56:30Z

As a note -- if we're going to fix block comments in array literals, we need to make sure they don't create holes in the array, or extra commas, etc.

@connec Attaching the comments to the next token would be fine too. I don't think it makes 'em much easier or harder.

What would make this all easier and more informed would be for someone to research how other tools solve this problem. There are probably other JS transpilers out there that do a good job preserving comments.

GeoffreyBooth · 2017-05-10T06:22:10Z

At the moment I’m trying to get this to work with block comments back as statements, leaving the larger comments refactor for another PR. I’m struggling to get the grammar to detect what I want. Using this test:

  a = [
    ###
      Comment
    ###
    3
  ]

The debugging line in the rewriter returns these tokens:

IDENTIFIER/a =/= [/[ INDENT/2 HERECOMMENT/
  Comment
 TERMINATOR/
 NUMBER/3 OUTDENT/2 TERMINATOR/
 ]/] TERMINATOR/

And yet this grammar doesn’t seem to detect them:

  Array: [
    o '[ ArgList OptComma ]',               -> new Arr $2
  ]

  ArgList: [
    o 'Arg',                                              -> [$1]
    o 'ArgList , Arg',                                    -> $1.concat $3
    o 'ArgList OptComma TERMINATOR Arg',                  -> $1.concat $4
    o 'INDENT ArgList OptComma OUTDENT',                  -> $2
    o 'Comment TERMINATOR Arg',                           -> $1.concat $3
  ]

This befuddles me. If Comment TERMINATOR Arg is an ArgList, and INDENT ArgList OptComma OUTDENT is an ArgList, then the former inside the latter should also be an ArgList: INDENT Comment TERMINATOR Arg OptComma OUTDENT. And since the OptComma is optional, this matches the token string reported by the rewriter. So why do I get this error?

failed 1 and passed 907 tests in 2.91 seconds

  test/comments.coffee:417:8: error: unexpected newline
    ###
       ^

The line in question is the closing ### of the comment in my test.

jashkenas · 2017-05-10T13:55:20Z

Honestly, I wouldn't bother trying to get this to "work" with statements — it's a hacky model, and a statement cannot logically exist within an array.

If you want to fix this ticket, I'd suggest this:

Detect whenever a block comment node is forced to become an expression by function wrapping, and throw an error.

…ock-in-array-literal

…’re not true statements; remove test that tries to get comments to work in arrays (basically revert back to the beginning)

GeoffreyBooth · 2017-05-14T23:38:17Z

When block comments are treated as statements, a block comment in an array always throws the unexpected newline error—it doesn’t get past the parser for nodes.coffee to try to compile it. So checking for it to be wrapped in a closure doesn’t do anything, as it never gets that far.

With block comments reverted to statements, there’s not much point to this PR anymore. It doesn’t solve the original problem, and all it does is make the grammar for arrays mirror the grammar for objects; which is something, but probably not enough of an improvement to justify a PR.

xixixao · 2017-06-04T22:05:01Z

I'd just add that whether the comments get attached to the next or the previous token is probably an important issue that the proper PR will need to tackle correctly, to make sure the comments end up in the right place after we insert implicit tokens into the stream.

GeoffreyBooth · 2017-06-04T22:34:59Z

I've made a lot of progress on this, I hope to submit a new PR soon. My approach has been to treat comments as a property of a token, like the location data, and neither inline nor multiline comments become tokens of their own.

GeoffreyBooth · 2017-06-14T09:12:56Z

Attempting a new approach in #4572.

GeoffreyBooth added 9 commits May 6, 2017 16:05

Comments tests should use same toJS function as the other tests

6076b51

Add simple multiline comment in array test

36c8d0e

Array grammar should mirror object grammar

8821642

Multiline/block comments are now expressions, not statements, giving …

89ab2ed

…them parity with embedded JavaScript; no longer prepend a newline to block comments; make sure block comments still are output without a trailing semicolon; support block comments in arrays

Block comments are no longer ignored in the ‘directive prologue’, so …

956cdbf

…you’ll need to find another solution if you need a comment above ‘use strict’

Block comments are no longer statements, so an empty conditional no l…

c1b3b80

…onger evaluates into empty braces; this is just a breaking change, though it’s a bit ridiculous why someone would rely on block comments triggering an empty block when regular comments don’t

Update tests for new output that lacks a preceding newline; add tests…

fc06f2e

… for block comments in array literals

We should keep this test

1569dd4

Now that block comments are expressions, if expressions need to out…

e8286ba

…put undefined if the if or else blocks are nothing but a block comment expression

GeoffreyBooth added change enhancement labels May 7, 2017

GeoffreyBooth added this to the 2.0.0 milestone May 7, 2017

GeoffreyBooth requested a review from lydell May 7, 2017 09:28

lydell requested changes May 7, 2017

View reviewed changes

Block comment in an array shouldn’t be its own element

36f665b

Style/comment cleanup

2137f72

Make block comments statements again; make sure they don’t get wrappe…

abd33d7

…d in closures

GeoffreyBooth changed the title ~~[CS2] Block comments in array literals; block comments are now expressions~~ [CS2] Block comments in array literals May 8, 2017

GeoffreyBooth added 2 commits May 8, 2017 23:10

Add test for invalid JS

cb750a6

Reset output

d05a168

GeoffreyBooth changed the title ~~[CS2] Block comments in array literals~~ [WIP][CS2] Block comments in array literals May 9, 2017

GeoffreyBooth added 3 commits May 14, 2017 12:04

Merge branch '2' of github.com:jashkenas/coffeescript into comment-bl…

4fb39f7

…ock-in-array-literal

Throw an error on comments that would be wrapped in closures, as they…

029d71a

…’re not true statements; remove test that tries to get comments to work in arrays (basically revert back to the beginning)

Merge branch '2' into comment-block-in-array-literal

be549a6

Merge branch '2' into comment-block-in-array-literal

df5e27c

GeoffreyBooth mentioned this pull request May 18, 2017

CS2 Discussion: Project: Progress as of May 2017: 2.0.0-beta2! coffeescript6/discuss#83

Closed

GeoffreyBooth mentioned this pull request Jun 14, 2017

[CS2] Comments #4572

Merged

GeoffreyBooth closed this Jun 14, 2017

GeoffreyBooth deleted the comment-block-in-array-literal branch August 3, 2017 05:19

coffeescriptbot mentioned this pull request Feb 19, 2018

CS2 Discussion: Project: Progress as of May 2017: 2.0.0-beta2! #4976

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP][CS2] Block comments in array literals #4541

[WIP][CS2] Block comments in array literals #4541

GeoffreyBooth commented May 7, 2017

lydell left a comment •

edited

Loading

GeoffreyBooth commented May 7, 2017

vendethiel commented May 7, 2017

GeoffreyBooth commented May 7, 2017 •

edited

Loading

GeoffreyBooth commented May 8, 2017

jashkenas commented May 8, 2017

connec commented May 8, 2017

jashkenas commented May 8, 2017

GeoffreyBooth commented May 10, 2017

jashkenas commented May 10, 2017

GeoffreyBooth commented May 14, 2017

xixixao commented Jun 4, 2017

GeoffreyBooth commented Jun 4, 2017

GeoffreyBooth commented Jun 14, 2017

[WIP][CS2] Block comments in array literals #4541

[WIP][CS2] Block comments in array literals #4541

Conversation

GeoffreyBooth commented May 7, 2017

lydell left a comment • edited Loading

Choose a reason for hiding this comment

GeoffreyBooth commented May 7, 2017

vendethiel commented May 7, 2017

GeoffreyBooth commented May 7, 2017 • edited Loading

GeoffreyBooth commented May 8, 2017

jashkenas commented May 8, 2017

connec commented May 8, 2017

jashkenas commented May 8, 2017

GeoffreyBooth commented May 10, 2017

jashkenas commented May 10, 2017

GeoffreyBooth commented May 14, 2017

xixixao commented Jun 4, 2017

GeoffreyBooth commented Jun 4, 2017

GeoffreyBooth commented Jun 14, 2017

lydell left a comment •

edited

Loading

GeoffreyBooth commented May 7, 2017 •

edited

Loading