Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Triple backticks to allow creation of JavaScript blocks #4357

Merged
merged 9 commits into from
Nov 19, 2016

Conversation

GeoffreyBooth
Copy link
Collaborator

@GeoffreyBooth GeoffreyBooth commented Nov 14, 2016

Closes #1504 and #3760. Per Jeremy’s comment from 2011, this adds support for “heredoc” JavaScript blocks, e.g.:

```
var a = 1;
var b = 2;
```
c = 3
```var d = 4;```
eq a + b + c + d, 10

Escaped backticks are output as just backticks:

```var e = \`hello, world${'!'}\````
eq e, 'hello, world!'

```
// This is a comment with `backticks`
var a = 42;
var b = `foo ${'bar'}`;
Copy link
Contributor

@greghuc greghuc Nov 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no 'eq' for variable b in the ""block inline JavaScript containing backticks" test. Is this correct?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as I didn't want to rely on ES2015 in the 1.x branch. Though I guess I am already for it to parse the b line. Hmm...

@@ -900,7 +907,7 @@ CODE = /^[-=]>/

MULTI_DENT = /^(?:\n[^\n\S]*)+/

JSTOKEN = /^`[^\\`]*(?:\\.[^\\`]*)*`/
JSTOKEN = /^```([\s\S]*?)```|^`[^\\`]*(?:\\.[^\\`]*)*`/
Copy link
Contributor

@greghuc greghuc Nov 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the added regex part as "capture everything between starting and end triple backquotes, stopping at first-seen ending triple quotes", which seems sane.

The original regex is rather obtuse. I read it as "greedily read everything between starting and ending single backquotes, and make sure every backslash has a character following it". I'm thinking the backslash+character part exists to stop an escaped backslash preemptively closing the Javascript block. E.g:

OK: `a\``
Not OK: `a\`

The new triple backticks regex will behave differently:

OK: ```a\````
Also OK: ```a````

I found this website useful: https://regex101.com/

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#3760 is actually complaining about just what you describe: a backslash not followed by a character (in that case, it's followed by a newline). Perhaps the original regex is incorrect here?

What would you use as the regex, if you don't mind taking the time to play with it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parsing this way is inconsistent with heredocs and heregexen, where backslash can be used to escape characters adjacent to the delimeters:

"""\"hello\"""" # '"hello"'
///\/hello\//// # /\/hello\//
```\`hello\```` # ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per my last comment, I’m suggesting we convert \`` to ``` wherever `` appears inside a single- or triple-backtick block. That should cover this case, yes?

@jashkenas
Copy link
Owner

This is great! I'd only suggest adding some complicated tests involving nested levels of backticks / interpolation. That's usually where things get tricky to lex.

@GeoffreyBooth
Copy link
Collaborator Author

@jashkenas more complicated than this?

@lydell
Copy link
Collaborator

lydell commented Nov 14, 2016

Could somebody explain with words:

  • When a ``` JS block ends.
  • When a ````` JS block ends.

Do backslashes inside JS block affect where the block ends?

@jashkenas
Copy link
Owner

jashkenas commented Nov 14, 2016

I'm probably not the expert in the details of it — but I'd imagine that the idea is to be sensible about escaping the backticks in the same way that we try to be sensible about escaping quotes in a heredoc.

So: A JS block ends at the first unescaped and uninterpolated that follows.

So: A JS block ends at the first unescaped and uninterpolated that follows.

@GeoffreyBooth
Copy link
Collaborator Author

@jashkenas What do you mean by uninterpolated? I would’ve answered this way:

  • A JS block ends at the first unescaped that follows.
  • A JS block ends at the first that follows.

@lydell
Copy link
Collaborator

lydell commented Nov 14, 2016

`\`a\n\``

`a\n`;

Is the above correct? Just checking if I understand the rules or not.

@jashkenas
Copy link
Owner

jashkenas commented Nov 14, 2016

@lydell — The above looks correct to me.

@GeoffreyBooth: By uninterpolated, I mean this:

before #{"str`ing"} after

The backtick inside the interpolation doesn't kill the expression halfway. Ideally the same would be true for triple backticks.

Edit: Hah, nevermind. I just tested it and I guess we don't allow interpolations in JS. Forget it.

@GeoffreyBooth
Copy link
Collaborator Author

GeoffreyBooth commented Nov 14, 2016

@jashkenas I wasn’t aware that we supported interpolation between backticks, or wanted to.

@lydell Your example, if saved into a file that is compiled via coffee -bp test.coffee, outputs:

\`a\n\`;

So clearly somehow my “escaping works!” test is flawed. I guess some escaping is happening in the test runner itself? Anyway it appears we’re not escaping backticks (that is, outputting just ``` for ```), we’re just skipping over escaped backticks until we find a non-escaped one to end the token on. I can’t see any benefit to our current behavior.

@jashkenas
Copy link
Owner

@jashkenas I wasn’t aware that we supported interpolation between backticks, or wanted to.

Yes. We don't. My mistake.

@connec
Copy link
Collaborator

connec commented Nov 14, 2016

Might be worth noting that the main problem in the issues you linked was for tagged template literals, which will hopefully soon be merged.

@GeoffreyBooth
Copy link
Collaborator Author

@connec Yes, but this should get fixed anyway. This future-proofs us against any other weird things ECMAScript wants to use backticks for.

I figured out why my test passed when you wouldn’t expect it to: JavaScript converts ``` into ``` wherever it sees it:
image
So the test should compare a string against the expected JavaScript output, not the executed JavaScript.

I’ll update this PR soon, but just so we’re clear on the intended behavior:

  • A JS block ends at the first unescaped that follows. Escaped backticks are converted to simple backticks (i.e. ``` becomes ```).
  • A JS block ends at the first that follows. Escaped single backticks are converted to simple backticks (i.e. \`` becomes ```) so if you really want a triple-backtick in the generated JavaScript, you would write ````.

?

@connec
Copy link
Collaborator

connec commented Nov 14, 2016

That sounds fine to me. I wonder if we can leverage the code paths used to handle heredocs and heregexen? This is an area of the parser I've not looked at.

@GeoffreyBooth
Copy link
Collaborator Author

GeoffreyBooth commented Nov 15, 2016

Okay, it took quite some doing to get `````hello````` to match out to the last backtick, and not one before it (i.e. treating the ``` as not part of the closing triple backticks). @connec or anyone else, can you please check the new regex?

Escaped backticks are now replaced with just backticks in the JavaScript code, so you can write code like this:

`let greeting = \`hello, ${name}\``

I looked into somehow getting dangling backslashes at the end of a line to not stop the capture (the specific bug in #3760) but I ultimately don’t think it’s worth the effort: a backslash before a newline is invalid JavaScript, so it will never compile correctly.

I think this is everything?

@lydell
Copy link
Collaborator

lydell commented Nov 15, 2016

I think I'd approach this problem something like this:

JSTOKEN      = ///^ `(?!``) ((?: [^`\\] | \\[\s\S]           )*) `   ///
HERE_JSTOKEN = ///^ ```     ((?: [^`\\] | \\[\s\S] | `(?!``) )*) ``` ///

test = (chunk) ->
  match = HERE_JSTOKEN.exec(chunk) or JSTOKEN.exec(chunk)
  console.log chunk, match
  if match
    script = match[1].replace(/\\+(`|$)/g, (string) ->
      string[-Math.ceil(string.length / 2)..]
    )
    return script
  return null

console.log([
  test('``````') # == ''
  test('`\\\\`') # == '\\'
  test('`\\\\\\`a`') # == '\\`a'
  test('```\\```') # == null
])

@GeoffreyBooth
Copy link
Collaborator Author

@lydell I'm not sure I follow; aside from more readable regexes, what problems with the PR does your version solve?

@lydell
Copy link
Collaborator

lydell commented Nov 15, 2016

You should be able to output a backslash before a backtick, or at the end. For example \\\``` → `\`;` and \`` → \;. (We should add tests for even crazier numbers of backslashes as well, and tests for invalid stuff like \.)

@GeoffreyBooth
Copy link
Collaborator Author

@lydell I would think that \\\``` is supposed to become `\\`;`, and it does in this PR. Likewise, I think \`` should become \\;, as it does. It’s not our job to escape all JavaScript escaped characters; if the user types `\` between backticks, it should just be output as `\`. That’s the current behavior before this PR, so changing it could break backward compatibility. A backslash is normally escaped in JavaScript, so it’s probably pretty common to find `\` in backticked code.

Technically speaking, converting ``` to ``` is also breaking backward compatibility, but I think this is okay since the failure to do this conversion is more of a bug than a feature. The only time a backtick is ever escaped in JavaScript is within a template literal, but people basically haven’t been able to write those yet in CoffeeScript because of the lack of any way to escape backticks.

@lydell
Copy link
Collaborator

lydell commented Nov 15, 2016

I only meant that backslashes should be handled specially before backticks and at the end of the block (which my code does). In other words, \\ and \n, for example, are untouched. Otherwise you wouldn't be able to express anything inside a JS block (which is the point of this PR, isn't it?). For example, a template literal ending in a backslash: \`a\\\``` → a```

@GeoffreyBooth
Copy link
Collaborator Author

@lydell can you please explain

script = match[1].replace /\\+(`|$)/g, (string) ->
      string[-Math.ceil(string.length / 2)..]

? I replaced my regex with yours and adjusted the code as appropriate, but when I add this new replace function the first JavaScript literal test fails.

return 0 unless @chunk.charAt(0) is '`' and
(match = HERE_JSTOKEN.exec(@chunk) or JSTOKEN.exec(@chunk))
[js] = match
script = if js[0..2] is '```' then js[3...-3] else js[1...-1]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the capturing group instead of this trickery? match[1]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old regex didn’t have a capturing group (see the original code that this PR revises). But yes, match[1] is better.

Can you please explain your replace function? Or rewrite it to itself use a lot less sorcery? What exactly are we trying to replace, and with what, when it comes to escaped backticks?

Copy link
Collaborator

@lydell lydell Nov 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering the raw text inside a JSTOKEN:

\` → `
\\` → can't happen
\\\` → \`
\\\\` → can't happen
\\\\\` → \\`
...

Go from left to right. Take a backslash and the character after it.
Replace with the second of those two. That's the rule.
Or, observe the pattern: We simply end up with the second half
of a \\\\\\\` sequence!

And, for completeness, at the end of the JSTOKEN:

\ → can't happen
\\ → \
\\\ → can't happen
\\\\ → \\
\\\\\ → can't happen
...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So double backslashes are left alone except when followed by an escaped backtick or the end of the JS block. Seems a bit convoluted to try to explain in documentation. Can we come up with a simpler rule?

Currently the PR always converts ``` to a backtick, and that's it. So there's no way to render an escaped backtick inside a template literal. One way to solve that problem is to convert the escaped backticks only in the single-backtick tokens, and leave the triple-backtick ones alone. If some needs an escaped backtick in the JS, they use triple backticks. If someone needs three consecutive backticks in their JS, they use single backticks and escape the triplet.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

@connec connec Nov 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behaviour makes sense to me, I'd describe it as:

A backslash (\) can be used to escape a backtick (```) or another backslash in a JS literal, and will otherwise be passed through.

`hello`                                       # hello;
`\`hello\``                                   # `hello`;
`\`Escaping backticks in JS: \\\`hello\\\`\`` # `Escaping backticks in JS: \`hello\``;
`Single backslash: \ `                        # Single backslash: \ ;
`Single backslash at EOS: \\`                 # Single backslash at EOS: \;
`Double backslash at EOS: \\\\`               # Double backslash at EOS: \\;
`Double backslash: \\\ `                      # Double backslash: \\ ;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we start requiring backslashes to be escaped, that could break existing code. I think it's better to only convert ```, only within single-backtick blocks. That preserves backward compatibility, and gives people a way to output an escaped backtick (via a triple-backtick block).

Copy link
Collaborator

@connec connec Nov 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're only required to be escaped if they appear before a ```, so that shouldn't impact backward compatibility too much.

I think there's hesitation about releasing a syntax that would prevent users from rendering something (in this case, ``` in single-backticks, and ````` in triple-backticks) when there's a familiar, established solution.

Edit: to be clear, I'm not proposing to require that backslashes are escaped, just that you can escape them if you want to render a literal backslash before a backtick.

eq '\\`', `
// Inline JS
"\\\`"
`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With my proposed escaping rules, this test needs to be written like this:

eq '\\`', `
  // Inline JS
  "\\\\\`"
`

or like this:

eq '\\a`', `
  // Inline JS
  "\\a\`"
`

@GeoffreyBooth
Copy link
Collaborator Author

Okay, I’ll concede to the preferences of the crowd. I’ve updated the replace function per @lydell.

@connec, I added your examples as a new test, though I had to escape all the backslashes for them to be parsed correctly as strings. Since that made them hard to understand, I included all the before-and-afters as a big comment just above the test. To double-check it was really working, I created a test.coffee file and pasted your code in there:

`hello`                                       # hello;
`\`hello\``                                   # `hello`;
`\`Escaping backticks in JS: \\\`hello\\\`\`` # `Escaping backticks in JS: \`hello\``;
`Single backslash: \ `                        # Single backslash: \ ;
`Single backslash at EOS: \\`                 # Single backslash at EOS: \;
`Double backslash at EOS: \\\\`               # Double backslash at EOS: \\;
`Double backslash: \\\ `                      # Double backslash: \\ ;

It becomes:

hello;
`hello`;
`Escaping backticks in JS: \`hello\``;
Single backslash: \ ;
Single backslash at EOS: \;
Double backslash at EOS: \\;
Double backslash: \\\ ;

The only difference is the last one: you had it outputting only two backslashes, instead of the three I get. But there’s no backtick following these backslashes, so I don’t think the lexer should be affecting them.

Is everyone happy with this?

@lydell
Copy link
Collaborator

lydell commented Nov 17, 2016

`Double backslash: \\\ `                      # Double backslash: \\ ;

The above is nothing but wrong and misleading.


Do we have any tests for that '\\\n' becomes '\\\n' (in other words, making sure that we don't mess up other backslashes)?

@GeoffreyBooth
Copy link
Collaborator Author

@lydell Other backslashes are unaffected. I added a test.

@connec
Copy link
Collaborator

connec commented Nov 17, 2016

OK, cool.

I would definitely remove the Double backslash: \\\ ; test, or rename it to Triple backslash: \\\ ;, because as @lydell says it's currently misleading - Whoops, looks like that's already changed.

This seems good to me. I'm still slightly dubious about only escaping ```s, but it makes sense that the literal JS should be as close as possible to what's written.

@jashkenas
Copy link
Owner

I don't think that I can mentally expand these regexes well enough to comment on them materially — but if @lydell and @connec sign off on this, then I'm cool with it too.

@GeoffreyBooth
Copy link
Collaborator Author

@lydell and @connec, do you have any other notes or should we merge this in?

@vendethiel
Copy link
Collaborator

LGTM

Copy link
Collaborator

@connec connec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lydell
Copy link
Collaborator

lydell commented Nov 19, 2016

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants