-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Triple backticks to allow creation of JavaScript blocks #4357
Triple backticks to allow creation of JavaScript blocks #4357
Conversation
``` | ||
// This is a comment with `backticks` | ||
var a = 42; | ||
var b = `foo ${'bar'}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no 'eq' for variable b in the ""block inline JavaScript containing backticks" test. Is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, as I didn't want to rely on ES2015 in the 1.x branch. Though I guess I am already for it to parse the b
line. Hmm...
@@ -900,7 +907,7 @@ CODE = /^[-=]>/ | |||
|
|||
MULTI_DENT = /^(?:\n[^\n\S]*)+/ | |||
|
|||
JSTOKEN = /^`[^\\`]*(?:\\.[^\\`]*)*`/ | |||
JSTOKEN = /^```([\s\S]*?)```|^`[^\\`]*(?:\\.[^\\`]*)*`/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read the added regex part as "capture everything between starting and end triple backquotes, stopping at first-seen ending triple quotes", which seems sane.
The original regex is rather obtuse. I read it as "greedily read everything between starting and ending single backquotes, and make sure every backslash has a character following it". I'm thinking the backslash+character part exists to stop an escaped backslash preemptively closing the Javascript block. E.g:
OK: `a\``
Not OK: `a\`
The new triple backticks regex will behave differently:
OK: ```a\````
Also OK: ```a````
I found this website useful: https://regex101.com/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#3760 is actually complaining about just what you describe: a backslash not followed by a character (in that case, it's followed by a newline). Perhaps the original regex is incorrect here?
What would you use as the regex, if you don't mind taking the time to play with it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parsing this way is inconsistent with heredocs and heregexen, where backslash can be used to escape characters adjacent to the delimeters:
"""\"hello\"""" # '"hello"'
///\/hello\//// # /\/hello\//
```\`hello\```` # ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per my last comment, I’m suggesting we convert \`` to ``` wherever
`` appears inside a single- or triple-backtick block. That should cover this case, yes?
This is great! I'd only suggest adding some complicated tests involving nested levels of backticks / interpolation. That's usually where things get tricky to lex. |
@jashkenas more complicated than this? |
Could somebody explain with words:
Do backslashes inside JS block affect where the block ends? |
I'm probably not the expert in the details of it — but I'd imagine that the idea is to be sensible about escaping the backticks in the same way that we try to be sensible about escaping quotes in a heredoc. So: A So: A |
@jashkenas What do you mean by uninterpolated? I would’ve answered this way:
|
`\`a\n\`` ↓ `a\n`; Is the above correct? Just checking if I understand the rules or not. |
@lydell — The above looks correct to me. @GeoffreyBooth: By uninterpolated, I mean this:
The backtick inside the interpolation doesn't kill the expression halfway. Ideally the same would be true for triple backticks. Edit: Hah, nevermind. I just tested it and I guess we don't allow interpolations in JS. Forget it. |
@jashkenas I wasn’t aware that we supported interpolation between backticks, or wanted to. @lydell Your example, if saved into a file that is compiled via \`a\n\`; So clearly somehow my “escaping works!” test is flawed. I guess some escaping is happening in the test runner itself? Anyway it appears we’re not escaping backticks (that is, outputting just ``` for ```), we’re just skipping over escaped backticks until we find a non-escaped one to end the token on. I can’t see any benefit to our current behavior. |
Yes. We don't. My mistake. |
Might be worth noting that the main problem in the issues you linked was for tagged template literals, which will hopefully soon be merged. |
@connec Yes, but this should get fixed anyway. This future-proofs us against any other weird things ECMAScript wants to use backticks for. I figured out why my test passed when you wouldn’t expect it to: JavaScript converts ``` into ``` wherever it sees it: I’ll update this PR soon, but just so we’re clear on the intended behavior:
? |
That sounds fine to me. I wonder if we can leverage the code paths used to handle heredocs and heregexen? This is an area of the parser I've not looked at. |
Okay, it took quite some doing to get `````hello````` to match out to the last backtick, and not one before it (i.e. treating the ``` as not part of the closing triple backticks). @connec or anyone else, can you please check the new regex? Escaped backticks are now replaced with just backticks in the JavaScript code, so you can write code like this:
I looked into somehow getting dangling backslashes at the end of a line to not stop the capture (the specific bug in #3760) but I ultimately don’t think it’s worth the effort: a backslash before a newline is invalid JavaScript, so it will never compile correctly. I think this is everything? |
I think I'd approach this problem something like this: JSTOKEN = ///^ `(?!``) ((?: [^`\\] | \\[\s\S] )*) ` ///
HERE_JSTOKEN = ///^ ``` ((?: [^`\\] | \\[\s\S] | `(?!``) )*) ``` ///
test = (chunk) ->
match = HERE_JSTOKEN.exec(chunk) or JSTOKEN.exec(chunk)
console.log chunk, match
if match
script = match[1].replace(/\\+(`|$)/g, (string) ->
string[-Math.ceil(string.length / 2)..]
)
return script
return null
console.log([
test('``````') # == ''
test('`\\\\`') # == '\\'
test('`\\\\\\`a`') # == '\\`a'
test('```\\```') # == null
]) |
@lydell I'm not sure I follow; aside from more readable regexes, what problems with the PR does your version solve? |
You should be able to output a backslash before a backtick, or at the end. For example |
@lydell I would think that Technically speaking, converting ``` to ``` is also breaking backward compatibility, but I think this is okay since the failure to do this conversion is more of a bug than a feature. The only time a backtick is ever escaped in JavaScript is within a template literal, but people basically haven’t been able to write those yet in CoffeeScript because of the lack of any way to escape backticks. |
I only meant that backslashes should be handled specially before backticks and at the end of the block (which my code does). In other words, |
@lydell can you please explain script = match[1].replace /\\+(`|$)/g, (string) ->
string[-Math.ceil(string.length / 2)..] ? I replaced my regex with yours and adjusted the code as appropriate, but when I add this new |
482bc46
to
45b7193
Compare
45b7193
to
0d5449f
Compare
return 0 unless @chunk.charAt(0) is '`' and | ||
(match = HERE_JSTOKEN.exec(@chunk) or JSTOKEN.exec(@chunk)) | ||
[js] = match | ||
script = if js[0..2] is '```' then js[3...-3] else js[1...-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the capturing group instead of this trickery? match[1]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old regex didn’t have a capturing group (see the original code that this PR revises). But yes, match[1]
is better.
Can you please explain your replace
function? Or rewrite it to itself use a lot less sorcery? What exactly are we trying to replace, and with what, when it comes to escaped backticks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering the raw text inside a JSTOKEN:
\` → `
\\` → can't happen
\\\` → \`
\\\\` → can't happen
\\\\\` → \\`
...
Go from left to right. Take a backslash and the character after it.
Replace with the second of those two. That's the rule.
Or, observe the pattern: We simply end up with the second half
of a \\\\\\\` sequence!
And, for completeness, at the end of the JSTOKEN:
\ → can't happen
\\ → \
\\\ → can't happen
\\\\ → \\
\\\\\ → can't happen
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So double backslashes are left alone except when followed by an escaped backtick or the end of the JS block. Seems a bit convoluted to try to explain in documentation. Can we come up with a simpler rule?
Currently the PR always converts ``` to a backtick, and that's it. So there's no way to render an escaped backtick inside a template literal. One way to solve that problem is to convert the escaped backticks only in the single-backtick tokens, and leave the triple-backtick ones alone. If some needs an escaped backtick in the JS, they use triple backticks. If someone needs three consecutive backticks in their JS, they use single backticks and escape the triplet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://xkcd.com/1638/ I guess
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behaviour makes sense to me, I'd describe it as:
A backslash (
\
) can be used to escape a backtick (```) or another backslash in a JS literal, and will otherwise be passed through.`hello` # hello; `\`hello\`` # `hello`; `\`Escaping backticks in JS: \\\`hello\\\`\`` # `Escaping backticks in JS: \`hello\``; `Single backslash: \ ` # Single backslash: \ ; `Single backslash at EOS: \\` # Single backslash at EOS: \; `Double backslash at EOS: \\\\` # Double backslash at EOS: \\; `Double backslash: \\\ ` # Double backslash: \\ ;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we start requiring backslashes to be escaped, that could break existing code. I think it's better to only convert ```, only within single-backtick blocks. That preserves backward compatibility, and gives people a way to output an escaped backtick (via a triple-backtick block).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're only required to be escaped if they appear before a ```, so that shouldn't impact backward compatibility too much.
I think there's hesitation about releasing a syntax that would prevent users from rendering something (in this case, ``` in single-backticks, and ````` in triple-backticks) when there's a familiar, established solution.
Edit: to be clear, I'm not proposing to require that backslashes are escaped, just that you can escape them if you want to render a literal backslash before a backtick.
eq '\\`', ` | ||
// Inline JS | ||
"\\\`" | ||
` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With my proposed escaping rules, this test needs to be written like this:
eq '\\`', `
// Inline JS
"\\\\\`"
`
or like this:
eq '\\a`', `
// Inline JS
"\\a\`"
`
…e backticks; additional tests
Okay, I’ll concede to the preferences of the crowd. I’ve updated the @connec, I added your examples as a new test, though I had to escape all the backslashes for them to be parsed correctly as strings. Since that made them hard to understand, I included all the before-and-afters as a big comment just above the test. To double-check it was really working, I created a `hello` # hello;
`\`hello\`` # `hello`;
`\`Escaping backticks in JS: \\\`hello\\\`\`` # `Escaping backticks in JS: \`hello\``;
`Single backslash: \ ` # Single backslash: \ ;
`Single backslash at EOS: \\` # Single backslash at EOS: \;
`Double backslash at EOS: \\\\` # Double backslash at EOS: \\;
`Double backslash: \\\ ` # Double backslash: \\ ; It becomes: hello;
`hello`;
`Escaping backticks in JS: \`hello\``;
Single backslash: \ ;
Single backslash at EOS: \;
Double backslash at EOS: \\;
Double backslash: \\\ ; The only difference is the last one: you had it outputting only two backslashes, instead of the three I get. But there’s no backtick following these backslashes, so I don’t think the lexer should be affecting them. Is everyone happy with this? |
The above is nothing but wrong and misleading. Do we have any tests for that |
@lydell Other backslashes are unaffected. I added a test. |
OK, cool.
This seems good to me. I'm still slightly dubious about only escaping ```s, but it makes sense that the literal JS should be as close as possible to what's written. |
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
LGTM |
Closes #1504 and #3760. Per Jeremy’s comment from 2011, this adds support for “heredoc” JavaScript blocks, e.g.:
Escaped backticks are output as just backticks: