syntax: unable to parse heredoc inside backtick #729
Comments
Thanks for filing this detailed issue! Parsing backticks is indeed tricky. There's also #636. In general these edge cases haven't been a huge problem, because backticks have been deprecated for a while and most people use |
By the way, I took a quick look at your code, and I see you're trying to look at what commands people are running in their scripts. You can do this directly with the syntax package, but it's pretty manual as you only have the syntax tree. Have you seen the expand package? For example: https://pkg.go.dev/mvdan.cc/sh/v3/expand#Fields
I had not seen the expand package, but it looks pretty neat. I think it's a little more than we need at the moment since we don't know the environment that these shell files are executing in.
Our parser assumed that a heredoc must always end with a newline. Unfortunately, the following is valid shell:

    `foo <<EOF
    body
    EOF`

Note the lack of a newline before the closing backquote.

The fix is relatively straightforward. The two methods which tokenize heredoc bodies, advanceLitHdoc and quotedHdocWord, must learn to treat (r == '`' && p.backquoteEnd()) as an equivalent to the simpler case (r == '\n').

Note that we also make backquoteEnd more aggressive; right now, it returns true even if we're in a nested quote state. This is required because heredoc bodies use their own quote state, and otherwise we wouldn't realise we're closing a backtick.

This seems like a good change, because backticks are special in shell. They seem to tokenize at a much lower level, which allows for bits of code like the one quoted above, as well as:

    arg0 `# actually an inline comment without a newline!` \
        arg1

Fixes #729.
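To make the idea in the commit message concrete, here is a minimal, standalone sketch of a heredoc body scanner in which a closing backquote ends the delimiter line just as a newline would. It is not the parser's actual code: the function readHeredocBody and its inBackquotes parameter are hypothetical names invented for this illustration, and the sketch ignores escaped backquotes and quoting entirely.

```go
package main

import (
	"fmt"
	"strings"
)

// readHeredocBody is a hypothetical helper, not part of mvdan.cc/sh.
// src is the text that follows the newline after "<<DELIM". It returns the
// heredoc body, the unconsumed remainder, and whether the delimiter was found.
// When inBackquotes is true, a '`' ends the current line just like '\n' does.
func readHeredocBody(src, delim string, inBackquotes bool) (body, rest string, ok bool) {
	var sb strings.Builder
	for {
		// Find where the current line ends: at a newline or, inside
		// backquotes, at the closing backquote if it comes first.
		end := strings.IndexByte(src, '\n')
		closedByBackquote := false
		if inBackquotes {
			if bq := strings.IndexByte(src, '`'); bq >= 0 && (end < 0 || bq < end) {
				end, closedByBackquote = bq, true
			}
		}
		line := src
		if end >= 0 {
			line = src[:end]
		}
		if line == delim {
			switch {
			case end < 0: // delimiter is the last thing in the input
				return sb.String(), "", true
			case closedByBackquote: // leave the '`' for the caller to consume
				return sb.String(), src[end:], true
			default: // skip the newline after the delimiter
				return sb.String(), src[end+1:], true
			}
		}
		if end < 0 || closedByBackquote {
			// Input ran out, or the backquote closed before the delimiter.
			return sb.String(), src, false
		}
		sb.WriteString(line)
		sb.WriteByte('\n')
		src = src[end+1:]
	}
}

func main() {
	// The text after "foo <<EOF\n" in the backquoted example above.
	src := "body\nEOF`"

	// Newline-only rule: "EOF`" is not the delimiter, so the scan fails.
	_, _, ok := readHeredocBody(src, "EOF", false)
	fmt.Println(ok)

	// Backquote-aware rule: "EOF" directly before '`' closes the heredoc.
	body, rest, ok := readHeredocBody(src, "EOF", true)
	fmt.Printf("%q %q %v\n", body, rest, ok)
}
```

The first call shows the old assumption failing on the example, and the second shows the delimiter being recognised while the backquote is left in the remainder for whatever closes the command substitution.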
I've sent #787, which should fix this issue. A review, or a confirmation that it fixes the problem for you, would be welcome :)
I've run scorecard with the latest code in your master branch, and the repo that previously failed to parse no longer produces an error. Thanks Daniel!
First off, I'd like to say thank you for maintaining this project. We're using your syntax parser in ossf/scorecard to parse shell files from thousands of repos. We're getting an error when trying to parse a specific file, and since the syntax is valid bash, I'm opening this issue. I've created a small program that reproduces the error:
Passing this file to shfmt gives this error: