Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

syntax: unable to parse heredoc inside backtick #729

Closed
ristomcgehee opened this issue Sep 15, 2021 · 5 comments · Fixed by #787
Closed

syntax: unable to parse heredoc inside backtick #729

ristomcgehee opened this issue Sep 15, 2021 · 5 comments · Fixed by #787

Comments

@ristomcgehee
Copy link

First off, I'd like to say thank you for maintaining this project. We're using your syntax parser in ossf/scorecard shell files from thousands of repos. We're getting an error when trying to parse a specific file, and since the syntax is valid bash, I'm opening this issue. I've created a small program that reproduces the error:

#!/usr/bin/env bash

OUTPUT=`passwd 2>&1 << EOT
old_password
old_password
new_password
EOT`

echo $OUTPUT

Passing this file to shfmt gives this error:

<standard input>:7:4: reached EOF without closing quote `
@mvdan
Copy link
Owner

mvdan commented Sep 15, 2021

Thanks for filing this detailed issue! Parsing backticks is indeed tricky. There's also #636.

In general these edge cases haven't been a huge problem, because backticks have been deprecated for a while and most people use $(). Still seems worthwhile to try to fix the parser if it's possible, though. Better compatibility is generally a good thing, as long as it doesn't bring significant penalties.

@mvdan
Copy link
Owner

mvdan commented Sep 15, 2021

By the way, I took a quick look at your code, and I see you're trying to look at what commands people are running in their scripts. You can do this directly with the syntax package, but it's pretty manual as you only have the syntax tree. Have you seen the expand package? For example: https://pkg.go.dev/mvdan.cc/sh/v3/expand#Fields

@ristomcgehee
Copy link
Author

I had not seen the expand package, but it looks pretty neat. I think it's a little more than we need at the moment since we don't know the environment that these shell files are executing in.

@mvdan mvdan changed the title Unable to parse heredoc inside backtick syntax: unable to parse heredoc inside backtick Oct 2, 2021
mvdan added a commit that referenced this issue Jan 1, 2022
Our parser assumed that a heredoc must always end with a newline.
Unfortunately, the following is valid shell:

	`foo <<EOF
	body
	EOF`

Note the lack of a newline before the closing backquote.

The fix is relatively straightforward.
The two methods which tokenize heredoc bodies,
advanceLitHdoc and quotedHdocWord,
must learn to treat (r == '`' && p.backquoteEnd())
as an equivalent to the simpler case (r == '\n').

Note that we also make backquoteEnd more aggressive;
right now, it returns true even if we're in a nested quote state.
This is required because heredoc bodies use their own quote state,
and otherwise we wouldn't realise we're closing a backtick.

This seems like a good change, because backticks are special in shell.
They seem to tokenize at a much lower level,
which allows for bits of code like the one quoted above,
as well as:

	arg0 `# actually an inline comment without a newline!` \
		arg1

Fixes #729.
@mvdan
Copy link
Owner

mvdan commented Jan 1, 2022

I've sent #787, which should fix this issue. A review, or a confirmation that it fixes the problem for you, would be welcome :)

@mvdan mvdan closed this as completed in #787 Jan 2, 2022
mvdan added a commit that referenced this issue Jan 2, 2022
Our parser assumed that a heredoc must always end with a newline.
Unfortunately, the following is valid shell:

	`foo <<EOF
	body
	EOF`

Note the lack of a newline before the closing backquote.

The fix is relatively straightforward.
The two methods which tokenize heredoc bodies,
advanceLitHdoc and quotedHdocWord,
must learn to treat (r == '`' && p.backquoteEnd())
as an equivalent to the simpler case (r == '\n').

Note that we also make backquoteEnd more aggressive;
right now, it returns true even if we're in a nested quote state.
This is required because heredoc bodies use their own quote state,
and otherwise we wouldn't realise we're closing a backtick.

This seems like a good change, because backticks are special in shell.
They seem to tokenize at a much lower level,
which allows for bits of code like the one quoted above,
as well as:

	arg0 `# actually an inline comment without a newline!` \
		arg1

Fixes #729.
@ristomcgehee
Copy link
Author

I've ran scorecard with the latest code in your master branch, and we're no longer getting an error on the repo we were getting an error on earlier. Thanks Daniel!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants