scanner: Improve regular expression in "scanner".scanHeredoc(). #245

octo · 2018-04-03T15:11:02Z

This PR fixes two issues when parsing heredoc strings:

The regular expression was not anchored to the beginning of the line, allowing arbitrary garbage in front of the delimiter.
The regular expression did not accept an arbitrary number of carriage returns. Only one optional carriage return (\r) followed by one newline (\n) was treated as a line break. A line ending in multiple carriage returns, e.g. EOF\r\r\n, was therefore not recognized as terminating a heredoc string with delimiter EOF. However, the formatter removes one carriage return, so parsing the formatted output would treat the same line as the end of the heredoc string, resulting in a different interpretation of the input.
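The two issues can be illustrated with a small sketch. The patterns and the function names `looseMatch` and `strictMatch` below are hypothetical, chosen only to mirror the description above; they are not copied from the scanner. Each candidate line is assumed to have already been split off at `\n`.

```go
package main

import (
	"fmt"
	"regexp"
)

// looseMatch is a hypothetical stand-in for the behaviour described above:
// the pattern is not anchored at the start of the line and tolerates at most
// one trailing carriage return.
func looseMatch(line, delim string) bool {
	re := regexp.MustCompile(regexp.QuoteMeta(delim) + `\r?\z`)
	return re.MatchString(line)
}

// strictMatch is a hypothetical stand-in for the improved behaviour:
// anchored at the start of the line, any number of trailing carriage returns.
func strictMatch(line, delim string) bool {
	re := regexp.MustCompile(`^` + regexp.QuoteMeta(delim) + `\r*\z`)
	return re.MatchString(line)
}

func main() {
	for _, line := range []string{"EOF", "EOF\r", "EOF\r\r", "garbage EOF"} {
		fmt.Printf("%q loose=%v strict=%v\n",
			line, looseMatch(line, "EOF"), strictMatch(line, "EOF"))
	}
}
```

Under these assumptions, `"garbage EOF"` is accepted only by the unanchored pattern, and `"EOF\r\r"` is accepted only by the pattern that allows repeated carriage returns.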

In most cases, parsing the formatted output results in a syntax error, but it is fairly easy to create inputs whose semantics change due to formatting. For example, "x=<<_\n_\r\r\ny=1\nz=<<_\n_\n" evaluates to x = <string> initially, but after reformatting it evaluates to x = <string>, y = <int>, z = <string>.
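The shift in the example input can be traced with a rough simulation. Everything here is an assumption made for illustration: `stripOneCR` stands in for the formatter removing one carriage return per line, `firstTerminator` stands in for the scanner's search for the closing delimiter, and the single-`\r` pattern is anchored so that only the carriage-return issue is visible. None of this is the library's code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical patterns for the delimiter "_": at most one trailing \r
// versus any number of trailing \r.
var (
	singleCR = regexp.MustCompile(`^_\r?\z`)
	anyCR    = regexp.MustCompile(`^_\r*\z`)
)

// firstTerminator returns the index of the first line after the opening
// line (index 0) that matches re, or -1 if none does.
func firstTerminator(src string, re *regexp.Regexp) int {
	lines := strings.Split(src, "\n")
	for i := 1; i < len(lines); i++ {
		if re.MatchString(lines[i]) {
			return i
		}
	}
	return -1
}

// stripOneCR removes at most one trailing carriage return from each line,
// as a stand-in for what the formatter is described as doing.
func stripOneCR(src string) string {
	lines := strings.Split(src, "\n")
	for i, l := range lines {
		lines[i] = strings.TrimSuffix(l, "\r")
	}
	return strings.Join(lines, "\n")
}

func main() {
	input := "x=<<_\n_\r\r\ny=1\nz=<<_\n_\n"
	formatted := stripOneCR(input)

	fmt.Println(firstTerminator(input, singleCR))     // 4: heredoc swallows y and z
	fmt.Println(firstTerminator(formatted, singleCR)) // 1: heredoc now ends early
	fmt.Println(firstTerminator(input, anyCR))        // 1
	fmt.Println(firstTerminator(formatted, anyCR))    // 1
}
```

With the single-`\r` rule, the heredoc for x ends on a different line before and after formatting; with the any-number rule, both passes agree.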

Kudos to dvyukov/go-fuzz for finding this!

When there are multiple carriage returns at the end of the line, the regular expression will consider n-1 of them to be part of the string. Later, the last `\r` is removed. That may mean that a line that previously did *not* terminate a heredoc string may now terminate it, changing the meaning of the HCL file.

mitchellh · 2018-04-03T17:00:30Z

Thanks!

octo added 4 commits April 3, 2018 16:16

printer: Add another failing input to TestFormatParsable.

89240c3

scanner: Anchor heredoc-regexes at beginning of line.

13daa63

printer: Add another failing input to TestFormatParsable.

6a21c5a

mitchellh merged commit c247bd0 into hashicorp:master Apr 3, 2018

octo deleted the cartridge-return branch April 3, 2018 17:37

VladRassokhin mentioned this pull request Sep 29, 2018

Heredoc breaks formatting (<<EOF...EOF) VladRassokhin/intellij-hcl#164

Closed

5 tasks

apparentlymart mentioned this pull request Feb 8, 2019

0.12upgrade cannot parse heredoc when final EOF is inline hashicorp/terraform#20271

Closed
