Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ICE on "lexer accepted unterminated literal with trailing slash" #62913

Closed
dwrensha opened this issue Jul 23, 2019 · 0 comments · Fixed by #62917
Closed

ICE on "lexer accepted unterminated literal with trailing slash" #62913

dwrensha opened this issue Jul 23, 2019 · 0 comments · Fixed by #62917
Assignees
Labels
A-parser Area: The parsing of Rust source code to an AST C-bug Category: This is a bug. I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@dwrensha
Copy link
Contributor

dwrensha commented Jul 23, 2019

I'm getting an internal compiler on the following input (found by fuzz-rustc):

"\u\\"
error: incorrect unicode escape sequence
 --> main.rs:1:2
  |
1 | "\u\\"
  |  ^^^ incorrect unicode escape sequence
  |
  = help: format of unicode escape sequences is `\u{...}`

thread 'rustc' panicked at 'lexer accepted unterminated literal with trailing slash', src/libsyntax/parse/unescape_error_reporting.rs:194:13
stack backtrace:
   0: std::panicking::default_hook::{{closure}}
   1: std::panicking::default_hook
   2: rustc::util::common::panic_hook
   3: std::panicking::rust_panic_with_hook
   4: std::panicking::begin_panic
   5: syntax::parse::unescape_error_reporting::emit_unescape_error
   6: syntax::parse::unescape::unescape_str
   7: syntax::parse::lexer::StringReader::try_next_token
   8: syntax::parse::lexer::StringReader::next_token
   9: syntax::parse::lexer::tokentrees::TokenTreesReader::parse_all_token_trees
  10: syntax::parse::lexer::tokentrees::<impl syntax::parse::lexer::StringReader>::into_token_trees
  11: syntax::parse::maybe_file_to_stream
  12: syntax::parse::maybe_source_file_to_parser
  13: syntax::parse::source_file_to_parser
  14: syntax::parse::parse_crate_from_file
  15: rustc_interface::passes::parse::{{closure}}
  16: rustc::util::common::time
  17: rustc_interface::passes::parse
  18: rustc_interface::queries::Query<T>::compute
  19: rustc_interface::queries::<impl rustc_interface::interface::Compiler>::parse
  20: rustc_interface::interface::run_compiler_in_existing_thread_pool
  21: std::thread::local::LocalKey<T>::with
  22: syntax::with_globals
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
query stack during panic:
end of query stack
error: aborting due to previous error


error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports

note: rustc 1.38.0-nightly (e649e9034 2019-07-22) running on x86_64-apple-darwin

I'm seeing the error on nightly, beta, and stable.

@jonas-schievink jonas-schievink added A-parser Area: The parsing of Rust source code to an AST C-bug Category: This is a bug. I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 23, 2019
@estebank estebank self-assigned this Jul 23, 2019
Centril added a commit to Centril/rust that referenced this issue Jul 24, 2019
nnethercote added a commit to nnethercote/rust that referenced this issue Dec 7, 2023
Note that from_token_lit was looking for errors but never finding them!

- issue-62913.rs: The structure and output changed a bit. Issue rust-lang#62913
  was about an ICE due to an unterminated string literal, so the new
  version should be good enough.
- literals-are-validated-before-expansion.rs: this tests exactly the
  behaviour that has been changed.
  XXX: insert a new test covering more of that
nnethercote added a commit to nnethercote/rust that referenced this issue Dec 8, 2023
Note that from_token_lit was looking for errors but never finding them!

- issue-62913.rs: The structure and output changed a bit. Issue rust-lang#62913
  was about an ICE due to an unterminated string literal, so the new
  version should be good enough.

- literals-are-validated-before-expansion.rs: this tests exactly the
  behaviour that has been changed.
  XXX: insert a new test covering more of that

- XXX: explain the tests that needed to be split

- XXX: tests/ui/parser/unicode-control-codepoints.rs: just reordered errors

- XXX: tests/rustdoc-ui/ignore-block-help.rs: relies on a parsing error
  occurring. The error present was an unescaping error, which is now
  delayed to after parsing. So the commit changes it to an "unterminated
  character literal" error which still occurs during parsing.
nnethercote added a commit to nnethercote/rust that referenced this issue Dec 8, 2023
Note that from_token_lit was looking for errors but never finding them!

- issue-62913.rs: The structure and output changed a bit. Issue rust-lang#62913
  was about an ICE due to an unterminated string literal, so the new
  version should be good enough.

- literals-are-validated-before-expansion.rs: this tests exactly the
  behaviour that has been changed.
  XXX: insert a new test covering more of that

- XXX: explain the tests that needed to be split

- XXX: tests/ui/parser/unicode-control-codepoints.rs: just reordered errors

- XXX: tests/rustdoc-ui/ignore-block-help.rs: relies on a parsing error
  occurring. The error present was an unescaping error, which is now
  delayed to after parsing. So the commit changes it to an "unterminated
  character literal" error which still occurs during parsing.
nnethercote added a commit to nnethercote/rust that referenced this issue Dec 11, 2023
Note that from_token_lit was looking for errors but never finding them!

- issue-62913.rs: The structure and output changed a bit. Issue rust-lang#62913
  was about an ICE due to an unterminated string literal, so the new
  version should be good enough.

- literals-are-validated-before-expansion.rs: this tests exactly the
  behaviour that has been changed.
  XXX: insert a new test covering more of that

- XXX: explain the tests that needed to be split

- XXX: tests/ui/parser/unicode-control-codepoints.rs: just reordered errors

- XXX: tests/rustdoc-ui/ignore-block-help.rs: relies on a parsing error
  occurring. The error present was an unescaping error, which is now
  delayed to after parsing. So the commit changes it to an "unterminated
  character literal" error which still occurs during parsing.
nnethercote added a commit to nnethercote/rust that referenced this issue Dec 11, 2023
Currently string literals are unescaped twice.
- Once during lexing in `cook_quoted`/`cook_c_string`/`cook_common`.
  This one just checks for errors.
- Again in `LitKind::from_token_lit`, which is mostly called when
  lowering AST to HIR, but also in a few other places during expansion.
  This one actually constructs the unescaped string. It also has error
  checking code, but that code handling the error cases is actually dead
  (and has several bugs) because the check during lexing catches all
  errors!

This commit removes the checking during lexing, and fixes up
`LitKind::from_token_lit` so it properly does both checking and
construction. This is a language change: some programs now compile that
previously did not. For example, it is now possible for macros to be
passed "invalid" string literals like "\a\b\c". This is a continuation
of a trend of delaying semantic error checking of literals to after
expansion, e.g. rust-lang#102944 did this for some cases for numeric literals,
and the detection of NUL chars in C string literals is already delayed
in this way.

XXX: have Session::report_lit_errors?
XXX: have LitKind::from_token_lit so you don't need the .0?

Things to note:
- `LitError` has a new `EscapeError` variant.
- `LitKind::from_token_lit`'s return value changed, to produce multiple
  errors/warnings, and also to handle lexer warnings. This latter case
  is annoying but necessary to preserve existing warning behaviour.
- `report_lit_error` becomes `report_lit_errors`, in order to handle
  multiple errors in a single string literal.

Notes about test changes:

- `tests/rustdoc-ui/ignore-block-help.rs`: this relies on a parsing
  error occurring. The error present was an unescaping error, which is
  now delayed to after parsing. So the commit changes it to an
  "unterminated character literal" error which continues to occurs
  during parsing.

- Several tests had unescaping errors combined with unterminated literal
  errors. The former are now delayed but the latter remain as lexing
  errors. So the unterminated literal part needed to be split into a
  separate test file otherwise compilation would end before the other
  errors were reported.

- issue-62913.rs: The structure and output changed a bit. Issue rust-lang#62913
  was about an ICE due to an unterminated string literal, so the new
  version should be good enough.

- literals-are-validated-before-expansion.rs: this tests exactly the
  behaviour that has been changed, and so was removed

  XXX: insert a new test covering more of that

- A couple of other test produce the same errors, just in a different
  order.
nnethercote added a commit to nnethercote/rust that referenced this issue Dec 12, 2023
XXX: need more tests

Currently string literals are unescaped twice.

- Once during lexing in `cook_quoted`/`cook_c_string`/`cook_common`.
  This one just checks for errors.

- Again in `LitKind::from_token_lit`, which is called when lowering AST
  to HIR, and also in a few other places during expansion. This one
  actually constructs the unescaped string. It also has error checking
  code, but that part of the code is actually dead (and has several
  bugs) because the check during lexing catches all errors!

This commit removes the error-check-only unescaping during lexing, and
fixes up `LitKind::from_token_lit` so it properly does both checking and
construction. This is a backwards-compatible language change: some
programs now compile that previously did not. For example, it is now
possible for macros to consume "invalid" string literals like "\a\b\c".
This is a continuation of a trend of delaying semantic error checking of
literals to after expansion:
- rust-lang#102944 did this for some cases for numeric literals
- The detection of NUL chars in C string literals is already delayed in
  this way.

Notes about test changes:

- `tests/rustdoc-ui/ignore-block-help.rs`: this requires a parse
  error occurring. The error used was an unescaping error, which is now
  delayed to after parsing. So the commit changes it to an "unterminated
  character literal" error which still occurs during parsing.

- Several tests had unescaping errors combined with unterminated literal
  errors. The former are now delayed but the latter remain as lexing
  errors. So the unterminated literal part needed to be split into a
  separate test file otherwise compilation would end before the other
  errors were reported.

- issue-62913.rs: The structure and output changed a bit. Issue rust-lang#62913
  was about an ICE due to an unterminated string literal, so the new
  version should be good enough.

- literals-are-validated-before-expansion.rs: this tests exactly the
  behaviour that has been changed, and so was removed

- A couple of other test produce the same errors, just in a different
  order.
nnethercote added a commit to nnethercote/rust that referenced this issue Dec 12, 2023
Currently string literals are unescaped twice.

- Once during lexing in `cook_quoted`/`cook_c_string`/`cook_common`.
  This one just checks for errors.

- Again in `LitKind::from_token_lit`, which is called when lowering AST
  to HIR, and also in a few other places during expansion. This one
  actually constructs the unescaped string. It also has error checking
  code, but that part of the code is actually dead (and has several
  bugs) because the check during lexing catches all errors!

This commit removes the error-check-only unescaping during lexing, and
fixes up `LitKind::from_token_lit` so it properly does both checking and
construction. This is a backwards-compatible language change: some
programs now compile that previously did not. For example, it is now
possible for macros to consume "invalid" string literals like "\a\b\c".
This is a continuation of a trend of delaying semantic error checking of
literals to after expansion:
- rust-lang#102944 did this for some cases for numeric literals
- The detection of NUL chars in C string literals is already delayed in
  this way.

Notes about test changes:

- `ignore-block-help.rs`: this requires a parse error for the test to
  work. The error used was an unescaping error, which is now delayed to
  after parsing. So the commit changes it to an "unterminated character
  literal" error which still occurs during parsing.

- `tests/ui/lexer/error-stage.rs`: this shows the newly allowed cases,
  due to delayed literal unescaping.

- Several tests had unescaping errors combined with unterminated literal
  errors. The former are now delayed but the latter remain as lexing
  errors. So the unterminated literal part needed to be split into a
  separate test file otherwise compilation would end before the other
  errors were reported.

- issue-62913.rs: The structure and output changed a bit. Issue rust-lang#62913
  was about an ICE due to an unterminated string literal, so the new
  version should be good enough.

- literals-are-validated-before-expansion.rs: this tests exactly the
  behaviour that has been changed, and so was removed

- A couple of other test produce the same errors, just in a different
  order.
nnethercote added a commit to nnethercote/rust that referenced this issue Dec 12, 2023
Currently string literals are unescaped twice.

- Once during lexing in `cook_quoted`/`cook_c_string`/`cook_common`.
  This one just checks for errors.

- Again in `LitKind::from_token_lit`, which is called when lowering AST
  to HIR, and also in a few other places during expansion. This one
  actually constructs the unescaped string. It also has error checking
  code, but that part of the code is actually dead (and has several
  bugs) because the check during lexing catches all errors!

This commit removes the error-check-only unescaping during lexing, and
fixes up `LitKind::from_token_lit` so it properly does both checking and
construction. This is a backwards-compatible language change: some
programs now compile that previously did not. For example, it is now
possible for macros to consume "invalid" string literals like "\a\b\c".
This is a continuation of a trend of delaying semantic error checking of
literals to after expansion:
- rust-lang#102944 did this for some cases for numeric literals
- The detection of NUL chars in C string literals is already delayed in
  this way.

Notes about test changes:

- `ignore-block-help.rs`: this requires a parse error for the test to
  work. The error used was an unescaping error, which is now delayed to
  after parsing. So the commit changes it to an "unterminated character
  literal" error which still occurs during parsing.

- `tests/ui/lexer/error-stage.rs`: this shows the newly allowed cases,
  due to delayed literal unescaping.

- Several tests had unescaping errors combined with unterminated literal
  errors. The former are now delayed but the latter remain as lexing
  errors. So the unterminated literal part needed to be split into a
  separate test file otherwise compilation would end before the other
  errors were reported.

- issue-62913.rs: The structure and output changed a bit. Issue rust-lang#62913
  was about an ICE due to an unterminated string literal, so the new
  version should be good enough.

- literals-are-validated-before-expansion.rs: this tests exactly the
  behaviour that has been changed, and so was removed

- A couple of other test produce the same errors, just in a different
  order.
nnethercote added a commit to nnethercote/rust that referenced this issue Dec 18, 2023
Currently string literals are unescaped twice.

- Once during lexing in `cook_quoted`/`cook_c_string`/`cook_common`.
  This one just checks for errors.

- Again in `LitKind::from_token_lit`, which is called when lowering AST
  to HIR, and also in a few other places during expansion. This one
  actually constructs the unescaped string. It also has error checking
  code, but that part of the code is actually dead (and has several
  bugs) because the check during lexing catches all errors!

This commit removes the error-check-only unescaping during lexing, and
fixes up `LitKind::from_token_lit` so it properly does both checking and
construction. This is a backwards-compatible language change: some
programs now compile that previously did not. For example, it is now
possible for macros to consume "invalid" string literals like "\a\b\c".
This is a continuation of a trend of delaying semantic error checking of
literals to after expansion:
- rust-lang#102944 did this for some cases for numeric literals
- The detection of NUL chars in C string literals is already delayed in
  this way.

Notes about test changes:

- `ignore-block-help.rs`: this requires a parse error for the test to
  work. The error used was an unescaping error, which is now delayed to
  after parsing. So the commit changes it to an "unterminated character
  literal" error which still occurs during parsing.

- `tests/ui/lexer/error-stage.rs`: this shows the newly allowed cases,
  due to delayed literal unescaping.

- Several tests had unescaping errors combined with unterminated literal
  errors. The former are now delayed but the latter remain as lexing
  errors. So the unterminated literal part needed to be split into a
  separate test file otherwise compilation would end before the other
  errors were reported.

- issue-62913.rs: The structure and output changed a bit. Issue rust-lang#62913
  was about an ICE due to an unterminated string literal, so the new
  version should be good enough.

- literals-are-validated-before-expansion.rs: this tests exactly the
  behaviour that has been changed, and so was removed

- A couple of other test produce the same errors, just in a different
  order.
nnethercote added a commit to nnethercote/rust that referenced this issue Dec 18, 2023
Currently string literals are unescaped twice.

- Once during lexing in `cook_quoted`/`cook_c_string`/`cook_common`.
  This one just checks for errors.

- Again in `LitKind::from_token_lit`, which is called when lowering AST
  to HIR, and also in a few other places during expansion. This one
  actually constructs the unescaped string. It also has error checking
  code, but that part of the code is actually dead (and has several
  bugs) because the check during lexing catches all errors!

This commit removes the error-check-only unescaping during lexing, and
fixes up `LitKind::from_token_lit` so it properly does both checking and
construction. This is a backwards-compatible language change: some
programs now compile that previously did not. For example, it is now
possible for macros to consume "invalid" string literals like "\a\b\c".
This is a continuation of a trend of delaying semantic error checking of
literals to after expansion:
- rust-lang#102944 did this for some cases for numeric literals
- The detection of NUL chars in C string literals is already delayed in
  this way.

Notes about test changes:

- `ignore-block-help.rs`: this requires a parse error for the test to
  work. The error used was an unescaping error, which is now delayed to
  after parsing. So the commit changes it to an "unterminated character
  literal" error which still occurs during parsing.

- `tests/ui/lexer/error-stage.rs`: this shows the newly allowed cases,
  due to delayed literal unescaping.

- Several tests had unescaping errors combined with unterminated literal
  errors. The former are now delayed but the latter remain as lexing
  errors. So the unterminated literal part needed to be split into a
  separate test file otherwise compilation would end before the other
  errors were reported.

- issue-62913.rs: The structure and output changed a bit. Issue rust-lang#62913
  was about an ICE due to an unterminated string literal, so the new
  version should be good enough.

- literals-are-validated-before-expansion.rs: this tests exactly the
  behaviour that has been changed, and so was removed

- A couple of other test produce the same errors, just in a different
  order.
nnethercote added a commit to nnethercote/rust that referenced this issue Dec 19, 2023
Currently string literals are unescaped twice.

- Once during lexing in `cook_quoted`/`cook_c_string`/`cook_common`.
  This one just checks for errors.

- Again in `LitKind::from_token_lit`, which is called when lowering AST
  to HIR, and also in a few other places during expansion. This one
  actually constructs the unescaped string. It also has error checking
  code, but that part of the code is actually dead (and has several
  bugs) because the check during lexing catches all errors!

This commit removes the error-check-only unescaping during lexing, and
fixes up `LitKind::from_token_lit` so it properly does both checking and
construction. This is a backwards-compatible language change: some
programs now compile that previously did not. For example, it is now
possible for macros to consume "invalid" string literals like "\a\b\c".
This is a continuation of a trend of delaying semantic error checking of
literals to after expansion:
- rust-lang#102944 did this for some cases for numeric literals
- The detection of NUL chars in C string literals is already delayed in
  this way.

Notes about test changes:

- `ignore-block-help.rs`: this requires a parse error for the test to
  work. The error used was an unescaping error, which is now delayed to
  after parsing. So the commit changes it to an "unterminated character
  literal" error which still occurs during parsing.

- `tests/ui/lexer/error-stage.rs`: this shows the newly allowed cases,
  due to delayed literal unescaping.

- Several tests had unescaping errors combined with unterminated literal
  errors. The former are now delayed but the latter remain as lexing
  errors. So the unterminated literal part needed to be split into a
  separate test file otherwise compilation would end before the other
  errors were reported.

- issue-62913.rs: The structure and output changed a bit. Issue rust-lang#62913
  was about an ICE due to an unterminated string literal, so the new
  version should be good enough.

- literals-are-validated-before-expansion.rs: this tests exactly the
  behaviour that has been changed, and so was removed

- A couple of other test produce the same errors, just in a different
  order.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-parser Area: The parsing of Rust source code to an AST C-bug Category: This is a bug. I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants