Escaping quote characters in raw string literals causes a tokenizer error #668

Closed
lpetre opened this issue Mar 30, 2022 · 3 comments · Fixed by #701
Labels
bug Something isn't working

Comments

Contributor

lpetre commented Mar 30, 2022

This string breaks the native parser: rf"\"{feature_name.lower()}\""

$ echo 'rf"\"{feature_name.lower()}\""' | LIBCST_PARSER_TYPE=native python -m libcst.tool print - > /dev/null
Traceback (most recent call last):
  File "<string>", line 49, in <module>
  File "<string>", line 47, in __run
  File "lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "libcst/__main__.py", line 25, in <module>
    sys.exit(fbcode_main())
  File "libcst/__main__.py", line 19, in fbcode_main
    return libcst.tool.main(
  File "libcst/tool.py", line 834, in main
    return lookup.get(args.action or None, _invalid_command)(proc_name, command_args)
  File "libcst/tool.py", line 278, in _print_tree_impl
    tree = parse_module(
  File "libcst/_parser/entrypoints.py", line 109, in parse_module
    result = _parse(
  File "libcst/_parser/entrypoints.py", line 55, in _parse
    return parse(source_str)
libcst._exceptions.ParserSyntaxError: Syntax Error @ 0:1.
tokenizer error: unexpected characters after a line continuation

rf"\"{feature_name.lower()}\""
^
$
$ echo 'rf"\"{feature_name.lower()}\""' | python -m libcst.tool print - > /dev/null
$
Contributor Author

lpetre commented Mar 30, 2022

Not sure if this is the same issue, but this also breaks the native parser: f"regexp_like(path, '.*\{file_type}$')"

Contributor Author

lpetre commented Mar 30, 2022

Another example: f"\{{\}}" fails in both libcst parsers, but python3 -m py_compile accepts it.

@lpetre lpetre changed the title rf string parsing bug f-string parsing bugs Mar 30, 2022
@zsol zsol added the bug Something isn't working label Jun 14, 2022
Member

zsol commented Jun 14, 2022

Right, so after digging into this a bit, there are two bugs:

  1. f-strings and escaping {. I opened #699 ("Escaping opening braces in f-strings causes a tokenizer error") and I think I have a fix for that.
  2. the original example in this issue has nothing to do with f-strings; it is a tokenizer bug where the tokenizer doesn't respect backslash-escaped ' and " quotes in raw strings.

Apparently it's possible to write r"\"" in Python, and this string has two characters, a backslash and a double quote.
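
A quick way to check that claim with nothing but the standard library (a minimal sketch; the variable names are just for illustration, and it exercises CPython itself, not libcst):

```python
import ast

# Source text of the literal r"\"" : the five characters r " \ " "
source = 'r"\\""'

# Let CPython evaluate the literal.
value = ast.literal_eval(source)

print(len(value))            # 2
print(value == "\\" + '"')   # True: a backslash followed by a double quote
```

So in a raw string the backslash keeps the following quote from ending the literal, yet the backslash itself stays in the resulting value.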

@zsol zsol changed the title f-string parsing bugs Escaping quote characters in raw string literals causes a tokenizer error Jun 14, 2022
@zsol zsol closed this as completed in #701 Jun 16, 2022
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Sep 16, 2022
0.4.7 - 2022-07-12

Fixed
* Fix get_qualified_names_for matching on prefixes of the given name by @lpetre in Instagram/LibCST#719

Added
* Implement lazy loading mechanism for expensive metadata providers by @Chenguang-Zhu in Instagram/LibCST#720


0.4.6 - 2022-07-04

New Contributors
- @superbobry made their first contribution in Instagram/LibCST#702

Fixed
- convert_type_comments now preserves comments following type comments by @superbobry in Instagram/LibCST#702
- QualifiedNameProvider optimizations
  - Cache the scope name prefix to prevent scope traversal in a tight loop by @lpetre in Instagram/LibCST#708
  - Faster qualified name formatting by @lpetre in Instagram/LibCST#710
  - Prevent unnecessary work in Scope.get_qualified_names_for_ by @lpetre in Instagram/LibCST#709
- Fix parsing of parenthesized empty tuples by @zsol in Instagram/LibCST#712
- Support whitespace after ParamSlash by @zsol in Instagram/LibCST#713
- [parser] bail on deeply nested expressions by @zsol in Instagram/LibCST#718


0.4.5 - 2022-06-17

New Contributors

-   @zzl0 made their first contribution in Instagram/LibCST#704

Fixed

-   Only skip supported escaped characters in f-strings by @zsol in Instagram/LibCST#700
-   Escaping quote characters in raw string literals causes a tokenizer error by @zsol in Instagram/LibCST#668
-   Corrected a code example in the documentation by @zzl0 in Instagram/LibCST#703
-   Handle multiline strings that start with quotes by @zzl0 in Instagram/LibCST#704
-   Fixed a performance regression in libcst.metadata.ScopeProvider by @lpetre in Instagram/LibCST#698


0.4.4 - 2022-06-13

New Contributors

-   @adamchainz made their first contribution in Instagram/LibCST#688

Added

-   Add package links to PyPI by @adamchainz in Instagram/LibCST#688
-   native: add overall benchmark by @zsol in Instagram/LibCST#692
-   Add support for PEP-646 by @zsol in Instagram/LibCST#696

Updated

-   parser: use references instead of smart pointers for Tokens by @zsol in Instagram/LibCST#691