3.10+ support #285

Closed
isidentical opened this issue Apr 18, 2020 · 22 comments · Fixed by #597

Comments

@isidentical
Contributor

isidentical commented Apr 18, 2020

PEP 617 is out (it has not been accepted yet, but it probably will be), and I'm wondering whether LibCST's underlying parser is capable of parsing a PEG grammar. With 3.10, the LL(1) restriction on the grammar will be lifted, which means that lib2to3.pgen2 won't be able to parse new changes to the Python grammar. I'm not sure about the internals of LibCST, but from what I have seen in the README, it uses something based on lib2to3.pgen2. Will LibCST continue to support newer Python versions and their grammar? (We are currently using lib2to3 as our refactoring tool in unimport, but we might need to migrate to another tool to support 3.10+, which is why I am asking.)

@carljm
Contributor

carljm commented Apr 18, 2020

I haven't discussed with the other maintainers yet, but here are some thoughts:

  1. My reading of the PEP is that it specifies that no LL(1)-incompatible change will be made to the language grammar until after 3.10 is released. That gives us some time to adapt and should mean that supporting both 3.9 and 3.10 with the current LibCST parser will be straightforward, so this issue should probably be titled "3.11+ support."

  2. In the long run I think we would want/need to rebase LibCST on top of a PEG parser in order to support Python grammar post-3.10. We will need to see how much of the code of the PEP 617 parser we can reuse for LibCST, given that we need syntactic trivia (whitespace, comments) and Python doesn't.
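To make the trivia point concrete, here is a minimal sketch that uses only the standard library `ast` module (assuming Python 3.9+ for `ast.unparse`). It is not LibCST code; it only illustrates that an AST round trip discards comments and original spacing, which is the information a CST has to keep. The variable names are illustrative.

```python
import ast

# Source with a comment and non-standard spacing (syntactic trivia).
source = "x  =  1  # the answer\n"

# The stdlib AST keeps only the program structure, so regenerating
# code from it drops the comment and normalizes the spacing.
regenerated = ast.unparse(ast.parse(source))

print(repr(source))
print(repr(regenerated))
```

A refactoring tool built on this would rewrite every file it touches, which is why a parser that can be reused for LibCST needs to retain whitespace and comments.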

@carljm
Contributor

carljm commented Apr 18, 2020

Oops, looks like I mis-read the PEP. It seems the old parser will be removed in 3.10 and non-LL(1) constructs may be added then. So we will need to address this in order to support Python 3.10.
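As a rough illustration of what "non-LL(1) constructs" means in practice, the sketch below asks the running interpreter whether it accepts two constructs that were introduced alongside the PEG parser: the `match` statement (added in Python 3.10) and parenthesized context managers. The helper name `parses` and the sample snippets are made up for this example, and it uses only the standard library.

```python
import ast
import sys


def parses(source: str) -> bool:
    """Return True if the running interpreter's parser accepts `source`."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# Structural pattern matching: `match` is a soft keyword, added in Python 3.10.
MATCH_SRC = "match command:\n    case 'go':\n        pass\n"

# Parenthesized context managers: documented as new syntax in Python 3.10.
WITH_SRC = "with (open('a') as f, open('b') as g):\n    pass\n"

print(sys.version_info[:2], "match:", parses(MATCH_SRC), "with:", parses(WITH_SRC))
```

A pgen2-style LL(1) parser cannot be taught these constructs by editing the grammar file alone, which is why supporting them requires a different parser.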

@isidentical
Contributor Author

> In the long run I think we would want/need to rebase LibCST on top of a PEG parser in order to support Python grammar post-3.10. We will need to see how much of the code of the PEP 617 parser we can reuse for LibCST, given that we need syntactic trivia (whitespace, comments) and Python doesn't.

Do you know of a good PEG parser for Python that also constructs a CST? I'm not very familiar with PEG :/

@carljm
Contributor

carljm commented Apr 19, 2020

No, I doubt any such thing exists. We, or someone, would have to build it. I just sent a message to the python-dev thread about PEP 617 inquiring about the potential for reusing some of pegen (the new parser in PEP 617) for this: https://mail.python.org/archives/list/python-dev@python.org/thread/HOZ2RI3FXUEMAT4XAX4UHFN4PKG5J5GR/

@carljm
Contributor

carljm commented Apr 22, 2020

It seems like one option would be to use https://github.com/gvanrossum/pegen as a starting point for a new Python 3.10 compatible parser.

@isidentical
Contributor Author

Yes, that would be a great idea. If there is anything I can help you folks with, just let me know.

@kamahen

kamahen commented Jul 8, 2020

I'm thinking about augmenting the stdlib ast module to have some lib2to3 features (because lib2to3 is going away). If this is of interest to you:
https://mail.python.org/archives/list/[email protected]/thread/X2HJ6I6XLIGRZDB27HRHIVQC3RXNZAY4/
(No promises that I'll do the work of course, but I have high hopes)

@sobolevn

sobolevn commented Mar 4, 2021

This issue is a blocker for us:

  • We are trying to integrate libcst into wemake-python-styleguide which supports all versions from 3.6
  • We even have a PR ready: Issue #1140 wemake-services/wemake-python-styleguide#1147
  • But, since 3.9+ is not supported, we have to reconsider this idea
  • The same is true for several other projects of mine which potentially could use libcst for code auto-formatting 😞

Are there any updates? Is there anything I can help with?

@zsol
Member

zsol commented May 25, 2021

No updates yet, apart from some of us hacking around in our spare time to get LibCST onto a PEG parser. I'll update here as soon as work starts on this in earnest. I'm hoping for good news in a month or so.

@zsol
Member

zsol commented Jun 25, 2021

Good news! :) I'm working on this as part of my main job now. The current status is: in my branch there's a Rust implementation of a PEG parser for a small part of the Python grammar, but complete with whitespace handling (thanks to the awesome work on whitespace parsing and tokenization by @bgw). It can successfully roundtrip simple functions like this without any extra string allocations (the built CST shares memory with the input source string), and it seems to be real fast (it will be interesting to see if this holds as I add more and more of the grammar).

I've opted for implementing the heavy lifting in Rust, and then exposing the resulting CST as Python objects as part of a relatively high-level interface that will be compatible with https://github.com/Instagram/LibCST/blob/master/libcst/_parser/entrypoints.py
I'm hoping this will let us get significant speedups in both CPU and memory usage while being memory-safe.

There's still lots of work to do, for example:

  • better error messages
  • comprehensive testing
  • more grammar
  • wrap the API to be usable from Python

I'll keep updating this issue as I go along

@zsol
Member

zsol commented Aug 7, 2021

Here's the current status of the rewrite: I think I have 99% of the Python 3.8 grammar implemented, and the parser can roundtrip (parse -> serialize losslessly; i.e. the input bytes are the same as the output bytes) all of the LibCST Python implementation (i.e. this repo). This is a big milestone, but there's still some work to be done. But first, some details in case you want to play around:

The code still lives in https://github.com/zsol/LibCST/tree/parser/native, where you should be able to run cargo build to get a debug build. This will put a binary called parser (parser.exe on Windows) in target/debug/ which

  • accepts python code on stdin, then parses it into a CST and serializes it back onto stdout
  • optionally (when given the -d flag) dumps the internal CST representation before the generated source - this should look familiar if you've used python -m libcst.tool print before
  • if the first argument is -n it doesn't output anything (except parse errors), just parses the input
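The roundtrip property described above (parse, serialize, compare with the input) can be checked over a set of files with a small harness. The sketch below is an assumption-laden illustration, not part of the branch: `find_roundtrip_failures` and `libcst_roundtrip` are hypothetical helper names, and the demo at the bottom uses stand-in functions instead of a real parser so that it runs with the standard library alone.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def find_roundtrip_failures(
    paths: Iterable[Path], roundtrip: Callable[[str], str]
) -> List[Path]:
    """Return the files whose source changes after parse -> serialize."""
    failures = []
    for path in paths:
        # newline="" keeps the original line endings intact.
        with open(path, encoding="utf-8", newline="") as f:
            source = f.read()
        if roundtrip(source) != source:
            failures.append(path)
    return failures


def libcst_roundtrip(source: str) -> str:
    """Roundtrip through LibCST (third-party; imported lazily, not used in the demo)."""
    import libcst

    return libcst.parse_module(source).code


# Demo with stand-ins: an identity function is lossless, while a
# whitespace-collapsing function is lossy and must be reported.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.py"
    sample.write_text("def f( a,b ):  # odd spacing\n    return a+b\n", encoding="utf-8")
    lossless = find_roundtrip_failures([sample], lambda s: s)
    lossy = find_roundtrip_failures([sample], lambda s: s.replace("  ", " "))

print(len(lossless), len(lossy))
```

With LibCST installed, passing `libcst_roundtrip` and something like `Path("libcst").rglob("*.py")` would approximate the "roundtrip this repo" check.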

Here's what I know still needs to be done:

  • wrap the API in a thin python layer to be usable from LibCST
  • hunt down bugs in the grammar by parsing all the python source code ever
  • implement missing grammar features like match statement, walrus, or parenthesis around with statements
  • there's a bug with whitespace parsing around the end of multiple nested indented blocks; this results in duplicate whitespace in certain situations - but not for any of the LibCST source code apparently
  • figure out how to make a LibCST release that can switch between the new and old parser backends
  • CI for the rust code (@lpetre is helping with this)

And then on top of this it'd be nice to look into:

  • Perf optimizations. I haven't done any :)
  • Make the parser error-recoverable; that is, for a syntactically invalid Python document, try and produce something that resembles a CST and is usable. This would be great for interactive use cases like IDE linting. (@manav-a is helping with this)
  • Improve the parser error messages. They are really basic at the moment, just outputting expected characters at the furthest parsed position.
    • There has been some great work in CPython recently around this front, and we should be able to port most of it over easily, if rust-peg can be enhanced to support early termination

@lpetre
Contributor

lpetre commented Aug 10, 2021

> CI for the rust code

I've set up an example using GitHub Actions here: lpetre@1089e28

Sample run: https://github.com/lpetre/LibCST/runs/3293881094

@Luttik

Luttik commented Dec 5, 2021

Hi, what is the advice for libraries that depend on this project? Can we expect 3.10 support in the near future, or should people start looking for alternatives? It seems like Black has solved this with their lib2to3 fork.

@zsol
Member

zsol commented Dec 5, 2021

I plan to release a version with opt-in 3.10 support by the end of the year. If that goes well, a new release in January will have 3.10 support by default.

@zsol
Member

zsol commented Dec 16, 2021

Progress update: I have a working CI job that produces binary wheels for LibCST with the new rust-based parser for any combination of python (3.6, 3.7, 3.8, 3.9, 3.10) & (macos x86_64, macos arm64, 32bit linux, 64bit linux, 32bit windows, 64bit windows).

Example artifacts at https://github.com/zsol/LibCST/suites/4682738347/artifacts/127571774 (it's a zipfile with wheels in it).
You can pip install the relevant wheel from the zip on the above platforms and run

LIBCST_PARSER_TYPE=native python -c 'import libcst; print(libcst.parse_module("foo(a, b)"))'

To see the new parser in action. Note: match statement is not implemented yet in these.
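The same opt-in can be done from inside a Python process. The sketch below is one possible way to do it, assuming that LibCST consults `LIBCST_PARSER_TYPE` no earlier than when it is imported or asked to parse (the thread does not spell this out). `parser_type` and `parse_with_native` are hypothetical helper names; only `libcst.parse_module` comes from the command above.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def parser_type(value: str) -> Iterator[None]:
    """Temporarily set LIBCST_PARSER_TYPE, restoring the old value on exit."""
    previous = os.environ.get("LIBCST_PARSER_TYPE")
    os.environ["LIBCST_PARSER_TYPE"] = value
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop("LIBCST_PARSER_TYPE", None)
        else:
            os.environ["LIBCST_PARSER_TYPE"] = previous


def parse_with_native(source: str):
    """Parse `source` with the native parser selected (requires libcst)."""
    with parser_type("native"):
        # Third-party import, done lazily so the variable is already set.
        import libcst

        return libcst.parse_module(source)
```

With one of the wheels installed, `print(parse_with_native("foo(a, b)"))` should be equivalent to the shell one-liner. Setting the variable in the shell remains the simpler option when the whole process should use the new parser.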

@zsol
Member

zsol commented Dec 17, 2021

#566 proposes to merge the Rust-based parser. After merging I'll follow up with (much simpler) PRs to implement match statement, and parenthesized context managers.

@zsol zsol linked a pull request Jan 12, 2022 that will close this issue
@zsol zsol closed this as completed in #597 Jan 12, 2022
@FFY00

FFY00 commented Jan 12, 2022

Seems like a README update is in order! 🎉

@zsol
Member

zsol commented Jan 12, 2022

Note that for 3.10 (and 3.11) support, you still need to set the LIBCST_PARSER_TYPE=native env var for now. We'll be removing this restriction in a future release (soon (tm)).

@Zac-HD
Contributor

Zac-HD commented Apr 29, 2023

  • It's been a while - any plan for when the native parser will become the default?

  • I've discovered a fair few bugs while using it, and could take a day to do a more thorough search if there are resources to fix them. What's the maintenance status of libcst, and is this worth trying?

@tusharsadhwani

  • Unsure when the Rust parser will become the default, but export LIBCST_PARSER_TYPE=native is ok for now.
  • I haven't faced any bugs in practice with libcst on the latest release. Can you mention what bugs? Have you created github issues for them?

@Zac-HD
Contributor

Zac-HD commented Apr 29, 2023

@zsol
Member

zsol commented May 25, 2023

#929 is landing in a moment, after which I'll release it with a major version bump 🎉

@zsol zsol unpinned this issue May 25, 2023
10 participants