Libraries and "Standardization" #25

sigmavirus24 · 2018-04-10T11:22:36Z

Hi all,

We recently had PyCQA/pydocstyle#307 and PyCQA/pycodestyle#739 opened on two projects looking for a new feature that already exists in pylint. It got me thinking. We're a loose coalition of folks with the same mission building similar tools and with more-or-less the same audience. Is there value in us producing some amount of documentation (a little or a lot, I don't really mind either way) around best practices for command-line options, and other behaviours? Should we be aggressively creating libraries to help each other deal with things like this? For example, pycodestyle, pydocstyle, and Flake8 all handle # noqa and they all do so a little differently. Should that be a distinct library or should it be part of a bigger one? Are the subtle differences there fine? Pylint has a far more advanced way of disabling checks, should we all be moving to that? Do people think this conversation is even worth having?

cc @PyCQA/pylint-dev @PyCQA/pydocstyle-dev @PyCQA/pep8-dev

The text was updated successfully, but these errors were encountered:

AWhetter · 2018-04-10T20:57:01Z

Collaborating more seems like a great idea to me. I think that the tools that we're making are used together so often that striving for some shared documentation and consistency across the tools makes sense to me. Plus, as you say, we're aiming for the same audience.

One thing to include is that we could give a comparison of each tool. What is their aim and what are they better at compared to the other tools? This would help users think about which tools they want to use for their purpose, and would also help us, as developers, to think more carefully about "is this feature appropriate for this tool" and "is this feature better suited in another tool"?

There's a certain amount of overlap in what each tool provides (in terms of the messages that they output at least) and either reducing that or at least documenting it seems like a good idea to me. If someone is using a combination of PyCQA tools, they'll want to know what messages to turn off so that they aren't getting duplicates.

sigmavirus24 · 2018-04-10T22:58:59Z

I think it would also be helpful for us to share more general learnings too. For example, I was talking to @ambv about black configuration and shared the pain that has been managing the precedence/merging/discovery of files for pycodestyle and flake8. We could generate advice to new PyCQA (or PyCQA-alike) projects to restrain themselves to 1 project file that they read to manage things.

It's also looking like the community is moving towards tooling being configured in pyproject.toml. As a community of tools, how quickly should we adopt that for our own configuration? Does that mean we can drop the older files we were supporting?

PCManticore · 2018-04-17T15:53:02Z

This definitely sounds like a good idea, and would enforce more the concept of PyCQA as a community rather than a loose group with tangentially related interests on code quality.

I also agree with Ashley on having a comparison for each tool, so that the users could pick the right tool for the job. I also have a theory that overall it would be better for Python to just bring everything together under the same umbrella, with the user deciding what to needs from a code quality tool, be it formatting checks or more advanced checks. We have now both flake8 and prospector, and there's also pylint which does some additional stuff, couldn't just we have one code quality tool to rally behind for the Python world? Like the aforementioned tool, in my vision I imagine a third one (or maybe just prospector), from which an user could request either formatting (be it black or just what flake8 currently does), or either to find subtle bugs (pylint) or to refactor some files (redbaron) or some other category of code quality assurances that I'm missing.

For something actionable, we could decide as a group what libraries do we actually want to reuse? At least from pylint's perspective, some sort of tree library that could work on both Python 2 and 3 would be something that we want, and if it had refactoring support, that would also be amazing, as we could provide automated refactoring for error messages. As it is right now, using the builtin ast limits us in this regard. The same question put in different would be what things from pylint you might find useful as a library, that you can reuse in other projects?

sigmavirus24 · 2018-04-18T01:47:26Z

This definitely sounds like a good idea, and would enforce more the concept of PyCQA as a community rather than a loose group with tangentially related interests on code quality.

Right. I think I was hesitant to encourage that to start with because OpenStack started that way and became a clumsy, lumbering group with a lot of politics. I wanted us to avoid those politics but in reality, we share a lot of things anyway.

I also agree with Ashley on having a comparison for each tool, so that the users could pick the right tool for the job.

I agree. Something like compare.pycqa.org would be interesting. We should start a separate discussion of that to figure out how we want to compare things.

I also have a theory that overall it would be better for Python to just bring everything together under the same umbrella, with the user deciding what to needs from a code quality tool, be it formatting checks or more advanced checks.

I agree. I've been thinking about that. I've also been thinking about the fact that Python suffers from the fact that we don't have a tool like gofmt that is the standard. If we could all band together, possibly even with @ambv, we could do that.

For something actionable, we could decide as a group what libraries do we actually want to reuse? At least from pylint's perspective, some sort of tree library that could work on both Python 2 and 3 would be something that we want, and if it had refactoring support, that would also be amazing, as we could provide automated refactoring for error messages.

By tree, you mean something to provise the AST right? I think it's hard to find a good one. There's jedi, ast, baron, and redbaron but the one that seems to work best w/r/t refactoring is lib2to3 which I've heard mixed reviews about.

As it is right now, using the builtin ast limits us in this regard.

It also limits whether Flake8 would be able to refactor/reformat comments and other things that aren't included in an AST by default

The same question put in different would be what things from pylint you might find useful as a library, that you can reuse in other projects?

That's a good question. While Flake8 runs pylint against itself, I don't franky know pylint well enough to know what it could use.

The-Compiler · 2018-04-18T05:27:53Z

Especially for the AST part, also see the discussion at davidhalter/jedi#630 - i.e. parso which was pulled out from jedi by @davidhalter.

sigmavirus24 · 2018-04-18T13:27:25Z

I'd be happy to adopt that in Flake8. It would possibly require some effort to make PyFlakes happy using that as an AST though.

davidhalter · 2018-04-19T08:16:45Z

By tree, you mean something to provise the AST right? I think it's hard to find a good one. There's jedi, ast, baron, and redbaron but the one that seems to work best w/r/t refactoring is lib2to3 which I've heard mixed reviews about.

parso was pulled out of jedi and is now a separate project. The parser is a fork of lib2to3 with most of the parser generator's code base still matching lib2to3's parser. I guess the idea was to give it a better API and some more features. The only thing that's lacking there at the moment is that it's a bit annoying to refactor comments. Since whitespace/comments are preserved, it's of course possible, but not as convenient as it could be.

parso is pretty much the only tool around that works with refactoring and Python 2 & 3 support (it's debatable though if Python 2 support is still that important at this point for code analysis tools in the future).

parso is by the way also able to show you multiple syntax errors in your files. This could come in really handy for something like pyflakes. The only real issue I see with it is that it's significantly slower than ast, because it's written in Python not C.

ambv · 2018-04-19T15:28:51Z

Do you keep a list of changes from lib2to3? Are you still looking at development that happens on lib2to3?

Black also has a vendored version of lib2to3 but I only do that because my patches to lib2to3 often will only be present in 3.6.6 or 3.7.0 or sometimes go straight to 3.8 which puts them years away from users' hands.

But Black does upstream all patches. Currently we have a few things that aren't yet but only since they don't apply cleanly so it's a bit more work (like Unicode identifiers support, eh?).

So it seems basing off of lib2to3 is a common thing that we could standardize. I would encourage you to not deviate from from the version in cpython master too much so that you can at least accept upstream changes to the grammar and the tokenizer. I hate it when third-party parsers are unable to parse latest bits of Python grammar because the authors weren't aware of them.

As for me, I am planning to document lib2to3 for Python 3.7.0 so that parts of its API at least will become "stable" and more than just a 2to3 implementation detail. I'll be maintaining lib2to3 in the standard library going forward. Would be great for parso to upstream some of its improvements and start tracking lib2to3's updates to the grammar and the tokenizer.

@sigmavirus24, lib2to3's pytree is a CST, not an AST. While you can totally write a linter on top of it, it's going to be more annoying for people to use because low-level implementation details bleed through to the tree. You will see naked grammar's nodes and not the abstractions that an AST gives you. This might or might not be worthwhile.

On the one hand, using this CST will unbundle the version of Python you are able to analyze from the version of Python flake8 runs within, which was always my biggest complaint. You will also be able to warn on things related to dangerous usage of commas (like one-tuples without parentheses or missing commas in multiline lists of strings).

On the other hand, this is not backwards compatible. The tree is way more verbose. The parser is slower. pyflakes would need to change. Plugins would need to change. Authors would for sure cry out for docs.

davidhalter · 2018-04-19T23:14:27Z

Do you keep a list of changes from lib2to3? Are you still looking at development that happens on lib2to3?

No I don't. It was a hard fork. However the parser generator was written by Guido and is pretty much finished. It's not that I copied all of lib2to3. I just copied pgen2. That's the part that doesn't need changes.

The only big change is probably how I parse f-strings (however if you guys don't like it we can fall back to the old way). But then again this was not scope for lib2to3, f-strings are simply tokens and only relevant in the Python 3 part. Also lib2to3 is using a kind of weird grammar that allows both Python 2 and Python 3 code to be parsed IIRC. parso either parses Python 2.7 code or one of the Python 3 releases.

Another thing that was added to parso is error recovery. This is necessary for completions and other IDE stuff. It's pretty neat, you can turn it on or off and it's not really a modification of pgen2.

I would encourage you to not deviate from from the version in cpython master too much so that you can at least accept upstream changes to the grammar and the tokenizer.

Well I think that has already happened. I think the only changes that are a bit hard to port are the tokenizer changes. But then again, I don't really care, the changes there are usually so minimal and only happen every 1.5 years (or even less). parso still uses pretty much the original grammar files so we're fine there.

Would be great for parso to upstream some of its improvements and start tracking lib2to3's updates to the grammar and the tokenizer.

The problem is probably especially the tokenizer. parso's tokenizer allows tokenizing different Python versions in one version. I have removed the potential errors that the tokenizer generates. It's now all error tokens and there's no exceptions anymore. I think this is way better and Python should probably also move into that direction. Have a look here: https://github.com/davidhalter/parso/blob/fa0bf4951c329a2ebdd60dbeabe070ce24f4d272/parso/python/tokenize.py (I'm intentionally showing you an older version, to avoid the f-string shenanigans parso's now doing there).

ambv · 2018-04-20T23:52:03Z

The only big change is probably how I parse f-strings

In fact, I would prefer f-strings to be proper nodes in lib2to3. I made them a stupid string prefix to bandaid parse failures after 3.6's release. For a CST it would make a lot of sense to return proper nodes for formatted values within f-strings. This is unfortunately too late for Python 3.7 but would be a worthwhile contribution for 3.8.

All in all, I'm pretty impressed with Parso and some of its behavior should find its way to lib2to3. Given a higher-level "AST-like" API over it will be a good match for Flake8 and enable writing back.

davidhalter · 2018-04-22T17:06:53Z

In fact, I would prefer f-strings to be proper nodes in lib2to3. I made them a stupid string prefix to bandaid parse failures after 3.6's release. For a CST it would make a lot of sense to return proper nodes for formatted values within f-strings. This is unfortunately too late for Python 3.7 but would be a worthwhile contribution for 3.8.

100% agreed. The ast.c implementation is just suboptimal. I was thinking about contributing this back to CPython. I'm just a bit scared of patching the C code of the tokenizer. It's quite a big modification (being more aware of the context). Do you think a PEP would be needed or is this a modification that could just happen?

All in all, I'm pretty impressed with Parso and some of its behavior should find its way to lib2to3.

Thanks! I really hope we can get some changes back. IMO I especially would like to replace the tokenizer with parso's, but that might be a bit difficult considering backwards compatibility and stuff. I have had quite a few discussions with people using the tokenizer that were unhappy with both the "StringIO.readline" API and the errors raised in some rare cases when code was invalid.

So consider this an offer. I'm happy to contribute things back, I'm just not really well connected with Core Devs and then there's also my "own" projects that use quite a bit of my time. We could also have a quick call.

ambv · 2018-04-22T17:25:32Z

Do you think a PEP would be needed or is this a modification that could just happen?

No PEP needed, just a bugs.python.org issue. Please consider doing it! I'll help you with the implementation and review. I patched the current C implementation's lineno, column reporting so I'm aware it's a big hack. What I meant also is that lib2to3 implements f-strings simply as a string prefix. At the moment the tree doesn't include formatted values at all. Replacing this behavior with what you're doing in Parso would make a lot of sense for Python 3.8.

@davidhalter, speaking of contributing the new tokenizer wholesale, yeah, it's going to be problematic due to backwards compatibility and that would require a bit more discussion. I know you're in Switzerland but will you be at PyCon US by any chance? In any case, I'm @llanga on Twitter, let's schedule further collaboration there. I hijacked this issue enough (sorry @sigmavirus24!).

sigmavirus24 · 2018-04-22T19:38:57Z

I'd be happy to create a AST/CST PyCQA team so you could take advantage of GitHub discussions if you'd all like or you could take this to [email protected]

sigmavirus24 mentioned this issue Sep 9, 2020

How do projects join? #38

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Libraries and "Standardization" #25

Libraries and "Standardization" #25

sigmavirus24 commented Apr 10, 2018

AWhetter commented Apr 10, 2018

sigmavirus24 commented Apr 10, 2018

PCManticore commented Apr 17, 2018

sigmavirus24 commented Apr 18, 2018

The-Compiler commented Apr 18, 2018 •

edited

Loading

sigmavirus24 commented Apr 18, 2018

davidhalter commented Apr 19, 2018

ambv commented Apr 19, 2018

davidhalter commented Apr 19, 2018

ambv commented Apr 20, 2018

davidhalter commented Apr 22, 2018

ambv commented Apr 22, 2018

sigmavirus24 commented Apr 22, 2018

Libraries and "Standardization" #25

Libraries and "Standardization" #25

Comments

sigmavirus24 commented Apr 10, 2018

AWhetter commented Apr 10, 2018

sigmavirus24 commented Apr 10, 2018

PCManticore commented Apr 17, 2018

sigmavirus24 commented Apr 18, 2018

The-Compiler commented Apr 18, 2018 • edited Loading

sigmavirus24 commented Apr 18, 2018

davidhalter commented Apr 19, 2018

ambv commented Apr 19, 2018

davidhalter commented Apr 19, 2018

ambv commented Apr 20, 2018

davidhalter commented Apr 22, 2018

ambv commented Apr 22, 2018

sigmavirus24 commented Apr 22, 2018

The-Compiler commented Apr 18, 2018 •

edited

Loading