Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make regrtest collect test line coverage using sys.monitoring #110722

Open
ambv opened this issue Oct 11, 2023 · 17 comments
Open

Make regrtest collect test line coverage using sys.monitoring #110722

ambv opened this issue Oct 11, 2023 · 17 comments
Assignees
Labels
tests Tests in the Lib/test dir type-feature A feature request or enhancement

Comments

@ambv
Copy link
Contributor

ambv commented Oct 11, 2023

Feature or enhancement

Currently, there is no good way to collect test line coverage of the standard library.

python -m test -T

python -m test -T is using sys.settrace through the trace module, which slows things down significantly. Moreover, it is set relatively late, after a ton of standard library modules were already imported, making collecting full coverage from them hard. What's more, this way of collecting coverage doesn't work with -j, and we want to switch to -j0 by default in the test runner since the battle for tests to be fully idempotent is impossible.

coverage.py?

We had a hack in coverage.py called "fullcoverage", which mimicked the encodings module to install early, but it was recently removed because it didn't work in 3.13 and we haven't been tracking coverage in CI since 2021.

Use sys.monitoring

I want to use PEP 669 for this because it's faster and works across threads, which will become relevant with PEP 703. The idea is to gather partial coverage in each regrtest worker subprocess and then combine the results through the main process when tests are done.

Set up monitoring before site.py

To gather coverage, we need to register a callback for the line event with sys.monitoring. Moreover, to get reliable data for this, this gathering needs to start very early in the interpreter, before site.py executed, as that triggers a large amount of standard library imports.

This will need a small addition to the interpreter environment, an environment variable called PYTHONPRESITE, which takes a module name. Setting it makes the interpreter attempt to import this module before site.py and the __main__ module are created. To allow for input/output, the module is imported after encodings were loaded, stdout/stderr/stdin have been set, and open() was exposed. The module is not considered __main__, and stays imported after loading, making it possible for tooling loaded later to use data gathered by it.

Why even gather coverage?

I don't plan to beat people with a stick because line coverage fell from 81% to 79%.

There's been a lot of churn lately in terms of test refactors, and it's not entirely clear while reviewing those changes whether we are still running the same checks as before. In effect, I don't feel very confident approving such large changes with rearranged TestCase base classes. In those cases, I run coverage.py with some hacks to confirm that we're not losing coverage with the change, but ultimately this is slow, painful to do, and still generates data that isn't trustworthy.

An easy way to get simple test line coverage stats without installing anything would solve my issue, and hopefully, more people would then use it.

Linked PRs

@ambv ambv added type-feature A feature request or enhancement tests Tests in the Lib/test dir labels Oct 11, 2023
@ambv ambv self-assigned this Oct 11, 2023
@nedbat
Copy link
Member

nedbat commented Oct 11, 2023

Is there a possibility to use coverage.py with PEP 669 support, or are these independent concerns?

@ambv
Copy link
Contributor Author

ambv commented Oct 11, 2023

Thanks for reaching out, @nedbat. Currently the way I see it is that it will be a little awkward to make coverage.py support our unique use case.

For this usage internal to CPython, we would need coverage.py to support PYTHONPRESITE when that's implemented. That env variable is supposed to be official support for what Brett's encoding hack was trying to achieve in the past in fullcoverage. Maybe this environment variable will be generally useful for coverage.py too, but I guess most people now run with coverage run, which circumvents the import ordering issue for everyone except for the standard library itself, because coverage.py obviously imports from it. However, for CPython to be able to use a hook provided by coverage.py, it would have to be a minimal hook. Something like coverage.process_startup() won't work because that's part of the coverage package, which:

  • imports sub-packages of coverage from python3.x/site-packages/, which won't be available yet;
  • imports standard libraries, which we want to avoid.

The other thing I need to support here is support for partial coverage gathering from worker processes, and combining results after the fact. regrtest is our custom runner that doesn't use multiprocessing, so coverage.py won't support that natively unless we add that. In that case, the two implementations are again tied together, which I'm not certain will work out long-term.

I thought that maybe it will be easier for both sides if we meet in the middle and I make regrtest generate files that coverage.py can read and combine. Now I see the SQLite schema is documented but this includes the sentence:

The exact meaning of the bytes in the blobs should be considered an implementation detail that might change in the future.

I don't want to set this format in stone just for internal CPython use's sake.

Correct me if I'm wrong here, but my current conclusion is that the use case of gathering coverage for user code is quite different from gathering coverage for the standard library itself.

@nedbat
Copy link
Member

nedbat commented Oct 11, 2023

Thanks for thinking this through, and digging into coverage.py details. The SQLite schema wasn't designed for others to create, but the CoverageData object was. Meeting in the middle might mean writing your own file in whatever format is convenient, then as an after-step reading those files and using CoverageData to create coverage.py-compatible files.

@gaogaotiantian
Copy link
Member

It would be nice to have a webpage like codecov.io to show coverage of the stdlib at least for main branch. There are new contributors that want to start with improving test coverage but don't know where to start. It's also helpful for library maintainers to take a quick look.

Speaking of which, this might conflict with the existing trace/profile/monitoring tests.

@terryjreedy
Copy link
Member

terryjreedy commented Oct 11, 2023

I don't plan to beat people with a stick because line coverage fell from 81% to 79%.

Changing in the relative coverage is only properly interpreted by itself, without looking at the line-by-line coverage report, if the denominator is unchanged, which is to say, if the PR only changes tests and not substantive code. But in that special case, any decrease is bad* because the substantive coverage must have gone down. And one could argue that uncovering some lines while covering other is not great even if the total number covered has a net increase.

*EDIT An exception would be if a flaky test with false failures is disabled. I would call this an unfortunate necessity.

The problem with running coverage on all PRs, as once done, is that code changes can introduce spurious ratio changes. If code is merely reformated, with no test changes, any coverage change has no meaning. If a completely covered but buggy function is fixed and the result has fewer lines (and maybe a new test of the function is added), coverage will go down if there are uncovered lines elsewhere, even though code quality and effective coverage has gone up. Ditto if a function is sped up with fewer lines.

@encukou
Copy link
Member

encukou commented Oct 12, 2023

Per https://discuss.python.org/t/35180, you might want to name it PYTHON_PRESITE.

But the use case seems too specialized. It practically invites people to use it for other reasons, for which it would e.g. be better to putting the modules in __main__ as if they were imported.
Maybe it should be -X presite?

@gpshead
Copy link
Member

gpshead commented Oct 12, 2023

Yeah, I don't want other (ab)uses of such a feature.

My initial gut feeling leans towards something that isn't a feature anyone else could use and just an internal implementation detail for us alone (perhaps debug builds only?) that would have us capture bare bones execution data while executing before the time that could be handed off once the interpreter is up and running sufficiently for coverage to load and make the one time API call to get such early-startup coverage data to seed its own stats with.

I'm clearly thinking non-general purpose startup coverage specific not a line callback stuff here. I'm not sure I like this idea. But I never said my gut brainstorms were smart.

@ambv
Copy link
Contributor Author

ambv commented Oct 12, 2023

Good point about calling it PYTHON_PRESITE. Adding -X presite=package.module is also a good idea, there are existing -X options with values. I'll have to check what are the precedence rules then.

If we're worried about abuse, making this only work with pydebug builds is fine by me. In practice coverage with or without pydebug can have subtle differences, I'll measure that but I don't expect the differences to be large enough for us to care.

ambv added a commit to ambv/cpython that referenced this issue Oct 12, 2023
… before site.py is run

This is only available --with-pydebug.
@nedbat
Copy link
Member

nedbat commented Oct 12, 2023

capture bare bones execution data while executing before the time that could be handed off once the interpreter is up and running sufficiently for coverage to load and make the one time API call to get such early-startup coverage data to seed its own stats with.

This is how the "fullcoverage" part of coverage.py worked.

@FFY00
Copy link
Member

FFY00 commented Oct 13, 2023

@ambv, would it be feasible for the regrtest implementation to run the interpreter with -S, disabling the import site during initialization, setup sys.monitoring and whatever else needed, and then finally just import site manually?

ambv added a commit to ambv/cpython that referenced this issue Oct 13, 2023
@ambv
Copy link
Contributor Author

ambv commented Oct 13, 2023

We could do that but this doesn't cleanly solve the issue because it would require regrtest to be restructured to not initially import anything from the stdlib, and to perform the site.py import if -S is used.

PYTHON_PRESITE is a clean pydebug-only solution for this that guarantees that it happens before almost all standard library imports. The only non-C sys.modules members that are populated at this time are 'builtins', 'sys', 'codecs', 'encodings', 'io', 'posix', 'marshal', and 'abc'. We won't be able to reach this minimal set with other hacks. In fact, PYTHON_PRESITE could be imported even sooner, before encodings are initialized, but that would take away its ability to open files and use standard input/output. I think that's taking it a little too far.

@FFY00
Copy link
Member

FFY00 commented Oct 13, 2023

That makes sense, thank you.

I'd consider dropping the "site" reference, in favor of something like "preload", though, as this doesn't really have any ties to site, and the module is loaded regardless if site is imported or not.

@gaogaotiantian
Copy link
Member

Hi @ambv , are you working on the actual coverage part? It seems like the module needs to be standalone (not importing other modules to maximize the coverage) so it probably won't be part of the libregtest? Are we talking about a new compact coverage module under Lib/? Will it be public to replace part of the old trace? I can help with the coverage part if needed :)

@ambv
Copy link
Contributor Author

ambv commented Oct 26, 2023

I already have the coverage gathering. It's a tiny module ran with PRESITE. Now I'm working on accumulation of results from worker processes to a form that existing trace understands so I don't have to rewrite that part.

My ambition isn't to make a reusable public-facing coverage gatherer. This is specifically about stdlib coverage needs.

@gaogaotiantian
Copy link
Member

I already have the coverage gathering. It's a tiny module ran with PRESITE. Now I'm working on accumulation of results from worker processes to a form that existing trace understands so I don't have to rewrite that part.

My ambition isn't to make a reusable public-facing coverage gatherer. This is specifically about stdlib coverage needs.

Sounds good. So we don't have any plans to replace/renovate the trace module. I guess not a lot of users are using it as coverage.py is pretty good on that front.

@ambv
Copy link
Contributor Author

ambv commented Oct 26, 2023

Exactly right. Any new user-facing profiler, debugger, coverage gatherer should be started on PyPI.

@hugovk
Copy link
Member

hugovk commented Oct 26, 2023

Relatedly, we have this page on the devguide: https://devguide.python.org/testing/coverage/

It starts of with useful general info about the benefits of coverage, then goes into specifics of measuring coverage. Are the specifics now out of date and need removing? Or shall we wait until we have something new to replace it with?

ambv added a commit to ambv/cpython that referenced this issue Nov 6, 2023
ambv added a commit that referenced this issue Nov 10, 2023
Now all results from worker processes are aggregated and
displayed together as a summary at the end of a regrtest run.

The traditional trace is left in place for use with sequential
in-process test runs but now raises a warning that those
numbers are not precise.

`-T -j` requires `--with-pydebug` as it relies on `-Xpresite=`.
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
…1710)

Now all results from worker processes are aggregated and
displayed together as a summary at the end of a regrtest run.

The traditional trace is left in place for use with sequential
in-process test runs but now raises a warning that those
numbers are not precise.

`-T -j` requires `--with-pydebug` as it relies on `-Xpresite=`.
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
…1710)

Now all results from worker processes are aggregated and
displayed together as a summary at the end of a regrtest run.

The traditional trace is left in place for use with sequential
in-process test runs but now raises a warning that those
numbers are not precise.

`-T -j` requires `--with-pydebug` as it relies on `-Xpresite=`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests Tests in the Lib/test dir type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

8 participants