Make regrtest collect test line coverage using sys.monitoring #110722

ambv · 2023-10-11T15:05:21Z

Feature or enhancement

Currently, there is no good way to collect test line coverage of the standard library.

`python -m test -T`

python -m test -T is using sys.settrace through the trace module, which slows things down significantly. Moreover, it is set relatively late, after a ton of standard library modules were already imported, making collecting full coverage from them hard. What's more, this way of collecting coverage doesn't work with -j, and we want to switch to -j0 by default in the test runner since the battle for tests to be fully idempotent is impossible.

coverage.py?

We had a hack in coverage.py called "fullcoverage", which mimicked the encodings module to install early, but it was recently removed because it didn't work in 3.13 and we haven't been tracking coverage in CI since 2021.

Use `sys.monitoring`

I want to use PEP 669 for this because it's faster and works across threads, which will become relevant with PEP 703. The idea is to gather partial coverage in each regrtest worker subprocess and then combine the results through the main process when tests are done.

Set up monitoring before `site.py`

To gather coverage, we need to register a callback for the line event with sys.monitoring. Moreover, to get reliable data for this, this gathering needs to start very early in the interpreter, before site.py executed, as that triggers a large amount of standard library imports.

This will need a small addition to the interpreter environment, an environment variable called PYTHONPRESITE, which takes a module name. Setting it makes the interpreter attempt to import this module before site.py and the __main__ module are created. To allow for input/output, the module is imported after encodings were loaded, stdout/stderr/stdin have been set, and open() was exposed. The module is not considered __main__, and stays imported after loading, making it possible for tooling loaded later to use data gathered by it.

Why even gather coverage?

I don't plan to beat people with a stick because line coverage fell from 81% to 79%.

There's been a lot of churn lately in terms of test refactors, and it's not entirely clear while reviewing those changes whether we are still running the same checks as before. In effect, I don't feel very confident approving such large changes with rearranged TestCase base classes. In those cases, I run coverage.py with some hacks to confirm that we're not losing coverage with the change, but ultimately this is slow, painful to do, and still generates data that isn't trustworthy.

An easy way to get simple test line coverage stats without installing anything would solve my issue, and hopefully, more people would then use it.

Linked PRs

The text was updated successfully, but these errors were encountered:

nedbat · 2023-10-11T15:07:04Z

Is there a possibility to use coverage.py with PEP 669 support, or are these independent concerns?

ambv · 2023-10-11T15:44:58Z

Thanks for reaching out, @nedbat. Currently the way I see it is that it will be a little awkward to make coverage.py support our unique use case.

For this usage internal to CPython, we would need coverage.py to support PYTHONPRESITE when that's implemented. That env variable is supposed to be official support for what Brett's encoding hack was trying to achieve in the past in fullcoverage. Maybe this environment variable will be generally useful for coverage.py too, but I guess most people now run with coverage run, which circumvents the import ordering issue for everyone except for the standard library itself, because coverage.py obviously imports from it. However, for CPython to be able to use a hook provided by coverage.py, it would have to be a minimal hook. Something like coverage.process_startup() won't work because that's part of the coverage package, which:

imports sub-packages of coverage from python3.x/site-packages/, which won't be available yet;
imports standard libraries, which we want to avoid.

The other thing I need to support here is support for partial coverage gathering from worker processes, and combining results after the fact. regrtest is our custom runner that doesn't use multiprocessing, so coverage.py won't support that natively unless we add that. In that case, the two implementations are again tied together, which I'm not certain will work out long-term.

I thought that maybe it will be easier for both sides if we meet in the middle and I make regrtest generate files that coverage.py can read and combine. Now I see the SQLite schema is documented but this includes the sentence:

The exact meaning of the bytes in the blobs should be considered an implementation detail that might change in the future.

I don't want to set this format in stone just for internal CPython use's sake.

Correct me if I'm wrong here, but my current conclusion is that the use case of gathering coverage for user code is quite different from gathering coverage for the standard library itself.

nedbat · 2023-10-11T16:10:05Z

Thanks for thinking this through, and digging into coverage.py details. The SQLite schema wasn't designed for others to create, but the CoverageData object was. Meeting in the middle might mean writing your own file in whatever format is convenient, then as an after-step reading those files and using CoverageData to create coverage.py-compatible files.

gaogaotiantian · 2023-10-11T18:07:05Z

It would be nice to have a webpage like codecov.io to show coverage of the stdlib at least for main branch. There are new contributors that want to start with improving test coverage but don't know where to start. It's also helpful for library maintainers to take a quick look.

Speaking of which, this might conflict with the existing trace/profile/monitoring tests.

terryjreedy · 2023-10-11T18:43:24Z

I don't plan to beat people with a stick because line coverage fell from 81% to 79%.

Changing in the relative coverage is only properly interpreted by itself, without looking at the line-by-line coverage report, if the denominator is unchanged, which is to say, if the PR only changes tests and not substantive code. But in that special case, any decrease is bad* because the substantive coverage must have gone down. And one could argue that uncovering some lines while covering other is not great even if the total number covered has a net increase.

*EDIT An exception would be if a flaky test with false failures is disabled. I would call this an unfortunate necessity.

The problem with running coverage on all PRs, as once done, is that code changes can introduce spurious ratio changes. If code is merely reformated, with no test changes, any coverage change has no meaning. If a completely covered but buggy function is fixed and the result has fewer lines (and maybe a new test of the function is added), coverage will go down if there are uncovered lines elsewhere, even though code quality and effective coverage has gone up. Ditto if a function is sped up with fewer lines.

encukou · 2023-10-12T06:11:26Z

Per https://discuss.python.org/t/35180, you might want to name it PYTHON_PRESITE.

But the use case seems too specialized. It practically invites people to use it for other reasons, for which it would e.g. be better to putting the modules in __main__ as if they were imported.
Maybe it should be -X presite?

gpshead · 2023-10-12T08:14:57Z

Yeah, I don't want other (ab)uses of such a feature.

My initial gut feeling leans towards something that isn't a feature anyone else could use and just an internal implementation detail for us alone (perhaps debug builds only?) that would have us capture bare bones execution data while executing before the time that could be handed off once the interpreter is up and running sufficiently for coverage to load and make the one time API call to get such early-startup coverage data to seed its own stats with.

I'm clearly thinking non-general purpose startup coverage specific not a line callback stuff here. I'm not sure I like this idea. But I never said my gut brainstorms were smart.

ambv · 2023-10-12T08:29:22Z

Good point about calling it PYTHON_PRESITE. Adding -X presite=package.module is also a good idea, there are existing -X options with values. I'll have to check what are the precedence rules then.

If we're worried about abuse, making this only work with pydebug builds is fine by me. In practice coverage with or without pydebug can have subtle differences, I'll measure that but I don't expect the differences to be large enough for us to care.

… before site.py is run This is only available --with-pydebug.

nedbat · 2023-10-12T23:22:11Z

capture bare bones execution data while executing before the time that could be handed off once the interpreter is up and running sufficiently for coverage to load and make the one time API call to get such early-startup coverage data to seed its own stats with.

This is how the "fullcoverage" part of coverage.py worked.

FFY00 · 2023-10-13T08:26:35Z

@ambv, would it be feasible for the regrtest implementation to run the interpreter with -S, disabling the import site during initialization, setup sys.monitoring and whatever else needed, and then finally just import site manually?

ambv · 2023-10-13T09:19:13Z

We could do that but this doesn't cleanly solve the issue because it would require regrtest to be restructured to not initially import anything from the stdlib, and to perform the site.py import if -S is used.

PYTHON_PRESITE is a clean pydebug-only solution for this that guarantees that it happens before almost all standard library imports. The only non-C sys.modules members that are populated at this time are 'builtins', 'sys', 'codecs', 'encodings', 'io', 'posix', 'marshal', and 'abc'. We won't be able to reach this minimal set with other hacks. In fact, PYTHON_PRESITE could be imported even sooner, before encodings are initialized, but that would take away its ability to open files and use standard input/output. I think that's taking it a little too far.

FFY00 · 2023-10-13T09:31:55Z

That makes sense, thank you.

I'd consider dropping the "site" reference, in favor of something like "preload", though, as this doesn't really have any ties to site, and the module is loaded regardless if site is imported or not.

#110769)

gaogaotiantian · 2023-10-26T18:50:16Z

Hi @ambv , are you working on the actual coverage part? It seems like the module needs to be standalone (not importing other modules to maximize the coverage) so it probably won't be part of the libregtest? Are we talking about a new compact coverage module under Lib/? Will it be public to replace part of the old trace? I can help with the coverage part if needed :)

ambv · 2023-10-26T18:52:39Z

I already have the coverage gathering. It's a tiny module ran with PRESITE. Now I'm working on accumulation of results from worker processes to a form that existing trace understands so I don't have to rewrite that part.

My ambition isn't to make a reusable public-facing coverage gatherer. This is specifically about stdlib coverage needs.

gaogaotiantian · 2023-10-26T18:57:55Z

I already have the coverage gathering. It's a tiny module ran with PRESITE. Now I'm working on accumulation of results from worker processes to a form that existing trace understands so I don't have to rewrite that part.

My ambition isn't to make a reusable public-facing coverage gatherer. This is specifically about stdlib coverage needs.

Sounds good. So we don't have any plans to replace/renovate the trace module. I guess not a lot of users are using it as coverage.py is pretty good on that front.

ambv · 2023-10-26T18:59:34Z

Exactly right. Any new user-facing profiler, debugger, coverage gatherer should be started on PyPI.

hugovk · 2023-10-26T19:28:10Z

Relatedly, we have this page on the devguide: https://devguide.python.org/testing/coverage/

It starts of with useful general info about the benefits of coverage, then goes into specifics of measuring coverage. Are the specifics now out of date and need removing? Or shall we wait until we have something new to replace it with?

Now all results from worker processes are aggregated and displayed together as a summary at the end of a regrtest run. The traditional trace is left in place for use with sequential in-process test runs but now raises a warning that those numbers are not precise. `-T -j` requires `--with-pydebug` as it relies on `-Xpresite=`.

… is run (python#110769)

…1710) Now all results from worker processes are aggregated and displayed together as a summary at the end of a regrtest run. The traditional trace is left in place for use with sequential in-process test runs but now raises a warning that those numbers are not precise. `-T -j` requires `--with-pydebug` as it relies on `-Xpresite=`.

… is run (python#110769)

…1710) Now all results from worker processes are aggregated and displayed together as a summary at the end of a regrtest run. The traditional trace is left in place for use with sequential in-process test runs but now raises a warning that those numbers are not precise. `-T -j` requires `--with-pydebug` as it relies on `-Xpresite=`.

ambv added type-feature A feature request or enhancement tests Tests in the Lib/test dir labels Oct 11, 2023

ambv self-assigned this Oct 11, 2023

ambv added a commit to ambv/cpython that referenced this issue Oct 12, 2023

pythongh-110722: Add PYTHON_PRESITE=package.module to import a module…

3daa65a

… before site.py is run This is only available --with-pydebug.

bedevere-app bot mentioned this issue Oct 12, 2023

gh-110722: Add PYTHON_PRESITE to import a module before site.py is run #110769

Merged

ambv added a commit to ambv/cpython that referenced this issue Oct 13, 2023

Merge branch 'main' into pythongh-110722-presite

f192e38

ambv added a commit that referenced this issue Oct 14, 2023

gh-110722: Add PYTHON_PRESITE to import a module before site.py is run (

84b7e9e

#110769)

bedevere-app bot mentioned this issue Nov 3, 2023

gh-110722: Make -m test -T -j use sys.monitoring #111710

Merged

ambv added a commit to ambv/cpython that referenced this issue Nov 6, 2023

Merge branch 'main' into pythongh-110722-regrtest

84263a3

aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024

pythongh-110722: Add PYTHON_PRESITE to import a module before site.py…

0eeb817

… is run (python#110769)

Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024

pythongh-110722: Add PYTHON_PRESITE to import a module before site.py…

47a884d

… is run (python#110769)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make regrtest collect test line coverage using sys.monitoring #110722

Make regrtest collect test line coverage using sys.monitoring #110722

ambv commented Oct 11, 2023 •

edited by bedevere-app bot

Loading

nedbat commented Oct 11, 2023

ambv commented Oct 11, 2023

nedbat commented Oct 11, 2023

gaogaotiantian commented Oct 11, 2023

terryjreedy commented Oct 11, 2023 •

edited

Loading

encukou commented Oct 12, 2023

gpshead commented Oct 12, 2023

ambv commented Oct 12, 2023

nedbat commented Oct 12, 2023

FFY00 commented Oct 13, 2023

ambv commented Oct 13, 2023

FFY00 commented Oct 13, 2023

gaogaotiantian commented Oct 26, 2023

ambv commented Oct 26, 2023

gaogaotiantian commented Oct 26, 2023

ambv commented Oct 26, 2023

hugovk commented Oct 26, 2023

Make regrtest collect test line coverage using sys.monitoring #110722

Make regrtest collect test line coverage using sys.monitoring #110722

Comments

ambv commented Oct 11, 2023 • edited by bedevere-app bot Loading

Feature or enhancement

python -m test -T

coverage.py?

Use sys.monitoring

Set up monitoring before site.py

Why even gather coverage?

Linked PRs

nedbat commented Oct 11, 2023

ambv commented Oct 11, 2023

nedbat commented Oct 11, 2023

gaogaotiantian commented Oct 11, 2023

terryjreedy commented Oct 11, 2023 • edited Loading

encukou commented Oct 12, 2023

gpshead commented Oct 12, 2023

ambv commented Oct 12, 2023

nedbat commented Oct 12, 2023

FFY00 commented Oct 13, 2023

ambv commented Oct 13, 2023

FFY00 commented Oct 13, 2023

gaogaotiantian commented Oct 26, 2023

ambv commented Oct 26, 2023

gaogaotiantian commented Oct 26, 2023

ambv commented Oct 26, 2023

hugovk commented Oct 26, 2023

ambv commented Oct 11, 2023 •

edited by bedevere-app bot

Loading

`python -m test -T`

Use `sys.monitoring`

Set up monitoring before `site.py`

terryjreedy commented Oct 11, 2023 •

edited

Loading