-
-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make regrtest collect test line coverage using sys.monitoring #110722
Comments
Is there a possibility to use coverage.py with PEP 669 support, or are these independent concerns? |
Thanks for reaching out, @nedbat. Currently the way I see it is that it will be a little awkward to make For this usage internal to CPython, we would need
The other thing I need to support here is support for partial coverage gathering from worker processes, and combining results after the fact. I thought that maybe it will be easier for both sides if we meet in the middle and I make regrtest generate files that
I don't want to set this format in stone just for internal CPython use's sake. Correct me if I'm wrong here, but my current conclusion is that the use case of gathering coverage for user code is quite different from gathering coverage for the standard library itself. |
Thanks for thinking this through, and digging into coverage.py details. The SQLite schema wasn't designed for others to create, but the CoverageData object was. Meeting in the middle might mean writing your own file in whatever format is convenient, then as an after-step reading those files and using CoverageData to create coverage.py-compatible files. |
It would be nice to have a webpage like codecov.io to show coverage of the stdlib at least for Speaking of which, this might conflict with the existing trace/profile/monitoring tests. |
Changing in the relative coverage is only properly interpreted by itself, without looking at the line-by-line coverage report, if the denominator is unchanged, which is to say, if the PR only changes tests and not substantive code. But in that special case, any decrease is bad* because the substantive coverage must have gone down. And one could argue that uncovering some lines while covering other is not great even if the total number covered has a net increase. *EDIT An exception would be if a flaky test with false failures is disabled. I would call this an unfortunate necessity. The problem with running coverage on all PRs, as once done, is that code changes can introduce spurious ratio changes. If code is merely reformated, with no test changes, any coverage change has no meaning. If a completely covered but buggy function is fixed and the result has fewer lines (and maybe a new test of the function is added), coverage will go down if there are uncovered lines elsewhere, even though code quality and effective coverage has gone up. Ditto if a function is sped up with fewer lines. |
Per https://discuss.python.org/t/35180, you might want to name it But the use case seems too specialized. It practically invites people to use it for other reasons, for which it would e.g. be better to putting the modules in |
Yeah, I don't want other (ab)uses of such a feature. My initial gut feeling leans towards something that isn't a feature anyone else could use and just an internal implementation detail for us alone (perhaps debug builds only?) that would have us capture bare bones execution data while executing before the time that could be handed off once the interpreter is up and running sufficiently for I'm clearly thinking non-general purpose startup coverage specific not a line callback stuff here. I'm not sure I like this idea. But I never said my gut brainstorms were smart. |
Good point about calling it If we're worried about abuse, making this only work with pydebug builds is fine by me. In practice coverage with or without pydebug can have subtle differences, I'll measure that but I don't expect the differences to be large enough for us to care. |
… before site.py is run This is only available --with-pydebug.
This is how the "fullcoverage" part of coverage.py worked. |
@ambv, would it be feasible for the regrtest implementation to run the interpreter with |
We could do that but this doesn't cleanly solve the issue because it would require regrtest to be restructured to not initially import anything from the stdlib, and to perform the site.py import if
|
That makes sense, thank you. I'd consider dropping the "site" reference, in favor of something like "preload", though, as this doesn't really have any ties to |
Hi @ambv , are you working on the actual coverage part? It seems like the module needs to be standalone (not importing other modules to maximize the coverage) so it probably won't be part of the libregtest? Are we talking about a new compact coverage module under |
I already have the coverage gathering. It's a tiny module ran with PRESITE. Now I'm working on accumulation of results from worker processes to a form that existing My ambition isn't to make a reusable public-facing coverage gatherer. This is specifically about stdlib coverage needs. |
Sounds good. So we don't have any plans to replace/renovate the |
Exactly right. Any new user-facing profiler, debugger, coverage gatherer should be started on PyPI. |
Relatedly, we have this page on the devguide: https://devguide.python.org/testing/coverage/ It starts of with useful general info about the benefits of coverage, then goes into specifics of measuring coverage. Are the specifics now out of date and need removing? Or shall we wait until we have something new to replace it with? |
Now all results from worker processes are aggregated and displayed together as a summary at the end of a regrtest run. The traditional trace is left in place for use with sequential in-process test runs but now raises a warning that those numbers are not precise. `-T -j` requires `--with-pydebug` as it relies on `-Xpresite=`.
…1710) Now all results from worker processes are aggregated and displayed together as a summary at the end of a regrtest run. The traditional trace is left in place for use with sequential in-process test runs but now raises a warning that those numbers are not precise. `-T -j` requires `--with-pydebug` as it relies on `-Xpresite=`.
…1710) Now all results from worker processes are aggregated and displayed together as a summary at the end of a regrtest run. The traditional trace is left in place for use with sequential in-process test runs but now raises a warning that those numbers are not precise. `-T -j` requires `--with-pydebug` as it relies on `-Xpresite=`.
Feature or enhancement
Currently, there is no good way to collect test line coverage of the standard library.
python -m test -T
python -m test -T
is usingsys.settrace
through thetrace
module, which slows things down significantly. Moreover, it is set relatively late, after a ton of standard library modules were already imported, making collecting full coverage from them hard. What's more, this way of collecting coverage doesn't work with-j
, and we want to switch to-j0
by default in the test runner since the battle for tests to be fully idempotent is impossible.coverage.py?
We had a hack in coverage.py called "fullcoverage", which mimicked the
encodings
module to install early, but it was recently removed because it didn't work in 3.13 and we haven't been tracking coverage in CI since 2021.Use
sys.monitoring
I want to use PEP 669 for this because it's faster and works across threads, which will become relevant with PEP 703. The idea is to gather partial coverage in each regrtest worker subprocess and then combine the results through the main process when tests are done.
Set up monitoring before
site.py
To gather coverage, we need to register a callback for the line event with
sys.monitoring
. Moreover, to get reliable data for this, this gathering needs to start very early in the interpreter, beforesite.py
executed, as that triggers a large amount of standard library imports.This will need a small addition to the interpreter environment, an environment variable called
PYTHONPRESITE
, which takes a module name. Setting it makes the interpreter attempt to import this module beforesite.py
and the__main__
module are created. To allow for input/output, the module is imported after encodings were loaded, stdout/stderr/stdin have been set, andopen()
was exposed. The module is not considered__main__
, and stays imported after loading, making it possible for tooling loaded later to use data gathered by it.Why even gather coverage?
I don't plan to beat people with a stick because line coverage fell from 81% to 79%.
There's been a lot of churn lately in terms of test refactors, and it's not entirely clear while reviewing those changes whether we are still running the same checks as before. In effect, I don't feel very confident approving such large changes with rearranged TestCase base classes. In those cases, I run
coverage.py
with some hacks to confirm that we're not losing coverage with the change, but ultimately this is slow, painful to do, and still generates data that isn't trustworthy.An easy way to get simple test line coverage stats without installing anything would solve my issue, and hopefully, more people would then use it.
Linked PRs
-m test -T -j
use sys.monitoring #111710The text was updated successfully, but these errors were encountered: