Overbroad xfail marks will eventually make CI fail #1728

EliahKagan · 2023-11-03T15:20:03Z

Background

#1679 included improvements to a number of tests that are known to fail on some platforms, by marking them xfail instead of skip so they are still run and their status is reported, but without a failing status causing the whole test run to fail. However, it applied xfail to too many tests, due to limitations on granularity when applying pytest marks to unittest test cases generated by @ddt parameterization.

GitPython/test/test_util.py

Lines 221 to 228 in 340da6d

    
               # FIXME: Mark only the /proc-prefixing cases xfail, somehow (or fix them). 
        
               @pytest.mark.xfail( 
        
                   reason="Many return paths prefixed /proc/cygdrive instead.", 
        
                   raises=AssertionError, 
        
               ) 
        
               @skipUnless(sys.platform == "cygwin", "Paths specifically for Cygwin.") 
        
               @ddt.idata(_norm_cygpath_pairs + _unc_cygpath_pairs) 
        
               def test_cygpath_ok(self, case):

GitPython/test/test_util.py

Lines 233 to 245 in 340da6d

    
               @pytest.mark.xfail( 
        
                   reason=R'2nd example r".\bar" -> "bar" fails, returns "./bar"', 
        
                   raises=AssertionError, 
        
               ) 
        
               @skipUnless(sys.platform == "cygwin", "Paths specifically for Cygwin.") 
        
               @ddt.data( 
        
                   (R"./bar", "bar"), 
        
                   (R".\bar", "bar"),  # FIXME: Mark only this one xfail, somehow (or fix it). 
        
                   (R"../bar", "../bar"), 
        
                   (R"..\bar", "../bar"), 
        
                   (R"../bar/.\foo/../chu", "../bar/chu"), 
        
               ) 
        
               def test_cygpath_norm_ok(self, case):

Upcoming impact

Although this was known and discussed in #1679, and FIXME comments about it were included in the code, the problem turns out to be somewhat more serious than I had anticipated: if not addressed, it will eventually lead to test failures in a future version of pytest. This is because the default behavior of an unexpectedly passing test--one that is marked xfail but passes--will most likely change in pytest 8. Because GitPython does not specify upper bounds on most of its development dependencies, and pytest is one of the development dependencies for which no upper bound is specified, pytest 8 will be automatically installed once it is (stably) released.

Specifically, and in the absence of configuration or command-line options to pytest that override the behavior:

A test marked xfail that fails, and fails in the expected way, produces an XFAIL status, which is treated similarly to PASS. We always want this.
A test marked xfail that fails in a detectably unexpected way--where a different exception results than the one that was expected--produces a FAIL status. We always want this.
A test marked xfail that passes produces an XPASS status. How this status is treated is more complicated. The xfail mark supports an optional strict parameter. Where present, it determines whether the XPASS fails the test run like a FAIL status would, or does not fail the test run (thus behaving like PASS or XFAIL). If absent, the xfail_strict configuration option provides the default. Currently, as of pytest 7, xfail_strict defaults to False when not specified.

As noted in pytest-dev/pytest#11467, which was opened by a pytest maintainer and is listed for pytest's 8.0 milestone, the default is planned to be changed from False to True starting in pytest 8.0. (See also pytest-dev/pytest#11499.)

Possible fixes

Breakage could be avoided (at least for a while, since strict=False may eventually be removed as a feature) by passing strict=False or setting xfail_strict=false for pytest in pyproject.toml. It is also possible to set an upper bound like <8 for pytest in test-requirements.txt.

However, I recommend this instead be fixed by reorganizing the tests in test_util.py so that the tests of cygpath and decygpath--which are the ones that have the insufficiently precise xfail markings that mark some generated test cases xfail even though they are known to pass--can be pure pytest tests. Because they are currently unittest tests, they cannot be generated by @pytest.mark.parametrize (hence @ddt is used). But if they could be generated with the parametrize mark then they could have per-case markings, because parametrize supports an optional marks argument. They could then have the xfail mark applied to exactly the cases where failure is really expected.

That approach – which I mentioned in #1679 itself and in #1700 (comment), and more recently alluded to in #1725 and #1726 (comment) – has the following advantages over other approaches that effectively just suppress the problem:

Any XPASS will be a sign that something has changed and should be looked into, thereby building on the improvements in #1679.
Although we have FIXME comments, the current situation is still misleading in the test results themselves, which indicate that some tests are unexpectedly passing.
When the default treatment of XPASS in pytest changes--but also even before that, once it is documented to change--the presence of expected XPASSes will be more misleading than it is already, even if GitPython is not using a version of pytest affected by the change. This is because that change will further solidify people's expectations about what XPASS indicates, including for people who are trying to become familiar with GitPython.
Reorganizing the tests in test_util.py can also help clarify the tests of rmtree behavior, and help make them easier to modify. This is useful because it will allow building on #1700 toward an eventual complete fix for #790. (In addition, I want to make sure the planned native Windows CI jobs don't have the effect of calcifying cleanup logic in rmtree that otherwise could or should change, or at least that this does not happen in ways that impinge on non-Windows platforms. I think such a reorganization will help with that, too.)

I have opened #1729, which fixes this issue by reorganizing tests in test_util.py in this way.

The text was updated successfully, but these errors were encountered:

Byron · 2023-11-03T18:24:16Z

I am sorry not to have more feedback, but this part of the tests is way over my head 😅.

EliahKagan · 2023-11-03T18:41:20Z

Did it become so as a result of this change? If so, then probably a further change should be made to add comments, docstrings, something in doc/, or to rework something to be clearer. Alternatively, if it became so due to this change but for a pytest-related reason, then I may be able to provide information about that, and then whether any further change should be made to clarify it could be determined. On the other hand, if it was already fully that way and did not become so as a result of this change, then maybe none of that has to be done but I could continually look for opportunities to make the test code clearer next time I work on it.

Byron · 2023-11-03T19:09:36Z

I think it's that many of these tests generally are containing multiple steps and thus take some state to keep track of, which makes them harder to follow. But in this case I think the issue was that the change was so broad that the diff didn't actually help this time around. It would have been better for me to look at it in an editor, I then again, I do trust that you leave the place better than you found it and overall, the complexity of the code is lower now. So, all good, it's me, I am having one of these days as well I guess, in a good way, yet, I am more fuzzy than usual.

EliahKagan · 2023-11-04T16:56:37Z

I think it's that many of these tests generally are containing multiple steps and thus take some state to keep track of, which makes them harder to follow.

Yes, many tests are long and test many things, and one of my worries about the situation with native Windows tests is that, in some cases, when a test fails, it never tests many of the numerous other things it was written to test. Changing skip to xfail markings runs more tests and tracks information about known-failing tests' first failures, but many tests in GitPython's test suite will, in effect, skip most of themselves on the first failure, no matter how they are run. Sometimes this is arguably valuable, in that, after some failures, the subsequent checks don't make much sense. But often it would not have to be done this way.

But in this case I think the issue was that the change was so broad that the diff didn't actually help this time around.

When I looked over #1729 in the GitHub interface (this was just a brief look before I marked the PR ready for review), I found the default "Unified" view, which is often useful, to be incomprehensible, but the "Split" view usable. However, even for the "Split" view, it may be that my knowledge of the changes--having myself made them, and recently--was a big factor.

It would have been better for me to look at it in an editor

The web editor, which can be opened for a PR by pressing . from the PR page, can sometimes be a convenient way to get a richer review experience without leaving the browser and without needing to be on a machine with an editor or other development tools. I'm not sure if that's one of the tools you sometimes use, but if so, then in this specific case, it might've contributed to making your experience worse, because GitHub suffered an outage around the time #1729 was approved (showing 404 pages for repositories, and with some interruption to accessing GitHub-hosted remotes and various other services). Usually, though, I think it works fairly well... though whether or not one likes it is a separate question.

then again, I do trust that you leave the place better than you found it [...]

Thanks! Of course, it is quite possible for me to make mistakes. It is also possible for me to introduce intermediate states of improvement without full insight into whether they are worth merging at that time. For example, #1661 was greatly improved due to #1661 (review). The area where I think my pull requests might carry the greatest risk of making GitPython worse is when they embody consequential design decisions that might be wrong (not applicable in #1729, which is fairly narrow in its effects and affects only the test suite).

So, all good, it's me, I am having one of these days as well I guess [...]

No problem!

Byron · 2023-11-05T20:38:11Z

The web editor, which can be opened for a PR by pressing . from the PR page, can sometimes be a convenient way to get a richer review experience without leaving the browser and without needing to be on a machine with an editor or other development tools.

🙀 That's amazing! How could I not know that? Thanks so much, I will definitely use that more. I actually use VSCode for GitPython, mainly because getting some level of code-intelligence is easy and it has a usable vim mode as well.

The area where I think my pull requests might carry the greatest risk of making GitPython worse is when they embody consequential design decisions that might be wrong (not applicable in #1729, which is fairly narrow in its effects and affects only the test suite).

It's true, the test-suite is the only place where changes can even be made, and there I think it's just about hitting a spot where it's idiomatic and modern, maybe along with change to the away the tests actually work, i.e. make them more focussed where possible and properly isolated. An no matter what, I don't think it can ever be worse :D.

This causes full "failure" output to be printed when a test marked xfail unexpectedly passes, and for the test run to be considered failing as a result. The immediate purpose of this change is to facilitate efficient identification of recently introduced wrong or overbroad xfail markings. This behavior may eventually become the pytest default (see gitpython-developers#1728 and references therein), and this could be retained even after the current xpassing tests are investigated, to facilitate timely detection of tests marked xfail of code that is newly working. (Individual tests decorated `@pytest.mark.xfail` can still be allowed to unexpectedly pass without it being treated like a test failure, by passing strict=False explicitly.)

EliahKagan · 2024-02-20T20:04:57Z

Some updated information: pytest 8 was released on 27 January 2024 and does not contain the change to the default xfail behavior, but this change is still planned. It has been moved from the 8.0 to 9.0 milestone.

Major version 8 of pytest is able to run on all of GitPython's CI test jobs except the 3.7 jobs. A compatible version is automatically selected and the release has not caused any tests to fail or otherwise malfunction.

Pytest 8 improves reporting, including of XFAIL. However, with strict=True, an XPASS is still presented as FAILED such that it may not be obvious to developers less familiar with the tool that the failure may represent a bug that has been fixed such that a test should no longer be marked xfail, rather than a bug that has been introduced. See 0f8cd4c and 0ae5dd1 in #1745 for context on this. So I think it is still the case that if GitPython adds xfail_strict = true in pyproject.toml then an accompanying clarification should be made in CONTRIBUTING.md, and I am continuing to hold off on proposing such changes.

🙀 That's amazing!

In addition to the web editor, there is also Codespaces, where the editor user interface frontend runs either in a browser or in the VS Code desktop application, but the editor backend and the rest of the development environment--which is a fully functional environment--run in a container on a cloud-hosted virtual machine. (There are other things like this, of which the most prominent seems to be Gitpod.) Codespaces likewise integrate with GitHub, including with pull requests.

The web editor supports some extensions, including the Python extension and GitLens. Only some features work in the web editor, since there is no full backend, but this includes the GitLens commit graph feature. However, the best visual commit graph tool I've ever used--even though it is no longer maintained--is the Git Graph extension. A while ago you had mentioned that you don't have a tool to visualize complex git graphs. If you're using VS Code and haven't tried the Git Graph extension, I recommend it; it produces by far the clearest visualizations for this that I have seen. The Git Graph extension does not support the web editor, and in practice I sometimes use a codespace rather than the web editor just to use Git Graph.

Codespaces can be customized with a different Docker image, VS Code extensions, VS Code configuration (e.g., to automatically find unit tests and allow them to be run in the editor), development tools, startup scripts, and so forth. Really it is the dev container that runs in the codespace that is customized. Dev containers can also be run locally with VS Code and Docker (which should not be confused with connecting to a codespace from the VS Code desktop application, which can also be done).

I have thought about proposing a dev container configuration for GitPython to allow users to try it out, and developers to work on changes, with effectively zero extra steps. To be clear, codespaces and local dev containers can already be used to try out, and to develop changes to, GitPython; I am talking about customizing the environment to make it a bit nicer and also make it so one need not do any setup steps. The reason I have opened such a PR, though, is that to be useful enough to justify itself, I think it would have to get things to the point where unit tests can be run...which currently entails running init-tests-after-clone.sh. This can easily be done in the dev container's startup scripts, but the problem is that, in one of the ways of running a local dev container, this would create a risk of data loss.

In practice, dev containers are mostly used in GitHub Codespaces. Local dev containers are also useful. They are most useful when one clones (or re-clones) the repository into the dev container, for maximal isolation from the host machine, which the VS Code extension for dev containers can do (and will even offer to do if one opens a local clone of the repository). However, it is also possible to map the local repository directory on the host as storage for the dev container. This is not recommended on Windows and macOS, because it is much slower than cloning into container storage. Even on GNU/Linux, one forgoes some of the benefits of a dev container by sacrificing isolation. However, it can be done, and people sometimes do it. In this situation, running init-tests-after-clone.sh will delete files outside the container.

This can be solved, but it may not be worthwhile to do so, because there would still be the need to switch to a master branch for some of the tests and back from it to develop feature changes, which would limit how much a custom dev container configuration could help new contributors. Instead, this is a further reason to fix #914. Once that is done, so tests are isolated, both the problem with having the dev container automatically run the init script, and the problem that the need to switch branches for some tests to pass makes onboarding harder, go away.

Byron · 2024-02-21T20:14:55Z

So I think it is still the case that if GitPython adds xfail_strict = true in pyproject.toml then an accompanying clarification should be made in CONTRIBUTING.md, and I am continuing to hold off on proposing such changes.

I agree, it seems easier to 'maintenance mode' to have less CI failures, even if they are for the right reasons. With that said, it seems to be a communication problem more than anything else. If pytest would make clear that xfail can be removed for a certain condition, then 'strict' mode should be more beneficial.

A while ago you had mentioned that you don't have a tool to visualize complex git graphs.

Thanks for the hint! Maybe what I meant is that none of these tools simplify the graph to what truly matters, where 'truly matters' is certainly the hard part to get right. Visualizing is one part, but doing so in a manner that is possible to follow is another. Maybe such tool just doesn't exist unless one day I write it, just to test the hypothesis that this visualization could be better.

This can easily be done in the dev container's startup scripts, but the problem is that, in one of the ways of running a local dev container, this would create a risk of data loss.

I see. This container configuration could only be safe in the (maybe not so unusual) local setup if unit tests would indeed be isolated. Achieving this shouldn't even be this hard. I think I mentioned how gitoxide does it, and think a similar approach would work here too. Thus, the tests could still use the GitPython repository, it just wouldn't be the user's checkout anymore.

EliahKagan changed the title ~~Spurious XPASS from overbroad xfail marks will eventually make CI fail~~ Overbroad xfail marks will eventually make CI fail Nov 3, 2023

EliahKagan mentioned this issue Nov 3, 2023

Reorganize test_util and make xfail marks precise #1729

Merged

Byron added acknowledged help wanted labels Nov 3, 2023

Byron closed this as completed in #1729 Nov 3, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Overbroad xfail marks will eventually make CI fail #1728

Overbroad xfail marks will eventually make CI fail #1728

EliahKagan commented Nov 3, 2023 •

edited

Loading

Byron commented Nov 3, 2023

EliahKagan commented Nov 3, 2023

Byron commented Nov 3, 2023

EliahKagan commented Nov 4, 2023

Byron commented Nov 5, 2023

EliahKagan commented Feb 20, 2024

Byron commented Feb 21, 2024

Overbroad xfail marks will eventually make CI fail #1728

Overbroad xfail marks will eventually make CI fail #1728

Comments

EliahKagan commented Nov 3, 2023 • edited Loading

Background

Upcoming impact

Possible fixes

Byron commented Nov 3, 2023

EliahKagan commented Nov 3, 2023

Byron commented Nov 3, 2023

EliahKagan commented Nov 4, 2023

Byron commented Nov 5, 2023

EliahKagan commented Feb 20, 2024

Byron commented Feb 21, 2024

EliahKagan commented Nov 3, 2023 •

edited

Loading