
Make tests robust and independent #543

Merged: 28 commits merged into main from enh/ci-test-stability on Sep 12, 2024
Conversation

@glatterf42
Member

@glatterf42 glatterf42 commented Sep 11, 2024

This is a continuation of #521 that puts the branch in the upstream repo rather than my fork, to enable Windows test runners to access the GAMS_LICENSE secret. Please see the original PR for a detailed description, to-dos, etc.

PR checklist

  • [x] Continuous integration checks all ✅
  • [x] Add or expand tests; coverage checks both ✅
  • [ ] Add, expand, or update documentation. Not needed: just CI tests.
  • [ ] Update release notes. Not needed: just CI tests.

@glatterf42 glatterf42 added enh New features & functionality ci Continuous integration labels Sep 11, 2024
@glatterf42 glatterf42 self-assigned this Sep 11, 2024

codecov bot commented Sep 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.8%. Comparing base (90cfe46) to head (749709e).
Report is 28 commits behind head on main.

Additional details and impacted files
@@          Coverage Diff          @@
##            main    #543   +/-   ##
=====================================
  Coverage   98.8%   98.8%           
=====================================
  Files         44      44           
  Lines       4813    4819    +6     
=====================================
+ Hits        4756    4765    +9     
+ Misses        57      54    -3     
| Files with missing lines | Coverage Δ |
|---|---|
| ixmp/backend/jdbc.py | 97.4% <100.0%> (+<0.1%) ⬆️ |
| ixmp/core/platform.py | 98.9% <ø> (ø) |
| ixmp/testing/data.py | 97.3% <100.0%> (+<0.1%) ⬆️ |
| ixmp/testing/jupyter.py | 100.0% <ø> (+6.0%) ⬆️ |
| ixmp/tests/backend/test_base.py | 98.9% <100.0%> (ø) |
| ixmp/tests/backend/test_jdbc.py | 100.0% <100.0%> (ø) |
| ixmp/tests/core/test_scenario.py | 100.0% <100.0%> (ø) |
| ixmp/tests/core/test_timeseries.py | 100.0% <100.0%> (ø) |
| ixmp/tests/report/test_operator.py | 100.0% <100.0%> (ø) |
| ixmp/tests/report/test_reporter.py | 100.0% <100.0%> (ø) |

... and 6 more

... and 1 file with indirect coverage changes

@glatterf42
Member Author

I haven't seen test_export_timeseries_data being flaky in a while, but I also hadn't included anything here that could fix its flakiness. The error seems to arise when export_timeseries_data() writes to a tmp_path and reading the file back in via pd.read_csv() sometimes yields an empty df.
I'm not entirely sure why this happens. Pytest's docs mention that "[c]oncurrent invocations of the same test function are supported by configuring the base temporary directory to be unique for each concurrent run." But the details below that statement say that care is already taken when tests are distributed on a local machine using pytest-xdist. Still, I figure it can't hurt to try, so I've now included a pytest option that discards all directories created from tmp_path after each test (if that's true, note the singular). If this is indeed the source of the flakiness, it might be resolved now; and if some test is still trying to read files that have since been deleted, we will at least have more information about what's going on (assuming the error message is helpful).
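For reference: the change described above uses a pytest option, and pytest >= 7.3 offers the tmp_path_retention_policy / tmp_path_retention_count ini settings for exactly this. The conftest fixture below is a purely illustrative, equivalent sketch of discarding tmp_path contents after each test; it is not the PR's exact change.

```python
# conftest.py: a minimal sketch of discarding tmp_path after each test.
import shutil

import pytest


@pytest.fixture(autouse=True)
def _discard_tmp_path(tmp_path):
    yield  # run the test first
    # Remove the per-test temporary directory and everything inside it
    shutil.rmtree(tmp_path, ignore_errors=True)
```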

@glatterf42
Member Author

I'm just noticing: the Windows tests consume much more time than all the others. This is not specific to this PR; it was also true for recent dependabot PRs. Not sure whether that is worth investigating; it's definitely not high priority, I'm afraid.

@glatterf42
Member Author

glatterf42 commented Sep 11, 2024

I'm rerunning these tests a few times now to see if any flakiness comes up. If it doesn't, I'm happy with the current state of the PR. Just to be sure: my current solution for the flakiness of the tutorial tests is to run them all on the same worker (see the sketch below), which might run counter to our desire to distribute the runs using pytest-xdist. However, the tutorial suite in ixmp is small enough that this doesn't worry me. Most test runs on recent Python versions still finish in ~3 minutes, which I think is fine.
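For reference, the standard pytest-xdist mechanism for pinning tests to one worker is the xdist_group mark together with `pytest --dist loadgroup` (pytest-xdist >= 2.5). That this PR uses exactly this wiring is an assumption, though the diff quoted later in this thread does define a group_base_name from platform.system() and platform.python_version(); the test names here are illustrative.

```python
# test_tutorials.py: a sketch of pinning related tests to a single worker.
# Run with `pytest --dist loadgroup` for the grouping to take effect.
import platform

import pytest

# One group per OS/Python combination, mirroring the diff shown below
group_base_name = platform.system() + platform.python_version()


@pytest.mark.xdist_group(name=group_base_name)
def test_py_transport():
    ...


@pytest.mark.xdist_group(name=group_base_name)
def test_py_transport_scenario():
    ...
```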

@glatterf42
Member Author

glatterf42 commented Sep 11, 2024

The second rerun already revealed something interesting: on windows-py3.11, both of the first tutorial tests timed out when trying to reach the platform, but the second ones were happy to start. Unfortunately, the second R tutorial requires the scenario created by the first, so the first test must pass. I have now introduced the pytest-dependency plugin to skip the second tutorial if the first one failed (see the sketch below), but this still doesn't explain why the cells couldn't connect to the platform in the first place.
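A minimal sketch of the pytest-dependency approach described here, with illustrative test names. (As the next comment notes, it did not work as intended; pytest-dependency reportedly does not cooperate well with pytest-xdist, which may be why.)

```python
# A sketch of chaining the tutorial tests with pytest-dependency.
import pytest


@pytest.mark.dependency(name="tutorial_1")
def test_r_transport():
    ...


# Skipped automatically if test_r_transport failed or was skipped
@pytest.mark.dependency(depends=["tutorial_1"])
def test_r_transport_scenario():
    ...
```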

@glatterf42 glatterf42 linked an issue Sep 11, 2024 that may be closed by this pull request
@glatterf42
Member Author

pytest-dependency did not work as intended, so I'm now trying a different approach: the second tutorial's test calls the helper function that creates the scenario the first tutorial would have created, so that it can run independently of the first.
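A sketch of that idea, assuming the helper is make_dantzig from ixmp.testing (ixmp/testing/data.py, which this PR touches per the coverage report above); the test body is illustrative.

```python
# A sketch of making the second tutorial test self-contained.
from ixmp.testing import make_dantzig


def test_r_transport_scenario(test_mp):
    # Create the scenario that the first tutorial would have produced,
    # so this test no longer depends on test_r_transport having run
    make_dantzig(test_mp)
    # ...then run the second tutorial notebook against test_mp
```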

@glatterf42
Member Author

#467 is also still open, despite there being a workaround in our code base: the tutorial tests are skipped on Ubuntu because they take too long. Not sure whether we need to keep this issue open; I'm not working on running the tutorials on Ubuntu in this PR. We could still use it as a tracking issue if we want to remove the skips one day.

@glatterf42
Member Author

I've run these tests a number of times now and the only tests I saw failing (twice, unfortunately) were the first parts of the tutorial tests (i.e. the tests not called *_scenario). The second parts seemed to work, which makes me wonder: maybe there's an increased time-out risk for the first test that runs? Could it still interfere with some background task?
This also only happened on windows-py3.11, which might be a coincidence.
As far as async is concerned, I didn't find any option to enable or disable it for NotebookClient. However, we could theoretically override the default kernel manager, which is an AsyncKernelManager (see the sketch below). That default is tested, likely faster, and working for all our other cases, though, so I'm hesitant here.
Maybe the flakiness of the first parts of the tutorial tests is, for the moment, best addressed with a flaky marker.
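For illustration only (this option was not adopted): a sketch of overriding the default kernel manager via NotebookClient's kernel_manager_class trait, swapping in the synchronous KernelManager. The notebook path is hypothetical.

```python
# A sketch of running a notebook with a synchronous kernel manager
# instead of NotebookClient's default AsyncKernelManager.
import nbformat
from jupyter_client import KernelManager
from nbclient import NotebookClient

nb = nbformat.read("tutorial.ipynb", as_version=4)  # hypothetical path
client = NotebookClient(nb, kernel_manager_class=KernelManager)
client.execute()
```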

@glatterf42
Member Author

test_del_ts is flaky again, too, though I'm not sure whether something else is also going on there. The traceback includes many lines that appear to be logging output (or similar) from something else.

@glatterf42
Member Author

glatterf42 commented Sep 12, 2024

We discussed test_del_ts extensively in #521.
I'm choosing not to dive deep into this or the tutorial-related failures, since they may well be due to very specific technological details that will no longer be relevant once we transition to ixmp4. As far as I know, ixmp4 is currently only tested on Ubuntu, but it doesn't contain a single flaky test (despite having ~90% coverage). So for now, I think our best course of action is to mark these tests as flaky (rerunning them when they fail, only on Windows and GHA; see the sketch below) and to migrate to ixmp4 as soon as we can.
I would not take the same approach with message_ix or message-ix-models, which won't be replaced in the future as far as I can tell.
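A sketch of such a marker using pytest-rerunfailures, whose recent versions support a condition argument. The rerun counts and the decorated test are illustrative, not the PR's exact values.

```python
# A sketch of rerunning flaky tests, only on Windows under GitHub Actions.
import os
import platform

import pytest

FLAKY = pytest.mark.flaky(
    reruns=5,
    reruns_delay=2,
    # Only rerun where we actually observed the flakiness
    condition="GITHUB_ACTIONS" in os.environ and platform.system() == "Windows",
)


@FLAKY
def test_r_transport():
    ...
```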

@glatterf42
Member Author

I have now rerun the tests four times and did not observe any flaky behaviour, not even reruns due to flakiness. One time, the setup-graphviz action failed because its backend server was overloaded, but there's not a lot we can do about that in this PR.
I did notice, though, that this happens consistently: XPASS ixmp/tests/backend/test_jdbc.py::test_del_ts - #463
This test was marked xfail for all Python versions <= 3.10 because of an incompatibility between JPype 1.4.1 and pandas 1.5.0, which does not seem to persist with pandas 2.x. I'll remove the xfail mark and see what happens.
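Illustrative only: the shape of the mark being removed, with the condition and reason paraphrased from this comment.

```python
# test_jdbc.py: the version-conditional xfail mark is dropped, since the
# JPype 1.4.1 / pandas 1.5.0 incompatibility (#463) is gone with pandas 2.x.
import sys

import pytest

# Removed:
# @pytest.mark.xfail(condition=sys.version_info <= (3, 10), reason="#463")
def test_del_ts():
    ...
```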

Member

@khaeru khaeru left a comment

Thanks for the thorough effort here.

Some comments inline, and I pushed one commit to satisfy mypy in .testing.data. Also:

  • R_transport_scenario.ipynb appears to have added cell metadata that are editor-specific. I'd prefer we not commit these unless they are needed for a specific reason.

Aside from those items, good to go.

@@ -7,53 +7,73 @@

from ixmp.testing import get_cell_output, run_notebook

group_base_name = platform.system() + platform.python_version()

GHA = "GITHUB_ACTIONS" in os.environ
Member

Typically we'd define this in ixmp.testing. If it's only used here for now, fine to leave it; but if it's to be re-used, we can move it there.
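A sketch of that suggestion; the module placement and docstring are illustrative.

```python
# ixmp/testing/__init__.py: define the flag once for re-use.
import os

#: :obj:`True` when running on GitHub Actions.
GHA = "GITHUB_ACTIONS" in os.environ
```

Test modules would then use `from ixmp.testing import GHA` instead of redefining it.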

ixmp/tests/test_tutorials.py (outdated, resolved)
@glatterf42 glatterf42 merged commit 080ad2b into main Sep 12, 2024
21 checks passed
@glatterf42 glatterf42 deleted the enh/ci-test-stability branch September 12, 2024 10:31
Labels

ci Continuous integration, enh New features & functionality

Development

Successfully merging this pull request may close these issues:

  • Address tests that are flaky with pytest-xdist