Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trying to fix flaky test suite #164

Merged
merged 4 commits into from
Jan 19, 2023

Conversation

danielhollas
Copy link
Contributor

@danielhollas danielhollas commented Jan 11, 2023

I was trying to hunt down the issue behing the GHA CI timeout failures, reported in #163.

The hypothesis was to see if issue #154 was behind this. See my last comment on #154 for details about my investigation. One think I noticed is that the asyncio echo_logs Task would not get cancelled if the container failed to start. I don't know if this is the reason for the test flakiness, but it is worth a shot I think.

EDIT: So this obviously did not help, but it still seems like a change worth doing. I've also decreased the timeout on GHA so that at least we're not wasting CI minutes.

@codecov
Copy link

codecov bot commented Jan 11, 2023

Codecov Report

Base: 86.72% // Head: 86.72% // No change to project coverage 👍

Coverage data is based on head (123339e) compared to base (b8d4da5).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #164   +/-   ##
=======================================
  Coverage   86.72%   86.72%           
=======================================
  Files           9        9           
  Lines         904      904           
=======================================
  Hits          784      784           
  Misses        120      120           
Flag Coverage Δ
py-3.10 86.61% <100.00%> (ø)
py-3.8 86.57% <100.00%> (ø)
py-3.9 86.68% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
aiidalab_launch/__main__.py 79.93% <100.00%> (ø)
aiidalab_launch/instance.py 87.17% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@@ -158,7 +158,7 @@ def pull(self) -> docker.models.images.Image | None:
self._image = image
return image
except docker.errors.NotFound:
LOGGER.warn(f"Unable to pull image: {self.profile.image}")
LOGGER.warning(f"Unable to pull image: {self.profile.image}")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logging.warn is deprecated, and we were getting a warning from pytest about this.

@danielhollas danielhollas marked this pull request as ready for review January 11, 2023 20:08
Copy link
Member

@yakutovicha yakutovicha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @danielhollas.

One test keeps failing with an "out of runtime error". I suggest increasing the timeout to 15. Otherwise, good to go from my side.

.github/workflows/ci.yml Outdated Show resolved Hide resolved
Co-authored-by: Aliaksandr Yakutovich <[email protected]>
@danielhollas
Copy link
Contributor Author

One test keeps failing with an "out of runtime error". I suggest increasing the timeout to 15. Otherwise, good to go from my side.

If you're referring to for example https://github.com/aiidalab/aiidalab-launch/actions/runs/3896105969/jobs/6698675263, yes, the test suite randomly times out for unknown reason, as reported in #163. I've increased it to 15 minutes but this will not solve the issue as it has been happening for some time even with 30 minutes. (the idea of decreasing the timeout was to fail early in those cases).

In the link above, there's a following pytest warning:

=============================== warnings summary ===============================
tests/test_cli.py::TestInstanceLifecycle::test_start_stop_reset
  /opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/asyncio/base_events.py:641: RuntimeWarning: coroutine 'Queue.put' was never awaited
    self._ready.clear()

This points to perhaps that the issue #154 is behind these timeouts, see also my last comments on that issue where I tried to analyze where it is coming from. This prompted this PR, which obviously didn't fix the issue so more investigation is needed.

Sorry for not being clear in the OP.

Copy link
Member

@yakutovicha yakutovicha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@danielhollas I let you merge it.

@danielhollas
Copy link
Contributor Author

@yakutovicha can't merge, sorry. Maybe you can try to add the deployment team in this repo settings?

@yakutovicha yakutovicha merged commit 426cbf8 into aiidalab:main Jan 19, 2023
@yakutovicha
Copy link
Member

@danielhollas, please remind me next week (I am currently at a conference). I will give my more rights to the AiiDAlab organisation.

@danielhollas danielhollas deleted the test-flakiness branch February 23, 2023 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants