Skip to content

Commit

Permalink
Attempt to fix the flaky TestTriggererJob test (#34075)
Browse files Browse the repository at this point in the history
As documented in #33323, we have frequent failures of the flaky
triggerer job tests.

The flaky failures are about some errors when we close all the
sessions in teardown of the test. It turns out that the tests
had side-effect - they have not waited for the TriggererJob thread
to complete, they merely marked them to be stopped, but they have
not waited for those to complete - this is quite plausible
explanation of the flaky test failures - since those threads have
1 second sleep, it's more than likely that the session has been
created and used by the thread while the teardown has been attempting
to close all the sessions.

This side effect could also have an effect for other tests that
were run after - because in a busy test run machine, the side
effects could propagate further than just to the teardown, so it
could also explain why sometimes (very rarely) other job tests
failed with similar errors.

The fix is to join the runner after marking it to be stopped.

Fixes: #33323 (Hopefully)
  • Loading branch information
potiuk authored Sep 4, 2023
1 parent 601b9cd commit 47f79b9
Showing 1 changed file with 7 additions and 2 deletions.
9 changes: 7 additions & 2 deletions tests/jobs/test_triggerer_job.py
Original file line number Diff line number Diff line change
Expand Up @@ -143,6 +143,8 @@ def __init__(self, password, **kwargs):
finally:
# We always have to stop the runner
triggerer_job_runner.trigger_runner.stop = True
triggerer_job_runner.trigger_runner.join(30)

# Since we have now an in-memory process of forwarding the logs to stdout,
# give it more time for the trigger event to write the log.
time.sleep(0.5)
Expand Down Expand Up @@ -257,6 +259,7 @@ def test_trigger_lifecycle(session):
finally:
# We always have to stop the runner
job_runner.trigger_runner.stop = True
job_runner.trigger_runner.join(30)


class TestTriggerRunner:
Expand Down Expand Up @@ -408,7 +411,7 @@ def handle_events(self):
pytest.fail("did not observe 2 loops in the runner thread")
finally:
job_runner.trigger_runner.stop = True
job_runner.trigger_runner.join()
job_runner.trigger_runner.join(30)
thread.join()
instances = path.read_text().splitlines()
assert len(instances) == 1
Expand Down Expand Up @@ -514,7 +517,7 @@ async def create_triggers(self):
finally:
job_runner.trigger_runner.stop = True
# with suppress(MockTriggerException):
job_runner.trigger_runner.join()
job_runner.trigger_runner.join(30)
thread.join()


Expand Down Expand Up @@ -545,6 +548,7 @@ def test_trigger_firing(session):
finally:
# We always have to stop the runner
job_runner.trigger_runner.stop = True
job_runner.trigger_runner.join(30)


def test_trigger_failing(session):
Expand Down Expand Up @@ -578,6 +582,7 @@ def test_trigger_failing(session):
finally:
# We always have to stop the runner
job_runner.trigger_runner.stop = True
job_runner.trigger_runner.join(30)


def test_trigger_cleanup(session):
Expand Down

0 comments on commit 47f79b9

Please sign in to comment.