Introduction
Several tests, including `test_composite_trustworthiness` in `umap/tests/test_composite_models.py`, appear to be flaky when all seed-setting code (e.g. `np.random.seed(0)` or `tf.random.set_seed(0)`) is commented out, or when a seed-setting function (e.g. `sklearn.utils.check_random_state()`) is passed a random value.
For instance, at commit ae5255b, `test_composite_trustworthiness` failed ~12% of the time (out of 500 runs), compared to 0% of the time (out of 500 runs) when the seed-setting code is left intact. `test_composite_trustworthiness` tests the trustworthiness of combinations of UMAP models.
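For reference, the failure rates above were estimated by simply re-running the test many times and counting failures. A minimal sketch of such a driver is below; the script itself is hypothetical (it is not part of the repository), and only the run count and test path come from the numbers above.

```python
# Hypothetical driver used to estimate flakiness: run one test
# repeatedly with pytest and report the observed failure rate.
import subprocess

RUNS = 500
TEST = "umap/tests/test_composite_models.py::test_composite_trustworthiness"

failures = 0
for _ in range(RUNS):
    result = subprocess.run(
        ["pytest", "-q", TEST],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    failures += result.returncode != 0  # non-zero exit means the test failed

print(f"{failures}/{RUNS} failures ({100.0 * failures / RUNS:.1f}%)")
```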
Motivation
Some tests are flaky with high failure rates, but the flakiness goes undetected as long as the seeds are set. We are trying to stabilize such tests.
Environment
The tests were run using `pytest 6.2.3` in a `conda` environment with `Python 3.6.13`, on `Ubuntu 16.04`.
Possible Solutions
One possible solution to reduce flakiness is to change the parameters used when fitting the models. We tried the following changes (a sketch appears after the list):
- Increasing `n_epochs` for both `model1` and `model2` from 50 to 70 reduced flakiness to ~6%.
- Increasing `n_epochs` for both `model1` and `model2` from 50 to 100 reduced flakiness to ~2%.
- Increasing `n_epochs` for only `model1` from 50 to 100 reduced flakiness to ~2%.
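For concreteness, here is a minimal sketch of the third variant, assuming the models are plain `umap.UMAP` instances. `n_epochs` is a real `UMAP` constructor parameter, but the test's other constructor arguments are elided here, so treat this as illustrative rather than a drop-in patch.

```python
from umap import UMAP

# Sketch of the third variant: bump n_epochs for model1 only.
# Other constructor arguments used in the test are omitted here.
model1 = UMAP(n_epochs=100)  # was n_epochs=50
model2 = UMAP(n_epochs=50)   # unchanged
```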
Another possible solution is to relax the thresholds used in the assertions, which may be unnecessarily conservative. Both assertions (lines 27 and 32 of the test file) are flaky. We tried the following changes (a sketch appears after the list):
- Decreasing the threshold in both assertions from .82 to .80 reduced flakiness to ~5%.
- Decreasing the threshold in both assertions from .82 to .78 reduced flakiness to ~0%.
- Combining both changes, i.e. increasing `n_epochs` for only `model1` from 50 to 100 and decreasing the threshold in both assertions from .82 to .78, also reduced flakiness to ~0%.
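Below is a self-contained sketch of the relaxed check, using `sklearn.manifold.trustworthiness` and synthetic near-planar data as a stand-in for the composite models' embeddings (in the real test, the assertions run against the fitted models' output):

```python
import numpy as np
from sklearn.manifold import trustworthiness

# Synthetic stand-in: data that is essentially 2-D plus tiny noise,
# embedded by dropping the noise dimensions, so trustworthiness is high.
rng = np.random.RandomState()  # deliberately unseeded
base = rng.normal(size=(50, 2))
data = np.hstack([base, 0.01 * rng.normal(size=(50, 6))])
embedding = base

trust = trustworthiness(data, embedding, n_neighbors=10)
assert trust >= 0.78  # was 0.82 in the original assertions
```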
These changes did not increase the runtime of the test significantly.
Please let us know if these solutions are feasible or if there are any other solutions that should be incorporated. If you are interested, we can send the details of other tests demonstrating similar behavior. We will be happy to raise a Pull Request to fix the tests and incorporate any feedback that you may have.