Introduction
Several tests, including `test_composite_trustworthiness` in `umap/tests/test_composite_models.py`, appear to be flaky when all seed-setting code (e.g. `np.random.seed(0)` or `tf.random.set_seed(0)`) is commented out, or when a seed-setting function (e.g. `sklearn.utils.check_random_state()`) is passed a random value.
For instance, at commit ae5255b, `test_composite_trustworthiness` failed ~12% of the time (out of 500 runs), compared to 0% of the time (out of 500 runs) when the seed-setting code is left intact. `test_composite_trustworthiness` tests the trustworthiness of combinations of UMAP models.
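For reference, the failure rates above were estimated by simply re-running the test many times and counting failures. A minimal sketch of such a driver is below; the script itself is hypothetical (it is not part of the repository), and only the run count and test path come from the numbers above.

```python
# Hypothetical driver used to estimate flakiness: run one test
# repeatedly with pytest and report the observed failure rate.
import subprocess

RUNS = 500
TEST = "umap/tests/test_composite_models.py::test_composite_trustworthiness"

failures = 0
for _ in range(RUNS):
    result = subprocess.run(
        ["pytest", "-q", TEST],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    failures += result.returncode != 0  # non-zero exit means the test failed

print(f"{failures}/{RUNS} failures ({100.0 * failures / RUNS:.1f}%)")
```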
Motivation
Some tests are flaky with high failure rates, but the flakiness goes undetected as long as the seeds are set. We are trying to stabilize such tests.
Environment
The tests were run using `pytest 6.2.3` in a `conda` environment with `Python 3.6.13`, on `Ubuntu 16.04`.
Possible Solutions
One possible solution to reduce flakiness is to change the parameters used when fitting the models. We tried the following changes (a sketch appears after the list):
- Increasing `n_epochs` for both `model1` and `model2` from 50 to 70 reduced flakiness to ~6%.
- Increasing `n_epochs` for both `model1` and `model2` from 50 to 100 reduced flakiness to ~2%.
- Increasing `n_epochs` for only `model1` from 50 to 100 reduced flakiness to ~2%.
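For concreteness, here is a minimal sketch of the third variant, assuming the models are plain `umap.UMAP` instances. `n_epochs` is a real `UMAP` constructor parameter, but the test's other constructor arguments are elided here, so treat this as illustrative rather than a drop-in patch.

```python
from umap import UMAP

# Sketch of the third variant: bump n_epochs for model1 only.
# Other constructor arguments used in the test are omitted here.
model1 = UMAP(n_epochs=100)  # was n_epochs=50
model2 = UMAP(n_epochs=50)   # unchanged
```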
Another possible solution is to relax the thresholds used in the assertions, which may be unnecessarily conservative. Both assertions (lines 27 and 32 of the test file) are flaky. We tried the following changes (a sketch appears after the list):
- Decreasing the threshold in both assertions from .82 to .80 reduced flakiness to ~5%.
- Decreasing the threshold in both assertions from .82 to .78 reduced flakiness to ~0%.
- Combining both changes, i.e. increasing `n_epochs` for only `model1` from 50 to 100 and decreasing the threshold in both assertions from .82 to .78, also reduced flakiness to ~0%.
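Below is a self-contained sketch of the relaxed check, using `sklearn.manifold.trustworthiness` and synthetic near-planar data as a stand-in for the composite models' embeddings (in the real test, the assertions run against the fitted models' output):

```python
import numpy as np
from sklearn.manifold import trustworthiness

# Synthetic stand-in: data that is essentially 2-D plus tiny noise,
# embedded by dropping the noise dimensions, so trustworthiness is high.
rng = np.random.RandomState()  # deliberately unseeded
base = rng.normal(size=(50, 2))
data = np.hstack([base, 0.01 * rng.normal(size=(50, 6))])
embedding = base

trust = trustworthiness(data, embedding, n_neighbors=10)
assert trust >= 0.78  # was 0.82 in the original assertions
```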
These changes did not increase the runtime of the test significantly.
Please let us know if these solutions are feasible or if there are any other solutions that should be incorporated. If you are interested, we can send the details of other tests demonstrating similar behavior. We will be happy to raise a Pull Request to fix the tests and incorporate any feedback that you may have.