[FEA] Test estimators with hypothesis #4960
Labels: 2 - In Progress, ? - Needs Triage, feature request
Many of cuML's tests compare results between different implementations: the GPU and CPU implementations of the same estimator, or cuML against third-party implementations, notably scikit-learn. We expect estimators to behave very similarly overall and their results to be identical up to numerical precision.
Furthermore, estimators are usually tested only against a specific combination of inputs and example datasets, an approach that likely fails to exercise rare edge cases and cannot provide confidence in equivalence across a wide range of inputs and datasets. Using hypothesis to test estimators and compare results therefore has two positive effects:

1. Rare edge cases are far more likely to be exercised.
2. Confidence in the equivalence of implementations extends to a wide range of inputs and datasets rather than a few hand-picked examples.
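As a minimal sketch of this style of test: the functions below are illustrative stand-ins for two implementations of the same estimator (they are not cuML APIs), and hypothesis generates the inputs rather than a fixed example dataset:

```python
# A minimal sketch of a hypothesis-based comparison test. `mean_naive` and
# `mean_stable` are illustrative stand-ins for two implementations of the
# same estimator (e.g. a CPU and a GPU version).
import math
from hypothesis import given, settings, strategies as st

def mean_naive(xs):
    # single-pass summation
    return sum(xs) / len(xs)

def mean_stable(xs):
    # Welford-style running mean: a numerically different formulation
    m = 0.0
    for i, x in enumerate(xs, start=1):
        m += (x - m) / i
    return m

@settings(max_examples=50)  # cap iterations to bound test runtime
@given(st.lists(st.floats(min_value=-1e6, max_value=1e6,
                          allow_nan=False, allow_infinity=False),
                min_size=1, max_size=100))
def test_means_agree(xs):
    # results should agree up to numerical precision, not bit-for-bit
    assert math.isclose(mean_naive(xs), mean_stable(xs),
                        rel_tol=1e-7, abs_tol=1e-4)
```

When hypothesis finds a failing input it shrinks it to a minimal counterexample, which is exactly the kind of rare edge case a fixed example dataset tends to miss.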
A potential downside is an increase in test implementation complexity and test runtime. The former can be mitigated through a well-designed abstraction of hypothesis strategies, which may in fact reduce overall complexity; the latter can be mitigated by limiting the number of hypothesis iterations and potentially running hypothesis tests only as part of the stress tests.
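One way to bound iteration counts while still allowing heavier runs is hypothesis's settings-profile mechanism; the profile names and the environment variable below are illustrative choices, not an existing cuML convention:

```python
# Sketch: hypothesis settings profiles keep regular test runs fast while a
# heavier profile can be activated for stress testing. The profile names and
# the CUML_HYPOTHESIS_PROFILE variable are illustrative assumptions.
import os
from hypothesis import HealthCheck, settings

settings.register_profile("quick", max_examples=20)
settings.register_profile("stress", max_examples=2000,
                          suppress_health_check=[HealthCheck.too_slow])

# Select the active profile at session start, e.g. via an environment variable.
settings.load_profile(os.environ.get("CUML_HYPOTHESIS_PROFILE", "quick"))
```

Tests decorated with `@given` then inherit the active profile's `max_examples` unless they override it with their own `@settings`, so the same test suite serves both quick CI runs and stress runs.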
I suggest the following break-down for implementation:
#4960 (comment)