Describe the bug
This issue is not really a bug, but rather a description of the results of a quick analysis of why some tests that run super fast locally sometimes end up in the list of the slowest pytests by orders of magnitude. It probably does not explain entirely why this happens, since it was a quick and naive experiment, but it is a good place to start the analysis, and it can serve as a useful tool for developers to analyze tests locally and reproduce CI conditions.
The spoiler (TL;DR) of the issue is: pytests that do CPU processing (particularly if it is multi-threaded) get slowed down immensely, by up to 2 orders of magnitude more than pytests that are purely (or almost purely) GPU bound, when the system is loaded by other tasks. CI is probably exhibiting and magnifying this behavior.
This started because locally I was seeing this:
while CI showed this:
which ended up delaying the merge of #3126
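The numbers above were listings of the slowest test durations; a similar listing can be produced locally with pytest's built-in `--durations` report. A minimal sketch (the test directory and `-k` filter are illustrative placeholders, not the exact invocation used here):

```bash
# Report the 25 slowest test durations (setup/call/teardown) for SHAP-related tests.
# The test directory and -k filter are illustrative; point them at your local checkout.
pytest python/cuml/test -k "shap" --durations=25 -v
```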
Steps/Code to reproduce bug and full results
I built cuML from source, but using the nightly conda packages should work just as well. First I ran the test suite on my workstation (details at the end of the issue; quick specs: 3950X 16-core CPU, 3080 10GB GPU, 64 GB RAM) and got the usual fast results. Then I created 31 processes that load the CPU (a naive, simple `yes > /dev/null &`), reniced them (with different priorities, but the worst result came from a negative value), and ran the test suite again. This was meant to emulate the fact that CI can be very loaded (beyond the CPU, the hardware is probably also heavily loaded in RAM, bandwidth, and disk usage, beyond what my naive test reproduces).
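A minimal sketch of that load setup (31 processes and a negative nice value match what was described above; the exact nice value of -5 and the pytest invocation are illustrative, and negative nice values require sudo/root):

```bash
#!/usr/bin/env bash
# Start a number of CPU-hogging processes, raise their priority, run the tests
# while the system is loaded, then clean up.

NUM_LOAD=31
NICE_VAL=-5   # negative = higher priority than the test run; requires sudo/root

pids=()
for _ in $(seq "$NUM_LOAD"); do
    yes > /dev/null &
    pids+=("$!")
done

# Bump the priority of the load processes.
sudo renice -n "$NICE_VAL" -p "${pids[@]}"

# Run the (illustrative) test selection while the system is loaded.
pytest python/cuml/test -k "shap" --durations=25 -v

# Stop the load processes.
kill "${pids[@]}"
```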
test_kernel_shap_standalone: in this test, kernel SHAP does `l1_regularization` using Scikit-learn's `lars_path`, which, even if it is only one part of the algorithm, is not only CPU based but also heavily multi-threaded.

Results for system without additional load:
Results for system with the `yes` load:

test_explainer_permutation_shap: permutation SHAP is pretty much pure GPU code when explaining cuML models, so it contrasts significantly with the above.

Results with no load:
Results with `yes` load:

As can be seen, there is some impact on the permutation SHAP (pure GPU based) pytests, but the impact on kernel SHAP was way higher.
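A related check that can help when analyzing this locally (not something done in this issue, just a hedged suggestion): cap the BLAS/OpenMP thread pools that Scikit-learn's `lars_path` relies on and compare the runtime of the kernel SHAP test with and without the cap, to gauge how much of the time goes to multi-threaded CPU work. Paths and filters below are again illustrative:

```bash
# Baseline run of the kernel SHAP test.
pytest python/cuml/test -k "test_kernel_shap_standalone" --durations=0

# Same test with the common CPU thread pools capped to one thread; a large change
# in runtime points at the multi-threaded CPU portion of the test.
OMP_NUM_THREADS=1 OPENBLAS_NUM_THREADS=1 MKL_NUM_THREADS=1 \
    pytest python/cuml/test -k "test_kernel_shap_standalone" --durations=0
```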
Environment details (please complete the following information):
Additional context
Full environment details: