Fix KNN error message. #4782
Ah, the XGBoost error is still here. Is there anything I can help with?
@trivialfis I'm actually backing the xgboost package out to
I'm not an expert on containers, but XGBoost starts honoring the thread limit from CFS
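For context on the CFS thread limit mentioned above, here is a minimal sketch of how a CPU limit can be derived from a container's CFS quota. It is not XGBoost's implementation. The helper names `cfs_cpu_limit`, `read_cfs_cpu_limit`, and `effective_threads` are hypothetical, and the rounding rule (round up, at least one CPU) is an assumption. The file paths are the standard cgroup v1 and v2 locations, which may be mounted elsewhere in a given container.

```python
import math
import os
from pathlib import Path
from typing import Optional


def cfs_cpu_limit(quota_us: int, period_us: int) -> Optional[int]:
    """Return the CPU count implied by a CFS quota, or None if unlimited.

    Assumption: a fractional allowance is rounded up and never drops below 1.
    """
    if quota_us <= 0 or period_us <= 0:
        # cgroup v1 reports a quota of -1 when no limit is set.
        return None
    return max(1, math.ceil(quota_us / period_us))


def read_cfs_cpu_limit(root: str = "/sys/fs/cgroup") -> Optional[int]:
    """Read the CFS limit from cgroup v2 (cpu.max) or v1 (cpu.cfs_*_us)."""
    base = Path(root)
    try:
        v2 = base / "cpu.max"
        if v2.exists():
            # cgroup v2 format: "<quota> <period>", or "max <period>" if unlimited.
            quota, period = v2.read_text().split()
            if quota == "max":
                return None
            return cfs_cpu_limit(int(quota), int(period))
        quota_us = int((base / "cpu" / "cpu.cfs_quota_us").read_text())
        period_us = int((base / "cpu" / "cpu.cfs_period_us").read_text())
        return cfs_cpu_limit(quota_us, period_us)
    except (OSError, ValueError):
        # Not in a cgroup-limited environment, or the files are unreadable.
        return None


def effective_threads() -> int:
    """Thread count that respects both the host CPU count and the CFS limit."""
    host = os.cpu_count() or 1
    limit = read_cfs_cpu_limit()
    return host if limit is None else min(host, limit)


if __name__ == "__main__":
    print(effective_threads())
```

A process that sizes its thread pool this way uses fewer threads inside a CPU-limited CI container than on a developer machine, which is one reason behavior can differ between the two environments.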
@trivialfis The weird thing is that only one of the hanging tests is related to FIL or XGBoost. The other is FAISS'
While I'm waiting: is it possible that a rogue thread or xgboost process isn't being cleaned up properly and ends up causing a deadlock for cuml pytests downstream? I'm not completely sure how this could happen in the Python layer if pytest is cleaning up properly, but I suppose a process could be forked off in the C++ layer and not cleaned up after the tests execute. Both of the failing tests are using
I don't think XGBoost itself can cause such an issue, as I'm not entirely sure how to make an OpenMP thread go rogue without causing the process to abort. You mentioned that the hang is reproducible. I can try it locally and attach gdb to the hanging processes if there's documentation or guidance on how to reproduce it.
@trivialfis Unfortunately, I've only been able to reproduce it in CI. However, I have not tried limiting the resources by specifying
Got it. I will try the
rerun tests
@gpucibot merge
Authors:
- Jiaming Yuan (https://github.com/trivialfis)
Approvers:
- Corey J. Nolet (https://github.com/cjnolet)
URL: rapidsai#4782
No description provided.